Abstract
Metastatic colonization is an ominous feature of cancer progression. Recent studies have established the importance of pre-mRNA alternative splicing (AS) in cancer biology. However, little is known about the transcriptome-wide landscape of AS associated with metastatic colonization. Both in vitro and in vivo models of metastatic colonization were utilized to study AS regulation associated with cancer metastasis. Transcriptome profiling of prostate cancer cells and derivatives crossing in vitro or in vivo barriers of metastasis revealed splicing factors with significant gene expression changes associated with metastatic colonization. These include splicing factors known to be differentially regulated in epithelial–mesenchymal transition (ESRP1, ESRP2, and RBFOX2), a cellular process critical for cancer metastasis, as well as novel findings (NOVA1 and MBNL3). Finally, RNA-seq indicated a large network of AS events regulated by multiple splicing factors with altered gene expression or protein activity. These AS events are enriched for pathways important for cell motility and signaling, and affect key regulators of the invasive phenotype such as CD44 and GRHL1.
Implications: Transcriptome-wide remodeling of AS is an integral regulatory process underlying metastatic colonization, and AS events affect the metastatic behavior of cancer cells. Mol Cancer Res; 13(2); 305–18. ©2014 AACR.
This article is featured in Highlights of This Issue, p. 209
Introduction
A central goal of cancer research is to elucidate molecular events that contribute to the life-threatening metastasis of cancer cells. Hematogenous dissemination and metastatic colonization of distant organs is a salient feature of malignant cancers (1). An emerging theme in our understanding of this critical process of cancer progression is that epithelial-to-mesenchymal transition (EMT) may facilitate metastasis (2). EMT is a normal process in development whereby cells phenotypically switch from an epithelial state to a mesenchymal state enabling, among other things, invasive and migratory cell behavior (2). It is also considered as a key process by which epithelial tumors acquire the capability to metastasize (3). This phenotypically plastic EMT-like state is driven by global changes of the gene regulatory program at both the transcriptional and post-transcriptional levels (4–7). Transcriptome profiling studies have identified the core transcriptional regulatory network of EMT involving key transcription factors and their upstream regulators and downstream targets (4). Recent studies have also begun to reveal widespread changes in mRNA isoforms during EMT arising from pre-mRNA alternative splicing (AS; refs. 6–8).
AS is an important mechanism of generating regulatory complexity in higher eukaryotes (9). Through alternative choices of exons and splice sites during splicing, a single gene can generate functionally distinct mRNA and protein isoforms. Many genes are differentially spliced during development or among different tissues and cell types (10). Cell-type–specific AS patterns are largely controlled by a small number of master splicing regulators (“splicing factors”) with cell-type–specific gene expression or protein activity (11–13). Pioneering work by Warzecha and colleagues (14) identified the epithelial-specific splicing factors ESRP1 and ESRP2 as the master splicing regulators in epithelial cells. During EMT, the expression levels of ESRPs are greatly diminished (15). This triggers a transcriptome-wide remodeling of the AS regulatory program, resulting in the loss of epithelial-specific protein isoforms that promote cell–cell adhesion and the acquisition of mesenchymal-specific protein isoforms that promote cell migration and motility (6, 16–18). Ectopic expression or knockdown of ESRPs leads to changes in cell morphology and phenotypes that resemble the transitions between epithelial and mesenchymal cells (6, 7). Using cell culture models of EMT, a large number of EMT-associated AS changes have been identified (7). Many of these AS events are regulated by ESRPs, whereas additional splicing factors such as those from the RBFOX, MBNL, CELF, and hnRNP families may also play roles in AS regulation during EMT (7). However, it should be noted that these AS events are identified using in vitro models of EMT or by perturbing the cellular concentrations of EMT-relevant splicing factors. The transcriptome-wide landscape of AS regulation in cancer cells undergoing metastatic colonization remains to be elucidated.
To study the metastatic colonization of cancer cells, we previously derived a series of cell lines from the commonly used PC-3 prostate cancer cell line by selecting for subpopulations of PC-3 cells that crossed in vitro or in vivo barriers for metastasis (19). The PC-3 cell line was used in classic studies by Pettaway and colleagues to demonstrate that passage of these cells through mice selects for variants of greater metastatic colonization potential (20), and thus makes this an experimentally advantageous model with which to probe the determinants of metastatic colonization. Specifically, we used an in vitro transendothelial migration model to isolate variants of PC-3 cells on the basis of their ability to cross an endothelial monolayer, which is related to the process of extravasation during cancer metastasis (19). We also isolated cells from an in vivo murine metastatic colonization model, in which we injected PC-3 cells (or derivative cells) into mouse tail vein and subsequently harvested cells from metastatic tumors in distal organs. Compared with the parental PC-3 cell line, these derivative cell lines exhibited molecular and phenotypic hallmarks of EMT, including the loss of E-cadherin and the expression of mesenchymal transcription factors such as ZEB1, which is required for their enhanced invasive phenotype (19). These in vitro and in vivo models thus represent unbiased phenotypic selection experiments and allow us to comprehensively characterize transcriptome profiles, including AS events and splicing factor expression associated with metastatic colonization.
In this study, we performed comprehensive transcriptome analyses of the parental PC-3 cell line and a series of derivative cell lines to elucidate AS regulation associated with metastatic colonization. Using microarray and RNA-seq, we discovered splicing factors with significant changes in gene expression between the PC-3 and its derivative cell lines. The ESRP splicing factors had the most dramatic change in expression levels through serial passages across the in vitro and in vivo barriers of metastasis, whereas several additional splicing factors were also significantly up- or down-regulated. From the RNA-seq data, we identified nearly a thousand splice variants with significant differential AS between the parental and derivative cells, providing novel splicing signatures for cancer metastasis. Our data demonstrate that transcriptome-wide remodeling of the AS network is an integral regulatory process underlying metastatic colonization, and reveal an extensive resource of AS events that may affect the invasiveness and metastatic behavior of cancer cells.
Materials and Methods
Gene expression microarray analysis
The PC-3 cell line and its derivative cell lines were profiled for gene expression using the Affymetrix Human Genome U133 Plus 2.0 GeneChip. Microarray data were deposited into the NCBI Gene Expression Omnibus (GEO) with the accession ID GSE14405 and GSE56410. RNA samples were processed by the University of Iowa DNA Core Facility. Probe intensity values from the Affymetrix CEL files, which summarize the pixel intensities, were used for further analyses. Reading and processing of the probe intensities were done using the affy package in Bioconductor (21). Gene expression estimates (log2 scale) were obtained using the default parameters of the GCRMA algorithm from the gcrma package (22). All samples were quantile normalized as a group in GCRMA. Ensembl Gene IDs corresponding to Affymetrix probe set IDs were obtained using the biomaRt package (23, 24). The gene expression matrices were then refined by excluding control probe sets and probe sets with ambiguous gene assignments (i.e., annotated to multiple Ensembl gene IDs). For genes with multiple probe sets, we selected the probe set with the highest overall probe signal to represent its Ensembl gene. In total, we compiled 20,595 probe sets representing 20,595 unique Ensembl human genes. Principal component analysis (PCA) based on all 20,595 genes' expression levels was performed using the pcaMethods package (25). All data processing steps were carried out in the R statistical computing environment (http://www.r-project.org).
Prostate cancer cell lines and real-time qPCR analysis of splicing factors
The PC-3 cell line and its derivatives were cultured in DMEM/F12 with 10% FBS, 1% NEAA and 400 μg/mL G-418. All cell lines were grown at 37°C and 5% CO2. Total RNAs were extracted using the RNeasy Mini kit (Qiagen Science, Germantown, MD) and then subjected to reverse transcription by the High-Capacity cDNA Kit (Applied Biosystems, Foster city, CA). Quantitative real-time PCR was carried out using SYBR Green reagents. GAPDH was used as the reference gene. Relative gene expression level was measured by the comparative Ct (2−ΔΔCt) method (26). Real-time qPCR primer sequences were provided in Supplementary Table S1.
RNA sequencing and data analysis
The PC-3E and GS689.Li cell lines were used for Illumina RNA sequencing, using RNAs from three independent cell cultures per cell line. Total RNA samples with RIN (RNA integrity number) >9.5 (measured by the Agilent 2100 BioAnalyzer) were used for RNA-seq library preparation using the TruSeq RNA Sample Preparation Kit (Illumina). The 101 bp × 2 paired-end RNA-seq reads were generated on the HiSeq-2000 sequencer. The RNA-seq data were deposited into the NCBI Sequence Read Archive (accession ID SRS354082).
We mapped RNA-seq reads to the human genome (hg19) and transcriptome (Ensembl, release 65) using the software TopHat (v1.4.1; ref. 27), allowing up to 3-bp mismatches per read and up to 2-bp mismatches per 25 bp seed. We used Cufflinks (v2.1.1; ref. 28) to calculate RNA-seq based gene expression levels using the FPKM metric (fragments per kilobase of exon per million fragments mapped). We used edgeR (29) to identify differential gene expression between the two cell types under FDR < 1% and fold change >2. To identify differential AS events between the two cell types, we used rMATS (http://rnaseq-mats.sourceforge.net, version 3.0.7) to identify differential AS events corresponding to all five basic types of AS patterns (see the entire list in Supplementary Table S2). Briefly, rMATS uses a modified version of the generalized linear mixed model to detect differential AS from RNA-seq data with replicates. It accounts for exon-specific sequencing coverage in individual samples as well as variation in exon splicing levels among replicates. For each AS event, we used both the reads mapped to the splice junctions and the reads mapped to the exon body as the input for rMATS.
Gene Ontology enrichment analysis
We performed GO enrichment analysis using DAVID (30). Genes with differential AS or differential gene expression were submitted to DAVID as the gene list of interest. We used all expressed genes with average FPKM > 1 in at least one of the two cell types as the background gene list. Only GO terms with at least 10 counts (genes) in the gene list of interest were considered. Significant GO terms were defined as those with the Benjamini-corrected FDR of <0.05.
Motif enrichment analysis
We sought to identify binding sites of splicing factors and other RNA binding proteins that were significantly enriched in differential exon skipping events between the PC-3E and GS689.Li cell lines as compared with control (nonregulated) alternative exons. We collected 112 known binding sites of human RNA binding proteins including many well-characterized splicing factors from the literature (Supplementary Table S3; refs. 16, 31–33). For each motif, we scanned for motif occurrences separately in exons or their 250 bp upstream or downstream intronic sequences. For intronic sequences, we excluded the 20 bp sequence within the 3′ splice site and the 6 bp sequence within the 5′ splice site. 5,671 alternative exons without splicing changes (rMATS FDR > 50%) in highly expressed genes (FPKM > 5.0 in at least one cell type) were treated as control exons. For each motif, after we counted the number of occurrences in the differentially spliced exons and the control exons, we calculated P value for motif enrichment via the Fisher exact test (one-sided) and used Benjamini FDR correction to adjust for multiple testing for exons, upstream intronic sequences, and downstream intronic sequences separately.
Compilation of known target exons of ESRP1, RBFOX2, and NOVA1
276 ESRP1 target exons were collected from our exon microarray and RNA-seq analysis of ESRP1-regulated AS events (6, 16). 121 RBFOX2 target exons were collected from the work of Venables and colleagues which compiled a list of RBFOX2-regulated AS events based on RT-PCR and CLIP-seq studies (34). 287 NOVA1 target exons were compiled by converting the coordinates of NOVA1-regulated mouse exons (35) to the human genome.
Fluorescently labeled RT-PCR
Fluorescently labeled RT-PCR was performed to quantify exon splicing levels as described previously (36). RT-PCR primers were shown in Supplementary Table S1.
Ectopic expression of ESRP1, RBFOX2, and NOVA1 in prostate cancer cells
Murine Esrp1 coding sequence (amplified from mouse kidney cDNA) and human NOVA1 coding sequence (obtained from Open Biosystems; clone ID: 30915356) were subcloned into the pQCXIP vector. Final plasmids were confirmed by DNA Sanger sequencing (University of Iowa DNA facility, Iowa city, IA, USA). pcDNA-RBFOX2 was a gift from Dr. Douglas Black (UCLA). Based on the expression levels of these three splicing factors, we ectopically expressed Esrp1 in GS6893.Li, and RBFOX2 and NOVA1 in PC-3E. Retroviruses containing Esrp1 and NOVA1 were generated by transient transfection of retroviral vector together with a packaging plasmid pVSV-G into the packaging cell line GP2-293. The supernatant containing retroviruses was collected, filtered through 0.45-μm filter and used to infect recipient cells. After infection, cells were selected in puromycin (0.5 μg/mL) and further passaged in culture. Transient transfection of pcDNA-RBFOX2 was carried out using Lipofectamine 2000 following the manufacturer's protocol. RNA extraction was done approximately 48 hours after transfection. RT-PCR primer sequences for the 30 randomly selected exons were provided in Supplementary Table S1.
Ectopic expression of EMT-relevant transcription factors in prostate cancer cells
For ectopic expression in PC-3E cells, coding sequences of ZEB1 (purchased from Origene) and TWIST2 (amplified from cDNA of PC-3 cells) were subcloned into the pQCXIP or pLEX vector. All plasmids constructed using PCR were confirmed by DNA sequencing. pWZL-Blast-SNAI1-ER were purchased from Addgene. Retroviruses containing TWIST2 and SNAI1-ER were generated by transient transfection in packaging cell line GP2-293. Lentiviruses containing ZEB1 were produced in packaging cell line 293FT. The supernatant containing retroviruses or lentiviruses was collected, filtered through 0.45-μm filter and used to infect PC-3E cells. Infected cells were selected in puromycin (1 μg/mL) or blasticidin (20 μg/mL) and further passaged in culture.
Western blot analysis
Protein lysates were scraped and harvested in RIPA buffer. Primary antibodies reported in this study are anti–β-actin (Sigma A1978), anti-NOVA1 (a gift from Dr. Douglas Black), anti-FLAG (Sigma F3165), anti–E-cadherin (R&D System), anti-TWIST2 (Sigma, clone:3C8), anti-SNAI1 (Cell Signaling Technology; clone C15D3), anti-ZEB1 (a gift from Dr. Douglas Darling). Secondary antibodies are: goat anti-mouse IgG FITC (Millipore), Odyssey goat anti-mouse IRDye 800CW and Odyssey goat anti-rabbit IRDye 680 (Li-Cor Biosciences), and IgG Isotype (Sigma). Western blot analysis was performed following the standard protocol. Signals were detected by the Odyssey Infrared Imager (LI-COR Biosciences).
Survival analysis of The Cancer Genome Atlas data
The RNA-seq data of 13 cancer types were downloaded from The Cancer Genome Atlas (TCGA) data portal (https://tcga-data.nci.nih.gov/tcga/). We obtained the RNA-seq data of 877 patients with breast cancer (BRCA), 152 patients with glioblastoma (GBM), 380 patients with lung adenocarcinoma (LUAD), 353 patients with lung squamous cell carcinoma (LUSC), 469 patients with clear cell renal cell carcinoma (ccRCC), 260 patients with ovarian cancer (OV), 149 patients with rectum adenocarcinoma (READ), 156 patients with prostate cancer (PRAD), 390 patients with colon adenocarcinoma (COAD), 26 patients with cutaneous melanoma (SKCM), 466 patients with thyroid cancer (THCA), 303 patients with head and neck squamous cell carcinoma (HNSC), and 441 patients with uterine corpus endometrial carcinoma (UCEC). The RNA-seq data were mapped to genes and gene expression levels were calculated by the TCGA consortium. We obtained the expression levels of ESRP1 from the processed gene expression levels provided by TCGA. To test the association of ESRP1 expression level with patient survival, for each cancer type unsupervised 2-means clustering based on ESRP1 expression levels was performed to segregate patients into two distinct groups, one group with high expression of ESRP1 and the other group with low expression of ESRP1. The Kaplan–Meier survival estimator was plotted for the two patient groups in ccRCC and breast cancer separately. Cox regression was used to calculate the P value of association between gene expression and survival.
Results
Altered expression of pre-mRNA splicing factors in models of metastatic colonization
Tissue invasion and metastasis are a hallmark of cancer (1). To study transcriptome changes associated with metastasis, we previously used in vitro and in vivo models of metastatic colonization to derive a series of cell lines from the PC-3 prostate cancer cell line (Fig. 1A). Specifically, starting from the parental PC-3 cell line, we utilized an in vitro transendothelial migration (TEM) model to derive cell lines after single passage (TEM2-5, TEM3-8) or double passage (TEM4-18) of PC-3 cells through an endothelial monolayer (19). We also injected the PC-3 or TEM4-18 cells into the tail vein of mice and harvested cells through single or double rounds of selection in vivo from metastatic tumors in distal organs, such as left axillary lymph node (denoted as LALN), lung (Lu), urogenital (Ug), kidney (Ki), left adrenal (LAd), liver (Li). Of note, these derived cell lines have enhanced ability to migrate through the endothelial monolayer and are more aggressive in the murine metastatic colonization model in vivo, with cell lines derived from in vivo passage having a somewhat greater metastatic colonization potential than those derived in vitro by selection for transendothelial migration (19). A detailed schematic diagram describing the derivation and relationships of these cell lines is shown in Fig. 1A.
Microarray analysis of global gene expression profiles of PC-3 and derivative cell lines. A, a schematic diagram describing the generation of the PC-3 derivative cell lines using the in vitro transendothelial migration model or the in vivo murine metastatic colonization model. The relationship between the parental PC-3 cell line and all derivative cell lines is depicted. Derivative cell lines selected for gene expression microarray analysis are indicated by boxes. B, PCA of global gene expression profiles of PC-3 and derivative cell lines. C, gene expression levels of 60 well-known splicing factors. The six splicing factors with significant expression differences between PC-3 and derivative cell lines are highlighted. D, real-time qPCR results of six splicing factors. GAPDH was used as the reference gene. Relative gene expression level was normalized to PC-3 cells. Error bar, SEM of technical triplicates.
Microarray analysis of global gene expression profiles of PC-3 and derivative cell lines. A, a schematic diagram describing the generation of the PC-3 derivative cell lines using the in vitro transendothelial migration model or the in vivo murine metastatic colonization model. The relationship between the parental PC-3 cell line and all derivative cell lines is depicted. Derivative cell lines selected for gene expression microarray analysis are indicated by boxes. B, PCA of global gene expression profiles of PC-3 and derivative cell lines. C, gene expression levels of 60 well-known splicing factors. The six splicing factors with significant expression differences between PC-3 and derivative cell lines are highlighted. D, real-time qPCR results of six splicing factors. GAPDH was used as the reference gene. Relative gene expression level was normalized to PC-3 cells. Error bar, SEM of technical triplicates.
To characterize the transcriptome profiles of these cell lines, we carried out global gene expression profiling of the parental PC-3 cell line and its derivative cell lines (boxed in Fig. 1A) using the Affymetrix Human Genome U133 Plus 2.0 gene expression microarray, with multiple replicates for some of the cell lines. We used principal component analysis (PCA) to compare and contrast the global gene expression profiles among these cell lines. As shown in Fig. 1B, the PCA analysis revealed a clear separation between the parental PC-3 cell line and other derivative cell lines. This pattern was consistent with our previous report that all derivative cell lines showed molecular and phenotypic features of EMT as compared with PC-3 (19). Additionally, we note that these derivative cell lines could be further classified on the basis of their gene expression profiles in a way that matched their number of passages and relationships in the in vitro or in vivo model of metastatic colonization. The two cell lines derived in vitro (TEM2-5, TEM4-18) were clustered, along with two additional cell lines (GS683.LALN, JD1203.Lu) derived from the TEM4-18 cell line after single round of selection in vivo. Similarly, the two cell lines from two rounds of selection in vivo (GS694.LAd and GS689.Li) that shared the same immediate parental cell line JD549.Ki were clustered together, despite the fact that they were isolated from metastatic tumors in different organs (left adrenal and liver, respectively). Taken together, this PCA analysis suggests that the global expression profiles of these cell lines reflect an intrinsic transcriptome signature of metastatic colonization rather than the specific sites of the secondary metastatic tumors.
To investigate the possibility of a transcriptome-wide remodeling of splicing regulation during metastatic colonization, we next sought to identify splicing factors with significant differences in gene expression levels between the PC-3 and its derivative cell lines. We compiled a list of 60 splicing factors with well-established roles in AS regulation (37). Requiring at least 2-fold change in gene expression levels between the parental PC-3 cell line and the average of all derivative cell lines, we identified six splicing factors (ESRP1, ESRP2, MBNL3, NOVA1, RBFOX2, RBM39) with significant increase or decrease in gene expression associated with metastatic colonization (Fig. 1C). Real-time qPCR analysis using RNAs from independent cell cultures confirmed the significant expression change of 5 of these 6 splicing factors, with the only exception being RBM39 which had a fairly modest expression difference between the PC-3 and derivative cells according to qPCR (Fig. 1D). The most striking pattern was observed for the epithelial-specific splicing factor ESRP1. Its expression level was high in the parental PC-3 cell line, dropped by 17-fold in the TEM2-5 cell line after one round of selection in vitro, and was almost completely nonexpressed (i.e., with its microarray signal at the background level) after two rounds of selection in vitro and in cell lines selected in vivo (Fig. 1C). Its close homolog ESRP2 showed a similar (although less dramatic) trend. These observations are consistent with the key role of ESRP1 and ESRP2 in maintaining epithelial-specific AS patterns, and suggest that the downregulation of the ESRP splicing network is a key feature of cancer cells with greater metastatic colonization potential. We also observed the upregulation or downregulation of three other splicing factors (RBFOX2, NOVA1, and MBNL3). These include RBFOX2 which has been implicated in regulating mesenchymal AS patterns during EMT (7, 8, 38), as well as novel findings (NOVA1, MBNL3). As each splicing factor may regulate AS of hundreds or even thousands of genes, the substantial change in gene expression levels of these splicing factors suggests the global remodeling of the AS regulatory network associated with metastatic colonization.
Transcriptome-wide changes in pre-mRNA AS associated with metastatic colonization
To discover and characterize AS changes associated with metastatic colonization, we selected one of the in vivo selected cell lines GS689.Li for comparison to the E-cadherin-positive fraction of PC-3 (referred to as PC-3E in this article). PC-3E cells were derived from the parental PC-3 population by fluorescence-activated cell sorting for E-cadherin-positive cells (19). PC-3E constitutes approximately 75% of the PC-3 cells and contains cells that lack the ability to cross the metastatic barrier. We collected RNAs from three independent cell cultures of both PC-3E and GS689.Li. As a pilot analysis, we took 10 ESRP-regulated EMT signature exons (6) and used RT-PCR to quantify their inclusion levels in these two cell types. Except for one exon of the gene RALGPS2 with poor RT-PCR amplification, we observed significant changes of 9 exons between these two cell lines (Fig. 2A). These splicing switches are consistent with the mesenchymal features of GS689.Li (19) and indicate that the global AS regulatory program is significantly different in GS689.Li as compared with the PC-3E cells.
Comparison of AS profiles between the PC-3E and GS689.Li cell lines. A, RT-PCR–based exon inclusion levels of nine EMT signature exons in PC-3E and GS689.Li. Left, fluorescently labeled RT-PCR gel images; the estimated exon inclusion levels are indicated below each gel picture. Right, heatmap of nine exons' exon inclusion levels. B, flowchart of RNA-seq analysis of PC-3E and GS689.Li. C, summary of differential AS events detected between PC-3E and GS689.Li.
Comparison of AS profiles between the PC-3E and GS689.Li cell lines. A, RT-PCR–based exon inclusion levels of nine EMT signature exons in PC-3E and GS689.Li. Left, fluorescently labeled RT-PCR gel images; the estimated exon inclusion levels are indicated below each gel picture. Right, heatmap of nine exons' exon inclusion levels. B, flowchart of RNA-seq analysis of PC-3E and GS689.Li. C, summary of differential AS events detected between PC-3E and GS689.Li.
To globally compare AS patterns between the GS689.Li and PC-3E cell lines, we performed deep RNA sequencing (RNA-seq) with three biological replicates for each cell line (Fig. 2B). In total, 114 to 132 million 101bp x 2 paired-end reads were generated per RNA sample, with approximately 80% of reads mapped uniquely to the human genome and transcriptome (hg19). From the RNA-seq data, we used our rMATS algorithm (see Materials and Methods) to detect differential AS events between these two cell lines corresponding to five basic types of AS patterns (SE: skipped exon, A5SS: alternative 5′ splice site, A3SS: alternative 3′ splice site, MXE: mutually exclusive exons, and RI: retained intron). In total, 129,683 AS events were analyzed and rMATS was used to calculate the P value and false discovery rate (FDR) that the average splicing level (denoted as ψ, or percent-spliced-in (39)) differed by more than 5% between these two cell types. Under FDR < 0.05, we identified 866 differential AS events in 494 genes (Supplementary Table S2). Of these events, the largest category of AS patterns was skipped exon (557), followed by mutually exclusive exons (183), retained intron (72), alternative 5′ splice site (28), and alternative 3′ splice site (26) (Fig. 2C and Fig. 3A). The RNA-seq data and RT-PCR validation of selected AS events are shown in Fig. 3B and C. Of note, among these 866 events, 353 (41%) events showed a change of percent-spliced-in (ψ) of more than 25% (∣Δψ∣>25%), including 82 (9.5%) events that had “switch-like” AS patterns (40, 41) with the average ψ <34% in one cell line and >66% in the other cell line. These data indicate that numerous genes have a significant shift in isoform usage between these two phenotypically distinct cell types.
Differential AS between PC-3E and GS689.Li preferentially targets pathways important for cell motility and signaling. A, GO enrichment analysis of differential exon skipping events. All significant GO terms (Benjamini-corrected FDR < 5%) in the GO BP (“biological process”), MF (“molecular function”) and CC (“cellular component”) categories are listed in the bar graph. SE, skipped exon; A3SS and A5SS, alternative 3′ or 5′ splice sites; MXE, mutually exclusive exons; RI, retained intron. B, RNA-seq read density plot indicates a metastasis-relevant isoform switch of CD44 between the two cell types. C, RT-PCR gel images of selected differential AS events. The RT-PCR–estimated exon inclusion levels are indicated below each gel picture.
Differential AS between PC-3E and GS689.Li preferentially targets pathways important for cell motility and signaling. A, GO enrichment analysis of differential exon skipping events. All significant GO terms (Benjamini-corrected FDR < 5%) in the GO BP (“biological process”), MF (“molecular function”) and CC (“cellular component”) categories are listed in the bar graph. SE, skipped exon; A3SS and A5SS, alternative 3′ or 5′ splice sites; MXE, mutually exclusive exons; RI, retained intron. B, RNA-seq read density plot indicates a metastasis-relevant isoform switch of CD44 between the two cell types. C, RT-PCR gel images of selected differential AS events. The RT-PCR–estimated exon inclusion levels are indicated below each gel picture.
Metastasis-associated AS events are enriched in pathways important for cell motility and signaling
To explore gene functional groups preferentially influenced by AS events associated with metastatic colonization, we performed Gene Ontology (GO) enrichment analysis of genes with differential AS between PC-3E and GS689.Li. We identified 9 significantly enriched GO terms at FDR<5% (Fig. 3A). Genes with these enriched GO terms are listed in Supplementary Table S4. We found several enriched GO terms related to the regulation of cell motility and migration, such as cytoskeletal protein binding, actin binding, GTPase binding, and cell leading edge, suggesting the role of these AS events in regulating cellular processes related to cell invasion and migration. For example, our RNA-seq data revealed significant isoform switch of CD44 from the CD44v (variant) isoforms in PC-3E to the CD44s (standard) isoforms in GS689.Li (Fig. 3B). This isoform switch is known to occur during EMT and is associated with enhanced invasiveness and aggressiveness of cancer cells (42–44).
We also found a significant enrichment (FDR = 0.048) of genes involved in the transmembrane receptor tyrosine kinases (RTK) pathway (Fig. 3A), which plays a key role in cancer development and progression (45). Of 19 differential AS events in the RTK pathway (Table 1), 18 were in protein-coding regions and one (in AKT1) was in the 5′-UTR. These include several AS events previously demonstrated to have functional impact. For example, the insulin receptor (IR) gene INSR has two isoforms (IR-A and IR-B), resulting from the AS of exon 11. IR-B is the full-length mRNA isoform with exon 11 included. It is highly expressed in liver and is reported to be the major INSR isoform involved in insulin metabolism, whereas IR-A which lacks exon 11 plays a role in promoting cell proliferation (46–48). Here, the average exon inclusion level of INSR exon 11 was 54% in PC-3E but dropped significantly to 33% in GS689.Li (Fig. 3C). In other words, the ratio of the proliferative isoform (IR-A) over the metabolic isoform (IR-B) increased significantly in cells with greater metastatic colonization potential. Interestingly, an increased ratio of IR-A over IR-B was independently observed in clinical tissue specimens of breast cancer, prostate cancer, and liver cancer (49–51), suggesting the functional and clinical relevance of this AS event in cancer. FLNA encodes an actin-binding protein that regulates cell shape and migration through cytoskeleton remodeling (52–55). An in-frame alternatively spliced exon (exon 30) of FLNA encodes eight amino acids within one filamin repeat domain with a crucial role in remodeling the cytoskeleton. We found that GS689.Li had a significantly higher inclusion level of this exon as compared with PC-3E (67% versus 46%, see Fig. 3C), indicating a higher percentage of the full-length protein isoform with intact filamin repeat domain in GS689.Li. Interestingly, we also observed a higher gene expression level of FLNA in GS689.Li (fold change over PC-3E = 1.43, FDR = 0.0005), suggesting that cells underwent concurrent changes in gene expression and AS to produce more full-length FLNA in metastatic cancer cells. Another example was the exon 6 of Platelet-derived growth factor A (PDGFA). PDGFA is a mitogen which affects tumor growth and drives pathological mesenchymal response in various diseases (56–58). Previous studies showed that the inclusion of PDGFA exon 6 primarily occurred in epithelial tissues (57). Consistent with this previous observation, we found a significantly higher exon inclusion level of PDGFA exon 6 in PC-3E as compared with GS689.Li (41% versus 6%, Fig. 3C). Vascular endothelial growth factor A (VEGFA) is another member of the PDGF/VEGF growth factor family. VEGFA induces angiogenesis and promotes cell migration (59–61). The skipping of VEGFA exon 7 was increased in GS689.Li (Table 1), and it has been reported that the skipping of VEGFA exon 7 promotes tumor angiogenesis of human prostate cancer (62). Given the importance of the RTK pathway in cancer cell signaling, it is possible that coordinated AS changes in the RTK pathway affect key aspects of gene regulation and signaling pathways of metastatic cancer cells. Importantly, of the 18 differential AS events in the protein-coding regions of these RTK-related genes, 15 (83%) involve an in-frame alternative exon with an exact multiple of 3 nt in length. This percentage is much higher than the transcriptome-wide average (40, 63), suggesting that these metastasis-associated AS events produce functional protein isoforms in distinct cell types.
Differential AS events involved in the transmembrane receptor protein tyrosine kinase signaling pathway
Gene symbol . | Gene name . | Exon coordinates (hg19) . | Exon size . | Region . | ΔΨ(GS689.Li – PC-3E) by RNA-seq . |
---|---|---|---|---|---|
PDGFA | Platelet-derived growth factor alpha polypeptide | −chr7:540067–540136 | 69 | CDS + 3′UTR | −49% |
INSR | Insulin receptor | −chr19:7150507–7150543 | 36 | CDS | −48% |
BAIAP2 | BAI1-associated protein 2 | +chr17:79084713–79084759 | 46 | CDS + 3′UTR | −48% |
SORBS1 | Sorbin and SH3 domain containing 1 | −chr10:97174250–97174619 | 369 | CDS | −48% |
TSC2 | Tuberous sclerosis 2 | +chr16:2127598–2127727 | 129 | CDS | −44% |
ERBB2IP | Erbb2 interacting protein | +chr5:65364704–65364848 | 144 | CDS | 39% |
ERBB2IP | Erbb2 interacting protein | +chr5:65367996–65368119 | 123 | CDS | −35% |
GHR | Growth hormone receptor | +chr5:42629139–42629205 | 66 | CDS | −30% |
ABI1 | Abl-interactor 1 | −chr10:27044583–27044670 | 87 | CDS | 29% |
FLNA | Filamin A, alpha (actin binding protein 280) | −chrX:153585618–153585642 | 24 | CDS | 25% |
PTK2 | PTK2 protein tyrosine kinase 2 | −chr8:141679825–141679834 | 9 | CDS | 24% |
VEGFA | Vascular endothelial growth factor A | +chr6:43749692–43749824 | 132 | CDS | −24% |
PXN | Paxillin | −chr12:120653362–120653464 | 102 | CDS | −21% |
AKT1 | V-akt murine thymoma viral oncogene homolog 1 | −chr14:105259463–105259547 | 84 | 5′UTR | −20% |
SOCS7 | Suppressor of cytokine signaling 7 | +chr17:36520634–36520739 | 105 | CDS | 19% |
MPZL1 | Myelin protein zero-like 1 | +chr1:167745300–167745403 | 103 | CDS | −17% |
ARF4 | ADP-ribosylation factor 4 | −chr3:57569624–57569734 | 110 | CDS | −15% |
RAPGEF1 | Rap guanine nucleotide exchange factor (GEF) 1 | −chr9:134479347–134479440 | 93 | CDS | −12% |
PXN | Paxillin | −chr12:120657009–120657894 | 885 | CDS | 11% |
Gene symbol . | Gene name . | Exon coordinates (hg19) . | Exon size . | Region . | ΔΨ(GS689.Li – PC-3E) by RNA-seq . |
---|---|---|---|---|---|
PDGFA | Platelet-derived growth factor alpha polypeptide | −chr7:540067–540136 | 69 | CDS + 3′UTR | −49% |
INSR | Insulin receptor | −chr19:7150507–7150543 | 36 | CDS | −48% |
BAIAP2 | BAI1-associated protein 2 | +chr17:79084713–79084759 | 46 | CDS + 3′UTR | −48% |
SORBS1 | Sorbin and SH3 domain containing 1 | −chr10:97174250–97174619 | 369 | CDS | −48% |
TSC2 | Tuberous sclerosis 2 | +chr16:2127598–2127727 | 129 | CDS | −44% |
ERBB2IP | Erbb2 interacting protein | +chr5:65364704–65364848 | 144 | CDS | 39% |
ERBB2IP | Erbb2 interacting protein | +chr5:65367996–65368119 | 123 | CDS | −35% |
GHR | Growth hormone receptor | +chr5:42629139–42629205 | 66 | CDS | −30% |
ABI1 | Abl-interactor 1 | −chr10:27044583–27044670 | 87 | CDS | 29% |
FLNA | Filamin A, alpha (actin binding protein 280) | −chrX:153585618–153585642 | 24 | CDS | 25% |
PTK2 | PTK2 protein tyrosine kinase 2 | −chr8:141679825–141679834 | 9 | CDS | 24% |
VEGFA | Vascular endothelial growth factor A | +chr6:43749692–43749824 | 132 | CDS | −24% |
PXN | Paxillin | −chr12:120653362–120653464 | 102 | CDS | −21% |
AKT1 | V-akt murine thymoma viral oncogene homolog 1 | −chr14:105259463–105259547 | 84 | 5′UTR | −20% |
SOCS7 | Suppressor of cytokine signaling 7 | +chr17:36520634–36520739 | 105 | CDS | 19% |
MPZL1 | Myelin protein zero-like 1 | +chr1:167745300–167745403 | 103 | CDS | −17% |
ARF4 | ADP-ribosylation factor 4 | −chr3:57569624–57569734 | 110 | CDS | −15% |
RAPGEF1 | Rap guanine nucleotide exchange factor (GEF) 1 | −chr9:134479347–134479440 | 93 | CDS | −12% |
PXN | Paxillin | −chr12:120657009–120657894 | 885 | CDS | 11% |
Splicing factors responsible for AS regulation associated with metastatic colonization
The widespread differences in AS between PC-3E and GS689.Li raised a question about the mechanisms for AS switches associated with metastatic colonization. To address this question, we investigated which splicing factors may be responsible for these AS differences, by identifying enriched binding sites of splicing factors and other RNA binding proteins around the differentially spliced exons. We focused this motif analysis on alternatively spliced exons involved in exon skipping events. From the rMATS results, after removing redundancies we obtained a list of 424 unique alternative exons with significant differential exon skipping between the two cell types. Of them, 221 and 203 exons had significantly lower or higher exon inclusion in GS689.Li, respectively. We also compiled a control list of 5,671 alternatively spliced exons without splicing differences between the two cell types (see details in Materials and Methods). Using a comprehensive list of 112 binding sites of human RNA binding proteins, including many well-characterized splicing factors (Supplementary Table S3), we scanned for motif occurrences separately in exons or their 250nt upstream or downstream intronic sequences and identified significantly enriched motifs in GS689.Li downregulated or upregulated exons as compared with the control (nonregulated) alternative exons.
Of all 112 motifs analyzed, six motifs of well-known splicing factors (PTBP1, SRSF3, RBFOXs, QKI, ESRPs and MBNLs) had a Benjamini-adjusted P value of < 0.05 within or around exons downregulated in GS689.Li, whereas four motifs (corresponding to MBNLs, PTBP1, U2AF2 and RBFOXs) were enriched in upstream or downstream intronic sequences of exons upregulated in GS689.Li (Fig. 4A). Of note, the binding site of ESRPs (UGGUGG) was significantly enriched in the downstream intronic regions of GS689.Li downregulated exons. Given that ESRP expression was largely diminished in the GS689.Li cell line, this motif pattern is consistent with the literature that exons with downstream intronic binding sites of ESRPs generally are positively regulated by ESRPs (16). Similarly, the binding motif of the RBFOX family of splicing factors (WGCAUGM) was significantly enriched within GS689.Li downregulated exons and in the downstream intronic regions of GS689.Li upregulated exons. This effect was most likely mediated by RBFOX2, which had a higher expression level in the GS689.Li cell line (Figs. 1 and 4B). Moreover, the positions of the motif enrichment matched the well-known position-dependent effects of RBFOX2 on its target alternative exons (64).
Splicing factors responsible for AS regulation associated with metastatic colonization. A, significantly enriched binding sites of splicing factors and other RNA binding proteins in differential exon skipping events between PC-3E and GS689.Li. Gene symbols of splicing factors are followed by the consensus binding motifs and their Benjamini-adjusted P values. Y: C/U; K: G/U; W: A/U; H: U/C/A; V: A/C/G; M: A/C. B, scatter plot of 221 RNA binding proteins' average gene expression levels (log2 FPKM) in PC-3E and GS689.Li. Splicing factors with significant gene expression change are colored in red, while splicing factors whose binding motifs are significantly enriched are colored in blue. Some splicing factors belong to both categories. The inset image shows the western blot confirmation of NOVA1 protein expression levels. C, the number of known target exons of ESRP1, RBFOX2, or NOVA1 among differential exon skipping events between PC-3E and GS689.Li. D, heatmap showing the effects on 30 randomly selected metastasis-associated skipped exons by ectopic expression of ESRP1, NOVA1, and RBFOX2. The Δψ values between PC-3E and GS689.Li or upon ectopic expression of a specific splicing factor are color coded for three individual replicates and the average of all three replicates. Significant splicing changes are highlighted by the bold box. EV, empty vector (control).
Splicing factors responsible for AS regulation associated with metastatic colonization. A, significantly enriched binding sites of splicing factors and other RNA binding proteins in differential exon skipping events between PC-3E and GS689.Li. Gene symbols of splicing factors are followed by the consensus binding motifs and their Benjamini-adjusted P values. Y: C/U; K: G/U; W: A/U; H: U/C/A; V: A/C/G; M: A/C. B, scatter plot of 221 RNA binding proteins' average gene expression levels (log2 FPKM) in PC-3E and GS689.Li. Splicing factors with significant gene expression change are colored in red, while splicing factors whose binding motifs are significantly enriched are colored in blue. Some splicing factors belong to both categories. The inset image shows the western blot confirmation of NOVA1 protein expression levels. C, the number of known target exons of ESRP1, RBFOX2, or NOVA1 among differential exon skipping events between PC-3E and GS689.Li. D, heatmap showing the effects on 30 randomly selected metastasis-associated skipped exons by ectopic expression of ESRP1, NOVA1, and RBFOX2. The Δψ values between PC-3E and GS689.Li or upon ectopic expression of a specific splicing factor are color coded for three individual replicates and the average of all three replicates. Significant splicing changes are highlighted by the bold box. EV, empty vector (control).
We correlated the enriched motifs with the gene expression profiles of their corresponding splicing factors (Fig. 4B). As expected, several enriched motifs corresponded to splicing factors with significant differential gene expression between these two cell types, such as ESRP1/2, MBNL3, and RBFOX2. We also identified splicing factors whose motifs were significantly enriched but the splicing factors themselves were not differentially expressed. Examples include PTBP1, SRSF3, and QKI, which have been previously linked to AS regulation in cancer (65–70). It is possible that the functional activities of these splicing factors are altered at the post-transcriptional or post-translational levels, without significant changes in their steady-state gene expression levels. Lastly, our RNA-seq data also revealed significant differential expression of additional splicing factors, despite the fact that their binding sites did not show up among the enriched motifs. An intriguing example is NOVA1, whose gene expression level was significantly higher in GS689.Li (fold change = 13, FDR = 8.24e−121, see Fig. 4B; also see Fig. 1 for the microarray and qPCR data of NOVA1). We further confirmed the upregulation of NOVA1 protein expression in GS689.Li using western blot analysis (Fig. 4B, inset). NOVA1 has traditionally been studied as an important neuronal-specific splicing factor (33, 71), and little is known about NOVA1 in nonneural tissues. In future studies it would be interesting to elucidate the endogenous targets and functional relevance of NOVA1 in metastatic cancer cells. Collectively, our motif enrichment analysis in conjunction with the expression analysis of splicing factors suggests a complex network of multiple splicing factors that drives AS changes associated with metastatic colonization.
We performed additional analyses and experiments to investigate the functional association between the AS events and specific splicing factors. As ESRP1, RBFOX2, and NOVA1 were among the most differentially expressed splicing factors between PC-3E and GS689.Li and the transcriptome-wide targets of these splicing factors were characterized in a series of published work, we asked which exons identified in our study were known targets of ESRP1, RBFOX2, and NOVA1. To do this, we collected from the literature 276 ESRP1 target exons, 121 RBFOX2 target exons, and 287 NOVA1 target exons (see details in Materials and Methods). Of the 424 unique skipped exons identified in our study, 77, 42, and 10 exons were known targets of ESRP1, RBFOX2, and NOVA1 (Fig. 4C), representing a statistically significant overlap with the targets of all three splicing factors (Fisher exact test P value = 7.5e−53 for ESRP1, 8.7e−35 for RBFOX2, and 0.0001 for NOVA1 respectively). The overlap with known NOVA1 targets was the lowest among the three splicing factors. This was not surprising, as previous studies of NOVA1 exon targets were conducted exclusively in neuronal tissues (35). Furthermore, to experimentally test the functional associations between the AS events and the variation in the levels of specific splicing factors in our prostate cancer cell lines, we ectopically expressed ESRP1, RBFOX2, and NOVA1 in PC-3E or GS689.Li cells (Materials and Methods), and tested by RT-PCR if the endogenous splicing levels of specific AS events were affected. We randomly tested 30 metastasis-associated AS events, including several from the RTK pathway (Fig. 3C), without using any prior information about the known targets or binding sites of these splicing factors. This allowed us to assess in an unbiased manner what fraction of metastasis-associated AS events were influenced by the levels of these three splicing factors. We performed the experiments in triplicates and defined an exon as being regulated by a splicing factor if all three replicates showed the same direction of splicing change and an average change of exon inclusion levels of greater than 5%. Of these 30 exons, 12 (40%), 4 (13%), and 10 (33%) were affected by ectopic expression of ESRP1, RBFOX2, and NOVA1 respectively (Fig. 4D). Four exons were affected by two of the three splicing factors and another one was affected by all three. This suggests that multiple splicing factors may coordinately regulate the splicing of certain metastasis-associated AS events. For example, the splicing level of the GRHL1 exon was strongly enhanced by ESRP1, whereas both RBFOX2 and NOVA1 had measurable repressive effects on this exon. In total, up to two thirds (20 of 30) of tested exons were regulated by at least one of these three splicing factors. As expected, ESRP1 had the broadest and most potent effects on these AS events. By contrast, RBFOX2 had the weakest effects on these randomly selected exons, consistent with the fact that it had the least dramatic expression change among these three splicing factors between PC-3E and GS689.Li. Of note, a third of these randomly selected exons were influenced by variation in NOVA1 levels. This underscores the significant differential expression of NOVA1 between PC-3E and GS689.Li and the likely functional relevance of NOVA1 to the metastasis-associated AS program.
ESRP1 as a prognostic marker of patient survival for certain cancer types
To explore the potential clinical significance of our findings, we asked whether the expression level of ESRP1 could serve as a molecular or prognostic marker of human cancers. Specifically, we tested the association of ESRP1 expression with patient survival using TCGA RNA-seq data across 13 types of cancers (see details in Materials and Methods). In two cancer types, we found a significant positive association between ESRP1 expression and patient survival [Cox regression model, P = 0.018 for ccRCC, and P = 0.009 for breast cancer]. For example, in ccRCC, although the vast majority of patients had low expression of ESRP1, we found a subgroup (approximately 5%) of patients with ccRCC with high expression levels of ESRP1, and these patients had significantly better prognosis (Fig. 5A). In both cancer types, the expression level of ESRP1 may mark specific subtypes of cancer with epithelial characteristics and less aggressive clinical behaviors (6, 72).
ESRP1 gene expression level as molecular markers of cancer specimens. A, Kaplan–Meier survival curves of two patient subgroups with low or high ESRP1 expression levels in ccRCC and breast cancer. Unsupervised 2-means clustering based on ESRP1 expression levels was performed to segregate patients into two subgroups. Cox regression was used to calculate the P value of association between ESRP1 expression level and patient survival. The inset image shows the dot plot of ESRP1 expression levels in individual patients from the two subgroups, with the grey line indicating the average gene expression level in each subgroup. TPM, transcripts per million. B, real-time qPCR analysis of ESRP1 expression levels in laser-capture microdissected tumor/benign samples from 16 patients with prostate cancer. GAPDH was used as the reference gene. All tumor samples were normalized to their patient-matched benign tissues. The bar plot on the right was generated by pooling all prostate cancer samples with different Gleason grades and all patient-matched benign samples. Error bar, SEM.
ESRP1 gene expression level as molecular markers of cancer specimens. A, Kaplan–Meier survival curves of two patient subgroups with low or high ESRP1 expression levels in ccRCC and breast cancer. Unsupervised 2-means clustering based on ESRP1 expression levels was performed to segregate patients into two subgroups. Cox regression was used to calculate the P value of association between ESRP1 expression level and patient survival. The inset image shows the dot plot of ESRP1 expression levels in individual patients from the two subgroups, with the grey line indicating the average gene expression level in each subgroup. TPM, transcripts per million. B, real-time qPCR analysis of ESRP1 expression levels in laser-capture microdissected tumor/benign samples from 16 patients with prostate cancer. GAPDH was used as the reference gene. All tumor samples were normalized to their patient-matched benign tissues. The bar plot on the right was generated by pooling all prostate cancer samples with different Gleason grades and all patient-matched benign samples. Error bar, SEM.
It should be noted that the ability to detect association with patient survival depends on a variety of factors, such as sample size, follow-up time, and mortality rate. For example, in the TCGA cohort of prostate cancer, the mortality rate is very low (0.6%) within the follow-up time, making it almost impossible to identify survival signals. To assess ESRP1 expression levels in prostate cancer, we analyzed laser-capture microdissected (LCM) samples from 16 patients with prostate cancer. Real-time qPCR was performed to compare ESRP1 expression levels in tumor specimens and patient-matched benign tissues. As shown in Fig. 5B, except for one patient (patient 13), ESRP1 was consistently down regulated in prostate cancer specimens. Of note, recent studies have reported the functional roles of ESRP1 in breast cancer and prostate cancer cells (6, 7). In pancreatic ductal adenocarcinoma, ESRP1 inhibits cancer cell migration and metastasis in vitro and in vivo, and high ESRP1 expression level is associated with favorable patient survival (73). Here, our results corroborate the role of ESRP1 in cancer progression and indicate the value of ESRP1 as a prognostic marker for multiple types of cancers.
Discussion
In this work, we performed transcriptome analyses of cancer cell lines derived from in vitro and in vivo models of metastatic colonization to study AS events and splicing factors associated with metastatic cancer cells. Although previous studies have surveyed cell-type–specific AS during EMT (6, 7), no study to date has investigated transcriptome-wide changes in AS in cancer cells under phenotypic selection for metastatic colonization. The panel of PC-3 and derivative cell lines crossing in vitro or in vivo barriers of metastasis thus provides an excellent and unbiased system for us to comprehensively characterize AS events and identify key splicing factors that affect splicing regulation underlying metastasis. Our data confirm the important role of several known EMT-relevant splicing factors (ESRPs, RBFOX2) in determining the splicing profiles of metastatic cancer cells. The most striking pattern was observed for ESRP1, which had dramatic stepwise reduction in gene expression levels as cells went through multiple rounds of selection across the in vitro endothelial migration barrier. Meanwhile, our data also indicate that ESRPs alone only account for a minor fraction of AS switches observed in the PC-3 derivative cell lines, as only 18% of differential exon skipping events were known ESRP targets and 32% of differential exon skipping events contained putative ESRP binding sites within the exons or flanking intronic sequences. Indeed, our motif analysis revealed significant enrichment of binding sites of other splicing factors such as those from the MBNL and PTB families of splicing factors. Some of these splicing factors (such as MBNL3) were differentially expressed between the two cell types. Others (such as PTBP1, SRSF3, and QKI) may be regulated at the post-transcriptional or post-translational levels to produce differential splicing factor activities between the two cell types. Our RNA-seq data also revealed significant differential expression of several less-characterized RNA binding proteins, such as BICC1 and RBM47 (Fig. 4B). BICC1 and RBM47 had dramatically decreased expression levels in GS689.Li (BICC1: fold change = 7, FDR = 3.24e−84; RBM47: fold change = 67, FDR = 3.42e−261; see Fig. 4B). Interestingly, prior work revealed that loss of mouse Bicc1 disrupts E-cadherin-based cell–cell adhesion and epithelial cell polarity (74). RBM47 was recently found to inhibit breast cancer re-initiation and growth and act as a suppressor of breast cancer metastasis (75). Taken together, our data suggest that multiple RNA binding proteins collectively contribute to AS switches (and likely other post-transcriptional changes) associated with metastatic colonization.
Using deep RNA-seq, we identified 866 significant differential AS events between PC-3E and one of its derivative cell lines (GS689.Li). GO enrichment analyses suggest that these AS events preferentially target genes involved in pathways associated with cancer development and metastasis. For example, genes in the RTK signaling pathway are significantly enriched (Fig. 3A). Besides these enriched functional categories, we also identified AS events in master regulators of cancer cell gene expression and signaling with intriguing functional implications. For example, GRHL1 is a member of the Grainyhead family of transcription factors. These transcription factors are highly expressed in epithelial cells over mesenchymal cells and have been recently implicated in transcriptome regulation during EMT (76). We found an alternative exon of GRHL1 (exon 5) with differential AS between the PC-3E and GS689.Li cell lines. This exon had close to 100% exon inclusion in PC-3E. By contrast, its exon inclusion level was much lower (∼50%) in GS689.Li (Fig. 3C). The skipping of this exon resulted in a transcript isoform containing a premature termination codon, which would be targeted for degradation by the mRNA nonsense-mediated decay (NMD) pathway. Consistent with this finding, the expression level of GRHL1 was 5.4-fold lower in GS689.Li cells, suggesting that AS of this exon contributed to the molecular processes that turned off GRHL1 expression in metastatic cancer cells. Of all the identified AS events, cell-type–specific isoforms arising from a small number of events have been functionally characterized previously (described in Results). However, for the vast majority of events the consequences of AS are unknown. This represents an extensive resource for detailed functional investigations.
Shapiro and colleagues (7) previously carried out AS profiling of an in vitro EMT model triggered by ectopic expression of the EMT transcription factor Twist, and generated 27–30 million 39bp single-end RNA-seq reads on the epithelial or mesenchymal cell state respectively. To compare the similarities and differences in AS profiles between the in vitro EMT model and our in vivo model of metastatic colonization, we re-analyzed the RNA-seq data of Shapiro and colleagues (7) using the same bioinformatics pipeline as applied to our RNA-seq data. Under the same FDR cutoff, we identified 166 unique exon skipping events with significant differential splicing between the epithelial and mesenchymal cell states (RNA-seq data of Shapiro and colleagues), as compared with 424 events in our data (PC-3E versus GS689.Li). Of these, 40 events were in common, representing a highly statistically significant overlap (Fisher exact test P value = 9.3e−23). On the other hand, the majority of the identified differential exon skipping events were unique to one of the two datasets. Such differences in AS profiles may be attributed to multiple biological or technical factors. First, Shapiro and colleagues (7) analyzed a human mammary epithelial cell line (HMLE), whereas we analyzed AS profiles of two prostate cancer cell lines. AS events in cell-type–specific genes may not be shared between these two datasets. Second, our cell lines (PC-3E and GS689.Li) were derived from an in vivo model of metastatic colonization (19), whereas Shapiro and colleagues specifically overexpressed one EMT transcription factor Twist to trigger EMT in vitro (7). Third, our RNA-seq coverage was substantially deeper than that of the Shapiro and colleagues' data. Specifically, we analyzed three biological replicates for each cell line, with 114 to 132 million 101bp x 2 paired-end reads per replicate. By contrast, Shapiro and colleagues had no replicate for RNA-seq and 27–30 million 39bp single-end RNA-seq reads per RNA sample. Our increased RNA-seq coverage would lead to improved statistical power, as evidenced by the fact that the number of significant exon skipping events identified from the Shapiro and colleagues dataset was less than half of the events identified from our data. Consistent with the trend observed at the exon level, we found a similar pattern when we compared the gene expression profiles of RNA binding proteins between these two datasets. Although certain RNA binding proteins had strong differential gene expression in both systems (ESRP1, ESRP2, RBM47), a few others were differentially expressed in only one of the two systems. For example, NOVA1, MBNL3, and RBM24 were among the most differentially expressed splicing factors between PC-3E and GS689.Li cells (Fig. 4B), but there was no evidence of differential expression in the in vitro EMT model. Another example is BICC1, which had opposite directions of gene expression change in the two datasets. Specifically, BICC1 had 7-fold reduction in expression levels in our in vivo model of metastatic colonization (Fig. 4B), but approximately 3-fold increase in expression levels in the in vitro EMT model. These expression differences in RNA binding proteins are expected to contribute to downstream differences in AS. In summary, our comparison to the Shapiro and colleagues dataset demonstrated a conserved core AS program and splicing regulators associated with the invasiveness and metastatic properties of cancer cells, but also revealed many events unique to specific cell types and biological systems.
It is noteworthy that we did not find significant overlap between differentially spliced genes (DSG) and differentially expressed genes (DEG) between PC-3E and GS689.Li. The vast majority of DSGs had similar steady-state gene expression levels between the two cell types (Fig. 6A). Additionally, DSGs and DEGs had distinct sets of enriched GO terms (see Fig. 3A for enriched GO terms of DSGs and Supplementary Table S5 for enriched GO terms of DEGs). This pattern is similar to observations made in other developmental or evolutionary systems (77, 78), suggesting that differential gene transcription and splicing act in an orthogonal manner to regulate and fine-tune the global gene expression programs and cellular phenotypes (Fig. 6B).
Differential AS and differential gene expression act in an orthogonal manner to promote metastatic colonization. A, scatter plot of gene expression levels indicate a low overlap between DSGs and DEGs. B, differential AS and differential gene expression influence distinct sets of genes with important roles in regulating cellular phenotypes associated with cancer metastasis.
Differential AS and differential gene expression act in an orthogonal manner to promote metastatic colonization. A, scatter plot of gene expression levels indicate a low overlap between DSGs and DEGs. B, differential AS and differential gene expression influence distinct sets of genes with important roles in regulating cellular phenotypes associated with cancer metastasis.
The extent to which EMT is involved in prostate cancer progression has been somewhat controversial (79). However, support for its involvement in prostate cancer progression includes evidence of downregulation of E-cadherin, upregulation of N-cadherin and expression of EMT transcription factors such as ZEB1 in primary prostate cancers (80–82). Previously, we determined that ZEB1 was partially responsible for the aggressive metastatic behavior evident in the PC-3 derivatives used in this study (19). Here we found significant downregulation of ESRP1, which is repressed by ZEB1 (83), in laser-capture microdissected glands from patients with prostate cancer. This is also supportive of the notion that EMT results in progressive loss of ESRP1 expression in clinical specimens. An interesting direction for further research would be to determine whether increased ZEB1 and decreased ESRP1 are observed in the same cells from prostate cancer tissue, which might mark a subpopulation of cells undergoing EMT. It should also be noted that EMT may be reversed during productive metastatic colonization, consistent with the finding that E-cadherin is often detected in bone metastases in patients with prostate cancer (79, 84). The PC-3 model employed here may not adequately reflect this situation because in mouse xenografts cells in an EMT-like state exhibit productive metastatic colonization (19). In this regard, it would be of interest to determine the AS patterns and splicing factor expression in metastatic lesions of prostate cancer.
One surprising observation in this study was the significant upregulation of the splicing factor NOVA1 in all derivative cell lines. This upregulation was observed in both in vitro and in vivo models (Fig. 1C), and was confirmed at both the mRNA and protein levels (Fig. 1D and 4B). NOVA1 has been extensively studied as a neuronal-specific splicing regulator (33, 71). Little is known about the function of NOVA1 outside the neuronal system. However, we note that NOVA1 was originally identified in a lung cancer cell line (85). Moreover, increased expression of NOVA1 mRNA has been observed in docetaxel resistant PC-3 prostate cancer cells (86) and in lymph node metastasis (87), although neither study commented on the functional significance of this observation. It was also reported recently that higher NOVA1 expression was associated with worse survival outcomes in patients with hepatocellular carcinoma (HCC), and the overexpression of NOVA1 in HCC cells promoted cell proliferation (88). Collectively, these data imply a yet unappreciated role of NOVA1 in metastatic cancer cells. Using cells with ectopic expression of several EMT-relevant transcriptional regulators such as SNAI1, TWIST2, and ZEB1, we obtained the preliminary evidence that the protein expression of NOVA1 can be induced in a time-dependent manner by SNAI1, but not by TWIST2 or ZEB1 (see Supplementary Fig. S1), suggesting that NOVA1 may be a downstream target of the SNAI1-regulated gene network. We should note that the regulatory and functional significance of NOVA1 upregulation in the PC-3 derivative cell lines is currently unclear. Motif analysis did not find significant enrichment of the NOVA1 motif around the differential AS events, but our RT-PCR analysis of 30 randomly selected exons did indicate that a considerable portion of metastasis-associated AS events were influenced by the variation in NOVA1 levels. It is also possible that NOVA1 may play a role in other aspects of RNA regulation in addition to splicing control in these cancer cells. Alternatively, given that GO terms such as “neuron differentiation” and “cell morphogenesis involved in neuron differentiation” are significantly enriched among the DEGs (see enriched GO terms among DEGs in Supplementary Table S5), the upregulation of NOVA1 in the PC-3 derivative cells may reflect a neuronal-like gene expression program associated with neuroendocrine differentiation occurring in aggressive prostate cancers (89, 90).
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: M.D. Henry, Y. Xing
Development of methodology: Y. Xing
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): L. Lin
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): Z.-x. Lu, Q. Huang, J.W. Park, S. Shen, C.J. Tokheim, Y. Xing
Writing, review, and/or revision of the manuscript: Z.-x. Lu, Q. Huang, M.D. Henry, Y. Xing
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): Y. Xing
Study supervision: Y. Xing
Acknowledgments
The authors thank Hong Wu (UCLA), Douglas Black (UCLA), and Peter Stoilov (WVU) for discussions on this work, Douglas Black (UCLA) for providing the NOVA1 antibody, and the Biospecimen Core of the Pacific Northwest Prostate Cancer SPORE (CA097186) for providing prostate cancer cDNAs.
Grant Support
This work was supported by the NIH (CA130916 to M.D. Henry and GM105431 to Y. Xing). Y. Xing is supported by an Alfred Sloan Research Fellowship.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.