Most cancer deaths are due to metastasis, and epithelial-to-mesenchymal transition (EMT) plays a central role in driving cancer cell metastasis. EMT is induced by different stimuli, leading to different signaling patterns and therapeutic responses. TGFβ is one of the best-studied drivers of EMT, and many drugs are available to target this signaling pathway. A comprehensive bioinformatics approach was employed to derive a signature for TGFβ-induced EMT which can be used to score TGFβ-driven EMT in cells and clinical specimens. Considering this signature in pan-cancer cell and tumor datasets, a number of cell lines (including basal B breast cancer and cancers of the central nervous system) show evidence for TGFβ-driven EMT and carry a low mutational burden across the TGFβ signaling pathway. Furthermore, significant variation is observed in the response of high scoring cell lines to some common cancer drugs. Finally, this signature was applied to pan-cancer data from The Cancer Genome Atlas to identify tumor types with evidence of TGFβ-induced EMT. Tumor types with high scores showed significantly lower survival rates than those with low scores and also carry a lower mutational burden in the TGFβ pathway. The current transcriptomic signature demonstrates reproducible results across independent cell line and cancer datasets and identifies samples with strong mesenchymal phenotypes likely to be driven by TGFβ.

Implications: The TGFβ-induced EMT signature may be useful to identify patients with mesenchymal-like tumors who could benefit from targeted therapeutics to inhibit promesenchymal TGFβ signaling and disrupt the metastatic cascade. Mol Cancer Res; 15(5); 619–31. ©2017 AACR.

Epithelial-to-mesenchymal transition (EMT) is a physiologic process involved in development (1) and wound healing (2) that is subverted during cancer metastasis, together with the reverse process, mesenchymal-to-epithelial transition (MET; 3). EMT is associated with a migratory, invasive phenotype and cancer stem cell (CSC) characteristics that aid metastasis, promote drug resistance, and repress apoptosis (4). General EMT signatures have been derived using a variety of cell lines and/or stimuli (5–7), with the aim of identifying a general molecular program underlying EMT. Our previous work, however, suggests that EMT is a phenotypic endpoint that can be driven by different stimuli, with important implications for drug responses (8).

TGFβ is a canonical driver of EMT (9). Mutations in the TGFβ pathway can promote cell invasiveness and motility, and reduced TGFβ type II receptor (TGFBR2) expression has been reported in different cancer types (10). In various cancer cell lines, active TGFβ signaling induces the promesenchymal transcription factors SNAI1 and SNAI2, which in turn suppress expression of E-cadherin (CDH1), resulting in loss of cell–cell junctions (11). TGFβ controls important EMT markers through both canonical and noncanonical signaling in different cancers (12). EMT is not only regulated through transcription but is also influenced and maintained through interconnected changes in epigenetics (13), microRNAs (14), long noncoding RNAs (15), and protein synthesis (16, 17).

In a clinical context, it is important to note that TGFβ can play a dual role in human cancer, promoting metastasis or acting as a tumor suppressor (18). To some extent, this parallels the contrasting effects of TGFβ on epithelial cells, where it inhibits proliferation, and on mesenchymal cells, where it is stimulatory (19). Although high tumor and circulatory levels of TGFβ are associated with poor prognosis across multiple cancers, TGFβ levels alone are not sufficient to identify patients for TGFβ inhibitor therapies because of these suppressor effects (20). Given the importance of TGFβ-induced EMT in tumor progression (21), we sought to identify a signature that provides evidence of this specific phenotypic program and thus may better identify patients in whom cancer progression is driven by TGFβ. We applied transcript meta-analysis methods to data from a range of cancer cell lines with TGFβ stimulation and verified EMT. To our knowledge, this is the first transcriptional signature of EMT specifically associated with TGFβ to be obtained using comprehensive meta-analysis methods. It should be noted that this signature captures the change in gene expression caused by TGFβ-induced EMT and does not necessarily indicate further increases in TGFβ signaling.

We have assessed this signature using a wide range of cancer cell line and clinical tumor data. Specifically, we applied single-sample gene set analysis methods to obtain TGFβ–EMT scores (TES) and identify cancer cell lines and tumor samples with evidence of TGFβ-induced EMT. We illustrate a novel approach of comparing information from multiple signatures, such as more general epithelial and mesenchymal signatures, to examine specific molecular drivers of EMT across a large number of cell lines and clinical samples. Finally, we demonstrate that TES is correlated with a differential response to certain drugs, and we show that higher TES is associated with lower overall survival outcome in pan-cancer data. Our signature may be useful to identify candidate patients for TGFβ signaling inhibitor therapies (22), and the approach of gene set scoring is particularly promising for the strategic design of personalized cancer treatments.

All computational and statistical analyses were performed using R (versions 3.1.1, 3.2.2, and 3.2.4) and Bioconductor (version 3.0). Further details are given in the Supplementary Methods, and the digital archive reproducing our results is also available at https://github.com/DavisLaboratory/mforoutan_tgfb_paper_2016.

Microarray data for the gene expression signature of TGFβ-induced EMT

Publicly available microarray data were collected from Gene Expression Omnibus (GEO) using the NCBI portal (http://www.ncbi.nlm.nih.gov/geo/), based upon the following criteria: (i) microarray experiments must have been performed using Affymetrix, Agilent, or Illumina platforms; (ii) data must be collected from human cancer cell lines; (iii) there must be evidence of TGFβ-induced EMT through morphologic or phenotypic assays and expression assays for EMT markers (e.g., VIM and CDH1); and (iv) datasets must include biological replicates.

Data quality was assessed using methods provided within the R/Bioconductor package Simpleaffy (23) for Affymetrix platforms and using unsupervised methods for other platforms to exclude all low quality samples from the analysis. All 10 datasets used in this study were of high quality.

Obtaining our TGFβ–EMT signature

To obtain our TGFβ–EMT signature, we applied 2 meta-analysis techniques. In the first, we calculated the “Product of Rank” (PR) metric, which has been shown to have better performance characteristics (biologic association, stability, and robustness) than the other available methods for identifying genes that are differentially expressed in all of the datasets (24). In the second, we integrated all 10 datasets after RMA normalization, applied SVA (25) and ComBat (26) to estimate and remove batch effects, and then performed differential expression analysis using limma.

Public cancer cell line and patient tumor data

Cell line data used in this study were derived from the NCI-60 (27), Cancer Cell Line Encyclopedia (CCLE; ref. 28), and COSMIC (29) pan-cancer cell lines, as well as the breast cancer cell lines from the studies by Neve and colleagues (30) and Heiser and colleagues (31). We also used patient data from The Cancer Genome Atlas (TCGA), including breast cancer microarray data and pan-cancer RNA-seq data. All datasets were scored against the signature, and the COSMIC, NCI-60, and Heiser data were further analyzed for drug response. Batch analysis of the large datasets is illustrated in Supplementary Results, Supplementary Figures S16–S27.

Single-sample scoring methods applied on the cell line and tumor data

Using the GSVA package in R/Bioconductor, we applied ssGSEA and GSVA methods to obtain the TES for samples in each dataset. We additionally defined a simple single-sample scoring method to examine whether the TES obtained by ssGSEA and GSVA is associated with batches in the large datasets.
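
As a rough illustration of what a simple rank-based single-sample score can look like, the sketch below scores one sample against an up-regulated and a down-regulated gene set. It is not the ssGSEA or GSVA algorithm, and it is not the scoring method defined in the Supplementary Methods. The function names, the centering at 0.5, the toy gene labels, and the choice to subtract the down-set score from the up-set score are all assumptions made for illustration. The original analysis was performed in R; Python is used here only to keep the example compact.

```python
def rank_fraction(expression):
    """Map each gene to its within-sample rank scaled to (0, 1].

    The most highly expressed gene in the sample receives 1.0.
    """
    ordered = sorted(expression, key=expression.get)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}


def set_score(scaled_ranks, gene_set):
    """Mean scaled rank of the set's genes found in the sample, centred near 0."""
    present = [scaled_ranks[g] for g in gene_set if g in scaled_ranks]
    if not present:
        raise ValueError("no signature genes found in sample")
    return sum(present) / len(present) - 0.5


def summed_score(expression, up_genes, down_genes):
    """Assumed combination rule: up-set score minus down-set score."""
    ranks = rank_fraction(expression)
    return set_score(ranks, up_genes) - set_score(ranks, down_genes)


# Hypothetical gene labels and expression values, for illustration only.
up_genes = ["up_a", "up_b"]
down_genes = ["down_a", "down_b"]
treated = {"up_a": 9.0, "up_b": 8.0, "other_a": 5.0,
           "other_b": 4.0, "down_a": 2.0, "down_b": 1.0}
control = {"up_a": 1.0, "up_b": 2.0, "other_a": 5.0,
           "other_b": 4.0, "down_a": 8.0, "down_b": 9.0}

treated_score = summed_score(treated, up_genes, down_genes)
control_score = summed_score(control, up_genes, down_genes)
```

Because the score depends only on ranks within a sample, it can be computed one sample at a time, which is the property that makes such a score convenient for checking whether scores track batches in large datasets.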

Data for TGFβ-induced EMT were identified across the literature and integrated

To identify a gene expression signature associated with TGFβ-induced EMT, we collected microarray data from GEO (32) for 10 studies that examined TGFβ-induced EMT across a range of epithelial cell lines (Table 1). Six differentially expressed genes (DEG) were shared across all studies (COL1A1, FN1, ADAM19, SERPINE1, TAGLN, and PMEPA1), all of which are associated with EMT and TGFβ stimulation.

Table 1.

Microarray studies examining TGFβ-induced EMT

Cell line | Tissue of cell line origin | Platform | Replicates | GEO accession number | Reference
HMLE | Breast | Affymetrix HT_HG-U133A | | GSE24202 | Taube and colleagues (5); PMID: 20713713
MCF10A | Breast | Agilent-014850 Whole Human Genome | | GSE28569 | Deshiere and colleagues (2013); PMID: 22562247
HMEC-TR | Breast | Affymetrix HG-U133_Plus_2 | | GSE28448 | Hesling and colleagues (2011); PMID: 21597466
A549 | Lung | Affymetrix HG-U133_Plus_2 | | GSE17708 | Keshamouni and colleagues (2010); PMID: 19118450
A549 | Lung | Affymetrix HG-U133_Plus_2 | | GSE49644 | Sun and colleagues (2014); PMID: 25379179
HCC827 | Lung | Affymetrix HG-U133_Plus_2 | | GSE49644 | Sun and colleagues (2014); PMID: 25379179
NCI-H358 | Lung | Affymetrix HG-U133_Plus_2 | | GSE49644 | Sun and colleagues (2014); PMID: 25379179
HK-2 | Kidney | Affymetrix HG-U133A_2 | | GSE23338 | Walsh and colleagues (2008); PMID: 25379179
Human proximal tubular cell line | Kidney | Illumina HumanWG-6 v3.0 | | GSE20247 | Hills and colleagues (2010); PMID: 20197308
Panc-1 | Pancreas | Affymetrix HG-U133_Plus_2 | | GSE23952 | Maupin and colleagues (2010); PMID: 20885998

NOTE: To identify a signature for TGFβ-induced EMT that is not specific to a particular type of cancer, we used cell lines that represent a variety of cancers.

A gene signature for TGFβ-induced EMT was obtained using meta-analysis methods

When using list overlap to derive gene signatures, the results often have low statistical power due to the small number of samples within each study. Given that only 6 DEGs were identified across all datasets, we next applied 2 meta-analysis techniques and took the union of the resulting gene sets for our signature.

First, within each study, genes were ranked by their limma (33, 34) t statistic and the product of ranks (24, 35) was calculated and compared with a permuted null distribution (details in Supplementary Methods). This approach identified 186 up- and 82 downregulated genes implicated in TGFβ-induced EMT (Fig. 1A).
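
The product-of-ranks idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which was written in R and is detailed in the Supplementary Methods): genes are ranked within each study by a t statistic, the log2 product of ranks is computed per gene, and the result is compared with a null obtained by shuffling ranks independently within each study. The function names, the toy t statistics, the add-one correction, and the small number of permutations are all hypothetical. Ranking in descending order targets consistently up-regulated genes; reversing the ranking would target down-regulated genes.

```python
import math
import random


def rank_descending(t_stats):
    """Rank genes within one study; rank 1 is the largest t statistic."""
    order = sorted(range(len(t_stats)), key=lambda i: t_stats[i], reverse=True)
    ranks = [0] * len(t_stats)
    for position, gene_index in enumerate(order, start=1):
        ranks[gene_index] = position
    return ranks


def log2_product_of_ranks(rank_table):
    """rank_table[s][g] is the rank of gene g in study s.

    Summing log2 ranks is the same as taking log2 of the product of ranks.
    """
    n_genes = len(rank_table[0])
    return [sum(math.log2(study[g]) for study in rank_table)
            for g in range(n_genes)]


def permuted_null(rank_table, n_permutations, seed=0):
    """Shuffle ranks independently within each study and recompute the statistic."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_permutations):
        shuffled = [rng.sample(study, len(study)) for study in rank_table]
        null.extend(log2_product_of_ranks(shuffled))
    return null


def empirical_p(observed, null):
    """Share of null values at or below the observed value (add-one corrected)."""
    hits = sum(1 for value in null if value <= observed)
    return (hits + 1) / (len(null) + 1)


# Toy t statistics: 3 studies (rows) x 6 genes (columns); hypothetical values.
toy_t = [
    [5.1, 0.2, -1.0, 0.8, -3.2, 1.5],
    [4.7, -0.3, 0.9, 0.1, -2.8, 1.1],
    [6.0, 1.2, -0.5, 0.4, -4.1, 0.3],
]
toy_ranks = [rank_descending(study) for study in toy_t]
toy_pr = log2_product_of_ranks(toy_ranks)
toy_null = permuted_null(toy_ranks, n_permutations=200)
toy_p = [empirical_p(value, toy_null) for value in toy_pr]
```

In this toy input the first gene is ranked first in every study, so its log2 product of ranks is 0 and it receives the smallest empirical P value; a gene must rank highly in all studies to reach a low product, which is why the metric favors consistent effects.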

Figure 1.

A, Distribution of gene scores by product of ranks. The product of ranks (PR) value for each gene was calculated across all 10 datasets studied and then log2-transformed (histogram). A permuted null distribution was calculated (density line; n = 119 × 10⁶), and a significance threshold was specified at 99.999% of the cumulative density (dashed line) to identify genes with a product of rank lower than expected (P < 5e-6). B, Overlap of our TGFβ–EMT signature (Foroutan) with more general EMT signatures reported by Cursons and colleagues (8) and Du and colleagues (36).

Next, we obtained DEGs by combining the 10 datasets after RMA normalization. Although the merged data had a larger number of samples, providing us with greater statistical power, we needed to minimize batch effects between the different studies. The R package SVA (25) was used with the SVA and ComBat (26) algorithms to estimate and remove batch effects within our integrated data. Using limma (33, 34) with batch corrected data identified 121 up- and 74 down-regulated genes (Supplementary Table S1).

Across both signatures, there were 165 common genes and 301 genes in total. The combined list was used as our signature of TGFβ-induced EMT (108 down- and 193 up-regulated genes, Supplementary Table S1).

The TGFβ-induced EMT gene signature had stimulus-dependent overlap with reported general EMT signatures

A number of transcript abundance signatures have been identified for EMT, including reports by Taube and colleagues (ngenes = 250; ref. 5), Groger and colleagues (ngenes = 131; ref. 6), Tan and colleagues (ngenes = 218; ref. 7), Cursons and colleagues (ngenes = 206; Supplementary Table S2; ref. 8), and Du and colleagues (ngenes= 571; ref. 36). Comparing our TGFβ-induced EMT signature against these previous studies, only 3 genes were shared across all: vimentin (VIM), keratin-15 (KRT15), and tetraspanin-1 (TSPAN1); however, comparing each pair of signatures, the number of shared genes was higher (Supplementary Results, Supplementary Fig. S1).

Low gene overlap across all signatures may reflect differences in EMT stimuli and experimental and computational methods. Our signature was derived using 10 datasets that specifically examined TGFβ-induced EMT using multiple cancer cell lines (Table 1), and these data were subjected to consistent preprocessing and normalization methods to minimize artifacts. Conversely, Cursons and colleagues derived their signature using EGF- or hypoxia-induced EMT within basal breast cancer cell lines (8). Taube and colleagues used a number of different stimuli (including TGFβ) to induce EMT; however, they only used one cell line and thus their signature may show bias toward the genetic/phenotypic background of HMLE cells (5). Groger and colleagues selected genes differentially expressed in at least 10 (of 18) data sets (6), whereas our hypothesis setting identified genes differentially expressed across all datasets. The signature recently reported by Du and colleagues (36) was obtained using only 2 cancer cell lines, with EMT induced by TGFβ and oncostatin-M (OSM), and their EMT signature was defined simply by taking the intersection of DEGs between the 2 cell lines. Finally, Tan and colleagues (7) applied an alternative approach—rather than studying stimulated cell lines, they used known EMT markers to classify epithelial or mesenchymal cell lines and tumors and then derived their signature.

Supporting our suggestion that stimulus-specific differences underlie the differences in the above signatures, our TGFβ-EMT signature had the greatest overlap with Du and colleagues (using TGFβ and OSM) and the lowest overlap with the EGF/hypoxia-induced EMT signature from Cursons and colleagues (Fig. 1B). In general, the signature obtained in the study by Cursons and colleagues shared the least genes with other studies (Supplementary Results, Supplementary Fig. S1).

There is evidence of active TGFβ signaling in CD44+ breast cancer cells and embryonic stem cells (37, 38). Examining the overlap between our TGFβ-EMT signature and the metastatic CD44+ breast cancer signature derived by Shipitsin and colleagues (ngenes = 952; ref. 37) identified 49 common genes (greater than expected, P ≤ 1e-5; Supplementary Table S3). Similarly, Blick and colleagues (39) identified 55 “basal B Discriminator” genes that distinguish basal B cell lines from both basal A and luminal subgroup breast cancer cell lines (37); these shared 8 genes with our TGFβ-EMT gene signature (greater than expected, P < 2e-5): ANK3, ELF3, ERBB3, FBN1, INHBB, TGFB1I1, MYO5C, and RAB25. Given the aggressive nature of these breast cancer subgroups, this overlap suggests that TGFβ signaling may play a role in mediating clinical disease progression. In support of this, upregulated genes from our signature showed significantly higher expression values in basal B and triple-negative breast cancer cell line subgroups, whereas downregulated genes showed significantly lower expression (Supplementary Results, Supplementary Fig. S2).

A wide range of cancer cell lines show graded variations in scoring metrics derived from the signature of TGFβ-induced EMT

To identify cancer cell lines with evidence of TGFβ-induced EMT, we created a relative TES which quantified the degree of concordance with our signature. Using both the ssGSEA (40) and GSVA (41) single-sample scoring methods, scores from the 108 down- and 193 upregulated gene sets in our signature were calculated separately and then summed (Supplementary Methods) to obtain summed-ssGSEA (Figs. 2 and 3) and summed-GSVA (Supplementary Results, Supplementary Figs. S3 and S4) scores. These were calculated for the experimental data used to derive our TES (Fig. 2A), the Neve and colleagues breast cancer cell line data (nCellLines = 54; Fig. 2B; ref. 30), the NCI-60 data (nCellLines = 59; Fig. 2C; refs. 27, 42), and CCLE (nCellLines = 1,036; Fig. 2D, all scores given in Supplementary Tables S7–S11; refs. 43, 44). A comparison between GSVA and ssGSEA scores is provided in Supplementary Fig. S5 (Supplementary Results).

Figure 2.

Relative TES for cancer cell lines from different datasets. Up- and downregulated sets from our 301-gene signature were scored and combined using a summed-ssGSEA approach. This was applied to (A) the integrated TGFβ-EMT dataset (Table 1), (B) Neve and colleagues cell lines as clustered, (C) NCI-60 cell lines, and (D) CCLE cell lines. Where cell lines are present in multiple datasets, we calculated the correlation between TES, for (E) CCLE and NCI-60 data, and (F) CCLE and Neve data, demonstrating that scores are highly correlated in independently measured data.

Figure 3.

The (A) epithelial score (ES) and (B) mesenchymal score (MS) of the control and TGFβ-treated samples within the integrated TGFβ–EMT dataset (obtained by ssGSEA method). As shown, control cells (blue histograms) tend to have a high ES and low MS, whereas TGFβ-treated cells tend to have a low ES and high MS. C–E, Comparison of TES to ES and MS in CCLE cell lines with (C) low TES, (D) medium TES, and (E) high TES. Samples with high TES show high MS and low ES, whereas samples with low TES tend to have low MS and high ES. F, Correlation between TES and ES across all CCLE cell lines (Spearman ρ = −0.83). G, Correlation between TES and MS in all CCLE cell lines (Spearman ρ = 0.8).

Samples with a high TES show strong concordance with our signature and are thus predicted to have undergone some form of TGFβ-induced EMT. As expected, TGFβ-treated samples used to derive the signature showed higher TES than the control samples (Fig. 2A). All high TES breast cancer cell lines within the Neve data were from the very aggressive basal B subgroup (Fig. 2B), which has CSC-like characteristics and a mesenchymal phenotype (30, 39). Across the NCI-60 (Fig. 2C) and CCLE (Fig. 2D) data, very consistent results were obtained for each cancer type. When cell lines were ranked by their TES and classified as “high TES” (top 10%), “medium TES” (middle 10%), and “low TES” (bottom 10%), however, there appeared to be some enrichment for different tissues of origin. Within the CCLE data, most high TES cell lines are derived from the central nervous system, and there are a number of skin and bone cancer cell lines (Fig. 2D). Conversely, the majority of the low TES cancer cell lines are from the large intestine, followed by lung and breast cancer cell lines (Fig. 2D). Considering the medium TES group, most cell lines are derived from hematopoietic and lymphoid tissues and lung cancers (Fig. 2D). Lists of cell lines with very high, very low, and medium TES are given in Supplementary Table S4.
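
The ranking-based grouping described above can be sketched as follows. This is an illustration only: the function name, the rounding rule for the group size, and the decision to centre the "medium" group on the middle of the ranked list are assumptions, since the text specifies only the top, middle, and bottom 10%.

```python
def tes_groups(scores, fraction=0.10):
    """Split samples into low, medium, and high groups by ranked score.

    scores: dict mapping a sample name to its score.
    Each group holds round(n * fraction) samples (at least 1); the medium
    group is taken from the centre of the ranked list (an assumption).
    """
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    k = max(1, round(n * fraction))
    mid_start = (n - k) // 2
    return {
        "low": ordered[:k],
        "medium": ordered[mid_start:mid_start + k],
        "high": ordered[n - k:],
    }


# Hypothetical scores for 20 cell lines, increasing with the line index.
toy_scores = {f"line_{i:02d}": float(i) for i in range(20)}
toy_groups = tes_groups(toy_scores)

# With 1,036 scored lines (the CCLE count given above), each group holds 104.
ccle_sized = tes_groups({f"cl_{i}": float(i) for i in range(1036)})
```

Under this rounding rule, a 10% group drawn from 1,036 cell lines contains 104 lines, which matches the group size reported for the CCLE mutation analysis in Table 2.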

The Neve, NCI-60, and CCLE data shared some common cell lines, and the TES calculated across these independent studies was very consistent (Fig. 2E and F), suggesting that our TGFβ-EMT signature is stable and robust across different experimental systems.

High TES cell lines also show a high mesenchymal score

To quantify changes across the landscape of epithelial and mesenchymal phenotypes, we used both ssGSEA (Fig. 3A and B) and GSVA (Supplementary Results, Supplementary Fig. S4) to derive epithelial and mesenchymal scores (ES and MS, respectively) from the corresponding Tan and colleagues gene sets (7). For the TGFβ–EMT data used to derive our signature, epithelial scores provided reasonable separation of control and TGFβ-treated samples, whereas mesenchymal scores gave good separation (Fig. 3A and B). For CCLE samples grouped by low, medium, and high TES, we extracted the TES, epithelial score, and mesenchymal score (Fig. 3C–E). Low TES cell lines tend to have lower mesenchymal and higher epithelial scores, whereas samples with a high TES have high mesenchymal and low epithelial scores. Interestingly, samples with medium TES show a greater range of epithelial and mesenchymal scores (Fig. 3D). When all CCLE cell lines are considered together, there is a strong negative correlation between TES and epithelial score (Fig. 3F, ρ = −0.83, P < 2e-16) and a strong positive correlation between TES and mesenchymal score (Fig. 3G, ρ = 0.80, P < 2e-16). As noted above, a number of stimuli can drive EMT, and this raises the possibility that cell lines with a low/moderate TES but relatively high mesenchymal score represent those in which EMT was induced through a different molecular etiology.

We also scored CCLE using previously identified signatures of general EMT (Supplementary Results, Supplementary Fig. S6). Across the CCLE data, there was a high correlation between TES and scores derived from the Du (ρ = 0.97), Groger (ρ = 0.95), or Taube EMT signatures (ρ = 0.91), all of which used TGFβ as one of their stimuli. It should be noted, however, that there were a number of samples, particularly those with lower TES, for which these metrics diverged (Supplementary Results, Supplementary Fig. S6). Conversely, our TES had only a weak association (ρ = 0.25) with scores from the Cursons and colleagues signature, which was derived using EGF and hypoxia stimulation. This appears to support the earlier observation that the overlap between EMT signature gene lists depends on the stimulus used.

Cell lines with high TES tend to carry a lower mutation load within TGFβ signaling elements

We extracted gene names for all components of the KEGG and Reactome TGFβ signaling pathways (nGenes = 98; Supplementary Table S5), and for CCLE cell lines with a high, medium, and low TES, we looked at the frequency of mutations within these genes (Table 2). The TGFβ mutational load for each group was calculated as the ratio of total mutations in TGFβ pathway genes to the number of cell lines in that group.
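
As a worked example of this ratio, the sketch below recomputes the per-group load from the group totals and group size reported in Table 2. The function and variable names are hypothetical, and the input numbers are copied from the table rather than derived from the underlying mutation data.

```python
def mutational_load(total_pathway_mutations, n_cell_lines):
    """Average number of TGF-beta pathway mutations per cell line in a group."""
    return total_pathway_mutations / n_cell_lines


# Totals and group size as reported in Table 2 (n = 104 cell lines per group).
pathway_mutations = {"low": 466, "medium": 228, "high": 135}
group_size = 104

loads = {group: mutational_load(total, group_size)
         for group, total in pathway_mutations.items()}
```

The recomputed values agree with the averages listed in Table 2 (4.48, 2.20, and 1.29) to within about 0.01; small differences of that size may reflect rounding.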

Table 2.

Number of mutations for selected TGFβ pathway components within low, medium and high TES groups

Absolute frequency of mutations within cell lines (number expected to cause protein sequence changes)
Selected genes from TGFβ pathway | Low TES | Med TES | High TES
TGFB1 | - | 3 (3) | -
TGFB2 | 26 (1) | 25 (3) | 16 (1)
TGFB3 | 6 (5) | 2 (1) | -
TGFBR1 | 17 (16) | 1 (0) | -
TGFBR2 | 24 (24) | 6 (4) | 1 (1)
SMAD2 | 6 (6) | 2 (1) | 1 (1)
SMAD3 | 5 (5) | 2 (1) | 1 (0)
SMAD4 | 21 (20) | - | 1 (1)
SMAD7 | 10 (3) | 2 (1) | 1 (0)
RHOA | 17 (15) | 2 (2) | 1 (1)
Relative frequency of TGFβ pathway mutations within cell line groups
Number of genes mutated in TGFβ pathway | 40 | 33 | 28
Total mutations in TGFβ pathway genes | 466 (316) (n = 104 CLs) | 228 (268) (n = 104 CLs) | 135 (42) (n = 104 CLs)
Average TGFβ pathway mutations per cell line | 4.48 | 2.20 | 1.29
Total mutations | 19,012 | 10,607 | 7,626

Abbreviation: CL, cell lines.

The high TES group contains a lower overall relative mutational load within TGFβ signaling pathway genes (Table 2), and although mutations were observed for TGFB2 and TGFBR2, other important upstream regulators such as TGFB1, TGFB3, and TGFBR1 were not mutated within high TES cell lines (Table 2). Furthermore, in high TES cell lines, 69% of mutations within TGFβ pathway components are not predicted to change protein primary sequence (Table 3, details in Supplementary Table S6). Conversely, in low TES cell lines, only 32% of TGFβ pathway mutations are not predicted to affect protein sequence (Supplementary Table S6).

Table 3.

The proportion of TGFβ signaling mutations in each mutation class

Score | 3′ UTR | 5′ UTR | Frameshift Del | Frameshift Ins | In-frame Del | Intron | Missense | Nonsense | Silent | Splice site Ins | Splice site SNP
High TES | 0.53 | 0.13 | - | - | 0.01 | 0.03 | 0.28 | 0.01 | 0.01 | - | -
Med TES | 0.33 | 0.10 | 0.05 | 0.02 | 0.01 | 0.04 | 0.37 | 0.04 | 0.02 | 0.004 | 0.02
Low TES | 0.21 | 0.05 | 0.16 | 0.02 | 0.01 | 0.05 | 0.42 | 0.04 | 0.01 | 0.004 | 0.02

NOTE: Data shown for CCLE mutations listed in Table 2.

Abbreviations: Del, deletion; Ins, insertion (ref. 28).

More than half of the TGFβ component mutations in the high TES group occur within the 3′-untranslated region (UTR; 53%), whereas missense mutations (28%) and 5′-UTR mutations (13%) are also relatively common (Table 3). For the low TES group, however, a large proportion of TGFβ mutations are missense (42%) or 3′-UTR (21%) mutations, whereas frameshift deletions are more frequent (16%) compared with the high TES group (Table 3). When the mutation frequency in TGFβ signaling genes was normalized against the total number of mutations in each cell line, there was no difference between the high TES and low TES groups, implying that lower TES is associated with higher genomic mutational load.

High TES group responds differently to certain drugs

An association between EMT and general drug resistance has been reported (45, 46); thus, we explored drug efficacy for the high TES group of cell lines, which also show a relatively high mesenchymal score (Fig. 3E). We examined NCI-60 (27, 42) and COSMIC (29) pan-cancer drug data and breast cancer cell line drug data from Heiser and colleagues (31) and found that samples with a high TES showed large, drug-specific differences in their association with resistance (e.g., NSC 150412) or sensitivity (e.g., NSC 674493; Fig. 4A).

Figure 4.

A, NCI-60 cell lines ordered by their relative TES and log10(GI50) of the 2 drugs (scaled by z-score). Higher values of log10(GI50) are associated with higher resistance of samples to that drug. B, Volcano plot shows the Spearman correlation between TES and log10(GI50) along with the related P values for the correlation. The left and the right plots are the compounds with the highest negative and positive correlations, respectively. A positive correlation is associated with higher resistance of high TES group to that drug, whereas this group is more sensitive to drugs with a negative correlation.

For each drug across each dataset, we calculated the Spearman correlation between cell line TES and GI50 or IC50 (Fig. 4B for NCI-60 data; further information in Supplementary Results, Supplementary Figs. S9–S11). The pan-drug analysis shows a wide range of correlations between TES and GI50, suggesting there was no general drug resistance associated with TGFβ-induced EMT and reinforcing the observation of drug-specific effects.
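
A minimal sketch of the per-drug calculation is given below: a Spearman correlation computed as the Pearson correlation of average ranks, applied to cell line scores and log10(GI50) values. The helper names and all numeric values are hypothetical and do not come from the NCI-60, COSMIC, or Heiser data; the original analysis was performed in R, and no P value is computed here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores and log10(GI50) values for five cell lines and two drugs.
toy_tes = [-0.4, -0.1, 0.0, 0.3, 0.6]
log_gi50 = {
    "drug_resist": [-7.0, -6.5, -6.2, -5.0, -4.1],  # GI50 rises with score
    "drug_sens": [-4.2, -5.1, -6.0, -6.4, -7.3],    # GI50 falls with score
}
rho_by_drug = {drug: spearman(toy_tes, values)
               for drug, values in log_gi50.items()}
```

Following the convention in Fig. 4, a positive rho means that higher-scoring lines need a higher drug concentration (greater resistance), and a negative rho means greater sensitivity.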

Associations in our results suggest that high TES cell lines may have resistance against EGFR and AKT inhibitors. In particular, the COSMIC pan-cancer data identifies afatinib (BIB2992), gefitinib, and AKT inhibitor VIII; for COSMIC breast cancer cell lines, lapatinib, A443654, and GSK2141795; and for the Heiser data, Sigma AKT1/2 inhibitor. Conversely, samples with high TES appeared to show sensitivity to a number of drugs inhibiting cell proliferation, mitosis, and DNA synthesis. Within the COSMIC pan-cancer data, this included NVPBEZ235, temsirolimus, BX795, GSK269962A, and cisplatin; whereas the COSMIC breast cancer and Heiser data suggested sensitivity to NSC663284, docetaxel, PF3814735, GSK650394, AZ628, and OSI906.

This is consistent with our previous results (8) showing that different EMT stimuli lead to differences in drug response, and the work of others demonstrating that EMT is not universally associated with drug resistance (7). It must be stressed that these results capture “population” level, observational associations between TES and drug response, and cannot be considered to be predictive of response; however, they highlight the importance of stimulus-specific EMT signatures when interpreting general trends for drug response data.

A wide range of cancer samples show graded variations in scoring metrics derived from the TGFβ-induced EMT signature

A number of drugs targeting TGFβ pathway elements are in development for clinical applications (22); thus, we examined whether our signature could identify clinical samples with evidence of TGFβ-induced EMT. We used ssGSEA to score the TCGA pan-cancer data (Fig. 5A) and breast cancer data (Supplementary Results, Supplementary Fig. S12) against our TGFβ-EMT signature, the Tan and colleagues (2014) tumor epithelial and mesenchymal signatures (7), as well as the Du (36), Groger (6), Taube (5), and Cursons (8) EMT signatures. For breast cancer samples where corresponding microarray data were available (nTumors = 571), we also calculated the TES, epithelial score, and mesenchymal score and found a high correlation (Supplementary Results, Supplementary Fig. S13), suggesting that our results are robust to artifacts associated with different detection platforms.

Figure 5.

A, Relative TES for the TCGA pan-cancer data using the ssGSEA method. B–D, Comparing TES, epithelial scores (ES), and mesenchymal scores (MS) in the TCGA pan-cancer data with tumors classified as (B) low TES, (C) medium TES, and (D) high TES. In general, samples with a high TES have a lower ES and higher MS, whereas samples with a low TES tend to have a higher ES and lower MS. E–G, Relationship between TES and tumor purity (CPE) in (E) TCGA pan-cancer data (nSamples = 7,805), (F) colorectal cancer with the strongest negative correlation, and in (G) glioblastoma with the least correlation.

The TCGA pan-cancer mRNA sequencing data include 9,756 clinical samples covering 34 cancer types from 29 tissues. The tumor TES of individual cancer types (Fig. 5A) show a very similar distribution to the corresponding cell line data (Fig. 2C and D): brain, bone, and skin cancer samples tend to display a high TES, hematopoietic cancer samples have an intermediate TES, whereas colorectal cancers have a low TES. When pan-cancer tumors were segregated into high (top 10%; nTumors = 976), moderate (middle 10%), and low (bottom 10%) TES groups and their epithelial and mesenchymal scores were examined (Fig. 5B–D), a high TES was associated with high mesenchymal and low epithelial scores. Similar results were observed for the TCGA breast cancer data (Supplementary Results, Supplementary Fig. S12), although there was less agreement between TES and mesenchymal scores for the breast cancer samples, suggesting that TGFβ may not be a common driver of mesenchymal behavior across all breast cancers.
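The grouping into high, moderate, and low TES tumors can be sketched as follows. This is an illustrative example only: the function name `split_by_tes`, the sample identifiers, and the scores are hypothetical, and the placement of the "middle 10%" window (centered on the median rank) is an assumption, as the exact definition is not given in the text.

```python
def split_by_tes(scores, fraction=0.10):
    """Return sample IDs in the bottom, middle, and top `fraction` by TES.

    scores -- dict mapping sample ID to TES (hypothetical input format)
    """
    ranked = sorted(scores, key=scores.get)   # sample IDs, lowest TES first
    n = len(ranked)
    k = max(1, round(n * fraction))           # number of samples per group
    mid_start = (n - k) // 2                  # assumption: window centered on the median
    return {
        "low": ranked[:k],
        "medium": ranked[mid_start:mid_start + k],
        "high": ranked[n - k:],
    }


# Hypothetical scores for 20 samples; real input would be one TES per tumor.
scores = {f"sample_{i:02d}": (i - 10) / 10 for i in range(20)}
groups = split_by_tes(scores)
for name, ids in groups.items():
    print(name, ids)
```

With the 9,756 samples described above, `round(9756 * 0.10)` gives 976 samples per group, which matches the group size reported for the high TES tumors.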

Analysis of the TCGA pan-cancer mutation data showed that samples with high TES have relatively fewer mutations in TGFβ signaling elements compared with samples with low TES, with fewer patients having at least one mutation in the TGFβ signaling elements (35% vs. 47%). These differences, while more modest, are nonetheless consistent with the results of our cell line analysis, demonstrating that even taking intratumor heterogeneity into account, the genomic observations translate from cell lines to patient samples.

Across all patients, there were significant (P < 2e-16) correlations between TES and mesenchymal scores (Supplementary Results, Supplementary Fig. S8) for breast cancer (ρ = 0.57) and pan-cancer data (ρ = 0.53) and between TES and epithelial scores for breast cancer (ρ = −0.44) and pan-cancer (ρ = −0.73) data. Examining other EMT signatures across the TCGA pan-cancer data, our TES was highly correlated with scores from the Du (ρ = 0.93), Groger (ρ = 0.80), and Taube (ρ = 0.66) signatures but not correlated with the Cursons' EMT score (ρ = 0.07; Supplementary Results, Supplementary Fig. S7), which intriguingly shows 2 populations with separate TES but a range of overlapping Cursons' EMT scores. These results again support the notion that signatures derived using similar stimuli are highly associated, whereas EMT signatures derived using alternative stimuli are largely independent, highlighting the need for stimulus-specific EMT signatures.

Given that stromal cells often have a more mesenchymal phenotype and modulate TGFβ signaling (47), it is possible that tumor TES is predominantly derived from stromal cells within the tissue biopsy. To examine this, we calculated the correlation between TES and the consensus measurements of purity estimation (CPE; ref. 48) for TCGA pan-cancer samples (Fig. 5E). There was a weak but significant negative correlation (ρ = −0.20, P < 2e-16) between TES and tumor purity across all cancer types, suggesting that lower purity (i.e., more stromal contamination) is associated with a similar or higher TES. Considering each cancer type separately, weak negative correlations were also observed between TES and tumor purity for all cancer types except glioblastoma (ρ = 0.003) and adrenocortical carcinoma (ACC; ρ = 0.21; Supplementary Table S12). The strongest negative correlation was seen in colorectal cancers (ρ = −0.55; Fig. 5F). This is particularly interesting, as colorectal cancers tend to have a very low TES; thus, despite the observation that this tumor type is the most likely to show an inflated TES due to stromal contamination, we still see low TES overall. Conversely, we see no correlation between tumor purity and TES in glioblastoma (ρ = 0.003; Fig. 5G), despite this tumor type being among those with the highest TES on average. In addition, we calculated the TES for normal and TGFβ-stimulated fibroblasts (Supplementary Results, Supplementary Fig. S15) and found that the TES of TGFβ-stimulated fibroblasts was only slightly higher than that of untreated samples, perhaps reflecting the specificity of our signature toward EMT. Collectively, these results demonstrate that even though tumor purity does show some influence on TES, the influence is weak, particularly in cancer types with a very high TES.

The TES effectively separates pan-cancer patient cohorts with significant differences in survival rates

Tumors with a high TES (nTumors = 976) from the TCGA pan-cancer data were associated with a significantly lower survival rate than samples with low TES (nTumors = 976; Fig. 6A; HR, 0.6; P = 2e-09). To investigate whether this effect could simply reflect differences in survival between cancer types that differ in their relative TES, we compared the overall survival rates of all cancer types. As shown in Fig. 5A, skin and bone tumors have a higher TES; however, these tumor types also have longer median survival time (Fig. 6B), suggesting that tumor-dependent differences in survival may not confound the apparent TES-mediated differences in survival time (Fig. 6A).
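Survival curves for two groups, such as those in Fig. 6A, are commonly estimated with the Kaplan–Meier product-limit method. The sketch below is offered under that assumption; the estimator used for this analysis is not restated in this section, and the function name, follow-up times, and event indicators are hypothetical toy values rather than TCGA data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with >= 1 death.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data (years) for two small groups.
high_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
low_curve = kaplan_meier([4, 6, 7, 9, 10], [0, 1, 0, 0, 0])

for label, curve in (("high TES", high_curve), ("low TES", low_curve)):
    print(label, [(t, round(s, 2)) for t, s in curve])
```

Comparing the two curves formally (for example, with a log-rank test or a Cox model to obtain a hazard ratio) is a separate step that this sketch does not cover.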

Figure 6.

A, Overall survival of pan-cancer patients with high versus low TES (nTumors = 976 in each group, P = 2e-9, HR = 0.6). B, Overall survival rates of different cancer types with 10-year censoring (ordered by median values for overall survival).

Although we observed a highly significant difference in overall survival for patients with low TES versus high TES tumors across the TCGA pan-cancer data, this difference was not statistically significant within all cancer types (Supplementary Results, Supplementary Fig. S14). High TES was associated with significantly poorer survival than low TES in brain (P = 2e-5), kidney (P = 3e-4), and lining of body cavity cancers (P = 0.003). For cancers of the adrenal gland, thyroid gland, eye, and bladder, smaller but significant (P < 0.05) differences were seen between survival rates of the low and high TES groups (Supplementary Results, Supplementary Fig. S14). For other cancer types, no significant differences were observed in survival rates of the patients with high versus low TES tumors; however, the trend was similar, as the high TES groups tended to have a worse survival outcome. It should be noted, however, that white blood cell (P = 0.04) and head and neck (P = 0.15) cancers showed the opposite trend, with low TES associated with poorer survival.

We have derived a transcriptional signature for TGFβ-driven EMT and show consistent scoring results across large sets of cell line and patient data. This analysis highlights basal B breast cancer cell lines and clinical brain, bone, and skin tumors as malignancies with relatively high TGFβ signaling activity, whereas colorectal and endometrial cancers tend to show lower scores. The association our TES shows with epithelial and mesenchymal scores suggests that TGFβ signaling contributes to EMT in numerous cancer cell lines (Fig. 3C–G) and tumors (Fig. 5B–D); however, there are interesting discrepancies. It is tempting to speculate that samples with a high mesenchymal/general EMT score and low TES represent malignancies where alternative drivers of EMT are active, and conversely, samples with a high TES but low general EMT scores are more likely to display evidence of TGFβ-driven EMT (Supplementary Results). This illustrates the advantage of scoring multiple signatures associated with molecular phenotypes to gain a deeper understanding of tumor biology and guide the development and application of targeted therapies. In this context, we note that Tan and colleagues (7) combine their epithelial and mesenchymal scores into a single metric; however, samples with a specific TES can show a range of epithelial or mesenchymal score values (Figs. 3F and G and 5B–D), suggesting that summation into a single metric may hide meaningful variation. The range of correlations between our TES and general EMT signature scores appeared to reflect the relative contribution of TGFβ stimulation in the study design (Supplementary Figs. S6 and S7, Supplementary Results), highlighting the need for more specific signatures of EMT, such as the one we present here, that are matched to their underlying molecular etiology.

Examining mutation frequencies in genes of TGFβ signaling components, samples with a higher TES tended to contain fewer mutations, and these were localized to noncoding regions. None of the 3′-UTR, 5′-UTR, or intron mutations were predicted to cause protein sequence alterations, although some may influence miRNA binding, alternative splicing, or transcript stability (49–51). Given the much greater number of mutations within TGFβ signaling pathway genes for the low TES group, it is likely that many of these mutations tend to repress TGFβ signaling activity. These data also suggest that for cells within the high TES group, the TGFβ signaling pathway tends to be intact, implying that aberrant signaling is not driven by loss-of-function mutations in inhibitory pathway components, and indicating that these cells may respond to targeted inhibitor therapies. In agreement with this, Park and colleagues recently used a novel TGFBR1 inhibitor, IN-1130, to successfully block TGFβ-induced EMT in MCF10A and MDA-MB-231 breast cancer cell lines (52), both of which have a relatively high TES (Supplementary Tables S8 and S9).

The effects of EMT on drug efficacy are contentious. Recent reports suggest that EMT can drive general drug resistance (46, 53); however, our analysis shows that high TES and low TES groups respond differentially only for specific drugs, more consistent with the observations of Tan and colleagues, who examined response across a spectrum of epithelial and mesenchymal phenotypes (7). We note a number of specific differences, particularly for cancer types such as bone and skin that tend to have a higher TES. For example, Tan and colleagues reported that more mesenchymal melanoma cell lines were resistant to GSK650394 (SGK1/SGK2 inhibitor), whereas those with a more epithelial phenotype were resistant to docetaxel. Our results showed no correlation between melanoma cell line TES and IC50 values for GSK650394 (ρ = 0.02) and docetaxel (ρ = −0.05) but instead suggested that high TES cell lines tend to show parthenolide resistance and sensitivity to CGP60474 (CDK1/CDK2 inhibitor). Furthermore, we found that bone cancer cell lines with high TES tended to be epothilone (microtubule inhibitor)-resistant and erlotinib (EGFR inhibitor)-sensitive, whereas Tan and colleagues found little correlation for the efficacy of these drugs. These observations concur with our previous work that demonstrated drug responses were strongly influenced by the nature of the stimulus that generated EMT and highlight the need for stimulus-specific signatures such as the one we present here.

We believe our TES can be used to inform a number of therapeutic options. The correlative associations indicating resistance for several drugs in high TES cell lines raise the possibility that combination therapies with TGFβ signaling inhibitors may increase sensitivity to these drugs. Furthermore, a number of studies have shown that chemotherapy-induced TGFβ signaling can promote a CSC phenotype and/or drug resistance. In this context, combination therapy using TGFβ inhibitors and chemotherapeutic agents has been examined in cell line and xenograft models (54–57), where such combinations show promise (56–58). Finally, for high TES samples, the application of TGFβ inhibitors alone may be a potential treatment strategy. In support of this, antisense TGFβ1/2 oligonucleotides (e.g., AP12009) are efficacious in the clinical treatment of patients with aggressive gliomas (59), which have a high TES.

When working with complex tissue samples such as those derived from tumor biopsies, there is likely to be at least some degree of stromal contamination. We show that for TCGA pan-cancer samples, tumor purity (CPE) and TES have a weak negative correlation (Fig. 5E–G), suggesting that higher TES values may be associated with an increased fraction of stromal cells. The strongest negative association between TES and tumor purity was observed for colorectal cancers, suggesting that they may be sensitive to confounding effects from tumor purity. Intriguingly, however, these tumor samples had a relatively low TES, whereas brain cancers (e.g., glioblastoma and low-grade glioma), which tend to show higher TES values, often had greater sample purity, suggesting that stromal contamination is not a dominant effect. We note that TGFβ mediates stromal–tumor interactions and can drive tumor microenvironment remodeling to promote progression at all stages of carcinogenesis (60, 61), which may also influence the TES; however, these effects were not explored in detail.

The relationship between patient survival and EMT-related gene signatures is discordant across the literature. These inconsistencies may reflect the role of phenotypic switching in disease progression, and the need for MET in establishment and growth of metastasis (62, 63); alternatively, these results may have arisen due to difficulties with taking a biopsy from the invasive front of a clinical tumor sample. For example, Taube and colleagues found no correlation between their core EMT signature and breast cancer patient survival (5), whereas Tan and colleagues showed a tumor-type–dependent association between their EMT signature and survival and surprisingly better overall survival outcomes for mesenchymal breast cancer and malignant melanoma (7). Conversely, Shipitsin and colleagues reported an association between poor distant metastasis-free survival and higher expression values for TGFβ signaling pathway elements that were upregulated in CD44+ breast cancer samples for one of their datasets (37).

We show that tumors with a high TES are significantly associated with poor survival within the TCGA pan-cancer data, but there were large differences between cancers. The association between TES and survival was significant for brain, kidney, lining of body cavities, adrenal gland, thyroid gland, and eye cancers (P < 0.05). For some cancer types, there was no significant association between TES and survival, although a similar trend was observed, such that the high TES group tended to have reduced survival. Surprisingly, for hematopoietic or head and neck cancers, a lower TES was associated with reduced survival rates. This may reflect the fact that hematopoietic cells are not epithelial-derived, and thus different molecular mechanisms may be active, whereas a large proportion of the head and neck cancers are squamous cell carcinomas in which TGFβ signaling elements such as TGFBR2, SMAD4, and SMAD2 are strongly downregulated (64). Consistent with our observation, it has been shown that patients with head and neck squamous cell carcinoma (HNSCC) who do not have phospho-SMAD2/3 show better survival than patients with active phospho-SMAD2/3 (65).

Given the increasing availability of experimental and clinical drugs targeting the TGFβ signaling pathway (22), our TGFβ–EMT gene signature may be useful to identify clinical subsets of patients who will benefit from targeted treatment to inhibit metastasis. We believe that our results illustrate several advantages for stimulus-specific classification of EMT programs; in particular, the ability to pair signatures may facilitate improved molecular stratification of cancer subtypes for targeted therapies. The future of personalized therapeutics will certainly benefit from further development of driver-specific signatures for clinically relevant phenotypic programs to better target therapies.

No potential conflicts of interest were disclosed.

Conception and design: M. Foroutan, J. Cursons, E.W. Thompson, M.J. Davis

Development of methodology: M. Foroutan, J. Cursons, M.J. Davis

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M. Foroutan, J. Cursons, E.W. Thompson, M.J. Davis

Writing, review, and/or revision of the manuscript: M. Foroutan, J. Cursons, E.W. Thompson, M.J. Davis

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S. Hediyeh-Zadeh

Study supervision: J. Cursons, E.W. Thompson, M.J. Davis

The authors would like to acknowledge the TCGA Research Network (http://cancergenome.nih.gov/) for generating the data used in this article. We thank Professor Greg Goodall, Associate Professor Frederic Hollande, and Dr. Hong-Jian Zhu for the valuable suggestions and feedback and Professor Terry Speed for his advice in data analysis.

M. Foroutan was supported by a Melbourne International Fee Remission Scholarship (MIFRS) and a Melbourne International Research Scholarship (MIRS). J. Cursons was supported by the Australian Research Council Centre of Excellence in Convergent Bio-Nano Science and Technology (project number CE140100036). E.W. Thompson was supported in part by the EMPathy Breast Cancer Network (CG-10-04), a National Collaborative Research Program of the National Breast Cancer Foundation, Australia. The Translational Research Institute is supported by a grant from the Australian Government. M.J. Davis was supported by the National Breast Cancer Foundation (NBCF-ECF-043-14).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Hay ED. An overview of epithelio-mesenchymal transformation. Acta Anat (Basel) 1995;154:8–20.
2. Iwano M, Plieth D, Danoff TM, Xue C, Okada H, Neilson EG. Evidence that fibroblasts derive from epithelium during tissue fibrosis. J Clin Invest 2002;110:341–50.
3. Diepenbruck M, Christofori G. Epithelial-mesenchymal transition (EMT) and metastasis: yes, no, maybe? Curr Opin Cell Biol 2016;43:7–13.
4. Reka AK, Chen G, Jones RC, Amunugama R, Kim S, Karnovsky A, et al. Epithelial-mesenchymal transition-associated secretory phenotype predicts survival in lung cancer patients. Carcinogenesis 2014;35:1292–300.
5. Taube JH, Herschkowitz JI, Komurov K, Zhou AY, Gupta S, Yang J, et al. Core epithelial-to-mesenchymal transition interactome gene-expression signature is associated with claudin-low and metaplastic breast cancer subtypes. Proc Natl Acad Sci U S A 2010;107:15449–54.
6. Groger CJ, Grubinger M, Waldhor T, Vierlinger K, Mikulits W. Meta-analysis of gene expression signatures defining the epithelial to mesenchymal transition during cancer progression. PLoS One 2012;7:e51136.
7. Tan TZ, Miow QH, Miki Y, Noda T, Mori S, Huang RY, et al. Epithelial-mesenchymal transition spectrum quantification and its efficacy in deciphering survival and drug responses of cancer patients. EMBO Mol Med 2014;6:1279–93.
8. Cursons J, Leuchowius KJ, Waltham M, Tomaskovic-Crook E, Foroutan M, Bracken CP, et al. Stimulus-dependent differences in signalling regulate epithelial-mesenchymal plasticity and change the effects of drugs in breast cancer cell lines. Cell Commun Signal 2015;13:26.
9. Xu J, Lamouille S, Derynck R. TGF-β-induced epithelial to mesenchymal transition. Cell Res 2009;19:156–72.
10. Myeroff LL, Parsons R, Kim SJ, Hedrick L, Cho KR, Orth K, et al. A transforming growth factor beta receptor type II gene mutation common in colon and gastric but rare in endometrial cancers with microsatellite instability. Cancer Res 1995;55:5545–7.
11. Taylor MA, Parvani JG, Schiemann WP. The pathophysiology of epithelial-mesenchymal transition induced by transforming growth factor-β in normal and malignant mammary epithelial cells. J Mammary Gland Biol Neoplasia 2010;15:169–90.
12. Wendt MK, Allington TM, Schiemann WP. Mechanisms of the epithelial-mesenchymal transition by TGF-β. Future Oncol 2009;5:1145–68.
13. Chen ZY, Wang PW, Shieh DB, Chiu KY, Liou YM. Involvement of gelsolin in TGF-beta 1 induced epithelial to mesenchymal transition in breast cancer cells. J Biomed Sci 2015;22:90.
14. Gregory PA, Bracken CP, Smith E, Bert AG, Wright JA, Roslan S, et al. An autocrine TGF-β/ZEB/miR-200 signaling network regulates establishment and maintenance of epithelial-mesenchymal transition. Mol Biol Cell 2011;22:1686–98.
15. Richards EJ, Zhang G, Li ZP, Permuth-Wey J, Challa S, Li Y, et al. Long non-coding RNAs (LncRNA) regulated by transforming growth factor (TGF) β: LncRNA-hit-mediated TGF-induced epithelial to mesenchymal transition in mammary epithelia. J Biol Chem 2015;290:6857–67.
16. Desnoyers G, Frost LD, Courteau L, Wall ML, Lewis SM. Decreased eIF3e expression can mediate epithelial-to-mesenchymal transition through activation of the TGFβ signaling pathway. Mol Cancer Res 2015;13:1421–30.
17. Gillis LD, Lewis SM. Decreased eIF3e/Int6 expression causes epithelial-to-mesenchymal transition in breast epithelial cells. Oncogene 2013;32:3598–605.
18. Muraoka-Cook RS, Dumont N, Arteaga CL. Dual role of transforming growth factor β in mammary tumorigenesis and metastatic progression. Clin Cancer Res 2005;11:937s–43s.
19. Freytag J, Wilkins-Port CE, Higgins CE, Higgins SP, Samarakoon R, Higgins PJ. PAI-1 mediates the TGF-β1+EGF-induced "scatter" response in transformed human keratinocytes. J Invest Dermatol 2010;130:2179–90.
20. Smith AL, Robin TP, Ford HL. Molecular pathways: targeting the TGF-β pathway for cancer therapy. Clin Cancer Res 2012;18:4514–21.
21. Katz LH, Li Y, Chen JS, Munoz NM, Majumdar A, Chen J, et al. Targeting TGF-β signaling in cancer. Expert Opin Ther Targets 2013;17:743–60.
22. Akhurst RJ, Hata A. Targeting the TGFβ signalling pathway in disease. Nat Rev Drug Discov 2012;11:790–811.
23. Wilson CL, Miller CJ. Simpleaffy: a BioConductor package for Affymetrix quality control and data analysis. Bioinformatics 2005;21:3683–5.
24. Chang LC, Lin HM, Sibille E, Tseng GC. Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline. BMC Bioinformatics 2013;14:368.
25. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 2012;28:882–3.
26. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 2007;8:118–27.
27. Reinhold WC, Sunshine M, Liu H, Varma S, Kohn KW, Morris J, et al. CellMiner: a web-based suite of genomic and pharmacologic tools to explore transcript and drug patterns in the NCI-60 cell line set. Cancer Res 2012;72:3499–511.
28. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483:603–7.
29. Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW, et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 2012;483:570–5.
30. Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 2006;10:515–27.
31. Heiser LM, Sadanandam A, Kuo WL, Benz SC, Goldstein TC, Ng S, et al. Subtype and pathway specific responses to anticancer compounds in breast cancer. Proc Natl Acad Sci U S A 2012;109:2724–9.
32. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res 2013;41:D991–5.
33. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47.
34. Smyth GK. Limma: linear models for microarray data. In: Bioinformatics and computational biology solutions using R and Bioconductor. Springer-Verlag New York; 2005. p. 397–420.
35. Dreyfuss JM, Johnson MD, Park PJ. Meta-analysis of glioblastoma multiforme versus anaplastic astrocytoma identifies robust gene markers. Mol Cancer 2009;8:71.
36. Du L, Yamamoto S, Burnette BL, Huang D, Gao K, Jamshidi N, et al. Transcriptome profiling reveals novel gene expression signatures and regulating transcription factors of TGFβ-induced epithelial-to-mesenchymal transition. Cancer Med 2016;5:1962–72.
37. Shipitsin M, Campbell LL, Argani P, Weremowicz S, Bloushtain-Qimron N, Yao J, et al. Molecular definition of breast tumor heterogeneity. Cancer Cell 2007;11:259–73.
38. James D, Levine AJ, Besser D, Hemmati-Brivanlou A. TGFβ/activin/nodal signaling is necessary for the maintenance of pluripotency in human embryonic stem cells. Development 2005;132:1273–82.
39. Blick T, Hugo H, Widodo E, Waltham M, Pinto C, Mani SA, et al. Epithelial mesenchymal transition traits in human breast cancer cell lines parallel the CD44(hi/)CD24 (lo/-) stem cell phenotype in human breast cancer. J Mammary Gland Biol Neoplasia 2010;15:235–52.
40. Verhaak RG, Tamayo P, Yang JY, Hubbard D, Zhang H, Creighton CJ, et al. Prognostically relevant gene signatures of high-grade serous ovarian carcinoma. J Clin Invest 2013;123:517–25.
41. Hanzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013;14:7.
42. Shankavaram UT, Varma S, Kane D, Sunshine M, Chary KK, Reinhold WC, et al. CellMiner: a relational database and query tool for the NCI-60 cancer cell lines. BMC Genomics 2009;10:277.
43. Seashore-Ludlow B, Rees MG, Cheah JH, Cokol M, Price EV, Coletti ME, et al. Harnessing connectivity in a large-scale small-molecule sensitivity dataset. Cancer Discov 2015;5:1210–23.
44. Basu A, Bodycombe NE, Cheah JH, Price EV, Liu K, Schaefer GI, et al. An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules. Cell 2013;154:1151–61.
45. Maheswaran S, Haber DA. Cell fate: Transition loses its invasive edge. Nature 2015;527:452–3.
46. Shang Y, Cai X, Fan D. Roles of epithelial-mesenchymal transition in cancer drug resistance. Curr Cancer Drug Targets 2013;13:915–29.
47. Quail DF, Joyce JA. Microenvironmental regulation of tumor progression and metastasis. Nat Med 2013;19:1423–37.
48. Aran D, Sirota M, Butte AJ. Systematic pan-cancer analysis of tumour purity. Nat Commun 2015;6:8971.
49. Halvorsen M, Martin JS, Broadaway S, Laederach A. Disease-associated mutations that alter the RNA structural ensemble. PLoS Genet 2010;6:e1001074.
50. Orom UA, Nielsen FC, Lund AH. MicroRNA-10a binds the 5′UTR of ribosomal protein mRNAs and enhances their translation. Mol Cell 2008;30:460–71.
51. Palaniswamy R, Teglund S, Lauth M, Zaphiropoulos PG, Shimokawa T. Genetic variations regulate alternative splicing in the 5′ untranslated regions of the mouse glioma-associated oncogene 1, Gli1. BMC Mol Biol 2010;11:32.
52. Park CY, Min KN, Son JY, Park SY, Nam JS, Kim DK, et al. An novel inhibitor of TGF-β type I receptor, IN-1130, blocks breast cancer lung metastasis through inhibition of epithelial-mesenchymal transition. Cancer Lett 2014;351:72–80.
53. Singh A, Settleman J. EMT, cancer stem cells and drug resistance: an emerging axis of evil in the war on cancer. Oncogene 2010;29:4741–51.
54. Teicher BA, Maehara Y, Kakeji Y, Ara G, Keyes SR, Wong J, et al. Reversal of in vivo drug resistance by the transforming growth factor-β inhibitor decorin. Int J Cancer 1997;71:49–58.
55. Bandyopadhyay A, Wang L, Agyin J, Tang Y, Lin S, Yeh IT, et al. Doxorubicin in combination with a small TGFβ inhibitor: a potential novel therapy for metastatic breast cancer in mouse models. PLoS One 2010;5:e10365.
56. Kim YJ, Hwang JS, Hong YB, Bae I, Seong YS. Transforming growth factor beta receptor I inhibitor sensitizes drug-resistant pancreatic cancer cells to gemcitabine. Anticancer Res 2012;32:799–806.
57. Park SY, Kim MJ, Park SA, Kim JS, Min KN, Kim DK, et al. Combinatorial TGF-β attenuation with paclitaxel inhibits the epithelial-to-mesenchymal transition and breast cancer stem-like cells. Oncotarget 2015;6:37526–43.
58. Bhola NE, Balko JM, Dugger TC, Kuba MG, Sanchez V, Sanders M, et al. TGF-β inhibition enhances chemotherapy action against triple-negative breast cancer. J Clin Invest 2013;123:1348–58.
59. Hau P, Jachimczak P, Schlingensiepen R, Schulmeyer F, Jauch T, Steinbrecher A, et al. Inhibition of TGF-β2 with AP 12009 in recurrent malignant gliomas: from preclinical to phase I/II studies. Oligonucleotides 2007;17:201–12.
60. Neuzillet C, Tijeras-Raballand A, Cohen R, Cros J, Faivre S, Raymond E, et al. Targeting the TGFβ pathway for cancer therapy. Pharmacol Ther 2015;147:22–31.
61. Drabsch Y, ten Dijke P. TGF-β signalling and its role in cancer progression and metastasis. Cancer Metast Rev 2012;31:553–68.
62. Korpal M, Ell BJ, Buffa FM, Ibrahim T, Blanco MA, Celia-Terrassa T, et al. Direct targeting of Sec 23a by miR-200s influences cancer cell secretome and promotes metastatic colonization. Nat Med 2011;17:1101–8.
63. Haviv I, Thompson EW. Soiling the seed: microenvironment and epithelial mesenchymal plasticity. Cancer Microenviron 2012;5:1–3.
64. White RA, Malkoski SP, Wang XJ. TGFβ signaling in head and neck squamous cell carcinoma. Oncogene 2010;29:5437–46.
65. Xie W, Aisner S, Baredes S, Sreepada G, Shah R, Reiss M. Alterations of Smad expression and activation in defining 2 subtypes of human head and neck squamous cell carcinoma. Head Neck 2013;35:76–85.
