Carcinomas originate from epithelial tissues, which have apical (luminal) and basal orientations. The degree of luminal versus basal differentiation in cancer has been shown to be biologically important in some carcinomas and impacts treatment response.
Although prior studies have focused on individual cancer types, we used a modified clinical-grade classifier (PAM50) to subtype 8,764 tumors across 22 different carcinomas into luminal A, luminal B, and basal-like tumors.
We found that all epithelial tumors demonstrated similar gene expression–based luminal/basal subtypes. As expected, basal-like tumors were associated with increased expression of the basal markers KRT5/6 and KRT14, and luminal-like tumors were associated with increased expression of the luminal marker KRT20. Luminal A tumors consistently had improved outcomes compared with basal tumors across many tumor types, with luminal B tumors falling between the two. Basal tumors had the highest rates of TP53 and RB1 mutations and copy number loss. Luminal breast, cervical, ovarian, and endometrial tumors had increased ESR1 expression, and luminal prostate, breast, cervical, and bladder tumors had increased androgen receptor (AR) expression. Furthermore, luminal B tumors had the highest rates of AR and ESR1 mutations and had increased sensitivity in vitro to bicalutamide and tamoxifen. Luminal B tumors were more sensitive to gemcitabine, and basal tumors were more sensitive to docetaxel.
This first pan-carcinoma luminal/basal subtyping across epithelial tumors reveals global similarities across carcinomas in the transcriptome, genome, clinical outcomes, and drug sensitivity, emphasizing the biological and translational importance of these luminal versus basal subtypes.
By definition, epithelial tissues all have apical (luminal) and basal orientations (1). Tumors originating from epithelial tissues (i.e., carcinomas) may reflect this dichotomy with relative degrees of luminal or basal differentiation (1, 2). Understanding this key biological difference is important because the luminal-ness or basal-ness of a particular tumor may impact both overall prognosis and response to treatment. Clinically important luminal and basal subtypes of several different carcinomas have been described previously. PAM50 is a clinical-grade luminal-basal classifier that has been used to group breast cancers into Luminal A (LumA), Luminal B (LumB), Basal, and Her2-like subsets (3, 4). The luminal breast cancer subtypes express higher levels of ER and PR and are more responsive to hormonal therapy (5). The luminal and basal subtypes of prostate cancer were also recently described using a slightly modified PAM50 algorithm (6). Analogous to breast cancer, the luminal subtypes of prostate cancer exhibited higher expression of AR, and LumB-like tumors preferentially benefited from androgen deprivation therapy (6). Bladder cancers also demonstrate luminal and basal subtypes, which predict response to first-line chemotherapy (7).
Although carcinomas have historically been classified and treated primarily based on their histology and anatomic site of origin, there is reason to believe that a pan-carcinoma classification schema such as PAM50 could have utility across a number of cancer types independent of site of origin. Although the landmark publication of The Cancer Genome Atlas (TCGA) pan-cancer atlas demonstrated that the cell-of-origin patterns dominate the biological differences between cancers (8), commonalities were noted within gynecologic/breast cancers (9), gastrointestinal adenocarcinomas (10), and squamous cell carcinomas (11). Furthermore, numerous common biological axes transcending tumor type were found, including driver mutations (12), oncogenic signaling pathways (13), DNA repair defects (14), metabolomics subtypes (15), immunity (16), and stem-ness (17). To our knowledge, no pan-cancer RNA-based subtypes have yet been identified.
We hypothesized that luminal/basal subtypes represent an important and clinically meaningful measure of tumor biology that transcends cancer type. To test this, we used a modified PAM50 (6) to classify 8,764 tumors across 22 different tumor types into luminal and basal subtypes. In the first pan-carcinoma study of its kind, we show that luminal and basal subtypes exist for all epithelial-derived tumors, and that these subtypes exhibit different patterns of expression, genomic alterations, clinical outcomes, and response to therapy.
Materials and Methods
Luminal and basal subtyping
Subtyping into luminal and basal subtypes was performed using the original PAM50 algorithm (18) (Supplementary Table S1). Although other luminal/basal subtyping algorithms exist, PAM50 is the only one that has been developed into a commercial clinical test (18) and has been demonstrated to work in multiple tumor types (4, 6, 7). Source code was downloaded from the University of North Carolina Microarray Database (https://genome.unc.edu/pubsup/breastGEO/) and run without modification, as has been done in other tumor types (3, 6). Because the majority of epithelial tumors are not known to be HER2 driven, we excluded the HER2 subtype and assigned only LumA, LumB, and Basal subtypes, as described previously (6).
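The restriction to three subtypes can be illustrated with a minimal sketch. It assumes that a sample is assigned to the subtype whose centroid has the highest Spearman correlation with the sample's expression vector, which is how PAM50 is commonly described; the centroid values, the five-gene vectors, and the function names below are hypothetical and are not the published PAM50 centroids or the UNC source code.

```python
def ranks(values):
    """Return 1-based average ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def assign_subtype(sample, centroids, allowed=("LumA", "LumB", "Basal")):
    """Pick the allowed subtype with the highest correlation to the sample.

    Centroids outside `allowed` (here, Her2) are ignored entirely.
    """
    scores = {name: spearman(sample, centroids[name]) for name in allowed}
    return max(scores, key=scores.get), scores


# Hypothetical centroids over five placeholder genes (illustration only).
TOY_CENTROIDS = {
    "LumA": [2.0, 1.0, -1.0, -2.0, 0.0],
    "LumB": [1.0, 2.0, 0.0, -1.0, 1.5],
    "Basal": [-2.0, -1.0, 1.0, 2.0, 0.5],
    "Her2": [0.0, 2.0, -1.0, 1.0, -2.0],
}

label, scores = assign_subtype([-1.5, -0.5, 0.8, 1.9, 0.2], TOY_CENTROIDS)
print(label, scores)
```

The per-subtype correlations returned alongside the label correspond to the kind of continuous LumA, LumB, and Basal-ness scores used in the correlation analyses.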
Gene set enrichment analysis
Identification of genes correlated with subtypes was performed by first assessing Spearman's correlation for each gene with the LumA, LumB, and Basal-ness scores from the PAM50 algorithm. Additional subtype-specific genes were identified by selecting genes with a Spearman ρ ≥ 0.4 and a multiple-testing adjusted P value (FDR) ≤ 0.05 with one subtype, and Spearman ρ ≤ 0.2 with the other two subtypes. The correlation coefficients were then input to gene set enrichment analysis (GSEA) preranked. The hallmark epithelial–mesenchymal transition (EMT) gene set was used, as well as a custom gene set for the nuclear hormone receptor family obtained from the HUGO Gene Nomenclature Committee (www.genenames.org/cgi-bin/genefamilies/set/71).
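The selection rule for subtype-specific genes can be written as a short filter. The sketch below assumes the Spearman coefficients and FDR-adjusted P values have already been computed for each gene against each subtype score; the gene names, values, and the function name are hypothetical.

```python
def subtype_specific_genes(rho, fdr, subtype,
                           rho_min=0.4, fdr_max=0.05, other_rho_max=0.2):
    """Return genes meeting the thresholds stated in the text.

    rho and fdr map gene -> {subtype: value}. A gene is kept when it has
    rho >= rho_min and FDR <= fdr_max with `subtype`, and
    rho <= other_rho_max with every other subtype.
    """
    selected = []
    for gene, by_subtype in rho.items():
        others = [s for s in by_subtype if s != subtype]
        if (by_subtype[subtype] >= rho_min
                and fdr[gene][subtype] <= fdr_max
                and all(by_subtype[s] <= other_rho_max for s in others)):
            selected.append(gene)
    return selected


# Hypothetical correlation results for four placeholder genes.
TOY_RHO = {
    "GENE_1": {"LumA": 0.55, "LumB": 0.10, "Basal": -0.30},
    "GENE_2": {"LumA": 0.45, "LumB": 0.35, "Basal": -0.10},  # LumB rho too high
    "GENE_3": {"LumA": 0.50, "LumB": 0.05, "Basal": 0.00},   # fails FDR below
    "GENE_4": {"LumA": -0.20, "LumB": 0.15, "Basal": 0.62},
}
TOY_FDR = {
    "GENE_1": {"LumA": 0.001, "LumB": 0.60, "Basal": 0.01},
    "GENE_2": {"LumA": 0.002, "LumB": 0.01, "Basal": 0.70},
    "GENE_3": {"LumA": 0.200, "LumB": 0.80, "Basal": 0.90},
    "GENE_4": {"LumA": 0.030, "LumB": 0.40, "Basal": 0.0001},
}

print(subtype_specific_genes(TOY_RHO, TOY_FDR, "LumA"))
```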
TCGA pan-cancer data were obtained via the UCSC Xena Browser. The GDC HTSeq FPKM RNAseq dataset, the Mutect2 somatic mutation dataset, and the Affymetrix SNP Array 6.0 masked copy number segment dataset were downloaded for analysis (19). All datasets comprising epithelial tumors (carcinomas) were included [ACC (adrenocortical carcinoma), BLCA (bladder urothelial cancer), BRCA (breast cancer), CESC (cervical squamous cell cancer), CHOL (cholangiocarcinoma), COAD (colon adenocarcinoma), ESCA (esophageal carcinoma), HNSC (head & neck squamous cell carcinoma), KIRC (renal cell carcinoma), KIRP (renal papillary cell carcinoma), LIHC (hepatocellular carcinoma), LUAD (lung adenocarcinoma), LUSC (lung squamous cell carcinoma), MESO (mesothelioma), OV (ovarian serous cystadenocarcinoma), PAAD (pancreatic adenocarcinoma), PRAD (prostate adenocarcinoma), READ (rectal adenocarcinoma), STAD (gastric adenocarcinoma), THCA (thyroid carcinoma), THYM (thymoma), UCEC (endometrial carcinoma)]. Cutaneous carcinomas were not included in TCGA and thus are not represented. Mutations were counted if they were exonic and, when located in a coding gene, non-silent. Copy number (CN) gain was defined as log2(CN/2) ≥ 1. Copy number loss was defined as shallow, log2(CN/2) ≤ −1, or deep, log2(CN/2) ≤ −2. TCGA proliferation scores, mutation rates, fraction altered, and aneuploidy scores were previously published (16). Comparison of luminal and basal markers was performed by first mean-centering log2(FPKM+1) and scaling by the SD to generate a z-score of each gene within each individual cancer type. This standardization was performed independently for each cancer type because gene expression ranges can vary between cancer types.
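The copy number thresholds and the per-cancer-type standardization can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: the function names are hypothetical, the sample SD (n − 1 denominator) is assumed because the text does not specify which SD was used, and a segment meeting both loss thresholds is labeled with the more severe call.

```python
import math
import statistics
from collections import defaultdict


def call_copy_number(log2_ratio):
    """Classify a segment from its log2(CN/2) value using the stated cutoffs."""
    if log2_ratio >= 1:
        return "gain"
    if log2_ratio <= -2:
        return "deep_loss"      # also satisfies the shallow cutoff
    if log2_ratio <= -1:
        return "shallow_loss"
    return "neutral"


def zscore_by_cancer_type(fpkm, cancer_type):
    """Z-score log2(FPKM+1) for one gene, separately within each cancer type.

    fpkm and cancer_type are parallel lists (one entry per tumor). Each
    cancer type needs at least two tumors with non-identical expression.
    """
    groups = defaultdict(list)
    for index, tumor_type in enumerate(cancer_type):
        groups[tumor_type].append(index)

    z = [0.0] * len(fpkm)
    for indices in groups.values():
        values = [math.log2(fpkm[i] + 1) for i in indices]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # assumption: sample SD
        for i, value in zip(indices, values):
            z[i] = (value - mean) / sd
    return z


# Hypothetical FPKM values for one gene in two cancer types.
print(zscore_by_cancer_type([0, 1, 3, 7, 15, 31],
                            ["BRCA", "BRCA", "BRCA", "PRAD", "PRAD", "PRAD"]))
print(call_copy_number(math.log2(4 / 2)))
```

Standardizing within each cancer type places a marker such as KRT5 on a comparable scale in every tumor type, so pooled comparisons are not dominated by tissues with intrinsically high or low expression.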
Cancer cell line Affymetrix Human Genome U219 array gene expression and drug response data were obtained from the Genomics of Drug Sensitivity in Cancer (GDSC) project (ref. 20; www.cancerrxgene.org). Drug response was assessed using the IC50. Two dose ranges for bicalutamide were available, and the dose range that had more response data from cell lines derived from hormone-responsive tumors was selected (0.039–10 μmol/L).
Overall survival (OS) was the primary clinical outcome in the TCGA pan-cancer data, as it was available for all tumor types. All carcinomas were included in the above genomic analyses, but we excluded breast cancer from clinical analysis given the long natural history of the disease and the limited follow-up in TCGA, and because the prognostic implications of the PAM50 subtypes of breast cancer have been extensively explored in the literature in more appropriate cohorts (3–5, 18). We likewise excluded the TCGA prostate cancer cohort, which faces a similar issue of a long natural history and limited follow-up and has previously been investigated in large clinical cohorts (6). We also excluded thymoma and thyroid cancer from the clinical analyses because of very low event rates, suggesting similar issues. We excluded cholangiocarcinoma from clinical analysis given the small number of patients with outcomes available (N = 45). Comparison of continuous variables across subtypes was performed using ANOVA, with a post hoc Tukey test to examine individual groups. Comparison of categorical variables across subtypes was performed using the Fisher exact test. All analyses were performed using R version 3.4.4. All statistical testing was two-sided, and P ≤ 0.05 was considered significant. Multiple testing correction was performed using the Benjamini–Hochberg procedure.
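For readers unfamiliar with the Benjamini–Hochberg procedure, the adjustment can be sketched in a few lines. The analyses here were run in R, so the Python function below is only an illustration of the standard step-up adjustment (the same quantity returned by R's `p.adjust` with `method = "BH"`); the function name and example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR q values).

    Each P value is multiplied by m / rank (rank 1 = smallest), and a
    running minimum taken from the largest P value downward enforces
    monotonicity. Results are returned in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw P values from four tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```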
We first subtyped 8,764 TCGA pan-cancer tumor samples across 22 carcinoma types into LumA, LumB, and basal-like subtypes using the PAM50 clustering algorithm (Fig. 1A; Supplementary Table S2, Supplementary Fig. S1). The gene expression patterns in all tumor types were roughly consistent with the patterns seen in breast cancer. The frequently used basal markers KRT5/6 (average of KRT5, KRT6A-C) (1) and KRT14 (21, 22) were both significantly increased across the basal-like carcinomas (t test P < 0.0001; Fig. 1B) and the luminal marker KRT20 (1) was significantly increased across the luminal-like carcinomas (t test P < 0.0001; Fig. 1C). GSEA revealed that the Hallmark EMT gene signature was most correlated with Basal-ness [normalized enrichment score (NES) = 1.87 for Basal vs. 1.45 for LumA and −2.67 for LumB], consistent with the literature in breast (23) and prostate (24) cancer (Supplementary Fig. S2). Pan-carcinoma proliferation scores were also modestly higher in basal-like tumors compared with LumB-like tumors (ANOVA P < 0.0001, Tukey P = 0.0015), and both were much higher compared with LumA-like tumors (Tukey P < 0.0001 for both), consistent with other tumor types (3, 6) (Supplementary Fig. S1). Additional subtype-specific genes are shown in Supplementary Table S3. Silhouette scores (a measure of cluster fit, ref. 25) were highest in breast cancer as expected (Supplementary Fig. S3).
For each cancer type, we then examined clinical outcome differences between the subtypes. In eight (ACC, KIRC, KIRP, LIHC, LUAD, MESO, PAAD, UCEC) of 17 different tumor types analyzed (5 of the initial 22 were not included in the clinical analysis, see Materials and Methods), we found that patients with basal-like tumors had significantly worse survival (FDR q < 0.05) compared with patients with LumA-like tumors, with LumB-like tumors falling between the two, akin to what has been reported for breast cancer (ref. 3; Fig. 2A–H). No LumA subgroup had significantly worse survival than basal in any cancer type.
We next explored differences in mutational profiles between subtypes. Overall, LumA-like carcinomas had lower mutation rates, fraction altered, and aneuploidy scores than LumB or basal-like carcinomas (ANOVA P < 0.0001, Tukey P < 0.0001; Supplementary Fig. S4; Supplementary Table S4). When we performed an unbiased ranking of all genes with an overall mutation rate ≥1% using Fisher exact multiple testing adjusted P values (FDR q values), we found that the top two most differentially mutated genes were TP53 and RB1 (Fig. 2I). TP53 mutation frequency was highest in the basal-like subtype overall (49.5% in basal, 25.0% in LumA, 36.0% in LumB, FDR q < 0.0001), as well as in 15 of 22 individual tumor types (Fig. 2J). RB1 mutation frequency was also highest in basal-like tumors overall, though only slightly less than in LumB-like tumors (5.9% in Basal, 1.9% in LumA, 5.6% in LumB, FDR q < 0.0001). RB1 mutations were least frequent in LumA-like tumors overall as well as in 12 of 19 individual tumor types with RB1 mutations and were tied for least frequent in 4 others (Fig. 2K). These results remain similar after accounting for inactivation by deep deletion (Supplementary Figs. S5 and S6). The full results for genes with differential rates of deep deletion can be found in Supplementary Table S5.
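As an illustration of the kind of test behind these comparisons, the sketch below implements a two-sided Fisher exact test for a 2×2 table (mutated vs. wild-type in two subtypes). This is a simplification: the comparison in the text is across three subtypes, and the counts shown are hypothetical rather than taken from TCGA. The function name is also an assumption.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Rows are two subtypes; columns are mutated and wild-type counts.
    The P value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    total = sum(
        probability(x)
        for x in range(low, high + 1)
        if probability(x) <= observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)


# Hypothetical counts: 10 of 20 basal-like vs. 5 of 20 LumA-like tumors mutated.
print(fisher_exact_2x2(10, 10, 5, 15))
```

Running such a test for every gene and then adjusting the resulting P values for multiple testing yields a ranking of differentially mutated genes of the kind described above.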
Luminal subtypes have been shown to express higher levels of hormone receptors and respond better to hormonal therapy in hormone-driven tumors (3, 5, 6). In breast cancer, as expected (3, 4), luminal tumors expressed ESR1 at higher levels (ANOVA P < 0.0001; Fig. 3A), and LumA-like tumors expressed PGR at the highest levels (ANOVA P < 0.0001; Fig. 3A). Interestingly, luminal breast tumors also expressed AR at higher levels (ANOVA P < 0.0001; Fig. 3A), consistent with prior publications (26). Surprisingly, luminal cervical squamous tumors demonstrated the same patterns of expression for ESR1, PGR, and AR (ANOVA P < 0.0001, P = 0.0001, P = 0.0004, respectively, Fig. 3A), providing additional evidence that hormonal receptors may play a role in cervical cancer (27, 28). We found that, similar to breast cancer, other female reproductive cancers such as ovarian and endometrial cancers (Fig. 3B) likewise expressed ER at higher levels (ovarian: ANOVA P < 0.0001; endometrial: ANOVA P < 0.0001), and LumA-like tumors expressed PR at higher levels in endometrial cancer (ANOVA P < 0.0001). Analogously, we found that luminal-like prostate tumors expressed AR at higher levels compared with basal-like tumors (ANOVA P < 0.0001; Fig. 3C), which is supported by existing literature (6). A small percentage of bladder tumors are also known to express AR (29), and we found that luminal-like bladder tumors also express AR at higher levels than basal tumors (ANOVA P < 0.0001; Fig. 3C). We performed a global analysis of nuclear hormone receptors using GSEA and found a strong positive association only with LumA (NES = 2.39) and negative associations with LumB (NES = −2.32) and Basal (NES = −2.02) (Supplementary Fig. S2).
Exploratory drug response
We sought to determine whether the biological differences between basal and luminal subtypes could confer differing sensitivity to specific treatments. In the GDSC cell line drug response data, we grouped 421 carcinoma cell lines into the same luminal and basal subtypes (Fig. 1A; Supplementary Table S6) and compared the response across subtypes in four hormone-driven tumors commonly treated with antihormonal therapies (breast, prostate, ovarian, and endometrial cancer). We found that in cell lines from these tumors, LumB-like tumors were preferentially sensitive to both tamoxifen (ANOVA P = 0.043, endometrial cancer excluded since tamoxifen is a partial agonist; Fig. 3D) and bicalutamide (ANOVA P = 0.0028; Fig. 3D). Interestingly, when we examined mutations and CN gains for AR and ESR1 across carcinomas, we found that AR and ESR1 mutations or CN gains were more frequent in LumB-like tumors [P = 0.049 and P = 0.008, respectively (Fig. 3E), including in hormone-driven tumors (Supplementary Figs. S7 and S8)]. The full results for genes with differential rates of CN gain can be found in Supplementary Table S7.
We also examined drug response data across all carcinoma cell lines for 11 chemotherapeutic agents in GDSC that are used in clinical practice to treat carcinomas. After accounting for multiple testing, we found that gemcitabine and docetaxel had significant differences in drug response between subtypes (ANOVA FDR q < 0.05). LumB-like cell lines showed increased sensitivity to gemcitabine, whereas basal cell lines showed increased sensitivity to docetaxel (Fig. 4). These results suggest that the biological differences between subtypes may have clinical implications and provide preliminary evidence that these subtypes may be important in selecting therapies in patients with cancer.
Herein, we describe the first luminal–basal molecular classification scheme applied across a broad array of carcinomas and demonstrate that luminal and basal subtypes are present across all tumor histologies regardless of site of origin. We show that across cancer types, there are consistent differences in gene expression, mutation/CN alteration patterns, and clinical outcomes between molecular subtypes. Our preliminary data suggest that these differences may result in differing sensitivities to specific therapies. The basal markers KRT5/6 and KRT14 and the luminal marker KRT20 are concordant with these subtypes across carcinomas. Furthermore, the pan-carcinoma patterns of TP53 and RB1 mutation/CN loss reflect previously reported findings in breast and bladder cancer (30, 31). The proliferation score pattern also matches what is found in breast and prostate cancer (3, 6). The consistency of our findings across cancer types with the published literature in breast and bladder cancer supports the biological validity of these subtypes.
Furthermore, we have shown that these molecular subtypes predict clinical outcome. In eight tumor types, patients with basal-like tumors had significantly worse survival compared with LumA-like tumors, with LumB-like tumors falling in between, matching prior reports in breast cancer and bladder cancer. In breast cancer, LumA-like tumors tend to have better outcomes than LumB-like tumors, and both tend to have better outcomes than basal-like tumors, although with longer term follow-up of about 10 years, the LumB-like outcomes converge with those of basal tumors (32). In bladder cancer, the luminal-like tumors likewise have better outcomes than basal-like tumors (31). These results are likely driven, in part, by the strong difference in the proliferation-related genes between the LumA-like and basal-like tumors. However, although LumB-like tumors have similar proliferation scores to basal-like tumors, they do not always have similar survival, mutational profiles, or gene expression profiles, indicating other important biological differences.
Luminal and basal subtypes are perhaps best known for their implications in treatment response for hormonal therapies. Luminal breast cancers have been shown to respond preferentially to antiestrogen therapies (5). More recently, LumB-like prostate cancers have been shown to respond preferentially to antiandrogen therapies (6), and this is now being tested in a randomized national trial (ClinicalTrials.gov ID: NCT03371719). In our exploratory cell line drug response analysis, we found that LumB-like cell lines from hormone-driven tumors overall responded better to both antiestrogen (tamoxifen) and antiandrogen (bicalutamide) therapies, and LumB-like tumors also had globally increased rates of mutation or CN gain of ESR1 and AR. This suggests that these subtypes may have potential in selecting patients who preferentially benefit from antihormonal therapies in other hormone-driven tumors such as endometrial and ovarian cancer.
Luminal and basal subtypes have also been implicated in treatment response to cytotoxic chemotherapies. In breast cancer, the basal subtype has been shown to especially benefit from taxane therapy (33), consistent with our exploratory pan-carcinoma drug sensitivity results. Basal-like bladder tumors have been shown to preferentially benefit from several other chemotherapies (7), although we did not observe this globally in our cell line data. LumA-like metastatic breast cancers have also been shown to benefit less from gemcitabine with carboplatin (34), consistent with our cell line results, although other trials have shown that the benefit of gemcitabine is primarily in basal-like breast cancers (35). Nonetheless, our exploratory analysis would suggest that the luminal and basal subtypes across carcinomas may respond differently to chemotherapies.
This study is not without limitations. We are unable to account for the effect of tumor heterogeneity, as TCGA performed bulk tumor sequencing and does not include single-cell RNAseq data. Similarly, bulk sequencing also captures other cell types, such as stroma, vasculature, and immune infiltrate, which can affect the measured gene expression. However, this is an inherent limitation of all cancer subtyping efforts performed on bulk sequencing of tumors to date, and is not unique to our study. The cell line data should be minimally affected by these issues.
Current systemic treatment of solid tumors is largely driven by histology and site of origin. However, the completion of the landmark TCGA-sequencing efforts has revealed that there are many commonalities between neoplasms that transcend organ sites (12–17). We demonstrate that many epithelial tumors, including but also extending well beyond breast, bladder, and prostate cancer, have luminal and basal subtypes that are biologically and clinically meaningful. Observations such as this are the impetus behind moving toward a new paradigm in oncology where molecular information is used to subgroup and ultimately target treatments for patients with cancer.
Disclosure of Potential Conflicts of Interest
S.G. Zhao reports receiving other commercial research support from and holds ownership interest (including patents) in GenomeDx Biosciences. S.L. Chang is an employee of PFS Genomics. E. Davicioni is an employee of and holds ownership interest (including patents) in GenomeDx Biosciences. S. Jolly is a consultant/advisory board member for Varian and AstraZeneca. P.L. Nguyen reports receiving commercial research grants from Janssen, Astellas, and Bayer; holds ownership interest (including patents) in Augmenix; and is a consultant/advisory board member for Augmenix, Ferring, Blue Earth, Bayer, Cota, Dendreon, GenomeDx, and Nanobiotix. E.J. Small reports receiving commercial research grants from Janssen, and is a consultant/advisory board member for Janssen and Fortis Therapeutics. F.Y. Feng is an employee of PFS Genomics, and is a consultant/advisory board member for Sanofi, Janssen, Medivation/Astellas, Dendreon, Ferring, EMD Serono, Bayer, and Clovis. No potential conflicts of interest were disclosed by the other authors.
Conception and design: S.G. Zhao, W.S. Chen, S.A. Tomlins, B.A. Mahal, Y. Liu, E. Davicioni, E.J. Small, F.Y. Feng
Development of methodology: S.G. Zhao, W.S. Chen, H.X. Dang, Y. Liu, E. Davicioni, E.M. Posadas
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S.G. Zhao, W.S. Chen, Y. Liu, E.M. Posadas, P.L. Nguyen, E.J. Small
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S.G. Zhao, W.S. Chen, R. Das, S.L. Chang, D.A. Quigley, H.X. Dang, B.A. Mahal, Y. Liu, S. Jolly, P.L. Nguyen, C.A. Maher, F.Y. Feng
Writing, review, and/or revision of the manuscript: S.G. Zhao, W.S. Chen, R. Das, S.L. Chang, S.A. Tomlins, J. Chou, D.A. Quigley, H.X. Dang, B.A. Mahal, E.A. Gibb, Y. Liu, E. Davicioni, L.R. Duska, E.M. Posadas, S. Jolly, D.E. Spratt, P.L. Nguyen, C.A. Maher, E.J. Small, F.Y. Feng
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S.G. Zhao, T.J. Barnard, D.E. Spratt, F.Y. Feng
Study supervision: S.G. Zhao, D.E. Spratt, F.Y. Feng
We would like to acknowledge the assistance of Steven Kronenberg with graphic design of the figures. S.G. Zhao, B.A. Mahal, D.A. Quigley, E.J. Small and F.Y. Feng are supported by the Prostate Cancer Foundation.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.