Abstract
Lung carcinoids (LC) are rare and slow growing primary lung neuroendocrine tumors. We performed targeted exome sequencing, mRNA sequencing, and DNA methylation array analysis on macro-dissected LCs. Recurrent mutations were enriched for genes involved in covalent histone modification/chromatin remodeling (34.5%; MEN1, ARID1A, KMT2C, and KMT2A) as well as DNA repair (17.2%) pathways. Unsupervised clustering and principle component analysis on gene expression and DNA methylation profiles showed three robust molecular subtypes (LC1, LC2, LC3) with distinct clinical features. MEN1 gene mutations were found to be exclusively enriched in the LC2 subtype. LC1 and LC3 subtypes were predominately found at peripheral and endobronchial lung, respectively. The LC3 subtype was diagnosed at a younger age than LC1 and LC2 subtypes. IHC staining of two biomarkers, ASCL1 and S100, sufficiently stratified the three subtypes. This molecular classification of LCs into three subtypes may facilitate understanding of their molecular mechanisms and improve diagnosis and clinical management.
Integrative genomic analysis of lung carcinoids identifies three novel molecular subtypes with distinct clinical features and provides insight into their distinctive molecular signatures of tumorigenesis, diagnosis, and prognosis.
Introduction
Lung carcinoids (LC) are an indolent and rare type of primary lung neoplasms that are, in general, understudied. The 2015 World Health Organization (WHO; ref. 1) classification of LCs includes atypical carcinoids (AC; ∼0.2% prevalence) and typical carcinoids (TC; ∼2% prevalence). TCs are slow growing tumors that rarely spread beyond the lungs whereas ACs are faster growing tumors and have a greater chance of metastasizing to other tissues (2). The WHO classification relies mainly on morphology, proliferation rate (mitotic index), and necrosis assessment (3). This current method of classification has its drawbacks as studies have shown that the reproducibility of cancer classification and its prognostic efficacy have high interobserver variability (3, 4), especially for differentiating between TC and AC (5). Recent WHO classifications highlight use of the Ki67 cell proliferation marker to distinguish ACs from TCs (1). However, overlapping distribution of Ki67 between ACs and TCs does not enable reliable stratification between well-differentiated LCs (6, 7). It has also been reported that TCs and ACs are overdiagnosed as small cell lung carcinomas (SCLC) in small crush biopsy specimens (8), a situation where artifacts in specimens appear as bluish clusters in which cellular details are not recognizable. As SCLCs are highly malignant, incorrect diagnosis of TC and AC tumors as SCLC can subject patients to unnecessary stress and treatment (8). More accurate molecular diagnostic tools and stratification for LCs will help ensure more appropriate treatment and clinical management.
Previous genomic analysis of LC tumors has identified recurrent mutations in MEN1, PSIP1, and ARID1A (9), whereas no significant mutations or focal copy alterations were observed in genes that are frequently mutated in non–small cell lung cancer (NSCLC), large cell neuroendocrine carcinoma (LCNEC), and SCLC (e.g., KRAS, TP53, EGFR, and RB1; ref. 10). The different mutation spectrum and low mutation burden (9) of LCs indicate they are distinct from NSCLC and high-grade lung neuroendocrine tumors (NET). It is not known if there are distinct molecular subtypes of LCs or their cells of origin.
In this study, we performed genotyping on 29 LCs to detect mutations in a 354-cancer gene panel, mRNA sequencing (n = 30) and DNA methylation 450K-array analysis (n = 18) and identified 3 molecular subtypes with distinct clinical features. We also identified 2 key biomarkers (ASCL1 and S100) to stratify these subtypes. Integration of genetic and epigenetic signatures distinguishes each subtype of carcinoid, providing deeper insight into their distinctive molecular signatures of tumorigenesis as well as cells of origin.
Materials and Methods
Patient's data
Retrospective and prospective reviews of 30 LC neoplasms were performed using the pathology files and institutional database at Memorial Sloan Kettering Cancer Center (MSKCC, New York, NY). All studies were conducted in accordance with appropriate ethical guidelines (following U.S. Common Rule) and with Institutional Review Board approval. Written informed consent was obtained from the patients. All patients were evaluated clinically with confirmed pathologic diagnoses, appropriate radiological and laboratory studies, and surgical or oncological management. Fresh-frozen tumor and paired normal tissues were obtained from MSKCC's tissue bank under Institutional Review Board protocol. Targeted cancer gene panel DNA sequencing, RNA sequencing (RNA-seq), and DNA methylation array were performed on fresh-frozen samples. Relevant clinical and pathologic information is presented in Supplementary File S1.
DNA sequencing and analysis
DNA extraction from microdissected tumor samples and normal adjacent tissues was performed using a commercially available DNA Extraction Kit (DNeasy Blood and Tissue Kit, Catalog No. 69504; Qiagen), according to the manufacturer's protocol. Targeted sequencing on 29 LCs was performed using MSK-IMPACT (11) hybrid capture cancer gene panel (n = 354). Single-nucleotide variants and short indels (<30 bp) were identified and annotated using MSK-IMPACT pipeline (11). Briefly, raw reads were filtered based on quality, mapped to NCBI b37 genome using BWA–MEM version 0.7.5a (http://arxiv.org/abs/1303.3997), post-processed using GATK (12), and variant identification using MuTect (13). Variants were filtered based on its entry in NCBI-dbSNPs (http://www.ncbi.nlm.nih.gov/snp), 1000G project (http://www.1000genomes.org/), and COSMIC (http://cancer.sanger.ac.uk/cosmic). Filtered variants were manually reviewed on IGV. We created the mutational OncoPrint plot for our 29 LCs dataset using the online cBioPortal website (http://www.cbioportal.org/oncoprinter.jsp).
RNA-seq and data analysis
Total RNA extraction from microdissected frozen tumor samples was performed using RNeasy Mini Kit (Catalog No. 74106) followed by Illumina HiSeq sequencing (2 × 100 bp). We performed RNA-seq data analysis as described previously (14, 15). Briefly, raw fastq files were examined for sequencing quality control using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Sequencing reads were mapped to human transcripts corresponding to hg19 Genepattern (16) GTF annotations using RSEM (17) with default parameters. STAR (18) aligner was used to map reads in RSEM algorithm followed by calculating gene expression from mapped BAM files. Gene transcripts mapped data were normalized to transcript per million (TPM). We used log2(TPM+1) values for all downstream analysis. For unsupervised clustering, we used Pearson distance metric and hclust (ward.D2) method (unless stated otherwise). PCA was done using prcomp in R. R (http://www.r-project.org/) was used for all analysis and visualization of data. The R package DeSeq2 (19) was used to identify differentially expressed genes using Benjamini and Hochberg corrected P value (<0.05) and fold change of ≥2. We used DAVID bioinformatics resources (20) version 6.8 and gene set enrichment analysis (GSEA; including PreRanked test; ref. 21) to find significant pathways, gene ontology terms and transcription factor motif analysis with default parameter.
Independent dataset of LCs
To validate our novel molecular subtyping, we used gene expression and mutation data from an independent LC (n = 65) dataset. The gene expression and mutational data were downloaded from supplement data files (https://www.nature.com/articles/ncomms4518#supplementary-information) reported in ref. 9. This gene expression data were reported as transcript expression instead of gene expression. We used callapseRows (22) on transcript expression to convert to respective gene expression using MaxVariance option. Gene expression for LCs signature (top 100 variant genes from our 30 LCs dataset) was extracted from this RNAseq dataset. Unsupervised clustering and PCA analysis on this dataset were performed as previously described for our 30 LCs dataset.
DNA methylation and analysis
DNA were extracted from LC samples and interrogated for CpG DNA methylation using the Illumina 450K array platform (Illumina Inc.). For CpG DNA methylation data analysis, we used ChAMP (23) package in R with default parameters. Briefly, IDAT raw data files were imported in R and filtered to exclude samples with detection P-value <0.01 and bead count <3 in at least 5% of samples and normalized using FunctionNormalization (24). Additional filtering was performed to remove probes that have annotated SNPs, present on X and Y chromosome, and have CpG probes mapped to multiple locations. β values for 413,176 probes were used for all subsequent analysis. Subtype specific differentially methylated CpG probes (DMP) and CpG island were identified using COHCAP (default parameter; ref. 25). For DMP, we focused on probes present at TSS1500/200 and first exon for subsequent analysis. We integrated gene expression and DNA methylation to investigate subtype-specific gene expression using default parameter for COHCAP.integrate.avg.by.island with FDR P value <0.01.
IHC staining
IHC staining was performed using commercially available antibodies at optimal dilutions as follows: ASCL1 (a-MASH1; monoclonal, 1:300; BD Biosciences) and S100 (monoclonal, 1:4,000; BG). Briefly, IHC was performed by a standard protocol on Ventana Discovery XT automated stainer (Ventana Medical Systems Inc.) for the S100 antibody. Antigen retrieval was performed with Cell Conditioning 1 buffer (CC1; citrate buffer pH 6.0, Ventana). For the ASCL1 (a-MASH1) antibody, ER2 pretreatment was used and the Leica Bondmax autostainer (Leica). Immunoreactivity was performed on whole tissue sections of formalin-fixed and paraffin-embedded tissue section as well as on tissue microarray (TMA) constructed from 173 pulmonary carcinoid tumors formalin-fixed, paraffin-embedded tumor specimens were used for TMA construction. Briefly, 4 representative tumor areas were marked on H&E-stained slides, and cylindrical 0.6-mm tissue cores were arrayed from the corresponding paraffin blocks into a recipient block using an automated tissue arrayer ATA-27 (Beecher Instruments). From each TMA, 4-μm-thick paraffin sections were prepared for IHC. In all, 173 cases with adequate cores were available for IHC analysis (26). The results were semiquantitatively scored based on the percentage of reactive tumor cells (≥25% tumor cells) and the intensity of staining.
Data availability
The authors declare that all data supporting the findings of this study are available within the article and its Supplementary data and figures. Data generated in this study were deposited to NCBI under GEO SuperSeries GSE118133 (GSE118131 for RNAseq and GSE118132 for 450 K methylation). We do not impose any restrictions on data availability.
Results
Patient cohort, clinical annotations, and mutational profile of LCs
We analyzed 30 randomly selected and histologically confirmed, well-differentiated LCs (17 TCs and 13 ACs) comprising the discovery dataset. Most specimens were from pulmonary lobectomy with lymph node detection. Tumor locations, that is, peripheral versus central (endobronchial), were assessed by combination of radiographic reveal and pathologic observations. Fifty-four percent (7/13) of ACs had either lymph node or distant metastasis, whereas 6% (1/17) of TCs had local lymph node metastasis. The 5-year disease-specific survival was 89% and 55% for TC and AC, respectively. Clinical information and features are presented in Supplementary File S1. In addition, a TMA containing 173 cases of LCs had been prepared previously (26) and used for study of clinical correlates.
We performed targeted sequencing of a 354-cancer gene panel (MSK-IMPACT; ref. 11) on 29 LCs from the discovery dataset. The mutated genes were enriched for those implicated in covalent histone modification/chromatin remodeling and found in 10 samples [MEN1 (13.8%), ARID1A (10%), KMT2A (3%), KMT2C (7%), KMT2D (3%), and SMARCA4 (3%)] reproducing the results from a previous study (9). We also found mutations in DNA repair pathways (17.2% of samples) (Supplementary File S2; Fig. 1A). Mutations were not detected in the 354-cancer gene panel for 13 LC samples. Mutations in MEN1, the most frequently mutated gene, were found in 4 samples (4 ACs) and 4 of these mutations had variant allele frequencies higher than 70% indicating LOH (Supplementary Fig. S1). One sample (Lu-Aty9) has 2 MEN1 mutations (an in-frame deletion and a missense substitution), a few bases apart on the same copy of MEN1 (Supplementary File S2). The ARID1A gene is mutated in 3 samples with LOH occurring in one of the 3 samples. Using variant allele frequencies and LOH status of MEN1 and ARID1A, we found median tumor purity to be 91% (Supplemental File S1), consistent with our pathology-based estimates (Supplementary Table S1). In addition to MEN1, other genes encoding SET1/MLL complex proteins are also found mutated [KMT2A (3%), KMT2C (7%), KMT2D (3%)]. One sample (Lu-ty4) has the highest number of mutations (9) including mutations in POLE (DNA polymerase B domain: V1016M), ROS1, FAT1, NBN, PARP1, and TERT (in-frame deletion close to Telomerase RBD). We found homozygous deletions only in the FANCA and RAD51 genes in 2 different ACs. The most recurrent CNV are single copy deletions in FANCA (17%), FAT1 (10%), MEN1 (7%), ATM (17%), SDHD (17%), and CHEK1 (17%), many of which reside on chr11q. We did not observe changes in the transcription levels of these genes with hemizygous deletions in comparison to wild-type samples. There are 18 samples (4 ACs and 14 TCs) with normal karyotype, 6 samples (4 ACs and 2 TCs) with nearly normal karyotype (aneuploid for only one or 2 different chromosomes), and 6 samples (5 ACs and 1 TCs) with aneuploidy in more than 2 different chromosomes in our dataset (Supplementary File S2). We did not find any known pathogenic germline mutations in the panel of cancer-associated genes in our samples. TP53 and RB1 genes were not mutated in this cohort, unlike high-grade lung NETs and SCLC.
Three novel molecular subtypes of LCs with mutational, gene expression, and DNA CpG methylation with distinct clinical features. A, Mutated genes in LCs on a 354-cancer gene panel. Samples summary from DNA (n = 29), methylation (n = 18), RNA-seq (n = 30), carcinoids samples (atypical (n = 13; gray), and typical (n = 17; white) and specimen location [endobronchial, white (n = 9); peripheral, gray (n = 21)]. Samples are grouped according to their gene expression and DNA methylation pattern. Orange, subtype 1 (LC1); red, subtype 2 (LC2; blue, subtype 3 (LC3). Column represents sample and row represents gene name. Gene expression (n = 30) and DNA CpG methylation (n = 18) analysis revealed three LC subtypes using unsupervised clustering and principal component analysis. B, Heatmap of unsupervised clustering of top 100 variably expressed gene across all samples. C, Principal component analysis of top 3,000 variably expressed genes. D, Heatmap of unsupervised clustering of top 500 variable methylated CpG probes. E, Principal component analysis of top 3,000 variable methylated CpG probes.
Three novel molecular subtypes of LCs with mutational, gene expression, and DNA CpG methylation with distinct clinical features. A, Mutated genes in LCs on a 354-cancer gene panel. Samples summary from DNA (n = 29), methylation (n = 18), RNA-seq (n = 30), carcinoids samples (atypical (n = 13; gray), and typical (n = 17; white) and specimen location [endobronchial, white (n = 9); peripheral, gray (n = 21)]. Samples are grouped according to their gene expression and DNA methylation pattern. Orange, subtype 1 (LC1); red, subtype 2 (LC2; blue, subtype 3 (LC3). Column represents sample and row represents gene name. Gene expression (n = 30) and DNA CpG methylation (n = 18) analysis revealed three LC subtypes using unsupervised clustering and principal component analysis. B, Heatmap of unsupervised clustering of top 100 variably expressed gene across all samples. C, Principal component analysis of top 3,000 variably expressed genes. D, Heatmap of unsupervised clustering of top 500 variable methylated CpG probes. E, Principal component analysis of top 3,000 variable methylated CpG probes.
Transcriptome and methylome profiles reveal three distinct subtypes
We performed RNA-seq on 30 LCs (13 ACs and 17 TCs) and DNA methylation analysis on 18 LCs (12 of the 30 samples did not have sufficient material for analysis) from the discovery dataset. Unsupervised clustering and principal component analysis on the top 3,000 variable genes showed 3 distinct clusters (Fig. 1B and C). These clusters are robust when different numbers of top variable genes were used for clustering (Supplementary Fig. S2). We named these subtypes LC1, LC2, and LC3. Heatmap of Pearson correlation on top 3,000 variable genes shows 3 blocks representing the 3 evident subtypes (Supplementary Fig. S3). The top 100 variable genes (Supplementary Fig. S4) across all LCs show enrichment for gene ontologies related to hormonal secretions, endogenous stimulus, wound healing, and developmental processes (Supplementary Table S2). Gene expression analysis revealed greater similarity between LC2 and LC3 as compared with LC1 (Supplementary Figs. S3 and S4).
We investigated the DNA methylation profiles of LCs (n = 18), using the Illumina 450K microarray. Unsupervised clustering and PCA of the top 3,000 variable CpG sites revealed 3 distinct subtypes in complete agreement with the gene expression based subtypes (Fig. 1D and E). Consistent with gene expression, we also observed greater similarity of DNA methylation levels for LC2 and LC3 subtypes when compared with LC1. The 3 grouping of subtypes was robust and reproducible using different numbers of top variable CpG sites (Supplementary Fig. S5). Total genome-wide DNA methylation level is not different between the 3 subtypes (Supplementary Table S3).
Subtype-specific molecular characterization of LCs
We investigated gene expression and CpG DNA methylation profiles to determine subtype-specific molecular alterations (see Materials and Methods section). Genes upregulated in LC2 and LC3 as compared with LC1 are enriched for having the transcription factor motifs for HNF1 (FDR q-value <0.001) and HNF4 (FDR q-value <0.001; Fig. 2; Supplementary File 3). This is in agreement with the observed high gene expression and DNA hypomethylation of HNF1A, FOXA3, and HNF4A in LC2 and LC3 as compared with LC1 (Fig. 3A and B; Supplementary Fig. S6). In addition, many of the most highly expressed genes (APOH, GC, HAO1, G6PC, TM4SF4, PKLR, UGT2B17, CDH1, and SERPINA1/2/6) in LC2 and LC3 are targets of these hepatocyte nuclear factors (Fig. 2). Cancer hallmark gene set enrichment analysis shows complement and coagulation, xenobiotic, retinol and bile acid metabolism to be significantly upregulated in LC2 and LC3 as compared with LC1, a gene signature also found in subset of pancreatic neuroendocrine tumors (Supplementary File 3; ref. 14). However, we also identified TFs that are differentially expressed between LC2 and LC3 (FEV and POU3F4 are more highly expressed in LC2 and LC3, respectively; Fig. 2; Supplementary File S3). MEN1 gene is required for regulation of several members of the HOX gene family (27). Indeed, the LC2 subtype, which included all of the MEN1 mutant samples, has low expression of HOXB2/3/4/5/6 genes as compared with LC1 and LC3 (Supplementary Fig. S7).
Heatmap of differentially expressed genes between LC subtypes. Supervised analysis on LC subtypes reveals differential expression of transcription factor and neuropeptide (some are highlighted on the left side of the heatmap). Heatmap expression level is in z-score.
Heatmap of differentially expressed genes between LC subtypes. Supervised analysis on LC subtypes reveals differential expression of transcription factor and neuropeptide (some are highlighted on the left side of the heatmap). Heatmap expression level is in z-score.
Subtype-specific molecular characterization of gene expression and DNA methylation profiles. A, Heatmap of differentially methylated CpG sites (probes from TSS1500, TSS200, and first exon) of genes among the three LC subtypes. Some genes with altered gene expression and CpG sites are highlighted on the left of heatmap. Dark black line, subtype-specific blocks. B, Anticorrelation of gene expression and respective CpG island methylation (18 matched samples) for HNF1A, FOXA3, FEV, and ILRL2 across three subtypes. Each plot represents gene expression on x-axis and average CpG island β value on y-axis along with Pearson correlation (r) and P value (P) are on top of the plot.
Subtype-specific molecular characterization of gene expression and DNA methylation profiles. A, Heatmap of differentially methylated CpG sites (probes from TSS1500, TSS200, and first exon) of genes among the three LC subtypes. Some genes with altered gene expression and CpG sites are highlighted on the left of heatmap. Dark black line, subtype-specific blocks. B, Anticorrelation of gene expression and respective CpG island methylation (18 matched samples) for HNF1A, FOXA3, FEV, and ILRL2 across three subtypes. Each plot represents gene expression on x-axis and average CpG island β value on y-axis along with Pearson correlation (r) and P value (P) are on top of the plot.
We integrated subtype-specific CpG DNA methylation (see Materials and Methods section) with gene expression by focusing on CpG sites between 1,500 bps and 200 bps upstream to the transcription start site (TSS) and in the first exon, which have been shown to inversely correlate with gene expression (28). Fig. 3A shows subtype-specific differentially methylated CpG probes (DMP) and their inverse correlation with neighboring gene expression. We found 75 genes with expression to be significantly anticorrelated with respective CpG island methylation level (FDR P-value < 0.01; Supplementary File S4). HNF1A and FOXA3 are hypermethylated and low expressed in LC1. FEV, GATA2, and PROCR are hypomethylated and highly expressed in LC2. SOX1 is hypermethylated and low expressed in LC2. SIX2, ONECUT2, and IL1RL2 are hypomethylated and highly expressed in LC3 (Fig. 3B). Many of these observations suggest further mechanistic studies but there are currently no appropriate LC cell lines or animal models available.
Independent validation of LC classification
We validated our novel classification and gene expression biomarkers using published LC data from Fernandez-Cuesta and colleagues (9), which include genome/exome and RNA sequencing of 65 samples (56 TCs, 6 ACs and 3 carcinoids). Using our gene signatures derived from the top 100 most variable genes (Supplementary File S5) across LCs, we found 3 distinct subtypes using unsupervised clustering and PCA that are consistent with the subtypes identified from our data (Fig. 4A and B; Supplementary Fig. S8). Moreover, all MEN1 mutated LCs are found exclusively in LC2 (Fig. 4B). In addition, we found HNF1A and FOXA3 are more highly expressed in LC2 and LC3 as compared with LC1 whereas FEV is highly expressed only in LC2 consistent with our data (Supplementary Fig. S9).
Validation of novel classification of LC on an independent collection of LCs from Fernandez-Cuesta and colleagues (9). A and B, Principal component analysis and heatmap of hierarchical clustering of gene expression of LCs from Fernandez-Cuesta and colleagues (9) using our top 100 variable gene set signature shows three distinct subtypes LC1 (orange), LC2 (red), and LC3 (blue). Black sticks represent samples with MEN1 mutations and they are all found in subtype LC2. C, Boxplot of ASCL1 and S100 gene expression from Fernandez-Cuesta and colleagues (9) is consistent with LC subtypes. Centerline, median; bounds of box, the first and third quartiles; and upper and lower whisker is defined to be 1.5 × interquartile range more than the third and first quartile.
Validation of novel classification of LC on an independent collection of LCs from Fernandez-Cuesta and colleagues (9). A and B, Principal component analysis and heatmap of hierarchical clustering of gene expression of LCs from Fernandez-Cuesta and colleagues (9) using our top 100 variable gene set signature shows three distinct subtypes LC1 (orange), LC2 (red), and LC3 (blue). Black sticks represent samples with MEN1 mutations and they are all found in subtype LC2. C, Boxplot of ASCL1 and S100 gene expression from Fernandez-Cuesta and colleagues (9) is consistent with LC subtypes. Centerline, median; bounds of box, the first and third quartiles; and upper and lower whisker is defined to be 1.5 × interquartile range more than the third and first quartile.
ASCL1 and S100 are novel biomarkers for LC subtypes
We selected genes with distinct subtype-specific expression to test for use as biomarkers. ASCL1 encodes a transcription factor that plays a role in neuronal differentiation and proliferation (29), neuroepithelial bodies formation (30), and is a lineage-specific oncogene for high-grade neuroendocrine lung cancer (31). ASCL1 is significantly highly expressed in LC1 along with its transcriptional targets (Figs. 4C and 5A; Supplementary Fig. S10). S100, a family of proteins containing 2 EF-hand calcium-binding motifs, is implicated in tumor progression and poor prognosis (32). Its gene expression levels are significantly higher in subtype LC2 (Fig. 5A). We performed IHC staining of ASCL1 and S100 to use as biomarkers. ASCL1 stained positively only for LC1 samples (n = 11) and S100 stained positively only for LC2 samples (n = 5) (Fig. 5B). Both of these genes stained negatively for LC3 samples (n = 4; Supplementary Table S4).
Gene expression and immunohistochemistry for ASCL1 and S100 biomarker genes. A, Boxplot of ASCL1 and S100 gene expression. Centerline, median; bounds of box, the first and third quartiles; and top and bottom whisker is defined to be 1.5 × interquartile range more than the third and first quartile. B, IHC staining results for ASCL1 and S100 in LC samples: LC1 (n = 11), LC2 (n = 5), and LC3 (n = 4). Supplementary Table S4 has IHC results for all samples.
Gene expression and immunohistochemistry for ASCL1 and S100 biomarker genes. A, Boxplot of ASCL1 and S100 gene expression. Centerline, median; bounds of box, the first and third quartiles; and top and bottom whisker is defined to be 1.5 × interquartile range more than the third and first quartile. B, IHC staining results for ASCL1 and S100 in LC samples: LC1 (n = 11), LC2 (n = 5), and LC3 (n = 4). Supplementary Table S4 has IHC results for all samples.
Additionally, we performed ASCL1 and S100 IHC staining on a panel of 173 LCs TMA (Supplementary File S6). ASCL1 positive and S100 negative samples (n = 54) were designated LC1. ASCL1 negative and S100 positive samples (n = 15) were designated LC2. ASCL1 negative and S100 negative samples (n = 71) were designated LC3. Fifteen percent of the TMA samples stained positive for ASCL1 and S100, which are not represented in our discovery dataset.
Cell cycle and mitotic genes are highly expressed in ACs of LC1
Pathologically, ACs are more aggressive and have a higher mitotic index in comparison to TCs. To find the gene signature responsible for these features of ACs, we compared ACs (n = 13) and TCs (n = 17) from our 30 LC cohort. Surprisingly, we did not find cell cycle or mitosis-related genes to be differentially expressed. We then controlled for LC subtypes and compared ACs (n = 8) and TCs (n = 7) from subtype LC1 and found differentially expressed genes (Fig. 6) were then enriched for mitotic and cell cycle related pathways with high expression in the ACs (Supplementary File 7). Of the 8 AC tumors, the 3 with highest gene expression signature for mitotic/cell-cycle pathway have metastases or recurrences whereas only 1 of the 5 with lower gene expression signature have recurrence or metastases (Fig. 6). We also observed high aneuploidy in the ACs with high gene expression signature of mitotic/cell-cycle pathway. We performed the same analysis for ACs and TCs from LC2 and did not find any significant gene signatures, which may be due to the small sample size of LC2.
Heatmap of differentially expressed genes between ACs and TCs within LC1 subtype. Upregulated genes in ACs (of LC1) are significantly enriched for genes involved in cell cycle/mitosis.
Heatmap of differentially expressed genes between ACs and TCs within LC1 subtype. Upregulated genes in ACs (of LC1) are significantly enriched for genes involved in cell cycle/mitosis.
LC subtypes have distinct clinical phenotypes
The 3 subtypes of LCs have distinct clinical phenotypes. Subtype LC1 is enriched for peripheral lung (P-value <0.003 in discovery dataset; P-value <0.002 in TMA), whereas subtype LC3 is found predominately at endobronchial lung (P-value < 0.054 in discovery dataset; P-value <3.8e−5 in TMA; Fig. 1A, box; Supplementary File S1). Subtype LC3 has significantly younger age of diagnosis (median age of 33, 44.5, and 48 years in discovery, Fernandez-Cuesta and colleagues (9), and TMA datasets, respectively) than LC1 (median age of 67, 66, and 60 years, respectively) and LC2 (median age of 62.5, 57, and 65 years, respectively; Supplementary Fig. S1A and S1B). LC1 subtype was enriched for female patients (6.5-fold, P-value < 0.007 in discovery dataset; 3.9-fold, P-value < 1.4e−5 in TMA) but not for LC2 or LC3 (Supplementary File S1).
Discussion
We identified 3 novel molecular subtypes of LCs with distinct clinical features using gene expression, DNA methylation, and mutational profiles (Fig. 7). Integrative analysis of gene expression and DNA methylation identified subtype-specific transcriptional profiles of key differentiation transcription factors (ASCL1, HNF1A, FOXA3) and their downstream target genes. Mutational analysis revealed recurrent mutations in chromatin remodeling genes found in all subtypes of LCs with exception of MEN1 mutations occurring only in subtype LC2 tumors. Importantly, we found mutations in DNA repair genes in 17% of our LC samples. Subtype LC3 has younger age of diagnosis and is predominantly endobronchial, whereas subtypes LC1 are predominantly found in peripheral regions of the lung. These findings may argue that subtypes of LC potentially originate from different neuroendocrine cell lineages, although the lack of available cell-type–specific gene markers prevents a definitive validation beyond speculation at this time. Nevertheless, we believe that a more comprehensive single-cell approach can uncover lung NE cell-type–specific gene signatures and reveal the cells of origin for the LC subtypes. The younger age of diagnosis for LC3 by 15 to 20 years as compared with LC1 and LC2 may be due to earlier diagnosis from the clinically symptomatic tumors located in the central lung as compared with asymptomatic tumors at the peripheral lung or suggest a possible distinct pathogenesis predisposing to LC3, including germline mutations. However, we were not able to detect any pathogenic germline mutations in the panel of cancer-associated genes used in the MSK IMPACT testing for any of the LCs. Our classification and gene expression biomarkers were validated in 65 additional LC samples from Fernandez-Cuesta and colleagues (9). Using our subtype classification, we found gene signature for cell cycle and mitotic processes activated in ACs as compared with TCs of the LC1 subtype and those ACs with the high gene signature have worst outcome (Fig. 6; Supplementary File 7). This gene signature will need to be reproduced with a larger sample size but may potentially serve as a diagnostic and prognostic biomarker to differentiate malignant from more benign ACs from subtype LC1. This gene signature is specific to LC1 and would not have been found from comparing ACs to TCs from all LCs demonstrating the need to study distinct subtypes individually.
Schematic representation of novel molecular subtypes of LCs with the principal genomic and clinical characteristics.
Schematic representation of novel molecular subtypes of LCs with the principal genomic and clinical characteristics.
Our molecular classification introduces 3 subtypes of LCs with distinct clinical phenotypes. This can refine and complement the WHO classification of LCs into typical or ACs and help diagnose ambiguous cases of LCs from the more malignant LCNEC and SCLC. In addition, the stratification of LCs into distinct molecular subtypes will help with future study of their tumorigenesis, prognosis, and treatment options.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: S.V. Laddha, W.D. Travis, L.H. Tang, C.S. Chan
Development of methodology: S.V. Laddha, J.T. Poirier, C.S. Chan
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S.V. Laddha, K. Robzyk, B.R. Untch, W.D. Travis, L.H. Tang
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S.V. Laddha, B.R. Untch, H. Ke, W.D. Travis, L.H. Tang, C.S. Chan
Writing, review, and/or revision of the manuscript: S.V. Laddha, E.M. da Silva, K. Robzyk, B.R. Untch, N. Rekhtman, W.D. Travis, L.H. Tang, C.S. Chan
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S.V. Laddha, E.M. da Silva, K. Robzyk
Study supervision: S.V. Laddha, C.S. Chan
Acknowledgments
We would like to thank Starr Cancer Consortium Grant (L.H. Tang), Raymond and Beverly Sackler Foundation (L.H. Tang), Caring for Carcinoid Foundation (L.H. Tang), Mushett Family Foundation (L.H. Tang), MSKCC Support Grant/Core Grant (P30 CA008748), and Stand Up To Cancer (C.S. Chan). This material is based upon work supported in part by the National Science Foundation under Grant No. 1546101. This research was also supported by the Biomedical Informatics shared resource of Rutgers Cancer Institute of New Jersey (P30CA072720). Computational resources were provided by the Office of Advanced Research Computing (OARC) at Rutgers, The State University of New Jersey, under the National Institutes of Health Grant (S10OD012346).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.