Lung carcinoids (LC) are rare and slow growing primary lung neuroendocrine tumors. We performed targeted exome sequencing, mRNA sequencing, and DNA methylation array analysis on macro-dissected LCs. Recurrent mutations were enriched for genes involved in covalent histone modification/chromatin remodeling (34.5%; MEN1, ARID1A, KMT2C, and KMT2A) as well as DNA repair (17.2%) pathways. Unsupervised clustering and principle component analysis on gene expression and DNA methylation profiles showed three robust molecular subtypes (LC1, LC2, LC3) with distinct clinical features. MEN1 gene mutations were found to be exclusively enriched in the LC2 subtype. LC1 and LC3 subtypes were predominately found at peripheral and endobronchial lung, respectively. The LC3 subtype was diagnosed at a younger age than LC1 and LC2 subtypes. IHC staining of two biomarkers, ASCL1 and S100, sufficiently stratified the three subtypes. This molecular classification of LCs into three subtypes may facilitate understanding of their molecular mechanisms and improve diagnosis and clinical management.

Significance:

Integrative genomic analysis of lung carcinoids identifies three novel molecular subtypes with distinct clinical features and provides insight into their distinctive molecular signatures of tumorigenesis, diagnosis, and prognosis.

Lung carcinoids (LC) are an indolent and rare type of primary lung neoplasms that are, in general, understudied. The 2015 World Health Organization (WHO; ref. 1) classification of LCs includes atypical carcinoids (AC; ∼0.2% prevalence) and typical carcinoids (TC; ∼2% prevalence). TCs are slow growing tumors that rarely spread beyond the lungs whereas ACs are faster growing tumors and have a greater chance of metastasizing to other tissues (2). The WHO classification relies mainly on morphology, proliferation rate (mitotic index), and necrosis assessment (3). This current method of classification has its drawbacks as studies have shown that the reproducibility of cancer classification and its prognostic efficacy have high interobserver variability (3, 4), especially for differentiating between TC and AC (5). Recent WHO classifications highlight use of the Ki67 cell proliferation marker to distinguish ACs from TCs (1). However, overlapping distribution of Ki67 between ACs and TCs does not enable reliable stratification between well-differentiated LCs (6, 7). It has also been reported that TCs and ACs are overdiagnosed as small cell lung carcinomas (SCLC) in small crush biopsy specimens (8), a situation where artifacts in specimens appear as bluish clusters in which cellular details are not recognizable. As SCLCs are highly malignant, incorrect diagnosis of TC and AC tumors as SCLC can subject patients to unnecessary stress and treatment (8). More accurate molecular diagnostic tools and stratification for LCs will help ensure more appropriate treatment and clinical management.

Previous genomic analysis of LC tumors has identified recurrent mutations in MEN1, PSIP1, and ARID1A (9), whereas no significant mutations or focal copy alterations were observed in genes that are frequently mutated in non–small cell lung cancer (NSCLC), large cell neuroendocrine carcinoma (LCNEC), and SCLC (e.g., KRAS, TP53, EGFR, and RB1; ref. 10). The different mutation spectrum and low mutation burden (9) of LCs indicate they are distinct from NSCLC and high-grade lung neuroendocrine tumors (NET). It is not known if there are distinct molecular subtypes of LCs or their cells of origin.

In this study, we performed genotyping on 29 LCs to detect mutations in a 354-cancer gene panel, mRNA sequencing (n = 30) and DNA methylation 450K-array analysis (n = 18) and identified 3 molecular subtypes with distinct clinical features. We also identified 2 key biomarkers (ASCL1 and S100) to stratify these subtypes. Integration of genetic and epigenetic signatures distinguishes each subtype of carcinoid, providing deeper insight into their distinctive molecular signatures of tumorigenesis as well as cells of origin.

Patient's data

Retrospective and prospective reviews of 30 LC neoplasms were performed using the pathology files and institutional database at Memorial Sloan Kettering Cancer Center (MSKCC, New York, NY). All studies were conducted in accordance with appropriate ethical guidelines (following U.S. Common Rule) and with Institutional Review Board approval. Written informed consent was obtained from the patients. All patients were evaluated clinically with confirmed pathologic diagnoses, appropriate radiological and laboratory studies, and surgical or oncological management. Fresh-frozen tumor and paired normal tissues were obtained from MSKCC's tissue bank under Institutional Review Board protocol. Targeted cancer gene panel DNA sequencing, RNA sequencing (RNA-seq), and DNA methylation array were performed on fresh-frozen samples. Relevant clinical and pathologic information is presented in Supplementary File S1.

DNA sequencing and analysis

DNA extraction from microdissected tumor samples and normal adjacent tissues was performed using a commercially available DNA Extraction Kit (DNeasy Blood and Tissue Kit, Catalog No. 69504; Qiagen), according to the manufacturer's protocol. Targeted sequencing on 29 LCs was performed using MSK-IMPACT (11) hybrid capture cancer gene panel (n = 354). Single-nucleotide variants and short indels (<30 bp) were identified and annotated using MSK-IMPACT pipeline (11). Briefly, raw reads were filtered based on quality, mapped to NCBI b37 genome using BWA–MEM version 0.7.5a (http://arxiv.org/abs/1303.3997), post-processed using GATK (12), and variant identification using MuTect (13). Variants were filtered based on its entry in NCBI-dbSNPs (http://www.ncbi.nlm.nih.gov/snp), 1000G project (http://www.1000genomes.org/), and COSMIC (http://cancer.sanger.ac.uk/cosmic). Filtered variants were manually reviewed on IGV. We created the mutational OncoPrint plot for our 29 LCs dataset using the online cBioPortal website (http://www.cbioportal.org/oncoprinter.jsp).

RNA-seq and data analysis

Total RNA extraction from microdissected frozen tumor samples was performed using RNeasy Mini Kit (Catalog No. 74106) followed by Illumina HiSeq sequencing (2 × 100 bp). We performed RNA-seq data analysis as described previously (14, 15). Briefly, raw fastq files were examined for sequencing quality control using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Sequencing reads were mapped to human transcripts corresponding to hg19 Genepattern (16) GTF annotations using RSEM (17) with default parameters. STAR (18) aligner was used to map reads in RSEM algorithm followed by calculating gene expression from mapped BAM files. Gene transcripts mapped data were normalized to transcript per million (TPM). We used log2(TPM+1) values for all downstream analysis. For unsupervised clustering, we used Pearson distance metric and hclust (ward.D2) method (unless stated otherwise). PCA was done using prcomp in R. R (http://www.r-project.org/) was used for all analysis and visualization of data. The R package DeSeq2 (19) was used to identify differentially expressed genes using Benjamini and Hochberg corrected P value (<0.05) and fold change of ≥2. We used DAVID bioinformatics resources (20) version 6.8 and gene set enrichment analysis (GSEA; including PreRanked test; ref. 21) to find significant pathways, gene ontology terms and transcription factor motif analysis with default parameter.

Independent dataset of LCs

To validate our novel molecular subtyping, we used gene expression and mutation data from an independent LC (n = 65) dataset. The gene expression and mutational data were downloaded from supplement data files (https://www.nature.com/articles/ncomms4518#supplementary-information) reported in ref. 9. This gene expression data were reported as transcript expression instead of gene expression. We used callapseRows (22) on transcript expression to convert to respective gene expression using MaxVariance option. Gene expression for LCs signature (top 100 variant genes from our 30 LCs dataset) was extracted from this RNAseq dataset. Unsupervised clustering and PCA analysis on this dataset were performed as previously described for our 30 LCs dataset.

DNA methylation and analysis

DNA were extracted from LC samples and interrogated for CpG DNA methylation using the Illumina 450K array platform (Illumina Inc.). For CpG DNA methylation data analysis, we used ChAMP (23) package in R with default parameters. Briefly, IDAT raw data files were imported in R and filtered to exclude samples with detection P-value <0.01 and bead count <3 in at least 5% of samples and normalized using FunctionNormalization (24). Additional filtering was performed to remove probes that have annotated SNPs, present on X and Y chromosome, and have CpG probes mapped to multiple locations. β values for 413,176 probes were used for all subsequent analysis. Subtype specific differentially methylated CpG probes (DMP) and CpG island were identified using COHCAP (default parameter; ref. 25). For DMP, we focused on probes present at TSS1500/200 and first exon for subsequent analysis. We integrated gene expression and DNA methylation to investigate subtype-specific gene expression using default parameter for COHCAP.integrate.avg.by.island with FDR P value <0.01.

IHC staining

IHC staining was performed using commercially available antibodies at optimal dilutions as follows: ASCL1 (a-MASH1; monoclonal, 1:300; BD Biosciences) and S100 (monoclonal, 1:4,000; BG). Briefly, IHC was performed by a standard protocol on Ventana Discovery XT automated stainer (Ventana Medical Systems Inc.) for the S100 antibody. Antigen retrieval was performed with Cell Conditioning 1 buffer (CC1; citrate buffer pH 6.0, Ventana). For the ASCL1 (a-MASH1) antibody, ER2 pretreatment was used and the Leica Bondmax autostainer (Leica). Immunoreactivity was performed on whole tissue sections of formalin-fixed and paraffin-embedded tissue section as well as on tissue microarray (TMA) constructed from 173 pulmonary carcinoid tumors formalin-fixed, paraffin-embedded tumor specimens were used for TMA construction. Briefly, 4 representative tumor areas were marked on H&E-stained slides, and cylindrical 0.6-mm tissue cores were arrayed from the corresponding paraffin blocks into a recipient block using an automated tissue arrayer ATA-27 (Beecher Instruments). From each TMA, 4-μm-thick paraffin sections were prepared for IHC. In all, 173 cases with adequate cores were available for IHC analysis (26). The results were semiquantitatively scored based on the percentage of reactive tumor cells (≥25% tumor cells) and the intensity of staining.

Data availability

The authors declare that all data supporting the findings of this study are available within the article and its Supplementary data and figures. Data generated in this study were deposited to NCBI under GEO SuperSeries GSE118133 (GSE118131 for RNAseq and GSE118132 for 450 K methylation). We do not impose any restrictions on data availability.

Patient cohort, clinical annotations, and mutational profile of LCs

We analyzed 30 randomly selected and histologically confirmed, well-differentiated LCs (17 TCs and 13 ACs) comprising the discovery dataset. Most specimens were from pulmonary lobectomy with lymph node detection. Tumor locations, that is, peripheral versus central (endobronchial), were assessed by combination of radiographic reveal and pathologic observations. Fifty-four percent (7/13) of ACs had either lymph node or distant metastasis, whereas 6% (1/17) of TCs had local lymph node metastasis. The 5-year disease-specific survival was 89% and 55% for TC and AC, respectively. Clinical information and features are presented in Supplementary File S1. In addition, a TMA containing 173 cases of LCs had been prepared previously (26) and used for study of clinical correlates.

We performed targeted sequencing of a 354-cancer gene panel (MSK-IMPACT; ref. 11) on 29 LCs from the discovery dataset. The mutated genes were enriched for those implicated in covalent histone modification/chromatin remodeling and found in 10 samples [MEN1 (13.8%), ARID1A (10%), KMT2A (3%), KMT2C (7%), KMT2D (3%), and SMARCA4 (3%)] reproducing the results from a previous study (9). We also found mutations in DNA repair pathways (17.2% of samples) (Supplementary File S2; Fig. 1A). Mutations were not detected in the 354-cancer gene panel for 13 LC samples. Mutations in MEN1, the most frequently mutated gene, were found in 4 samples (4 ACs) and 4 of these mutations had variant allele frequencies higher than 70% indicating LOH (Supplementary Fig. S1). One sample (Lu-Aty9) has 2 MEN1 mutations (an in-frame deletion and a missense substitution), a few bases apart on the same copy of MEN1 (Supplementary File S2). The ARID1A gene is mutated in 3 samples with LOH occurring in one of the 3 samples. Using variant allele frequencies and LOH status of MEN1 and ARID1A, we found median tumor purity to be 91% (Supplemental File S1), consistent with our pathology-based estimates (Supplementary Table S1). In addition to MEN1, other genes encoding SET1/MLL complex proteins are also found mutated [KMT2A (3%), KMT2C (7%), KMT2D (3%)]. One sample (Lu-ty4) has the highest number of mutations (9) including mutations in POLE (DNA polymerase B domain: V1016M), ROS1, FAT1, NBN, PARP1, and TERT (in-frame deletion close to Telomerase RBD). We found homozygous deletions only in the FANCA and RAD51 genes in 2 different ACs. The most recurrent CNV are single copy deletions in FANCA (17%), FAT1 (10%), MEN1 (7%), ATM (17%), SDHD (17%), and CHEK1 (17%), many of which reside on chr11q. We did not observe changes in the transcription levels of these genes with hemizygous deletions in comparison to wild-type samples. There are 18 samples (4 ACs and 14 TCs) with normal karyotype, 6 samples (4 ACs and 2 TCs) with nearly normal karyotype (aneuploid for only one or 2 different chromosomes), and 6 samples (5 ACs and 1 TCs) with aneuploidy in more than 2 different chromosomes in our dataset (Supplementary File S2). We did not find any known pathogenic germline mutations in the panel of cancer-associated genes in our samples. TP53 and RB1 genes were not mutated in this cohort, unlike high-grade lung NETs and SCLC.

Figure 1.

Three novel molecular subtypes of LCs with mutational, gene expression, and DNA CpG methylation with distinct clinical features. A, Mutated genes in LCs on a 354-cancer gene panel. Samples summary from DNA (n = 29), methylation (n = 18), RNA-seq (n = 30), carcinoids samples (atypical (n = 13; gray), and typical (n = 17; white) and specimen location [endobronchial, white (n = 9); peripheral, gray (n = 21)]. Samples are grouped according to their gene expression and DNA methylation pattern. Orange, subtype 1 (LC1); red, subtype 2 (LC2; blue, subtype 3 (LC3). Column represents sample and row represents gene name. Gene expression (n = 30) and DNA CpG methylation (n = 18) analysis revealed three LC subtypes using unsupervised clustering and principal component analysis. B, Heatmap of unsupervised clustering of top 100 variably expressed gene across all samples. C, Principal component analysis of top 3,000 variably expressed genes. D, Heatmap of unsupervised clustering of top 500 variable methylated CpG probes. E, Principal component analysis of top 3,000 variable methylated CpG probes.

Figure 1.

Three novel molecular subtypes of LCs with mutational, gene expression, and DNA CpG methylation with distinct clinical features. A, Mutated genes in LCs on a 354-cancer gene panel. Samples summary from DNA (n = 29), methylation (n = 18), RNA-seq (n = 30), carcinoids samples (atypical (n = 13; gray), and typical (n = 17; white) and specimen location [endobronchial, white (n = 9); peripheral, gray (n = 21)]. Samples are grouped according to their gene expression and DNA methylation pattern. Orange, subtype 1 (LC1); red, subtype 2 (LC2; blue, subtype 3 (LC3). Column represents sample and row represents gene name. Gene expression (n = 30) and DNA CpG methylation (n = 18) analysis revealed three LC subtypes using unsupervised clustering and principal component analysis. B, Heatmap of unsupervised clustering of top 100 variably expressed gene across all samples. C, Principal component analysis of top 3,000 variably expressed genes. D, Heatmap of unsupervised clustering of top 500 variable methylated CpG probes. E, Principal component analysis of top 3,000 variable methylated CpG probes.

Close modal

Transcriptome and methylome profiles reveal three distinct subtypes

We performed RNA-seq on 30 LCs (13 ACs and 17 TCs) and DNA methylation analysis on 18 LCs (12 of the 30 samples did not have sufficient material for analysis) from the discovery dataset. Unsupervised clustering and principal component analysis on the top 3,000 variable genes showed 3 distinct clusters (Fig. 1B and C). These clusters are robust when different numbers of top variable genes were used for clustering (Supplementary Fig. S2). We named these subtypes LC1, LC2, and LC3. Heatmap of Pearson correlation on top 3,000 variable genes shows 3 blocks representing the 3 evident subtypes (Supplementary Fig. S3). The top 100 variable genes (Supplementary Fig. S4) across all LCs show enrichment for gene ontologies related to hormonal secretions, endogenous stimulus, wound healing, and developmental processes (Supplementary Table S2). Gene expression analysis revealed greater similarity between LC2 and LC3 as compared with LC1 (Supplementary Figs. S3 and S4).

We investigated the DNA methylation profiles of LCs (n = 18), using the Illumina 450K microarray. Unsupervised clustering and PCA of the top 3,000 variable CpG sites revealed 3 distinct subtypes in complete agreement with the gene expression based subtypes (Fig. 1D and E). Consistent with gene expression, we also observed greater similarity of DNA methylation levels for LC2 and LC3 subtypes when compared with LC1. The 3 grouping of subtypes was robust and reproducible using different numbers of top variable CpG sites (Supplementary Fig. S5). Total genome-wide DNA methylation level is not different between the 3 subtypes (Supplementary Table S3).

Subtype-specific molecular characterization of LCs

We investigated gene expression and CpG DNA methylation profiles to determine subtype-specific molecular alterations (see Materials and Methods section). Genes upregulated in LC2 and LC3 as compared with LC1 are enriched for having the transcription factor motifs for HNF1 (FDR q-value <0.001) and HNF4 (FDR q-value <0.001; Fig. 2; Supplementary File 3). This is in agreement with the observed high gene expression and DNA hypomethylation of HNF1A, FOXA3, and HNF4A in LC2 and LC3 as compared with LC1 (Fig. 3A and B; Supplementary Fig. S6). In addition, many of the most highly expressed genes (APOH, GC, HAO1, G6PC, TM4SF4, PKLR, UGT2B17, CDH1, and SERPINA1/2/6) in LC2 and LC3 are targets of these hepatocyte nuclear factors (Fig. 2). Cancer hallmark gene set enrichment analysis shows complement and coagulation, xenobiotic, retinol and bile acid metabolism to be significantly upregulated in LC2 and LC3 as compared with LC1, a gene signature also found in subset of pancreatic neuroendocrine tumors (Supplementary File 3; ref. 14). However, we also identified TFs that are differentially expressed between LC2 and LC3 (FEV and POU3F4 are more highly expressed in LC2 and LC3, respectively; Fig. 2; Supplementary File S3). MEN1 gene is required for regulation of several members of the HOX gene family (27). Indeed, the LC2 subtype, which included all of the MEN1 mutant samples, has low expression of HOXB2/3/4/5/6 genes as compared with LC1 and LC3 (Supplementary Fig. S7).

Figure 2.

Heatmap of differentially expressed genes between LC subtypes. Supervised analysis on LC subtypes reveals differential expression of transcription factor and neuropeptide (some are highlighted on the left side of the heatmap). Heatmap expression level is in z-score.

Figure 2.

Heatmap of differentially expressed genes between LC subtypes. Supervised analysis on LC subtypes reveals differential expression of transcription factor and neuropeptide (some are highlighted on the left side of the heatmap). Heatmap expression level is in z-score.

Close modal
Figure 3.

Subtype-specific molecular characterization of gene expression and DNA methylation profiles. A, Heatmap of differentially methylated CpG sites (probes from TSS1500, TSS200, and first exon) of genes among the three LC subtypes. Some genes with altered gene expression and CpG sites are highlighted on the left of heatmap. Dark black line, subtype-specific blocks. B, Anticorrelation of gene expression and respective CpG island methylation (18 matched samples) for HNF1A, FOXA3, FEV, and ILRL2 across three subtypes. Each plot represents gene expression on x-axis and average CpG island β value on y-axis along with Pearson correlation (r) and P value (P) are on top of the plot.

Figure 3.

Subtype-specific molecular characterization of gene expression and DNA methylation profiles. A, Heatmap of differentially methylated CpG sites (probes from TSS1500, TSS200, and first exon) of genes among the three LC subtypes. Some genes with altered gene expression and CpG sites are highlighted on the left of heatmap. Dark black line, subtype-specific blocks. B, Anticorrelation of gene expression and respective CpG island methylation (18 matched samples) for HNF1A, FOXA3, FEV, and ILRL2 across three subtypes. Each plot represents gene expression on x-axis and average CpG island β value on y-axis along with Pearson correlation (r) and P value (P) are on top of the plot.

Close modal

We integrated subtype-specific CpG DNA methylation (see Materials and Methods section) with gene expression by focusing on CpG sites between 1,500 bps and 200 bps upstream to the transcription start site (TSS) and in the first exon, which have been shown to inversely correlate with gene expression (28). Fig. 3A shows subtype-specific differentially methylated CpG probes (DMP) and their inverse correlation with neighboring gene expression. We found 75 genes with expression to be significantly anticorrelated with respective CpG island methylation level (FDR P-value < 0.01; Supplementary File S4). HNF1A and FOXA3 are hypermethylated and low expressed in LC1. FEV, GATA2, and PROCR are hypomethylated and highly expressed in LC2. SOX1 is hypermethylated and low expressed in LC2. SIX2, ONECUT2, and IL1RL2 are hypomethylated and highly expressed in LC3 (Fig. 3B). Many of these observations suggest further mechanistic studies but there are currently no appropriate LC cell lines or animal models available.

Independent validation of LC classification

We validated our novel classification and gene expression biomarkers using published LC data from Fernandez-Cuesta and colleagues (9), which include genome/exome and RNA sequencing of 65 samples (56 TCs, 6 ACs and 3 carcinoids). Using our gene signatures derived from the top 100 most variable genes (Supplementary File S5) across LCs, we found 3 distinct subtypes using unsupervised clustering and PCA that are consistent with the subtypes identified from our data (Fig. 4A and B; Supplementary Fig. S8). Moreover, all MEN1 mutated LCs are found exclusively in LC2 (Fig. 4B). In addition, we found HNF1A and FOXA3 are more highly expressed in LC2 and LC3 as compared with LC1 whereas FEV is highly expressed only in LC2 consistent with our data (Supplementary Fig. S9).

Figure 4.

Validation of novel classification of LC on an independent collection of LCs from Fernandez-Cuesta and colleagues (9). A and B, Principal component analysis and heatmap of hierarchical clustering of gene expression of LCs from Fernandez-Cuesta and colleagues (9) using our top 100 variable gene set signature shows three distinct subtypes LC1 (orange), LC2 (red), and LC3 (blue). Black sticks represent samples with MEN1 mutations and they are all found in subtype LC2. C, Boxplot of ASCL1 and S100 gene expression from Fernandez-Cuesta and colleagues (9) is consistent with LC subtypes. Centerline, median; bounds of box, the first and third quartiles; and upper and lower whisker is defined to be 1.5 × interquartile range more than the third and first quartile.

Figure 4.

Validation of novel classification of LC on an independent collection of LCs from Fernandez-Cuesta and colleagues (9). A and B, Principal component analysis and heatmap of hierarchical clustering of gene expression of LCs from Fernandez-Cuesta and colleagues (9) using our top 100 variable gene set signature shows three distinct subtypes LC1 (orange), LC2 (red), and LC3 (blue). Black sticks represent samples with MEN1 mutations and they are all found in subtype LC2. C, Boxplot of ASCL1 and S100 gene expression from Fernandez-Cuesta and colleagues (9) is consistent with LC subtypes. Centerline, median; bounds of box, the first and third quartiles; and upper and lower whisker is defined to be 1.5 × interquartile range more than the third and first quartile.

Close modal

ASCL1 and S100 are novel biomarkers for LC subtypes

We selected genes with distinct subtype-specific expression to test for use as biomarkers. ASCL1 encodes a transcription factor that plays a role in neuronal differentiation and proliferation (29), neuroepithelial bodies formation (30), and is a lineage-specific oncogene for high-grade neuroendocrine lung cancer (31). ASCL1 is significantly highly expressed in LC1 along with its transcriptional targets (Figs. 4C and 5A; Supplementary Fig. S10). S100, a family of proteins containing 2 EF-hand calcium-binding motifs, is implicated in tumor progression and poor prognosis (32). Its gene expression levels are significantly higher in subtype LC2 (Fig. 5A). We performed IHC staining of ASCL1 and S100 to use as biomarkers. ASCL1 stained positively only for LC1 samples (n = 11) and S100 stained positively only for LC2 samples (n = 5) (Fig. 5B). Both of these genes stained negatively for LC3 samples (n = 4; Supplementary Table S4).

Figure 5.

Gene expression and immunohistochemistry for ASCL1 and S100 biomarker genes. A, Boxplot of ASCL1 and S100 gene expression. Centerline, median; bounds of box, the first and third quartiles; and top and bottom whisker is defined to be 1.5 × interquartile range more than the third and first quartile. B, IHC staining results for ASCL1 and S100 in LC samples: LC1 (n = 11), LC2 (n = 5), and LC3 (n = 4). Supplementary Table S4 has IHC results for all samples.

Figure 5.

Gene expression and immunohistochemistry for ASCL1 and S100 biomarker genes. A, Boxplot of ASCL1 and S100 gene expression. Centerline, median; bounds of box, the first and third quartiles; and top and bottom whisker is defined to be 1.5 × interquartile range more than the third and first quartile. B, IHC staining results for ASCL1 and S100 in LC samples: LC1 (n = 11), LC2 (n = 5), and LC3 (n = 4). Supplementary Table S4 has IHC results for all samples.

Close modal

Additionally, we performed ASCL1 and S100 IHC staining on a panel of 173 LCs TMA (Supplementary File S6). ASCL1 positive and S100 negative samples (n = 54) were designated LC1. ASCL1 negative and S100 positive samples (n = 15) were designated LC2. ASCL1 negative and S100 negative samples (n = 71) were designated LC3. Fifteen percent of the TMA samples stained positive for ASCL1 and S100, which are not represented in our discovery dataset.

Cell cycle and mitotic genes are highly expressed in ACs of LC1

Pathologically, ACs are more aggressive and have a higher mitotic index in comparison to TCs. To find the gene signature responsible for these features of ACs, we compared ACs (n = 13) and TCs (n = 17) from our 30 LC cohort. Surprisingly, we did not find cell cycle or mitosis-related genes to be differentially expressed. We then controlled for LC subtypes and compared ACs (n = 8) and TCs (n = 7) from subtype LC1 and found differentially expressed genes (Fig. 6) were then enriched for mitotic and cell cycle related pathways with high expression in the ACs (Supplementary File 7). Of the 8 AC tumors, the 3 with highest gene expression signature for mitotic/cell-cycle pathway have metastases or recurrences whereas only 1 of the 5 with lower gene expression signature have recurrence or metastases (Fig. 6). We also observed high aneuploidy in the ACs with high gene expression signature of mitotic/cell-cycle pathway. We performed the same analysis for ACs and TCs from LC2 and did not find any significant gene signatures, which may be due to the small sample size of LC2.

Figure 6.

Heatmap of differentially expressed genes between ACs and TCs within LC1 subtype. Upregulated genes in ACs (of LC1) are significantly enriched for genes involved in cell cycle/mitosis.

Figure 6.

Heatmap of differentially expressed genes between ACs and TCs within LC1 subtype. Upregulated genes in ACs (of LC1) are significantly enriched for genes involved in cell cycle/mitosis.

Close modal

LC subtypes have distinct clinical phenotypes

The 3 subtypes of LCs have distinct clinical phenotypes. Subtype LC1 is enriched for peripheral lung (P-value <0.003 in discovery dataset; P-value <0.002 in TMA), whereas subtype LC3 is found predominately at endobronchial lung (P-value < 0.054 in discovery dataset; P-value <3.8e−5 in TMA; Fig. 1A, box; Supplementary File S1). Subtype LC3 has significantly younger age of diagnosis (median age of 33, 44.5, and 48 years in discovery, Fernandez-Cuesta and colleagues (9), and TMA datasets, respectively) than LC1 (median age of 67, 66, and 60 years, respectively) and LC2 (median age of 62.5, 57, and 65 years, respectively; Supplementary Fig. S1A and S1B). LC1 subtype was enriched for female patients (6.5-fold, P-value < 0.007 in discovery dataset; 3.9-fold, P-value < 1.4e−5 in TMA) but not for LC2 or LC3 (Supplementary File S1).

We identified 3 novel molecular subtypes of LCs with distinct clinical features using gene expression, DNA methylation, and mutational profiles (Fig. 7). Integrative analysis of gene expression and DNA methylation identified subtype-specific transcriptional profiles of key differentiation transcription factors (ASCL1, HNF1A, FOXA3) and their downstream target genes. Mutational analysis revealed recurrent mutations in chromatin remodeling genes found in all subtypes of LCs with exception of MEN1 mutations occurring only in subtype LC2 tumors. Importantly, we found mutations in DNA repair genes in 17% of our LC samples. Subtype LC3 has younger age of diagnosis and is predominantly endobronchial, whereas subtypes LC1 are predominantly found in peripheral regions of the lung. These findings may argue that subtypes of LC potentially originate from different neuroendocrine cell lineages, although the lack of available cell-type–specific gene markers prevents a definitive validation beyond speculation at this time. Nevertheless, we believe that a more comprehensive single-cell approach can uncover lung NE cell-type–specific gene signatures and reveal the cells of origin for the LC subtypes. The younger age of diagnosis for LC3 by 15 to 20 years as compared with LC1 and LC2 may be due to earlier diagnosis from the clinically symptomatic tumors located in the central lung as compared with asymptomatic tumors at the peripheral lung or suggest a possible distinct pathogenesis predisposing to LC3, including germline mutations. However, we were not able to detect any pathogenic germline mutations in the panel of cancer-associated genes used in the MSK IMPACT testing for any of the LCs. Our classification and gene expression biomarkers were validated in 65 additional LC samples from Fernandez-Cuesta and colleagues (9). Using our subtype classification, we found gene signature for cell cycle and mitotic processes activated in ACs as compared with TCs of the LC1 subtype and those ACs with the high gene signature have worst outcome (Fig. 6; Supplementary File 7). This gene signature will need to be reproduced with a larger sample size but may potentially serve as a diagnostic and prognostic biomarker to differentiate malignant from more benign ACs from subtype LC1. This gene signature is specific to LC1 and would not have been found from comparing ACs to TCs from all LCs demonstrating the need to study distinct subtypes individually.

Figure 7.

Schematic representation of novel molecular subtypes of LCs with the principal genomic and clinical characteristics.

Figure 7.

Schematic representation of novel molecular subtypes of LCs with the principal genomic and clinical characteristics.

Close modal

Our molecular classification introduces 3 subtypes of LCs with distinct clinical phenotypes. This can refine and complement the WHO classification of LCs into typical or ACs and help diagnose ambiguous cases of LCs from the more malignant LCNEC and SCLC. In addition, the stratification of LCs into distinct molecular subtypes will help with future study of their tumorigenesis, prognosis, and treatment options.

No potential conflicts of interest were disclosed.

Conception and design: S.V. Laddha, W.D. Travis, L.H. Tang, C.S. Chan

Development of methodology: S.V. Laddha, J.T. Poirier, C.S. Chan

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S.V. Laddha, K. Robzyk, B.R. Untch, W.D. Travis, L.H. Tang

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): S.V. Laddha, B.R. Untch, H. Ke, W.D. Travis, L.H. Tang, C.S. Chan

Writing, review, and/or revision of the manuscript: S.V. Laddha, E.M. da Silva, K. Robzyk, B.R. Untch, N. Rekhtman, W.D. Travis, L.H. Tang, C.S. Chan

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): S.V. Laddha, E.M. da Silva, K. Robzyk

Study supervision: S.V. Laddha, C.S. Chan

We would like to thank Starr Cancer Consortium Grant (L.H. Tang), Raymond and Beverly Sackler Foundation (L.H. Tang), Caring for Carcinoid Foundation (L.H. Tang), Mushett Family Foundation (L.H. Tang), MSKCC Support Grant/Core Grant (P30 CA008748), and Stand Up To Cancer (C.S. Chan). This material is based upon work supported in part by the National Science Foundation under Grant No. 1546101. This research was also supported by the Biomedical Informatics shared resource of Rutgers Cancer Institute of New Jersey (P30CA072720). Computational resources were provided by the Office of Advanced Research Computing (OARC) at Rutgers, The State University of New Jersey, under the National Institutes of Health Grant (S10OD012346).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Travis
WD
,
Brambilla
E
,
Nicholson
AG
,
Yatabe
Y
,
Austin
JHM
,
Beasley
MB
, et al
The 2015 World Health Organization classification of lung tumors: impact of genetic, clinical and radiologic advances since the 2004 classification
.
J Thorac Oncol
2015
;
10
:
1243
60
.
2.
Caplin
ME
,
Baudin
E
,
Ferolla
P
,
Filosso
P
,
Garcia-Yuste
M
,
Lim
E
, et al
Pulmonary neuroendocrine (carcinoid) tumors: European Neuroendocrine Tumor Society expert consensus and recommendations for best practice for typical and atypical pulmonary carcinoids
.
Ann Oncol
2015
;
26
:
1604
20
.
3.
Travis
WD
,
Rush
W
,
Flieder
DB
,
Falk
R
,
Fleming
MV
,
Gal
AA
, et al
Survival analysis of 200 pulmonary neuroendocrine tumors with clarification of criteria for atypical carcinoid and its separation from typical carcinoid
.
Am J Surg Pathol
1998
;
22
:
934
44
.
4.
van den Bent
MJ
. 
Interobserver variation of the histopathological diagnosis in clinical trials on glioma: a clinician's perspective
.
Acta Neuropathol
2010
;
120
:
297
304
.
5.
Swarts
DR
,
van Suylen
RJ
,
den Bakker
MA
,
van Oosterhout
MF
,
Thunnissen
FB
,
Volante
M
, et al
Interobserver variability for the WHO classification of pulmonary carcinoids
.
Am J Surg Pathol
2014
;
38
:
1429
36
.
6.
Pelosi
G
,
Papotti
M
,
Rindi
G
,
Scarpa
A
. 
Unraveling tumor grading and genomic landscape in lung neuroendocrine tumors
.
Endocr Pathol
2014
;
25
:
151
64
.
7.
Volante
M
,
Gatti
G
,
Papotti
M
. 
Classification of lung neuroendocrine tumors: lights and shadows
.
Endocrine
2015
;
50
:
315
9
.
8.
Pelosi
G
,
Rodriguez
J
,
Viale
G
,
Rosai
J
. 
Typical and atypical pulmonary carcinoid tumor overdiagnosed as small-cell carcinoma on biopsy specimens: a major pitfall in the management of lung cancer patients
.
Am J Surg Pathol
2005
;
29
:
179
87
.
9.
Fernandez-Cuesta
L
,
Peifer
M
,
Lu
X
,
Sun
R
,
Ozretic
L
,
Seidal
D
, et al
Frequent mutations in chromatin-remodelling genes in pulmonary carcinoids
.
Nat Commun
2014
;
5
:
3518
.
10.
Vollbrecht
C
,
Werner
R
,
Walter
RF
,
Christoph
DC
,
Heukamp
LC
,
Peifer
M
, et al
Mutational analysis of pulmonary tumours with neuroendocrine features using targeted massive parallel sequencing: a comparison of a neglected tumour group
.
Br J Cancer
2015
;
113
:
1704
11
.
11.
Cheng
DT
,
Mitchell
TN
,
Zehir
A
,
Shah
RH
,
Benayed
R
,
Syed
A
, et al
Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology
.
J Mol Diagn
2015
;
17
:
251
64
.
12.
McKenna
A
,
Hanna
M
,
Banks
E
,
Sivachenko
A
,
Cibulskis
K
,
Kernytsky
A
, et al
The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data
.
Genome Res
2010
;
20
:
1297
303
.
13.
Cibulskis
K
,
Lawrence
MS
,
Carter
SL
,
Sivachenko
A
,
Jaffe
D
,
Sougnez
C
, et al
Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples
.
Nat Biotechnol
2013
;
31
:
213
9
.
14.
Chan
CS
,
Laddha
SV
,
Lewis
PW
,
Koletsky
MS
,
Robzyk
K
,
Da Silva
E
, et al
ATRX, DAXX or MEN1 mutant pancreatic neuroendocrine tumors are a distinct alpha-cell signature subgroup
.
Nat Commun
2018
;
9
:
4158
.
15.
Conesa
A
,
Madrigal
P
,
Tarazona
S
,
Gomez-Cabrero
D
,
Cervera
A
,
McPherson
A
, et al
A survey of best practices for RNA-seq data analysis
.
Genome Biol
2016
;
17
:
13
.
16.
Reich
M
,
Liefeld
T
,
Gould
J
,
Lerner
J
,
Tamayo
P
,
Mesirov
JP
. 
GenePattern 2.0
.
Nat Genet
2006
;
38
:
500
1
.
17.
Li
B
,
Dewey
CN
. 
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome
.
BMC Bioinformatics
2011
;
12
:
323
.
18.
Dobin
A
,
Davis
CA
,
Schlesinger
F
,
Drenkow
J
,
Zaleski
C
,
Jha
S
, et al
STAR: ultrafast universal RNA-seq aligner
.
Bioinformatics
2013
;
29
:
15
21
.
19.
Love
MI
,
Huber
W
,
Anders
S
. 
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
.
Genome Biol
2014
;
15
:
550
.
20.
Huang
da W
,
Sherman
BT
,
Lempicki
RA
. 
Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources
.
Nat Protoc
2009
;
4
:
44
57
.
21.
Subramanian
A
,
Tamayo
P
,
Mootha
VK
,
Mukherjee
S
,
Ebert
BL
,
Gillette
MA
, et al
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
.
Proc Natl Acad Sci U S A
2005
;
102
:
15545
50
.
22.
Miller
JA
,
Cai
C
,
Langfelder
P
,
Geschwind
DH
,
Kurian
SM
,
Salomon
DR
, et al
Strategies for aggregating gene expression data: the collapseRows R function
.
BMC Bioinformatics
2011
;
12
:
322
.
23.
Morris
TJ
,
Butcher
LM
,
Feber
A
,
Teschendorff
AE
,
Chakravarthy
AR
,
Wojdacz
TK
, et al
ChAMP: 450k chip analysis methylation pipeline
.
Bioinformatics
2014
;
30
:
428
30
.
24.
Fortin
JP
,
Labbe
A
,
Lemire
M
,
Zanke
BW
,
Hudson
TJ
,
Fertig
EJ
, et al
Functional normalization of 450k methylation array data improves replication in large cancer studies
.
Genome Biol
2014
;
15
:
503
.
25.
Warden
CD
,
Lee
H
,
Tompkins
JD
,
Li
X
,
Wang
C
,
Riggs
AD
, et al
COHCAP: an integrative genomic pipeline for single-nucleotide resolution DNA methylation analysis
.
Nucleic Acids Res
2013
;
41
:
e117
.
26.
Rekhtman
N
,
Pietanza
CM
,
Sabari
J
,
Montecalvo
J
,
Wang
H
,
Habeeb
O
, et al
Pulmonary large cell neuroendocrine carcinoma with adenocarcinoma-like features: napsin A expression and genomic alterations
.
Mod Pathol
2018
;
31
:
111
21
.
27.
Yokoyama
A
,
Somervaille
TC
,
Smith
KS
,
Rozenblatt-Rosen
O
,
Meyerson
M
,
Cleary
ML
. 
The menin tumor suppressor protein is an essential oncogenic cofactor for MLL-associated leukemogenesis
.
Cell
2005
;
123
:
207
18
.
28.
Brenet
F
,
Moh
M
,
Funk
P
,
Feierstein
E
,
Viale
AJ
,
Socci
ND
, et al
DNA methylation of the first exon is tightly linked to transcriptional silencing
.
PLoS One
2011
;
6
:
e14524
.
29.
Castro
DS
,
Martynoga
B
,
Parras
C
,
Ramesh
V
,
Pacary
E
,
Johnston
C
, et al
A novel function of the proneural factor Ascl1 in progenitor proliferation identified by genome-wide characterization of its targets
.
Genes Dev
2011
;
25
:
930
45
.
30.
Guha
A
,
Vasconcelos
M
,
Cai
Y
,
Yoneda
M
,
Hinds
A
,
Qian
J
, et al
Neuroepithelial body microenvironment is a niche for a distinct subset of Clara-like precursors in the developing airways
.
Proc Natl Acad Sci U S A
2012
;
109
:
12592
7
.
31.
Borromeo
MD
,
Savage
TK
,
Kollipara
RK
,
He
M
,
Augustyn
A
,
Osborne
JK
, et al
ASCL1 and NEUROD1 reveal heterogeneity in pulmonary neuroendocrine tumors and regulate distinct genetic programs
.
Cell Rep
2016
;
16
:
1259
72
.
32.
Chen
H
,
Xu
C
,
Jin
Q
,
Liu
Z
. 
S100 protein family in human cancer
.
Am J Cancer Res
2014
;
4
:
89
115
.

Supplementary data