Purpose:

Lung adenocarcinoma (LUAD) is a clinically heterogeneous disease, which is highlighted by the unpredictable recurrence in low-stage tumors and highly variable responses observed in patients treated with immunotherapies, which cannot be explained by mutational profiles. DNA methylation–based classification and understanding of microenviromental heterogeneity may allow stratification into clinically relevant molecular subtypes of LUADs.

Experimental Design:

We characterize the genome-wide DNA methylation landscape of 88 resected LUAD tumors. Exome sequencing focusing on a panel of cancer-related genes was used to genotype these adenocarcinoma samples. Bioinformatic and statistical tools, the immune cell composition, DNA methylation age (DNAm age), and DNA methylation clustering were used to identify clinically relevant subgroups.

Results:

Deconvolution of DNA methylation data identified immunologically hot and cold subsets of LUADs. In addition, concurrent factors were analyzed that could affect the immune microenvironment, such as smoking history, ethnicity, or presence of KRAS or TP53 mutations. When the DNAm age was calculated, a lower DNAm age was correlated with the presence of a set of oncogenic drivers, poor overall survival, and specific immune cell populations. Unsupervised DNA methylation clustering identified six molecular subgroups of LUAD tumors with distinct clinical and microenvironmental characteristics.

Conclusions:

Our results demonstrate that DNA methylation signatures can stratify LUAD into clinically relevant subtypes, and thus such classification of LUAD at the time of resection may lead to better methods in predicting tumor recurrence and therapy responses.

Translational Relevance

Non–small cell lung cancer has the highest number of cancer-related deaths worldwide, with lung adenocarcinoma (LUAD) being the most common type. Even if it is caught at the early stages, 30% to 40% of patients who have undergone curative resections will eventually die from recurrent disease. Understanding the heterogeneity and tumor microenvironment of early-stage LUAD tumors could be instrumental in tailoring clinical management and predicting overall outcomes. This article provides a multifaceted approach using genome-wide DNA methylation data to characterize 88 LUAD tumors. This was used to identify six subgroups with different immunologic classifications and DNA methylation age. Furthermore, we identified a subgroup that had no reported patient deaths at the time of follow-up and significantly better recurrence-free survival. We believe that these results advance our understanding of the diverse nature of this disease and provide rationale for using DNA methylation profiles to characterize subgroups within these mostly early-stage patients with LUAD.

Non–small cell lung cancer (NSCLC) is the leading cause of cancer-related deaths, with lung adenocarcinoma (LUAD) being the most common type (1, 2). This high mortality rate is due in part to a high number of cases being diagnosed at later stages (2). However, even when NSCLC is caught early, 30% to 40% of patients who have undergone curative resections die of recurrent disease (3, 4). The addition of adjuvant chemotherapy has only modest benefits on recurrence and overall survival for early-stage patients with LUAD (1, 4). Targeted inhibition of key oncogenic drivers, such as EGFR and ALK, and immune checkpoint inhibitors has greatly changed the treatment paradigm for LUAD. The adjuvant use of the tyrosine kinase inhibitor (TKI) osimertinib in resected EGFR mutant–positive stage IB–IIIA cancers resulted in significantly longer disease-free survival (5). However, TKIs are only effective for those patients with targetable drivers. Immuno-oncology (IO) therapy has shown to produce durable results for a subset of patients (2), emphasizing the great diversity within LUAD. Improved characterization of LUAD molecular subtypes and understanding of tumor immune microenvironment could improve our understanding of the molecular differences influencing clinical management, therapeutic response, and overall outcome.

Analysis of genome-wide DNA methylation has emerged as a diagnostic and prognostic assay for molecular classification of solid cancers including brain tumors (6), sarcomas (7, 8), meningiomas (9), and sinonasal carcinomas (10). DNA methylation may therefore provide additional insight into the molecular heterogeneity of LUAD not captured by mutational analysis. In support of this, previous studies have compared DNA methylation signatures to categorize clinically relevant subgroups in NSCLC, and DNA methylation has been proposed as a prognostic tool to predict early recurrence in stage I LUAD tumors (3, 11–14). However, until recently correlation of these subgroups with immune phenotypes has been challenging. With the development of the bioinformatic method MethylCIBERSORT, the immune cell composition of tumors can be determined in silico based on DNA methylation profiles (15). This method was generated on the basis of CIBERSORT (Cell type Identification By Estimating Relative Subsets Of known RNA Transcripts), which uses RNA. MethylCIBERSORT has the advantage of using more robust DNA samples for accurate estimates of tumor immune infiltrates. This method was initially applied to identify immunologically hot versus cold tumors across multiple cancers using the DNA methylation data accessible through The Cancer Genome Atlas (15); however, it has yet to be applied to a large cohort of early-stage LUAD tumors for a detailed evaluation correlating immune cell composition to subgroups identified by mutational oncogenic drivers, clinical characteristics, or DNA methylation profiles.

In addition to being able to utilize distinct methylation signatures to deconvolute the immune microenvironment, DNA methylation has been shown to change as a function of age, and analysis of DNA methyl sites can be used to calculate chronologic age of healthy tissues with a relatively small degree of error (16–18). However, for certain conditions or disease states this calculated DNA methylation age (DNAm age) has been shown to differ from chronologic age and linked to genetic aberrations, such as TP53 mutations (16–18). Intriguingly, accelerated age calculated from blood samples has been shown to be a potential biomarker for the development of NSCLC (19). However, whether differences in DNAm age of LUAD tumors correlate to the composition of the immune microenvironment and clinical subgroups has yet to be determined.

Here we perform a detailed analysis comparing the DNA methylation profiles with the clinical and genomic features of 88 resected LUAD tumors. We focused our study on resectable, lower-stage LUADs, where guidelines regarding adjuvant therapy are not well established. Using genome-wide DNA methylation analyses, we were able to subclassify LUAD tumors based on their tumor immune microenvironment and DNAm age. Using unsupervised DNA methylation clustering, we identified six DNA methylation subgroups with distinct immune cell composition and clinical outcomes for both overall and recurrence-free survival. By taking a multifaceted analysis approach to analyze DNA methylation signatures of LUAD tumors, we can identify clinically meaningful subtypes to potentially improve prognosis and long-term outcomes.

Patient samples

Genomic DNA was extracted from 27 homogenized or flash-frozen bulk LUAD surgical resections (850K Analysis Cohort, Table 1) and matched adjacent normal lung tissue or peripheral blood mononuclear cells using the Qiagen DNeasy Blood and Tissue Kit (51306). These specimens were collected at New York University Langone Hospital (New York, NY) following the approval of the Institutional Review Board, in accordance with the Declaration of Helsinki, CIOMS, Belmont Report, and U.S. Common Rule, and signed patient consent. Additional sample collection and DNA extraction used for the 450K Analysis Cohort (Table 1) have been described previously (20). The clinical characteristics for all 88 patients with LUAD are found in Table 1.

Table 1.

Clinical demographics of LUAD primary samples used for methylation sequencing analysis.

Total Cohort (n = 88)850K Analysis Cohort (n = 27)450K Analysis Cohort (n = 61)
Characteristic Number (%) Number (%) Number (%) 
Age range 
 <60 13 (14.8%) (7.4%) 11 (18.0%) 
 60–69 24 (27.3%) (29.6%) 16 (26.2%) 
 70–79 41 (46.6%) 14 (51.9%) 27 (44.3%) 
 80–89 10 (11.4%) (11.1%) (11.5%) 
History of smoking 
 Yes 72 (81.8%) 19 (70.4%) 53 (86.9%) 
 No 14 (15.9%) (29.6%) (9.8%) 
 Unknown (2.3%) (0%) (3.3%) 
Gender 
 Male 31 (35.2%) (33.3%) 22 (36.1%) 
 Female 56 (63.6%) 18 (66.6%) 38 (62.3%) 
 Unknown (1.1%) (0%) (1.6%) 
Ethnicity 
 Caucasian 73 (83.0%) 18 (66.6%) 55 (90.2%) 
 African American (4.5%) (11.1%) (1.6%) 
 Asian (8.0%) (18.5%) (3.3%) 
 Other (2.3%) (3.7%) (1.6%) 
 Unknown (2.3%) (0%) (3.3%) 
Staging 
 I 69 (78.4%) 23 (85.2%) 46 (75.4%) 
 II 13 (14.8%) (7.4%) 11 (18.0%) 
 III (4.5%) (3.7%) (4.9%) 
 IV (2.3%) (3.7%) (1.6%) 
Recurrence 
 Yes 21 (23.9%) (0%) 21 (34.4%) 
 No 35 (38.8%) (0%) 35 (57.4%) 
 N/A 32 (36.4%) 27 (100%) (8.2%) 
Overall survival 
 Alive 66 (75.0%) 27 (100%) 39 (63.9%) 
 Dead 19 (21.6%) (0%) 19 (31.1%) 
 N/A (3.4%) (0%) (4.9%) 
Total Cohort (n = 88)850K Analysis Cohort (n = 27)450K Analysis Cohort (n = 61)
Characteristic Number (%) Number (%) Number (%) 
Age range 
 <60 13 (14.8%) (7.4%) 11 (18.0%) 
 60–69 24 (27.3%) (29.6%) 16 (26.2%) 
 70–79 41 (46.6%) 14 (51.9%) 27 (44.3%) 
 80–89 10 (11.4%) (11.1%) (11.5%) 
History of smoking 
 Yes 72 (81.8%) 19 (70.4%) 53 (86.9%) 
 No 14 (15.9%) (29.6%) (9.8%) 
 Unknown (2.3%) (0%) (3.3%) 
Gender 
 Male 31 (35.2%) (33.3%) 22 (36.1%) 
 Female 56 (63.6%) 18 (66.6%) 38 (62.3%) 
 Unknown (1.1%) (0%) (1.6%) 
Ethnicity 
 Caucasian 73 (83.0%) 18 (66.6%) 55 (90.2%) 
 African American (4.5%) (11.1%) (1.6%) 
 Asian (8.0%) (18.5%) (3.3%) 
 Other (2.3%) (3.7%) (1.6%) 
 Unknown (2.3%) (0%) (3.3%) 
Staging 
 I 69 (78.4%) 23 (85.2%) 46 (75.4%) 
 II 13 (14.8%) (7.4%) 11 (18.0%) 
 III (4.5%) (3.7%) (4.9%) 
 IV (2.3%) (3.7%) (1.6%) 
Recurrence 
 Yes 21 (23.9%) (0%) 21 (34.4%) 
 No 35 (38.8%) (0%) 35 (57.4%) 
 N/A 32 (36.4%) 27 (100%) (8.2%) 
Overall survival 
 Alive 66 (75.0%) 27 (100%) 39 (63.9%) 
 Dead 19 (21.6%) (0%) 19 (31.1%) 
 N/A (3.4%) (0%) (4.9%) 

Exome sequencing and mutational analysis

Exome sequencing of the 61 LUAD cases from the 450K Analysis Cohort (Table 1) has been reported previously (20). The samples from the 850K Analysis Cohort (Table 1) were sequenced using clinically validated and FDA 510(k) cleared NYU Langone Genome PACT (LG PACT) targeting all exons of 580 cancer-implicated genes using hybrid capture on NextSeq 500 platform (Illumina, RRID:SCR_014983). Libraries consisting of 150 bp paired-end reads were demultiplexed using bcl2fastq (RRID:SCR_015058). Raw reads were checked for quality using FastQC (RRID:SCR_014583; ref. 21) followed by adapter removal and read trimming using trimmomatic (RRID:SCR_011848; ref. 22). Filtered reads were aligned to human reference genome hg19 using bwa-mem aligner (23). Aligned reads were realigned, recalibrated, and deduplicated using Genome Analysis Toolkit (GATK; RRID:SCR_012830; ref. 24). For somatic variant calling we used MuTect2 (GATK, RRID:SCR_000559), Strelka (RRID:SCR_005109; ref. 25), and LoFreqSomatic (26) callers and the subsequent variants (single-nucleotide variants and INDELs) were annotated using annovar (RRID:SCR_012821; ref. 27). The LG PACT pipeline for calling somatic variants can be found here: https://github.com/NYU-Molecular-Pathology/NGS580-nf. We then filtered for exonic variants that had a read-depth over 200, an allele frequency over 5%, and, if only called by 1 variant caller, quality over 100. From those exonic mutations, we focused our analysis on nonsynonymous mutations, stopgain mutations, frameshift insertions, and frameshift deletions. The list of genes included in each exome panel and the final filtered mutations for each cohort are found in Supplementary Data S1.

DNA methylation profiling

Whole-genome DNA methylation profiling was performed on all 88 LUAD cases. A total of 27 LUAD cases were profiled using Illumina EPIC array and combined with 61 LUAD cases previously profiled using 450K array (20, 28). Raw IDATs generated from iScan (RRID:SCR_016388) were processed and analyzed using Bioconductor R package Minfi (RRID:SCR_012830; ref. 29). Because different arrays were used, data were combined to analyze overlapping probes present on both arrays. The probes were normalized using quantile normalization and corrected for background signal. Sex probes and SNPs were removed. Samples were checked for their quality using mean detection P values, and beta values were obtained for probes with mean detection P < 0.05. Most variable probes were identified by calculating variance across probes and samples. The 10,000 most variable probes (Supplementary Data S2) were used to generate an unsupervised heatmap using ComplexHeatmap package (RRID:SCR_017270; ref. 30). Euclidean distance method was used for generating the distance matrix as it is the best method for continuous data, and ward.D2 method was used for hierarchical clustering because it minimizes the within cluster variances. Both these methods were used together to generate unsupervised heatmap that resulted in six distinct clusters. To have a clear visualization of these cluster subgroups, K-means was applied to identify each subgroup on the heatmap.

MethylCIBERSORT analysis

MethylCIBERSORT was used to calculate the stromal cell composition of the tumor microenvironment (15). Beta values obtained from raw IDATs along with the Signature genes were passed through the CIBERSORT to deconvolute the immune cell populations. PAM clustering was used to determine hot versus cold immune phenotype based on the immune populations calculated by the MethylCIBERSORT analysis as described previously (15).

IHC studies

Chromogenic IHC was performed on 10% neutral buffered formalin-fixed, paraffin-embedded, 5-μm human archival sample sections collected on plus slides (Thermo Fisher Scientific, catalog no. 22-042-924) and stored at room temperature. IHC was performed on a Ventana Medical Systems Discovery Ultra platform using unconjugated, rabbit anti-human CD8 IVD (Ventana Medical Systems, catalog no. 760-4460, lot nos. H04754, H35525, RRID AB_2335985) clone SP57 and unconjugated, mouse anti-human CD20 IVD (Ventana Medical Systems, catalog no. 760-2531, lot nos. H33787, RRID AB_ 2335956) clone L26 using Ventana's reagents and detection kits unless otherwise noted. In brief, slides were deparaffinized online. Antigen retrieval was performed for CD8 using Cell Conditioner 1 for 64 minutes and CD20 for 20 minutes, both at 95°C. Antibody was applied neat and incubated for 16 minutes at 37°C. Primary antibody was detected using standard Ultraview Detection, counterstained with hematoxylin II, dehydrated and mounted with permanent media. Positive and negative (diluent only) controls were run in parallel with study sections. All slides were reviewed and scored by a board-certified pathologist (M. Snuderl). Fraction of positive cells was established by counting 10 regions of interest (ROI) across all tumors from each DNA methylation group cluster at 400× magnification.

DNAm age analysis

DNAm age was calculated using the Horvath clock, which is a multitissue age predictor that estimates DNAm age by automatically selecting a set of 353 CpG probes by an elastic net regression model (16). Beta values for our cohort were obtained from the raw IDATs, and quantile normalization was applied. Sex probes and SNPs were not removed. The final matrix of beta values for all the samples was passed as input for Horvath clock to generate DNAm age for each sample.

Kyoto Encyclopedia of Genes and Genomes pathway analysis

To find the most enriched signaling pathways in subgroup 1, the most significant differentially methylated probes (top 10,000) between subgroup 1 and subgroups 2–6 where passed through the ClusterProfiler R (RRID:SCR_016884) package for Kyoto Encyclopedia of Genes and Genomes enrichment (31). Pathview, a pathway visualization tool (RRID:SCR_002732), was used to look at the promoter methylation levels for genes involved in PI3K-AKT signaling pathway (32).

Statistical analysis

For all analyses comparing two datasets, exact, two-tailed P values were calculated using nonparametric t (Mann–Whitney) test. For comparisons of more than two groups, the Kruskal–Wallis test was used to calculate approximate P values. Log-ranked (Mantel–Cox) tests were used to determine statistical significance between groups in all Kaplan–Meier curves. Correlation between immune cell populations and DNAm age were shown by linear regression and 95% confidence intervals.

Data availability statement

The data generated in this study are available upon request from the corresponding author.

Tumor immune composition differs based on clinical and mutational subtypes

We analyzed the whole-genome DNA methylation status of 88 primary LUADs. To generate this cohort, 27 LUADs analyzed using the Illumina EPIC (850K) methylation array were combined with 61 lung adenocarcinomas previously analyzed using the 450K methylation array (Table 1; ref. 20). MethylCIBERSORT was used to determine the fibroblast, endothelial, CD4+ effector T cells, CD8+ cytotoxic T cells, CD4+ regulatory T cells (Treg), CD56+ natural killer (NK) cells, CD19+ B cells, eosinophils, neutrophils, and CD14+ monocytes/macrophages for all 88 adenocarcinoma samples (Fig. 1A; Supplementary Data S3; ref. 15). Using the cellular composition determined by MethylCIBERSORT, PAM clustering was used to classify the tumors as immunologically hot or cold (15). The immunologically hot tumors had significantly higher CD8+ T cells (P < 0.0001), Tregs (P = 0.0004), and B cells (P < 0.0001), and significantly lower levels of eosinophils (P = 0.0004; Fig. 1B). Furthermore, we found that the hot tumors had a significantly higher CD8+ T cell to Treg ratio when compared with the cold tumors (P = 0.0122; Fig. 1C). The CD8+ to Treg ratio has been associated with the immunologically hot tumor classification and has been shown to be a favorable biomarker for multiple cancers (15, 33, 34). Interestingly, no difference was observed in the overall survival when comparing the two immunologic phenotypes (P = 0.5278; Supplementary Fig. S1A).

Figure 1.

MethylCIBERSORT analysis identified immune cell populations that differ based on smoking history, ethnicity, and genotype. A, The stromal cellular composition of each tumor identified by MethylCIBERSORT. B, Box plots comparing the immune cell populations present in the tumors characterized as immunologically hot verses cold. C, Box (top) and scatter (bottom) plots depicting the CD8+ T-cell/Treg ratio of the immunologically hot versus cold tumors. D, The eight immune cell populations were compared in patients who had a history of smoking (yes) and never smokers (no). E, Box plots comparing CD4+ T cells, CD56+ NK cells, CD14+ monocytes/macrophages, and neutrophils between the patients with (yes) or without (no) KRAS mutations. F, Patients with (yes) or without (no) TP53 mutations show significant differences in percentages of CD4+ T cells and Tregs. For all the box plots, the whiskers represent the minimum and maximum values, with all values shown. For comparison between two datasets, exact, two-tailed P values were calculated using nonparametric t (Mann–Whitney) test. For comparisons of more than two groups, the Kruskal–Wallis test was used to calculate approximate P values.

Figure 1.

MethylCIBERSORT analysis identified immune cell populations that differ based on smoking history, ethnicity, and genotype. A, The stromal cellular composition of each tumor identified by MethylCIBERSORT. B, Box plots comparing the immune cell populations present in the tumors characterized as immunologically hot verses cold. C, Box (top) and scatter (bottom) plots depicting the CD8+ T-cell/Treg ratio of the immunologically hot versus cold tumors. D, The eight immune cell populations were compared in patients who had a history of smoking (yes) and never smokers (no). E, Box plots comparing CD4+ T cells, CD56+ NK cells, CD14+ monocytes/macrophages, and neutrophils between the patients with (yes) or without (no) KRAS mutations. F, Patients with (yes) or without (no) TP53 mutations show significant differences in percentages of CD4+ T cells and Tregs. For all the box plots, the whiskers represent the minimum and maximum values, with all values shown. For comparison between two datasets, exact, two-tailed P values were calculated using nonparametric t (Mann–Whitney) test. For comparisons of more than two groups, the Kruskal–Wallis test was used to calculate approximate P values.

Close modal

When correlating immune cell composition with other clinical and molecular characteristics, patients with a history of smoking had significantly increased CD8+ T cells (P = 0.0369), Tregs (P = 0.0018), B cells (P = 0.0082), and neutrophils (P = 0.0082) and lower levels of NK cells (P = 0.0199), eosinophils (P < 0.001), and CD4+ T cells (P = 0183) compared with never smokers (Fig. 1D). Furthermore, there were significant differences in CD8+ T cells (P = 0.0408), Tregs (P = 0.0065), B cells (P = 0.0335), NK cells (P = 0.0012), and the granulocyte populations (neutrophils; P = 0.0318 and eosinophils; P = 0.0112) between the four ethnicity groups, with Caucasians having the highest levels of CD8+ T cells, Tregs, and B cells, followed by African Americans (AA), Asians (AS), and Other (Supplementary Fig. S1B). These findings suggest that ethnic background could play a factor in the immune microenvironment composition. However, due to the low sample size of the AA, AS, and Other groups, more work will need to be done to investigate this correlation. No significant differences were observed between immune cell populations based on tumor stage, sex, or recurrence frequency (Supplementary Fig. S1C–S1E).

To determine whether the immune populations differ based on the oncogenic driver mutation, DNA mutational analysis was performed on all 88 samples using clinically validated next-generation sequencing panels (Supplementary Data S1). For the 27 tumors in the 850K Analysis Cohort (Table 1), an exonic 580 gene panel (NYU Langone Genome PACT) was used to determine the mutational status. A smaller, LUAD-specific exonic 57-gene panel had been previously used to genotype the 61 tumors profiled by the 450K Analysis Cohort (Table 1; ref. 20). We focused on six known oncogenic drivers for LUAD which were present on both panels: KRAS, EGFR, TP53, STK11, KEAP1, and ATM. Of the 88-sample cohort, 29.5% had KRAS mutations as the sole oncogenic driver or with known comutations in TP53, STK11, or KEAP1 (Supplementary Fig. S2). Following KRAS, TP53 (17.0%) and ATM (9.1%) had the highest mutational frequency.

There were no observed differences in immune cell composition between the samples based on EGFR or STK11 mutational status (Supplementary Fig. S3C and S3D). Granulocytes differed on the basis of KEAP1 or ATM mutational status, with a significant decrease in eosinophils when KEAP1 was mutated (P = 0.0393) and a significant increase in neutrophils in the presence of ATM mutations (P = 0.0027; Supplementary Fig. S3E and S3F). The most striking differences in immune composition were observed when comparing tumors with or without KRAS or TP53 mutations (Fig. 1E and F; Supplementary Fig. S3A and S3B). CD4+ T cells were significantly decreased in the presence of either mutation (P = 0.0086 and P = 0.0458, respectively). The myeloid populations were also affected in KRAS-mutated tumors, with increases in CD14+ monocyte/macrophages (P = 0.0279) and neutrophils (P = 0.0084) and a decrease in NK cells (P = 0.0492). Interestingly, there was a significant increase in the populations of Tregs in TP53-mutated tumors (P = 0.0053), which has been previously reported and highlights the immunosuppressive role of TP53 mutation (35).

DNAm age is associated with overall survival, mutational status, and immune cell composition

DNA methylation strongly correlates with chronologic age, and for different disease states the DNAm age and chronologic age has been shown to differ (16–18). Stratifying LUAD into subgroups using DNAm age has yet to been examined. Using the Horvath clock (16), the DNAm age was calculated for all 88 LUAD tumors and compared with chronologic age demonstrating a significant correlation in many tumors (P = 0.0033; R = 0.3096; Fig. 2A). However, some tumors differed in chronologic and DNAm ages (Fig. 2A). Previous characterizations found that chronologic age was a significant factor in clinical outcome, with chronologically older patients having worse overall survival (36). Survival information was available for most patients in our cohort (n = 85), with an average follow-up period of 50 months. When comparing chronologic age groups, younger patients (<70 years old based on the average age of patients at the time of resection) had better overall survival compared with patients that were older at the time of resection, although these findings were not significant (P = 0.7056; Fig. 2B). Interestingly, when survival was compared on the basis of DNAm age, a trend was identified showing patients with a higher DNAm age (>75 years old taking into account the bimodal distribution of DNAm age; Supplementary Fig. S4A) had better overall survival compared with patients with a younger DNAm age (P = 0.5353; Fig. 2C). Both DNAm age and chronologic age was then compared to determine whether either ages differed between clinical characteristics and no significant difference in either age classifications were observed (Supplementary Fig. S4B and S4C). When comparing oncogenic driver mutations, tumors with STK11 (P = 0.0476), KEAP1 (P = 0.0028), and ATM (P = 0.0260) had significantly lower DNAm age (Fig. 2D). In addition, a trend in lower DNAm age was observed for tumors with TP53 mutations (P = 0.0911; Fig. 2D), similar to what has been reported previously (16). No difference in DNAm age was identified in tumors with either EGFR or KRAS mutations, and no difference in chronologic age was shown for any of the genotypes analyzed (Fig. 2D; Supplementary Fig. S5).

Figure 2.

DNAm age correlates with mutational status, tumor immune infiltrates, and overall survival. A, Chronologic age of the patient is plotted versus calculated DNAm age showing the linear regression and 95% confidence intervals (**, P = 0.0033; R = 0.3096). B and C, Kaplan–Meier curves comparing chronologically older (> 70 years) with chronologically younger (< 70 years) patients and biologically older (DNAm age > 75 years) to biologically younger (DNAm age < 75 years) tumors. No statistical significance was observed between groups (P < 0.05) using a log-ranked (Mantel–Cox) test (P = 0.7056 and P = 0.5353, respectively). D, Box plots comparing DNAm age of patients with (yes) and without (no) the presence of mutations in six known NSCLC oncogenic drivers. E, The immune cell compositions calculated by MethylCIBERSORT were compared between tumors with high DNAm age (D > 75) and low DNAm age (D < 75). For all the box plots, the whiskers represent the minimum and maximum values, with all values shown, and exact, two-tailed P values were calculated using the nonparametric t (Mann–Whitney) test. F, Positive correlation was observed between DNAm age and the percent cellular population of CD4+ T cells and CD56+ NK cells. Percentage of Tregs and CD19+ B cells in the tumor microenvironment was negatively correlated with DNAm age. Correlation shown by linear regression and 95% confidence intervals.

Figure 2.

DNAm age correlates with mutational status, tumor immune infiltrates, and overall survival. A, Chronologic age of the patient is plotted versus calculated DNAm age showing the linear regression and 95% confidence intervals (**, P = 0.0033; R = 0.3096). B and C, Kaplan–Meier curves comparing chronologically older (> 70 years) with chronologically younger (< 70 years) patients and biologically older (DNAm age > 75 years) to biologically younger (DNAm age < 75 years) tumors. No statistical significance was observed between groups (P < 0.05) using a log-ranked (Mantel–Cox) test (P = 0.7056 and P = 0.5353, respectively). D, Box plots comparing DNAm age of patients with (yes) and without (no) the presence of mutations in six known NSCLC oncogenic drivers. E, The immune cell compositions calculated by MethylCIBERSORT were compared between tumors with high DNAm age (D > 75) and low DNAm age (D < 75). For all the box plots, the whiskers represent the minimum and maximum values, with all values shown, and exact, two-tailed P values were calculated using the nonparametric t (Mann–Whitney) test. F, Positive correlation was observed between DNAm age and the percent cellular population of CD4+ T cells and CD56+ NK cells. Percentage of Tregs and CD19+ B cells in the tumor microenvironment was negatively correlated with DNAm age. Correlation shown by linear regression and 95% confidence intervals.

Close modal

We then compared the DNAm age with the immune composition determined by MethylCIBERSORT and found that patients with higher DNAm age had significantly lower levels of both Tregs (P = 0.0135) and CD19+ B cells (P = 0.0002; Fig. 2E). Furthermore, those patients also had higher levels of CD56+ NK cells (P = 0.0040), CD14+ monocytes/macrophages (P = 0.0443), and CD4+ T cells (P = 0.0234). The percent of immune cells in each tumor was then plotted as a function of DNAm age. Tregs (P = 0.0023) and CD19+ B cells (P = 0.0051) showed a significant negative correlation with DNAm age (Fig. 2F). Conversely, CD4+ T cells (P = 0.0057) and CD56+ NK cells (P = 0.0005) showed a significant positive correlation. Interestingly, the only association observed between chronologic age and an immune population was a negative correlation with CD8+ T cells (Supplementary Fig. S6A and S6B).

Unsupervised DNA methylation clustering identifies six distinct subgroups

Unsupervised clustering analysis was used to analyze DNA methylation–based subgroups of LUAD tumors, which identified six distinct molecular subgroups (Fig. 3). To better understand what stratified these subgroups, the clinical and mutational classifications of each subgroup were compared (Supplementary Fig. S7A–S7J). Subgroup 1 had the highest percentage of never smokers (45.5%) whereas all the tumors in subgroups 5 and 6 were from patients with smoking history (Supplementary Fig. S7A). The majority of tumors profiled were diagnosed as stage I (78.4%; Table 1), and 100% of subgroup 5 consisted of stage I tumors (Supplementary Fig. S7B). In contrary, subgroups 4 and 6 included more advanced stages (20% II and 20% III; 20% II and 20% IV, respectively) suggesting more aggressive phenotype of these DNA methylation molecular subtypes. The mutational statuses of oncogenic drivers were also compared across the six subtypes. Almost half of the samples in both subgroups 2 and 5 were composed of KRAS-mutant tumors (47.4% and 50%, respectively; Supplementary Fig. S7E). Interestingly, all the tumors in subgroup 6 had TP53 mutations but no mutations in the other five oncogenic drivers (Supplementary Fig. S7F). Overall, most LUAD driver mutations including EGFR, STK11, KEAP1, and ATM did not show enrichment in any of the subgroups (Supplementary Fig. S7G–S7J).

Figure 3.

Unsupervised clustering of DNA methylation identifies six subgroups. Heatmap of unsupervised K-mean clustering of 10,000 DNA methylation sites (excluding sex chromosomes) from 88 LUAD samples grouped into six different subtypes. Columns represent the individual tumor samples and rows individual CpG site. The clinical and genotypic information for each individual sample is annotated.

Figure 3.

Unsupervised clustering of DNA methylation identifies six subgroups. Heatmap of unsupervised K-mean clustering of 10,000 DNA methylation sites (excluding sex chromosomes) from 88 LUAD samples grouped into six different subtypes. Columns represent the individual tumor samples and rows individual CpG site. The clinical and genotypic information for each individual sample is annotated.

Close modal

LUAD subtypes have different tumor immune microenvironments

The estimated immune cell composition determined by MethylCIBERSORT was then used to determine whether the subgroups identified by the unsupervised clustering also differed in their tumor immune microenvironments. CD8+ T cells (P < 0.0001), Tregs (P = 0.0003), and CD19+ B cells (P < 0.0001) showed significant differences between the subgroups (Fig. 4A), with subgroup 4 having the highest percentage of CD8+ T cells and CD19+ B cells. To determine whether these subgroups could be further characterized by the immune phenotype, we compared the percentage of hot versus cold tumors in each subgroup. A total of 100% of the tumors in subgroup 4 were classified as being immunologically hot (Fig. 4B). Groups 3, 5, and 6 were composed of both hot and cold tumors to varying proportions (20%–50%). Both subgroups 1 and 2 were shown to consist of only immunologically cold tumors. Consistent with this hot versus cold characterization, subgroup 4 had a significantly higher CD8+ T cell to Treg ratio compared with groups 1, 2, 5, and 6 (Fig. 4C).

Figure 4.

Methylation subgroups have distinct immune phenotypes. A, Box plots comparing percentage of immune cell populations identified using MethyCIBERSORT between the six subgroups identified by unsupervised clustering. The Kruskal–Wallis test was used to calculate approximate P values. B, The methylation subgroups were further defined by the percent composition of immunologically hot versus cold tumors. C, Scatter (left) and box (right) plots depicting the ratio of CD8+ T cells to Tregs of the individual subgroups. Nonparametric t (Mann–Whitney) test shows significance between subgroups 1 and 2 (*, P = 0.0374), 1 and 3 (**, P = 0.0085), 1 and 4 (***, P = 0.0008), 2 and 4 (*, P = 0.0208), 2 and 6 (**, P = 0.0020), 3 and 6 (****, P < 0.0001), 4 and 5 (*, P = 0.0233), 4 and 6 (***, P = 0.0003), and 5 and 6 (*, P = 0.0303). All other comparisons did not reach significance. For all the box plots, the whiskers represent the minimum and maximum values, with all values shown. D, Bar graph comparing the IHC scoring of CD8+ T cells in subgroups 1–4. Nonparametric t (Mann–Whitney test show significance between subgroups 1 and 3 (**, P = 0.0038), 1 and 4 (***, P = 0.0006); 2 and 3 (*, P = 0.0257), and 2 and 4 (**, P = 0.0044). All other comparisons did not reach significance. E, Bar graph comparing the IHC scoring of CD20+ B cells in subgroups 1–4. Nonparametric t (Mann–Whitney test show significance between subgroups 1 and 3 (*, P = 0.0117), 1 and 4 (****, P < 0.0001); 2 and 4 (****, P < 0.0001), and 3 and 4 (****, P < 0.0001). All other comparisons did not reach significance. The error bars for both bar graphs represent the SEM.

Figure 4.

Methylation subgroups have distinct immune phenotypes. A, Box plots comparing percentage of immune cell populations identified using MethyCIBERSORT between the six subgroups identified by unsupervised clustering. The Kruskal–Wallis test was used to calculate approximate P values. B, The methylation subgroups were further defined by the percent composition of immunologically hot versus cold tumors. C, Scatter (left) and box (right) plots depicting the ratio of CD8+ T cells to Tregs of the individual subgroups. Nonparametric t (Mann–Whitney) test shows significance between subgroups 1 and 2 (*, P = 0.0374), 1 and 3 (**, P = 0.0085), 1 and 4 (***, P = 0.0008), 2 and 4 (*, P = 0.0208), 2 and 6 (**, P = 0.0020), 3 and 6 (****, P < 0.0001), 4 and 5 (*, P = 0.0233), 4 and 6 (***, P = 0.0003), and 5 and 6 (*, P = 0.0303). All other comparisons did not reach significance. For all the box plots, the whiskers represent the minimum and maximum values, with all values shown. D, Bar graph comparing the IHC scoring of CD8+ T cells in subgroups 1–4. Nonparametric t (Mann–Whitney test show significance between subgroups 1 and 3 (**, P = 0.0038), 1 and 4 (***, P = 0.0006); 2 and 3 (*, P = 0.0257), and 2 and 4 (**, P = 0.0044). All other comparisons did not reach significance. E, Bar graph comparing the IHC scoring of CD20+ B cells in subgroups 1–4. Nonparametric t (Mann–Whitney test show significance between subgroups 1 and 3 (*, P = 0.0117), 1 and 4 (****, P < 0.0001); 2 and 4 (****, P < 0.0001), and 3 and 4 (****, P < 0.0001). All other comparisons did not reach significance. The error bars for both bar graphs represent the SEM.

Close modal

To confirm the immune phenotypes of the in silico analysis, CD8+ T cells and B cells were analyzed by IHC from representative tumors from each of the four largest subgroups (subgroups 1–4). Scoring of 10 ROIs from each subgroup confirmed the in silico analysis that subgroup 1 had the least number of CD8+ T-cell and B-cell infiltrates and subgroup 4 had the highest number of each immune population (Fig. 4D and E; Supplementary Fig. S8). These findings support that DNA methylation can be used to identify immunologically hot and cold LUAD subgroups.

LUAD tumors composing DNA methylation subgroup 1 have higher DNAm age, better overall survival, and longer recurrence-free survival

To determine whether the subgroups identified by unsupervised clustering also differ based on age, the chronologic age and DNAm biological age were compared. The six subgroups had significantly different DNAm age (P = 0.0007; Fig. 5A), but no difference was observed for chronologic age (Fig. 5B). Subgroup 1 had the highest DNAm age (average 79.1 years), while subgroup 6 was shown to have the youngest DNAm age (average 48.6 years). This finding is consistent with the presence of TP53 trending with lower DNAm age, as all the patients in subgroup 6 had TP53 mutations (Fig. 2D; Supplementary Fig. S7F).

Figure 5.

DNA methylation subgroup 1 identifies patients with no instances of recurrence and greater overall survival. Box plot depicting the differences in DNAm age (A) and chronologic age (B) between the methylation subgroups. For both box plots, the whiskers represent the minimum and maximum values, with all values shown, and the Kruskal–Wallis test was used to calculate approximate P values. C, Kaplan–Meier curve of overall survival based on the time between surgical resection and either death or last follow-up. No statistical significance was observed between groups (P < 0.05) using a log-ranked (Mantel–Cox) test; however, comparison of group 1 with groups 1–5 had all had P values < 0.06 (1 vs. 2 P = 0.0582; 1 vs. 3 P = 0.0562; 1 vs. 4 P = 0.0596; 1 vs. 5 P = 0.0514). D, The percent composition of the tumors that had experienced a recurrence event in the six subtypes identified by unsupervised clustering of the 10,000 DNA methylation sites. E, Kaplan–Meier curve of recurrence-free survival based on the time between surgical resection and the diagnosis of a recurrent event (either local or systemic) or date of last follow-up. Log-ranked (Mantel–Cox) test showed statistically significant difference between subgroup 1 and 2 (*, P = 0.0272), 1 and 4 (*, P = 0.0373), and 1 and 5 (**, P = 0.0081). All other comparisons did not reach significance. NS, not significant. F, Dot plot depicting the ratio of the genes (x-axis) involved in each signaling pathway (y-axis). Size of the dots represents the gene count, and the color denotes the significance level.

Figure 5.

DNA methylation subgroup 1 identifies patients with no instances of recurrence and greater overall survival. Box plot depicting the differences in DNAm age (A) and chronologic age (B) between the methylation subgroups. For both box plots, the whiskers represent the minimum and maximum values, with all values shown, and the Kruskal–Wallis test was used to calculate approximate P values. C, Kaplan–Meier curve of overall survival based on the time between surgical resection and either death or last follow-up. No statistical significance was observed between groups (P < 0.05) using a log-ranked (Mantel–Cox) test; however, comparison of group 1 with groups 1–5 had all had P values < 0.06 (1 vs. 2 P = 0.0582; 1 vs. 3 P = 0.0562; 1 vs. 4 P = 0.0596; 1 vs. 5 P = 0.0514). D, The percent composition of the tumors that had experienced a recurrence event in the six subtypes identified by unsupervised clustering of the 10,000 DNA methylation sites. E, Kaplan–Meier curve of recurrence-free survival based on the time between surgical resection and the diagnosis of a recurrent event (either local or systemic) or date of last follow-up. Log-ranked (Mantel–Cox) test showed statistically significant difference between subgroup 1 and 2 (*, P = 0.0272), 1 and 4 (*, P = 0.0373), and 1 and 5 (**, P = 0.0081). All other comparisons did not reach significance. NS, not significant. F, Dot plot depicting the ratio of the genes (x-axis) involved in each signaling pathway (y-axis). Size of the dots represents the gene count, and the color denotes the significance level.

Close modal

Tumors with higher DNAm age were shown to have at trend toward better survival (Fig. 2C). To determine whether the DNAm ages of the subgroups correlated with clinical outcomes, overall survival and recurrence-free survival were compared between all subgroups. Subgroup 1 with the highest DNAm age had no reported patient deaths and had a trend for better overall survival when compared with subgroups 2–5 (Fig. 5C). Though significance was not reached, comparison of subgroup 1 with subgroups 2–5 all had P values < 0.06 (1 vs. 2 P = 0.0582; 1 vs. 3 P = 0.0562; 1 vs. 4 P = 0.0596; 1 vs. 5 P = 0.0514). Subgroup 6, which had the lowest DNAm age, also had no reported patient deaths; however, subgroup 6 had a small number of tumors (N = 5) and a shorter median follow-up period compared with the other groups (34.6 vs. 42.5 months) limiting our conclusions.

For the purposes of this study, recurrence-free survival was defined by having no recurrence event (local or systemic) and a minimum contact time of 36 months. When comparing only patients that did or did not have tumor recurrence, none of the patients identified in subgroup 1 presented with any type of recurrence (Fig. 5D and E), instead they had significantly better recurrence-free survival compared with three of the other groups (1 vs. 2 P = 0.0272; 1 vs. 4 P = 0.0373; 1 vs. 5 P = 0.0081). To determine whether the stage of the tumor affects our observed difference in recurrence, we looked at recurrence between methylation subgroups using only the stage I tumors. Comparing Fig. 5D (all stages) with Supplementary Fig. S9 (stage I) shows that the percentage of recurrence does not change dramatically between only stage I tumors with all stages.

In addition to the age and immune characteristics of this potential clinically relevant subgroup, a pathway analysis was performed on the most differently methylated (either hypomethylated or hypermethylated) genes to determine expression of which signaling pathways characterize the tumors in subgroup 1 compared with the other subgroups. The pathways with a P < 0.001 are listed in Supplementary Data S4. When comparing the number of genes altered in the pathway, the top differentially methylated pathways included PI3K-AKT, MAPK, RAS, and cellular senescence (Fig. 5F). Interestingly, the top pathways identified also included pathways involved in tumor invasiveness and metastasis, including endocytosis, regulation of the actin cytoskeleton, focal adhesion genes, and genes involved in the tight junction (Fig. 5F). Detailed analysis of the PI3K-Akt pathway, one of the top pathways altered in subgroup 1, revealed that both AKT and PI3K gene promoters were hypermethylated, whereas the promoter for the negative regulator of this pathway PTEN was hypomethylated (Supplementary Fig. S10). These findings suggest that multiple cellular pathways associated with aggressive disease, including a downregulation of the PI3K-Akt pathway, may be altered in subgroup 1.

LUAD is clinically heterogeneous disease. Understanding the epigenetic heterogeneity of resectable LUAD tumors could greatly improve our ability to identify high-risk patients and provide adjuvant therapy. Here we take a multifaceted approach to dissect this heterogeneity by utilizing DNA methylation and mutational profiling data to characterize and categorize 88 LUAD tumors based on their immune composition, DNAm age, whole-genome DNA methylation landscape, and oncogenic drivers. We were able to distinguish immunologically hot versus cold tumors based on in silico analysis of the immune cell populations, and verify those results by IHC. Furthermore, we found that methylation-based biological DNAm age is associated with presence of specific oncogenic mutations, overall survival, and differences in the tumor immune microenvironment. Finally, by unsupervised clustering we stratified lung adenocarcinomas into six subgroups with unique clinical and immunologic profiles.

Tumor recurrence remains a major factor in the high mortality rate of LUAD, and the addition of adjuvant chemotherapy in early-stage disease results in only modest survival benefits (37, 38). Identifying biomarkers which predict high risk of recurrence at the time of initial surgical resection would classify a subset of patients in need of more aggressive therapy and close monitoring. Furthermore, distinguishing the molecular and immunologic phenotypes of these high-risk patients could guide the use of targeted therapy or immunotherapies as adjuvant treatments. Currently, clinical trials are examining adjuvant treatment of TKIs in early stage (IB–III) EGFR-mutant LUADs. These trials show that adjuvant treatment of TKIs, with or without chemotherapy, improve disease-free survival for those patients harboring the EGFR mutation (5, 39–41). The results of these studies show how targeted adjuvant therapy may provide benefit to patients that are stratified by genotype. However, this only benefits a subset of LUADs based on their mutational status. Therefore, identifying tumors with a low or high risk of recurrence will improve our understanding of the biological mechanisms of recurrence and can guide clinical management and early targeted intervention.

Using DNA methylation, we identified LUAD subgroup 1 that did not present with any recurrence or reported deaths after an average follow-up time of 49 months. Interestingly, our data suggest that the PI3K-AKT signaling pathway was altered in subgroup 1 when compared with the other subgroups, including hypermethylation of both AKT and PI3K gene promoters and hypomethylation of PTEN. Though further research is needed to determine whether this pathway is repressed at the mRNA and protein level, this suggests that repression of this signaling pathway may facilitate the better clinical outcomes that were observed for this subgroup. Interestingly, analysis of methylation marks associated with age indicated that this subgroup also had the highest DNAm age, indicating that pathways involved in differentiation and cellular plasticity, like the PI3K-Akt pathway (42, 43), may in turn affect the DNAm age. In support of this, we also show that tumors with TP53 mutation, which is also involved in differentiation and cellular plasticity (44, 45), had a trend toward lower DNAm age. Taken together, dysregulation of cellular differentiation could contribute to the calculated DNAm age of these solid tumors and therefore play a role in the correlation between DNAm age and overall survival.

In addition to targeted treatments, ongoing clinical trials are examining the potential benefits of IO therapies after surgical resection of early-stage LUADs (4, 46). The results of these studies could change the standard of care for these patients; however, similar to late-stage use of IO therapies, there is the possibility that only a percentage of this patient population may benefit from these treatments. The composition of the immune microenvironment has been shown to be predictive of response to IO therapies (47, 48). Furthermore, the ability to use DNA methylation as a prognostic tool to analyze the immune microenvironment has multiple benefits, including the relative stable nature of DNA and the ability to use frozen samples. Here we were able to use unsupervised clustering of DNA methylation to identify subgroups that had distinct immune phenotypes, and thus could potentially have differential responses to IO therapies. Subgroup 4 consisted of LUAD tumors that were immunologically hot, with high levels of CD8+ T cells, B cells, and Tregs; whereas subgroup 2 was composed of immunologically cold tumors. Interestingly, subgroup 1 had the highest percentage of never smokers and was also immunologically cold with lowest CD8+ and Treg populations. This supports the findings that patients with no smoking history have better overall survival but poor response to IO therapy (49).

In summary, taking a multifaceted approach, we were able to stratify 88 LUAD tumors based on immunologic phenotype, DNAm age, and DNA methylation signatures using genome-wide DNA methylation data. Characterizing the molecular and immunologic heterogeneity of these LUAD tumors is pivotal to identify personalized targeted or IO-based adjuvant therapy after surgical resection.

K. Guidry reports personal fees from GlaxoSmithKline outside the submitted work. A.L. Moreira reports grants from BMS and other support from Roche outside the submitted work. H. Pass reports grants from NCI/NIH/EDRN during the conduct of the study. K.-K. Wong reports grants from Blueprint, Pfizer, Janssen, Merus, and BMS; and grants and personal fees from Zentalis and Mirati outside the submitted work; and is scientific founder and equity holder of G1 Therapeutics. No disclosures were reported by the other authors.

K. Guidry: Conceptualization, data curation, formal analysis, investigation, writing-original draft, writing–review and editing. V. Vasudevaraja: Conceptualization, formal analysis, methodology, writing–original draft. K. Labbe: Conceptualization, data curation, project administration, writing–review and editing. H. Mohamed: Investigation. J. Serrano: Formal analysis. B.W. Guidry: Formal analysis. M. DeLorenzo: Formal analysis. H. Zhang: Project administration, writing–review and editing. J. Deng: Conceptualization, writing–review and editing. S. Sahu: Data curation, writing–review and editing. C. Almonte: Resources, project administration. A.L. Moreira: Conceptualization, resources, supervision, funding acquisition, methodology, writing–review and editing. A. Tsirigos: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, visualization, methodology, writing–original draft, writing–review and editing. T. Papagiannakopoulos: Conceptualization, resources, data curation, supervision, funding acquisition, visualization, methodology, writing–review and editing. H. Pass: Conceptualization, resources, supervision, funding acquisition, writing–review and editing. M. Snuderl: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, visualization, methodology, writing–original draft, writing–review and editing. K.-K. Wong: Conceptualization, resources, data curation, supervision, funding acquisition, visualization, methodology, writing–review and editing.

We would like to thank the Genome Technology Center for expert library preparation and sequencing and the Applied Bioinformatics Laboratories for providing bioinformatics support, analysis, and interpretation of the data. This work has used computing resources at the NYU School of Medicine High Performance Computing Facility. We would also like to thank NYU Langone's Center for Biospecimen Research and Development core and the Experimental Pathology Research Laboratory for their help with the clinical samples and with the IHC analysis. The Genome Technology Center, the Applied Bioinformatics Laboratories, and the Experimental Pathology Research Laboratory are shared resources partially supported by the Cancer Center Support Grant (P30CA016087) at the Laura and Isaac Perlmutter Cancer Center. Furthermore, the Experimental Pathology Research Laboratory is partially supported by the Shared Instrumentation Grant (S10 OD021747). The DNA methylation profiling at NYU is in part supported by the Friedberg Charitable Foundation grant awarded to M. Snuderl. Additional support is provided by the NIH NCI grant awarded to H. Pass (U01CA214195).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).

1.
Duma
N
,
Santana-Davila
R
,
Molina
JR
.
Non-small cell lung cancer: epidemiology, screening, diagnosis, and treatment
.
Mayo Clin Proc
2019
;
94
:
1623
40
.
2.
Ettinger
DS
,
Wood
DE
,
Aisner
DL
,
Akerley
W
,
Bauman
J
,
Chirieac
LR
, et al
.
Non-small cell lung cancer, version 5.2017, NCCN clinical practice guidelines in oncology
.
J Natl Compr Canc Netw
2017
;
15
:
504
35
.
3.
Brock
MV
,
Hooker
CM
,
Ota-Machida
E
,
Han
Y
,
Guo
M
,
Ames
S
, et al
.
DNA methylation markers and early recurrence in stage I lung cancer
.
N Engl J Med
2008
;
358
:
1118
28
.
4.
Ortega-Franco
A
,
Calvo
V
,
Franco
F
,
Provencio
M
,
Califano
R
.
Integrating immune checkpoint inhibitors and targeted therapies in the treatment of early stage non-small cell lung cancer: a narrative review
.
Transl Lung Cancer Res
2020
;
9
:
2656
73
.
5.
Wu
YL
,
Tsuboi
M
,
He
J
,
John
T
,
Grohe
C
,
Majem
M
, et al
.
Osimertinib in resected EGFR-mutated non-small-cell lung cancer
.
N Engl J Med
2020
;
383
:
1711
23
.
6.
Capper
D
,
Jones
DTW
,
Sill
M
,
Hovestadt
V
,
Schrimpf
D
,
Sturm
D
, et al
.
DNA methylation-based classification of central nervous system tumours
.
Nature
2018
;
555
:
469
74
.
7.
Koelsche
C
,
Schrimpf
D
,
Stichel
D
,
Sill
M
,
Sahm
F
,
Reuss
DE
, et al
.
Sarcoma classification by DNA methylation profiling
.
Nat Commun
2021
;
12
:
498
.
8.
Wu
SP
,
Cooper
BT
,
Bu
F
,
Bowman
CJ
,
Killian
JK
,
Serrano
J
, et al
.
DNA methylation-based classifier for accurate molecular diagnosis of bone sarcomas
.
JCO Precis Oncol
2017
;
2017
:
PO.17.00031
.
9.
Sahm
F
,
Schrimpf
D
,
Stichel
D
,
Jones
DTW
,
Hielscher
T
,
Schefzyk
S
, et al
.
DNA methylation-based classification and grading system for meningioma: a multicentre, retrospective analysis
.
Lancet Oncol
2017
;
18
:
682
94
.
10.
Dogan
S
,
Vasudevaraja
V
,
Xu
B
,
Serrano
J
,
Ptashkin
RN
,
Jung
HJ
, et al
.
DNA methylation-based classification of sinonasal undifferentiated carcinoma
.
Mod Pathol
2019
;
32
:
1447
59
.
11.
Cancer Genome Atlas Research Network
.
Comprehensive molecular profiling of lung adenocarcinoma
.
Nature
2014
;
511
:
543
50
.
12.
Karlsson
A
,
Jonsson
M
,
Lauss
M
,
Brunnstrom
H
,
Jonsson
P
,
Borg
A
, et al
.
Genome-wide DNA methylation analysis of lung carcinoma reveals one neuroendocrine and four adenocarcinoma epitypes associated with patient outcome
.
Clin Cancer Res
2014
;
20
:
6127
40
.
13.
Sandoval
J
,
Mendez-Gonzalez
J
,
Nadal
E
,
Chen
G
,
Carmona
FJ
,
Sayols
S
, et al
.
A prognostic DNA methylation signature for stage I non-small-cell lung cancer
.
J Clin Oncol
2013
;
31
:
4140
7
.
14.
Shinjo
K
,
Okamoto
Y
,
An
B
,
Yokoyama
T
,
Takeuchi
I
,
Fujii
M
, et al
.
Integrated analysis of genetic and epigenetic alterations reveals CpG island methylator phenotype associated with distinct clinical characters of lung adenocarcinoma
.
Carcinogenesis
2012
;
33
:
1277
85
.
15.
Chakravarthy
A
,
Furness
A
,
Joshi
K
,
Ghorani
E
,
Ford
K
,
Ward
MJ
, et al
.
Pan-cancer deconvolution of tumour composition using DNA methylation
.
Nat Commun
2018
;
9
:
3220
.
16.
Horvath
S
.
DNA methylation age of human tissues and cell types
.
Genome Biol
2013
;
14
:
R115
.
17.
Zhu
T
,
Gao
Y
,
Wang
J
,
Li
X
,
Shang
S
,
Wang
Y
, et al
.
CancerClock: A DNA methylation age predictor to identify and characterize aging clock in pan-cancer
.
Front Bioeng Biotechnol
2019
;
7
:
388
.
18.
Yu
M
,
Hazelton
WD
,
Luebeck
GE
,
Grady
WM
.
Epigenetic aging: more than just a clock when it comes to cancer
.
Cancer Res
2020
;
80
:
367
74
.
19.
Levine
ME
,
Hosgood
HD
,
Chen
B
,
Absher
D
,
Assimes
T
,
Horvath
S
.
DNA methylation age of blood predicts future onset of lung cancer in the women's health initiative
.
Aging
2015
;
7
:
690
700
.
20.
Romero
R
,
Sayin
VI
,
Davidson
SM
,
Bauer
MR
,
Singh
SX
,
LeBoeuf
SE
, et al
.
Keap1 loss promotes Kras-driven lung cancer and results in dependence on glutaminolysis
.
Nat Med
2017
;
23
:
1362
8
.
21.
FASTQC
.
0.11.9: Babraham Bioinformatics
;
2010
.
22.
Bolger
AM
,
Lohse
M
,
Usadel
B
.
Trimmomatic: a flexible trimmer for Illumina sequence data
.
Bioinformatics
2014
;
30
:
2114
20
.
23.
Li
H
.
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
.
arXiv
:
1303.3997v1
:
Oxford University Press
;
2013
.
24.
McKenna
A
,
Hanna
M
,
Banks
E
,
Sivachenko
A
,
Cibulskis
K
,
Kernytsky
A
, et al
.
The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data
.
Genome Res
2010
;
20
:
1297
303
.
25.
Saunders
CT
,
Wong
WS
,
Swamy
S
,
Becq
J
,
Murray
LJ
,
Cheetham
RK
.
Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs
.
Bioinformatics
2012
;
28
:
1811
7
.
26.
Wilm
A
,
Aw
PP
,
Bertrand
D
,
Yeo
GH
,
Ong
SH
,
Wong
CH
, et al
.
LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets
.
Nucleic Acids Res
2012
;
40
:
11189
201
.
27.
Wang
K
,
Li
M
,
Hakonarson
H
.
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data
.
Nucleic Acids Res
2010
;
38
:
e164
.
28.
Serrano
J
,
Snuderl
M
.
Whole genome DNA methylation analysis of human glioblastoma using illumina beadarrays
.
Methods Mol Biol
2018
;
1741
:
31
51
.
29.
Aryee
MJ
,
Jaffe
AE
,
Corrada-Bravo
H
,
Ladd-Acosta
C
,
Feinberg
AP
,
Hansen
KD
, et al
.
Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays
.
Bioinformatics
2014
;
30
:
1363
9
.
30.
Gu
Z
,
Eils
R
,
Schlesner
M
.
Complex heatmaps reveal patterns and correlations in multidimensional genomic data
.
Bioinformatics
2016
;
32
:
2847
9
.
31.
Yu
G
,
Wang
LG
,
Han
Y
,
He
QY
.
clusterProfiler: an R package for comparing biological themes among gene clusters
.
OMICS
2012
;
16
:
284
7
.
32.
Luo
W
,
Brouwer
C
.
Pathview: an R/Bioconductor package for pathway-based data integration and visualization
.
Bioinformatics
2013
;
29
:
1830
1
.
33.
Sato
E
,
Olson
SH
,
Ahn
J
,
Bundy
B
,
Nishikawa
H
,
Qian
F
, et al
.
Intraepithelial CD8+ tumor-infiltrating lymphocytes and a high CD8+/regulatory T cell ratio are associated with favorable prognosis in ovarian cancer
.
Proc Natl Acad Sci U S A
2005
;
102
:
18538
43
.
34.
Suzuki
H
,
Chikazawa
N
,
Tasaka
T
,
Wada
J
,
Yamasaki
A
,
Kitaura
Y
, et al
.
Intratumoral CD8(+) T/FOXP3 (+) cell ratio is a predictive marker for survival in patients with colorectal cancer
.
Cancer Immunol Immunother
2010
;
59
:
653
61
.
35.
Blagih
J
,
Zani
F
,
Chakravarty
P
,
Hennequart
M
,
Pilley
S
,
Hobor
S
, et al
.
Cancer-specific loss of p53 leads to a modulation of myeloid and T cell responses
.
Cell Rep
2020
;
30
:
481
96
.
36.
Tas
F
,
Ciftci
R
,
Kilic
L
,
Karabulut
S
.
Age is a prognostic factor affecting survival in lung cancer patients
.
Oncol Lett
2013
;
6
:
1507
13
.
37.
Arriagada
R
,
Bergman
B
,
Dunant
A
,
Le Chevalier
T
,
Pignon
JP
,
Vansteenkiste
J
, et al
.
Cisplatin-based adjuvant chemotherapy in patients with completely resected non-small-cell lung cancer
.
N Engl J Med
2004
;
350
:
351
60
.
38.
Pignon
JP
,
Tribodet
H
,
Scagliotti
GV
,
Douillard
JY
,
Shepherd
FA
,
Stephens
RJ
, et al
.
Lung adjuvant cisplatin evaluation: a pooled analysis by the LACE Collaborative Group
.
J Clin Oncol
2008
;
26
:
3552
9
.
39.
Pennell
NA
,
Neal
JW
,
Chaft
JE
,
Azzoli
CG
,
Janne
PA
,
Govindan
R
, et al
.
SELECT: a phase II trial of adjuvant erlotinib in patients with resected epidermal growth factor receptor-mutant non-small-cell lung cancer
.
J Clin Oncol
2019
;
37
:
97
104
.
40.
Li
N
,
Ou
W
,
Ye
X
,
Sun
HB
,
Zhang
L
,
Fang
Q
, et al
.
Pemetrexed-carboplatin adjuvant chemotherapy with or without gefitinib in resected stage IIIA-N2 non-small cell lung cancer harbouring EGFR mutations: a randomized, phase II study
.
Ann Surg Oncol
2014
;
21
:
2091
6
.
41.
Feng
S
,
Wang
Y
,
Cai
K
,
Wu
H
,
Xiong
G
,
Wang
H
, et al
.
Randomized adjuvant chemotherapy of EGFR-mutated non-small cell lung cancer patients with or without icotinib consolidation therapy
.
PLoS One
2015
;
10
:
e0140794
.
42.
Kong
D
,
Hughes
CJ
,
Ford
HL
.
Cellular plasticity in breast cancer progression and therapy
.
Front Mol Biosci
2020
;
7
:
72
.
43.
Yu
JS
,
Cui
W
.
Proliferation, survival and metabolism: the role of PI3K/AKT/mTOR signalling in pluripotency and cell fate determination
.
Development
2016
;
143
:
3050
60
.
44.
Tschaharganeh
DF
,
Xue
W
,
Calvisi
DF
,
Evert
M
,
Michurina
TV
,
Dow
LE
, et al
.
p53-dependent Nestin regulation links tumor suppression to cellular plasticity in liver cancer
.
Cell
2014
;
158
:
579
92
.
45.
Molchadsky
A
,
Rivlin
N
,
Brosh
R
,
Rotter
V
,
Sarig
R
.
p53 is balancing development, differentiation and de-differentiation to assure cancer prevention
.
Carcinogenesis
2010
;
31
:
1501
8
.
46.
Vansteenkiste
J
,
Wauters
E
,
Reymen
B
,
Ackermann
CJ
,
Peters
S
,
De Ruysscher
D
.
Current status of immune checkpoint inhibition in early-stage NSCLC
.
Ann Oncol
2019
;
30
:
1244
53
.
47.
Havel
JJ
,
Chowell
D
,
Chan
TA
.
The evolving landscape of biomarkers for checkpoint inhibitor immunotherapy
.
Nat Rev Cancer
2019
;
19
:
133
50
.
48.
Maleki Vareki
S
.
High and low mutational burden tumors versus immunologically hot and cold tumors and response to immune checkpoint inhibitors
.
J Immunother Cancer
2018
;
6
:
157
.
49.
Li
B
,
Huang
X
,
Fu
L
.
Impact of smoking on efficacy of PD-1/PD-L1 inhibitors in non-small cell lung cancer patients: a meta-analysis
.
Onco Targets Ther
2018
;
11
:
3691
6
.

Supplementary data