Purpose: The National Lung Cancer Screening Trial has confirmed that lung cancer mortality can be reduced if tumors are diagnosed early, that is, at stage I. However, a substantial fraction of stage I lung cancer patients still develop metastatic disease within 5 years from surgery. Prognostic biomarkers are therefore needed to identify patients at risk of an adverse outcome, who might benefit from multimodality treatment.

Experimental Design: We extensively validated a 10-gene prognostic signature in a cohort of 507 lung adenocarcinoma patients using formalin-fixed paraffin-embedded samples. Furthermore, we performed an integrated analysis of gene expression, methylation, somatic mutations, copy number variations, and proteomic profiles on an independent cohort of 468 patients from The Cancer Genome Atlas (TCGA).

Results: Stage I lung cancer patients (N = 351) identified as high-risk by the 10-gene signature displayed a 4-fold increased risk of death [HR = 3.98; 95% confidence interval (CI), 1.73–9.14], with a 3-year overall survival of 84.2% (95% CI, 78.7–89.7) compared with 95.6% (92.4–98.8) in low-risk patients. The analysis of TCGA cohort revealed that the 10-gene signature identifies a subgroup of stage I lung adenocarcinomas displaying distinct molecular characteristics and associated with aggressive behavior and poor outcome.

Conclusions: We validated a 10-gene prognostic signature capable of identifying a molecular subtype of stage I lung adenocarcinoma with characteristics remarkably similar to those of advanced lung cancer. We propose that our signature might aid the identification of stage I patients who would benefit from multimodality treatment. Clin Cancer Res; 23(1); 62–72. ©2016 AACR.

This article is featured in Highlights of This Issue, p. 1

Translational Relevance

We validated a 10-gene prognostic signature capable of identifying an aggressive molecular subtype of stage I lung adenocarcinoma, with genetic characteristics remarkably similar to advanced lung cancer. Although the overall prognostic outcome of stage I NSCLC is favorable, up to 40% of these patients will eventually relapse with metastatic disease. This issue of high-risk stage I patients is becoming increasingly relevant to the clinical management of non–small cell lung cancer (NSCLC) as our ability to detect the disease in its early phases improves. Risk stratification tools may guide clinical decision making in patients with stage I lung cancer at high risk of disease recurrence, who would be eligible for multimodality treatment. In this study, we moved toward the clinical application of this signature by (i) testing it in large independent cohorts of NSCLC patients; (ii) assessing the robustness of its prognostic power across different technological platforms; (iii) reducing it into practice in a closer-to-the-clinic setting, represented by FFPE specimens.

Lung cancer is the primary cause of cancer-related death worldwide (1). Survival of patients with non–small cell lung cancer (NSCLC), the predominant type of lung cancer, accounting for approximately 85% of all lung cancer cases, largely depends on tumor stage at diagnosis; only approximately 15% of all patients with advanced disease (stage III–IV) are alive after 5 years, while survival increases to approximately 60% in patients diagnosed with stage I disease (1). Thus, efforts have been devoted to the development of strategies for early lung cancer detection. In particular, annual low-dose CT (LDCT) screening in high-risk individuals (>55 years and smokers, >30 pack/year) was shown to be effective in diagnostic anticipation, resulting in a reduction in mortality (2–5).

As our ability to detect NSCLC in its early stage improves, the issue of the clinical management of stage I patients is becoming increasingly relevant. As of today, a sizable fraction of stage I NSCLC patients (up to ∼40%) develops disease recurrence within 5 years from surgery. Stage I NSCLC is treated preferentially by surgery, as the benefit of adjuvant chemotherapy in these patients remains controversial (6–9). However, prognostic biomarkers could change this scenario by allowing the stratification of stage I patients according to risk of disease recurrence and the selection of those patients who might benefit from multimodality treatment.

We previously described a 10-gene signature able to predict prognosis of patients with stage I lung adenocarcinoma, the major subtype of NSCLC (10). Subsequently, other prognostic gene expression signatures in lung cancer have been proposed, although few of them were validated in large and independent cohorts of patients (11, 12). We now developed and optimized a PCR-based method for assessing our 10-gene signature in formalin-fixed paraffin-embedded (FFPE) tissue samples. Using this optimized protocol, we validated the prognostic accuracy of the 10-gene signature in an independent [obtained from a different hospital with respect to that of our original report (10)] and large cohort of 507 lung adenocarcinoma patients, including 351 stage I lung patients (AJCC, 7th edition). Furthermore, we performed an integrated analysis of gene expression, methylation, somatic mutations, copy number variations (CNV), and proteomic profiles on a second independent cohort of 468 lung tumors profiled by The Cancer Genome Atlas (TCGA; ref. 13) consortium. This latter “multiomics” analysis revealed that the 10-gene signature identifies an aggressive subtype of stage I lung adenocarcinoma with molecular characteristics similar to more advanced cancers. Notably, we found a prevalence of KEAP1/Nrf2 pathway alterations, which is particularly relevant in light of ongoing efforts aimed at the identification of small-molecule inhibitors targeting Nrf2 (14, 15).

Study population and data sources

IEO stage I cohort.

Between July 1998 and December 2012, 425 patients with stage I (IA and IB, according to the 7th edition of the TNM Classification of Malignant Tumors) adenocarcinoma of the lung underwent surgery at the European Institute of Oncology (IEO, Milan, Italy). None of these patients received preoperative chemotherapy or had been previously diagnosed with cancer. Clinical information was obtained through review of medical records. Vital status was assessed through the Vital Records Offices of the patients' towns of residence or by contacting directly the patients or their families. Total mRNA was successfully extracted from 359 FFPE blocks. To avoid any possible bias in the estimate of patients' survival due to the inclusion of perioperative deaths, 3 patients who died within 30 days of resection were excluded from the analysis. Five samples with spurious qRT-PCR results, due to poor quality mRNA, were also excluded from analysis. In total, therefore, specimens from 351 patients with stage I lung adenocarcinoma were analyzed and divided into a training (N = 189) and a validation (N = 162) set. The clinical–pathologic information of the 351 patients is reported in Table 1. 

Table 1.

Patient and tumor characteristics

Training set (stage I)Validation seta (stage I)Advanced cancer setb (stage II–III)
N 189 162 156 
Age at surgery (years) 
 Median (Q1;Q3) 65 (59;71) 65 (59;71) 65 (60;70) 
 Min–max 42–84 42–83 39–81 
Gender (female) 48 (25.4%) 59 (36.4%) 43 (27.6%) 
Smoking history 
 Current/former 144 (76.2%) 126 (77.8%) 128 (82.1%) 
 Never 21 (11.1%) 23 (14.2%) 15 (9.6%) 
 Unknown 24 (12.7%) 13 (8.0%) 13 (8.3%) 
Stage 
 I 189 162 
  IA 90 (47.6%) 78 (48.1%) — 
  IB 99 (52.4%) 84 (51.9%) — 
 II 105 (67.3%) 
  IIA — — 64 (41.0%) 
  IIB — — 41 (26.3%) 
 III 51 (32.7%) 
  IIIA — — 47 (30.1%) 
  IIIB — — 4 (2.6%) 
Follow-up 
 Deaths within 3 years 19 (10.1%) 15 (9.3%) 51 (32.7%) 
 Survivors length of follow-up 
  <1 yr 4 (2.1%) 6 (3.7%) 10 (6.4%) 
  1–2 yrs 7 (4.3%) 5 (3.2%) 
  2–3 yrs 1 (0.5%) 30 (18.5%) 5 (3.2%) 
  >3 yrs 165 (87.3%) 104 (64.2%) 85 (54.5%) 
 Total pt-yrs 527 424 357 
Training set (stage I)Validation seta (stage I)Advanced cancer setb (stage II–III)
N 189 162 156 
Age at surgery (years) 
 Median (Q1;Q3) 65 (59;71) 65 (59;71) 65 (60;70) 
 Min–max 42–84 42–83 39–81 
Gender (female) 48 (25.4%) 59 (36.4%) 43 (27.6%) 
Smoking history 
 Current/former 144 (76.2%) 126 (77.8%) 128 (82.1%) 
 Never 21 (11.1%) 23 (14.2%) 15 (9.6%) 
 Unknown 24 (12.7%) 13 (8.0%) 13 (8.3%) 
Stage 
 I 189 162 
  IA 90 (47.6%) 78 (48.1%) — 
  IB 99 (52.4%) 84 (51.9%) — 
 II 105 (67.3%) 
  IIA — — 64 (41.0%) 
  IIB — — 41 (26.3%) 
 III 51 (32.7%) 
  IIIA — — 47 (30.1%) 
  IIIB — — 4 (2.6%) 
Follow-up 
 Deaths within 3 years 19 (10.1%) 15 (9.3%) 51 (32.7%) 
 Survivors length of follow-up 
  <1 yr 4 (2.1%) 6 (3.7%) 10 (6.4%) 
  1–2 yrs 7 (4.3%) 5 (3.2%) 
  2–3 yrs 1 (0.5%) 30 (18.5%) 5 (3.2%) 
  >3 yrs 165 (87.3%) 104 (64.2%) 85 (54.5%) 
 Total pt-yrs 527 424 357 

aComparison of distributions between training and validation sets. Wilcoxon test for age: P = 0.69. Fisher exact test for gender: P = 0.03; smoking history: P = 0.29.

bComparison of distributions between training and advanced cancer sets. Wilcoxon test for age: P = 0.80. Fisher exact test for gender: P = 0.71; smoking history: P = 0.36.

IEO stage II–III cohort.

We randomly identified a second cohort of 156 patients, the “advanced cancer set,” from the cohort of patients with stage II and III adenocarcinomas, operated at IEO during the same period as the patients of the stage I cohort, and using the same inclusion criteria.

RNA extraction, qRT-PCR, and optimization of the protocol in FFPE samples

We initially compared the efficiency of detection of the target genes of interest in RNA extracted from fresh-frozen and FFPE lung specimens, by comparing paired FFPE and fresh-frozen tumor and normal samples (3 each) by qRT-PCR analysis (Supplementary Fig. S1A–S1D). Up to 5 tissue sections (5 μm thick) were prepared from each specimen according to tumor size and/or adequate tumor cellularity (>60%). Total RNA was extracted from fresh-frozen tissues using TRIzol, followed by RNeasy Mini Spin Columns (RNeasy FFPE Kit, Qiagen) and from FFPE tissues using the AllPrep DNA/RNA FFPE Kit (Qiagen). Total RNA (200 ng), measured using the NanoDrop ND-1000 Spectrophotometer, was reverse transcribed (SuperScript VILO cDNA Synthesis Kit, Thermo Fisher Scientific) and preamplified using the PreAmp Master Mix Kit (Thermo Fisher Scientific) for 10 cycles, following the manufacturer's instructions. The resulting cDNA was then diluted 1:10 prior to PCR analysis (2 ng of cDNA/reaction). In comparison, 1.0 μg of total RNA extracted from fresh-frozen and FFPE samples was reverse transcribed and directly analyzed by quantitative PCR without preamplification (5 ng of cDNA/reaction for fresh-frozen samples and 20 ng/reaction for FFPE samples). qRT-PCR was performed using hydrolysis probe (Thermo Fisher Scientific) with the SsoAdvanced Universal Probes Supermix (Bio-Rad Laboratories) in a LightCycler 480 real-time PCR instrument (Roche). Thermal cycling amplification was performed with an initial incubation at 95°C for 30 seconds, followed by 45 cycles of 95°C for 5 seconds and 60°C for 30 seconds. For the design of custom TaqMan assays (listed in Supplementary Table S1), we employed Primer Express Software V3.0 (Thermo Scientific). For accurate sample normalization, we included the 3 reference genes (TBP, HPRT1, and GUSB), used for the identification of the original 10-gene signature (10) from fresh-frozen samples, and one additional reference gene (ESD).

For the expression analysis of the 10-gene signature in the retrospective NSCLC study cohorts (IEO-stage I and IEO-stage II–III), total RNA was extracted from one tissue core (1.5 mm in diameter) taken from FFPE blocks in representative tumor areas with adequate tumor cellularity (>60%), selected by a pathologist. We employed the AllPrep DNA/RNA FFPE Kit automated on QIAcube, following the manufacturer's instructions (Qiagen).

We also assessed the effect of long-term storage of FFPE samples on RNA quality, using the cohort of lung cancer patients profiled in this study. The qRT-PCR analysis revealed a trend toward an increase of Cq values as a function of the age of the FFPE samples, which is consistent with augmented degradation of RNA, as described previously (16). With the exception of TBP gene, which seemed not to be affected by time of storage, all the other reference genes shared a similar profile that was maintained in almost all the genes of the 10-gene signature (Supplementary Fig. S1E–S1F). For this reason, the TBP gene was replaced by ESD as reference gene, together with GUSB and HRPT1, for the assessment of the prognostic significance of the 10-gene signature.

We defined Cq = 35 as our limit of detection; Cq values above this limit were set to 35 for further calculations. For each gene in each sample, the expression level was measured in triplicate, and the average Cq was calculated (the average was calculated from triplicate values when the SD of replicates was <0.4 and from the best duplicate values when SD was ≥0.4). Data (average Cq) were normalized using the 3 genes (HPRT1, ESD, GUSB) as reference. The normalized Cq|$( {{C_{\rm{q}}}_{_{{\rm{normalized}}}}} )$| of each target gene (E2F1, E2F4, HOXB7, HSPG2, MCM6, NUDCD1, RRM2, SCGB3A1, SERPINB5, SF3B1) was calculated using the following formula:

formula

where SF is the difference between a constant reference value K and the average Cq value of the reference genes; K represents the mean of the average Cq of the 3 reference genes calculated across all samples (K = 23.66). This normalization strategy allows the retention of information about the abundance of the original transcripts, as measured by PCR (i.e., in CT scale), which is conversely lost when using the more classical ΔCt method. Normalized data were then processed for statistical analysis.

Survival analysis

Cox regression model to calculate a continuous risk score.

The ridge-penalized Cox regression model was implemented on the training set considering the normalized gene expression of the 10 genes in the prognostic signature as continuous covariates with log-linear effect. Cross-validated (10-fold) log-likelihood (CVL) with optimization of the tuning penalty parameter was applied. Tuning of the penalty parameter was repeated 500 times using a different folding at each simulation, and the model associated with the highest CVL was selected (17–19).

A continuous risk score was assigned to each patient based on:

formula

where i is the summation index for E2F1, E2F4, HOXB7, HSPG2, MCM6, NUDCD1, RRM2, SCGB3A1, SERPINB5, and SF3B1; β is the ridge-penalized Cox model coefficient for each target gene; and average |${C_{\rm{q}}}_{_{{\rm{normalized}}}}$| is the normalized average Cq for each target gene.

Minimum and maximum risk scores from the training set were used to scale risk scores in a 0 to 1 range. The median of scaled risk scores from the training set was used as a cutoff to categorize patients into high- and low-risk groups (Supplementary Tables S2 and S3).

Sample size calculation for stage II–III cohort

The sample size was calculated to detect a minimum of 15% difference in the 3-year overall survival for high- and low-risk groups (6–9). We applied the formula proposed by Schoenfeld (20) assuming 80% power, one-sided 5% significance level, 50% of cases assigned to the high-risk group, and a 3-year survival for stage II–III lung adenocarcinomas of 55% to 65% (based on our previous experience). We assumed a 20% drop in the total sample size due to inadequate tissue blocks, poor quality mRNA, and exclusion of perioperative deaths. Two hundred patients were selected and, after tissue blocks and qRT-PCR data quality control and exclusion of perioperative deaths, a total of 156 patients were finally included in the analysis. Clinical and pathologic characteristics of these 156 patients and of the original cohort are shown in Supplementary Table S4.

In silico analysis of the TCGA dataset

Database.

We downloaded genomics data for a cohort of 468 patients with lung adenocarcinoma available in the TCGA data portal (http://cancergenome.nih.gov; see Supplementary Table S5 for patient information). In addition, for further validation of the 10-gene signature, we downloaded gene expression data for a cohort of 442 patients with lung adenocarcinoma available in GEO database (DCC dataset; GSE68465; http://www.ncbi.nlm.nih.gov/geo/). Overall survival was defined as the time from the date of tumor resection until death from any cause and estimated by the Kaplan–Meier method. Follow-up was truncated at 3 years to reduce the potential overestimation of overall mortality with respect to lung cancer–specific mortality.

Gene expression analysis.

We used “reads per kilobase of transcript per million reads mapped” (RPKM) data from the TCGA patient cohort or MAS5 signal (as it was originally performed by authors) from the DCC patient cohort to perform hierarchical clustering analysis. Data were log2 transformed and median centered. To reduce the complexity of the two datasets and extract the most informative data, we analyzed the SD of gene expression data across samples and selected genes belonging to the upper 25% (SD > Q3) of the SD distribution for a total of 5,126 genes of the 20,501 annotated genes (TCGA; IDs are available in Supplementary Table S6A) and 4,238 genes of the 13,513 annotated genes (DCC). We then used the significance analysis of microarray (SAM) to identify differentially expressed genes (delta = 1.03578 and fudge factor = 0.03554, TCGA; delta = 0.76071 and fudge factor = 0.11973, DCC; FDR (q-value) < 0.05 was used as cutoff). The uncentered correlation and the centroid linkage methods were used to cluster gene expression data in Cluster 3.0 for Mac OS X (http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm). Hierarchical clustering structures and heatmaps were generated using Java TreeView 1.1.1 (http://jtreeview.sourceforge.net). JMP 10 (SAS) generated Kaplan–Meier survival plots based on clustering analysis together with calculation of P values based on the log-rank test. The Ingenuity Pathway Analysis (IPA) and Upstream Regulator Analysis were performed using the online available web tool (http://www.ingenuity.com/).

Somatic mutation analysis.

The BI Mutation Calling Level 2 DNA-seq data available for 445 of 468 patients (TCGA) were used in our analysis. Silent mutations and redundant mutations, due to replicate sequencing, were discarded. The gene mutation rate (“MutGene rate”) of patients in every stage/cluster was computed as follows:

formula

where Mi is the number of mutated genes in stage/cluster i; Pi is the total number of patients in each stage/cluster i.

The number of mutated genes in each stage/cluster was counted using a binary annotation: for each patient–gene pair, a score of 1 was assigned when one or more mutations were present and 0 in the absence of any mutations. See Supplementary Table S5 for patient–cluster correspondence.

Frequency of mutation of each gene was calculated by dividing the total number of patients found with a given mutated gene by the total number of patients.

CNV analysis

We used the cBIO data portal (21) to download SNP array data (GISTIC-preanalyzed data; http://gdac.broadinstitute.org), which were available for 466 of 468 patients (TCGA). We used the CGDS-R package to retrieve GISTIC calls of CNVs and annotated the dataset with tumor stage and cluster information (see Supplementary Table S5 for patient–cluster correspondence). We computed a CNV rate considering amplifications (GISTIC value = 2) and deletions (GISTIC value = −2) together, as a general index for CNVs, and then calculated:

formula

where VSi is the number of altered genes (amplified/deleted) in stage/cluster i; Pi is the total number of patients in stage/cluster i.

Methylation analysis

Illumina Infinium HumanMethylation450 BeadChip data were available for 402 of 468 samples (TCGA). Patients with unknown tumor stage (13 of 402) were removed from further analyses. Quality control, preprocessing, and normalization were performed using RnBeads R package (http://rnbeads.mpi-inf.mpg.de). One sample was marked as “unreliable” during quality control step and removed; the final dataset resulted in 388 samples that were used in our analysis (Supplementary Table S5). β values, representing the percentage of methylation for each CpG site in a sample, were extracted for promoter regions. To reduce the complexity of the dataset and extract the most informative data, we first analyzed the SD of β values calculated across samples. We then selected the most informative (i.e., those with the highest variability) 7,134 promoters (out of 28,534 unique promoters; IDs are available in Supplementary Table S6B), belonging to the upper 25% (SD >Q3) of the SD distribution. SAM was then used to identify differentially methylated genes using this set of 7,134 promoter regions (delta value = 1.19456; fudge factor = 0.02526; FDR (q-value) < 0.05 was used as cutoff).

Hierarchical clustering analysis was performed using Gene Cluster 3.0 for Mac OSX (http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm). Methylation data were clustered using the uncentered correlation and centroid linkage methods. Tree pictures were generated using Java TreeView software (http://jtreeview.sourceforge.net).

Protein expression analysis

Reverse-phase protein array data relative to 190 proteins (Supplementary Table S6C) were available for 234 of 468 patients (TCGA). Data were downloaded from The Cancer Protein Atlas (22). SAM was used to identify the differentially expressed proteins (delta value = 1.01291; fudge factor = 0.02792; FDR (q-value) < 0.05 was used as cutoff). Normalized protein expression data (level 3) were used in clustering and statistical analyses. Hierarchical clustering analysis was performed using Gene Cluster 3.0 for Mac OSX (http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm). Expression data were clustered using uncentered correlation and centroid linkage methods. Tree pictures were generated using Java TreeView software (http://jtreeview.sourceforge.net).

Optimization of qRT-PCR conditions for the analysis of the 10-gene signature in FFPE samples

The 10-gene prognostic signature was originally derived using RNA extracted from frozen lung tissue specimens from lung cancer patients (10). To apply this signature to FFPE samples, we first optimized the procedure for measuring target gene expression using lower quality, partially degraded, RNA, as is typically obtained from FFPE samples. First, we designed qRT-PCR probes and primers targeting short regions of each transcript (<100 bp), preferably in the 3′UTR region of the gene (Supplementary Table S1). We also introduced a gene-specific preamplification reaction, which allowed us to increase the number of qRT-PCR reactions we could perform on each sample and to improve the signal-to-noise ratio.

We then profiled paired fresh-frozen and FFPE samples (3 normal lung and 3 lung tumors) to compare the gene detection efficiency under the two conditions. We observed that the average Cq in FFPE samples without the preamplification step was approximately 5 Cq higher than that of the matching fresh-frozen samples. This is equivalent to approximately 32 times fewer copies of detected mRNAs in the FFPE samples compared with the fresh-frozen samples (Supplementary Fig. S1A). Importantly, adding the preamplification step before the qRT-PCR in the FFPE samples resulted in a decrease of approximately 7-8 Cq (Supplementary Fig. S1B) and improved the correlation between FFPE and fresh-frozen samples (0.63 vs. 0.43; R2 of bivariate linear fit; P < 0.0001; Supplementary Fig. S1C–S1D).

Validation of the 10-gene signature in a cohort of FFPE lung adenocarcinomas

We applied the optimized protocol to profile a consecutive cohort of 351 patients with stage I lung adenocarcinoma and 156 patients with more advanced lung adenocarcinoma (stage II–III, i.e., the “IEO cohort”; Fig. 1A; Table 1). Overall survival in the entire group of 507 patients was dependent, as expected, on tumor stage, and no differences were observed between stage IA and IB patients (Fig. 1B).

Figure 1.

The 10-gene prognostic model. A, Overall study design. B, Kaplan–Meier survival curves of all IEO lung cancer patients included in the study stratified by stage (IA, IB, II, III); IEO cohort stage I plus stage II–III (N = 507). C, Kaplan–Meier survival curves of all IEO patients with stage I disease in the validation set stratified using the 10-gene model (N = 162). D, Kaplan–Meier survival curves of all IEO patients with stage I disease (i.e., the training set plus the validation set, N = 351) stratified using the 10-gene model. E, Kaplan–Meier survival curves of the “advanced cancer set” (N = 156) of IEO patients with stage II–III lung stratified using the 10-gene model. B–E, Numbers of patients at risk at 0, 12, 24, and 36 months are reported underneath the graphs. Log-rank test P values are reported.

Figure 1.

The 10-gene prognostic model. A, Overall study design. B, Kaplan–Meier survival curves of all IEO lung cancer patients included in the study stratified by stage (IA, IB, II, III); IEO cohort stage I plus stage II–III (N = 507). C, Kaplan–Meier survival curves of all IEO patients with stage I disease in the validation set stratified using the 10-gene model (N = 162). D, Kaplan–Meier survival curves of all IEO patients with stage I disease (i.e., the training set plus the validation set, N = 351) stratified using the 10-gene model. E, Kaplan–Meier survival curves of the “advanced cancer set” (N = 156) of IEO patients with stage II–III lung stratified using the 10-gene model. B–E, Numbers of patients at risk at 0, 12, 24, and 36 months are reported underneath the graphs. Log-rank test P values are reported.

Close modal

Stage I patients were divided into training (N = 189) and validation (N = 162) sets. The two sets were balanced for overall survival (P = 0.94; Supplementary Fig. S2A). Using the training set, we fitted a ridge-penalized Cox regression model to generate a continuous risk score from the expression data of the 10-gene signature that was then assigned to each patient (Supplementary Table S2). Patients were classified at high- or low-risk of death if their risk scores were above, or below, the median risk (i.e., the cutoff; Supplementary Table S3) calculated in the training set (Supplementary Fig. S2B and S2C).

In the validation set, 83 patients were classified at high risk and 79 patients at low risk according to the 10-gene signature (Table 2). The 3-year overall survival was 84.0% [95% confidence interval (CI), 76.6–91.4] for patients at high risk and 95.6% (91.4–99.8) for patients at low risk (Fig. 1C). The risk of death was significantly higher for high-risk compared with low-risk patients both in univariate and multivariate analyses (HR = 4.04; 95% CI, 1.14–14.31; P = 0.03; univariate, HR = 4.20; 1.18–14.99; P = 0.03, multivariate; Table 2). Of note, the risk model improved when the “10-gene risk” was added to traditional clinical and pathologic parameters (age, sex, smoking status, and tumor stage; P = 0.01, nested likelihood ratio test; Table 2). No significant improvement was observed in model fit adding clinical and pathologic parameters to the 10-gene signature (P = 0.43, nested likelihood ratio test; Table 2).

Table 2.

Cox proportional hazards models in the stage I validation set (N = 162; N deaths = 15)

UnivariateMultivariate without risk classaMultivariate with risk classa
NN deathsHR (95% CI)PbHR (95% CI)PbHR (95% CI)Pb
10-gene risk high (vs. low) 83 12 4.04 (1.14–14.31) 0.03 —  4.20 (1.18–14.99) 0.03 
Age ≥65 (vs. <65) 86 10 1.78 (0.61–5.22) 0.29 1.72 (0.57–5.14) 0.33 1.96 (0.66–5.87) 0.23 
Male (vs. female) 103 11 1.54 (0.49–4.84) 0.46 1.24 (0.38–4.07) 0.72 1.18 (0.37–3.81) 0.78 
Never smoker (vs. smoker/unknown) 23 0.46 (0.06–3.50) 0.45 0.44 (0.05–3.54) 0.44 0.49 (0.06–3.93) 0.50 
Stage IB (vs. IA) 84 10 1.95 (0.67–5.71) 0.22 1.76 (0.59–5.28) 0.31 1.83 (0.62–5.42) 0.27 
UnivariateMultivariate without risk classaMultivariate with risk classa
NN deathsHR (95% CI)PbHR (95% CI)PbHR (95% CI)Pb
10-gene risk high (vs. low) 83 12 4.04 (1.14–14.31) 0.03 —  4.20 (1.18–14.99) 0.03 
Age ≥65 (vs. <65) 86 10 1.78 (0.61–5.22) 0.29 1.72 (0.57–5.14) 0.33 1.96 (0.66–5.87) 0.23 
Male (vs. female) 103 11 1.54 (0.49–4.84) 0.46 1.24 (0.38–4.07) 0.72 1.18 (0.37–3.81) 0.78 
Never smoker (vs. smoker/unknown) 23 0.46 (0.06–3.50) 0.45 0.44 (0.05–3.54) 0.44 0.49 (0.06–3.93) 0.50 
Stage IB (vs. IA) 84 10 1.95 (0.67–5.71) 0.22 1.76 (0.59–5.28) 0.31 1.83 (0.62–5.42) 0.27 

aNested likelihood ratio test to compare the model with 10-gene risk class added to the model with clinical and pathologic parameters: P = 0.01; nested likelihood ratio test to compare the model with clinical and pathologic parameters added to the model with 10-gene risk class: P = 0.43.

bWald test P value.

To test the performance of the 10-gene signature in a larger cohort, we decided to pool together training and validation sets. We are aware of the possible bias related to present resubstitution statistics (11); however, as the performance of the signature was very similar in the training and validation sets (training set: HR = 3.93; 1.31–11.85; P = 0.02; validation set: HR = 4.04; 1.14–14.31; P = 0.03), we assumed that the possible bias introduced by the pooling procedure was trivial. When both sets were considered together (N = 351), 178 patients were found to be at high risk (27 deaths) and 173 at low risk (7 deaths) according to the 10-gene signature, with a 3-year overall survival of 95.6% (92.4–98.8) in low-risk patients compared with 84.2% (78.7–89.7) in high-risk patients (Fig. 1D). Of note, high-risk patients diagnosed with a stage IA tumor had an approximately 4-fold increased risk of death, similarly to patients diagnosed with a stage IB tumor (stage IA: HR = 4.04; 1.11–14.66; P = 0.03; stage IB: HR = 3.83; 1.29–11.39; P = 0.02). Importantly, the 10-gene signature remained prognostic also when we considered the disease-specific mortality (cause of death available for 23 of 34, 68%, see Supplementary Table S7): HR = 4.39; 1.48–13.06; P = 0.01.

Finally, we assessed whether the 10-gene model was prognostic in more advanced lung tumors (stage II–III; 156 patients). We observed an increase in survival at 3 years of more than 7% for low-risk compared with high-risk patients, although the prognostic statistical significance of the 10-gene signature could not be confirmed (HR = 1.40; 0.80–2.45; P = 0.23; Fig. 1E).

In silico analysis of the 10-gene signature using TCGA data

We probed into the molecular characteristics of another large and independent cohort of lung cancer patients using the 10-gene signature (see Supplementary Table S5 for patient information). We took advantage of the TCGA (13) lung cancer database containing whole -omic data (i.e., transcriptional, mutational, CNV, DNA methylation, and proteomic profiles) of 468 lung adenocarcinoma patients (247 stage I; Fig. 2). As these tumors were profiled by RNA sequencing (RNA-seq) and not by qRT-PCR, we could not apply directly our risk model to the TCGA data due to the different dynamic ranges of the two methodologies. However, we performed a hierarchical clustering analysis of RNA-seq data corresponding to the 10-gene signature, which highlighted four main gene expression clusters (Fig. 2A). Of these, cluster C1 displayed a significant increase in the expression of 8 of the 9 genes whose upregulation in the 10-gene signature associates with poor prognosis (10) and a decrease in the expression of SCGB3A1, which is downregulated in the 10-gene signature (Supplementary Fig. S2D–S2E; ref. 10). Consistently, survival analysis revealed that patients (all stages) in cluster C1 displayed a more adverse prognosis (35% overall survival at 3 years vs. ∼70% in the other clusters; Fig. 2B). Results were comparable when we restricted the analysis to stage I lung cancer patients (N = 247; C1 patients: 55% overall survival at 3 years vs. ∼85% in the other clusters; Fig. 2B) or when we used another independent cohort of lung adenocarcinoma profiled by Affymetrix [N = 442; the DCC dataset (23); Supplementary Fig. S3A], despite the absence of the expression profile of two genes of the 10-gene signature (SCGB3A1 and NUDCD1) that were not present in the Affymetrix Chip (HG-U133A). These results further confirmed the prognostic significance of the 10-gene signature in independent cohorts of lung cancer patients, screened by RNA-seq or Affymetrix, both for stage I and stage II–IV disease.

Figure 2.

Multiomics analysis of the 468 TCGA lung adenocarcinoma patients stratified by the 10-gene signature. A, Hierarchical cluster analysis of the entire cohort of 468 patients using RNA-seq data corresponding to the 10 genes of the signature. Yellow, increased expression; blue, decreased expression. The four main identified clusters are indicated by a color code (red, cluster 1-C1; green, cluster 2-C2; orange, cluster 3-C3; blue, cluster 4-C4). The “survival” bar below the heatmap indicates death events in black, while the “stage” bar indicates stage I disease in gray and more advanced stage disease (stages II–IV) in violet. B, Kaplan–Meier survival curves of all patients in the TCGA cohort (left), or limited to stage I (N = 247) or to stage II–IV (N = 221) lung cancer patients (middle and right, respectively), using the “Cluster IDs” as grouping parameters defined by the hierarchical clustering analysis as described in A. P values were computed by using the log-rank test. C, Mutational analysis of the 18 most frequently mutated genes in lung adenocarcinoma in all the TCGA patients with available information (445 patients, left) or limited to stage I lung cancer patients (238 patients, right). Columns represent the patients ordered by the hierarchical clustering analysis and labeled with cluster IDs as in A. Rows represent the mutational profile of each of the 18 genes across patients. The overall gene mutation rate is reported for each cluster (top bar, “MutGene rate”). Frequency of mutation of each gene is reported to the right of the diagram. Smoking status and gender of patients are shown by colored bars in the lower part of the diagram. D, MutGene rate and CNV rate in lung cancer patients stratified by stage. Stage I patients were subdivided into those belonging to C1 (I-C1; red) or not (I-other; violet). P values were computed using the exact Poisson test. E, Kaplan–Meier survival curves of patients grouped as in D. P values were computed using the log-rank test.

Figure 2.

Multiomics analysis of the 468 TCGA lung adenocarcinoma patients stratified by the 10-gene signature. A, Hierarchical cluster analysis of the entire cohort of 468 patients using RNA-seq data corresponding to the 10 genes of the signature. Yellow, increased expression; blue, decreased expression. The four main identified clusters are indicated by a color code (red, cluster 1-C1; green, cluster 2-C2; orange, cluster 3-C3; blue, cluster 4-C4). The “survival” bar below the heatmap indicates death events in black, while the “stage” bar indicates stage I disease in gray and more advanced stage disease (stages II–IV) in violet. B, Kaplan–Meier survival curves of all patients in the TCGA cohort (left), or limited to stage I (N = 247) or to stage II–IV (N = 221) lung cancer patients (middle and right, respectively), using the “Cluster IDs” as grouping parameters defined by the hierarchical clustering analysis as described in A. P values were computed by using the log-rank test. C, Mutational analysis of the 18 most frequently mutated genes in lung adenocarcinoma in all the TCGA patients with available information (445 patients, left) or limited to stage I lung cancer patients (238 patients, right). Columns represent the patients ordered by the hierarchical clustering analysis and labeled with cluster IDs as in A. Rows represent the mutational profile of each of the 18 genes across patients. The overall gene mutation rate is reported for each cluster (top bar, “MutGene rate”). Frequency of mutation of each gene is reported to the right of the diagram. Smoking status and gender of patients are shown by colored bars in the lower part of the diagram. D, MutGene rate and CNV rate in lung cancer patients stratified by stage. Stage I patients were subdivided into those belonging to C1 (I-C1; red) or not (I-other; violet). P values were computed using the exact Poisson test. E, Kaplan–Meier survival curves of patients grouped as in D. P values were computed using the log-rank test.

Close modal

Next, we analyzed, in the four identified clusters, the mutational profile of 18 genes, previously identified as frequently mutated in lung adenocarcinoma (13). We observed an increased mutation rate in cluster C1, which was independent to smoking status (Supplementary Table S8; Supplementary Fig. S2F), both when considering all stages or stage I alone (Fig. 2C). Importantly, stage I tumors clustered in C1 (C1-stage I) had a mutation rate comparable to more advanced stage tumors (stage II–IV) and a CNV rate higher than all other groups (Fig. 2D). Consistently, prognosis of C1-stage I patients was similar to that of patients with more advanced lung cancer (P = 0.10; Fig. 2E). These results argue that C1-stage I tumors, identified using the 10-gene signature, represent a distinct subtype of stage I tumors that shares genetic and clinical characteristics with more advanced lung tumors. This notion is further supported by the large difference in gene expression (2, 349 genes, q < 0.05), DNA methylation (1,199 promoters, q < 0.05), and protein (25 proteins, q < 0.05) expression profiles observed when we compared C1-stage I tumors with other stage I tumors (Fig. 3A–C; Supplementary Table S6A–S6C). Conversely, the difference was smaller when C1-stage I tumors were compared with more advanced lung cancer (Supplementary Fig. S4).

Finally, we explored the mechanisms that could potentially explain the enhanced aggressiveness of C1-stage I tumors by analyzing the “C1-stage I” gene expression signature (2,349 genes) with the Ingenuity (IPA) Upstream Regulator Analysis, which predicts the activation/inhibition of specific modulators of gene expression (e.g., transcription factors, miRNAs). We identified 183 potential upstream modulators predicted to be regulated in C1-stage I tumors (|z-score| > 2; see Supplementary Table S9). These modulators display activity on the majority of genes present in the C1-stage I signature (N = 1,568, 66.8%), which span a multitude of cancer-relevant mechanisms (Supplementary Table S10).

The top hit in the list of modulators was the redox-sensitive transcription factor, nuclear factor (erythroid-derived 2)-like 2, also known as NFE2L2 or Nrf2, (z-score = 8.4; Supplementary Table S9). Nrf2 is involved in the oxidative stress response through the transcriptional activation of genes encoding antioxidants, xenobiotic detoxification enzymes, and drug efflux pumps (Supplementary Fig. S5; ref. 24). Nuclear accumulation of Nrf2 and the transcriptional activation of target genes have been observed in lung cancer, concomitantly with the decreased activity of the Kelch-like ECH-associated protein 1 [KEAP1 (25)]. Indeed, KEAP1 functions as an adaptor protein for the Cul3-based E3 ligase, which, in turn, targets NRF2 to 26S proteasomal degradation (26, 27). Importantly and consistently, we found KEAP1 among the “inhibited” modulators in the TCGA dataset by the IPA (z-score = −2; Fig. 3D; Supplementary Table S9), and also when we used the DCC dataset (z-score = −1.7; Supplementary Table S11; Supplementary Fig. S6A). We also observed a 2-fold increase in the mutation frequency of KEAP1 in C1 patients compared with other non-C1 patients (27% vs. 13%, all stages; 25% vs. 13%, stage I; 28% vs. 12%, stage II–IV; Fig. 3E; Supplementary Table S12), while no differences were detected when comparing copy number or methylation status (Supplementary Table S12). These mutations span the BTB–BACK domains and Kelch motif of KEAP1 (Fig. 3F), which are important regions for protein–protein interactions and can decrease KEAP1-inhibitory activity on Nrf2 (25, 28). Indeed, we observed a significant upregulation of NRF2 target genes in stage I tumors with mutated KEAP1 gene (Supplementary Fig. S6B). Finally, high-risk patients in the IEO cohort displayed significant upregulation of the KEAP1/NRF2–regulated genes in the TCGA dataset (Supplementary Fig. S6C; Supplementary Table S14).

Figure 3.

Multiomics analysis of C1-stage I patients and of the other stage I lung adenocarcinoma patients of the TCGA cohort. A, Hierarchical cluster analysis using the 2,349 gene list?found to be transcriptionally regulated by SAM analysis (q < 0.05). Yellow, increased expression; blue, decreased expression. B, Hierarchical clustering analysis using the methylation profile of 1,199 promoters found to be regulated by SAM analysis (q < 0.05). β values, percentage of methylation for each CpG site in a sample. Red, increased methylation; blue, decreased methylation. C, Hierarchical cluster analysis using the 25 proteins found regulated by SAM analysis (q < 0.05). Red,?increased protein expression; light blue, decreased protein expression. In clusters, C1-stage I patients were labeled in red (I-C1), whereas the other stage I patients?are?labeled in violet (I-other). A–C, Smoking status and gender of patients are shown by colored bars in the lower part of the diagrams. NA, not available. D, Gene network of KEAP1-regulated genes present in the “C1-stage I” gene expression signature (2,349 genes; q < 0.05). Blue, IPA-predicted inhibited upstream modulator, KEAP1. Lines connect the modulator to direct targets, and orange color indicates consistency between the predicted activity and the expression change observed in C1-stage I versus other-stage I patients (i.e., target expression). Numbers indicate expression change (log2) of targets in C1-stage I patients versus other-stage I patients. E, Mutation frequency of KEAP1 in lung adenocarcinoma patients by tumor stage. Patients were subdivided in those belonging to C1 (C1) or not (other). P values were computed using the χ2 test for difference in proportion. F, Mutation plot of the KEAP1 protein in lung adenocarcinoma patients. Protein domains are indicated by colored sections in the cartoon. BTB, the BTB domain (Broad-Complex, Tramtrack, and Bric-a-brac); BACK, the BTB/Kelch-associated domain; K1, the Kelch repeat type 1 motif. Lines with colored ellipses indicate the positions of mutations in the gene, whereas the length of the lines indicates the number of mutations (# mutations) observed in the same position across tumor samples. Green, missense mutations; red, nonsense, indel, or splice site mutations.

Figure 3.

Multiomics analysis of C1-stage I patients and of the other stage I lung adenocarcinoma patients of the TCGA cohort. A, Hierarchical cluster analysis using the 2,349 gene list?found to be transcriptionally regulated by SAM analysis (q < 0.05). Yellow, increased expression; blue, decreased expression. B, Hierarchical clustering analysis using the methylation profile of 1,199 promoters found to be regulated by SAM analysis (q < 0.05). β values, percentage of methylation for each CpG site in a sample. Red, increased methylation; blue, decreased methylation. C, Hierarchical cluster analysis using the 25 proteins found regulated by SAM analysis (q < 0.05). Red,?increased protein expression; light blue, decreased protein expression. In clusters, C1-stage I patients were labeled in red (I-C1), whereas the other stage I patients?are?labeled in violet (I-other). A–C, Smoking status and gender of patients are shown by colored bars in the lower part of the diagrams. NA, not available. D, Gene network of KEAP1-regulated genes present in the “C1-stage I” gene expression signature (2,349 genes; q < 0.05). Blue, IPA-predicted inhibited upstream modulator, KEAP1. Lines connect the modulator to direct targets, and orange color indicates consistency between the predicted activity and the expression change observed in C1-stage I versus other-stage I patients (i.e., target expression). Numbers indicate expression change (log2) of targets in C1-stage I patients versus other-stage I patients. E, Mutation frequency of KEAP1 in lung adenocarcinoma patients by tumor stage. Patients were subdivided in those belonging to C1 (C1) or not (other). P values were computed using the χ2 test for difference in proportion. F, Mutation plot of the KEAP1 protein in lung adenocarcinoma patients. Protein domains are indicated by colored sections in the cartoon. BTB, the BTB domain (Broad-Complex, Tramtrack, and Bric-a-brac); BACK, the BTB/Kelch-associated domain; K1, the Kelch repeat type 1 motif. Lines with colored ellipses indicate the positions of mutations in the gene, whereas the length of the lines indicates the number of mutations (# mutations) observed in the same position across tumor samples. Green, missense mutations; red, nonsense, indel, or splice site mutations.

Close modal

Although the overall prognostic outcome of stage I NSCLC is favorable, up to 40% of these patients will eventually relapse with metastatic disease. This issue of high-risk stage I patients is becoming increasingly relevant to the clinical management of NSCLC as our ability to detect the disease in its early phases improves (29). Risk stratification tools may guide clinical decision making in patients with stage I lung cancer at high risk of disease recurrence, who would be eligible for multimodality treatment. However, this approach is hampered by the high inter- and intratumoral genetic heterogeneity of lung cancer (13, 30, 31). Genomics studies might help in identifying molecular subtypes of stage I NSCLC associated with poor prognosis, as exemplified by our previous discovery of a 10-gene prognostic signature (10).

In this study, we moved toward the clinical application of this signature by (i) testing it in large independent cohorts of NSCLC patients; (ii) assessing the robustness of its prognostic power across different technological platforms; and (iii) reducing it into practice in a closer-to-the-clinic setting, represented by FFPE specimens. Combining our current and past results, the 10-gene signature has now been analyzed in 1,487 lung adenocarcinoma patients from three different case collections comprising patients from Italy or from other countries, and across different platforms (507 by qRT-PCR on FFPE, 468 by RNA-seq, and 442 by Affymetrix in the current work, and 70 by RT-qPCR on fresh frozen in ref. 10). In all instances, the signature identified poor prognosis stage I patients. Although the strength of our results may be somehow limited by the lack of a multicentric validation of the 10-gene risk model, it is worth noting that the 10-gene signature was capable of identifying groups of patients with an adverse prognosis in independent cohorts of patients, analyzed with different screening platforms (RNA-seq and Affymetrix). Eventually, however, a final assessment of clinical utility of this 10-gene signature will only derive from randomized prospective studies to evaluate mortality reduction in patients identified at high risk by our 10-gene signature treated with multimodality therapy.

The availability of TCGA multiomics dataset allowed further molecular insights into the subgroup of stage I patients with adverse prognosis (the C1 cluster) identified by the 10-gene signature. In particular, it allowed the identification of a distinct mutational, gene expression, DNA methylation, and protein expression profiles arguing that they represent a molecular subgroup of stage I tumors. Notably, we identified in stage I-C1 tumors a 2-fold increase in the mutation rate of the KEAP1 gene versus non-C1 stage I tumors. KEAP1 is a known suppressor of the Nrf2 transcription factor, which is involved in the transcriptional activation of genes encoding antioxidants, xenobiotic detoxification enzymes, and drug efflux pumps, which protect normal cells from oxidative stress (32, 33). In cancer, genetic alterations of KEAP1 lead to its inactivation and, consequently, to the nuclear accumulation of Nrf2, which was shown to contribute to oxidative stress- and chemotherapy resistance (34–36). The potential implications of our findings are 2-fold. First, stage I lung tumors classified as high risk by the 10-gene signature might be (or might become) more resistant to microenvironmental stresses, such as reactive oxidative species, produced during hypoxia, a frequently occurring condition in cancer tissues (37–39). Second, patients with more advanced cancer classified as high risk could become resistant to adjuvant chemotherapy (platin-based). Thus, alternative therapies, such as molecularly targeted ones, might be more appropriate for these patients. This latter possibility is particularly interesting in light of current efforts aimed at the identification of small-molecule inhibitors targeting Nrf2 (14, 15).

Recently, two other signatures were proposed for the stratification of early-stage NSCLC (12, 40, 41). The characteristics of these signatures, in comparison with the 10-gene signature, are reported in Table 3. Although a direct comparison of the performances of the three signatures is currently impossible, we took an initial step in this direction, testing the ability of the two published signatures to stratify the tumors in the TGCA dataset. We found that unlike our 10-gene signature, which was able to identify stage I NSCLC patients with an adverse prognosis in this dataset, the other two signatures failed to do so (Supplementary Fig. S3B and S3C). This finding should not be taken at this stage as an indication of the superiority of our signature, but rather as evidence that additional work is needed to compare the various prognostic models in the same cohorts of patients, possibly with the intent of integrating them toward higher accuracy and prognostic power.

Table 3.

Comparison of published signatures with the 10-gene signature

11-gene signature Kratz, 2012 (40)CCP score Wistuba, 2013 (12)CCP score Bueno, 2015 (41)10-gene signature current study
N genes 11a 31b 31b 10c 
N reference genes 15 15 
FFPE cohort Kaiserd CCTC MDACC IEO BWH RIE IEO 
Period of surgery 1998–2005 2000–2008 1997–2008 1999–2005   1998–2012 
N NSCLC 420 967 207 174 474 176 507 
N stage I NSCLC 420 471 163 174 451 89 351 
Signature risk classes Low-, intermediate, high-risk Low-, intermediate, high-risk Low-, high-risk Low-, high-risk Low-, high-risk Low-, high-risk Low-, high-risk 
Signature performance for stage I 5-year OS Low: 71.4% Intermediate: 58.3% High: 49.2% 5-year OS Low: 83.0% Intermediate: 67.7% High: 64.6% 5-year LCS Low: 90% High: 79% 5-year LCS Low: 90% High: 75% 5-year LCS Low: 82% High: 72% 3-year OS Low: 95.6% High: 84.2% 5-yeare OS Low: 88.1% High: 77.6% 
 HR High vs. low 2.04 (1.28–3.26) HR High vs. lowf 2.37 (1.63–3.43) HR High vs. lowg 1.92 (1.18–3.10) HR High vs. low 1.56 (1.12–2.18) HR High vs. low 3.98 (1.73–9.14) HR High vs. low 2.34 (1.30–4.22) 
 Intermediate vs. low 1.66 (1.00–2.74) Intermediate vs. lowf 1.60 (1.03–2.49)      
11-gene signature Kratz, 2012 (40)CCP score Wistuba, 2013 (12)CCP score Bueno, 2015 (41)10-gene signature current study
N genes 11a 31b 31b 10c 
N reference genes 15 15 
FFPE cohort Kaiserd CCTC MDACC IEO BWH RIE IEO 
Period of surgery 1998–2005 2000–2008 1997–2008 1999–2005   1998–2012 
N NSCLC 420 967 207 174 474 176 507 
N stage I NSCLC 420 471 163 174 451 89 351 
Signature risk classes Low-, intermediate, high-risk Low-, intermediate, high-risk Low-, high-risk Low-, high-risk Low-, high-risk Low-, high-risk Low-, high-risk 
Signature performance for stage I 5-year OS Low: 71.4% Intermediate: 58.3% High: 49.2% 5-year OS Low: 83.0% Intermediate: 67.7% High: 64.6% 5-year LCS Low: 90% High: 79% 5-year LCS Low: 90% High: 75% 5-year LCS Low: 82% High: 72% 3-year OS Low: 95.6% High: 84.2% 5-yeare OS Low: 88.1% High: 77.6% 
 HR High vs. low 2.04 (1.28–3.26) HR High vs. lowf 2.37 (1.63–3.43) HR High vs. lowg 1.92 (1.18–3.10) HR High vs. low 1.56 (1.12–2.18) HR High vs. low 3.98 (1.73–9.14) HR High vs. low 2.34 (1.30–4.22) 
 Intermediate vs. low 1.66 (1.00–2.74) Intermediate vs. lowf 1.60 (1.03–2.49)      

Abbreviations: BWH, Brigham and Women's Hospital; CCTC, China Clinical Trials Consortium; LCS, lung cancer survival; MDACC, The University of Texas MD Anderson Cancer Center; OS, overall survival; RIE, Royal Infirmary of Edinburgh.

aBAG1, BRCA1, CDC6, CDK2AP1, ERBB3, FUT3, IL11, LCK, RND3, SH3BGR, WNT3A.

bASF1B, ASPM, BIRC5, BUB1B, CDC2, CDC20, CDCA3, CDCA8, CDKN3, CENPF, CENPM, CEP55, DLGAP5, DTL, FOXM1, KIAA0101, KIF11, KIF20A, MCM10, NUSAP1, ORC6L, PBK, PLK1, PRC1, PTTG1, RAD51, RAD54L, RRM2, SKA1, TK1, TOP2A.

cE2F1, E2F4, HOXB7, HSPG2, MCM6, NUDC1, RRM2, SCGB3A1, SERPINB5, SF3B1.

dKaiser Permanente Northern California system.

eFollow-up extended to 5 years for direct comparison with the other studies.

fMultivariable model on the complete cohort (including stage I–III tumors) with adjustment for stage.

gMultivariable model on the complete cohort (including stage I–II tumors) with adjustment for stage.

This latter possibility is particularly important from the perspective of applying current stratification models to the design of clinical trials for adjuvant chemotherapy in early-stage NSCLC. In such trials, it will be important to limit the number of “good prognosis” patients undergoing unnecessary treatment by carefully establishing the cutoff used to classify patients at high risk. This decision should take into account the trade-off between the numbers of overtreated and undertreated patients (Supplementary Fig. S2C) and might be aided by the integration of multiple prognostic models.

F. Bianchi and P.P. Di Fiore are listed as co-inventors of a patent, which is owned by Instituto FIRC di Oncologia Molecolare (IFOM), related to methods of diagnosis and prognosis of cancer, including lung cancer (NSCLC). No potential conflicts of interest were disclosed by the other authors.

The study funders had no role in the design of the study, the collection, analysis, and interpretation of the data, the writing of the manuscript, and the decision to submit the manuscript for publication.

Conception and design: E. Dama, G. Bertalot, G. Viale, L. Spaggiari, F. Bianchi, P.P. Di Fiore

Development of methodology: E. Dama, M. Vecchi, F. Bianchi

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): E. Dama, S. Pirroni, R.M. Carletti, D. Brambilla, G. Bertalot, M. Casiraghi, P. Maisonneuve, M. Barberis, G. Viale, M. Vecchi, L. Spaggiari, F. Bianchi

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): E. Dama, V. Melocchi, F. Dezi, P. Maisonneuve, F. Bianchi

Writing, review, and/or revision of the manuscript: E. Dama, V. Melocchi, F. Dezi, P. Maisonneuve, G. Viale, M. Vecchi, L. Spaggiari, F. Bianchi, P.P. Di Fiore

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): E. Dama, V. Melocchi, S. Pirroni, R.M. Carletti, M. Barberis, F. Bianchi

Study supervision: L. Spaggiari, F. Bianchi, P.P. Di Fiore

We are indebted to Francesca de Santis, Flavia Troglio, and Giovanna Jodice for technical support. We also thank the Molecular Pathology Unit of the Molecular Medicine Program at IEO and the Division of Thoracic Surgery and the Division of Pathology at IEO.

This work was supported by grants from AIRC (Associazione Italiana per la Ricerca sul Cancro, MCO 10.000 to P.P. Di Fiore, and MFAG17568 to F. Bianchi) and from the Monzino Foundation (to P.P. Di Fiore).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Siegel
RL
,
Miller
KD
,
Jemal
A
. 
Cancer statistics, 2015
.
CA Cancer J Clin
2015
;
65
:
5
29
.
2.
Aberle
DR
,
Adams
AM
,
Berg
CD
,
Black
WC
,
Clapp
JD
,
Fagerstrom
RM
, et al
Reduced lung-cancer mortality with low-dose computed tomographic screening
.
N Engl J Med
2011
;
365
:
395
409
.
3.
Henschke
CI
,
Boffetta
P
,
Yankelevitz
DF
,
Altorki
N
. 
Computed tomography screening: The International Early Lung Cancer Action Program Experience
.
Thorac Surg Clin
2015
;
25
:
129
43
.
4.
Henschke
CI
,
McCauley
DI
,
Yankelevitz
DF
,
Naidich
DP
,
McGuinness
G
,
Miettinen
OS
, et al
Early Lung Cancer Action Project: overall design and findings from baseline screening
.
Lancet
1999
;
354
:
99
105
.
5.
International Early Lung Cancer Action Program Investigators
,
Henschke
CI
,
Yankelevitz
DF
,
Libby
DM
,
Pasmantier
MW
,
Smith
JP
, et al
Survival of patients with stage I lung cancer detected on CT screening
.
N Engl J Med
2006
;
355
:
1763
71
.
6.
Kato
H
,
Ichinose
Y
,
Ohta
M
,
Hata
E
,
Tsubota
N
,
Tada
H
, et al
A randomized trial of adjuvant chemotherapy with uracil-tegafur for adenocarcinoma of the lung
.
N Engl J Med
2004
;
350
:
1713
21
.
7.
Strauss
GM
,
Herndon
JE
 II
,
Maddaus
MA
,
Johnstone
DW
,
Johnson
EA
,
Harpole
DH
, et al
Adjuvant paclitaxel plus carboplatin compared with observation in stage IB non-small-cell lung cancer: CALGB 9633 with the Cancer and Leukemia Group B, Radiation Therapy Oncology Group, and North Central Cancer Treatment Group Study Groups
.
J Clin Oncol
2008
;
26
:
5043
51
.
8.
West
H
. 
Individualizing adjuvant therapy for early stage non-small cell lung cancer: we see the destination, but we don't yet know the route
.
J Thorac Dis
2015
;
7
:
235
7
.
9.
Winton
T
,
Livingston
R
,
Johnson
D
,
Rigas
J
,
Johnston
M
,
Butts
C
, et al
Vinorelbine plus cisplatin vs. observation in resected non-small-cell lung cancer
.
N Engl J Med
2005
;
352
:
2589
97
.
10.
Bianchi
F
,
Nuciforo
P
,
Vecchi
M
,
Bernard
L
,
Tizzoni
L
,
Marchetti
A
, et al
Survival prediction of stage I lung adenocarcinomas by expression of 10 genes
.
J Clin Invest
2007
;
117
:
3436
44
.
11.
Subramanian
J
,
Simon
R
. 
Gene expression-based prognostic signatures in lung cancer: ready for clinical use?
J Natl Cancer Inst
2010
;
102
:
464
74
.
12.
Wistuba
II
,
Behrens
C
,
Lombardi
F
,
Wagner
S
,
Fujimoto
J
,
Raso
MG
, et al
Validation of a proliferation-based expression signature as prognostic marker in early stage lung adenocarcinoma
.
Clin Cancer Res
2013
;
19
:
6261
71
.
13.
The Cancer Genome Atlas Research Network
. 
Comprehensive molecular profiling of lung adenocarcinoma
.
Nature
2014
;
511
:
543
50
.
14.
No
JH
,
Kim
YB
,
Song
YS
. 
Targeting nrf2 signaling to combat chemoresistance
.
J Cancer Prev
2014
;
19
:
111
7
.
15.
Bauer
AK
,
Hill
T
 III
,
Alexander
CM
. 
The involvement of NRF2 in lung cancer
.
Oxid Med Cell Longev
2013
;
2013
:
746432
.
16.
Cronin
M
,
Pho
M
,
Dutta
D
,
Stephans
JC
,
Shak
S
,
Kiefer
MC
, et al
Measurement of gene expression in archival paraffin-embedded tissues: development and performance of a 92-gene reverse transcriptase-polymerase chain reaction assay
.
Am J Pathol
2004
;
164
:
35
42
.
17.
Hoerl
AE
,
Kennar
RW
. 
Ridge regression: biased estimation for nonorthogonal problems
.
Technometrics
1970
;
12
:
55
67
.
18.
van Wieringen
WN
,
Kun
D
,
Hampel
R
,
Boulesteix
AL
. 
Survival prediction using gene expression data: a review and comparison
.
Comput Stat Data Anal
2009
;
53
:
1590
603
.
19.
Waldron
L
,
Pintilie
M
,
Tsao
MS
,
Shepherd
FA
,
Huttenhower
C
,
Jurisica
I
. 
Optimized application of penalized regression methods to diverse genomic data
.
Bioinformatics
2011
;
27
:
3399
406
.
20.
Schoenfeld
DA
. 
Sample-size formula for the proportional-hazards regression model
.
Biometrics
1983
;
39
:
499
503
.
21.
Cerami
E
,
Gao
J
,
Dogrusoz
U
,
Gross
BE
,
Sumer
SO
,
Aksoy
BA
, et al
The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data
.
Cancer Discov
2012
;
2
:
401
4
.
22.
Li
J
,
Lu
Y
,
Akbani
R
,
Ju
Z
,
Roebuck
PL
,
Liu
W
, et al
TCPA: a resource for cancer functional proteomics data
.
Nat Methods
2013
;
10
:
1046
7
.
23.
Director's Challenge Consortium for the Molecular Classification of Lung Adenocarcinoma
,
Shedden
K
,
Taylor
JM
,
Enkemann
SA
,
Tsao
MS
,
Yeatman
TJ
, et al
Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study
.
Nat Med
2008
;
14
:
822
7
.
24.
Mitsuishi
Y
,
Taguchi
K
,
Kawatani
Y
,
Shibata
T
,
Nukiwa
T
,
Aburatani
H
, et al
Nrf2 redirects glucose and glutamine into anabolic pathways in metabolic reprogramming
.
Cancer Cell
2012
;
22
:
66
79
.
25.
Singh
A
,
Misra
V
,
Thimmulappa
RK
,
Lee
H
,
Ames
S
,
Hoque
MO
, et al
Dysfunctional KEAP1-NRF2 interaction in non-small-cell lung cancer
.
PLoS Medicine
2006
;
3
:
e420
.
26.
Kobayashi
A
,
Kang
MI
,
Okawa
H
,
Ohtsuji
M
,
Zenke
Y
,
Chiba
T
, et al
Oxidative stress sensor Keap1 functions as an adaptor for Cul3-based E3 ligase to regulate proteasomal degradation of Nrf2
.
Mol Cell Biol
2004
;
24
:
7130
9
.
27.
Zhang
DD
,
Lo
SC
,
Cross
JV
,
Templeton
DJ
,
Hannink
M
. 
Keap1 is a redox-regulated substrate adaptor protein for a Cul3-dependent ubiquitin ligase complex
.
Mol Cell Biol
2004
;
24
:
10941
53
.
28.
Hast
BE
,
Cloer
EW
,
Goldfarb
D
,
Li
H
,
Siesser
PF
,
Yan
F
, et al
Cancer-derived mutations in KEAP1 impair NRF2 degradation but not ubiquitination
.
Cancer Res
2014
;
74
:
808
17
.
29.
Aberle
DR
,
Adams
AM
,
Berg
CD
,
Black
WC
,
Clapp
JD
,
Fagerstrom
RM
, et al
Reduced lung-cancer mortality with low-dose computed tomographic screening
.
N Engl J Med
2011
;
365
:
395
409
.
30.
de Bruin
EC
,
McGranahan
N
,
Mitter
R
,
Salm
M
,
Wedge
DC
,
Yates
L
, et al
Spatial and temporal diversity in genomic instability processes defines lung cancer evolution
.
Science
2014
;
346
:
251
6
.
31.
Zhang
J
,
Fujimoto
J
,
Wedge
DC
,
Song
X
,
Seth
S
,
Chow
CW
, et al
Intratumor heterogeneity in localized lung adenocarcinomas delineated by multiregion sequencing
.
Science
2014
;
346
:
256
9
.
32.
Ma
Q
. 
Role of nrf2 in oxidative stress and toxicity
.
Ann Rev Pharmacol Toxicol
2013
;
53
:
401
26
.
33.
Sporn
MB
,
Liby
KT
. 
NRF2 and cancer: the good, the bad and the importance of context
.
Nat Rev Cancer
2012
;
12
:
564
71
.
34.
Homma
S
,
Ishii
Y
,
Morishima
Y
,
Yamadori
T
,
Matsuno
Y
,
Haraguchi
N
, et al
Nrf2 enhances cell proliferation and resistance to anticancer drugs in human lung cancer
.
Clin Cancer Res
2009
;
15
:
3423
32
.
35.
Jiang
T
,
Chen
N
,
Zhao
F
,
Wang
XJ
,
Kong
B
,
Zheng
W
, et al
High levels of Nrf2 determine chemoresistance in type II endometrial cancer
.
Cancer Res
2010
;
70
:
5486
96
.
36.
Shibata
T
,
Kokubu
A
,
Gotoh
M
,
Ojima
H
,
Ohta
T
,
Yamamoto
M
, et al
Genetic alteration of Keap1 confers constitutive Nrf2 activation and resistance to chemotherapy in gallbladder cancer
.
Gastroenterology
2008
;
135
:
1358
68
.
37.
Kondoh
M
,
Ohga
N
,
Akiyama
K
,
Hida
Y
,
Maishi
N
,
Towfik
AM
, et al
Hypoxia-induced reactive oxygen species cause chromosomal abnormalities in endothelial cells in the tumor microenvironment
.
PLoS One
2013
;
8
:
e80349
.
38.
Lluis
JM
,
Buricchi
F
,
Chiarugi
P
,
Morales
A
,
Fernandez-Checa
JC
. 
Dual role of mitochondrial reactive oxygen species in hypoxia signaling: activation of nuclear factor-{kappa}B via c-SRC and oxidant-dependent cell death
.
Cancer Res
2007
;
67
:
7368
77
.
39.
Guzy
RD
,
Hoyos
B
,
Robin
E
,
Chen
H
,
Liu
L
,
Mansfield
KD
, et al
Mitochondrial complex III is required for hypoxia-induced ROS production and cellular oxygen sensing
.
Cell Metab
2005
;
1
:
401
8
.
40.
Kratz
JR
,
He
J
,
Van Den Eeden
SK
,
Zhu
ZH
,
Gao
W
,
Pham
PT
, et al
A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international validation studies
.
Lancet
2012
;
379
:
823
32
.
41.
Bueno
R
,
Hughes
E
,
Wagner
S
,
Gutin
AS
,
Lanchbury
JS
,
Zheng
Y
, et al
Validation of a molecular and pathological model for five-year mortality risk in patients with early stage lung adenocarcinoma
.
J Thorac Oncol
2015
;
10
:
67
73
.

Supplementary data