Background: We previously developed a prognostic classifier using the expression levels of BRCA1, HIF1A, DLC1, and XPO1 that identified stage I lung adenocarcinoma patients with a high risk of relapse. That study evaluated patients in five independent cohorts from various regions of the world. To further validate the classifier, we used a meta-analysis–based approach to study 12 cohorts consisting of 1,069 tumor–node–metastasis stage I lung adenocarcinoma patients from every suitable, publicly available dataset.

Methods: Cohorts were obtained through a systematic search of public gene expression datasets. These data were used to calculate risk scores with the previously published 4-gene risk model. A fixed-effects meta-analysis model was used to generate a pooled estimate across all cohorts.

Results: The classifier was associated with prognosis in 10 of the 12 cohorts (P < 0.05). This association was highly consistent regardless of ethnicity or microarray platform. The pooled estimate demonstrated that patients classified as high risk had worse overall survival, both among all stage I patients [HR, 2.66; 95% confidence interval (CI), 1.93–3.67; P < 0.0001] and in stratified analyses of stage IA (HR, 2.69; 95% CI, 1.66–4.35; P < 0.0001) and stage IB (HR, 2.69; 95% CI, 1.74–4.16; P < 0.0001) patients.

Conclusions: The 4-gene classifier provides independent prognostic stratification of stage IA and stage IB patients beyond conventional clinical factors.

Impact: Our results suggest that the 4-gene classifier may assist clinicians in decisions about the postoperative management of early-stage lung adenocarcinoma patients. Cancer Epidemiol Biomarkers Prev; 23(12); 2884–94. ©2014 AACR.

Lung cancer is the leading cause of cancer-related death worldwide, accounting for more than one-fourth of all cancer deaths (1). Approximately 85% of lung cancers are non–small cell lung cancer (NSCLC). The most common NSCLC histology is adenocarcinoma, followed by squamous cell carcinoma (SQC) and large cell carcinoma. Despite therapeutic advances, prognosis remains poor relative to other solid cancers, even for early-stage patients (1). Thus, more refined treatment strategies are needed.

Tumor–node–metastasis (TNM) staging is the best prognostic factor for NSCLC and is used by clinicians to guide treatment. For early-stage patients, including those with TNM stage I and II disease, curative surgery is typically the optimal treatment. Among patients with completely resected NSCLC, adjuvant chemotherapy is recommended only for stage II patients, based on several randomized trials that demonstrated a survival benefit of platinum-based chemotherapy (2–4). In contrast, clinical trials have shown no survival advantage, and potentially deleterious side effects, of adjuvant chemotherapy for stage IA patients (2, 5). For stage IB patients, the evidence supporting the routine use of adjuvant chemotherapy is controversial (2, 6, 7). More detailed histologic subtyping of lung cancer may improve on the TNM classification system. For example, the micropapillary histologic subtype has been associated with cancer recurrence after limited resection of peripheral lung adenocarcinoma and may help guide treatment strategies (8).

Approximately 30% of stage I lung cancer patients will relapse and ultimately die of the disease. Most of these patients are treated with surgery alone because there is no clear evidence of benefit from adjuvant chemotherapy. Based on the recently revised 7th edition of TNM staging, 5-year overall survival rates for pathologic stage IA and IB are 73% and 58%, respectively (9). A simple but critical question is how clinicians can distinguish the approximately 30% of stage I patients at higher risk of relapse from the other 70%, who have an excellent prognosis. High-risk patients might have undetectable micrometastases at the time of surgery; hence, their outcome could potentially be improved by postoperative systemic therapy aimed at eliminating the residual occult metastases that lead to disease recurrence. There is a substantial need to identify stage IB patients who are unlikely to benefit from adjuvant chemotherapy and/or immunotherapy, as well as stage IA patients who have the highest risk of relapse. It is therefore vital to develop prognostic biomarkers that can help clinicians determine appropriate postoperative management for each patient. The demand for such prognostic tests is increasing as CT screening, in which most patients are diagnosed at stage I, becomes widely accepted (10).

Numerous studies have identified prognostic biomarkers for NSCLC based on multigene expression measured by qRT-PCR and/or microarray technology (11–26). However, associations reported in single studies have often failed to validate in additional populations (12, 26, 27). A recent review criticized prognostic gene signatures for their unspecified clinical utility and lack of reproducibility, and suggested that no lung cancer signatures are ready for clinical application (27). Taking into account the guidelines suggested in that review, we developed a gene expression–based prognostic signature intended for early-stage lung adenocarcinoma, especially stage I disease. Our goal was to build a simple and robust prognostic classifier for stage I lung cancer patients based on a few key genes. A similar strategy, based on 31 cell proliferation genes, has been shown to yield a robust prognostic classifier for lung cancer (28). Our resulting signature, the 4-gene classifier, was highly robust in all five cohorts that we analyzed, and its prognostic significance was independent of other clinical factors, including age, gender, TNM stage, and smoking status (29). These results suggested that the 4-gene classifier may be useful in guiding therapeutic decisions for early-stage lung cancer patients. We have now set out to test the 4-gene classifier in as many independent populations as we could identify from publicly available gene expression data. As a result, we evaluated more than 1,000 stage I adenocarcinoma patients from 12 independent cohorts profiled on different gene expression platforms. We use a meta-analytic approach to measure the association of the 4-gene classifier with prognosis and to evaluate its reproducibility across these cohorts. We also analyze stage IA and IB patients separately to further evaluate the clinical usefulness of the classifier.

Selection of studies

We searched Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) in June 2013 with the search terms “lung cancer,” “non–small cell lung cancer,” “lung adenocarcinoma,” “lung adenocarcinomas,” and “NSCLC.” The retrieved GEO series were filtered by Organism (Homo sapiens) and Series Type (Expression Profiling by Array) and limited to series with at least 30 samples. The 92 GEO series identified by this initial search were screened on the basis of their Title, Summary, and Overall Design as described in the GEO Accession Display. Datasets were excluded if they analyzed only cell lines or xenograft samples, analyzed only nontumor specimens (e.g., bronchial epithelial cells, blood, fluid), or contained no primary adenocarcinoma tumors. In addition, superseries consisting of one or more subseries were excluded to avoid duplicate data, and the corresponding subseries with gene expression data were retrieved instead, leaving 46 GEO datasets from lung cancer–related clinical studies. In parallel, we used ONCOMINE (Compendia Bioscience; http://www.oncomine.com) to identify public microarray datasets of patients with adenocarcinoma with survival status. The ONCOMINE search identified 10 datasets, five of which were not deposited in GEO. The resulting 51 datasets containing primary adenocarcinoma samples were further reviewed on the basis of the Sample Characteristics in the Series Matrix File or the Dataset Detail in ONCOMINE. To be selected, a dataset had to include survival information for more than 30 TNM stage I adenocarcinoma patients and expression data for BRCA1, HIF1A, DLC1, and XPO1. After removing 40 datasets that did not meet these criteria, 11 independent microarray datasets remained: the Botling (GSE37745; ref. 30), Tang (GSE42127; ref. 24), Rousseaux (GSE30219; ref. 31), Matsuyama (GSE11969; ref. 32), Wilkerson (GSE26939; ref. 33), Lee (GSE8894/ONCOMINE; ref. 17), and Bild (GSE3141/ONCOMINE; ref. 34) cohorts, as well as the Bhattacharjee (ONCOMINE; ref. 35), Directors (ONCOMINE; ref. 12), Japan (GSE31210; ref. 36), and Tomida (GSE13213; ref. 16) cohorts. The first seven cohorts were newly obtained from GEO or ONCOMINE (where available) for the present study, whereas the last four were original cohorts analyzed in our initial study (29). The selection flowchart and the list of retrieved datasets are presented in Fig. 1 and Supplementary Table S1. All data from the stage I lung adenocarcinoma cohorts are available in the Supplementary Data.
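The final eligibility screen reduces to two checks per dataset. The sketch below is illustrative only: the `Dataset` record and its fields are hypothetical stand-ins for information read from GEO Series Matrix files or ONCOMINE, and the example entries are invented rather than taken from any real series.

```python
from dataclasses import dataclass, field

# The four genes the classifier needs; every eligible dataset must measure all of them.
REQUIRED_GENES = {"BRCA1", "HIF1A", "DLC1", "XPO1"}


@dataclass
class Dataset:
    """Hypothetical summary of one candidate public dataset."""
    accession: str
    # Stage I adenocarcinoma patients that have survival information.
    n_stage1_adc_with_survival: int
    # Gene symbols measured on the dataset's platform.
    genes_on_platform: set = field(default_factory=set)


def is_eligible(ds: Dataset, min_patients: int = 30) -> bool:
    """Require more than `min_patients` eligible patients and all four genes."""
    has_enough_patients = ds.n_stage1_adc_with_survival > min_patients
    has_all_genes = REQUIRED_GENES <= ds.genes_on_platform
    return has_enough_patients and has_all_genes


def select_datasets(candidates):
    """Return the accessions that pass both criteria, in input order."""
    return [ds.accession for ds in candidates if is_eligible(ds)]


# Invented examples: only the first passes both checks.
candidates = [
    Dataset("example_A", 45, {"BRCA1", "HIF1A", "DLC1", "XPO1", "TP53"}),
    Dataset("example_B", 25, {"BRCA1", "HIF1A", "DLC1", "XPO1"}),  # too few patients
    Dataset("example_C", 60, {"HIF1A", "DLC1", "XPO1"}),           # BRCA1 missing
]
print(select_datasets(candidates))  # ['example_A']
```

Note that the threshold is strict ("more than 30"), so a dataset with exactly 30 eligible patients is excluded.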

Figure 1.

Dataset selection flow chart. A total of 92 datasets from GEO and 10 datasets from ONCOMINE were evaluated. A total of 11 datasets were selected to be included in this meta-analysis.

For the 4-coding gene analyses in patients with SQC, we used multiple cohorts of stage I SQC. Six of the adenocarcinoma datasets described above (the Botling, Rousseaux, Tang, Matsuyama, Lee, and Bild datasets) were included because they also contained expression data and survival information for patients with SQC. We obtained one additional SQC dataset from GEO (GSE17710) deposited by Wilkerson and colleagues (37), which is separate from their adenocarcinoma data (GSE26939, the Wilkerson adenocarcinoma cohort; ref. 33). In addition, ONCOMINE contained three SQC datasets with survival information: the Raponi (SQC only; ref. 38), Larsen (SQC only; ref. 39), and Zhu (adenocarcinoma and SQC; ref. 11) cohorts. Of these, the Raponi and Zhu cohorts were included in the SQC analyses. For the Zhu cohort, only patients with SQC were analyzed; patients with adenocarcinoma (n = 14, stage I) were disregarded because a considerable number of them were already included in the CAN/DF subset of the Directors cohort (11). The Larsen cohort was excluded because BRCA1 was not available on its platform.

Gene expression data analysis

In this study, we focused only on stage I patients. The 4-coding gene analysis of the five original cohorts used the American Joint Committee on Cancer (AJCC) TNM 6th edition, as described previously (29). For the seven new cohorts, the original papers did not specify whether the 6th or 7th edition was used; we assumed the AJCC TNM 6th edition because most tumors were collected before the TNM 7th edition was developed in 2009. For the Rousseaux cohort, T1N0 tumors were defined as stage IA and T2N0 tumors as stage IB, according to the TNM classification provided for each patient. Among all available stage I cases in the public datasets, two adenocarcinomas and one SQC in the Tang cohort, and three adenocarcinomas and four SQCs in the Lee cohort, were excluded from the analysis because survival information was not provided for those cases.
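The staging and exclusion rules in this paragraph amount to a small per-patient filter. The sketch below is illustrative only: the dictionary keys (`T`, `N`, `time`, `event`) are hypothetical, M0 is assumed when not supplied, and it encodes only the T1N0 to IA and T2N0 to IB mapping stated for the Rousseaux cohort under the AJCC 6th edition.

```python
from typing import Optional

# AJCC 6th-edition stage I groups for node-negative, non-metastatic tumors.
# 6th-edition T categories are not subdivided, so 7th-edition labels such as
# "T1a" are deliberately not recognized here.
STAGE_I_BY_T = {"T1": "IA", "T2": "IB"}


def stage_i_substage(t: str, n: str, m: str = "M0") -> Optional[str]:
    """Return 'IA' for T1N0M0, 'IB' for T2N0M0, and None for anything else."""
    if n != "N0" or m != "M0":
        return None
    return STAGE_I_BY_T.get(t)


def analyzable_stage_i(patients):
    """Keep stage I patients that have both a survival time and an event status.

    Each patient is a dict with the hypothetical keys "T", "N", "time", "event".
    Returns new dicts with an added "stage" key.
    """
    kept = []
    for p in patients:
        stage = stage_i_substage(p["T"], p["N"])
        if stage is None:
            continue  # not stage I
        if p.get("time") is None or p.get("event") is None:
            continue  # no survival information, so the case is excluded
        kept.append({**p, "stage": stage})
    return kept
```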

For all analyses, the normalized expression values provided with each dataset were used without further processing. We then defined criteria to select the most reliable, informative probes. In brief, when multiple probes mapped to a single gene, the pairwise correlation between probes was analyzed in each cohort. Probes were removed if they showed no correlation (R < 0.5) with any of the other probes for that gene. If a gene had only two probes and they did not correlate with each other, the probe with the higher expression was selected. The selected probes are shown in Supplementary Table S2. If more than one probe was selected, their values were averaged and no further processing was performed. The 4-coding gene classifier [(0.104 × BRCA1) + (0.133 × HIF1A) + (−0.246 × DLC1) + (0.378 × XPO1)] was applied to all newly obtained cohorts using microarray expression data, and the resulting classifier score was categorized as low, medium, or high based on tertiles within each cohort separately. This within-cohort categorization was performed to standardize risk scores across cohorts, because each study used different methods to measure the expression of the four genes and the resulting expression values are not directly comparable across cohorts. The association between the 4-gene classifier and survival was assessed by the Kaplan–Meier log-rank test for trend using GraphPad Prism v5.0 (GraphPad Software Inc.). Cox regression analyses were carried out using SPSS 11.0 (SPSS Inc.), and all univariable and multivariable models were adjusted for cohort membership where appropriate. Forest plot analyses and calculations for the nomograms were performed using Stata 11.2 (StataCorp LP). Heterogeneity of the combined HR across cohorts was assessed using the I² statistic (40).
Nomograms were developed based on coefficients from multivariable Cox regression models on 5-year overall survival using all variables that were significantly associated with patient outcome.
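The probe-handling and scoring steps described in this paragraph can be sketched in a few functions. This is a minimal illustration, not the authors' code. Each gene's probes are assumed to arrive as a dict mapping probe ID to per-patient normalized values (a hypothetical layout). When three or more probes are all mutually uncorrelated, the sketch falls back to the highest-expressed probe, a case the text does not address. Tertiles are assigned by rank with `3 * rank // n`, one of several reasonable conventions when the cohort size is not divisible by three. The coefficients are the published ones.

```python
from statistics import mean

# Published 4-gene coefficients, from the classifier formula above.
COEFFICIENTS = {"BRCA1": 0.104, "HIF1A": 0.133, "DLC1": -0.246, "XPO1": 0.378}
LABELS = ("low", "medium", "high")


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def select_probes(probes, r_min=0.5):
    """Keep probes with R >= r_min to at least one other probe of the same gene.

    If none qualify (e.g. two uncorrelated probes), keep the single probe with
    the highest mean expression.
    """
    ids = list(probes)
    if len(ids) == 1:
        return ids
    kept = [p for p in ids
            if any(pearson(probes[p], probes[q]) >= r_min for q in ids if q != p)]
    if not kept:
        kept = [max(ids, key=lambda p: mean(probes[p]))]
    return kept


def gene_expression(probes, r_min=0.5):
    """Per-patient gene value: the average of the selected probes."""
    chosen = select_probes(probes, r_min)
    n_patients = len(probes[chosen[0]])
    return [mean(probes[p][i] for p in chosen) for i in range(n_patients)]


def risk_scores(genes):
    """genes maps gene symbol -> per-patient values; returns per-patient scores."""
    n = len(genes["BRCA1"])
    return [sum(c * genes[g][i] for g, c in COEFFICIENTS.items()) for i in range(n)]


def tertile_groups(scores):
    """Label each score low/medium/high by its rank within this cohort only."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [None] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = LABELS[3 * rank // len(scores)]
    return labels
```

Because the tertile cut is recomputed inside each cohort, the same raw score can land in different groups in different cohorts; that is the intended effect when platforms do not share a common expression scale.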

Identification of eligible studies

Because the purpose of this gene expression–based classifier is to identify high-risk stage I adenocarcinoma patients who may benefit from additional intervention after surgery, we limited all analyses in this study to stage I primary adenocarcinomas. The systematic search identified 11 microarray datasets, each consisting of more than 30 stage I adenocarcinoma patients with sufficient survival information and gene expression data for all four genes (BRCA1, HIF1A, DLC1, and XPO1), as described in Fig. 1. Four of the 11 datasets were previously analyzed in our initial paper describing the signature, in which five independent cohorts of stage I adenocarcinomas were each analyzed by qRT-PCR and/or microarrays (29). Hence, seven independent cohorts were newly obtained through this systematic search, and a total of 12 cohorts were included in this study. The characteristics of the 12 cohorts are summarized in Table 1. This analysis includes 1,069 patients in total, consisting of 546 stage IA and 518 stage IB cases (five cases were not specified as stage IA or IB). These cohorts were derived from six countries: Japan, Norway, Sweden, France, South Korea, and the United States (at least eight different institutions). Nine of the 12 cohorts reported overall survival, two reported relapse-free survival, and one reported cancer-specific survival. In each cohort, RNA was isolated from frozen tumor specimens and subjected to gene expression analysis on various platforms, including qRT-PCR and Affymetrix, Illumina, or Agilent microarrays (Table 1).

Table 1.

Twelve independent cohorts of stage I lung adenocarcinoma patients

Cohorts | Country | n | TNM stage: IA | IB | IA or IB | Age: mean | Gender: M | F | Smoker (%) | Postoperative therapy: CT/RT | None | Unknown | Outcome | Platform | GEO ID
Five cohorts 
 Japan Japan 149 100 49 59.7 66 83 45.6 149 RFS qRT-PCR — 
 U.S./Norway U.S. (UMD), Norway 67 29 38 64.6 37 30 96.9 43 20 CSS qRT-PCR — 
 Directors U.S. (MSK,HLM,CAN/DF,UM) 276 114 162 64.4 131 145 NA 46 157 73 OSa Affymetrix U133A NAb 
 Bhattacharjee U.S. (Harvard) 76 35 40 64.2 32 44 90.8 76 OSa Affymetrix U95A NAb 
 Tomida Japan 79 42 37 61.4 41 38 50.6 79 OSa Agilent 44K GSE13213 
Seven new cohorts 
 Tang U.S. (MD Anderson) 87 32 55 64.1 37 50 NA 22 65 OSa Illumina WG6 V3 GSE42127 
 Rousseaux France 81 73 61.8 65 16 NA 81 OSa Affymetrix U133+2 GSE30219 
 Botling Sweden 70 28 42 63.5 31 39 NA 31 34 OSa Affymetrix U133+2 GSE37745 
 Wilkerson U.S. (UNC) 62 31 31 65.6 26 36 88.0 62 OSa Agilent 44K custom GSE26939 
 Matsuyama Japan 52 28 24 62.3 28 24 46.2 52 OSa Agilent 21.6K custom GSE11969 
 Lee Korea 36 13 23 61.4 16 20 38.9 36 RFS Affymetrix U133+2 GSE8894b 
 Bild U.S. (Duke) 34 21 64.8 17 17 NA 34 OSa Affymetrix U133+2 GSE3141b 
 Total 1,069 546 518 63.1 527 542  77 524 468    

Abbreviations: CT/RT, chemotherapy and/or radiotherapy; NA, not available; RFS, relapse-free survival; CSS, cancer-specific survival; OS, overall survival.

(a) Nine cohorts with overall survival information were used in the combined analysis (n = 817).

(b) Data were obtained from ONCOMINE.

The 4-gene classifier is tested in 12 independent cohorts

The 4-gene classifier was applied to each of the seven newly obtained cohorts using microarray expression data for the four genes, and cases were then categorized as high, medium, or low risk based on tertiles within each cohort. Consistent with our previous results, Kaplan–Meier analysis showed associations in the same direction between the 4-gene classifier and prognosis in all seven newly obtained cohorts (Fig. 2); these were statistically significant in the Tang (P = 0.046), Rousseaux (P = 0.044), Wilkerson (P = 0.014), Matsuyama (P = 0.028), and Lee (P = 0.010) cohorts, but not in the Botling (P = 0.058) and Bild (P = 0.120) cohorts. Overall, the 4-gene classifier was significantly associated with prognosis in 10 of the 12 cohorts, and the remaining two cohorts showed marginal associations in the expected direction.
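The per-cohort P values come from the log-rank test for trend across the ordered tertiles. The authors used GraphPad Prism; the sketch below is an independent, standard-library implementation of the usual trend statistic, assuming equally spaced scores 1, 2, 3 for the low, medium, and high groups and using the hypergeometric variance at each event time (which handles tied event times). Prism's treatment of ties or group scores may differ, so small numerical differences are possible.

```python
from math import erfc, sqrt


def logrank_trend(times, events, groups, weights=(1, 2, 3)):
    """Log-rank test for trend across ordered groups.

    times: follow-up times; events: 1 = event (death/relapse), 0 = censored;
    groups: group index 0..k-1 (e.g. 0 = low, 1 = medium, 2 = high tertile);
    weights: scores for the ordered groups (assumed equally spaced here).
    Returns (chi-square statistic with 1 df, P value).
    """
    k = len(weights)
    observed = [0.0] * k
    expected = [0.0] * k
    cov = [[0.0] * k for _ in range(k)]
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        # Patients still at risk (time >= t) and events at t, by group.
        at_risk = [0] * k
        deaths = [0] * k
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:
                at_risk[gi] += 1
                if ti == t and ei:
                    deaths[gi] += 1
        n, d = sum(at_risk), sum(deaths)
        for g in range(k):
            observed[g] += deaths[g]
            expected[g] += d * at_risk[g] / n
        if n > 1:
            # Hypergeometric covariance of the group event counts at time t.
            factor = d * (n - d) / (n - 1)
            for g in range(k):
                for h in range(k):
                    delta = 1.0 if g == h else 0.0
                    cov[g][h] += factor * (at_risk[g] / n) * (delta - at_risk[h] / n)
    u = sum(w * (o - e) for w, o, e in zip(weights, observed, expected))
    var = sum(weights[g] * weights[h] * cov[g][h] for g in range(k) for h in range(k))
    chi2 = u * u / var
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Invented example: the higher-risk groups have their events earlier.
chi2, p = logrank_trend(
    times=[10, 11, 12, 13, 5, 6, 7, 8, 1, 2, 3, 4],
    events=[1] * 12,
    groups=[0] * 4 + [1] * 4 + [2] * 4,
)
```

Unlike the ordinary k-group log-rank test (k − 1 degrees of freedom), the trend version concentrates the evidence into a single degree of freedom for a monotone low-to-high ordering, which suits ordered tertiles.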

Figure 2.

The performance of the 4-coding gene classifier in 12 independent cohorts of stage I lung adenocarcinoma patients. For each cohort, cases were categorized as high, medium, or low based on tertiles. P values were obtained by the log-rank test for trend.

The nine cohorts with overall survival information, comprising 817 stage I cases, were analyzed in a fixed-effects meta-analysis model. There was no evidence of heterogeneity or inconsistency across cohorts (I² = 0.0%, P = 0.980), suggesting that these results are representative of most lung adenocarcinomas and not a result of selection bias (Fig. 3). Patients classified as high risk had significantly worse overall survival (HR = 1.73; 95% CI, 1.47–2.02) in the stage I analysis (Fig. 3A). The corresponding Kaplan–Meier analysis of the combined stage I patients also showed a significant association between the 4-gene classifier and overall survival (P < 0.0001; Fig. 3A). To address the prognostic impact of the classifier within stage subgroups, stratified analyses were performed for stage IA and IB separately. The 4-gene classifier was significantly associated with overall survival in both stage IA (HR = 1.61; 95% CI, 1.27–2.06) and stage IB (HR = 1.76; 95% CI, 1.41–2.19) patients (Fig. 3B and C).
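The pooled HR and the I² statistic reported above follow the standard inverse-variance fixed-effect model. The authors used Stata; this is a minimal standard-library sketch that assumes each cohort contributes an HR with a 95% CI symmetric on the log scale, from which the standard error is recovered. It returns Cochran's Q and Higgins' I² but not the P value for Q, which would require a chi-square distribution function. The per-cohort HRs appear in Fig. 3 and are not reproduced here, so any inputs you pass are your own.

```python
from math import exp, log, sqrt

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def log_hr_and_se(hr, ci_lower, ci_upper):
    """Recover log(HR) and its standard error from a reported HR and 95% CI."""
    return log(hr), (log(ci_upper) - log(ci_lower)) / (2 * Z_975)


def fixed_effect_pool(studies):
    """Inverse-variance fixed-effect pooling of per-cohort hazard ratios.

    studies: list of (hr, ci_lower, ci_upper) tuples, one per cohort.
    Returns (pooled HR, lower 95% limit, upper 95% limit, Cochran's Q, I^2 in %).
    """
    estimates = [log_hr_and_se(*s) for s in studies]
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of each cohort from the pooled log HR.
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    # Higgins' I^2, truncated at 0 when Q does not exceed its degrees of freedom.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return (exp(pooled), exp(pooled - Z_975 * pooled_se),
            exp(pooled + Z_975 * pooled_se), q, i2)
```

An I² of 0%, as reported above, corresponds to a Cochran's Q no larger than its degrees of freedom (number of cohorts minus one).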

Figure 3.

Forest plot of the prognostic impact of the 4-coding gene classifier in 12 independent cohorts of stage I lung adenocarcinoma. A, meta-analysis of all patients with TNM stage IA or IB lung adenocarcinoma. B, meta-analysis of all patients with TNM stage IA lung adenocarcinoma. C, meta-analysis of all patients with TNM stage IB lung adenocarcinoma.

The 4-gene classifier is an independent prognostic biomarker for stage IA as well as stage IB patients

Given that the classifier was significantly associated with survival in the stage IA and IB subgroups, Cox regression analysis was conducted in the combined cohort for each stage (Table 2). All univariable and multivariable Cox analyses were adjusted for cohort membership, and the multivariable models were additionally adjusted for age, gender, and, in the combined stage I analysis, TNM stage. Because most public datasets did not provide complete clinical information, we could not include other parameters, such as smoking status or adjuvant chemotherapy, in the Cox analysis. In univariable analysis, older age, male gender, TNM stage IB, and a high 4-gene classifier score were each significantly associated with worse outcome. Multivariable models revealed that the 4-gene classifier was significantly associated with poor overall survival, independent of other parameters, in stage I patients (HR = 2.66; 95% CI, 1.93–3.67; P < 0.0001) and in stratified analyses of stage IA (HR = 2.69; 95% CI, 1.66–4.35; P < 0.0001) and stage IB (HR = 2.69; 95% CI, 1.74–4.16; P < 0.0001) patients. We also analyzed the risk score as a continuous variable (rather than an ordered categorical variable) to show that these associations are robust and do not depend on using tertiles as cutpoints (Supplementary Table S3).
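How cohort membership entered the SPSS models is not stated. One common way to adjust for it is to stratify the Cox model by cohort, so that each cohort keeps its own baseline hazard while all cohorts share a single HR. The sketch below assumes that approach and, for brevity, fits only one covariate (for example, a hypothetical indicator for the high-risk tertile) by Newton-Raphson on the Breslow partial likelihood. It is not the authors' model and omits the age, gender, and stage terms; the example data are invented.

```python
from math import exp


def cox_score_info(beta, data):
    """Score and information of a one-covariate, stratified Cox partial likelihood.

    data: list of (stratum, time, event, x) tuples. Risk sets are built only
    within each patient's own stratum (here, cohort); tied event times are
    handled with Breslow's approximation.
    """
    score, info = 0.0, 0.0
    for stratum_i, time_i, event_i, x_i in data:
        if not event_i:
            continue
        # Risk set: same stratum and still under observation at this event time.
        risk_x = [x for s, t, _, x in data if s == stratum_i and t >= time_i]
        w = [exp(beta * x) for x in risk_x]
        s0 = sum(w)
        s1 = sum(wi * x for wi, x in zip(w, risk_x))
        s2 = sum(wi * x * x for wi, x in zip(w, risk_x))
        score += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info


def fit_stratified_cox(data, tol=1e-10, max_iter=50):
    """Newton-Raphson estimate of beta (log HR); exp(beta) is the hazard ratio."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, data)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Invented data: (cohort, time, event, x) with x = 1 for a hypothetical high-risk call.
EXAMPLE = [
    ("A", 1, 1, 1), ("A", 2, 1, 0), ("A", 3, 1, 1), ("A", 4, 1, 0), ("A", 5, 1, 0),
    ("B", 1, 1, 0), ("B", 2, 1, 1), ("B", 3, 1, 1), ("B", 4, 1, 0),
]
beta = fit_stratified_cox(EXAMPLE)
print(f"HR = {exp(beta):.2f}")
```

Stratifying removes the need to assume that baseline survival is comparable across cohorts, which matters here because the cohorts differ in platform, era, and follow-up.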

Table 2.

Univariable and multivariable Cox regression of the 4-coding gene classifier in the combined cohort(a) of stage I adenocarcinoma patients

Variable (n) | Univariable analysis(b): HR (95% CI) | P | Multivariable analysis(b): HR (95% CI) | P
TNM stage I (n = 817)
 4-gene classifier(c): Low (276) | Reference | NA | Reference | NA
 4-gene classifier(c): Medium (271) | 1.34 (0.95–1.89) | 0.101 | 1.27 (0.89–1.80) | 0.183
 4-gene classifier(c): High (270) | 2.83 (2.07–3.86) | <0.0001 | 2.66 (1.93–3.67) | <0.0001
 4-gene classifier(c): P for trend | | <0.0001 | | <0.0001
 Stage(d): IB (408) vs. IA (404) | 1.68 (1.29–2.19) | 0.0001 | 1.55 (1.19–2.03) | 0.001
 Age: continuous | 1.03 (1.02–1.05) | <0.0001 | 1.04 (1.02–1.05) | <0.0001
 Gender: female (409) vs. male (408) | 0.67 (0.52–0.87) | 0.002 | 0.78 (0.60–1.01) | 0.062
TNM stage IA (n = 404)
 4-gene classifier(c): Low (149) | Reference | NA | Reference | NA
 4-gene classifier(c): Medium (137) | 1.47 (0.87–2.49) | 0.151 | 1.42 (0.84–2.40) | 0.191
 4-gene classifier(c): High (118) | 2.69 (1.67–4.34) | <0.0001 | 2.69 (1.66–4.35) | <0.0001
 4-gene classifier(c): P for trend | | <0.0001 | | <0.0001
 Age: continuous | 1.03 (1.01–1.06) | 0.002 | 1.04 (1.02–1.06) | 0.0007
 Gender: female (205) vs. male (199) | 0.61 (0.40–0.91) | 0.016 | 0.65 (0.43–0.99) | 0.043
TNM stage IB (n = 408)
 4-gene classifier(c): Low (125) | Reference | NA | Reference | NA
 4-gene classifier(c): Medium (132) | 1.20 (0.74–1.93) | 0.456 | 1.14 (0.71–1.84) | 0.586
 4-gene classifier(c): High (151) | 2.88 (1.88–4.43) | <0.0001 | 2.69 (1.74–4.16) | <0.0001
 4-gene classifier(c): P for trend | | <0.0001 | | <0.0001
 Age: continuous | 1.04 (1.02–1.05) | <0.0001 | 1.03 (1.02–1.05) | <0.0001
 Gender: female (203) vs. male (205) | 0.75 (0.54–1.06) | 0.102 | 0.90 (0.64–1.26) | 0.533

(a) The combined cohort consists of 9 publicly available, independent microarray datasets of stage I patients with overall survival information: the Directors (276), Bhattacharjee (76), Tomida (79), Botling (70), Tang (87), Rousseaux (81), Matsuyama (52), Wilkerson (62), and Bild (34) cohorts.

(b) The univariable model was adjusted for cohort membership, and the multivariable model included the 4-gene classifier, cohort membership, age, gender, and TNM stage. Statistically significant (P < 0.05) associations are in bold.

(c) The 4-coding gene classifier was categorized based on tertiles of stage I patients within each cohort.

(d) There were a total of five stage I cases in the Bhattacharjee (1) and Bild (4) cohorts for which stage IB/IA information was not available. These are included in the univariable analyses and excluded from the multivariable analyses.

The potential use of the 4-gene classifier to predict prognosis for stage I lung adenocarcinoma

For a prognostic classifier to be clinically useful, it must provide actionable information to the physician. To illustrate this potential for the 4-gene classifier, we developed a nomogram to predict 5-year survival in patients diagnosed with stage I lung adenocarcinoma (Fig. 4). The nomogram is based on the nine cohorts with overall survival data and includes all variables that were significantly associated with 5-year overall survival. The points assigned to each variable are weighted on the basis of Cox regression coefficients. The nomogram shows how the 4-gene classifier could be combined with clinical stage and other clinical parameters to predict the probability of 5-year survival. Because subgroup analysis within stage is important for demonstrating clinical utility, we also created nomograms stratified by TNM stage IA and IB as further examples of how this classifier can be integrated with TNM staging to help determine patient prognosis (Supplementary Fig. S1).
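The conversion from Cox coefficients to nomogram points, and from total points back to a survival probability, can be sketched as follows. This is a generic construction under stated assumptions, not the Stata routine the authors used: the variable with the widest effect is scaled to span 100 points, each variable's lowest-risk axis end scores 0, and survival is computed as S(t | x) = S0(t)^exp(lp), where lp is the linear predictor. The baseline survival S0(t) and all coefficients, variable names, and ranges in the example are hypothetical.

```python
from math import exp


def points_scale(coefs, ranges):
    """Points per unit of linear predictor, so the widest-effect variable spans 0-100."""
    return 100.0 / max(abs(coefs[v]) * (hi - lo) for v, (lo, hi) in ranges.items())


def reference_values(coefs, ranges):
    """For each variable, the axis end with the lowest risk (it scores 0 points)."""
    return {v: (lo if coefs[v] >= 0 else hi) for v, (lo, hi) in ranges.items()}


def total_points(coefs, ranges, patient):
    """Sum of nomogram points for one patient."""
    scale = points_scale(coefs, ranges)
    ref = reference_values(coefs, ranges)
    return sum(scale * coefs[v] * (patient[v] - ref[v]) for v in coefs)


def survival_from_points(points, coefs, ranges, baseline_surv):
    """Map total points back to S(t | x) = S0(t) ** exp(lp).

    baseline_surv is S0(t) at the chosen horizon (e.g. 5 years) for a patient
    whose covariates are all 0.
    """
    scale = points_scale(coefs, ranges)
    ref = reference_values(coefs, ranges)
    lp = sum(coefs[v] * ref[v] for v in coefs) + points / scale
    return baseline_surv ** exp(lp)


# Hypothetical model: a high-risk indicator and age in years (not fitted values).
EXAMPLE_COEFS = {"high_risk": 1.0, "age": 0.04}
EXAMPLE_RANGES = {"high_risk": (0, 1), "age": (40, 80)}
pts = total_points(EXAMPLE_COEFS, EXAMPLE_RANGES, {"high_risk": 1, "age": 80})
surv_5y = survival_from_points(pts, EXAMPLE_COEFS, EXAMPLE_RANGES, baseline_surv=0.99)
```

Higher totals always map to lower predicted survival, which is how the example in the Fig. 4 caption (300 points, roughly 30% 5-year survival) is read off the bottom axes.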

Figure 4.

Nomogram to predict 5-year survival rates for stage I lung adenocarcinoma. Each clinical variable (4-gene score, TNM stage, sex, and age) is assigned a point value. The sum of those points can then be used to estimate the probability of 5-year survival. For example, if the sum of the points is 300, a patient has approximately a 30% 5-year survival probability.

The 4-gene classifier is applicable only to patients with adenocarcinoma

Up to this point, we have focused only on patients with lung adenocarcinoma. To determine whether the association could be observed across histologies, we examined the 4-gene classifier in SQC, another major histologic type of NSCLC. Nine independent cohorts comprising 337 stage I SQC patients were obtained, and the 4-gene classifier was applied to each cohort (Supplementary Table S4). For the combined analysis, the eight cohorts with overall survival information were pooled (n = 292). However, no significant association was found in any of the SQC analyses (Supplementary Fig. S2), indicating that the 4-gene classifier is specific to adenocarcinoma. This was not entirely unexpected, because we built the classifier using adenocarcinoma gene expression data only, and SQC and adenocarcinoma are considered molecularly distinct entities (41). Therefore, this classifier appears to be useful only for lung adenocarcinoma and not for SQC. Sufficient numbers were not available to examine other histologies.

Translating prognostic gene signatures into clinical use is a major challenge in lung cancer research. There is little doubt that prognostic tests are needed for stage I lung cancer patients after complete resection. In breast cancer, several multigene assays are already commercially available, are used by clinical oncologists, and are supported by the NCCN and other guidelines (42, 43). In lung cancer, by contrast, no prognostic biomarkers have been incorporated into current guidelines despite the large number of published gene signatures. This may be due, at least in part, to insufficient reproducibility and the lack of large-scale validation (27). It has also been suggested that many signatures were developed without a clear focus on specific clinical contexts (27). To address these issues, we set out to test whether the 4-gene classifier that we previously identified is a robust prognostic classifier for stage I lung adenocarcinoma, using every publicly available dataset. The 4-gene classifier was robust in over 1,000 TNM stage I lung adenocarcinoma cases from 12 cohorts, regardless of differences in the ethnic background of the patients. When each cohort was analyzed separately, the classifier showed highly consistent associations with survival. The classifier was also reproducible across multiple gene expression platforms, including qRT-PCR and commercial and custom microarrays from Affymetrix, Agilent, and Illumina. There was no evidence of selection bias in any of the 12 cohorts, which suggests that the results presented here are representative of stage I lung adenocarcinoma. However, the classifier had no prognostic impact in patients with SQC, indicating that its utility is limited to lung adenocarcinoma.

The 4-gene classifier has the potential to be used as a prognostic biomarker for the management of stage IA and stage IB adenocarcinoma patients in the current clinical setting. The pooled estimate revealed a significant association between the 4-gene classifier and prognosis in the stage IA and IB subgroups, independent of other clinical variables. This suggests that the classifier adds discriminative value for identifying high-risk patients beyond conventional clinical characteristics. Hence, “low-risk” stage IB patients identified by the classifier, who are predicted to have excellent survival probabilities, may be recommended to forgo adjuvant therapy. Likewise, the classifier may identify “high-risk” stage IA patients for whom intensive postoperative intervention could be considered. Future work should identify the optimal cutpoint for this assay to distinguish “high-risk” from “low-risk” patients. For convenience, we used tertiles as cutpoints in our studies to distinguish high, medium, and low risk; it is likely that better cutpoints can be found that will improve the clinical utility of the classifier.

Many published multigene signatures for NSCLC use dozens, hundreds, or even thousands of genes together with complex classification models that are difficult to interpret. In our view, this complexity is an obstacle to the rapid development and feasibility of clinical tests. In contrast, the 4-gene classifier uses the expression values of only four genes, potentially providing an opportunity to develop a simple and practical laboratory test. The classifier is composed of biologically relevant genes that are mechanistically important and are each significantly associated with prognosis in early-stage lung adenocarcinoma (29). We believe that our strategy of focusing only on biologically relevant genes improved the chances of developing a robust classifier that generalizes to lung adenocarcinoma.

A potential limitation of this analysis is that data on smoking status and type of chemotherapy were incomplete for several of the cohorts included in the meta-analysis, so we could not include these covariates in the models. In the discovery cohorts, smoking history was considered as a covariate and was found not to contribute to the risk association model (29). Adjuvant chemotherapy is not recommended for stage I patients; thus, the majority of patients included in our analysis would not have received it. In fact, only 4% of patients in the Japan discovery cohort received adjuvant chemotherapy (29). Still, these factors are important for survival after lung cancer surgery, and including them could modulate the association between the 4-gene classifier and prognosis.

We have shown that the classifier is robust and that qRT-PCR data and microarray data from a variety of laboratories yield similar results. To turn the 4-gene classifier into a clinical test, future work should focus on developing a standardized assay. This includes methods for measuring each of the four genes and recommended procedures for tissue handling, processing, and RNA isolation. Such assays will need to be designed to minimize potential batch effects and interlaboratory differences. Another future possibility is the use of RNA extracted from formalin-fixed paraffin-embedded (FFPE) tissues, which could extend the practical utility of the classifier to readily available archived specimens. Furthermore, our previous study demonstrated an improved prognostic association when multiple validated classifiers were combined for stage I lung adenocarcinoma, namely the protein-coding 4-gene classifier and the noncoding miRNA-21 (miR21) classifier (29). In addition to the present results, recent meta-analyses identifying miR21 as a promising prognostic biomarker for NSCLC lend support to the potential combined use of the 4-gene and miR21 classifiers as validated biomarkers (44, 45). These findings indicate that integrating multiple independent classifiers can further improve prognostic prediction. They also suggest that these classifiers could be combined with additional biomarkers and histologic subtyping data to improve decision making in early-stage lung cancer.

In conclusion, the 4-gene classifier that we recently developed was rigorously validated by a meta-analysis of multiple large cohorts comprising more than 1,000 stage I patients. To our knowledge, this is the first report of an RNA-based classifier in lung adenocarcinoma to be tested and validated this extensively. The reproducibility of its performance was clearly demonstrated in the intended clinical context using unbiased approaches. In particular, the classifier provides prognostic stratification beyond current risk factors, that is, within the stage IA and stage IB subgroups, highlighting its potential for the personalized management of early-stage patients. These results support the development of standardized 4-gene assays and their incorporation into prospective studies.

No potential conflicts of interest were disclosed.

Conception and design: H. Okayama, A.J. Schetter, C.C. Harris

Development of methodology: H. Okayama, A.J. Schetter

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): H. Okayama, T. Kohno, J. Yokota

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): H. Okayama, A.J. Schetter, T. Ishigame, A.I. Robles

Writing, review, and/or revision of the manuscript: H. Okayama, A.J. Schetter, A.I. Robles, C.C. Harris

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): H. Okayama, J. Yokota

Study supervision: J. Yokota, S. Takenoshita, C.C. Harris

This work was supported by the Intramural Research Program of the National Cancer Institute, NIH, and by Department of Defense Congressionally Directed Medical Research Program Grant PR093793.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Siegel R, Naishadham D, Jemal A. Cancer statistics, 2013. CA Cancer J Clin 2013;63:11–30.
2. Pignon JP, Tribodet H, Scagliotti GV, Douillard JY, Shepherd FA, Stephens RJ, et al. Lung adjuvant cisplatin evaluation: a pooled analysis by the LACE Collaborative Group. J Clin Oncol 2008;26:3552–9.
3. NCCN Clinical Practice Guidelines in Oncology. Available from: http://www.nccn.org.
4. Howington JA, Blum MG, Chang AC, Balekian AA, Murthy SC. Treatment of stage I and II non-small cell lung cancer: diagnosis and management of lung cancer, 3rd ed: American College of Chest Physicians evidence-based clinical practice guidelines. Chest 2013;143:e278S–313S.
5. Arriagada R, Auperin A, Burdett S, Higgins JP, Johnson DH, Le Chevalier T, et al. Adjuvant chemotherapy, with or without postoperative radiotherapy, in operable non-small-cell lung cancer: two meta-analyses of individual patient data. Lancet 2010;375:1267–77.
6. Butts CA, Ding K, Seymour L, Twumasi-Ankrah P, Graham B, Gandara D, et al. Randomized phase III trial of vinorelbine plus cisplatin compared with observation in completely resected stage IB and II non-small-cell lung cancer: updated survival analysis of JBR-10. J Clin Oncol 2010;28:29–34.
7. Strauss GM, Herndon JE II, Maddaus MA, Johnstone DW, Johnson EA, Harpole DH, et al. Adjuvant paclitaxel plus carboplatin compared with observation in stage IB non-small-cell lung cancer: CALGB 9633 with the Cancer and Leukemia Group B, Radiation Therapy Oncology Group, and North Central Cancer Treatment Group Study Groups. J Clin Oncol 2008;26:5043–51.
8. Nitadori J, Bograd AJ, Kadota K, Sima CS, Rizk NP, Morales EA, et al. Impact of micropapillary histologic subtype in selecting limited resection vs lobectomy for lung adenocarcinoma of 2 cm or smaller. J Natl Cancer Inst 2013;105:1212–20.
9. Goldstraw P, Crowley J, Chansky K, Giroux DJ, Groome PA, Rami-Porta R, et al. The IASLC Lung Cancer Staging Project: proposals for the revision of the TNM stage groupings in the forthcoming (seventh) edition of the TNM Classification of malignant tumours. J Thorac Oncol 2007;2:706–14.
10. Church TR, Black WC, Aberle DR, Berg CD, Clingan KL, Duan F, et al. Results of initial low-dose computed tomographic screening for lung cancer. N Engl J Med 2013;368:1980–91.
11. Zhu CQ, Ding K, Strumpf D, Weir BA, Meyerson M, Pennell N, et al. Prognostic and predictive gene signature for adjuvant chemotherapy in resected non-small-cell lung cancer. J Clin Oncol 2010;28:4417–24.
12. Shedden K, Taylor JM, Enkemann SA, Tsao MS, Yeatman TJ, Gerald WL, et al. Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 2008;14:822–7.
13. Chen HY, Yu SL, Chen CH, Chang GC, Chen CY, Yuan A, et al. A five-gene signature and clinical outcome in non-small-cell lung cancer. N Engl J Med 2007;356:11–20.
14. Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, Misek DE, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 2002;8:816–24.
15. Lau SK, Boutros PC, Pintilie M, Blackhall FH, Zhu CQ, Strumpf D, et al. Three-gene prognostic classifier for early-stage non small-cell lung cancer. J Clin Oncol 2007;25:5562–9.
16. Tomida S, Takeuchi T, Shimada Y, Arima C, Matsuo K, Mitsudomi T, et al. Relapse-related molecular signature in lung adenocarcinomas identifies patients with dismal prognosis. J Clin Oncol 2009;27:2793–9.
17. Lee ES, Son DS, Kim SH, Lee J, Jo J, Han J, et al. Prediction of recurrence-free survival in postoperative non-small cell lung cancer patients by using an integrated model of clinical information and gene expression. Clin Cancer Res 2008;14:7397–404.
18. Boutros PC, Lau SK, Pintilie M, Liu N, Shepherd FA, Der SD, et al. Prognostic gene signatures for non-small-cell lung cancer. Proc Natl Acad Sci U S A 2009;106:2824–8.
19. Bianchi F, Nuciforo P, Vecchi M, Bernard L, Tizzoni L, Marchetti A, et al. Survival prediction of stage I lung adenocarcinomas by expression of 10 genes. J Clin Invest 2007;117:3436–44.
20. Kratz JR, He J, Van Den Eeden SK, Zhu ZH, Gao W, Pham PT, et al. A practical molecular assay to predict survival in resected non-squamous, non-small-cell lung cancer: development and international validation studies. Lancet 2012;379:823–32.
21. Raz DJ, Ray MR, Kim JY, He B, Taron M, Skrzypski M, et al. A multigene assay is prognostic of survival in patients with early-stage lung adenocarcinoma. Clin Cancer Res 2008;14:5565–70.
22. Sun Z, Wigle DA, Yang P. Non-overlapping and non-cell-type-specific gene expression signatures predict lung cancer survival. J Clin Oncol 2008;26:877–83.
23. Roepman P, Jassem J, Smit EF, Muley T, Niklinski J, van de Velde T, et al. An immune response enriched 72-gene prognostic profile for early-stage non-small-cell lung cancer. Clin Cancer Res 2009;15:284–90.
24. Tang H, Xiao G, Behrens C, Schiller J, Allen J, Chow CW, et al. A 12-gene set predicts survival benefits from adjuvant chemotherapy in non-small cell lung cancer patients. Clin Cancer Res 2013;19:1577–86.
25. Chen DT, Hsu YL, Fulp WJ, Coppola D, Haura EB, Yeatman TJ, et al. Prognostic and predictive value of a malignancy-risk gene signature in early-stage non-small cell lung cancer. J Natl Cancer Inst 2011;103:1859–70.
26. Guo NL, Wan YW, Tosun K, Lin H, Msiska Z, Flynn DC, et al. Confirmation of gene expression-based prediction of survival in non-small cell lung cancer. Clin Cancer Res 2008;14:8213–20.
27. Subramanian J, Simon R. Gene expression-based prognostic signatures in lung cancer: ready for clinical use? J Natl Cancer Inst 2010;102:464–74.
28. Wistuba II, Behrens C, Lombardi F, Wagner S, Fujimoto J, Raso MG, et al. Validation of a proliferation-based expression signature as prognostic marker in early stage lung adenocarcinoma. Clin Cancer Res 2013;19:6261–71.
29. Akagi I, Okayama H, Schetter AJ, Robles AI, Kohno T, Bowman ED, et al. Combination of protein coding and noncoding gene expression as a robust prognostic classifier in stage I lung adenocarcinoma. Cancer Res 2013;73:3821–32.
30. Botling J, Edlund K, Lohr M, Hellwig B, Holmberg L, Lambe M, et al. Biomarker discovery in non-small cell lung cancer: integrating gene expression profiling, meta-analysis, and tissue microarray validation. Clin Cancer Res 2013;19:194–204.
31. Rousseaux S, Debernardi A, Jacquiau B, Vitte AL, Vesin A, Nagy-Mignotte H, et al. Ectopic activation of germline and placental genes identifies aggressive metastasis-prone lung cancers. Sci Transl Med 2013;5:186ra66.
32. Matsuyama Y, Suzuki M, Arima C, Huang QM, Tomida S, Takeuchi T, et al. Proteasomal non-catalytic subunit PSMD2 as a potential therapeutic target in association with various clinicopathologic features in lung adenocarcinomas. Mol Carcinog 2011;50:301–9.
33. Wilkerson MD, Yin X, Walter V, Zhao N, Cabanski CR, Hayward MC, et al. Differential pathogenesis of lung adenocarcinoma subtypes involving sequence mutations, copy number, chromosomal instability, and methylation. PLoS ONE 2012;7:e36530.
34. Bild AH, Yao G, Chang JT, Wang Q, Potti A, Chasse D, et al. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 2006;439:353–7.
35. Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A 2001;98:13790–5.
36. Okayama H, Kohno T, Ishii Y, Shimada Y, Shiraishi K, Iwakawa R, et al. Identification of genes upregulated in ALK-positive and EGFR/KRAS/ALK-negative lung adenocarcinomas. Cancer Res 2012;72:100–11.
37. Wilkerson MD, Yin X, Hoadley KA, Liu Y, Hayward MC, Cabanski CR, et al. Lung squamous cell carcinoma mRNA expression subtypes are reproducible, clinically important, and correspond to normal cell types. Clin Cancer Res 2010;16:4864–75.
38. Raponi M, Zhang Y, Yu J, Chen G, Lee G, Taylor JM, et al. Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung. Cancer Res 2006;66:7466–72.
39. Larsen JE, Pavey SJ, Passmore LH, Bowman R, Clarke BE, Hayward NK, et al. Expression profiling defines a recurrence signature in lung squamous cell carcinoma. Carcinogenesis 2007;28:760–6.
40. Higgins JP, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003;327:557–60.
41. Herbst RS, Heymach JV, Lippman SM. Lung cancer. N Engl J Med 2008;359:1367–80.
42. Goncalves R, Bose R. Using multigene tests to select treatment for early-stage breast cancer. J Natl Compr Canc Netw 2013;11:174–82.
43. Reis-Filho JS, Pusztai L. Gene expression profiling in breast cancer: classification, prognostication, and prediction. Lancet 2011;378:1812–23.
44. Yang M, Shen H, Qiu C, Ni Y, Wang L, Dong W, et al. High expression of miR-21 and miR-155 predicts recurrence and unfavourable survival in non-small cell lung cancer. Eur J Cancer 2012;49:604–15.
45. Wang Y, Li J, Tong L, Zhang J, Zhai A, Xu K, et al. The prognostic value of miR-21 and miR-155 in non-small-cell lung cancer: a meta-analysis. Jpn J Clin Oncol 2013;43:813–20.

Supplementary data