Purpose: Current staging methods are imprecise for predicting prognosis of early-stage non–small-cell lung cancer (NSCLC). We aimed to develop a gene expression profile for stage I and stage II NSCLC, allowing identification of patients with a high risk of disease recurrence within 2 to 3 years after initial diagnosis.

Experimental Design: We used whole-genome gene expression microarrays to analyze frozen tumor samples from 172 NSCLC patients (pT1-2, N0-1, M0) from five European institutions, who had undergone complete surgical resection. Median follow-up was 89 months (range, 1.2-389) and 64 patients developed a recurrence. A random two thirds of the samples were assigned as the training cohort with the remaining samples set aside for independent validation. Cox proportional hazards models were used to evaluate the association between expression levels of individual genes and patient recurrence-free survival. A nearest mean analysis was used to develop a gene-expression classifier for disease recurrence.

Results: We have developed a 72-gene expression prognostic NSCLC classifier. Based on the classifier score, patients were classified as either high or low risk of disease recurrence. Patients classified as low risk showed a significantly better recurrence-free survival both in the training set (P < 0.001; n = 103) and in the independent validation set (P < 0.01; n = 69). Genes in our prognostic signature were strongly enriched for genes associated with immune response.

Conclusions: Our 72-gene signature is closely associated with recurrence-free and overall survival in early-stage NSCLC patients and may become a tool for patient selection for adjuvant therapy.

Translational Relevance

The current tumor-node-metastasis (TNM) system for non–small-cell lung cancer (NSCLC) remains far from accurate in forecasting survival of individual patients, as 50% of patients operated for early-stage disease will develop recurrent disease. A revision of the TNM system for NSCLC is expected in 2009. Although significant refinement is foreseen, the prediction of survival at the individual level is not expected to improve significantly.

Recent publications suggest that genomic profiles constructed from patient series with long and accurate follow-up outperform standard pathologic TNM staging in estimating risk of disease recurrence, but validation of identified gene expression classifiers remains a difficult issue due to complex classifier models and initial overestimation of the classifier performance.

We describe the development and validation of a 72-gene classifier for early-stage NSCLC based on a straightforward classifier model, and we have compared our classifier with those recently published. Because we developed our classifier along the same lines as a breast cancer prognostic classifier that has recently been approved for clinical application, we believe that it holds good promise for clinical application and personalized tailoring of treatment.

Lung cancer is the leading cause of cancer mortality, with an annual death rate of more than 1.1 million people worldwide (1). To date, the tumor-node-metastasis (TNM) staging system (2) has remained the most powerful tool for medical decision making in non–small-cell lung cancer (NSCLC) patients. The mainstay of treatment in early-stage NSCLC is pulmonary resection, whereas adjuvant chemotherapy should be restricted to patients with pathologic stages II and III. A survival advantage of ∼4% has been shown for adjuvant therapy in these categories of patients (3). The role of adjuvant therapy in stage I is undetermined, mostly due to an insufficient number of patients with this stage in clinical trials (4) given the small survival benefit of adjuvant chemotherapy expected in this population. The current TNM system remains far from accurate in forecasting survival of individual patients, as 50% of patients with apparently early disease will develop recurrent disease (2). On the other hand, a significant percentage of patients currently receiving adjuvant chemotherapy are overtreated, and more accurate prognostic tools could potentially avoid unnecessary toxicity of superfluous chemotherapy.

In recent years, it has become apparent that lung cancer develops as the result of multiple somatic mutations and gene expression changes (5). Advances in genome-wide sequencing and microarray analysis have stimulated research in molecular prognostics and have allowed the identification of molecular signatures that can promote more precise classification and prognostication of human cancers. Recent studies in patients with early-stage NSCLC have shown that genomic profiles constructed from patient series with long-term follow-up are able to outperform standard pathologic TNM staging in estimating risk of disease recurrence (6–10). However, proper validation and valorization of identified gene expression classifiers remains difficult, often due to the complicated classifier models that are used for clinical assessment and because of an initial overestimation of the classifier performance.

Thanks to the collaborative effort of a European Lung Cancer Microarray Consortium, we have developed and validated a 72-gene classifier for early-stage NSCLC that is based on a straightforward classifier model. This model is almost identical to the model used in the well-established 70-gene breast cancer classifier. As the breast cancer prognostic classifier has recently been approved for clinical application (11), the analogous prognostic NSCLC classifier reported here holds promise for clinical application in the near future.

Selection of patients and materials. A consortium of five European academic centers, which prospectively collected lung tumor tissues at the time of surgical treatment, contributed to this study. Tissue collection took place from 1991 to 2005, and samples were consecutively registered per center. To be selected for this retrospective analysis, patients had to fulfill the following criteria: completely resected tumor (free resection margins), no involvement of mediastinal lymph nodes (mediastinal dissection was standard in all institutions), stage I or stage II NSCLC, no adjuvant chemotherapy, and availability of representative fresh-frozen tumor material. To ensure proper assessment of recurrence-free survival (RFS) status, a minimum of 2-y follow-up was required for recurrence-free patients to be included for supervised analysis. For every patient with recurrent disease, time to recurrence, site of recurrence, date and preferably cause of death, and additional therapies were requested. Recurrence was defined as local and/or distant disease relapse. Snap-frozen samples of patients within the five centers forming the consortium were collected in a standardized way and were sent to Agendia laboratories. Sections were made to allow one reference pathologist (W.J.M.) to determine the percentage of tumor cells in the representative section. Samples had to contain at least 50% tumor cells to be eligible for microarray analysis. A central pathology review of slides of paraffin-embedded tissue ensured that only material from patients with NSCLC was included. This study was approved by the Institutional Review Boards of all five institutes within the consortium.

RNA isolation, amplification, and labeling. Tumor sample RNA isolation, amplification, and cRNA labeling with Cy3 or Cy5 fluorophores were done according to standard protocols (12). High-quality RNA could be isolated from 172 samples and was labeled for full genome gene expression analysis. One hundred three samples were used as training set to develop a prognostic signature. An additional 69 samples served as an independent validation set (Supplementary data 1).

Microarray analysis. All suitable samples were analyzed on Agilent 44K whole genome low density array (Agilent) and hybridized against a lung reference pool that consisted of a pool of RNA from 65 of the NSCLC samples. Hybridization was done according to standard procedures described by the manufacturer and similar to those described elsewhere (12). The scanned slides were quantified using Feature Extraction software (Agilent). Microarray and sample data are available at http://research.agendia.com/.

Data analysis. A multiple sampling procedure, similar to that described elsewhere (13), was used for development of a robust nearest mean classifier. To separate good and poor prognosis patients, a recurrence-free period of 3 y was arbitrarily chosen. Based on a 10-fold cross-validation loop, we identified genes with expression ratios correlating with RFS and overall survival (OS) of NSCLC patients. Events as a result of comorbidity were censored, thereby focusing only on lung cancer–related deaths. The initial 103 training samples (Supplementary data 1) were randomly split into multiple training (n = 93) and test (n = 10) sets. Each training set was used to identify which genes correlate best with RFS and with OS (based on three statistics: Welch t test, log-rank test, and a Cox proportional hazard ratio).

Subsequently, the top 100 genes were used for building a nearest mean classifier based on the training set, and the performance was tested on the left-out test set. Nearest mean classifier scores for test samples were determined by calculating each sample's cosine correlation with the mean low-risk profile and the mean high-risk profile of the training samples (14, 15). Repeating this procedure for different training and test splits (multiple sampling, 500 times) resulted in the compilation of multiple gene signatures comprising (slightly) different prognostic gene sets.
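As a rough illustration of the scoring step, the sketch below computes the cosine correlation between one sample and the mean low-risk profile and applies the low-risk threshold of 0.145 reported in the Results. It is a minimal sketch, not the study's R implementation: the function names, the four-gene toy profiles, and the choice to call a score of exactly 0.145 high-risk are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Uncentered cosine correlation between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_profile(samples):
    """Gene-wise mean over a list of expression profiles."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def classify(sample, low_risk_mean, threshold=0.145):
    """Label a sample by its correlation with the mean low-risk profile.

    Scores above the threshold are called low-risk; a score exactly at
    the threshold is called high-risk here (an assumption).
    """
    score = cosine(sample, low_risk_mean)
    label = "low-risk" if score > threshold else "high-risk"
    return label, score

# Hypothetical log-ratio profiles for four genes (not study data).
low_risk_training = [[1.0, 0.5, -0.5, -1.0], [0.8, 0.7, -0.3, -1.2]]
low_mean = mean_profile(low_risk_training)

label_a, score_a = classify([0.9, 0.4, -0.6, -0.8], low_mean)
label_b, score_b = classify([-1.0, -0.2, 0.7, 0.9], low_mean)
print(label_a, round(score_a, 3))
print(label_b, round(score_b, 3))
```

Because the cosine correlation depends only on the direction of the profile, not its magnitude, a sample whose genes all change in the low-risk direction scores close to 1 regardless of overall signal intensity.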

Finally, the optimal gene set was determined by combining the RFS and OS gene rankings and selecting the most frequently selected genes. This top-ranked gene list was gradually expanded, and OS and RFS prognostic power was calculated for the 103 training samples for each specific gene set size (leave-one-out approach, Cox proportional hazard ratio) until the best prognostic power was reached using the optimal classifier low-risk/high-risk threshold. The optimal set of 72 genes in the nearest mean classifier was validated on an independent set of 69 samples (Supplementary data 1) using the same classification rules and threshold as used for the training cohort.
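The multiple sampling idea, in which genes are ranked by how often they are selected across repeated random training/test splits, can be sketched as follows. This is a simplified illustration under stated assumptions: a plain difference in group means stands in for the Welch t, log-rank, and Cox statistics used in the study, the data are synthetic, and the split sizes (18 training, 2 test, 50 repeats, top 2 genes) are scaled-down stand-ins for the 93/10 splits, 500 repeats, and top 100 genes described above. All function and variable names are hypothetical.

```python
import random
from collections import Counter

def group_mean_difference(values, labels):
    """Absolute difference in mean expression between outcome groups.

    A deliberately simple stand-in for the survival statistics used
    in the study.
    """
    high = [v for v, y in zip(values, labels) if y == 1]
    low = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(high) / len(high) - sum(low) / len(low))

def selection_frequency(expr, labels, n_test, top_k, n_iter, seed=0):
    """Count how often each gene ranks in the top_k across random splits.

    expr maps a gene name to its expression values, one per sample.
    """
    rng = random.Random(seed)
    n = len(labels)
    counts = Counter()
    for _ in range(n_iter):
        test_idx = set(rng.sample(range(n), n_test))
        train_idx = [i for i in range(n) if i not in test_idx]
        train_labels = [labels[i] for i in train_idx]
        if len(set(train_labels)) < 2:
            continue  # skip splits that leave only one outcome group
        scores = {
            gene: group_mean_difference([vals[i] for i in train_idx], train_labels)
            for gene, vals in expr.items()
        }
        top = sorted(scores, key=scores.get, reverse=True)[:top_k]
        counts.update(top)
    return counts

# Synthetic data: one gene tracks the outcome, five genes are noise.
data_rng = random.Random(1)
labels = [0] * 10 + [1] * 10
expr = {"informative": [2.0 * y + data_rng.uniform(-0.1, 0.1) for y in labels]}
for j in range(5):
    expr[f"noise_{j}"] = [data_rng.uniform(-0.5, 0.5) for _ in labels]

counts = selection_frequency(expr, labels, n_test=2, top_k=2, n_iter=50)
print(counts.most_common(3))
```

In this toy setting the informative gene is selected in every split, whereas the noise genes take turns filling the remaining slot; genes with a stable selection frequency are the ones a frequency-based ranking would retain.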

All statistics, data analysis, Cox univariate and multivariate analyses, Kaplan-Meier survival estimates, and functional category analysis were done in R with additional Bioconductor packages.
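For readers unfamiliar with Kaplan-Meier estimation, the product-limit calculation can be sketched in a few lines. The study's estimates were produced in R; the pure-Python function below only illustrates the idea, and the function name and the five toy follow-up times are hypothetical. An event flag of 0 marks a censored observation, which is how deaths from causes other than lung cancer were handled in this study.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 = event, 0 = censored.
    Returns a list of (time, survival) pairs, one per distinct event time.
    Patients censored at an event time count as at risk at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up in months (not study data).
curve = kaplan_meier(times=[6, 12, 12, 20, 30], events=[1, 0, 1, 1, 0])
for t, s in curve:
    print(t, round(s, 2))
```

The estimate drops only at event times; censored patients leave the risk set without lowering the curve, which is why the last patient (censored at 30 months) does not appear in the output.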

DNA microarray analysis was used to identify a prognostic gene expression profile for RFS of early-stage NSCLC patients. The most common tumor types were adenocarcinomas and squamous cell carcinomas (Supplementary data 1). The patient group of 172 individuals had a median follow-up of 89 months (range, 1.2-389), and 64 patients developed a recurrence. This selection allowed the development of a classifier for the two most prominent subtypes of early-stage NSCLC. Hierarchical clustering of all samples based on global gene expression patterns showed no bias according to institution of origin (Supplementary data 2). Furthermore, no significant separation of adenocarcinomas and squamous cell carcinomas or of stage I and stage II tumors was observed. Notably, unsupervised analysis showed no difference between tumor samples from low-risk patients (recurrence-free ≥3 years) and high-risk patients (recurrence or lung cancer–related death within 3 years).

Typically, a cross-validation procedure is required for development of an unbiased gene expression signature (16). For this study, we used a robust multiple sampling procedure in which multiple samples were iteratively set aside as test samples (17). A 10-fold cross-validation procedure was used for identification of RFS- and OS-associated nearest mean classifiers (see Materials and Methods and Supplementary data 3 for details). Performance of the multiple sampling approach confirmed that the 10-fold cross-validation procedure was not prone to overfitting on the training samples (OS, P = 0.001; RFS, P = 0.01; Supplementary data 4).

Besides assessment of the prognostic power of the nearest mean classifiers, the multiple sampling approach provided a way to rank and identify the genes that were most often selected for classification of NSCLC samples and to select the optimal set of prognostic genes (see Methods for details). The strongest prognostic power was reached using an optimal set of 72 genes [Supplementary data 5; hazard ratio, 4.4 (95% confidence interval, 2.0-9.4) and 3.6 (95% confidence interval, 1.9-6.7), for OS and RFS, respectively]. Investigation of the 72-gene nearest mean classifier indicated that optimal training sample classification was achieved by comparison of each tumor profile with the average profile of clinically low-risk patients (cosine correlation, leave-one-out cross-validation; Fig. 1A). Setting of a low-risk correlation threshold at 0.145 resulted in an accurate 3-year RFS classification of 72 of the 103 training samples (P < 0.001) with a negative predictive value of 87% (Fig. 1A; Table 1). High-risk and low-risk training samples showed a clear difference in expression patterns of the 72 genes (Fig. 1B). Continuous survival analysis verified the significantly better OS and RFS for low-risk profile patients compared with patients with a high-risk profile (P < 0.001, log-rank test; Fig. 1C and D).
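The performance measures quoted for the training set follow from a 2 × 2 table of predicted versus observed 3-year RFS status. The paper does not report the underlying counts; the sketch below uses one set of counts (25 true high-risk, 7 missed high-risk, 47 true low-risk, 24 false high-risk calls) that was back-calculated for this example and happens to reproduce the reported training percentages, including 72 of 103 correct calls. These counts are an inference, not published data, and the function name is hypothetical. High-risk is treated as the positive class, consistent with the negative predictive value referring to low-risk calls.

```python
def performance(tp, fn, tn, fp):
    """Standard diagnostic measures from confusion-matrix counts.

    Positive class = high risk (recurrence within 3 y).
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # high-risk patients called high-risk
        "specificity": tn / (tn + fp),   # low-risk patients called low-risk
        "npv": tn / (tn + fn),           # low-risk calls that were correct
        "ppv": tp / (tp + fp),           # high-risk calls that were correct
        "accuracy": (tp + tn) / total,
    }

# Back-calculated counts (assumption, not reported in the paper).
training = performance(tp=25, fn=7, tn=47, fp=24)
rounded = {name: round(100 * value) for name, value in training.items()}
print(rounded)
```

The pattern of a high negative predictive value alongside a modest positive predictive value is what makes the classifier most informative for identifying low-risk patients.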

Fig. 1.

Classifier prognostic low-risk correlation outcome (leave-one-out cross-validation) of 103 training samples (A). Cosine correlations >0.145 indicate samples with a low-risk profile and correlations <0.145 indicate samples with a high-risk profile. The samples are colored according to the patients' RFS status at 3 y after diagnosis. B, visualization of the 72-gene prognostic classifier. Each row represents one sample and each column represents one gene. Samples are labeled according to the patients' RFS status and ranked similarly as in A. Red, up-regulation of a gene; green, down-regulation of a gene. C and D, Kaplan-Meier plot survival estimates of OS (C) and RFS (D) of the 103 training samples with a low-risk 72-gene profile and of patients with a high-risk 72-gene profile.

Table 1.

Classifier performance

| | Sensitivity* | Specificity* | NPV* | PPV* | Accuracy* | P† |
|---|---|---|---|---|---|---|
| Training | 78 | 66 | 87 | 51 | 70 | 2.4e−05 |
| Validation | 87 | 52 | 93 | 34 | 59 | 0.006 |

| | Median OS (mo)‡ | Median RFS (mo) |
|---|---|---|
| Training | | |
| Low-risk group | 52 | 52 |
| High-risk group | 33 | 32 |
| P§ | <0.001 | <0.001 |
| Validation | | |
| Low-risk group | 33 | 33 |
| High-risk group | 23 | 21 |
| P§ | 0.02 | 0.01 |

Abbreviations: NPV, negative predictive value; PPV, positive predictive value.

* Based on 3-y relapse-free survival.

† Wilcoxon rank sum test.

‡ Patients that died of other causes than lung cancer were censored.

§ Log-rank test.

Validation of the 72-gene NSCLC classifier confirmed its prognostic power on the independent series of 69 patients (Supplementary data 1; Fig. 2A). Survival analysis showed a significantly longer survival time (OS and RFS) for patients with a low-risk gene profile compared with patients with a high-risk gene profile (Fig. 2B and C). Further investigation of the classifier performance (Table 1) indicated that the 72-gene NSCLC classifier was most accurate in predicting low-risk early-stage NSCLC patients, with a negative predictive value of 93% for RFS. Median OS and RFS for low-risk and high-risk patients were 47 and 31 months (P < 1e−4, Wilcoxon rank-sum test) and 47 and 24 months (P < 0.001), respectively. In a univariate Cox regression model, the 72-gene profile was the strongest prognostic factor with a hazard ratio of 4.8 (95% confidence interval, 2.5-9.4) and 4.9 (95% confidence interval, 2.4-9.4) for OS and RFS, respectively. In a multivariate Cox model, the gene classifier remained the most significant prognostic factor with a hazard ratio of 4.7 and 4.6 for OS and RFS, respectively, indicating that gene classifier prognosis is independent of tumor histology and tumor stage (Table 2). The 72-gene NSCLC classifier was able to identify low-risk patients within both stage I and stage II. Interestingly, stage II low-risk patients showed survival as good as that of stage I low-risk patients (Supplementary data 6A). The identified classifier represented a general survival signature for early-stage NSCLC because high-risk or low-risk groups could be accurately identified for both squamous cell carcinomas and adenocarcinomas (Supplementary data 6B).

Fig. 2.

Validation of the 72-gene prognostic profile on independent samples. A, visualization of the profile for 69 independent validation samples. Coloring same as for Fig. 1B. Samples are ranked according to their prognostic correlative outcome. B and C, Kaplan-Meier plot survival estimates of OS (B) and RFS (C) for the independent validation samples.

Table 2.

Univariate and multivariate analyses for OS and RFS for NSCLC validation samples

| | Cox-ranked univariate HR (95% CI) | Cox-ranked univariate P | Cox-ranked multivariate HR (95% CI) | Cox-ranked multivariate P |
|---|---|---|---|---|
| OS | | | | |
| 72-gene classifier (low risk vs high risk) | 4.83 (2.47-9.44) | <0.001 | 4.70 (2.40-9.21) | <0.001 |
| Histology (squamous, adeno, or other) | 0.82 (0.55-1.21) | 0.31 | 0.89 (0.57-1.40) | 0.62 |
| Tumor stage (stage I vs II) | 2.22 (1.27-3.88) | 0.005 | 2.13 (1.21-3.73) | 0.008 |
| RFS | | | | |
| 72-gene classifier (low risk vs high risk) | 4.86 (2.49-9.50) | <0.001 | 4.61 (2.36-9.03) | <0.001 |
| Histology (squamous, adeno, or other) | 0.79 (0.53-1.18) | 0.25 | 0.87 (0.55-1.37) | 0.54 |
| Tumor stage (stage I vs II) | 2.27 (1.30-3.97) | 0.004 | 2.08 (1.19-3.64) | 0.011 |

Abbreviations: HR, hazard ratio; 95% CI, 95% confidence interval.
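A quick consistency check on the intervals in Table 2: confidence intervals for Cox hazard ratios are usually Wald intervals, symmetric on the log scale, so the hazard ratio should equal the geometric mean of its confidence limits and the standard error of the log hazard ratio can be recovered from the interval width. The sketch below applies this to the four classifier rows. It assumes Wald intervals with z = 1.96, which the paper does not state explicitly; the function name is hypothetical.

```python
import math

def log_scale_summary(lower, upper, z=1.96):
    """Recover the HR point estimate and SE(log HR) from a 95% CI.

    Assumes a Wald interval: exp(log HR +/- z * SE).
    """
    log_hr = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.exp(log_hr), se

# (reported HR, lower, upper) for the 72-gene classifier rows of Table 2.
rows = {
    "OS univariate": (4.83, 2.47, 9.44),
    "OS multivariate": (4.70, 2.40, 9.21),
    "RFS univariate": (4.86, 2.49, 9.50),
    "RFS multivariate": (4.61, 2.36, 9.03),
}

recovered = {}
for name, (reported, lower, upper) in rows.items():
    hr, se = log_scale_summary(lower, upper)
    recovered[name] = (hr, se)
    print(f"{name}: reported {reported}, recovered {hr:.2f}, SE(log HR) {se:.3f}")
```

All four recovered values agree with the reported hazard ratios to within rounding, which supports reading the tabulated intervals as log-symmetric.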

Functional category analysis of the 72 identified genes indicated that the prognostic profile was strongly enriched for genes associated with immune response (Supplementary data 7) and included immunoglobulins, IFN, and multiple cytokines (Supplementary data 5).

In this study, we developed and validated a prognostic tool based on genome-wide expression profiles of tumor tissue from stage I and stage II NSCLC patients. The developed 72-gene prognostic profile can identify early-stage NSCLC patients with high and low risk for disease recurrence and death within 3 years after primary surgical treatment. This prognostic marker is applicable for adenocarcinomas and squamous cell carcinomas and will likely enable an important step forward in treatment decision of NSCLC patients. Currently, pathologic diagnosis and tumor stage are used to predict patient survival and guide treatment decisions. Unfortunately, standard pathology provides little prognostic information, and the current updated lung cancer staging system is not refined enough to reliably predict tumor recurrence (2). This is especially relevant for the prescription of adjuvant chemotherapy after a complete resection of early-stage NSCLC. A significant survival advantage has been found in several recent randomized studies for patients receiving chemotherapy after complete resection in the stage II and IIIA categories (3). Data for adjuvant treatment of stage I patients are not consistent, most likely as a consequence of the low number of patients in these trials (4). Because 20% to 30% of stage I NSCLC patients will have recurrence within 3 years, identification of early-stage patients with poor prognosis as a consequence of lung cancer could delineate the appropriate candidates for adjuvant chemotherapy.

A survey of our prognostic 72-gene profile and other recently identified prognostic classifiers for early-stage NSCLC indicated large differences in sample numbers, microarray platform, and classifier design (Table 3; Supplementary data 8). Although a great variety of statistical models have been used, the performance of the different classifiers is similar with overall accuracies between 70% and 80% and a hazard ratio of 3 to 4. The overlap in profile genes, however, is limited to only 5 of a total of 327 genes (Fig. 3) even though it includes two studies (18, 19) that reanalyzed existing data (20) but showed respectively no and three genes in overlap. Depending on the selection of training samples, but potentially even more influenced by platform and classifier algorithm choice, different gene sets have been identified as most prognostic, a characteristic that has previously been identified for breast cancer profiling (21, 22). However, functional categories that are shared among the different identified NSCLC profiles form the basis of a potential “NSCLC prognostic space” (Fig. 3) that corroborates their unifying theme toward prognosis prediction.

Table 3.

Comparison of early-stage NSCLC prognostic profiles

| Study | Signature | Tumor type | Platform | Signature model | Classification | Sens | Spec | HR | P |
|---|---|---|---|---|---|---|---|---|---|
| This study | 72 genes | SCC/adeno | Agilent | Nearest mean classifier | 3-y RFS | 0.87 | 0.52 | 4.61* | 0.006 |
| Guo et al. (18) | 37 genes | Adeno | Affymetrix | Bayesian belief networks | 5-y OS | AUC 0.84 | | — | <0.0001 |
| Beer et al. (20) | 50 genes | Adeno | Affymetrix | Cox proportional hazard risk index | 3-y OS | 0.88 | 0.68 | 8.33† | 0.0008 |
| Raponi et al. (8) | 50 genes | SCC | Affymetrix | Cox proportional hazard risk index | 3-y OS | 0.41 | 0.84 | 2.66† | |
| Beer + Raponi | | SCC/adeno | | | | 0.64 | 0.77 | 3.54† | |
| Lu et al. (19) | 64 genes | SCC/adeno | Affymetrix | Cox partial regression risk score model | <2-y OS and >5-y RFS | — | — | — | <0.001 |
| Potti et al. (7) | Metagene model | SCC/adeno | Affymetrix | Metagene-based binary prediction tree | <2.5-y OS and >5-y RFS | 0.68 | 0.88 | — | <0.001 |
| | | | | | | 0.85 | 0.58 | — | <0.001 |
| Chen et al. (6) | 5 genes | SCC/adeno | Custom | Recursive partitioning decision tree | — | — | — | 2.82* | 0.006 |
| Larsen et al. (24) | 54 genes | Adeno | Custom | Cox proportional hazard risk index | <1.5-y or >3-y RFS | 0.79 | 0.59 | 2.13* | 0.039 |
| | | | | | | 0.65 | 0.77 | 3.30* | 0.004 |

NOTE: The Sens, Spec, HR, and P columns report validation performance. More detailed information about sample cohorts and performance can be found in Supplementary data 8.

Abbreviations: Sens, signature sensitivity; Spec, signature specificity; SCC, squamous cell carcinoma; AUC, area under the receiver operating characteristic curve.

* Multivariate hazard ratio.

† Univariate hazard ratio.

Fig. 3.

Gene overlap between NSCLC prognostic signatures. Overlap in genes of recent NSCLC survival signatures is limited to 5 of a total of 327 genes used. Likely, all identified signatures are subsets from a larger NSCLC prognostic space. Details of each individual signature are described in Table 3 and Supplementary data 8.


As opposed to older reports suggesting that prognostic profiles for adenocarcinoma and squamous cell carcinoma should be developed separately, the prognostic signature described in this study showed statistically significant performance for both subtypes. The identification of this “general” NSCLC prognostic signature is in agreement with more recent findings of Raponi et al. (8) in which a previous adenocarcinoma profile and a new squamous cell carcinoma profile were combined for accurate prognosis of both major NSCLC subtypes.

Although numerous prognostic profiles have been identified, it remains difficult to validate and use publicly available signatures from external parties. In our opinion, this is partly due to shortcomings in detailed reporting of classifier decision rules and thresholds that are essential for accurate reproduction of an identified classifier. Moreover, proper comparison of different classifiers is hampered by the diversity in which prognostic accuracies are reported. Some studies used continuous survival estimates whereas others reported performances based on survival status after 1.5 to 5 years (see Supplementary data 8 for examples). We believe that hazard ratios and median survival times of particular prognostic patient groups should be reported to indicate clinical relevance. We also advocate reporting of individual classifier outcome for all analyzed samples because this provides valuable sample-to-sample information, which is lost on grouped Kaplan-Meier analysis for low-risk and high-risk patients. Finally, validation is essential for accurate assessment of a classifier performance on completely independent samples (16, 23). Proper validation must be done using the same classification rules and thresholds as those determined on the training cohort. Unfortunately, the latter conditions often remain underestimated for accurate validation of a prognostic profile.

In conclusion, our NSCLC 72-gene expression signature is closely associated with clinical outcome in stage I and stage II NSCLC patients who underwent complete resection. The signature shows good performance in patients with adenocarcinomas and squamous cell carcinomas. The reported NSCLC classifier procedure and algorithms closely represent the recently approved 70-gene breast cancer classifier (11) and will form the basis for development of a clinical tool that may allow identification of poor prognosis early-stage NSCLC patients who might benefit from adjuvant chemotherapy.

P. Roepman, A. Witteveen, and A. Floore are employed by Agendia BV. This study was supported by an educational grant from Eli Lilly Co.

Grant support: Eli Lilly.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).

We thank Jaana Lahti, Rob Pover, Arenda Schuurman, and Niels Bakx for sample processing and microarray analysis; Annuska Glas and Tako Bruinsma for bioinformatics support; Hans Peterse† for pathologic tumor percentage assessment of analyzed tumor samples; and Otilia Dalesio for statistical advice. We thank all other participants within the European Lung Cancer Microarray Consortium, specifically Jerzy Laudanski, Lech Chyczewski, Hans Hoffmann, Philipp Schnabel, Witold Rzyman, Ewa Jassem, and Amelia Szymanowska. Finally, we thank Iris Simon, René Bernards, and Laura van't Veer for discussion and critical review of the manuscript.

1. Jemal A, Siegel R, Ward E, et al. Cancer statistics, 2006. CA Cancer J Clin 2006;56:106–30.
2. Goldstraw P, Crowley J, Chansky K, et al. The IASLC Lung Cancer Staging Project: proposals for the revision of the TNM stage groupings in the forthcoming (seventh) edition of the TNM classification of malignant tumours. J Thorac Oncol 2007;2:706–14.
3. Arriagada R, Bergman B, Dunant A, Le CT, Pignon JP, Vansteenkiste J. Cisplatin-based adjuvant chemotherapy in patients with completely resected non-small-cell lung cancer. N Engl J Med 2004;350:351–60.
4. Strauss GM. Management of early-stage lung cancer: past, present, and future adjuvant trials. Oncology (Huntingt) 2006;20:1651–63.
5. Sato M, Shames DS, Gazdar AF, Minna JD. A translational view of the molecular pathogenesis of lung cancer. J Thorac Oncol 2007;2:327–43.
6. Chen HY, Yu SL, Chen CH, et al. A five-gene signature and clinical outcome in non-small-cell lung cancer. N Engl J Med 2007;356:11–20.
7. Potti A, Mukherjee S, Petersen R, et al. A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer. N Engl J Med 2006;355:570–80.
8. Raponi M, Zhang Y, Yu J, et al. Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung. Cancer Res 2006;66:7466–72.
9. Bhattacharjee A, Richards WG, Staunton J, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci U S A 2001;98:13790–5.
10. Lau SK, Boutros PC, Pintilie M, et al. Three-gene prognostic classifier for early-stage non small-cell lung cancer. J Clin Oncol 2007;25:5562–9.
11. Couzin J. Diagnostics. Amid debate, gene-based cancer test approved. Science 2007;315:924.
12. Glas AM, Floore A, Delahaye LJ, et al. Converting a breast cancer microarray signature into a high-throughput diagnostic test. BMC Genomics 2006;7:278.
13. Roepman P, Kemmeren P, Wessels LF, Slootweg PJ, Holstege FC. Multiple robust signatures for detecting lymph node metastasis in head and neck cancer. Cancer Res 2006;66:2361–6.
14. Roepman P, Wessels LF, Kettelarij N, et al. An expression profile for diagnosis of lymph node metastases from primary head and neck squamous cell carcinomas. Nat Genet 2005;37:182–6.
15. van't Veer LJ, Dai H, van de Vijver MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415:530–6.
16. Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst 2003;95:14–8.
17. Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 2005;365:488–92.
18. Guo L, Ma Y, Ward R, Castranova V, Shi X, Qian Y. Constructing molecular classifiers for the accurate prognosis of lung adenocarcinoma. Clin Cancer Res 2006;12:3344–54.
19. Lu Y, Lemon W, Liu PY, et al. A gene expression signature predicts survival of patients with stage I non-small cell lung cancer. PLoS Med 2006;3:e467.
20. Beer DG, Kardia SL, Huang CC, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 2002;8:816–24.
21. Ein-Dor L, Kela I, Getz G, Givol D, Domany E. Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 2005;21:171–8.
22. Fan C, Oh DS, Wessels L, et al. Concordance among gene-expression-based predictors for breast cancer. N Engl J Med 2006;355:560–9.
23. Ransohoff DF. Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 2004;4:309–14.
24. Larsen JE, Pavey SJ, Passmore LH, Bowman RV, Hayward NK, Fong KM. Gene expression signature predicts recurrence in lung adenocarcinoma. Clin Cancer Res 2007;13:2946–54.
