Abstract
Purpose: Identification of effective markers for outcome is expected to improve the clinical management of non–small cell lung cancer (NSCLC). Here, we assessed in NSCLC the prognostic efficacy of genes, which we had previously found to be differentially expressed in an in vitro model of human lung carcinogenesis.
Experimental Design: Prediction algorithms and risk-score models were applied to the expression of the genes in publicly available NSCLC expression data sets. The prognostic capacity of the immunohistochemical expression of proteins encoded by these genes was also tested using formalin-fixed paraffin-embedded (FFPE) tissue specimens from 156 lung adenocarcinomas and 79 squamous cell carcinomas (SCCs).
Results: The survival of all-stages (P < 0.001, HR = 2.0) or stage-I (P < 0.001, HR = 2.84) adenocarcinoma patients that expressed the five-gene in vitro lung carcinogenesis model (FILM) signature was significantly poorer than that of patients who did not. No survival differences were observed between SCCs predicted to express or lack FILM signature. Moreover, all stages (P < 0.001, HR = 1.95) or stage-I (P = 0.001, HR = 2.6) adenocarcinoma patients predicted to be at high risk by FILM transcript exhibited significantly worse survival than patients at low risk. Furthermore, the corresponding protein signature was associated with poor survival (all stages, P < 0.001, HR = 3.6; stage-I, P < 0.001, HR = 3.5; stage-IB, P < 0.001, HR = 4.6) and mortality risk (all stages, P = 0.001, HR = 4.0; stage-I, P = 0.01, HR = 3.4; stage-IB, P < 0.001, HR = 7.2) in lung adenocarcinoma patients.
Conclusions: Our findings highlight a gene and corresponding protein signature with effective capacity for identification of stage-I lung adenocarcinoma patients with poor prognosis that are likely to benefit from adjuvant therapy. Clin Cancer Res; 17(6); 1490–501. ©2010 AACR.
Identification of molecular prognostic markers will improve the clinical management of NSCLC, the leading cause of cancer-related deaths in the United States and worldwide. We derived a 5-gene and corresponding immunohistochemical protein signature and tested their prognostic capacities in various publicly available microarray data sets and in an independent set of NSCLC histologic tissue specimens, respectively. Both the 5-gene transcript and corresponding protein signatures effectively predicted poor survival of all-stages or stage-I lung adenocarcinoma but not of squamous cell carcinoma patients. Moreover, the FILM protein signature specifically identified a subgroup of nontreated stage-IB lung adenocarcinoma patients with poor prognosis. These findings suggest that the derived 5-gene signature may be assessed for expression by different methods (transcript vs. protein) and for identifying early-stage lung adenocarcinoma patients with poor prognosis that will benefit from adjuvant therapy following resective surgery.
Introduction
Lung cancer is the leading cause of cancer-related deaths in the United States and worldwide (1). NSCLC accounts for the majority of the lung cancer cases and is composed of 2 major subtypes, lung adenocarcinomas and squamous cell carcinomas (SCCs; ref. 2). The average 5-year relative survival rate among NSCLC patients is only 15% (2–4). Mortality due to NSCLC is high because most cancers are diagnosed after regional or distant spread of the disease (3, 4). However, even the 5-year survival rate of stage-I NSCLC patients (30%–50%) is among the worst for early-stage disease of all other malignancies (4, 5). Therefore, identification of markers for early prediction of outcome is warranted for better clinical management of NSCLC patients, and in particular for those with early-stage disease.
Although surgical resection is the first treatment, adjuvant therapy has been shown to improve the survival of lung cancer patients (6, 7). Several factors have been proposed to elevate risk and justify the use of adjuvant therapy for lung cancer patients, of which TNM staging is the most effective standard (5). However, the potential benefits of adjuvant therapy are contentious, in particular in stage-I lung cancer patients (6, 7), as there are few, if any, established clinical criteria to separate stage-I NSCLCs (8). Therefore, additional molecular factors might help identify early or stage-I lung cancer patients with poor prognosis who may need to receive therapy versus patients with good prognosis who could be spared adjuvant therapy.
With the advent of microarray and high-throughput technology, several early studies have shown the significant association of gene expression profiles and signatures with the survival and outcome of NSCLC patients (9–12). In addition, gene expression signatures have been derived for the prognosis of early (stages I and II; refs. 13 and 14) and all-stages NSCLC patients (15–17), lung adenocarcinomas alone (18, 19), and to predict recurrence of NSCLC disease (20, 21). Moreover, a 5-gene signature identified from a set of 672 genes differentially expressed among invasive lung adenocarcinoma cell lines identified NSCLC patients at high risk of poor survival (16). We have previously studied the gene expression profiles of normal, premalignant, and tumorigenic lung epithelial cells constituting an in vitro model of lung carcinogenesis and highlighted prominent gene expression profiles and pathways relevant to the survival of lung adenocarcinoma patients (22). Specifically, we identified the progressive modulation of 6 key genes, ubiquitin conjugating enzyme 2C (UBE2C), minichromosome maintenance (MCM) 2 and 6, targeting protein for Xklp2 (TPX2), flap structure-specific endonuclease 1 (FEN1), and stratifin (SFN), among the cell lines and the significant association of their expression with the survival of lung adenocarcinoma patients (22).
In this study, we sought to assess the prognostic efficacy of the aforementioned 6 genes in NSCLC. We analyzed the association of the expression of the genes at the transcript level with patient survival in several publicly available microarray data sets of lung adenocarcinoma and SCC. We derived a 5-gene signature that is predictive of survival of early-stage lung adenocarcinoma but not SCC patients. Moreover, we analyzed the proteins encoded by the 5-gene signature using immunohistochemistry in an independent FFPE tissue microarray (TMA) NSCLC set, and found that it was also effective in predicting the survival of lung adenocarcinoma patients including those with stage-I disease. Moreover, risk of mortality assessed by both the transcript and protein versions of the derived 5-gene expression signature was an independent predictor of poor survival in lung adenocarcinoma patients. Importantly, the FILM protein signature was capable of identifying stage-IB lung adenocarcinoma patients with dismal prognosis warranting the need of adjuvant therapy in these patients.
Methods
Analysis of publicly available NSCLC microarray data sets
To assess the expression and clinical relevance of the aforementioned 6 genes in human NSCLC, we used publicly available lung adenocarcinoma microarray data sets from the studies by Shedden and colleagues [National Cancer Institute (NCI) Director's Challenge, n = 442; ref. 19], Bild and colleagues (Duke cohort, n = 58; ref. 15), and Bhattacharjee and colleagues (Harvard cohort, n = 125; ref. 10). We also used publicly available microarray data sets generated from lung SCC patient samples from the studies by Raponi and colleagues (n = 130; ref. 17) and Bild and colleagues (n = 53; ref. 15). The raw microarray data for the Director's Challenge study were obtained from the National Cancer Institute Cancer Array database, experiment ID 1015945236141280:1 (https://caarraydb.nci.nih.gov/caarray). Raw data from all the published studies were obtained from the Gene Expression Omnibus (GEO). Raw microarray data from all data sets were analyzed using the BRB-ArrayTools v.3.8.0 developed by Dr. Richard Simon and BRB-ArrayTools Development Team (23). Robust multiarray analysis (RMA) was used for normalization of gene expression data in R language environment (24). Probe sets for UBE2C, MCMs 2 and 6, FEN1, TPX2 and SFN in the microarray platforms used in the different studies (Affymetrix HG-U95Av2, HG-U133A and HG-U133 plus 2.0) were identified using NetAffx from Affymetrix (http://www.affymetrix.com/analysis/index.affx).
Prediction of class
To predict the class of independent patient cohorts, we adopted a previously developed model using 6 algorithms: compound covariate predictor (CCP), linear discriminant analysis (LDA), nearest neighbors 1 and 3 (NN-1 and NN-3), nearest centroid (NC), and support vector machines (SVM; refs. 23 and 25). Lung adenocarcinomas from the data sets in the study by Shedden and colleagues (19) were designated together as a training set (n = 442). Class separation into high versus low expression (Fig. 1A) was maintained following analysis of the 6 genes in the 442 lung adenocarcinomas by hierarchical cluster analysis with average linkage (data not shown) using Cluster version 2.11. Samples from the Duke (15) and Harvard (10) cohorts were used as a combined lung adenocarcinoma test set and median-centered independently (n = 183), whereas SCC samples from the studies by Raponi and colleagues (17) and Bild and colleagues (15) were designated as an SCC validation set (n = 183, Fig. 1A). Assessment of the classification efficacy of the 6 genes as well as their capacity in estimating the probability of the identity of a particular sample was performed using a leave-one-out cross-validation (LOOCV) approach with random permutation for accuracy estimation as previously described (23, 25). Five (all but SFN) of the 6 genes were capable of significant class prediction in training, as assessed by a univariate t-test with a statistical cut-off of P < 10⁻⁵ (Fig. 1A). Furthermore, compared with the other 5 genes, SFN was down-regulated in tumors relative to normal tissue, exhibited relatively lower mean expression in tumors, and increased misclassification.
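As an illustration of the cross-validation logic only, the following minimal sketch runs LOOCV with a nearest centroid classifier on toy data. It is not the BRB-ArrayTools implementation used in the study; the function names, the Euclidean distance, and the toy expression values are assumptions made for the example.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Assign x to the class whose mean expression profile is closest (Euclidean)."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    distances = np.linalg.norm(centroids - x, axis=1)
    return classes[np.argmin(distances)]

def loocv_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the held-out call."""
    n = len(y)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i  # every sample except the held-out one
        pred = nearest_centroid_predict(X[keep], y[keep], X[i])
        correct += int(pred == y[i])
    return correct / n

# Hypothetical toy data: 6 samples x 2 genes; label 1 = high, 0 = low expression class.
X = np.array([[2.0, 2.1], [1.8, 2.3], [2.2, 1.9],
              [-2.0, -1.9], [-1.7, -2.2], [-2.1, -2.0]])
y = np.array([1, 1, 1, 0, 0, 0])
accuracy = loocv_accuracy(X, y)
print(accuracy)
```

The same loop applies to any of the other classifiers by swapping the prediction function; the held-out sample must never contribute to the centroids used to classify it.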
The 6 classification algorithms were then applied to the indicated test sets and the different patient groups or arms, predicted by the derived FILM signature, were analyzed for statistically significant differences in survival by Kaplan-Meier method for estimation of survival probability and log-rank tests in R language environment.
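The survival comparison described above can be sketched as follows. This is a minimal, self-contained version of the Kaplan-Meier estimator and the two-group log-rank test, written in Python rather than the R environment used in the study; the function names and toy follow-up data are hypothetical, and the code assumes at least one event time with a nonzero variance contribution.

```python
import math
import numpy as np

def kaplan_meier(time, event):
    """Return (event_times, survival_probabilities) for right-censored data."""
    time, event = np.asarray(time, float), np.asarray(event, int)
    times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(time >= t)                    # still under observation at t
        deaths = np.sum((time == t) & (event == 1))    # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)

def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, P value) with 1 df."""
    time, event, group = (np.asarray(a) for a in (time, event, group))
    observed_minus_expected, variance = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:  # hypergeometric variance of the group-1 death count
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return chi2, p_value

# Hypothetical follow-up (months); event 1 = death, 0 = censored; group 1 = predicted high expression.
time = [6, 7, 10, 15, 19, 25]
event = [1, 1, 1, 1, 0, 1]
group = [1, 1, 1, 0, 0, 0]
km_times, km_surv = kaplan_meier(time, event)
chi2, p_value = logrank_test(time, event, group)
```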
Computation of risk scores
We generated a model for estimation of mortality risk similar to what was described earlier (26). Using gene expression data from training cohorts during LOOCV, the Cox regression coefficients were computed for UBE2C, MCMs 2 and 6, FEN1, and TPX2. A risk score was then derived for each patient in the training set by calculating the summation of the products of the Cox coefficients and normalized centered expression of each gene. Patients were dichotomized into high- and low-risk groups using the 50th percentile (median) cut-off of the risk score as the threshold value or were divided into tertiles or 3 groups (low risk, intermediate risk, and high risk). In all cases, the coefficient and the threshold value derived from the training cohort were directly applied to the gene expression data from validation cohorts. Kaplan-Meier and log-rank test were then applied to assess the significance of prognostic difference between the risk groups without or with the inclusion of age and gender clinical covariates as previously described (27).
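The risk-score procedure can be sketched as below, assuming the Cox coefficients have already been estimated in the training cohort. The coefficient and expression values are hypothetical placeholders, not values from the study, and the convention that scores above the cut-off are "high risk" is an assumption of the sketch.

```python
import numpy as np

def risk_scores(expression, cox_coefficients):
    """Sum over genes of (Cox coefficient x centered expression); one score per patient."""
    return np.asarray(expression, float) @ np.asarray(cox_coefficients, float)

# Hypothetical coefficients for UBE2C, MCM2, MCM6, FEN1, TPX2 (illustrative only).
coefs = np.array([0.30, 0.20, 0.10, 0.25, 0.15])

# Hypothetical normalized, centered expression: rows are patients, columns are genes.
train = np.array([[ 1.0,  0.5,  0.2,  0.8,  0.4],
                  [-0.6, -0.2, -0.1, -0.5, -0.3],
                  [ 0.2,  0.1,  0.0,  0.1,  0.2],
                  [-1.0, -0.8, -0.4, -0.9, -0.6]])
validation = np.array([[ 0.5,  0.5,  0.5,  0.5,  0.5],
                       [-0.5, -0.5, -0.5, -0.5, -0.5]])

# Thresholds are derived from the training cohort only.
train_scores = risk_scores(train, coefs)
median_cutoff = np.median(train_scores)
tertile_cutoffs = np.quantile(train_scores, [1 / 3, 2 / 3])

# The same coefficients and thresholds are applied unchanged to the validation cohort.
val_scores = risk_scores(validation, coefs)
val_high_risk = val_scores > median_cutoff               # True = high risk
val_tertile = np.digitize(val_scores, tertile_cutoffs)   # 0 = low, 1 = intermediate, 2 = high
```

Keeping the coefficients and cut-offs fixed after training is what makes the validation cohort an independent test of the model.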
Tissue microarray
For this study, we obtained archived FFPE samples from surgically resected lung cancer specimens from the lung cancer tissue bank at The University of Texas M.D. Anderson Cancer Center (Houston, TX). The tissue specimens originally had been collected between 2003 and 2005 and had been classified using the 2004 World Health Organization (WHO) classification system as described before (28). The tissue microarray analyzed in this study is composed of 235 NSCLC tumor specimens (156 lung adenocarcinomas and 79 SCCs) obtained from patients who underwent surgery at the same institution from 2003 to 2005 and who did not receive any adjuvant or neoadjuvant therapy, under a protocol that was approved by the M.D. Anderson Cancer Center institutional review board. Detailed clinical and pathologic information was available for most of these cases and included patients' demographic data, smoking history (never smokers or ever smokers, the latter defined as patients who had smoked at least 100 cigarettes in their lifetime), and pathologic tumor-node-metastasis (TNM) staging (29). After histologic examination of NSCLC specimens, the NSCLC TMAs were constructed by obtaining three 1-mm-diameter cores from each tumor at 3 different sites (periphery, intermediate, and central tumor sites). The TMAs were prepared with a manual tissue arrayer (Advanced Tissue Arrayer ATA100, Chemicon International).
Immunohistochemical analysis of the protein signature
Immunohistochemistry was performed on histology sections of FFPE tissue samples, using the purified primary antihuman antibodies against UBE2C (1:500 dilution; Boston Biochem), FEN1 (1:50 dilution; BD Biosciences), MCM2, MCM6, SFN (all 1:100 dilution; Novus Biologicals), and TPX2 (1:400 dilution; Novus Biologicals). The sections were deparaffinized, hydrated, subjected to antigen retrieval by heating in a steamer for 20 minutes with 10 mmol/L sodium citrate (pH 6.0), and then incubated in peroxidase blocking reagent (DAKO). Sections were then washed with Tris-containing buffer and incubated overnight at 4°C with the primary antibodies. Subsequently, the sections were washed and incubated with secondary antibodies using the EnVision Plus labeled polymer kit (DAKO) for 30 minutes followed by incubation with avidin–biotin–peroxidase complex (DAKO) and development with diaminobenzidine chromogen for 5 minutes. Finally, the sections were rinsed in distilled water, counterstained with hematoxylin (DAKO), and mounted on glass slides before evaluation under the microscope. FFPE samples processed similarly, except for the omission of the primary antibody, were used as negative controls. Experienced lung cancer pathologists (P.Y. and I.I.W.) blinded to the clinical data examined the immunostainings jointly using light microscopy to generate one set of readings. The antigens studied exhibited mainly nuclear immunoreactivity. The immunostainings were quantified using a 4-value intensity score (0, 1+, 2+, and 3+) and the percentage (0%–100%) of the extent of reactivity in each core. The final score was then obtained by multiplying the intensity and reactivity extension values (range, 0–300) as previously reported (22, 28, 30).
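The scoring arithmetic for a single core can be written as a short sketch; the function name and the input validation are assumptions added for illustration.

```python
def immunoreactivity_score(intensity, percent_positive):
    """Final score = staining intensity (0 to 3) x extent of reactivity (0 to 100%)."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must lie between 0 and 100")
    return intensity * percent_positive

# A core with 2+ staining in 40% of tumor cells scores 2 x 40 = 80.
example_score = immunoreactivity_score(2, 40)
# The upper bound of the scale is 3+ staining in 100% of cells: 3 x 100 = 300.
max_score = immunoreactivity_score(3, 100)
```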
To understand the association of the expression of the studied proteins with the survival of NSCLC patients, a combined immunoreactivity score for each patient was computed by simple addition of the individual final immunoreactivity scores for each of the analyzed antigens. Lung adenocarcinoma and SCC patients were then dichotomized into high and low expression groups using the 50th percentile (median) cut-off of the combined final immunoreactivity score value. Alternatively, for confirmation, patient samples were clustered by average linkage using the Cluster v2.11 program following median-centering of the antigens' immunoreactivity scores. Clusters were then identified and visualized using TreeView programs (Michael Eisen Laboratory, Lawrence Berkeley National Laboratory and University of California, Berkeley; http://rana.lbl.gov/EisenSoftware.htm). To further validate the prognostic relevance of the protein signature, lung adenocarcinomas were randomly divided into a training set (n = 78) and a complete (n = 78) or stage-I only test set (n = 62), and a mortality risk was estimated similar to what was described earlier using Cox coefficients in the training set and protein expression within each set. Statistically significant differences in the survival of the clusters, expression or risk groups were analyzed by the Kaplan-Meier and log-rank tests in R language environment without and with the inclusion of and adjustment for age and gender as previously described (27).
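A minimal sketch of the combined score and median split follows. The per-antigen scores are hypothetical, the five columns are assumed to be the FILM proteins, and assigning patients exactly at the median to the low group is an assumption, since the tie rule is not stated.

```python
import numpy as np

# Hypothetical final immunoreactivity scores (0 to 300): rows are patients,
# columns are assumed to be UBE2C, MCM2, MCM6, FEN1, TPX2.
scores = np.array([[200, 150, 100, 180, 120],
                   [ 20,  10,   0,  30,  15],
                   [100,  90,  80, 110,  70],
                   [250, 240, 200, 260, 210]])

combined = scores.sum(axis=1)          # simple addition across antigens
cutoff = np.median(combined)           # 50th percentile of the combined score
high_expression = combined > cutoff    # ties at the median fall in the low group (assumption)
```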
Results
Derivation of a 5-gene signature predictive of survival in lung adenocarcinoma
We have previously identified genes that are expressed differentially among cells constituting an in vitro model of lung carcinogenesis (22). Functional pathways analysis of genes differentially expressed between the previously studied lung tumorigenic (1170-I) and normal lung epithelial cells helped us identify a significantly modulated gene-interaction network composed of 6 key genes based on level of modulation and number of interactions with neighboring molecules (22). Here, we sought to test the relevance of these 6 genes to NSCLC prognosis. The lung adenocarcinoma microarray data sets from the NCI Director's Challenge study (19) were used as a training set (n = 442). Lung adenocarcinomas from the Duke (15) and Harvard (10) cohorts served as a combined adenocarcinoma validation set, whereas SCCs from the studies by Raponi and colleagues (17) and Bild and colleagues (15) served as an SCC test set (Fig. 1A). An LOOCV approach, described in more detail in the “Methods,” was used to train the 6 genes based on class separation of high versus low expression identified by cluster analysis (not shown). The number of genes was reduced to 5 following application of a t-test P < 10⁻⁵ statistical cut-off to minimize misclassification during LOOCV (Fig. 1A), giving rise to a 5-gene signature, which we have designated as FILM. The FILM signature was efficacious as indicated by the sensitivity and specificity of the 6 prediction algorithms (Supplementary Table S1). Overall survival of all-stages (P < 0.001) and stage-I only (P < 0.001) lung adenocarcinoma patients predicted to express higher levels of the FILM signature was significantly poorer compared with patients with lower expression of FILM using linear discriminant analysis (LDA) for cross-validation (Fig. 1B). Similar findings were obtained using the other methods outlined in Supplementary Table S1 (data not shown).
We integrated genes (n = 584) we had previously found to be differentially expressed between normal and lung tumorigenic cells (22) with the Shedden and colleagues data set which in turn subdivided patients into 2 clusters with significant differences in survival (data not shown). To compare to the performance of the FILM signature, we also analyzed the 584 genes using similar approaches and derived a 5-gene signature by recursive feature selection and a fold difference in expression of at least 2 between 2 classes identified by cluster analysis (data not shown). The recursive feature-generated 5-gene signature exhibited reduced sensitivity and specificity compared with the FILM signature (data not shown). These findings demonstrate the effectiveness of the FILM signature, despite being selected a priori, in predicting the survival of lung adenocarcinoma patients.
The FILM signature does not predict survival of lung SCC patients
We then examined the capacity of the FILM signature to predict survival in lung SCC. A similar strategy to that depicted in Figure 1A was employed except that a pooled SCC test set (n = 183) was used. All 6 prediction algorithms showed that the FILM signature was unable to predict survival in lung SCC (Fig. 1C and data not shown). Similar results were obtained when only lung SCC patients were used in the training and test sets (data not shown). These findings exemplify the prognostic specificity of the FILM signature for lung adenocarcinoma.
Mortality risk assessed by the FILM expression signature predicts survival in all stages or stage-I lung adenocarcinoma
We then sought to further validate the robustness of the FILM expression signature in predicting survival in lung adenocarcinoma. We used a strategy similar to what was described before (26) to estimate mortality risk based on computation of risk scores. Lung adenocarcinomas from the Shedden and colleagues (19) study were used as a training set whereas adenocarcinomas from the Harvard (10) and Duke (15) cohorts were pooled as a test set (Fig. 2A). We developed risk scores for patients using the Cox regression coefficients of the genes in the FILM signature and their normalized expression data in the training cohort and patients were dichotomized by the median (50%) risk score (Fig. 2B) or were divided into tertiles and 3 risk groups (low, intermediate, and high; Supplementary Fig. S1) as described before (19). Lung adenocarcinoma patients identified to be at high risk based on FILM signature exhibited significantly worse survival than patients at low risk (P = 5.9 × 10⁻⁵; Fig. 2C). For validation of the risk score model, Cox regression coefficients and dichotomization cut-off threshold generated from the training cohort were directly applied to the validation set (n = 183). All stages (P < 0.001) or stage-I (P = 0.001) lung adenocarcinoma patients predicted to be at high risk displayed significantly worse survival (Fig. 2D). Similar results were obtained when patients in the training sets were divided into tertile risk groups (Supplementary Fig. S1) along with improved capacity of the FILM transcript signature to separate stage-I but not all stages lung adenocarcinoma patients with poor survival from those with excellent survival (Supplementary Fig. S1C and D).
Multivariate Cox proportional hazard regression analyses of patients divided into 3 risk groups revealed that mortality risk assessed by the FILM transcript signature was, along with stage, an independent predictor of survival (P = 0.03, HR = 1.8, 95% CI = 1.06–3.1) in all-stages lung adenocarcinoma patients (Supplementary Table S2). Importantly, in stage-I only patients, high risk estimated by the FILM signature was an independent predictor of survival (P < 0.001, HR = 4.5, 95% CI = 1.98–10.16) similar in significance to stage-IA versus IB disease (P = 0.001, HR = 0.135, 95% CI = 0.04–0.45; Supplementary Table S2). Survival probability analysis was then adjusted by inclusion of the clinical covariates age and gender in all-stages and stage-I test set patients, similar to what was described earlier by Shedden and colleagues (19). Inclusion of age and gender as covariates enhanced the capacity of the FILM transcript signature to separate patients in the test set with good survival from those with poorer survival, most notably when analyzing events after 70 months and in particular in stage-I patients (Supplementary Fig. S2B and C). In contrast, the FILM signature performed similarly with or without clinical covariates when analyzed in the Director's Challenge study utilized as a training set (Supplementary Fig. S2A).
We then asked whether there is an association between mortality risk assessed by the FILM signature and survival in stage-IA and -IB patients separately. Stage-IB but not stage-IA lung adenocarcinoma patients, from the data sets of the NCI Director's Challenge, Duke, and Harvard cohorts combined, at high risk of mortality as predicted by the FILM signature exhibited significantly poorer overall survival than patients at low risk (P < 0.001, HR = 2.1, 95% CI = 1.4–3.2; Supplementary Fig. S3A). All stage-IA and -IB patients were included in the analysis as the therapy status of patients in the Duke and Harvard cohorts is not available. We found similar results, albeit less pronounced, when we analyzed stage-IA and -IB lung adenocarcinoma patients from the NCI Director's Challenge data set who are known not to have been treated with any form of therapy (Supplementary Fig. S3B). These findings further validate the robustness of the FILM expression signature in predicting survival in lung adenocarcinoma and demonstrate the signature's capacity in identifying a subpopulation of stage-I patients with poor prognosis.
The corresponding FILM protein signature also predicts survival of nontreated all stages or stage-I lung adenocarcinoma patients
We then explored the prognostic capacity of the FILM signature in NSCLC at the protein level because the expression of transcripts and corresponding encoded proteins do not always correlate (31). We analyzed the protein expression of the FILM signature by immunohistochemistry analysis in FFPE histologic tissue specimens obtained from 156 and 79 lung adenocarcinoma and SCC patients, respectively, and who did not receive any therapy before or after tumor resection (Fig. 3A). A combined total immunoreactivity score for FILM protein expression was computed, and patients were then dichotomized based on the median FILM protein score. In accordance with the findings obtained with the transcript signature, all-stages (P < 0.001, HR = 3.6, 95% CI = 1.9–6.8) or stage-I only (P < 0.001, HR = 3.5, 95% CI = 1.7–7.5) lung adenocarcinoma patients with higher expression of FILM protein exhibited significantly poorer survival compared with patients with lower expression. In contrast, the protein signature was again not prognostic in lung SCC (Fig. 3B). To confirm the prognostic capacity of the FILM protein signature, all stages (Fig. 3C) or stage-I (Fig. 3D) lung adenocarcinoma patients were analyzed by hierarchical cluster analysis by average linkage based on the centered expression of the immunoreactivity scores of FILM protein expression. Two clusters were identified with dissimilar expression of FILM protein. All stages or stage-I only lung adenocarcinoma patients in the cluster with higher FILM protein (high cluster) exhibited significantly poorer survival compared with patients in the low FILM protein cluster (both P < 0.001; Fig. 3C and D).
We then sought to further confirm the robustness of the FILM protein signature in predicting the survival of lung adenocarcinoma patients. Lung adenocarcinoma patients were randomized into a training (n = 78) set and an all-stages (n = 78) or stage-I only (n = 62) test set (Fig. 4A). As performed with the gene expression signature, risk scores were computed based on the FILM proteins' centered immunoreactivity scores and Cox coefficients in the training set (Fig. 4A). Patients were then dichotomized based on the median (50%) risk score into high-risk versus low-risk groups (Fig. 4B) or into tertiles (low-, intermediate-, and high-risk groups; data not shown) and subsequently analyzed for survival differences. Lung adenocarcinoma patients in the training set predicted to be at high mortality risk based on FILM protein exhibited significantly poorer survival than patients at low risk (P = 0.01; Fig. 4C). Cox regression coefficients and dichotomization cut-off threshold generated from the training set were then directly applied to the test set (n = 78) or only stage-I patients within the set (n = 62) and analyzed similarly. In accordance, all-stages (P = 0.001, HR = 4.0, 95% CI = 1.6–9.8) or stage-I (P = 0.01, HR = 3.4, 95% CI = 1.4–9.1) lung adenocarcinoma patients predicted to be at high mortality risk based on FILM protein exhibited significantly poorer survival (Fig. 4D and Table 1). Separation of patients by the FILM protein risk score into tertiles did not increase its capacity for survival prediction (data not shown). Moreover, multivariate Cox hazard regression analyses demonstrated that risk assessed by FILM protein was an independent predictor of survival in all stages (P = 0.005, HR = 5.78, 95% CI = 1.7–20), comparable in significance to the most significant variable, age (P = 0.002, HR = 1.1, 95% CI = 1.03–1.15; Table 1).
In addition, FILM protein risk was also an independent, and better than IB stage, predictor of survival in stage-I patients (P = 0.01, HR = 6, 95% CI = 1.4–26.2; Table 1). Inclusion of clinical covariates (age and gender) enhanced the capacity of the FILM protein signature to separate patients with poor survival from those with excellent survival (Supplementary Fig. S4). After adjustment of the survival probability plots for age and gender, no deaths were noted in low-risk patients from 50 months of follow-up onward, in contrast to the survival analysis without the clinical covariates.
Variable | Univariate HR (95% CI) | Univariate P | Multivariate HR (95% CI) | Multivariate P |
---|---|---|---|---|
All stages | | | | |
High risk | 3.998 (1.575–9.846) | 0.001 | 5.776 (1.668–20.002) | 0.005 |
Stage III vs. I, II, or IV | 2.505 (0.731–8.583) | 0.144 | 7.011 (1.467–33.506) | 0.018 |
Poor differentiation | 1.401 (0.580–3.383) | 0.453 | 0.814 (0.176–3.762) | 0.793 |
Solid subtype | 1.007 (0.994–1.020) | 0.275 | 1.003 (0.983–1.023) | 0.777 |
BAC subtype | 0.997 (0.983–1.012) | 0.71 | 1.014 (0.995–1.034) | 0.14 |
Never vs. ever smoker | 0.969 (0.386–2.432) | 0.946 | 1.366 (0.419–4.453) | 0.605 |
Male gender | 1.816 (0.823–4.003) | 0.139 | 1.226 (0.491–3.062) | 0.662 |
Age | 1.032 (0.987–1.079) | 0.161 | 1.088 (1.031–1.149) | 0.002 |
Stage-I | | | | |
High risk | 3.359 (1.392–9.110) | 0.011 | 6.049 (1.399–26.154) | 0.01 |
Stage-IB vs. stage-IA | 2.556 (0.917–7.124) | 0.072 | 3.154 (0.977–10.191) | 0.055 |
Poor differentiation | 2.135 (0.773–5.900) | 0.143 | 1.034 (0.187–5.722) | 0.97 |
Solid subtype | 1.010 (0.996–1.025) | 0.161 | 0.997 (0.972–1.026) | 0.921 |
BAC subtype | 0.996 (0.998–1.012) | 0.646 | 1.007 (0.988–1.026) | 0.459 |
Never vs. ever smoker | 0.728 (0.210–2.517) | 0.616 | 0.770 (0.148–4.024) | 0.757 |
Male gender | 2.210 (0.871–5.604) | 0.095 | 1.940 (0.597–6.289) | 0.269 |
Age | 1.037 (0.984–1.094) | 0.173 | 1.060 (0.990–1.135) | 0.09 |
Table 1. Univariate and multivariate Cox proportional hazards regression analysis.
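For readers relating the tabulated hazard ratios to the underlying Cox coefficients, the sketch below shows the standard conversion: HR = exp(β), with a 95% interval of exp(β ± 1.96 × SE). It assumes Wald-type intervals, which the text does not state explicitly; the function names are hypothetical, and the numbers are the multivariate "High risk" row for all stages.

```python
import math

def hazard_ratio_ci(beta, se, z=1.96):
    """Convert a Cox coefficient and its standard error to HR and a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=1.96):
    """Back-calculate the standard error of beta from a reported interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Reported values: HR 5.776 (95% CI, 1.668 to 20.002).
reported_hr, reported_lower, reported_upper = 5.776, 1.668, 20.002
beta = math.log(reported_hr)
se = se_from_ci(reported_lower, reported_upper)
hr, lower, upper = hazard_ratio_ci(beta, se)

# Under the Wald assumption the HR is the geometric mean of the interval bounds.
geometric_mean = math.sqrt(reported_lower * reported_upper)
```

The interval is symmetric on the log scale, which is why wide upper bounds such as 20.002 can accompany a modest lower bound in small test sets.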
Because the FILM protein signature was effective in stage-I lung adenocarcinoma prognosis, we tested its prognostic capacity in stage-IA and -IB patients separately. When patients were dichotomized based on mortality risk computed by FILM FFPE protein signature, stage-IB but not stage-IA patients with higher risk exhibited significantly poorer survival than patients at low risk (P < 0.001, HR = 4.2, 95% CI = 1.8–9.9; Fig. 5A). Similarly, when patients were divided into tertile groups of low, intermediate, and high risk, FILM protein signature was able to further separate stage-IB patients with poor survival from those with excellent survival (P < 0.001, HR = 7.26, 95% CI = 2.40–21.98; Fig. 5B). These results demonstrate that the FILM protein signature, like its transcript version, is valuable for predicting the survival of lung adenocarcinoma patients and that the protein signature may be valuable for identifying stage-I or -IB patients who may benefit from adjuvant therapy.
Discussion
We previously identified genes differentially expressed among cells constituting an in vitro model of lung carcinogenesis (22). Functional pathway analysis of the differentially expressed genes highlighted a significantly modulated gene-interaction network composed of 6 key genes (UBE2C, MCM2, MCM6, FEN1, TPX2, and SFN) that were associated with poor survival in lung adenocarcinoma. In this study, we sought to validate and further examine the prognostic capacity of these genes using multiple methods, various publicly available expression data sets of lung adenocarcinomas and SCCs, and an additional set of FFPE NSCLCs. We derived a 5-gene signature, FILM (UBE2C, MCM2, MCM6, FEN1, TPX2), that was predictive of poor survival in lung adenocarcinoma, including in patients with stage-I disease. In contrast, the FILM signature was not prognostic in lung SCCs. In addition, we developed a risk model based on the FILM signature, with and without inclusion of clinical covariates (age and gender), and demonstrated that the mortality risk score of the FILM transcript signature was predictive of poor survival in lung adenocarcinoma. Moreover, we analyzed and validated the expression of the protein version of the FILM classifier by immunohistochemistry in a series of FFPE NSCLCs and found that the FILM protein signature, like the transcript signature, was associated with poor survival in lung adenocarcinomas but not in SCCs. Furthermore, risk assessed by the FILM immunohistochemical protein signature was a significant predictor of poor survival in all-stages or stage-I lung adenocarcinomas. Finally, we demonstrated that the FILM protein signature effectively identified a subset of nontreated stage-IB lung adenocarcinomas with poor prognosis that may benefit from adjuvant therapy.
The robustness of the FILM transcript signature was validated by deriving a risk score model from the genes' Cox coefficients and expression in the training set, similar to what was described previously for predicting recurrence in tamoxifen-treated node-negative breast cancer patients (26). Following direct application of the same Cox coefficients and dichotomization threshold from the training set to an independent test set, mortality risk assessed by the FILM expression signature was an independent predictor of poor survival in all-stages or stage-I lung adenocarcinomas. It is worth mentioning that mortality risk assessed by the FILM signature was significantly predictive of poor survival regardless of the identity of the training and validation cohorts (data not shown). In contrast, these findings were not replicated in lung SCCs. It is unclear why the FILM signature is prognostic in lung adenocarcinoma but not in SCCs. One possible explanation is that the molecular constituents of the FILM signature are significantly upregulated in 1170-I lung adenocarcinoma-forming cells (32) compared with normal bronchial epithelial cells (22). However, one cannot neglect the significant expression of the genes in lung SCCs. For example, TPX2 protein was shown to be upregulated in preneoplastic lesions representing lung SCC pathogenesis as well as in tumors (33). In addition, we have previously reported significant upregulation of UBE2C protein, assessed by immunohistochemistry, in lung SCCs compared with normal bronchial epithelia (22). Interestingly, UBE2C immunohistochemical expression was significantly higher in lung SCCs than in adenocarcinomas (22). It is plausible that increased expression of the FILM signature renders lung adenocarcinomas, but not SCCs, more clinically aggressive.
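The train-then-apply logic of such a risk score model can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the score is taken to be the sum of each gene's Cox coefficient multiplied by its expression value, the dichotomization threshold is assumed to be the median training-set score, and all coefficients and expression values are placeholders.

```python
from statistics import median

FILM_GENES = ["UBE2C", "MCM2", "MCM6", "FEN1", "TPX2"]


def risk_score(expression, coefficients):
    """Linear risk score: sum of Cox coefficient times expression per gene."""
    return sum(coefficients[g] * expression[g] for g in FILM_GENES)


def fit_threshold(training_profiles, coefficients):
    """Assumption: the dichotomization threshold is the median training score."""
    return median(risk_score(p, coefficients) for p in training_profiles)


def classify(profile, coefficients, threshold):
    """Apply the training-set coefficients and threshold unchanged."""
    return "high" if risk_score(profile, coefficients) > threshold else "low"


# Placeholder coefficients; real values would come from Cox regression
# fitted on the training cohort.
coefs = {g: 0.5 for g in FILM_GENES}

# Hypothetical training profiles with uniform expression levels 1, 2, and 3.
training = [{g: level for g in FILM_GENES} for level in (1.0, 2.0, 3.0)]
threshold = fit_threshold(training, coefs)

# Hypothetical test-set patients, classified without refitting anything.
test_high = {g: 4.0 for g in FILM_GENES}
test_low = {g: 1.0 for g in FILM_GENES}
print(threshold, classify(test_high, coefs, threshold), classify(test_low, coefs, threshold))
```

The key design point is that nothing is re-estimated on the test set: both the coefficients and the cutoff are frozen after training, which is what makes the test-set result an independent validation.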
The prognostic effectiveness of the FILM expression signature was validated at an additional level by assessing the immunohistochemical expression of the proteins encoded by the FILM classifier genes in an independent set of FFPE NSCLCs. We deemed this approach important because protein levels need not match transcript expression levels (31, 34). In clear accordance with the transcript signature, the FILM protein signature was prognostic in lung adenocarcinoma (all stages or stage-I) but not in SCC patients, and it was an independent predictor of poor survival in stage-I lung adenocarcinomas. However, it is noteworthy that the semiquantitative and subjective nature of immunohistochemistry analyses, unlike automated quantitative determination of histologic protein expression (35), limits the use of an immunohistochemical protein signature for prognostic purposes, for example by complicating independent validation by other groups or studies. Moreover, a shortcoming of a protein prognostic classifier derived from paraffin-embedded tissues is the possible variability in readings among different pathologists using the same or different tissue sets. Nevertheless, we attempted to overcome this potential limitation by averaging the scores of 3 tissue cores representing 3 different sites per tumor specimen to generate final immunohistochemical scores. Moreover, we demonstrated the prognostic capacity of the FILM protein signature using various analytical methods: patient dichotomization based on median immunohistochemical score, hierarchical clustering of patients, and derivation of a risk score model. In addition, the prognostic capacity of each protein within the FILM signature was analyzed independently and was in all cases weaker than that of the combined FILM protein signature.
Furthermore, and importantly, the results obtained using the transcript and protein versions of the FILM signature were nearly identical, demonstrating the reliability and biological significance of this signature in lung adenocarcinoma prognosis.
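The core averaging and median-based dichotomization mentioned above can be sketched in Python as follows. The patient identifiers and scores are made up, and the convention that scores strictly above the median count as high expression is an assumption made for illustration.

```python
from statistics import mean, median


def final_ihc_score(core_scores):
    """Average the scores of the tissue cores sampled from one tumor."""
    return mean(core_scores)


def dichotomize_by_median(scores_by_patient):
    """Label each patient high or low relative to the cohort median.

    Assumption: scores strictly above the median are labeled "high".
    """
    cutoff = median(scores_by_patient.values())
    return {
        pid: ("high" if score > cutoff else "low")
        for pid, score in scores_by_patient.items()
    }


# Hypothetical scores for 3 cores per tumor in four made-up patients.
cores = {
    "P1": [100, 120, 140],
    "P2": [10, 20, 30],
    "P3": [200, 210, 220],
    "P4": [50, 60, 70],
}
final_scores = {pid: final_ihc_score(c) for pid, c in cores.items()}
ihc_groups = dichotomize_by_median(final_scores)
print(final_scores, ihc_groups)
```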
We had previously derived global differential expression profiles between normal and tumorigenic lung epithelial cells, comprising genes that are also widely expressed in lung adenocarcinomas from the Shedden and colleagues data set (22). It is plausible that other 5-gene signatures developed from all genes differentially expressed in the in vitro model of lung carcinogenesis may be equally or more efficacious in prognosis than the FILM signature. To test this hypothesis, we derived a 5-gene signature by recursive feature selection, requiring a fold difference in expression of at least 2 between the classes identified by cluster analysis of the Shedden and colleagues data set following integration of the genes (data not shown). The FILM signature, despite being based on genes selected a priori, was superior to the 5-gene signature generated by recursive feature selection in sensitivity and specificity as well as in predicting poor survival in lung adenocarcinoma. The effectiveness of the FILM signature in lung adenocarcinoma prognosis may be due to the nature of its genes, which are proliferation-related despite acting at different points and through distinct mechanisms in cell cycle control (33, 36–42). It is noteworthy that in a large meta-analysis of breast tumors from several publicly available breast cancer expression data sets, Wirapati and colleagues demonstrated that all 9 prognostic signatures compared exhibited similar prognostic capacities, which were mostly driven by proliferation-related genes (43, 44). Moreover, in the NCI Director's Challenge study by Shedden and colleagues, the most effective gene classifiers in survival prediction (methods A and H as designated by the authors) were mainly composed of proliferation-related genes and performed well with or without clinical covariates.
Interestingly, all genes within the FILM signature were found in methods A and H developed by Shedden and colleagues, which may explain why the FILM transcript signature performed similarly with or without clinical covariates when analyzed in the Director's Challenge study.
In recent reviews of studies geared toward the development of prognostic expression signatures in NSCLC, Subramanian and Simon suggested guidelines for the design and testing of an expression signature for lung cancer prognosis, including the independent stratification of stage-IA and stage-IB NSCLC patients in an effort to identify patients who may benefit from adjuvant therapy (6, 43, 44). To this end, we analyzed stage-IA and stage-IB patients separately using the transcript and protein FILM signatures in publicly available microarray data sets and a tissue microarray of FFPE NSCLC specimens, respectively. We found that the FILM signature identified a subgroup of stage-IB, but not stage-IA, lung adenocarcinoma patients with poor prognosis, a finding that was consistent between the transcript signature analyzed in publicly available microarray data sets and the parallel protein signature analyzed in a tissue microarray of FFPE specimens. However, it is noteworthy that we analyzed, as depicted in Supplementary Figure S3, the entire set of stage-IA and stage-IB lung adenocarcinomas from the Director's Challenge (19), Duke (15), and Harvard (10) data sets, because the therapy status of all patients from the 2 latter cohorts and of some patients from the former is not known. Nevertheless, when analyzed only in patients from the Director's Challenge data set who were not treated with any form of therapy, the FILM transcript signature was also prognostic, albeit less strongly, in stage-IB lung adenocarcinoma. Furthermore, we analyzed the FILM protein signature in an FFPE tissue microarray with more complete and annotated clinicopathological information and found that the FILM protein signature effectively identifies nontreated stage-IB lung adenocarcinoma patients with poor prognosis, pointing to a potential benefit of adjuvant treatment in this patient population.
It is worth mentioning that the FILM protein signature, in comparison with its transcript counterpart, was more effective in identifying stage-I, and in particular stage-IB, lung adenocarcinoma patients with very poor survival and in separating them from patients with excellent survival. Therefore, it is plausible that the FILM protein signature may be clinically more useful for stage-I or specifically stage-IB lung adenocarcinoma prognosis. However, as mentioned above, the relatively more qualitative nature of immunohistochemistry analysis of FFPE tissues, and its more difficult external validation compared with real-time PCR assessment of RNA, limit the translation of FFPE protein signatures to the clinic. Until the capacity of the FILM protein signature is cross-validated in independent tissue microarrays and by different investigators, a combination of both mRNA and protein signatures may be powerful for stage-I lung adenocarcinoma prognosis.
In conclusion, our study describes the development and testing of an effective 5-gene signature for lung adenocarcinoma prognosis. This expression signature has several unique and valuable attributes: (1) it identifies stage-I lung adenocarcinomas with poor prognosis; (2) it exhibits prognostic specificity for lung adenocarcinomas only; (3) its protein variant can effectively predict survival in all-stages or stage-I lung adenocarcinoma following analysis of FFPE tissue specimens by immunohistochemistry; and (4) its protein FFPE variant is effective in identifying nontreated stage-IB lung adenocarcinoma patients with dismal prognosis who are most likely to benefit from adjuvant therapy. Therefore, further independent studies are warranted to externally validate the potential clinical use of this 5-gene signature and, in particular, the prognostic capacity of the FILM protein FFPE signature, toward translation to the clinic for identifying nontreated stage-I, and specifically stage-IB, lung adenocarcinoma patients with poor survival who may benefit from adjuvant therapy.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant Support
This work was supported in part by the Jeffrey Lee Cousins Fellowship in Lung Cancer Research (to H. Kadara), Department of Defense (grant W81XWH-04-1-0142 to W.-K. Hong, R. Lotan, and I. I. Wistuba), Jimmy Lane Hewlett Fund for Lung Cancer Research (to R. Lotan and I. I. Wistuba), and NCI lung cancer SPORE (P50 CA70907 to J. D. Minna and I. I. Wistuba).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.