Purpose: Blood-based surrogate markers would be attractive biomarkers for early detection, diagnosis, prognosis, and prediction of therapeutic outcome in cancer. Disease-associated gene expression signatures in peripheral blood mononuclear cells (PBMC) have been described for several cancer types. However, RNA-stabilized whole blood–based technologies would be clinically more applicable and robust. We evaluated the applicability of whole blood–based gene expression profiling for the detection of non–small cell lung cancer (NSCLC).

Experimental Design: Expression profiles were generated from PAXgene-stabilized blood samples from three independent groups consisting of NSCLC cases and controls (n = 77, 54, and 102), using the Illumina WG6-VS2 system.

Results: Several genes are consistently differentially expressed in whole blood between NSCLC patients and controls. These expression profiles were used to build a diagnostic classifier for NSCLC, which was validated in an independent validation set of NSCLC patients (stages I–IV) and hospital-based controls. The area under the receiver operating characteristic curve (AUC) was calculated to be 0.824 (P < 0.001). In a further independent dataset of stage I NSCLC patients and healthy controls the AUC was 0.977 (P < 0.001). Specificity of the classifier was validated by permutation analysis in both validation cohorts. Genes within the classifier are enriched in immune-associated genes and show specificity for NSCLC.

Conclusions: Our results show that gene expression profiles of whole blood allow for detection of manifest NSCLC. These results prompt further development of gene expression–based biomarker tests in peripheral blood for the diagnosis and early detection of NSCLC. Clin Cancer Res; 17(10); 3360–7. ©2011 AACR.

Translational Relevance

Our results show that gene expression profiles of whole blood samples can be used to detect non–small cell lung cancer in smokers. These results open the avenue for further development of gene expression–based biomarker tests in peripheral blood for the diagnosis and early detection of non–small cell lung cancer. In the future such a biomarker could be developed as a diagnostic tool supplementary to imaging.

Lung cancer is still the leading cause of cancer-related death worldwide. Prognosis has remained poor with a disastrous 2-year survival rate of approximately 15% due to diagnosis of the disease in late, that is, incurable stages in the majority of patients (1) and still disappointing therapeutic regimens in advanced disease (2). Thus, there is an urgent need to establish reliable tools for the identification of non–small cell lung cancer (NSCLC) patients at early stages of the disease, for example, prior to the development of clinical symptoms. Today, the only way to detect NSCLC is by means of imaging technologies detecting morphologic changes in the lung in combination with biopsy specimens taken for histologic examination. However, these screening approaches are not easily applied to secondary prevention of NSCLC in an asymptomatic population (3).

The use of surrogate tissue–based, for example, blood-based, biomarkers for NSCLC might therefore circumvent the known pitfalls of imaging technologies and invasive diagnostics (3, 4). Such biomarkers might be utilized to direct imaging-based and invasive screening approaches to only those individuals identified as potential NSCLC patients by biomarker screening.

Array-based assessment of disease-specific gene expression patterns in peripheral blood mononuclear cells (PBMC) has been reported for nonmalignant (5) and malignant diseases including renal cell carcinoma, melanoma, bladder, breast, and lung cancers (6–11). In some cases, gene expression profiles derived from PBMC were even suggested as promising tools for early detection (8, 11) or prediction of prognosis (6), although these findings have not yet been validated in independent studies. Furthermore, circumventing known pitfalls of analyzing PBMC in a clinical setting (12, 13) by using stabilized RNA derived from whole blood would further strengthen the validity of blood-based surrogate biomarkers for early diagnosis of lung cancer and other malignant diseases.

Using 3 independent datasets of patients and controls, we investigated the validity of whole blood–based gene expression profiling for the detection of NSCLC patients among smokers. We show that RNA-stabilized whole blood samples can indeed be used to identify NSCLC patients among hospital-based controls as well as healthy individuals.

Cases and controls

NSCLC cases and hospital-based controls were recruited at the University Hospital Cologne and the Lung Clinic Merheim, Cologne, Germany. Healthy blood donors were recruited at the Institute for Transfusion Medicine, University of Cologne. From all individuals PAXgene-stabilized blood samples were taken for blood-based gene expression profiling. For all NSCLC cases blood was taken before chemotherapy. To establish and validate an NSCLC-specific classifier, 3 independent sets of cases and controls were assembled. The training set (TS) comprised 77 individuals; 35 of those were NSCLC cases of stages I to IV admitted to the hospital with symptoms of NSCLC (coughing, dyspnea, weight loss, or reduction in general health state) and 42 were hospital-based controls with a comparable comorbidity but no prior history of lung cancer. The validation set 1 (VS1, n = 54) likewise contained 28 NSCLC cases of stages I to IV and 26 hospital-based controls. Overall, the hospital-based controls in TS and VS1 included individuals suffering from advanced chronic obstructive pulmonary disease (COPD) as typically seen in a population of heavily smoking adults (TS, n = 7; VS1, n = 5). Other diseases such as hypertension (TS, n = 17; VS1, n = 11) or other malignancies (TS, n = 10; VS1, n = 6) were also observed in the group of hospital-based controls. The validation set 2 (VS2, n = 102) contained 32 NSCLC cases that had documented stage I NSCLC and were diagnosed mostly during routine chest X-ray analyses or due to clinical workup of unspecific symptoms such as reduced general health status. All individuals had an Eastern Cooperative Oncology Group performance status of 0. In addition, VS2 contained 70 healthy blood donors without prior history of lung cancer. Detailed information on cases and controls is summarized in Table 1 and in Supplementary Table S1. The analyses were approved by the local ethics committee and all participants gave informed consent.

Table 1.

Clinical and epidemiologic characteristics of cases with NSCLC and respective controls

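| Characteristic | TS NSCLC | TS Controls | VS1 NSCLC | VS1 Controls | VS2 NSCLC | VS2 Controls |
|---|---|---|---|---|---|---|
| Total (a) | 35 | 42 | 28 | 26 | 32 | 70 |
| Female | 10 | 14 |  | 10 | 16 | 35 |
| Male | 25 | 28 | 19 | 16 | 16 | 35 |
| mA | 61 | 61 | 62 | 65 | 67 | 44 |
| Stage 1 |  |  |  |  |  |  |
| All (b) |  | NA |  | NA | 32 | NA |
| SCC |  |  |  |  |  |  |
| AC |  |  |  |  | 23 |  |
| LCC |  |  |  |  |  |  |
| Stage 2 |  |  |  |  |  |  |
| All (b) |  | NA |  | NA |  | NA |
| SCC |  |  |  |  |  |  |
| AC |  |  |  |  |  |  |
| LCC |  |  |  |  |  |  |
| Stage 3 |  |  |  |  |  |  |
| All (b) | 17 | NA | 12 | NA |  | NA |
| SCC |  |  |  |  |  |  |
| AC | 12 |  |  |  |  |  |
| LCC |  |  |  |  |  |  |
| Stage 4 |  |  |  |  |  |  |
| All (b) |  | NA |  | NA |  | NA |
| SCC |  |  |  |  |  |  |
| AC |  |  |  |  |  |  |
| LCC |  |  |  |  |  |  |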

Abbreviations: mA, median age; NA, not applicable; SCC, squamous cell cancer; AC, adenocarcinoma; LCC, large cell carcinoma.

(a) Total number of cases or controls per dataset.

(b) All cases.

Blood collection, cRNA synthesis, and array hybridization

Blood (2.5 mL) was drawn into PAXgene vials. After RNA isolation, biotin-labeled cRNA preparation was carried out by using the Ambion Illumina RNA amplification kit (Ambion) or Epicentre TargetAmp Kit (Epicentre Biotechnologies) and Biotin-16-UTP (10 mmol/L; Roche Molecular Biochemicals) or Illumina TotalPrep RNA Amplification Kit (Ambion). Biotin-labeled cRNA (1.5 μg) was hybridized to Sentrix whole genome bead chips WG6 version 2 (Illumina) and scanned on the Illumina BeadStation 500x. For data collection, we used Illumina BeadStudio 3.1.1.0 software. Data are available at http://www.ncbi.nlm.nih.gov/geo/ (GSE12771).

Quality control

For RNA quality control, the ratio of the optical density (OD) at wavelengths of 260 and 280 nm was calculated for all samples and was between 1.85 and 2.1. To determine the quality of cRNA, a semiquantitative reverse transcriptase PCR amplifying a 5′ and a 3′ product of the β-actin gene was used as previously described (14); both the 5′ and the 3′ product were present, showing no sign of degradation. All expression data presented in this article were of high quality. Quality of RNA expression data was controlled by several separate tools. First, we performed quality control by visual inspection of the distribution of raw expression values. To this end, we constructed pairwise scatter plots of expression values from all arrays (R-project version 2.8.0; ref. 15). For data derived from an array of good quality, a high correlation of expression values is expected to lead to a cloud of dots along the diagonal. In all comparisons the r² was more than 0.95. Second, the present call rate was high in all samples. Finally, we conducted quantitative quality control. Here, the absolute deviation of the mean expression value of each array from the overall mean was determined (R-project version 2.8.0; ref. 15). In short, the mean expression value for each array was calculated. Next, the mean of these mean expression values (overall mean) was taken and the deviation of each array mean from the overall mean was determined (analogous to probe outlier detection used by Affymetrix before expression value calculation; ref. 16). The deviation was less than 28 for all samples.
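
The quantitative check described above can be sketched in a few lines. This is a minimal illustration in Python, not the R code used for the study; the function name `array_mean_deviations`, the dictionary input layout, and the toy values are assumptions made for this example.

```python
from statistics import mean

def array_mean_deviations(arrays):
    """Absolute deviation of each array's mean expression from the overall mean.

    `arrays` maps a sample name to its list of expression values
    (a hypothetical layout chosen for this sketch).
    """
    array_means = {name: mean(values) for name, values in arrays.items()}
    overall_mean = mean(list(array_means.values()))  # mean of the per-array means
    return {name: abs(m - overall_mean) for name, m in array_means.items()}

# Toy values, not study data.
example = {"s1": [100, 200], "s2": [110, 210], "s3": [90, 190]}
deviations = array_mean_deviations(example)

# Arrays at or above the deviation of 28 mentioned in the text would be flagged.
flagged = [name for name, d in deviations.items() if d >= 28]
```

In this toy input no array is flagged, because every array mean lies within 10 units of the overall mean.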

Classification algorithm

An overview of the experimental design is depicted in Figure 1. Expression values were independently quantile normalized. The classifier for NSCLC was built and optimized on the basis of the TS (n = 77; 35 NSCLC cases stages I–IV and 42 hospital-based controls) by using a 10-fold cross-validation design. Briefly, TS was divided 10 times into an internal training and an internal validation set in a ratio of 9:1 (distribution to internal validation group, see Supplementary Table S1). In the internal TS, the differentially expressed genes between NSCLC cases and controls were calculated by a t test. Next, 36 different feature lists were extracted from this list of differentially expressed genes by sequentially increasing the cutoff of the P value 36 times (P = 0.00001, P = 0.00002, P = 0.00003, …, P = 0.08, P = 0.09, P = 0.1). Subsequently, for each of the resulting 36 feature lists, 3 different learning algorithms [support vector machine (SVM), linear discriminant analysis (LDA), and prediction analysis for microarrays (PAM)] were trained on the internal TS and used to calculate the probability score for each case of the respective internal validation set. This approach was repeated 10 times according to the 10 dataset splittings of this 10-fold cross-validation, with each of the 10 split datasets used once as internal validation set. For each of the 10 cross-validation steps the area under the receiver operating characteristic curve (AUC) was calculated for the internal validation set. For each of the 36 cutoffs the mean of the 10 AUCs was calculated. The optimal cutoff P value of the t statistics and the optimal classification algorithm were selected according to the maximum mean AUC reached by any of the 3 algorithms (Fig. 2). We subsequently built a classifier by using the respective cutoff P value of the t statistics and the selected algorithm in the TS.
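
The structure of this cross-validation (feature selection repeated inside every internal training set, one AUC per internal validation set, then the mean over the 10 AUCs) can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: a nearest-centroid score stands in for SVM, LDA, and PAM; features are chosen as the top k by absolute Welch t statistic rather than by a P value cutoff; folds are stratified by class; and all function names are hypothetical.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic for two lists of values (assumed t test variant)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def auc(scores, labels):
    """Rank-based AUC: chance that a case outscores a control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))

def select_features(X, y, k):
    """Indices of the k features with the largest |t| between cases and controls."""
    t_abs = []
    for j in range(len(X[0])):
        cases = [row[j] for row, lab in zip(X, y) if lab == 1]
        controls = [row[j] for row, lab in zip(X, y) if lab == 0]
        t_abs.append(abs(welch_t(cases, controls)))
    return sorted(range(len(t_abs)), key=lambda j: -t_abs[j])[:k]

def centroids(X, y, features):
    """Per-class mean profile over the selected features (0 = control, 1 = case)."""
    result = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        result[label] = [mean(row[j] for row in rows) for j in features]
    return result

def case_score(sample, features, cents):
    """Distance to the control centroid minus distance to the case centroid."""
    def dist(c):
        return math.sqrt(sum((sample[j] - cj) ** 2 for j, cj in zip(features, c)))
    return dist(cents[0]) - dist(cents[1])

def cross_validated_auc(X, y, k, n_folds=10, seed=0):
    """Mean AUC over folds; needs at least n_folds cases and n_folds controls."""
    rng = random.Random(seed)
    cases = [i for i, lab in enumerate(y) if lab == 1]
    controls = [i for i, lab in enumerate(y) if lab == 0]
    rng.shuffle(cases)
    rng.shuffle(controls)
    fold_aucs = []
    for f in range(n_folds):
        held_out = cases[f::n_folds] + controls[f::n_folds]
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        features = select_features(train_X, train_y, k)  # uses training data only
        cents = centroids(train_X, train_y, features)
        scores = [case_score(X[i], features, cents) for i in held_out]
        fold_aucs.append(auc(scores, [y[i] for i in held_out]))
    return mean(fold_aucs)
```

Calling `cross_validated_auc` once per candidate feature-list size (or, in the study's design, once per P value cutoff and algorithm) and keeping the setting with the highest mean AUC mirrors the selection step described above. Selecting features inside each fold, rather than once on the whole TS, keeps the internal validation samples out of the feature selection.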
To further control for overfitting (17), the classifier was validated in 2 independent validation sets [VS1, comprising 28 NSCLC cases (stages I–IV) and 26 hospital-based controls; VS2, comprising 32 NSCLC cases (stage I) and 70 healthy controls]. The AUC was used to measure the quality of the classifier. In addition, we determined a threshold of the test score in the TS to evaluate sensitivity and specificity in the validation sets. In order not to miss a potential case with NSCLC, we maximized the sensitivity to detect NSCLC while requiring a minimum specificity (18). This specificity was defined to be at least 0.5 in its 95% CI. Of note, the threshold fulfilling these criteria was determined in TS. Subsequently, all individuals in VS1 and VS2 reaching an equal or higher test score than the TS-based threshold score were diagnosed as NSCLC cases and all others were diagnosed as controls. The sensitivity and specificity of this diagnostic test and their 95% CIs were estimated for VS1 and VS2 (19). In addition, we compared the probability scores of being an NSCLC case between cases and controls by using t statistics. To test the specificity of the classifier, the whole analysis was repeated 1,000 times by using random feature sets of equal size. For visualization of the test score obtained by the SVM algorithm we used the following transformation: log2(score + 1) + 0.1.
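
The threshold rule can be illustrated with a short sketch: among candidate thresholds, keep those whose specificity has a 95% CI that still reaches 0.5, then take the one with the highest sensitivity. The Wilson score interval used here is an assumption (the text cites ref. 19 for the interval estimate without naming the method in this section), as are the function names and the tie-break on specificity.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def sensitivity_specificity(scores, labels, threshold):
    """A score >= threshold is called a case, as in the text."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    return tp / n_cases, tn / n_controls, tn, n_controls

def choose_threshold(scores, labels, min_specificity=0.5):
    """Return (threshold, sensitivity, specificity) maximizing sensitivity.

    Only thresholds whose specificity CI upper bound is at least
    `min_specificity` are eligible; ties go to the higher specificity.
    """
    best = None
    for t in sorted(set(scores)):
        sens, spec, tn, n_controls = sensitivity_specificity(scores, labels, t)
        _, upper = wilson_interval(tn, n_controls)
        if upper < min_specificity:
            continue  # CI of the specificity lies entirely below the minimum
        if best is None or (sens, spec) > (best[1], best[2]):
            best = (t, sens, spec)
    return best

# Toy training scores (1 = case, 0 = control), not study data.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.5, 0.3, 0.1]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_choice = choose_threshold(toy_scores, toy_labels)
```

The threshold chosen this way on the training data is then applied unchanged to the validation sets, which is what keeps the reported validation sensitivity and specificity free of threshold tuning.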

Figure 1.

Experimental design. The optimal classifier was established in the TS and then applied to 2 validation datasets (VS1 and VS2). To test the specificity of this optimized classifier, an additional 1,000 classifiers built from random feature lists of equal size were applied to VS1 and VS2.

Figure 2.

A, identification of the optimal algorithm for classification based on the TS. The mean AUC is plotted against the cutoff P value of the t statistics for feature selection for all 3 algorithms (SVM, LDA, PAM) in the 10-fold cross-validation of the TS. SVM leads to the highest mean AUC (0.754) at a cutoff P value of 0.003, which is highlighted by a dotted line. B, fold changes of the genes with the most significant changes (P < 0.003, fold change >1.3 or <0.7, absolute difference >80) are shown. All transcripts used in the optimized classifier are given in Supplementary Table S2.

Data mining

To investigate gene ontology (GO) of transcripts used for the classifier we carried out GeneTrail analysis for over- and underexpressed genes (20). To this end, we analyzed the enrichment of genes in the classifier relative to all genes present on the whole array. We analyzed under- and overexpressed genes by using the hypergeometric test with a minimum of 2 genes per category.
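
For orientation, the hypergeometric enrichment test for a single category can be written out directly. This is a generic sketch and does not reproduce GeneTrail itself; the function name and argument names are chosen for this example.

```python
from math import comb

def hypergeom_enrichment_p(n_array, n_category, n_list, n_overlap):
    """P(X >= n_overlap) for a gene list drawn without replacement.

    n_array    : genes on the whole array (background)
    n_category : background genes annotated to the category
    n_list     : genes in the list under study (e.g., the classifier)
    n_overlap  : list genes that fall into the category
    """
    upper = min(n_category, n_list)
    favorable = sum(
        comb(n_category, i) * comb(n_array - n_category, n_list - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_array, n_list)

# Toy numbers: 10 background genes, 4 in the category, list of 3 with 2 hits.
p_toy = hypergeom_enrichment_p(10, 4, 3, 2)
```

With many categories tested at once, the resulting P values need a multiple-testing correction before they are interpreted.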

In addition, we carried out data mining by gene set enrichment analysis (GSEA; ref. 21). As indicated, we compared the respective list of genes obtained in our expression profiling experiment with datasets deposited in the Molecular Signatures Database (MSigDB). The power of gene set analysis is derived from its focus on groups of genes that share common biological functions. In GSEA, an overlap between predefined lists of genes and the newly identified genes can be identified by using a running-sum statistic that leads to attribution of a score. The significance of this score is tested by using a permutation design that is adapted for multiple testing (21). Groups of genes, called gene sets, are deposited in the MSigDB and ordered in different biological dimensions such as cancer modules, canonical pathways, miRNA targets, and GO terms (http://www.broadinstitute.org/gsea/msigdb/index.jsp). In our analysis we focused on cancer modules. The cancer modules integrated into the MSigDB are derived from a compendium of 1,975 different published microarrays spanning several different tumor entities (22).
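
The idea of a running-sum statistic can be shown with a deliberately simplified, unweighted version: walking down a ranked gene list, the sum steps up at every gene that belongs to the gene set and down at every gene that does not, and the score is the largest excursion from zero. GSEA as published additionally weights each hit by the gene's correlation with the phenotype and assesses significance by permutation; neither is reproduced here, and the function name is hypothetical.

```python
def running_sum_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (maximum deviation from zero)."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the signed extreme of the walk
    return best

# Toy ranking: set members at the top give a positive score, at the bottom a negative one.
top_score = running_sum_score(["a", "b", "c", "d"], {"a", "b"})
bottom_score = running_sum_score(["a", "b", "c", "d"], {"c", "d"})
```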

Establishment of a gene expression profiling-based classifier for blood-based diagnosis of NSCLC

The classifier was built on the basis of an initial TS containing 35 NSCLC cases of different stages (stage I, n = 5; stage II, n = 5; stage III, n = 17; stage IV, n = 8) and 42 hospital-based controls suffering in part from severe comorbidities such as COPD, hypertension, cardiac diseases, and malignancies other than lung cancer. We first evaluated 3 different approaches, namely SVM, LDA, and PAM, to identify the best algorithm to build a classifier for the diagnosis of NSCLC in a 10-fold cross-validation design. To this end we used 36 different feature lists extracted from the list of differentially expressed genes according to 36 different cutoff P values of the t statistics. In this setting, the SVM algorithm performed best by reaching the highest AUC (mean AUC = 0.754) at a cutoff P value of the t statistics of 0.003 (Fig. 2A). Thus, for subsequent classification we applied SVM by using the 484-feature list obtained at a cutoff P value of the t statistics of P ≤ 0.003 for differentially expressed genes between cases and controls based on the entire TS. Fold changes of genes with the most significant P values are shown in Figure 2B and all transcripts used in the classifier are summarized in Supplementary Table S2. We next maximized the sensitivity of the classifier while requiring the 95% CI of the specificity to still contain 0.5. Using these criteria, the threshold of the test score was determined to be 0.082. At this threshold, the sensitivity was 0.91 (95% CI, 0.75–0.97) and the specificity 0.38 (95% CI, 0.23–0.54), that is, with the 95% CI containing 0.5.

The diagnostic NSCLC classifier can be used to detect NSCLC cases in an independent validation set of NSCLC cases and hospital-based controls

First we tested whether the classifier can be used to discriminate NSCLC cases of early and advanced stages from hospital-based controls. Therefore, in the first independent validation set cases and controls were chosen in a similar setting as in the TS, that is, patients with NSCLC stages I to IV and clinical symptoms associated with lung cancer and hospital-based controls with relevant comorbidities (n = 26). The AUC for the diagnostic test of NSCLC in this first validation set was calculated to be 0.824 (P < 0.001; Fig. 3A). In addition, probability scores were significantly different between cases and controls (P < 0.001, t test). Using the threshold determined in TS we observed a sensitivity of 0.61 (95% CI, 0.41–0.78) and a specificity of 0.85 (95% CI, 0.64–0.95) in VS1. Considering only patients with stage III/IV NSCLC (n = 20) in VS1, the sensitivity was 0.70 (95% CI, 0.46–0.87) and the specificity 0.85 (95% CI, 0.64–0.95; data not shown). We observed that 3 out of 3 stage I NSCLC cases had a low score in this cohort of patients with a high degree of comorbidity (Fig. 3E). Thus, patients with NSCLC of advanced stages in VS1 were identified among hospital-based controls by using the threshold determined in the TS.

Figure 3.

Performance of the optimized classifier in VS1 and VS2. The classifier established in the TS was applied to VS1 and VS2, respectively, using SVM. A, receiver operating characteristic (ROC) curve for the optimized classifier applied to VS1 (NSCLC patients of all stages and hospital-based controls); AUC = 0.824, P < 0.001. B, the box plot comprises 1,000 AUCs obtained by using a random list of 484 features to build the classifier in TS and then applying it to VS1. The real AUC using the specific classifier (as in A) is depicted. C, ROC curve for the optimized classifier applied to VS2 (stage I NSCLC patients, healthy controls); AUC = 0.977 (P < 0.001). D, the box plot comprises AUCs obtained by applying 1,000 randomly permuted classifiers of the same feature size to VS2. The real AUC using the specific classifier (as in C) is depicted. E, test scores of all samples from VS1 and VS2, ranked. NSCLC cases are marked in red and controls in blue. Cases with stage I NSCLC are indicated by ▾. Membership in a specific cohort is indicated by a vertical line underneath the graph. A line is drawn at the threshold defined in TS.

The diagnostic NSCLC classifier identifies stage I NSCLC patients in an independent second validation set comprising stage I NSCLC cases and healthy blood donors

After showing that the classifier can be used to detect NSCLC cases among individuals with comorbidities, we also investigated whether this test can be used to distinguish NSCLC cases presenting at stage I with no or only minor symptoms from healthy individuals. Therefore, we recruited a second independent validation set consisting of 32 NSCLC cases at stage I and 70 healthy blood donors (VS2). By applying the identical classifier to VS2 the AUC was determined to be 0.977 (P < 0.001; Fig. 3C). Again, the classifier was used as a diagnostic test by applying the TS-based threshold of the test score. At this threshold the sensitivity was 0.97 (95% CI, 0.82–0.99) and the specificity 0.89 (95% CI, 0.78–0.95). We also observed a highly significant difference between cases and controls in the probability score of being an NSCLC patient (P < 0.001, t test). Healthy controls without significant comorbidity (VS2) tended to have lower probability scores than hospital-based controls (VS1 and TS), although this finding was not statistically significant (Fig. 3E). Of note, the difference in the probability score between healthy controls and patients with stage I lung cancer was more pronounced than the difference between patients with NSCLC stage III/IV and hospital-based controls with a similarly high load of comorbidity.

Permutation test to analyze the specificity of the classifier

To further underline the specificity of this classifier, we used 1,000 random feature lists, each comprising 484 features, to likewise build SVM-based classifiers in the TS, which were then applied to VS1 and VS2, respectively. For VS1, the mean AUC obtained by using these random feature lists was 0.49 (range, 0.1346–0.8633) with only 2 AUCs being 0.824 or more, the AUC obtained by using the NSCLC-specific classifier (Fig. 3B). This corresponds to a P value of less than 0.002 for the permutation test, further confirming the specificity of the NSCLC classifier. Similarly, when the permuted classifiers were applied to VS2, only 1.8% of random feature lists led to an AUC ≥ 0.977, the AUC obtained by using the NSCLC-specific classifier (Fig. 3D). Furthermore, by merging TS and VS1 and randomly splitting the merged dataset into a new TS′ and VS1′, it could be shown that highly specific classifiers can be built independently of the initial composition of the TS (data not shown). In conclusion, an NSCLC-specific blood-based classifier was built that was successfully used to identify NSCLC cases among hospital-based controls as well as NSCLC cases of early stage among healthy individuals.
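
The empirical P value of such a test is simply the share of random-feature classifiers that do at least as well as the real one. A minimal sketch, with a hypothetical function name and made-up AUC values:

```python
def permutation_p_value(observed_auc, random_aucs):
    """Fraction of random-feature classifiers whose AUC reaches the observed AUC."""
    at_least = sum(1 for a in random_aucs if a >= observed_auc)
    return at_least / len(random_aucs)

# Toy distribution: 1,000 random AUCs, 2 of which reach the observed value.
toy_random_aucs = [0.5] * 998 + [0.83, 0.9]
toy_p = permutation_p_value(0.824, toy_random_aucs)
```

A common, slightly more conservative variant adds 1 to both the count and the number of permutations so that the estimate can never be exactly zero; whether that correction was applied in the study is not stated.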

Mining of expression profiles

Different strategies were used to analyze the biological significance of the 484 features extracted as the classifier by the SVM approach. First, we used GeneTrail (20) to analyze enrichment of GO terms among the genes associated with NSCLC in our study. We observed 112 GO categories showing a significant (false discovery rate–corrected P < 0.05) enrichment of genes in our extracted gene list, of which 25 were associated with the immune system (Supplementary Table S3). These data indicate that immune cells contribute to the genes included in the classifier.

Next, we carried out a GSEA (21, 22) thereby focusing on cancer modules which comprise groups of genes participating in biological processes related to cancer. Initially, the power of such modules has been shown exemplarily for single genes such as cyclin D1 or PGC-1α (23, 24) and a more comprehensive view on such modules has been introduced recently (22). This comprehensive collection of modules allows the identification of similarities across different tumor entities such as the common ability of a tumor to metastasize to the bone, for example, in subsets of breast, lung, and prostate cancers (22). Overall, 456 such modules are described in the database spanning several biological processes such as metabolism, transcription, cell cycle, and others.

When analyzing the identified 484 NSCLC-specific features, 199 cancer modules including 26% of all NSCLC-associated modules were identified to show a significant enrichment. This indicates that genes used to build a classifier for NSCLC cases in our study represent, in part, a subset of biologically cooperating genes that are also differentially expressed in primary lung cancer.

To further investigate the specificity of the extracted list of 484 features obtained from our analysis for the classification of NSCLC, we also calculated the overlap between this extracted gene set and a set of genes differentially expressed in the blood of patients with renal cell cancer (7). No significant overlap was observed between the two gene sets. Similarly, no overlap was observed between our NSCLC-specific gene set and gene sets obtained from blood-based expression profiles specific for melanoma (10), breast cancer (8), and bladder cancer (9). In summary, these data point to an NSCLC-specific gene set present in our classifier.

Using RNA-stabilized whole blood from smokers in 3 independent sets of NSCLC patients and controls, we present a gene expression–based classifier that can be used as a biomarker to discriminate between NSCLC cases and controls. The optimal parameters of this classifier were first determined by applying a classical 10-fold cross-validation approach to a training set (TS) consisting of NSCLC patients (stages I–IV) and hospital-based controls. Subsequently, this optimized classifier was successfully applied to 2 independent validation sets, namely VS1 comprising NSCLC patients of stages I to IV and hospital-based controls and VS2 containing patients with stage I NSCLC and healthy blood donors. This successful application of the classifier in both validation sets underlines the validity and robustness of the classifier. Extensive permutation analysis by using random feature lists and the possibility of building specific classifiers independently of the composition of the initial TS further support the specificity of the classifier. We found no association between stage of disease and the probability score assigned to each sample. In addition, we observed no association between other cancers and the probability score of the controls (data not shown). However, controls without documented morbidity (controls in VS2) tended to have lower probability scores of being a case than controls with documented morbidity, although this was not statistically significant.

The gene set used to build the classifier was enriched in genes related to immune functions. We therefore postulate that the classifier is based on the transcriptome of blood-based immune effector cells rather than influenced by the occurrence of rare tumor cells occasionally detected in blood of cancer patients, although this possibility cannot be ruled out (25). Moreover, the lack of NSCLC tumor cell–specific transcripts, for example, thyroid transcription factor (TTF1), cytokeratins, or human telomerase reverse transcriptase, in our classifier points in the same direction. GSEA (9) of the gene set used in our diagnostic approach in comparison with published expression datasets from a variety of cancer entities (22) revealed a significant overlap with 26% of the lung cancer tissue–specific gene expression profiles. As NSCLC tissue consists of tumor cells, immune cells, and stromal cells (10), we presume that the similarities between the two gene sets are due to a similar regulation of genes present in immune cells in NSCLC tissue and peripheral blood of NSCLC patients. These findings are in line with the data showing tumor-induced alteration of the immune system in mice (6, 7) and in humans (11, 26).

Recently, Showe and colleagues (11) reported an NSCLC-associated gene expression signature derived from PBMC of predominantly early-stage NSCLC patients. An enrichment of immune-associated pathways in the signature was also observed in that study, further indicating that alteration of the immune system might be a common feature already during the initial phase of NSCLC development. As we used RNA-stabilized whole blood and not PBMC for analysis, we were not surprised that the signature identified by Showe and colleagues could not be used in our dataset to distinguish between cases and controls. The same holds true when applying our classifier to the published dataset (Zander and Schultze, unpublished data). Findings derived from several of our own studies further underline that signatures derived from PBMC and RNA-stabilized whole blood samples cannot be directly compared (refs. 12, 13; Schultze, unpublished data). However, as previously shown by us and others, for clinical applicability and robustness we would favor RNA-stabilized approaches because these methods yield more reliable results in a multicenter setting (13).

Overall, our data show the feasibility of a diagnostic test for NSCLC based on RNA-stabilized whole blood. Our findings form the basis for validation studies in a multicenter setting in prevalent NSCLC patient cohorts enriched for early-stage disease. In the end, this endeavor might open the avenue to test the blood-based NSCLC classifier in prospective trials to evaluate the predictive potential of diagnostic classifiers for NSCLC in high-risk individuals.

No potential conflicts of interest were declared.

We thank Julia Classen for experimental assistance.

J.L. Schultze and J. Wolf were supported by the Helmholtz-Gemeinschaft (VH-VI-143). J.L. Schultze was also supported by the Humboldt-Foundation (Sofja Kovalevskaja award) and a Köln Fortune grant. J. Wolf and R.K. Thomas were supported by the NGFNplus-program of the German Ministry of Science and Education (BMBF; Grant 01GS08100). A. Staratschek-Jox was supported by the Monika Kutzner Stiftung.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T, et al. Cancer statistics, 2008. CA Cancer J Clin 2008;58:71–96.
2. Sandler A, Gray R, Perry MC, Brahmer J, Schiller JH, Dowlati A, et al. Paclitaxel-carboplatin alone or with bevacizumab for non-small-cell lung cancer. N Engl J Med 2006;355:2542–50.
3. Henschke CI, Yankelevitz DF, Libby DM, Pasmantier MW, Smith JP, Miettinen OS. Survival of patients with stage I lung cancer detected on CT screening. N Engl J Med 2006;355:1763–71.
4. Bach PB, Silvestri GA, Hanger M, Jett JR. Screening for lung cancer: ACCP evidence-based clinical practice guidelines (2nd edition). Chest 2007;132:69S–77S.
5. Staratschek-Jox A, Classen S, Gaarz A, Debey-Pascher S, Schultze JL. Blood based transcriptomics–leukemias and beyond. Expert Rev Mol Diagn 2009;9:271–80.
6. Burczynski ME, Twine NC, Dukart G, Marshall B, Hidalgo M, Stadler WM, et al. Transcriptional profiles in peripheral blood mononuclear cells prognostic of clinical outcomes in patients with advanced renal cell carcinoma. Clin Cancer Res 2005;11:1181–9.
7. Twine NC, Stover JA, Marshall B, Dukart G, Hidalgo M, Stadler W, et al. Disease-associated expression profiles in peripheral blood mononuclear cells from patients with advanced renal cell carcinoma. Cancer Res 2003;63:6069–75.
8. Sharma P, Sahni NS, Tibshirani R, Skaane P, Urdal P, Berghagen H, et al. Early detection of breast cancer based on gene-expression patterns in peripheral blood cells. Breast Cancer Res 2005;7:R634–44.
9. Osman I, Bajorin DF, Sun TT, Zhong H, Douglas D, Scattergood J, et al. Novel blood biomarkers of human urinary bladder cancer. Clin Cancer Res 2006;12:3374–80.
10. Critchley-Thorne RJ, Yan N, Nacu S, Weber J, Holmes SP, Lee PP. Down-regulation of the interferon signaling pathway in T lymphocytes from patients with metastatic melanoma. PLoS Med 2007;4:e176.
11. Showe MK, Vachani A, Kossenkov AV, Yousef M, Nichols C, Nikonova EV, et al. Gene expression profiles in peripheral blood mononuclear cells can distinguish patients with non-small cell lung cancer from patients with nonmalignant lung disease. Cancer Res 2009;69:9202–10.
12. Debey S, Schoenbeck U, Hellmich M, Gathof BS, Pillai R, Zander T, et al. Comparison of different isolation techniques prior gene expression profiling of blood derived cells: impact on physiological responses, on overall expression and the role of different cell types. Pharmacogenomics J 2004;4:193–207.
13. Debey S, Zander T, Brors B, Popov A, Eils R, Schultze JL. A highly standardized, robust, and cost-effective method for genome-wide transcriptome analysis of peripheral blood applicable to large-scale clinical trials. Genomics 2006;87:653–64.
14. Zander T, Yunes JA, Cardoso AA, Nadler LM. Rapid, reliable and inexpensive quality assessment of biotinylated cRNA. Braz J Med Biol Res 2006;39:589–93.
15. Team RDC. R: A language and environment for statistical computing. R Foundation for Statistical Computing; 2006. (www.R-project.org).
16. Affymetrix. Statistical algorithms description document. 2002. Available from: http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf.
17. Lee S. Mistakes in validating the accuracy of a prediction classifier in high-dimensional but small-sample microarray data. Stat Methods Med Res 2008;17:635–42.
18. Akobeng A. Understanding diagnostic tests 3: receiver operating characteristic curves. Acta Paediatr 2007;96:644–7.
19. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med 1998;17:857–72.
20. Backes C, Keller A, Kuentzer J, Kneissl B, Comtesse N, Elnakady YA, et al. GeneTrail–advanced gene set enrichment analysis. Nucleic Acids Res 2007;35:W186–92.
21. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005;102:15545–50.
22. Segal E, Friedman N, Koller D, Regev A. A module map showing conditional activity of expression modules in cancer. Nat Genet 2004;36:1090–8.
23. Lamb J, Ramaswamy S, Ford HL, Contreras B, Martinez RV, Kittrell FS, et al. A mechanism of cyclin D1 action encoded in the patterns of gene expression in human cancer. Cell 2003;114:323–34.
24. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003;34:267–73.
25. Nagrath S, Sequist LV, Maheswaran S, Bell DW, Irimia D, Ulkus L, et al. Isolation of rare circulating tumour cells in cancer patients by microchip technology. Nature 2007;450:1235–9.
26. Keller A, Leidinger P, Borries A, Wendschlag A, Wucherpfennig F, Scheffler M, et al. miRNAs in lung cancer–studying complex fingerprints in patient's blood cells by microarray experiments. BMC Cancer 2009;9:353.