Affordable early screening in subjects at high risk of lung cancer has great potential to improve survival from this deadly disease. We measured gene expression from lung tissue and peripheral whole blood (PWB) from adenocarcinoma cases and controls to identify dysregulated lung cancer genes that could be tested in blood to improve identification of at-risk patients in the future. Genome-wide mRNA expression analysis was conducted in 153 subjects (73 adenocarcinoma cases, 80 controls) from the Environment And Genetics in Lung cancer Etiology study using PWB and paired snap-frozen tumor and noninvolved lung tissue samples. Analyses were conducted using unpaired t tests, linear mixed effects, and ANOVA models. The area under the receiver operating characteristic curve (AUC) was computed to assess the predictive accuracy of the identified biomarkers. We identified 50 dysregulated genes in stage I adenocarcinoma versus control PWB samples (false discovery rate ≤0.1, fold change ≥1.5 or ≤0.66). Among them, eight (TGFBR3, RUNX3, TRGC2, TRGV9, TARP, ACP1, VCAN, and TSTA3) differentiated paired tumor versus noninvolved lung tissue samples in stage I cases, suggesting a similar pattern of lung cancer–related changes in PWB and lung tissue. These results were confirmed in two independent gene expression analyses: a blood-based case–control study (n = 212) and a tumor–nontumor paired tissue study (n = 54). The eight genes discriminated patients with lung cancer from healthy controls with high accuracy (AUC = 0.81, 95% CI = 0.74–0.87). Our findings support the use of gene expression from PWB to identify early detection markers of lung cancer in the future. Cancer Prev Res; 4(10); 1599–608. ©2011 AACR.

Lung cancer causes more deaths than any other cancer in both men and women, with more than 160,000 deaths annually in the United States and 1 million worldwide (1). Unfortunately, the average 5-year survival rate has remained relatively stable at 15% over many decades because of minimal improvements in early detection and treatment. Noninvasive assays that detect lung cancer at a curable stage could offer the best therapeutic option for these patients. Although promising, imaging techniques such as low-dose helical computed tomography are expensive and potentially associated with increased risk due to ionizing radiation exposure. Blood-based biomarker assays are a potentially important noninvasive alternative for lung cancer screening. Only recently have technological advances in blood collection and RNA stabilization improved our ability to measure transcript levels in gene expression studies of human blood samples. Recent studies of gene expression in blood cells have identified gene signatures for diverse exposures [e.g., tobacco smoking (2) or benzene (3)] and health conditions, including autoimmune disorders (4–6), inflammatory diseases (7), and cancer (8–11).

In our study, we first compared gene expression in blood between adenocarcinoma cases and noncancer controls to select the genes whose expression most strongly differentiated cases from controls. We then examined this signature in paired adenocarcinoma versus noninvolved lung tissue samples. This identified the subset of genes differentiating both cases from controls (blood samples) and tumor from nontumor tissue (tissue samples). Such shared expression changes could be specifically due to the early development of cancer. Finally, we validated the overlapping gene expression signature in independent blood-based and tissue-based studies. If confirmed in prospective studies, gene expression changes measured in blood could provide a useful tool for the early detection of cancer in at-risk individuals.

Overview of strategy

Our study design included 3 phases.

(i) First, we aimed to identify molecular changes in blood due to cancer by comparing stage I adenocarcinoma cases (n = 26) with controls (n = 80). We restricted the analysis to stage I cases to focus on early molecular changes not affected by systemic metabolic disruption, such as weight loss or other sequelae of advanced disease. We then verified whether these gene changes in peripheral whole blood (PWB) were also present in later stages (n = 47). Tobacco smoking is the most important risk factor for lung cancer (12) and has been associated with lung cancer progression (13). We therefore also explored potentially distinct gene signatures by smoking status.

(ii) We then compared two signatures in a subgroup (n = 15) of the same stage I cases: the blood-based signature distinguishing stage I cases from controls, and the signature differentiating fresh frozen paired tumor versus noninvolved tissue samples. With this comparison, we aimed to identify expression changes in PWB due to lung cancer that paralleled changes in the target organ.

(iii) Finally, we sought to validate the main results using two approaches: (a) quantitative reverse transcriptase PCR (qRT-PCR) analysis of PWB for all identified genes in an additional 82 stage I adenocarcinoma patients and 130 controls from the same population, and (b) microarray gene expression data for all identified genes in 54 paired lung adenocarcinoma and noninvolved tissue samples from a previously published independent study (14).

Population

Individuals with lung adenocarcinoma (n = 73 for the microarray experiment; n = 82 for qRT-PCR validation) and healthy controls (n = 80 for the microarray experiment; n = 130 for qRT-PCR validation) were randomly sampled from a large, well-defined population-based case–control study, the Environment And Genetics in Lung cancer Etiology (EAGLE) study (15–21), including 2,100 consecutive incident lung cancer cases and 2,120 controls (all Caucasians) from Italy. Selected cases had histologically confirmed primary adenocarcinoma of the lung, including all stages, and controls were matched to cases by age, sex, and smoking status (never, former, and current smoking). For the validation set, we focused on current smoker stage I cases and controls. Detailed characteristics of subjects are described in Table 1.

Table 1.

Characteristics of lung adenocarcinoma patients and controls used for the gene expression analysis from peripheral whole blood, Italy, 2002–2005

Discovery set (Controls, Cases, Pa) | Validation set (Controls, Cases, Pa)
Sex   0.93   0.48 
 Male 40 37  111 67  
 Female 40 36  19 15  
Age   0.73   0.46 
 40–44   
 45–49   
 50–54   
 55–59 11  16 14  
 60–64 13 13  27 15  
 65–69 13 16  31 17  
 70–74 21 19  33 18  
 75–79 10  14 10  
Cigarette smoking   0.71    
 Never 21 22   
 Former 32 27   
 Current 27 24  130 82  
Cumulative pack-years   0.44   0.76 
 Below median 31 23  63 40  
 Above median 28 28  67 39  
 Missing   
 Mean (SD) 33.7 (21.9) 34.7 (20.5)  51.2 (21.9) 50.1 (24.4)  
 Cigarettes per day   0.93   0.56 
 Below median 29 25  48 26  
 Above median 30 25  82 53  
 Missing   
 Mean (SD) 17.9 (10.1) 17.5 (9.1)  21.1 (7.8) 21.4 (9.3)  
Tumor stageb 
 IA  11   36  
 IB  15   46  
 IIA     
 IIB  15    
 IIIA     
 IIIB     
 IV  12    
Tumor grade 
 1     
 2  28   28  
 3  33   38  
 Missing     

aP value based on a 2-sided Wald test.

bStage definition was based on the International System for Staging Lung Cancer by the AJCC and UICC, Mountain CF 1997.

The study was approved by the Institutional Review Board of each participating institution in Italy and by the National Cancer Institute, Bethesda, MD. All participants signed informed consent forms.

Blood and tissue collection for RNA extraction

PWB was collected for all EAGLE participants (after lung cancer diagnosis and before treatment for cases, and at enrollment for controls) using the PAXgene Blood RNA System (PreAnalytiX) containing a proprietary solution that reduces RNA degradation and gene induction (22, 23). Fresh lung tissue samples were snap frozen within 20 minutes of surgical resection.

Microarray gene expression data

Data on microarray gene expression from PWB were obtained using the Affymetrix GeneChip HG-U133A v2.0. We excluded 2 samples with poor-quality profiles (see quality assessment in Supplementary Material S1). The remaining 162 samples were processed and normalized with the Robust Multichip Average (RMA) method. The corresponding CEL files and associated information conform to the MIAME guidelines and are publicly available in the GEO database (accession number GSE20189). Nine subjects were excluded after data normalization because their tumors were reclassified as nonadenocarcinoma on histologic review. The final analyses were based on 73 adenocarcinoma cases and 80 controls. All 22,277 probe sets, summarized with RMA, were used in the analyses.
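RMA combines three steps: background adjustment, quantile normalization across arrays, and median-polish summarization of probes into probe sets. The hypothetical Python function below illustrates only the middle step, forcing several arrays to share one intensity distribution. It is a minimal sketch, not the implementation used in this study; it ignores background correction and summarization and runs on toy values rather than study data.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays.

    `arrays` is a list of equal-length lists (one per chip). Ties are broken
    by position, a simplification relative to production implementations.
    """
    n_values = len(arrays[0])
    if any(len(a) != n_values for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_values)
    ]
    normalized = []
    for a in arrays:
        # Replace each value with the reference value at the same rank.
        order = sorted(range(n_values), key=lambda i: a[i])
        out = [0.0] * n_values
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Two toy "chips" with three probes each (not study data).
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(chips))
```

After normalization, every chip contains exactly the same set of values, only in a different order, which is what makes intensities comparable across arrays.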

The detailed description of the gene expression examination of lung tissues in lung cancer cases in EAGLE (also based on the Affymetrix HG-U133A GeneChip) and sample inclusion and exclusion criteria have been published previously (24). For this study, we used data from paired tumor and noninvolved lung tissue samples from 15 of the same stage I adenocarcinoma cases included in the PWB-based study.

The validation lung tissue set consisted of 27 tumor and 27 noninvolved paired lung tissue samples from a previously published independent study (14). Details of the specimens, mRNA processing and hybridization, and data access are described in the original publication (14).

qRT-PCR gene expression data

We followed the procedure described by Hu and colleagues (25). Briefly, RNA quality and quantity were determined using the RNA 6000 LabChip/Agilent 2100 Bioanalyzer. RNA purification was done according to the manufacturer's instructions (Qiagen Inc.). After reverse transcription of RNA, all real-time PCRs were conducted on an ABI Prism 7000 Sequence Detection System. Primers and probes were designed for the target genes and for an internal control gene, GAPDH. Each sample was run in triplicate for each gene. The quantitative method requires PCR efficiencies that are similar for all genes and at least 90%. Efficiency was measured using a standard curve generated by serial dilutions of the RNA as described in http://docs.appliedbiosystems.com/search.
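The standard-curve efficiency check can be sketched in Python under one assumption: the common convention that efficiency comes from the slope of Ct regressed on log10 input amount, as E = 10^(−1/slope) − 1. Under this convention a slope near −3.32 corresponds to about 100% efficiency. The function name and the 10-fold dilution Ct values below are hypothetical.

```python
def pcr_efficiency(log10_input, ct_values):
    """Amplification efficiency from a standard curve (Ct vs log10 input).

    Fits an ordinary least-squares line and returns E = 10**(-1/slope) - 1,
    so 1.0 means 100% (perfect doubling per cycle).
    """
    n = len(log10_input)
    mean_x = sum(log10_input) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_input, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_input)
    slope = sxy / sxx  # negative: more template means an earlier Ct
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical 10-fold dilution series (toy Ct values, not study data).
eff = pcr_efficiency([0.0, -1.0, -2.0, -3.0], [20.0, 23.32, 26.64, 29.96])
print(f"efficiency = {eff:.1%}, meets 90% criterion: {eff >= 0.90}")
```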

Statistical analysis

  • A 2-sample t test was conducted to test whether blood RNA expression differed between cases and controls (overall and stratified by stage and by smoking status). Age, sex, and smoking variables were similarly distributed across the groups (Table 1). They were also not associated with the expression of the 61 selected gene-targeting probes (hereafter, gene probes) among controls or cases. Analyses adjusted or unadjusted for these factors provided almost identical results; unadjusted results are shown throughout the article. We used the Benjamini–Hochberg procedure (26) to calculate the false discovery rate (FDR), accounting for approximately 22,000 comparisons. We considered further only results with a maximum FDR ≤ 0.1 (based on a single-probe P value threshold of 0.001). In addition, only gene probes with a fold change (FC) ≤0.66 (downregulated) or ≥1.5 (upregulated) were considered for follow-up in subsequent analyses.

  • Because far fewer hypotheses (61 probes) were tested in the subsequent analyses, a less stringent significance criterion was applied (P < 0.005). For the comparison of tumor versus noninvolved paired tissues from the same subjects, a linear mixed effects model was used to account for within-subject correlation. Gene probes with P < 0.005 and the same FC direction and magnitude (i.e., FC ≤0.66 or ≥1.5) as in the case–control blood RNA comparison were selected for validation analyses.

  • To validate the significant results, we analyzed two datasets: (a) the qRT-PCR gene expression PWB-based data, using the 2^−ΔΔCt method (27) to compare cases with controls, and (b) the microarray gene expression tissue-based data, using a linear mixed effects model to compare tumor with noninvolved paired lung tissue samples. In addition, receiver operating characteristic (ROC) analysis was done on the PWB-based validation data. The area under the curve (AUC) was estimated to assess the accuracy of the identified biomarkers, both individually and combined, in discriminating between lung cancer patients and controls.
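The FDR criterion can be made concrete with a short Python sketch. It implements the Benjamini–Hochberg adjustment and the FDR bound implied by calling every probe with P ≤ 0.001 significant, which is m × 0.001 / k for m tests and k calls. The text does not state that the "maximum FDR" values in Table 2 were obtained this way; that is our assumption. Plugging in the 22,277 probe sets and the Table 2 counts reproduces 0.10, 0.15, 0.03, and 0.11. The function names are illustrative only, and the analyses themselves were run in R.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest P so adjusted values remain monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fdr_at_p_threshold(n_tests, p_threshold, n_called):
    """FDR bound when all n_called probes with P <= p_threshold are declared significant."""
    return min(1.0, n_tests * p_threshold / n_called)


N_PROBE_SETS = 22277  # probe sets analyzed on the HG-U133A v2.0 array (from the text)
table2_counts = {  # down + up probes with P <= 0.001, taken from Table 2
    "stage I, all subjects": 82 + 139,
    "stage I, current smokers": 100 + 44,
    "all stages, all subjects": 435 + 353,
    "all stages, current smokers": 117 + 78,
}
for label, k in table2_counts.items():
    print(f"{label}: max FDR = {fdr_at_p_threshold(N_PROBE_SETS, 0.001, k):.2f}")
```

The bound shows why a fixed P threshold yields a smaller FDR when more probes pass it: the expected number of false calls, m × P threshold, is spread over a larger k.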

All statistical analyses were conducted using R version 2.10.
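The 2^−ΔΔCt calculation used for the qRT-PCR validation can be illustrated with a minimal sketch. We show Python for illustration, whereas the study used R. The sketch assumes near-equal amplification efficiencies close to 100%, consistent with the efficiency requirement described for the qRT-PCR data. It averages the triplicate Ct values for each sample and normalizes each sample to GAPDH. It then reports the case-to-control fold change from mean ΔCt values. Function names and Ct values are hypothetical.

```python
import statistics


def delta_ct(target_cts, gapdh_cts):
    """Delta Ct for one sample: mean target Ct minus mean GAPDH Ct over replicates."""
    return statistics.fmean(target_cts) - statistics.fmean(gapdh_cts)


def fold_change_ddct(case_delta_cts, control_delta_cts):
    """2^-ddCt fold change of cases relative to controls.

    Assumes near-100% and equal amplification efficiency for target and GAPDH.
    """
    ddct = statistics.fmean(case_delta_cts) - statistics.fmean(control_delta_cts)
    return 2.0 ** (-ddct)


# Toy triplicate Ct values for one hypothetical sample.
sample_dct = delta_ct([25.0, 25.5, 24.5], [18.0, 18.0, 18.0])
# Toy delta Ct values for hypothetical cases and controls.
fc = fold_change_ddct([6.0, 7.0], [5.0, 6.0])
print(sample_dct, fc)
```

A higher ΔCt in cases means the target crossed threshold later relative to GAPDH, that is, less transcript. This is why a ΔΔCt of +1 maps to a fold change of 0.5 (downregulation).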

  • We compared mRNA expression from PWB in stage I adenocarcinoma cases versus controls, in the overall sample and stratified by smoking category (Table 2). Two significant gene signatures were detected in stage I cases: one among all subjects combined (FDR = 0.10) and one among current smokers only (FDR = 0.15). No significant results were found among former or never-smokers (FDR = 0.97 and 1.00, respectively). However, gene expression changes that were significant among current smokers showed similar, although not significant, trends among never and former smokers (data not shown). At the same time, the analysis among current smokers revealed distinct alterations that might be particularly important for individuals who smoke. Thus, for the comparison of stage I cases versus controls, we considered results both from all subjects and from current smokers only (221 and 144 gene probes, respectively, 81 of which overlapped). To increase specificity, we restricted subsequent analyses to gene probes with FC ≤0.66 or ≥1.5. The resulting 25 downregulated gene probes (20 genes) and 36 upregulated gene probes (30 genes) are shown in the heat map in Figure 1 and in Supplementary Material S2. In general, FCs were stronger in the analysis restricted to current smokers than in the overall analysis. Cigarettes per day and cumulative pack-years did not differ significantly between cases and controls (Table 1). The analysis adjusted for these covariates also provided almost identical results. Our findings are therefore unlikely to be due to differences in smoking intensity between cases and controls. We also verified whether the 61 identified gene probes were differentially expressed between cases and controls in late-stage disease. FCs were consistently stronger in the analysis limited to stage I cases, but directions were concordant in all groups analyzed (Fig. 2).
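RMA summaries are on the log2 scale, so one common way to obtain a case-to-control fold change is to exponentiate the difference in group means: FC = 2^(mean log2 cases − mean log2 controls). The FC ≤0.66 or ≥1.5 filter is then applied to that value. The sketch below assumes this definition, which the text does not spell out. The function names are hypothetical and the values are toy data.

```python
import statistics


def fold_change(log2_cases, log2_controls):
    """Case/control fold change from log2-scale (e.g., RMA) expression values.

    Assumes FC = 2 ** (mean log2 case - mean log2 control), i.e., a ratio of
    geometric means on the original intensity scale.
    """
    return 2 ** (statistics.fmean(log2_cases) - statistics.fmean(log2_controls))


def passes_fc_filter(fc, down=0.66, up=1.5):
    """Keep probes that are roughly 1.5-fold or more down- or upregulated."""
    return fc <= down or fc >= up


# Toy log2 expression for one probe (not study data).
fc = fold_change([7.0, 7.5, 6.5], [8.0, 8.5, 7.5])
print(fc, passes_fc_filter(fc))
```

The lower cutoff, 0.66, is approximately 1/1.5, so the filter is close to symmetric on the log2 scale.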

  • We aimed to identify changes in gene expression related to early-stage lung cancer that are detectable in both blood cells and lung tissue cells. Thus, for the 61 gene probes (50 genes) in the analysis of stage I patients and controls (Fig. 1 and Supplementary Material S2), we compared gene expression in tumor versus paired noninvolved lung tissue samples in 15 stage I adenocarcinoma cases. We found that 10 probes from 8 genes (TGFBR3, RUNX3, TRGC2, TRGV9, TARP, ACP1, VCAN, and TSTA3) were differentially expressed (P ≤ 0.003) in tumor compared with noninvolved lung tissue samples and in the same direction and intensity as in stage I adenocarcinoma cases compared with controls (Fig. 3).
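The tumor-versus-noninvolved comparison can be approximated with a paired analysis. With exactly two samples per subject, a linear mixed model with a subject-level random intercept estimates the tissue effect as the mean within-subject log2 difference. Its test statistic coincides with the paired t statistic unless the between-subject variance estimate is truncated at zero. The sketch therefore computes the paired t statistic as a simplified stand-in for the mixed model, not as the model fitted in the study. The P value step (a t distribution with n − 1 degrees of freedom) is left out to keep the sketch dependency-free. The function name and values are hypothetical.

```python
import math
import statistics


def paired_tissue_comparison(log2_tumor, log2_normal):
    """Paired tumor vs noninvolved comparison on log2 expression.

    Returns (fold_change, t_statistic, degrees_of_freedom). Position i in
    both lists must come from the same subject.
    """
    if len(log2_tumor) != len(log2_normal):
        raise ValueError("tumor and noninvolved samples must be paired")
    diffs = [t - n for t, n in zip(log2_tumor, log2_normal)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of mean difference
    return 2 ** mean_d, mean_d / se, n - 1


# Toy paired log2 values for four hypothetical subjects (not study data).
fc, t_stat, df = paired_tissue_comparison([5.0, 6.0, 7.0, 6.0], [6.0, 6.5, 8.0, 7.5])
print(fc, round(t_stat, 3), df)
```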

  • We validated the PWB-based gene expression differences between stage I cases and controls using qRT-PCR measurements of RNA extracted from PWB of an additional 82 stage I adenocarcinoma patients and 130 controls from EAGLE. Each gene was covered by a single ABI probe, except TARP, which was covered by both the TRGC2 and the TRGV9 ABI probes because these 3 genes overlap. Results were confirmed for all examined genes. RUNX3, TGFBR3, TRGC2/TARP, and TRGV9/TARP were significantly downregulated in stage I lung cancer patients compared with controls (FCs = 0.6, 0.5, 0.5, 0.6; P = 1.0 × 10⁻⁷, 1.4 × 10⁻⁸, 3.4 × 10⁻⁷, 2.6 × 10⁻⁶, respectively). VCAN, ACP1, and TSTA3 were significantly upregulated (FCs = 1.2, 1.2, 1.3; P = 5.0 × 10⁻³, 5.0 × 10⁻³, 3.0 × 10⁻³, respectively). We then validated the differences between tumor and noninvolved lung tissue samples for all 8 genes using microarray data from a previously published dataset (14), which included 27 pairs of adenocarcinoma and noninvolved lung tissue samples. The direction of change was consistent with our original findings for all 8 genes. RUNX3, TGFBR3, TRGV9, TARP, and TRGC2 were downregulated in tumor compared with noninvolved tissues, significantly so for all but RUNX3 (FCs = 0.7, 0.2, 0.3, 0.6, 0.5; P = 0.06, 3.0 × 10⁻¹¹, 4.0 × 10⁻⁷, 3.4 × 10⁻⁵, 2.5 × 10⁻⁶, respectively). VCAN, ACP1, and TSTA3 were significantly upregulated (FCs = 2.6, 1.5, 2.5; P = 0.002, 5.7 × 10⁻⁵, 3.8 × 10⁻⁹, respectively). We evaluated the ability of PWB-based expression of each gene to discriminate lung cancer patients from controls in the validation set by means of ROC curves (Fig. 4). The AUC ranged from 0.55 (95% CI = 0.46–0.64) for ACP1 to 0.73 (95% CI = 0.66–0.81) for TGFBR3 (Fig. 4), indicating reasonable discrimination between lung cancer cases and controls for most genes considered individually. A combination of all markers in a logistic regression model showed the best diagnostic accuracy, with an AUC of 0.81 (95% CI = 0.74–0.87; red ROC curve in Fig. 4).
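The AUC reported above equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. This is the Mann–Whitney formulation, which the short sketch below computes directly. For downregulated genes such as TGFBR3, lower expression points toward being a case, so expression is negated before scoring. The inputs are toy relative-expression values and the function name is hypothetical. Confidence intervals and the logistic-regression combination are not shown.

```python
def roc_auc(case_scores, control_scores):
    """AUC as P(random case score > random control score), ties counted as 1/2."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy relative expression of a downregulated gene (not study data).
cases_expr = [0.4, 0.6, 0.9]
controls_expr = [0.8, 1.0, 1.2]
# Negate so that lower expression yields a higher "case" score.
auc = roc_auc([-x for x in cases_expr], [-x for x in controls_expr])
print(round(auc, 3))
```

The double loop is quadratic in sample size, which is negligible at the scale of the validation set (82 cases, 130 controls).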

Table 2.

Summary of gene expression results from peripheral whole blood comparing lung adenocarcinoma cases and controls, Italy, 2002–2005

Controls | Stage I cases | All stage cases
All subjects n = 80 n = 26 n = 73 
FDRa Reference 0.10 0.03 
  Down Up Down Up 
Gene probes significant by P valueb  82 139 435 353 
Gene probes significant by P value and FCc  22 21 
Current smoker n = 27 n = 12 n = 24 
FDRa Reference 0.15 0.11 
  Down Up Down Up 
Gene probes significant by P value  100 44 117 78 
Gene probes significant by P value and FC  23 23 45 
      
Former smoker n = 32 n = 5 n = 27 
FDRa Reference 0.97 1.00 
  Down Up Down Up 
Gene probes significant by P value  14 
Gene probes significant by P value and FC  
      
Never smoker n = 21 n = 9 n = 22 
FDRa Reference 1.00 0.32 
  Down Up Down Up 
Gene probes significant by P value  11 21 43 
Gene probes significant by P value and FC  20 
      

aMaximum FDR shown in bold if ≤0.1.

bP ≤0.001.

cFC ≤0.66 or ≥1.5.

Figure 1.

Heat map of genes differentially expressed in stage I lung adenocarcinoma patients versus controls. Gene symbols and Affymetrix gene probe names are reported on the vertical axis, and sample color codes are shown on the horizontal axis. Samples and gene probes are ordered by unsupervised hierarchical clustering with Spearman correlation distance. Samples are color-coded as follows: stage I cases (magenta), controls (cyan), current smokers (blue), former smokers (orange), and never-smokers (yellow). The first 36 gene probes from the top are upregulated in stage I cases compared with controls; the last 25 gene probes are downregulated. Subjects with different smoking status are homogeneously distributed along the horizontal axis. This suggests that smoking status does not drive the differential expression of these genes between cases and controls. Detailed information on these gene probes and related results is provided in Supplementary Material S2.

Figure 2.

Comparison of gene expression changes in stage I cases versus cases with advanced stages. The plot shows FCs by case–control status for selected gene probes. These probes differed significantly (i.e., FDR ≤ 0.1 based on P ≤ 0.001) and had large FCs (i.e., FC ≤0.66 or ≥1.5) in the analysis comparing stage I cases with controls, either among current smokers or among all smoking categories. Probes are listed by chromosomal location. FCs for downregulated probes are shown in the lower part of the graph (FC ≤1, open symbols), with corresponding gene symbols on the bottom x-axis. FCs for upregulated probes are shown in the upper part of the graph (FC >1, filled symbols), with corresponding gene symbols on the top x-axis. FCs were obtained by comparing three groups of cases with controls: cases with stage I disease (black triangles), cases with stage II, III, or IV disease (dark grey squares), and cases of all stages (light grey circles). For consistency, FCs of gene probes that were significant in the stage I versus control comparison among current smokers were also computed among current smokers in the analyses including later stages. Analogously, some gene probes were significant in the stage I versus control comparison among all smoking categories (i.e., GNLY, PBX1, CYP1B1, SIAH2, PF4V1, VCAN, HIST1H3H, CD36, CALD1, TSTA3, ABCA1, and TPM1). Their FCs were computed across all smoking categories in the analyses including advanced stages. Upregulated FCs were generally higher, and downregulated FCs generally lower, in the analysis based on stage I cases (triangles) than in the analysis based on advanced-stage cases (squares), although directions were concordant. As expected, FCs in the analyses based on all stage cases (circles) fell between those based on stage I (triangles) and advanced-stage (squares) cases.

Figure 3.

Gene expression changes differentiating tumor status in both blood- and tissue-based analyses. Box plots of gene expression values (in log scale base 2) by case–control status and by tumor/noninvolved (T/NI) tissue type for 10 gene probes from Figure 1 corresponding to 8 genes whose expression changes by tumor status in PWB mirrored changes in the target organ. This analysis included 30 paired samples from 15 stage I adenocarcinoma cases. Data shown in figure included all smoking categories. Similar results were obtained for analyses restricted to current smokers.

Figure 4.

ROC curves based on qRT-PCR results from the PWB validation set for the 8 genes whose expression differentiated tumor status in both blood and tissue-based analyses. ROC curves show sensitivity versus specificity in discriminating between stage I lung adenocarcinoma patients and controls at different gene expression thresholds for the following models: individual RUNX3, TGFBR3, TRGC2/TARP, TRGV9/TARP (i.e., one ABI probe covered TRGC2 and TARP and another ABI probe covered TRGV9 and TARP), ACP1, VCAN, and TSTA3 and a multiplex model combining all genes. The AUC values and corresponding 95% CI (in parentheses) are reported for each predictive model.


We identified a gene expression signature from blood samples consisting of 8 genes (RUNX3, TGFBR3, TRGC2, TRGV9, TARP, ACP1, VCAN, and TSTA3). This signature differentiates stage I lung adenocarcinoma cases from controls and mirrors cancer-related gene expression changes in the target tissue. Results were validated in independent sets of tissue-based and blood-based gene expression analyses of adenocarcinoma cases and controls. Although present in all stages, expression changes were weaker in advanced stages, possibly because of secondary changes due to the spread of the disease. Similarly, the changes were stronger in current smokers but present in all smoking categories. Discrimination between stage I lung adenocarcinoma cases and controls was good for most genes considered separately, particularly for those downregulated in cases compared with controls. A multiplex model combining the expression of all 8 genes achieved an AUC of 0.81 (Fig. 4). This signature will need further validation in prospective studies using PWB drawn before lung cancer diagnosis (28). If validated, it could serve as a blood-based biomarker for the early detection of lung adenocarcinoma in heavy smokers at high risk of lung cancer. Because our validation set included only current smokers, further study in never- and former smokers is warranted. Moreover, it will be important to test the identified biomarkers in other lung cancer histologies.

Several of the identified genes are of potential mechanistic relevance. RUNX3 (runt-related transcription factor 3) was downregulated in our analyses and had an AUC of 0.69. It is involved in the negative regulation of epithelial cell proliferation, functions as a tumor suppressor, and is frequently deleted or transcriptionally silenced in cancer. Hypermethylation of RUNX3 has also been associated with the evolution of lung cancer (29) and, specifically, of lung adenocarcinoma (30). In addition, higher RUNX3 protein expression has been associated with increased survival from lung adenocarcinoma (31). TGFBR3 (TGF-beta receptor III) encodes a glycoprotein that binds TGFB, a cytokine that modulates several tissue development and repair processes. TGFBR3 is the TGF-beta pathway component most commonly downregulated at both the mRNA and protein levels in several cancers (32–36), including non–small cell lung cancer (37). To our knowledge, our study is the first to show downregulation of TGFBR3 mRNA expression in both blood and tumor tissue cells of lung adenocarcinoma patients. TGFBR3 showed the highest accuracy among the single-gene models in discriminating cases from controls (AUC = 0.73). TRGC2 (T-cell receptor gamma constant 2), TRGV9 (T-cell receptor gamma variable 9), and TARP (T-cell receptor gamma alternate reading frame protein) are colocalized at chromosome locus 7p14.1. This locus is close to the 7p14.3 region, which frequently shows allelic loss in non–small cell lung cancer (38). TARP is embedded within the TCR gamma locus, and cDNA probes that detect TCR gamma mRNA also detect TARP mRNA. Accordingly, probes in TRGC2, TRGV9, and TARP showed very similar results in our study. TRGV9-expressing T cells have been shown to contribute to natural immune surveillance against colon cancer (39). TARP has previously been studied as a prostate-specific gene and an androgen-regulated protein that may carry out its biological functions via action on mitochondria (40).
The downregulation of TRGC2, TRGV9, and TARP in cases relative to controls, and in tumor relative to noninvolved tissue, points to an immune-related alteration as a possible contributor to lung adenocarcinoma development. Case–control discrimination based on TRGC2, TRGV9, and TARP was also good (average AUC = 0.70). The ACP1 (acid phosphatase 1) gene, upregulated in our analysis, is polymorphic and encodes at least two electrophoretically distinct isozymes. A higher concentration of the fast isozyme increases the invasiveness of cancer cells, whereas a lower concentration of the slow isozyme promotes cancer cell proliferation (41). In the validation set, ACP1 showed the poorest accuracy in discriminating cases from controls (AUC = 0.55). VCAN (versican) encodes a protein involved in cell adhesion, proliferation, migration, angiogenesis, and tissue morphogenesis and maintenance. VCAN was first identified in cultures of lung fibroblasts (42) and has been implicated in the invasion of several cancers (43), including lung cancer (44). In our study, VCAN mRNA was upregulated in both the lung tumor tissue and the PWB of adenocarcinoma cases. The TSTA3 (tissue-specific transplantation antigen P35B) gene, also upregulated in our analysis, is involved in the synthesis of many glycoconjugates. Intriguingly, TSTA3 is located in chromosomal region 8q24, which harbors polymorphic variants recently associated with several cancers (45–47). VCAN and TSTA3 showed more modest performance in discriminating between cases and controls (AUC = 0.61 and 0.59, respectively). In addition to these eight genes, we identified 42 other genes whose expression in PWB distinguished stage I lung adenocarcinoma cases from controls (Fig. 1 and Supplementary Material S2), with stronger differences among current smokers. 
If confirmed in additional blood-based studies, these genes could also contribute to the detection of early lung adenocarcinoma lesions.
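Each single-gene AUC reported above can be read as the probability that a randomly chosen case is ranked above a randomly chosen control, with 0.5 corresponding to chance (as is nearly the case for ACP1, AUC = 0.55). The following minimal sketch illustrates that interpretation with the nonparametric (Mann–Whitney) estimator of the empirical AUC. As an illustrative assumption, a downregulated gene such as TGFBR3 or RUNX3 is scored by its negated expression so that lower expression points toward "case". The function names and toy values are hypothetical. They are not the study data and not necessarily the software or procedure used in this analysis.

```python
from typing import Sequence


def auc_mann_whitney(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Empirical ROC AUC: fraction of (case, control) pairs in which the
    case has the higher score, counting ties as one half."""
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def single_gene_auc(cases: Sequence[float], controls: Sequence[float],
                    downregulated: bool) -> float:
    """Hypothetical helper: for a gene expected to be lower in cases,
    negate expression so that a higher score still points toward 'case'."""
    sign = -1.0 if downregulated else 1.0
    return auc_mann_whitney([sign * v for v in cases],
                            [sign * v for v in controls])


if __name__ == "__main__":
    # Hypothetical toy log2 expression values, NOT study data.
    toy_cases = [6.1, 5.8, 6.4, 5.5]
    toy_controls = [6.9, 6.2, 7.1, 6.0]
    print(single_gene_auc(toy_cases, toy_controls, downregulated=True))  # prints 0.8125
```

With these toy values, 13 of the 16 case–control pairs are ordered as expected for a downregulated gene, giving an AUC of 0.8125. Scoring the same values without the sign flip gives the complementary value, 0.1875. This is why the direction of dysregulation has to be fixed before a single-gene AUC is interpreted.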

In conclusion, gene expression changes in peripheral blood samples can differentiate early-stage lung adenocarcinoma cases from controls and resemble those observed in early-stage lung adenocarcinoma tissue. This finding suggests that early processes of lung adenocarcinoma development may lead to systemic alterations detectable by peripheral blood tests. Gene expression profiling of PWB may provide an important tool for identifying early detection markers of cancer in the future.

No potential conflicts of interest were disclosed.

We thank the EAGLE participants and study collaborators listed on the EAGLE website (http://www.eagle.cancer.gov/).

This research was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T, et al. Cancer statistics, 2008. CA Cancer J Clin 2008;58:71–96.
2. Charles PC, Alder BD, Hilliard EG, Schisler JC, Lineberger RE, Parker JS, et al. Tobacco use induces anti-apoptotic, proliferative patterns of gene expression in circulating leukocytes of Caucasian males. BMC Med Genomics 2008;1:38.
3. Forrest MS, Lan Q, Hubbard AE, Zhang L, Vermeulen R, Zhao X, et al. Discovery of novel biomarkers by microarray analysis of peripheral blood mononuclear cell gene expression in benzene-exposed workers. Environ Health Perspect 2005;113:801–7.
4. Baechler EC, Batliwalla FM, Karypis G, Gaffney PM, Ortmann WA, Espe KJ, et al. Interferon-inducible gene expression signature in peripheral blood cells of patients with severe lupus. Proc Natl Acad Sci U S A 2003;100:2610–5.
5. Bomprezzi R, Ringner M, Kim S, Bittner ML, Khan J, Chen Y, et al. Gene expression profile in multiple sclerosis patients and healthy controls: identifying pathways relevant to disease. Hum Mol Genet 2003;12:2191–9.
6. Bovin LF, Rieneck K, Workman C, Nielsen H, Sorensen SF, Skjodt H, et al. Blood cell gene expression profiling in rheumatoid arthritis. Discriminative genes and effect of rheumatoid factor. Immunol Lett 2004;93:217–26.
7. Oudijk EJ, Nijhuis EH, Zwank MD, van de Graaf EA, Mager HJ, Coffer PJ, et al. Systemic inflammation in COPD visualised by gene profiling in peripheral blood neutrophils. Thorax 2005;60:538–44.
8. Burczynski ME, Twine NC, Dukart G, Marshall B, Hidalgo M, Stadler WM, et al. Transcriptional profiles in peripheral blood mononuclear cells prognostic of clinical outcomes in patients with advanced renal cell carcinoma. Clin Cancer Res 2005;11:1181–9.
9. Moos PJ, Raetz EA, Carlson MA, Szabo A, Smith FE, Willman C, et al. Identification of gene expression profiles that segregate patients with childhood leukemia. Clin Cancer Res 2002;8:3118–30.
10. Showe MK, Vachani A, Kossenkov AV, Yousef M, Nichols C, Nikonova EV, et al. Gene expression profiles in peripheral blood mononuclear cells can distinguish patients with non-small cell lung cancer from patients with nonmalignant lung disease. Cancer Res 2009;69:9202–10.
11. Xu T, Shu CT, Purdom E, Dang D, Ilsley D, Guo Y, et al. Microarray analysis reveals differences in gene expression of circulating CD8(+) T cells in melanoma patients and healthy donors. Cancer Res 2004;64:3661–7.
12. Youlden DR, Cramb SM, Baade PD. The international epidemiology of lung cancer: geographical distribution and secular trends. J Thorac Oncol 2008;3:819–31.
13. Guo NL, Tosun K, Horn K. Impact and interactions between smoking and traditional prognostic factors in lung cancer progression. Lung Cancer 2009;66:386–92.
14. Su LJ, Chang CW, Wu YC, Chen KC, Lin CJ, Liang SC, et al. Selection of DDX5 as a novel internal control for Q-RT-PCR from microarray data using a block bootstrap re-sampling scheme. BMC Genomics 2007;8:140.
15. Bagnardi V, Randi G, Lubin J, Consonni D, Lam TK, Subar AF, et al. Alcohol consumption and lung cancer risk in the Environment And Genetics in Lung cancer Etiology (EAGLE) study. Am J Epidemiol 2010;171:36–44.
16. Consonni D, De Matteis S, Lubin JH, Wacholder S, Tucker M, Pesatori AC, et al. Lung cancer and occupation in a population-based case-control study. Am J Epidemiol 2010;171:323–33.
17. Gao Y, Goldstein AM, Consonni D, Pesatori AC, Wacholder S, Tucker MA, et al. Family history of cancer and nonmalignant lung diseases as risk factors for lung cancer. Int J Cancer 2009;125:146–52.
18. Koshiol J, Rotunno M, Consonni D, Pesatori AC, De Matteis S, Goldstein AM, et al. Chronic obstructive pulmonary disease and altered risk of lung cancer in a population-based case-control study. PLoS One 2009;4:e7380.
19. Lam TK, Rotunno M, Lubin JH, Wacholder S, Consonni D, Pesatori AC, et al. Dietary quercetin, quercetin-gene interaction, metabolic gene expression in lung tissue and lung cancer risk. Carcinogenesis 2010;31:634–42.
20. Landi MT, Consonni D, Rotunno M, Bergen AW, Goldstein AM, Lubin JH, et al. Environment And Genetics in Lung cancer Etiology (EAGLE) study: an integrative population-based case-control study of lung cancer. BMC Public Health 2008;8:203.
21. Rotunno M, Yu K, Lubin JH, Consonni D, Pesatori AC, Goldstein AM, et al. Phase I metabolic genes and risk of lung cancer: multiple polymorphisms and mRNA expression. PLoS One 2009;4:e5652.
22. Muller MC, Merx K, Weisser A, Kreil S, Lahaye T, Hehlmann R, et al. Improvement of molecular monitoring of residual disease in leukemias by bedside RNA stabilization. Leukemia 2002;16:2395–9.
23. Rainen L, Oelmueller U, Jurgensen S, Wyrich R, Ballas C, Schram J, et al. Stabilization of mRNA expression in whole blood samples. Clin Chem 2002;48:1883–90.
24. Landi MT, Dracheva T, Rotunno M, Figueroa JD, Liu H, Dasgupta A, et al. Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS One 2008;3:e1651.
25. Hu N, Qian L, Hu Y, Shou J, Wang C, Giffen C, et al. Quantitative real-time RT-PCR validation of differential mRNA expression of SPARC, FADD, Fascin, COL7A1, CK4, TGM3, ECM1, PPL and EVPL in esophageal squamous cell carcinoma. BMC Cancer 2006;6:33.
26. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B 1995;57:289–300.
27. Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-delta delta C(T)) method. Methods 2001;25:402–8.
28. Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, et al. Phases of biomarker development for early detection of cancer. J Natl Cancer Inst 2001;93:1054–61.
29. Li QL, Kim HR, Kim WJ, Choi JK, Lee YH, Kim HM, et al. Transcriptional silencing of the RUNX3 gene by CpG hypermethylation is associated with lung cancer. Biochem Biophys Res Commun 2004;314:223–8.
30. Sato K, Tomizawa Y, Iijima H, Saito R, Ishizuka T, Nakajima T, et al. Epigenetic inactivation of the RUNX3 gene in lung cancer. Oncol Rep 2006;15:129–35.
31. Araki K, Osaki M, Nagahama Y, Hiramatsu T, Nakamura H, Ohgi S, et al. Expression of RUNX3 protein in human lung adenocarcinoma: implications for tumor progression and prognosis. Cancer Sci 2005;96:227–31.
32. Copland JA, Davies PJ, Shipley GL, Wood CG, Luxon BA, Urban RJ. The use of DNA microarrays to assess clinical samples: the transition from bedside to bench to bedside. Recent Prog Horm Res 2003;58:25–53.
33. Dong M, How T, Kirkbride KC, Gordon KJ, Lee JD, Hempel N, et al. The type III TGF-beta receptor suppresses breast cancer progression. J Clin Invest 2007;117:206–17.
34. Gordon KJ, Dong M, Chislock EM, Fields TA, Blobe GC. Loss of type III transforming growth factor beta receptor expression increases motility and invasiveness associated with epithelial to mesenchymal transition during pancreatic cancer progression. Carcinogenesis 2008;29:252–62.
35. Hempel N, How T, Dong M, Murphy SK, Fields TA, Blobe GC. Loss of betaglycan expression in ovarian cancer: role in motility and invasion. Cancer Res 2007;67:5231–8.
36. Sharifi N, Hurt EM, Kawasaki BT, Farrar WL. TGFBR3 loss and consequences in prostate cancer. Prostate 2007;67:301–11.
37. Finger EC, Turley RS, Dong M, How T, Fields TA, Blobe GC. TbetaRIII suppresses non-small cell lung cancer invasiveness and tumorigenicity. Carcinogenesis 2008;29:528–35.
38. Tseng RC, Chang JW, Hsien FJ, Chang YH, Hsiao CF, Chen JT, et al. Genomewide loss of heterozygosity and its clinical associations in non small cell lung cancer. Int J Cancer 2005;117:241–7.
39. Corvaisier M, Moreau-Aubry A, Diez E, Bennouna J, Mosnier JF, Scotet E, et al. V gamma 9V delta 2 T cell response to colon carcinoma cells. J Immunol 2005;175:5481–8.
40. Maeda H, Nagata S, Wolfgang CD, Bratthauer GL, Bera TK, Pastan I. The T cell receptor gamma chain alternate reading frame protein TARP, a prostate-specific protein localized in mitochondria. J Biol Chem 2004;279:24561–8.
41. Alho I, Clara BM, Carvalho R, da Silva AP, Costa L, Bicho M. Low molecular weight protein tyrosine phosphatase genetic polymorphism and susceptibility to cancer development. Cancer Genet Cytogenet 2008;181:20–4.
42. Zimmermann DR, Ruoslahti E. Multiple domains of the large fibroblast proteoglycan, versican. EMBO J 1989;8:2975–81.
43. Ricciardelli C, Sakko AJ, Ween MP, Russell DL, Horsfall DJ. The biological role and regulation of versican levels in cancer. Cancer Metastasis Rev 2009;28:233–45.
44. Pirinen R, Leinonen T, Bohm J, Johansson R, Ropponen K, Kumpulainen E, et al. Versican in nonsmall cell lung cancer: relation to hyaluronan, clinicopathologic factors, and prognosis. Hum Pathol 2005;36:44–50.
45. Freedman ML, Haiman CA, Patterson N, McDonald GJ, Tandon A, Waliszewska A, et al. Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl Acad Sci U S A 2006;103:14068–73.
46. Schumacher FR, Feigelson HS, Cox DG, Haiman CA, Albanes D, Buring J, et al. A common 8q24 variant in prostate and breast cancer from a large nested case-control study. Cancer Res 2007;67:2951–6.
47. Yeager M, Chatterjee N, Ciampa J, Jacobs KB, Gonzalez-Bosquet J, Hayes RB, et al. Identification of a new prostate cancer susceptibility locus on chromosome 8q24. Nat Genet 2009;41:1055–7.