Abstract
Purpose: CT screening can reduce death from lung cancer. We sought to improve the diagnostic accuracy of lung cancer screening using ultrasensitive methods and a lung cancer–specific gene panel to detect DNA methylation in sputum and plasma.
Experimental Design: This is a case–control study of subjects with suspicious nodules on CT imaging. Plasma and sputum were obtained preoperatively. Cases (n = 150) had pathologic confirmation of node-negative (stages I and IIA) non–small cell lung cancer. Controls (n = 60) had non-cancer diagnoses. We detected promoter methylation using quantitative methylation-specific real-time PCR and methylation-on-beads for cancer-specific genes (SOX17, TAC1, HOXA7, CDO1, HOXA9, and ZFP42).
Results: DNA methylation was detected in plasma and sputum more frequently in people with cancer compared with controls (P < 0.001) for five of six genes. The sensitivity and specificity for lung cancer diagnosis using the best individual genes were 63% to 86% and 75% to 92% in sputum, respectively, and 65% to 76% and 74% to 84% in plasma, respectively. A three-gene combination of the best individual genes had sensitivity and specificity of 98% and 71% using sputum and 93% and 62% using plasma. Area under the receiver operating characteristic curve for this panel was 0.89 [95% confidence interval (CI), 0.80–0.98] in sputum and 0.77 (95% CI, 0.68–0.86) in plasma. Independent blinded random forest prediction models combining gene methylation with clinical information correctly predicted lung cancer in 91% of subjects using sputum detection and 85% of subjects using plasma detection.
Conclusions: High diagnostic accuracy for early-stage lung cancer can be obtained using methylated promoter detection in sputum or plasma. Clin Cancer Res; 23(8); 1998–2005. ©2016 AACR.
The National Lung Screening Trial demonstrated a 20% reduction in lung cancer mortality using low-dose CT screening. Diagnostic accuracy of screening could be improved using cancer-specific biomarkers from sputum and plasma. We developed methylation-on-beads (MOB), which reduces sample loss and thereby potentially increases sensitivity. We used MOB and real-time quantitative methylation-specific PCR (qMSP) to detect promoter methylation of six genes frequently methylated in lung cancer: SOX17, TAC1, HOXA7, CDO1, HOXA9, and ZFP42. This study demonstrates that high diagnostic accuracy for early-stage NSCLC can be obtained using a panel of methylated promoter genes in plasma and sputum and that the methylation level of these genes is associated with a high lung cancer risk independent of age, pack-years, and nodule size. This panel could be used to identify patients at high risk for lung cancer, reducing false-positive results and unnecessary tests and improving the diagnosis of lung cancer at an earlier stage.
Introduction
The National Lung Screening Trial (NLST) demonstrated a 20% reduction in lung cancer mortality using low-dose CT screening (1). This survival benefit comes at the price of detecting many indeterminate pulmonary nodules, with an overall false-positive rate of 96.4% (1, 2). The likelihood that a nodule is malignant increases with size (3). Management is particularly challenging for indeterminate nodules of 7 to 29 mm, which carry a risk of malignancy between 1.7% and 22% (3). This has led to a cautious adoption of CT screening, because complications, and even deaths, result from further diagnostic procedures (4). One approach to improve the specificity of CT screening involves the use of cancer-specific biomarkers from sputum and plasma. Previous studies have examined DNA methylation as a biomarker for cancer risk, but their sensitivity and/or specificity were insufficient for lung cancer screening (5–16).
Reduced sensitivity of methylation detection may result from technical limitations. Extraction methods for DNA have been inefficient for small amounts of DNA (17, 18), a particular problem for bodily fluids. We recently developed methylation-on-beads (MOB), which reduces sample loss, thereby potentially increasing sensitivity (19, 20). Another issue for detection is the use of loci with low frequencies of altered DNA methylation, leading to an inability to detect changes in biofluids. Using The Cancer Genome Atlas (TCGA; ref. 21), we recently identified 6 genes (SOX17, TAC1, HOXA7, CDO1, HOXA9, ZFP42) with highly prevalent DNA methylation in lung squamous cell carcinoma and adenocarcinoma, but not in normal lung tissue (22, 23), one of which (CDO1) has been described elsewhere (22, 23). These were chosen solely on the basis of high-frequency cancer-specific methylation and developed into assays using MOB and real-time quantitative methylation-specific PCR (qMSP) to determine the diagnostic accuracy for lung cancer detection in sputum and plasma.
Materials and Methods
Study population
The study population consists of a prospective, observational cohort of 651 participants, initiated in 2007 within the Johns Hopkins Lung Cancer Specialized Program of Research Excellence (SPORE). From this cohort, 210 study patients had early-stage node-negative tumors (T1–T2N0) and samples adequate for analysis. Institutional Review Board approval was obtained prior to study initiation (NA_00005998), and all patients signed informed consent. Surgical resection with curative intent and pathologic analyses of suspected lung cancer lesions were completed in all patients and staged according to revised TNM guidelines classification criteria (24). Cases had pathologically confirmed lung cancer. Controls were defined as patients histologically confirmed not to have cancer. Pack-years of cigarette smoking was defined as the average number of packs smoked per day times the number of years smoked. Nodule size was obtained from the pathologic report, and nodule volume was calculated using the ellipsoid volume formula (Volume = 4/3 × π × radius A × radius B × radius C).
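The two derived quantities defined above reduce to simple arithmetic. The sketch below is illustrative only: the function names and example inputs are assumptions rather than values from the study, and each ellipsoid radius is assumed to be half of the corresponding measured diameter.

```python
import math


def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Average number of packs smoked per day times the number of years smoked."""
    return packs_per_day * years_smoked


def ellipsoid_volume_cm3(diam_a_cm: float, diam_b_cm: float, diam_c_cm: float) -> float:
    """Volume = 4/3 * pi * radius A * radius B * radius C (radius = diameter / 2)."""
    radius_a = diam_a_cm / 2.0
    radius_b = diam_b_cm / 2.0
    radius_c = diam_c_cm / 2.0
    return 4.0 / 3.0 * math.pi * radius_a * radius_b * radius_c


# Hypothetical subject: 1.5 packs per day for 20 years, spherical 2-cm nodule.
print(pack_years(1.5, 20))                             # 30.0
print(round(ellipsoid_volume_cm3(2.0, 2.0, 2.0), 2))   # 4.19
```

For a nodule with three equal diameters, the ellipsoid formula reduces to the volume of a sphere.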
Plasma and sputum collection
Prior to surgery, 20 mL of plasma was collected in tubes containing sodium heparin (Becton Dickinson) and then stored at −80°C. For sputum collection, 2 cups containing Saccomanno's fixative solution were used for each patient as previously described (8, 11, 25). Subjects were asked to provide an early morning spontaneous sputum at home in 2 cups for 3 consecutive days within 1 week prior to pulmonary resection (11, 26). Five milliliters of sputum was collected, washed with Saccomanno's solution, vortexed, centrifuged, and stored at −80°C (8).
DNA isolation and bisulfite conversion
DNA extraction from tumor, plasma, and sputum was performed using MOB, a process that allows DNA extraction and bisulfite conversion in a single tube via the use of silica superparamagnetic beads (20). This approach yields a 1.5- to 5-fold improvement in extraction efficiency compared with conventional techniques (27). We optimized the protocol previously described for plasma (27), using 1.5 mL of plasma and 375 μL (800 units/mL, NEBL p8107s) of proteinase K. For DNA extraction from sputum, we modified the protocol used for plasma by adding 200 μL of sample to 300 μL of Buffer AL and 40 μL of proteinase K and by incubating them together at the same temperature (50°C for 2 hours). After digestion, 300 μL of isopropyl alcohol (IPA) and 150 μL of beads were added. The lysate was incubated and rotated for 10 minutes before adding 5 μL of carrier RNA and incubating for an additional 5 minutes (27).
DNA methylation analysis
The genomic sequence for the genes and 1,000 bases upstream was obtained from the UCSC Genome Browser website (28). The primers and hybridization probes for methylation analysis were designed on the basis of this sequence using Primer3 (v.0.4.0; refs. 29, 30). All primer and probe sequences are listed in Supplementary Table S1. The analysis was performed using real-time qMSP and normalized to a control β-actin assay (18). Each reaction was performed in a 25-μL PCR mixture consisting of 2 μL of bisulfite-converted DNA, 300 nmol/L R-sense primer, 300 nmol/L F-anti-sense primer, 100 nmol/L probe, 100 nmol/L of fluorescein reference dye (Life Technologies), 1.67 mmol/L dNTPs (VWRQuotation), and 1 μL of Platinum Taq DNA Polymerase (Invitrogen). Master mix contained 16.6 mmol/L (NH4)2SO4, 67 mmol/L Tris, pH 8.8, 6.7 mmol/L MgCl2, and 10 mmol/L β-mercaptoethanol in a nuclease-free deionized water solution. Amplification reactions were performed in triplicate using 96-well plates (MicroAmp). Thermocycling conditions were 95°C for 5 minutes, followed by 50 cycles of 95°C for 15 seconds, 65°C for 1 minute, and 72°C for 1 minute. An ABI StepOnePlus Real-Time PCR system (Applied Biosystems) was used; examples are shown in Supplementary Fig. S1.
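With the extremely low levels of DNA methylation in plasma and sputum, replicates for some samples produced no detectable methylation, as expected. To incorporate this information into the final quantification of methylation, we calculated 2^−ΔCt for each methylation detection replicate by comparing its Ct to the mean Ct for β-actin (ACTB). For replicates that were not detected (ND), a Ct of 100 was used, creating a near-zero value for 2^−ΔCt. The mean 2^−ΔCt value was calculated across the triplicate replicates with the formula:

$$\overline{2^{-\Delta C_t}} = \frac{1}{3}\sum_{i=1}^{3} 2^{-\left(C_{t,i} - \overline{C_t^{\mathrm{ACTB}}}\right)}$$

where C_{t,i} is the Ct of methylation detection replicate i and the overlined ACTB term is the mean Ct for ACTB.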
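The replicate averaging described above, including the Ct of 100 assigned to non-detected replicates, can be sketched in a few lines. This is a minimal illustration rather than the authors' analysis code: the function name, the use of `None` to mark a non-detected replicate, and the example Ct values are all assumptions.

```python
from statistics import mean

ND_CT = 100.0  # Ct assigned to non-detected (ND) replicates, as stated in the text


def mean_two_neg_delta_ct(gene_cts, actb_cts):
    """Mean of 2^-(Ct_replicate - mean ACTB Ct) over the methylation replicates.

    gene_cts: Ct values of the methylation replicates; None marks a
              non-detected replicate (an assumption of this sketch).
    actb_cts: Ct values of the ACTB control replicates.
    """
    actb_mean = mean(actb_cts)
    values = []
    for ct in gene_cts:
        ct = ND_CT if ct is None else ct
        values.append(2.0 ** -(ct - actb_mean))
    return mean(values)


# Hypothetical triplicate: two detected replicates and one non-detected replicate.
example = mean_two_neg_delta_ct([32.0, None, 33.0], [27.0, 27.0, 27.0])
print(example)
```

Because a non-detected replicate contributes a value close to zero, it lowers the sample mean without being discarded, which is the behavior the text describes.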
Statistical analysis
Quantitative data are expressed as median (interquartile range) for continuous, nonparametric variables and frequency (percentage) for categorical variables. For intergroup comparison, the Wilcoxon rank-sum test was used for continuous data and the Fisher exact test for categorical data.
Data were analyzed using 2 approaches. The first approach was receiver operating characteristic (ROC) curve analysis using the 2^−ΔCt values for individual genes to determine the performance of each individual marker (R statistical software, version 3.0.2; ref. 31). The area under the curve (AUC) was reported with 95% confidence intervals (CI). The 3 genes with the best diagnostic accuracy for lung cancer detection were selected on the basis of their ROC curves and were used for combined detection. Sensitivity and specificity values were obtained using the presence or absence of detectable methylation as a cutoff.
The second approach utilized a nonparametric machine learning method, random forest, to estimate the prediction accuracy in an independent validation dataset by combining the methylation data and clinical risk factors: nodule size, age, pack-years, chronic obstructive pulmonary disease (COPD) status, and forced vital capacity (FVC) values. Subjects were randomly assigned to a training set (67%) or a test set (33%). A statistician (P. Huang), blinded to the diagnoses of the test set, used the training set to build 3 random forest prediction models: (i) one using sputum, clinical, and demographic variables; (ii) one using plasma and clinical variables; and (iii) one using only clinical and demographic variables. Each random forest model consisted of 5,000 trees, each using a random sample of the training data. The remaining training data were used for internal cross-validation. Each random forest model provides 2 predictions: the cancer status (a binary prediction) and the probability of cancer (a continuous prediction). The 3 random forest models were then applied to the test set data. Prediction accuracy was reported as the proportion of test set subjects correctly predicted by the random forest classification models, allowing calculation of sensitivity, specificity, and ROC analysis.
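The area under the ROC curve for a single marker has a simple interpretation: it is the probability that a randomly chosen case has a higher marker value than a randomly chosen control, with ties counted as one half. The sketch below computes it that way in plain Python. It is a stand-in for illustration, not the R procedure used in the study; the function name and the example values are assumptions, and no confidence interval is computed.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs in which the case scores higher.

    Tied pairs count as 0.5 (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical normalized methylation values; 0.0 stands for no detectable methylation.
cases = [0.02, 0.0, 0.015, 0.3]
controls = [0.0, 0.0, 0.001]
print(round(roc_auc(cases, controls), 3))  # 0.833
```

In this toy example, 10 of the 12 case/control pairs (counting two ties as half each) favor the case, giving 10/12.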
Results
Characteristics of the patients
Two hundred and ten patients fulfilled the inclusion criteria: 150 subjects with node-negative early-stage lung cancer and 60 controls with non-cancerous lung lesions (Table 1). Clinical and demographic variables were similar in cases and controls with the exception of age, pack-years, nodule size (cm), and nodule volume (cm3). Subjects with lung cancer were older than controls (68 vs. 63 years, P = 0.007), smoked more (30 vs. 19.5 pack-years, P = 0.01), and had larger nodules (2.0 vs. 1.5 cm, P = 0.01). The proportion of current smokers, former smokers, and never smokers was not different between cases and controls.
Patient characteristics | Cancer (N = 150) | Control (N = 60) | P |
---|---|---|---|
Age at surgery (IQR), y | 68 (62–75) | 63 (55–73) | 0.007 |
Gender | |||
Male (%) | 63 (42%) | 33 (55%) | 0.094 |
Female (%) | 87 (58%) | 27 (45%) | |
Race | |||
White (%) | 120 (80%) | 51 (85%) | 0.087 |
Black (%) | 19 (13%) | 3 (5%) | |
Other (%) | 11 (7%) | 6 (10%) | |
Stage | |||
IA–IB (%) | 136 (91%) | NA | NA |
IIA (%) | 14 (9%) | NA | |
Histology | |||
Adenocarcinoma (%) | 121 (81%) | NA | NA |
Squamous cell (%) | 26 (17%) | NA | |
Adenosquamous (%) | 3 (2%) | NA | |
Smoking status | |||
Current (%) | 27 (18%) | 7 (12%) | 0.176 |
Former (%) | 87 (58%) | 34 (57%) | |
Never (%) | 31 (21%) | 19 (32%) | |
Pack-year (IQR) | 30 (10–50) | 20 (0–35) | 0.010 |
COPD (%) | 41 (27%) | 12 (20%) | 0.370 |
FEV1 % predicted (IQR) | 84 (70–99) | 85 (70–100) | 0.861 |
FVC % predicted (IQR) | 92 (80–103) | 87 (80–110) | 0.682 |
FEV1/FVC % ratio (IQR) | 73 (68–78) | 77 (70–79) | 0.080 |
Nodule size, cm | 2 (1.5–3) | 1.5 (1.1–3) | 0.01 |
<1 | 6 (4%) | 13 (22%) | 0.001 |
1–2 | 52 (35%) | 19 (32%) | |
>2 | 92 (61%) | 28 (47%) | |
Nodule volume, cm3 | 4.19 (1.77–14.14) | 1.6 (0.52–18.12) | 0.001 |
NOTE: Nodule size % <1, 1–2, >2 cm.
Abbreviations: FEV1, forced expiratory volume in 1 second; IQR, interquartile range.
Detection of DNA methylation
We first measured DNA methylation for these genes in tumor tissue, confirming our previous study suggesting that these genes were methylated in the majority of lung tumors (Fig. 1). Methylation in sputum was detected more frequently for all 6 genes in patients with cancer compared with controls (Fig. 1); for some patients the level was quantitatively similar to that in lung tumor tissue, but in some cases it was below the detection limit of conventional methods. For 5 of the 6 genes (SOX17, TAC1, HOXA7, CDO1, and ZFP42), this difference was statistically significant (P < 0.001), whereas HOXA9 showed a lack of specificity. Methylation of all 6 genes was detected more frequently in plasma in cases compared with controls (P < 0.001). The worst performing gene in plasma was HOXA9, which showed a lack of specificity, as was seen in sputum. We determined the sensitivity and specificity in this cohort using the presence or absence of detectable methylation as a cutoff, without considering the quantitation of methylation (Table 2). With this cutoff, the sensitivity and specificity for lung cancer diagnosis using individual genes ranged from 63% to 93% and from 8% to 92%, respectively, in sputum and from 34% to 86% and from 46% to 92%, respectively, in plasma.
Sample and gene | Cancer, n | Sensitivity | Control, n | Specificity | PPV | NPV |
---|---|---|---|---|---|---|
Sputum (cancer N = 90; control N = 24) | ||||||
SOX17 | 76 | 84% | 3 | 88% | 96% | 60% |
TAC1 | 77 | 86% | 6 | 75% | 93% | 58% |
HOXA7 | 57 | 63% | 2 | 92% | 97% | 40% |
CDO1 | 70 | 78% | 8 | 67% | 90% | 44% |
HOXA9 | 84 | 93% | 22 | 8% | 79% | 25% |
ZFP42 | 78 | 87% | 9 | 63% | 90% | 56% |
TAC1, HOXA7, SOX17 | 88 | 98% | 7 | 71% | 93% | 89% |
Plasma (cancer N = 125; control N = 50) | ||||||
SOX17 | 91 | 73% | 8 | 84% | 92% | 55% |
TAC1 | 95 | 76% | 11 | 78% | 90% | 57% |
HOXA7 | 42 | 34% | 4 | 92% | 91% | 36% |
CDO1 | 81 | 65% | 13 | 74% | 86% | 46% |
HOXA9 | 108 | 86% | 27 | 46% | 80% | 58% |
ZFP42 | 105 | 84% | 23 | 54% | 82% | 57% |
CDO1, TAC1, SOX17 | 116 | 93% | 19 | 62% | 86% | 78% |
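The percentages in Table 2 follow directly from the number of methylation-positive subjects in each group. As a worked check, the sketch below recomputes the four metrics from those counts; the function and argument names are illustrative assumptions, while the counts in the example are those of the three-gene sputum panel in Table 2.

```python
def diagnostic_metrics(pos_cases, n_cases, pos_controls, n_controls):
    """Sensitivity, specificity, PPV, and NPV from methylation-positive counts."""
    true_pos = pos_cases
    false_neg = n_cases - pos_cases
    false_pos = pos_controls
    true_neg = n_controls - pos_controls
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Three-gene sputum panel: 88 of 90 cases and 7 of 24 controls were positive.
panel = diagnostic_metrics(88, 90, 7, 24)
print({name: round(100 * value) for name, value in panel.items()})
# {'sensitivity': 98, 'specificity': 71, 'ppv': 93, 'npv': 89}
```

PPV and NPV computed this way reflect the ratio of cases to controls in this cohort, so they would differ in a screening population with a lower prevalence of cancer.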
Gene methylation and lung cancer diagnostic accuracy
ROC curves for lung cancer detection were obtained for each single gene using the normalized methylation ΔCt values calculated as described in Materials and Methods (Supplementary Table S2, ROC curves in Supplementary Figs. S2 and S3). By determining the best quantitative cutoff, the sensitivity and specificity for lung cancer diagnosis from single methylated genes ranged from 63% to 93% and from 42% to 92%, respectively, in sputum and from 33% to 91% and from 52% to 94%, respectively, in plasma. These values were very similar to those reported in Table 2, with the exception of HOXA9, for which quantitative cutoffs improved performance. The AUC values were 0.56 to 0.89 in sputum samples and 0.60 to 0.78 in plasma samples.
The genes with the largest AUC in sputum were: TAC1: AUC, 0.84; 95% CI, 0.74–0.94; HOXA7: AUC, 0.77; 95% CI, 0.67–0.86; and SOX17: AUC, 0.84; 95% CI, 0.75–0.94 (Fig. 2A), with sensitivities and specificities for TAC1 at 86% and 75%; HOXA7 at 63% and 92%; and SOX17 at 84% and 88%, respectively. The positive (PPV) and negative predictive values (NPV) for these 3 genes were: for TAC1, 93% and 58%; for HOXA7, 97% and 40%; and for SOX17, 96% and 60%, respectively.
In plasma, the genes with the largest AUC were: CDO1: AUC, 0.68; 95% CI, 0.58–0.77; TAC1: AUC, 0.78; 95% CI, 0.70–0.86; and SOX17: AUC, 0.78; 95% CI, 0.70–0.86 (Fig. 2B), with corresponding sensitivities and specificities for CDO1 at 65% and 74%; TAC1 at 76% and 78%; and SOX17 at 73% and 84%, respectively. The PPV and NPV for these genes were: for CDO1, 86% and 46%; for TAC1, 90% and 57%; and for SOX17, 92% and 55%, respectively.
The sensitivity and specificity obtained from the combination of the 3 best performing markers (TAC1, HOXA7, and SOX17) in sputum were 98% and 71%, respectively, with a corresponding ROC AUC of 0.89 (95% CI, 0.80–0.98; Fig. 2C). In plasma, the combination of CDO1, TAC1, and SOX17 showed a sensitivity, specificity, and AUC of 93%, 62%, and 0.77 (95% CI, 0.68–0.86), respectively (Fig. 2D).
Smokers' subset analysis
As CT screening for lung cancer is currently recommended for current and ex-smokers, we explored the diagnostic accuracy when only smokers were considered (n = 155; 114 with cancer and 41 without cancer; Supplementary Table S4). The results in smokers only were similar to those in the entire study population for the prevalence of detectable methylation, sensitivity, specificity, and AUC (Supplementary Table S5). In smokers only, the AUC was 0.89 (95% CI, 0.79–0.99) for the combination of the methylation status of the best 3 genes from sputum and 0.85 (95% CI, 0.76–0.94) for the best 3 genes from plasma (Supplementary Table S5).
Independent prediction accuracy performance
While the above analysis looked at individual gene methylation in cases and controls to detect cancer, independent blinded random forest prediction models were used to consider all DNA methylation biomarkers in combination with clinical risk factors. Risk factors included in the first 2 random forest prediction models were methylation Ct values from all 6 genes, age, pack-years, COPD status, and FVC values. The methylation Ct values were not included in the last prediction model. The randomly selected training dataset had 140 subjects, with 99 (70.7%) cancers and 41 (29.3%) controls. The independent test set had 70 subjects, with 51 (72.9%) cancers and 19 (27.1%) controls. In the variable importance output of the first 2 random forest prediction models, methylation Ct values were ranked as more important than demographic and clinical variables (Supplementary Fig. S4). Supplementary Table S3 summarizes the prediction accuracies of these 3 models when they were applied to the independent test set patients. With sputum samples, the random forest model correctly predicted lung cancer in 91% of subjects in the test set. The corresponding AUC was 0.85 (95% CI, 0.59–1.0; Fig. 3). The sensitivity and specificity of the prediction in the test set from the ROC curve were 0.93 and 0.86, respectively. Using plasma samples, the random forest model correctly predicted lung cancer in 85% of subjects in the test set. The corresponding AUC was 0.89 (95% CI, 0.79–0.99; Fig. 3). The sensitivity and specificity of the prediction in the test set from the ROC curve were 0.93 and 0.67, respectively. Using clinical and demographic risk factors alone, performance was lower than for the first 2 models, with a diagnostic accuracy of 68%, AUC of 0.64, PPV of 75%, and NPV of 38% (Fig. 3; Supplementary Table S3).
Discussion
High diagnostic accuracy for early-stage lung cancer can be obtained using a panel of methylated promoter genes and an ultrasensitive detection strategy on the basis of MOB in sputum or plasma. This assay has several characteristics that make it clinically useful: (i) the sensitivity and specificity in sputum and plasma exceed the diagnostic accuracy required by most clinical standards (32, 33); (ii) it can be performed with minute quantities of DNA from sputum or plasma; and (iii) it can help distinguish malignant from benign nodules, addressing the current problem of high false-positive scans in lung cancer screening. This discrimination is independent of age, pack-years, and even nodule size, which allows the detection of early-stage lung cancer in smokers. Finally, as a PCR-based assay, it is simple and relatively inexpensive.
Previous studies have sought to improve lung cancer risk assessment by the use of molecular biomarkers obtained from plasma and sputum (8, 10, 11, 25, 26, 34, 35). However, none of these tests has been used clinically because their sensitivities and specificities were usually not high enough for clinical decision making (8, 10, 11, 25, 26, 34–38). With improvements in DNA extraction methods and processing for methylation detection, along with the use of highly prevalent cancer-specific methylation targets, we have overcome these limitations. Direct comparisons between serum and plasma for detection of DNA methylation have not been conducted in this study, but the use of plasma may reduce the amount of lymphocyte DNA present for analysis.
Despite the improved sensitivity of this approach, some patients had undetectable DNA methylation in either blood or sputum. Lack of detection in these patients was not related to clinical characteristics, including smoking status (see similar detection in only smokers, Supplementary Data). Nor does it appear to be related to PCR failure or assay efficiency, which was assessed for each assay using appropriate controls (Materials and Methods). We also examined whether tumor size, and therefore tumor burden, affected our ability to detect DNA methylation in the plasma or sputum. There was no statistical difference in tumor size between patients with cancer with or without detectable DNA methylation (Supplementary Table S6), and notably nodules less than 2 cm were readily detected.
In this study, detection of methylation in sputum samples was slightly better than the detection of these same genes in plasma. The access of early cancers to the airways may be one explanation for this difference. Indeed, changes in the airways form the basis for the AEGIS Study, which reported an improved diagnostic yield of bronchoscopy using gene expression classifiers from epithelial cells collected during bronchoscopy (38). The AUC, sensitivities, and specificities reported in the AEGIS Study were lower than we report here. In our model where methylation markers from plasma were considered simultaneously with age and number of pack-years, we observed a predictive accuracy close to that of sputum. This suggests that plasma could substitute for sputum in lung cancer detection in those cases where sputum cannot be obtained.
According to the NLST, the chances of having lung cancer with a positive CT screening are less than 5% (1, 2). This is because CT screening for lung cancer in the NLST yielded a 71% sensitivity but a 63% specificity, with a 96.4% false-positive rate (1, 2). Our current findings suggest that methylation detection with a few genes from either plasma or sputum could potentially guide management of positive CT screening results. Although our study included some patients who would not meet current lung cancer screening guidelines (non-smokers), we observed similar detection rates when only smokers were analyzed. Replication and external validation of our findings in a large, prospective, multicenter case–control trial are essential before this approach can be adopted.
This study shows that high-sensitivity and high-specificity detection of early-stage NSCLC can be obtained using a panel of methylated promoter genes in plasma and sputum and that the methylation level of these genes is associated with a high lung cancer risk independent of age, pack-years, and nodule size. If confirmed in a validation study, this panel could be used as an adjunct to CT screening, identifying patients at high risk for lung cancer, reducing false-positive results and unnecessary tests, and improving the diagnosis of lung cancer at an earlier stage.
Disclosure of Potential Conflicts of Interest
A. Hulbert holds ownership interest (including patents) in Johns Hopkins Technology Ventures. M.V. Brock holds ownership interest (including patents) in and is a consultant/advisory board member for Cepheid Corporation. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: A. Hulbert, I. Jusue-Torres, A. Stark, M.V. Brock, J.G. Herman
Development of methodology: A. Hulbert, I. Jusue-Torres, A. Stark, K. Rodgers, B. Lee, J. Wrangle, T.-H. Wang, J.G. Herman
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C. Chen, K. Rodgers, B. Lee, C. Griffin, A. Yang, S.C. Yang
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): I. Jusue-Torres, A. Stark, C. Chen, K. Rodgers, A. Yang, P. Huang, J. Wrangle, S.C. Yang, M.V. Brock, J.G. Herman
Writing, review, and/or revision of the manuscript: A. Hulbert, I. Jusue-Torres, A. Stark, C. Griffin, P. Huang, S.A. Belinsky, T.-H. Wang, S.C. Yang, S.B. Baylin, M.V. Brock, J.G. Herman
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): A. Stark, K. Rodgers, B. Lee, C. Griffin, S.C. Yang, J.G. Herman
Study supervision: A. Hulbert, I. Jusue-Torres, K. Rodgers, B. Lee, C. Griffin, S.C. Yang, M.V. Brock, J.G. Herman
Grant Support
Funding for this study was provided by DOD W81WH-12-1-0323 (to J.G. Herman and M.V. Brock) and NCI P50CA058184 (to J.G. Herman, M.V. Brock, and S.B. Baylin).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.