Abstract
Background: Current management of lung nodules is complicated by nontherapeutic resections and missed chances for cure. We hypothesized that a serum proteomic signature may add diagnostic information beyond that provided by combined clinical and radiographic data.
Methods: Cohort A included 265 and cohort B 114 patients. Using multivariable logistic regression analysis, we calculated the area under the receiver operating characteristic curve (AUC) and used the integrated discrimination improvement (IDI) index to quantify the added value of a previously described serum proteomic signature beyond clinical and radiographic risk factors for predicting lung cancer.
Results: The average computed tomography (CT) measured nodule size in cohorts A and B was 37.83 versus 23.15 mm among patients with lung cancer and 15.82 versus 17.18 mm among those without, respectively. In cohort A, the AUC increased from 0.68 to 0.86 after adding chest CT imaging variables to the clinical results, but the proteomic signature did not provide meaningful added value. In contrast, in cohort B, the AUC improved from 0.46 with clinical data alone to 0.61 when combined with chest CT imaging data and to 0.69 after adding the proteomic signature (IDI of 20%, P = 0.0003). In addition, in a subgroup of 100 nodules between 5 and 20 mm in diameter, the proteomic signature added value with an IDI of 15% (P ≤ 0.0001).
Conclusions: The results show that this serum proteomic biomarker signature may add value to the clinical and chest CT evaluation of indeterminate lung nodules.
Impact: This study suggests a possible role of a blood biomarker in the evaluation of indeterminate lung nodules. Cancer Epidemiol Biomarkers Prev; 21(5); 786–92. ©2012 AACR.
Introduction
For more than 50 years, lung cancer has been a leading cause of cancer-related death in the United States. In 2010, 222,520 Americans were predicted to be diagnosed with lung cancer and roughly 157,300 to succumb to the disease (1). The National Lung Screening Trial showed a 20% lung cancer–specific mortality benefit in patients screened by low-dose computed tomography (CT; ref. 2). CT screening is thought to result in 3 times as many curative-intent interventions for early-stage lung cancers, but up to a 10-fold increase in surgical resection of these lung nodules is predicted. Therefore, a more accurate noninvasive diagnosis of lung nodules is urgently needed.
The diagnosis of lung cancer suffers from the lack of accurate, noninvasive diagnostic tests. Low-dose chest CT screening is very sensitive at detecting lung nodules of small size, many of which are benign (3, 4). To avoid missing a potentially curable lung cancer, the current management protocols for these lung nodules lead to unnecessary diagnostic procedures, expensive follow-up, and related stress. When removed surgically, 10% to 30% of the lesions carry a benign diagnosis, often in the setting of other comorbidities and, in rare cases, with procedure-related mortality (5, 6). A 5-year prospective cohort of 1,520 patients with a smoking history and at least 50 years of age was found to have a 96% false positive rate (noncalcified lung nodules proven benign by means of observation or surgery; ref. 7), highlighting the lack of specificity commonly seen when imaging modalities alone are applied to a high-risk population.
The high rate of nontherapeutic resections and the missed chances for cure from lung cancer are explained by the imperfect noninvasive diagnostic capabilities (8, 9). Age and gender have limited diagnostic value to detect lung cancer. Chest CT, although very sensitive, is not specific enough for classification of indeterminate pulmonary nodules as lung cancers.
A noninvasive diagnostic model for lung cancer developed by the Mayo group found age, smoking status, cancer history, nodule diameter, presence of spiculation, and nodule location to have an area under the receiver operating characteristic curve (AUC) of 0.83 for diagnosing lung cancer (10). A similar model developed by a VA cooperative group to discriminate lung cancer from benign lung nodules found age, smoking history, nodule diameter, and time since quitting smoking to have an AUC of 0.78 (11). Both of these models have been externally validated and shown to have similar diagnostic accuracy (12). These models already suggest that a multimodality approach is necessary to more accurately predict lung cancer in patients with pulmonary nodules. FDG-PET is usually not recommended for subcentimeter lesions because the metabolic activity of these lesions varies greatly and the scan lacks diagnostic specificity (13). Therefore, an additional noninvasive test that complements the existing modalities and improves the detection of lung cancer in patients with a lung nodule is needed to reduce the number of unnecessary referrals for invasive procedures.
We previously showed that a serum proteomic signature of 7 peaks using matrix-assisted laser desorption ionization mass spectrometry (MALDI MS) distinguished subjects with lung cancer from age, gender, and smoking history matched controls (14), with an overall accuracy of 0.73, after introducing a specific threshold yielding a sensitivity of 58% and specificity of 86%. After this phase one diagnostic case–control study, subsequent studies in consecutive patients selected on the presence of CT detected lung nodules are needed to quantify whether this new test adds discriminative accuracy beyond existing risk factors of clinical and imaging data (15–19). Hence, this study seeks to validate this serum proteomic signature and measure how much diagnostic value it adds to the combined clinical and chest CT data for the noninvasive diagnosis of patients presenting for the evaluation of a lung nodule following the PRoBE study design (20) in 2 cohorts of patients from different institutions presenting for the evaluation of lung nodules.
Patients and Methods
Study subjects
The study population included 2 cohorts of prospectively collected patients presenting with one or more lung nodules discovered by chest CT (N = 379). The first, cohort A, combined patients from Vanderbilt University and the Veterans Affairs Medical Center, Nashville, TN, and the second, cohort B, included patients from the Mayo Clinic, Rochester, MN. None of the subjects were enrolled in a lung cancer screening trial. Both cohorts included patients presenting to the doctor's office for the evaluation of lung nodule(s) on chest CT. Lung nodules were defined as rounded solid lung lesions surrounded by pulmonary parenchyma. Recruiting patients from 2 distinct parts of the country was built into the study design in the hope of making our findings more generalizable. Both cohorts were used separately to validate the diagnostic usefulness of the MALDI signature. Blood was collected from patients before any diagnostic or therapeutic intervention under separate standard operating procedures. In cohort A, 265 patients entered the study; we excluded 57 patients because either we were unable to obtain MALDI profiles (n = 46, hemolyzed samples) or some other index tests were not available (n = 11), leaving 208 patients for analysis. In cohort B, 114 patients were enrolled. Shape of the nodule was available for 90 patients. Twenty-eight of these 90 patients were excluded because of missing smoking data (n = 5) or because we could not obtain MALDI profiles (n = 23), leaving 62 patients for analysis (Supplementary Fig. S1). The study was approved by the local Institutional Review Boards of Vanderbilt University, the Nashville Veterans Affairs Medical Center, and the Mayo Clinic in Rochester, MN, and all patients provided informed consent.
Index tests
The index tests (19) included the novel serum proteomic signature as well as preset variables known to be associated with the diagnosis of lung cancer, specifically, age, pack-years of smoking history, chest CT nodule size, and shape.
Clinical and laboratory data.
Detailed questionnaire data generated from personal interview were available on all study subjects, including demographic characteristics and smoking history. For cohort A, laboratory data elements included complete blood count, lactate dehydrogenase, alkaline phosphatase, total protein, albumin, creatinine and calcium and were obtained from the patient's medical record. Most of these patients also had pulmonary function tests available at the time of the identification of the nodule. Both the FEV1 and DLCO were recorded. For cohort B, only demographic characteristics, smoking history, and c-reactive protein (CRP) were available.
Chest CT data.
Chest CT images of 379 patients (cohorts A+B) with a known solid pulmonary nodule were reviewed and characterized according to size and shape. Nodule size was determined by averaging the largest cross-sectional diameter of the nodule with the largest diameter perpendicular to this measurement. When taking the perpendicular measurement of a lobulated lesion, the outermost margins on either side were considered. If the nodule was in the pulmonary parenchyma, lung windows were used (21). If it involved the mediastinum, measurements with lung and mediastinal windows were used and the average was taken. Careful consideration was taken to classify the lesion as either smooth, lobulated, or spiculated. A lesion was determined to be smooth if its margins were well circumscribed and either round or ovoid. Lobulated lesions consisted of nodules that had undulations and a more amorphous appearance than smooth nodules. Spiculation, or corona radiata, was used to assess the presence of fibrosis around the nodule thought to represent a desmoplastic type reaction. Spiculations were characterized by radial projections from the nodule that were distinct from lung parenchyma or feeding blood vessels. When the nodule involved the mediastinum or perihilar structures, it was regarded as a central lesion, otherwise it was deemed to be a peripheral lesion.
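As an illustration of the size and location rules described above, the following minimal Python sketch averages the two diameters and labels a lesion as central or peripheral. The function names and the example measurements are hypothetical and are not taken from the study.

```python
def mean_nodule_diameter(longest_mm: float, perpendicular_mm: float) -> float:
    """Average the largest cross-sectional diameter with the largest
    diameter perpendicular to it (both in millimeters)."""
    if longest_mm <= 0 or perpendicular_mm <= 0:
        raise ValueError("diameters must be positive")
    return (longest_mm + perpendicular_mm) / 2.0


def nodule_location(involves_mediastinum_or_hilum: bool) -> str:
    """Central if the nodule involves the mediastinum or perihilar
    structures, otherwise peripheral."""
    return "central" if involves_mediastinum_or_hilum else "peripheral"


# Hypothetical measurements, for illustration only
size_mm = mean_nodule_diameter(22.0, 18.0)
location = nodule_location(False)
```

For a nodule involving the mediastinum, the same averaging would be applied to the lung-window and mediastinal-window measurements, as described in the text.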
Serum proteomic data.
MALDI MS spectrum acquisition and data processing were carried out as previously described (14). Peripheral blood was collected without additive, incubated at room temperature for 60 minutes, and centrifuged. Serum was aliquoted and stored within 4 hours at −80°C until analysis. Thawed serum samples were diluted 1:10 in water. One microliter of matrix solution (sinapinic acid in acetonitrile/water 50:50 v/v containing 0.1% trifluoroacetic acid) was mixed with 1 microliter of diluted serum and directly spotted in triplicate onto a gold-coated stainless steel MALDI target plate (PE Biosystems). Spectra were acquired with a Voyager-Elite MALDI mass spectrometer (Applied Biosystems). Spectra were generated in the mass-to-charge (m/z) range of 3,000 to 20,000. Internal calibration was carried out using the hemoglobin β chain [(M+H)+ = 15,686] and APO-C1 [(M+H)+ = 6,631]. The data preprocessing consisted of internal calibration, smoothing, baseline correction, normalization to the total ion current, feature selection with a signal-to-noise ratio, and binning of features. The processing resulted in 120 m/z peaks per spectrum on average. A total of 162 bins from m/z ratios of 3,000 to 20,000 were selected. In addition, 75 bins reached S/N ≥3, from which 5 peaks related to hemoglobin were removed from the analysis. The 70 remaining peaks were used for statistical analysis.
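Two of the preprocessing steps named above, normalization to the total ion current and signal-to-noise filtering, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example intensities, and the single fixed noise level are hypothetical, and the actual pipeline (described in ref. 14) also includes calibration, smoothing, baseline correction, and binning, which are not shown.

```python
def normalize_to_tic(intensities):
    """Divide each intensity by the total ion current (sum of intensities)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total ion current must be positive")
    return [value / total for value in intensities]


def peaks_above_snr(mz_values, intensities, noise_level, min_snr=3.0):
    """Keep (m/z, intensity) pairs whose signal-to-noise ratio is >= min_snr.

    noise_level is assumed to be a single noise estimate for the spectrum;
    how noise was estimated in the study is not stated here.
    """
    return [
        (mz, value)
        for mz, value in zip(mz_values, intensities)
        if value / noise_level >= min_snr
    ]


# Hypothetical three-peak spectrum, for illustration only
mz = [4155.0, 7616.0, 8765.0]
raw = [20.0, 30.0, 50.0]

normalized = normalize_to_tic(raw)
kept = peaks_above_snr(mz, raw, noise_level=8.0)
```

In this toy spectrum the first peak has a ratio of 2.5 and is dropped, while the other two pass the S/N ≥3 threshold used in the text.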
Reference test (outcome)
The reference test was the histology of the lung biopsy material to determine the presence or absence of lung cancer, that is, the outcome of this study. All patients without a diagnosis of lung cancer on pathology were followed up for 12 months after discovery of the nodule; none were found to have lung cancer. The nodule imaging protocol followed current clinical guidelines for the management of pulmonary nodules (22).
Statistical analysis
To assess the primary hypothesis that the protein signature improves the diagnostic accuracy when adding to clinical, laboratory, and imaging results, we carried out the statistical analysis in 3 steps.
First, we applied a logistic regression analysis to test the diagnostic ability for the 3 groups of index tests separately, that is, clinical and laboratory variables, chest CT variables, and the MALDI MS score. The MALDI MS score (signature) was represented using a Weighted Flexible Compound Covariate Method (WFCCM; ref. 23). The method can be summarized as follows: for each patient i, the MALDI MS signature/score is calculated as $\mathrm{WFCCM}(i) = \sum_j \left(\sum_k ST_{jk}\right) W_j \, x_{ij}$, in which $x_{ij}$ is the intensity measurement of candidate feature j for sample i, $ST_{jk}$ is the standardized test statistic of statistical analysis method k for feature j, and $W_j$ is the weight of the individual feature j. More specifically, in this application we derived the MALDI MS score for each patient in the following steps. Step 1: we predetermined 7 features suggested by a previous publication, located these 7 features' m/z values (i.e., m/z at 4,155, 7,616, 8,765, 11,440, 11,526, 11,683, and 13,762; j = 1,…,7), and obtained their corresponding test statistics from (i) the t test, (ii) significance analysis of microarrays, (iii) weighted gene analysis, and (iv) the Wilcoxon rank sum test (k = 1,…,4; ref. 14). Step 2: for each of these 7 features, we calculated an individual feature score as the feature's mass spectral intensity multiplied by the sum of the 4 test statistics obtained in step 1. Step 3: we derived a MALDI MS score for each patient by calculating a weighted average of the 7 individual feature scores. In this application, we set an equal weight for each feature, which made the final MALDI MS score the mean of the 7 individual feature scores. Thus, the 7 features were prespecified (a priori) based on a previously published study (14), without a selection/screening process involving the current data.
The test statistics used as coefficients to build the MALDI MS score were also “carried over” from the published study and were not derived from any statistical test involving the current data. The carried-over test statistics are provided in Supplementary Table S1.
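The three scoring steps can be sketched as follows. This is a minimal Python illustration of the WFCCM formula as stated in the text, not the authors' code: the function name is hypothetical, and the intensities and test statistics are made-up numbers (3 features with 2 statistics each) rather than the 7 features and 4 statistics of Supplementary Table S1.

```python
from typing import Sequence


def wfccm_score(
    intensities: Sequence[float],
    test_stats: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Score for one patient: sum over features j of
    (sum over methods k of ST_jk) * W_j * x_ij."""
    if not (len(intensities) == len(test_stats) == len(weights)):
        raise ValueError("one intensity, statistic list, and weight per feature")
    return sum(
        sum(stats_j) * weight_j * x_j
        for x_j, stats_j, weight_j in zip(intensities, test_stats, weights)
    )


# Hypothetical values, for illustration only
x = [1.0, 2.0, 0.5]                             # spectral intensities
st = [[1.0, 2.0], [-1.0, -0.5], [0.5, 0.5]]     # carried-over test statistics
w = [1.0 / len(x)] * len(x)                     # equal weights -> plain mean

score = wfccm_score(x, st, w)
```

With equal weights of 1/n the score is the mean of the individual feature scores (here 3.0, -3.0, and 0.5), which matches the description in step 3.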
Second, we used multivariable logistic regression modeling techniques to quantify the added diagnostic value of the MALDI score beyond the preset accepted variables of risk for lung cancer: age, pack-years of smoking history (clinical variables), and chest CT nodule size and shape (imaging variables). The difference in diagnostic performance between the model, including the clinical (age and pack-years history of smoking) and CT test results (nodule size and shape) and the model after being extended with the MALDI score, was assessed by the AUC and the integrated discrimination improvement (IDI) test (24).
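For readers unfamiliar with the IDI, the sketch below computes it in its standard form: the change in discrimination slope (mean predicted risk in cases minus mean predicted risk in controls) between the extended and the baseline model. The function names and the predicted probabilities are hypothetical, and the significance test reported in the paper is not reproduced here.

```python
from statistics import mean


def discrimination_slope(probs, labels):
    """Mean predicted probability in cases (label 1) minus that in controls (label 0)."""
    cases = [p for p, y in zip(probs, labels) if y == 1]
    controls = [p for p, y in zip(probs, labels) if y == 0]
    return mean(cases) - mean(controls)


def idi(p_baseline, p_extended, labels):
    """Integrated discrimination improvement of the extended over the baseline model."""
    return discrimination_slope(p_extended, labels) - discrimination_slope(p_baseline, labels)


# Hypothetical predicted risks for 2 cases and 2 controls, for illustration only
labels = [1, 1, 0, 0]
p_clinical_ct = [0.6, 0.5, 0.4, 0.5]          # baseline model
p_with_maldi = [0.8, 0.7, 0.3, 0.2]           # model extended with the MALDI score

improvement = idi(p_clinical_ct, p_with_maldi, labels)
```

In this toy example the slope rises from 0.10 to 0.50, so the IDI is 0.40; an IDI of 20%, as reported for cohort B, corresponds to a 0.20 increase in this slope.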
Third, to obtain a bias-corrected estimate of the multivariable logistic model's diagnostic accuracy and assess the added value of the MALDI MS score in a new cohort from the same patient population, we applied the bootstrapping method for internal validation (25, 26).
The standard bootstrap resampling technique applies the model to 1,000 bootstrap samples and calculates the original index, that is, the c-index/AUC (area under the receiver operating characteristic curve) with its 95% confidence interval (CI), and the bias-corrected c-index (25). The c-index/AUC results reported in the text and figures are all bias-corrected results obtained after the bootstrapping method was applied.
Missing data were dealt with in a uniform fashion. No multiple imputation results are reported because we attempted these methods and they yielded similar results. It is reasonable to assume that the missing data in our study are very close to missing completely at random, as the missingness mechanism is unrelated to any parameters we are interested in and is independent of any inference we wish to draw.
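One common way to obtain a bias-corrected c-index is the optimism bootstrap, sketched below in Python. Whether this matches the authors' exact procedure is an assumption. The `fit_mean_difference` function is a deliberately simple, hypothetical stand-in for the multivariable logistic regression, the data are simulated, and skipping resamples that lack one of the two classes is a choice made for this sketch.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def fit_mean_difference(X, y):
    """Toy model (hypothetical stand-in for logistic regression): each weight is
    the case mean minus the control mean of that feature."""
    cases = [row for row, lab in zip(X, y) if lab == 1]
    controls = [row for row, lab in zip(X, y) if lab == 0]
    w = [mean(r[j] for r in cases) - mean(r[j] for r in controls)
         for j in range(len(X[0]))]
    return lambda row: sum(wj * xj for wj, xj in zip(w, row))


def optimism_corrected_auc(X, y, fit, n_boot=1000, seed=0):
    """Return (apparent AUC, apparent AUC minus mean bootstrap optimism)."""
    rng = random.Random(seed)
    model = fit(X, y)
    apparent = auc([model(row) for row in X], y)
    n, optimism = len(y), []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:          # resample lacks cases or controls: skip
            continue
        Xb = [X[i] for i in idx]
        boot_model = fit(Xb, yb)
        auc_boot = auc([boot_model(row) for row in Xb], yb)   # on the resample
        auc_orig = auc([boot_model(row) for row in X], y)     # on the original data
        optimism.append(auc_boot - auc_orig)
    return apparent, apparent - mean(optimism)


# Simulated data, for illustration only: one informative and one noise feature
data_rng = random.Random(42)
y = [i % 2 for i in range(60)]
X = [[lab + data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for lab in y]

apparent_auc, corrected_auc = optimism_corrected_auc(
    X, y, fit_mean_difference, n_boot=200
)
```

The corrected value is the apparent AUC minus the average amount by which models refitted on resamples look better on their own resample than on the original data.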
Results
Patient characteristics: The clinical variables of the 2 cohorts are summarized in Table 1 and Supplementary Table S2. The 2 cohorts differed significantly with regard to smoking status and nodule size (Supplementary Table S3). The average nodule size in cohort A was 37.83 mm compared with 23.15 mm in cohort B among patients diagnosed with lung cancer and 15.82 versus 17.18 mm among those with benign disease. The size of the nodules varied significantly between the cohorts, and nodules were significantly smaller in cohort B (Supplementary Fig. S2).
Diagnostic accuracy of the MALDI test. In cohort A (n = 208 patients), the AUC of the MALDI score was 0.64 (95% CI: 0.58–0.71) (Fig. 1A). In cohort B (n = 62 patients) the AUC was 0.64 (95% CI: 0.52–0.75) (Fig. 1B).
Diagnostic accuracy of the clinical and chest CT results: In cohort A, adding chest CT imaging variables (size and shape) to the clinical results increased the AUC significantly from 0.68 (95% CI: 0.61–0.74) to 0.86 (95% CI: 0.81–0.91; Fig. 1A). In cohort B, the AUC improved from 0.46 (95% CI: 0.33–0.58) with clinical data alone to 0.61 (95% CI: 0.49–0.74) when combined with chest CT imaging data (Fig. 1B). In both cohorts, CRP and/or Hgb blood levels had no incremental diagnostic value when added to the model including clinical and chest CT data.
Added value of the proteomic serum signature: Adding the proteomic signature score to the clinical and chest CT data did not change the AUC of 0.86 (95% CI: 0.81–0.91) in cohort A (Fig. 1A). The AUC increased from 0.61 to 0.69 (95% CI: 0.57–0.80) in cohort B with its smaller nodules (Fig. 1B). The IDI also showed a significant improvement of 20% in cohort B (P = 0.0003), whereas in cohort A the improvement was 2.7% (P = 0.006).
Role of the MALDI score in indeterminate pulmonary nodules: The clinical impact of our diagnostic signature for lung cancer would be greatest if it were able to discriminate lung cancer from other indeterminate nodules (5–20 mm in diameter). Because the proteomic serum signature added value only in the cohort with smaller nodules (cohort B), we carried out a sensitivity analysis to confirm these findings in a larger group of indeterminate nodules pooled from cohorts A and B. A total of 100 patients from both cohorts with indeterminate nodules (Supplementary Table S4) were evaluated separately. Fifty-one individuals were diagnosed with lung cancer and 49 were without lung cancer after 1 year of follow-up. The characteristics of this group are reported in Table 2. ROC curve analysis showed an AUC of 0.57 for clinical data alone, 0.67 when imaging data were added, and 0.72 when also including the MALDI MS signature (Fig. 2). The IDI revealed an improvement of 15% (P < 0.0001) in this subgroup.
Table 1. Clinical, CT imaging, and laboratory characteristics of cohorts A and B
Demographic characteristics | Cohort A (VU/VA) | Cohort B (Mayo) |
---|---|---|
N | 265 | 114 |
Age, y (SD) | 64 (10) | 68 (10) |
Gender | ||
M | 158 (60) | 65 (57) |
F | 107 (40) | 49 (43) |
Race | ||
White | 247 (93) | 91 (80) |
Black | 16 (6) | 15 (13) |
Other | 2 (1) | 8 (7) |
Pack years of smoking (SD) | 61.2 (38) | 33.5 (27) |
CT Characteristics | ||
Mean size, mm, (SD) | 31.2 (23) | 19.4 (13) |
Spiculation present (%) | 146 (55) | 37 (41) |
Histology, n (%) | ||
Adenocarcinoma | 60 (23) | 25 (22) |
Squamous carcinoma | 49 (19) | 13 (11) |
Large cell carcinoma | 7 (3) | 1 (1) |
Small cell carcinoma | 22 (8) | |
Carcinoid neoplasm | 3 (3) | |
Renal cell carcinoma | 2 (1) | |
Esophageal carcinoma | 1 (<1) | |
Colon carcinoma | 1 (<1) | |
Fibrotic tissue | 16 (6) | |
Vascular lesions | 1 (<1) | 1 (1) |
Infectious granulomas | 8 (3) | 3 (3) |
Other infections | 8 (3) | 11 (10) |
Noninfectious granulomas | 25 (9) | 7 (6) |
Developmental lesions | 2 (1) | 5 (4) |
Other conditions | 47 (18) | 44 (4) |
Laboratory Characteristics | ||
C-Reactive Protein mg/dL, mean (SD) | 4.9 (14) | 6.1 (9) |
Hemoglobin g/dL, mean (SD) | 13.2 (2) | |
Alkaline Phosphatase, mean (SD) | 99 (47) | |
Albumin g/dL, mean (SD) | 3.9 (1) | |
Figure 1. Performance of prediction models in cohorts A and B. A, receiver operating characteristic (ROC) curves showing diagnostic accuracy of clinical data alone: age, pack-year smoking history (green), MALDI MS signature alone (dotted gold), clinical combined with chest CT data including size and shape (blue), clinical combined with MALDI data (dotted purple), CT data alone (dotted light blue), CT data and MALDI MS signature (dotted black), and finally the added value of the serum MALDI MS signature to clinical and chest CT (red) in cohort A (208 patients, 150 cases and 58 controls). B, ROC curves showing diagnostic accuracy of clinical data alone: age, pack-year smoking history (green), MALDI MS signature alone (dotted gold), clinical combined with chest CT data including size and shape (blue), clinical combined with MALDI data (dotted purple), CT data alone (dotted light blue), CT data and MALDI MS signature (dotted black), and finally the added value of the serum MALDI MS signature to clinical and chest CT (red) in cohort B (62 patients, 25 cases and 37 controls).
Figure 2. Performance of prediction models in indeterminate pulmonary nodules. ROC curves showing diagnostic accuracy of clinical data alone: age, pack-year smoking history (green), MALDI MS signature alone (dotted gold), clinical combined with chest CT data including size and shape (blue), clinical combined with MALDI data (dotted purple), CT data alone (dotted light blue), CT data and MALDI MS signature (dotted black), and finally the added value of the serum MALDI MS signature to clinical and chest CT (red) in patients with lung nodules sized 5 to 20 mm (100 patients, 51 cases, 49 controls).
Table 2. Characteristics of patients with nodules 5–20 mm in diameter from cohorts A and B
Characteristic | Lung cancer | No lung cancer |
---|---|---|
N | 51 | 49 |
Age, y (SD) | 64.8 (9.8) | 62 (9) |
Pack-years (SD) | 50.9 (33.5) | 50.9 (39.9) |
Mean size, mm (SD) | 14.8 (3.1) | 11.8 (4.6) |
Spiculation, n (%) | 25 (49) | 13 (27) |
Discussion
This study achieves 2 main goals. First, it validates in 2 independent cohorts the performance of a previously published serum proteomic signature derived from a case–control study and tested in a relevant clinical context of individuals presenting for the evaluation of a newly discovered lung nodule. Second and more importantly, it addresses the biomarker's added value to that of clinical and chest CT imaging results used in current clinical practice.
The search for noninvasive diagnostic biomarkers in lung cancer has led to a large body of work that strives to find a molecular signature robust enough to deserve further validation. Many of these signatures, tested in a case–control study design, report an overall diagnostic accuracy for lung cancer of 70% to 80%. Very few of these studies address the noninvasive diagnosis of lung cancer in individuals presenting with lung nodules. It is exactly in this population, however, that the addition of a noninvasive biomarker to the existing diagnostic modalities may have the largest clinical significance. The VA cooperative group and Mayo Clinic models report clearly that the most useful clinical and imaging variables were age, pack-years of smoking, and nodule size and shape. Therefore, we decided in this study to evaluate the added value of our proteomic signature beyond that of these preset variables available at the time of evaluation.
We proceeded by showing first that the MALDI MS signature provided a noninvasive diagnostic accuracy for lung cancer similar to that of epidemiologic information, or better accuracy when considering smaller nodules (Fig. 1A and B). Second, we confirmed the power of the clinical and the chest CT imaging data in predicting lung cancer. This is particularly true for chest CT, and not surprisingly so, in larger lesions. Finally, we showed that the proteomic signature provided added value to the clinical and CT data when considering indeterminate nodules, as seen in cohort B and in the significant integrated discrimination improvement analysis. Whether the added value is clinically meaningful remains to be tested and will be considered in future studies.
We then sought to confirm this improvement in classification of indeterminate nodules provided by the proteomic profile. To this aim, we selected from cohorts A and B nodules between 5 and 20 mm in diameter. In this population, the proteomic signature increased the AUC from 0.67 to 0.72, with a significant IDI of 15%. These observations suggest that the additive value of our proteomic signature is most likely to be in predicting the diagnosis of lung cancer when the lung nodule is indeterminate, not spiculated, and in patients with lower pack-years of smoking.
The serum biomarker did not add meaningful diagnostic value in cohort A, however. As we had anticipated in our diagnostic model, as the lung nodule size increases, the proteomic signature loses its added value (Supplementary Fig. S3). This is mainly due to the increase in diagnostic accuracy of the chest CT with larger nodules. We therefore speculate that the difference in size of nodules between the cohorts is playing a major role in determining the added value of the biomarker to the clinical and imaging variables. Other factors such as differences in smoking history and regional differences (Minnesota vs. Tennessee) may also explain in part this observation. In addition, our model does not take into account that some individuals without lung cancer may still develop disease beyond our observed 1 year follow-up.
A few studies have recently incorporated biomarkers into a radiographic model with improved accuracy. A group of investigators found CRP and CEA levels to improve a multivariable regression model when including presence or absence of calcification, spiculation, and a visible bronchus leading to the nodule. When validated prospectively, the model had a predictive accuracy of 84% (27) but did not integrate variables such as nodule size, age, or smoking history of the individuals. Another group combined an 80-gene microarray biomarker of bronchial epithelium with age, mass size, and lymphadenopathy and achieved an AUC of 0.94 (28). However, the need for bronchoscopy makes this approach less generalizable because of the more invasive nature of the test.
In summary, we determined that traditional test results (age, smoking history, and nodule size and shape) in combination with a novel biomarker can more accurately classify individuals with lung nodules into disease categories (cancer vs. no cancer) of potential clinical relevance. This serum proteomic biomarker adds diagnostic information to established standard clinical and chest CT imaging data in the evaluation of smaller nodules. Our group and others have shown that MALDI MS is simple, rapid, and reproducible across laboratories (14, 29, 30). To our knowledge, this study presents for the first time the additive value of a serum proteomic signature to combined clinical and imaging data for patients with pulmonary nodules. This integrated noninvasive approach to the evaluation of lung nodules deserves further prospective validation among a larger cohort of patients presenting with indeterminate pulmonary nodules in the context of a screening strategy. If proven useful, this multimodality approach may lead to a reduction in futile thoracotomies and missed chances for cure.
Disclosure of Potential Conflicts of Interest
J.R. Jett has held the title/role of Editor-in-Chief of Journal of Thoracic Oncology and the title/role of editor of the Lung Cancer Section, has received a research grant from the University of Nottingham, and is also a consultant for and on the advisory board of Oncimmune Inc. J.B. Putnam has held the position of Chair of the Thoracic Organ Site Committee of the American College of Surgeons Oncology Group.
Authors' Contributions
Conception and design: C.V. Pecot, D.P. Carbone, K.G.M. Moons, Y. Shyr, and P.P. Massion.
Development of methodology: C.V. Pecot, D.P. Carbone, J.A. Worrell, K.G.M. Moons, and P.P. Massion.
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C.V. Pecot, M. Li, X.J. Zhang, C. Calitri, A. Bungum, J.R. Jett, J.B. Putnam, E.L. Grogan, R. Rajanbabu, S. Deppen, J.A. Worrell, K.G.M. Moons, and P.P. Massion.
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C.V. Pecot, M. Li, X.J. Zhang, R. Rajanbabu, C. Calitri, A. Bungum, S. Deppen, J.A. Worrell, K.G.M. Moons, Y. Shyr, and P.P. Massion.
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C. Calitri, S. Deppen, and D.P. Carbone.
Writing, review, and/or revision of the manuscript: C.V. Pecot, J.R. Jett, C. Callaway-Lane, S. Deppen, E.L. Grogan, J.A. Worrell, K.G.M. Moons, and P.P. Massion.
Study supervision: C.V. Pecot and P.P. Massion.
Drafting the manuscript for important intellectual content: C.V. Pecot, M. Li, J.R. Jett, J.B. Putnam, C. Callaway-Lane, E.L. Grogan, D.P. Carbone, K.G.M. Moons, Y. Shyr, and P.P. Massion.
Grant Support
This work was funded in part by the SPECS in lung cancer U01 CA114771, the Vanderbilt SPORE in lung CA CA90949, and a Merit Review grant from the Veterans Administration. K.G.M. Moons is supported by The Netherlands Organization for Scientific Research (grant 918.10.615 and 9120.8004).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.