Abstract
Screening with low-dose helical computed tomography (CT) has been shown to significantly reduce lung cancer mortality, but the optimal target population and the time interval to subsequent screening are yet to be defined. We developed two models to stratify individual smokers according to risk of developing lung cancer. We first used the number of lung cancers detected at baseline screening CT in the 5,203 asymptomatic participants of the COSMOS trial to recalibrate the Bach model, which we propose using to select smokers for screening. Next, we incorporated lung nodule characteristics and presence of emphysema identified at baseline CT into the Bach model and proposed the resulting multivariable model to predict lung cancer risk in screened smokers after baseline CT. Age and smoking exposure were the main determinants of lung cancer risk. The recalibrated Bach model accurately predicted lung cancers detected during the first year of screening. Presence of nonsolid nodules (RR = 10.1, 95% CI = 5.57–18.5), nodule size more than 8 mm (RR = 9.89, 95% CI = 5.84–16.8), and emphysema (RR = 2.36, 95% CI = 1.59–3.49) at baseline CT were all significant predictors of subsequent lung cancers. Incorporation of these variables into the Bach model increased the predictive value of the multivariable model (c-index = 0.759, internal validation). The recalibrated Bach model seems suitable for selecting the higher risk population for recruitment to large-scale CT screening. The Bach model incorporating CT findings at baseline screening could help define the time interval to subsequent screening in individual participants. Further studies are necessary to validate these models. Cancer Prev Res; 4(11); 1778–89. ©2011 AACR.
Introduction
Lung cancer is the world's leading cause of cancer death (1), mainly because most cases are diagnosed at a regionally advanced or metastatic stage, when surgery is unlikely to be feasible or useful. This is illustrated by the finding that, among 5-year lung cancer survivors, 70% have localized disease, 21% locally advanced disease, and 3% disseminated disease (2). Five-year relative survival after lung cancer diagnosis is 12% to 15% (2). Early diagnosis is the cornerstone of successful lung cancer treatment. Using data from the Surveillance, Epidemiology, and End Results registries, Goldberg and colleagues (3) estimated that if all lung cancers were diagnosed at an early stage, U.S. lung cancer deaths would be reduced by more than 70,000 per year. Low-dose helical computed tomographic (CT) screening was already known to diagnose lung cancers at an early stage (4), and the National Cancer Institute (NCI) recently stopped the National Lung Screening Trial early after initial results showed a 20% reduction in lung cancer mortality (5, 6). Although the final results of this and other ongoing randomized trials (7–11) are awaited, we now need to identify the target population that would benefit most from screening, and to decide on the frequency and duration of screening, before implementing large-scale population screening trials. There is increasing interest, therefore, in developing ways to estimate an individual's risk of developing lung cancer, both to reduce screening costs and to reduce the number of persons harmed by screening and follow-up interventions.
Although risk modeling has established clinical applications in cardiovascular disease (12) and breast cancer (13, 14), only recently have models been developed to estimate lung cancer risk (15–19), both in high-risk groups (15) and in the general population (19), but none in screened populations. Most models rely on epidemiologic variables to estimate risk, while Spitz and colleagues (17) combined epidemiologic risk factors with 2 markers of DNA repair capacity, although the improvement in risk assessment obtained by adding these molecular variables was small. In addition, nothing seems to have been published on models combining the results of imaging investigations with epidemiologic and clinical data, so the ideal target population, that is, the one that would in theory benefit most from CT screening, has not been defined; the optimal interval between screening scans also remains undefined (19, 20).
The first aim of this study was to develop a model based on epidemiologic and clinical risk factors to estimate the probability of individuals in a high-risk population being diagnosed with lung cancer. This model might be useful to stratify individuals and select those at higher risk for inclusion in screening programs. We then went on to develop a second model based on baseline CT findings (presence and characteristics of nodules, presence of emphysema) in a screened population, combined with epidemiologic and clinical risk factors, to stratify individuals according to the probability of being diagnosed with lung cancer at repeat screening scans. This second model is proposed for use in large-scale screening programs to select lower risk patients in whom the interval between screening CTs can be lengthened, and at the same time identify those at higher risk of lung cancer in whom surveillance intensity might be increased or who might benefit from prevention intervention studies.
Subjects and Methods
The COSMOS trial
We used data from the ongoing COSMOS single-center nonrandomized lung cancer screening trial, being conducted in Northern Italy. Details of participants, screening protocol, and diagnostic workup have been published elsewhere (21). Briefly, asymptomatic volunteers aged 50 years or older were eligible for recruitment if they were heavy smokers (≥20 pack-years), were still smoking or had stopped smoking less than 10 years previously, and had not been diagnosed with cancer (other than nonmelanoma skin cancer) in the previous 5 years. The protocol required low-dose CT once a year for 5 years, and completion of self-report questionnaires investigating respiratory symptoms, smoking, exposure to other risk factors, and medical and family history. The study was approved by the Ethics Committee of the European Institute of Oncology and was registered at www.clinicaltrials.gov (NCT01248806). All those recruited gave written consent to annual CT for 5 consecutive years.
CT was done with a High Speed Advantage (General Electric) CT scanner, with an 8-slice (later 16-slice) multidetector, without contrast media. Findings of noncalcified lung nodules were reviewed collectively by radiologists, thoracic surgeons, and nuclear medicine physicians, and were classified as solid, partially solid, or nonsolid (22). Nodules 5 mm or less were scheduled for repeat CT a year later. Nodules between 5.1 and 8 mm were scheduled for repeat CT 3 months later. Nodules more than 8 mm (or growing lesions >8 mm at repeat scan) were scheduled for CT-PET. Lesions suspicious for malignancy (growing or CT-PET–positive) were scheduled for surgical biopsy and additional interventions. Lesions considered to be inflammation were treated with oral antibiotics for 10 days, with CT repeated 1 or 3 months later (22). CT-PET results were considered positive if the estimated maximum standardized uptake value (max-SUV) of 18F-fludeoxyglucose exceeded 2. This work-up protocol has been shown to be highly sensitive (91%) and specific (99.7%), with very few false-negative findings (22).
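The size-based scheduling rules above can be expressed as a small decision function. This is a minimal sketch under stated assumptions, not the COSMOS software: the names `scheduled_step`, `is_suspicious`, and `SUV_THRESHOLD` are hypothetical, diameters are assumed to be recorded to 0.1 mm (so the 5.0/5.1 mm boundary leaves no gap), and the inflammation branch (antibiotics followed by repeat CT) is omitted.

```python
from typing import Optional

SUV_THRESHOLD = 2.0  # CT-PET considered positive if FDG max-SUV exceeds 2


def scheduled_step(diameter_mm: float) -> str:
    """Next scheduled examination for a noncalcified nodule, by size only."""
    if diameter_mm <= 5.0:
        return "repeat CT in 12 months"   # 5 mm or less: next annual round
    if diameter_mm <= 8.0:
        return "repeat CT in 3 months"    # 5.1 to 8 mm: early repeat CT
    return "CT-PET"                       # more than 8 mm


def is_suspicious(growing: bool, max_suv: Optional[float]) -> bool:
    """Growing or CT-PET-positive lesions are referred for surgical biopsy."""
    pet_positive = max_suv is not None and max_suv > SUV_THRESHOLD
    return growing or pet_positive


for size in (4.0, 6.5, 12.0):
    print(size, "->", scheduled_step(size))
```

Keeping size triage and the suspicion check separate mirrors the text, where size determines the next scan and growth or a positive CT-PET triggers biopsy.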
The presence of emphysema (present vs. not present) was assessed visually by the radiologist, reading the low-dose CT scan reconstructed at 1.2 mm thickness at a window level of −480 and a width of 1,260 Hounsfield units, and was defined as the presence of areas of low attenuation contrasting with surrounding lung parenchyma of normal attenuation. Supplementary Material: Figure S1 shows the low-dose CT scan of the lung in a patient classified as having emphysema: the subtle areas of low attenuation and loss of parenchymal structures in the right upper lobe are considered the lower limit for defining the presence of emphysema. Pulmonary function was assessed with a MIR Spirobank spirometer (MIR Medical International Research Inc.). Spirometric variables recorded and analyzed included forced expiratory volume in the 1st second (FEV1)/forced vital capacity (FVC) and forced expiratory flow between 25% and 75% of FVC (FEF25–75). All values were calculated according to European Respiratory Society criteria (23). In addition to traditional thresholds for lung function parameters, we also used the nonconventional threshold of FEV1 less than 90% of predicted, which has been found to be a significant predictor of increased lung cancer risk in a similar population of heavy smokers undergoing annual chest CT screening (24). We did not take gender differences in lung function into account in the analysis.
Statistical analysis
Assessment of lung cancer risk in COSMOS participants
The lung cancer incidence rate for COSMOS participants according to categories of baseline characteristics (Supplementary Material: Table S1) was calculated by dividing the number of observed cancers by the number of person-years of follow-up from baseline to the most recent CT or CT-PET, diagnosis of lung cancer (pathologic confirmation), or dropout, whichever came first. Lung cancer rates were compared between categories (univariate analysis) using rate ratios (RR), with 95% CIs computed assuming that lung cancer events followed a Poisson distribution, and using HRs with 95% CIs obtained from univariate Cox proportional hazards regression models. We next used multivariable Cox proportional hazards regression modeling to estimate the independent predictive value of the baseline characteristics identified by the univariate analysis. The proportional hazards assumption was tested by introducing a constructed time-dependent covariate and testing its statistical significance. Although study participants were heavy smokers, overall mortality was low and not a major competing risk for lung cancer. We therefore used the Kaplan–Meier method to represent the cumulative incidence of lung cancer, and the log-rank test to compare lung cancer incidence between the various categories of participants.
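The incidence-rate arithmetic and a Poisson rate ratio can be sketched as follows. The text does not state which CI method was used; the log-scale (Wald) interval below is one common choice and is an assumption here, as are the helper names `rate_per_100py` and `rate_ratio_ci`. The example call uses the overall figures reported in the Results (162 cancers in 18,095 person-years).

```python
import math


def rate_per_100py(cases: int, person_years: float) -> float:
    """Incidence rate per 100 person-years."""
    return 100.0 * cases / person_years


def rate_ratio_ci(cases1: int, py1: float, cases0: int, py0: float,
                  z: float = 1.959964) -> tuple:
    """Rate ratio of group 1 vs. group 0 with a log-scale Wald CI (Poisson counts)."""
    rr = (cases1 / py1) / (cases0 / py0)
    se_log = math.sqrt(1.0 / cases1 + 1.0 / cases0)  # approximate SE of log(RR)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Overall detection rate from the Results section
print(round(rate_per_100py(162, 18095), 2))
```

Person-years by category are only in the Supplementary Material, so the category-specific RRs are not recomputed here.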
Prediction of lung cancer risk at first screening round
We compared the frequency of lung cancers diagnosed in the first year in COSMOS with the frequency predicted by the Bach model (15), a model developed and validated in smokers, that employs age, sex, asbestos exposure history, and smoking history to estimate the risk an individual will be diagnosed with lung cancer or die of competing causes within 10 years. We were concerned with 1-year prediction, where competing causes of death are negligible.
The Bach model was evaluated for its ability to distinguish patients with a diagnosis of lung cancer from those without diagnosis (discrimination) and its agreement with the frequency of lung cancers (calibration) in COSMOS. Discrimination was assessed as the area under the receiver operating characteristic curve (25). Calibration was explored by visual inspection of the plot comparing the risk of lung cancer predicted by the model with the proportion of lung cancers observed in groups of participants defined by deciles of predicted probabilities. Calibration was further assessed using the Hosmer–Lemeshow χ2 test (26).
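A minimal pure-Python sketch of the decile-based Hosmer–Lemeshow statistic described above. Assumptions: participants are split into 10 groups of near-equal size after sorting by predicted risk; the statistic is referred to a χ2 distribution with g − 2 = 8 degrees of freedom (the usual convention, not stated in the text); and the helper names are hypothetical. The closed-form tail probability used here is valid only for even degrees of freedom.

```python
import math
from typing import Sequence


def hosmer_lemeshow(pred: Sequence[float], outcome: Sequence[int],
                    groups: int = 10) -> float:
    """Hosmer-Lemeshow chi-square over groups of increasing predicted risk."""
    # Sort by predicted risk only (stable sort: ties keep their input order).
    pairs = sorted(zip(pred, outcome), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # sum of predicted risks
        observed = sum(y for _, y in chunk)   # observed cancers
        mean_p = expected / len(chunk)
        stat += (observed - expected) ** 2 / (expected * (1.0 - mean_p))
    return stat


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

For instance, `chi2_sf_even_df(6.2, 8)` evaluates to about 0.62, close to the P = 0.63 reported later for the recalibrated model, given that the published statistic is rounded.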
Because asymptomatic prevalent lung cancers were detected during the first year of screening (27), the baseline 1-year probability of being diagnosed with lung cancer during the first screening round (that is, the risk for a participant with risk factors fixed at their average level) would be far greater than the baseline risk predicted by the Bach model in the absence of screening. The Bach model was therefore modified (recalibrated) by replacing the original baseline risk h0 with the baseline risk h0* recalculated from the COSMOS dataset. The baseline risk h0* was estimated using the Breslow estimator from a Cox regression model with the linear predictor of the original Bach model included as the only covariate.
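The recalibration step can be sketched with a Breslow estimate of the baseline cumulative hazard, followed by the usual conversion to an absolute risk. This is an illustration under assumptions, not the authors' SAS code: the coefficient `beta` on the Bach linear predictor is taken as an input (1.0 would keep the Bach relative risks unchanged), the toy data are invented, and the function names are hypothetical. If the linear predictor is centered at its COSMOS mean, the resulting baseline corresponds to a participant with average risk factors, as in the definition above.

```python
import math
from typing import Sequence


def breslow_cumhaz(times: Sequence[float], events: Sequence[int],
                   lp: Sequence[float], beta: float = 1.0,
                   horizon: float = 1.0) -> float:
    """Breslow estimate of the baseline cumulative hazard H0(horizon).

    lp holds each participant's (original Bach) linear predictor and beta is
    its coefficient in the recalibration Cox model (assumed given here).
    """
    scores = [math.exp(beta * x) for x in lp]
    event_times = sorted({t for t, d in zip(times, events) if d and t <= horizon})
    h0 = 0.0
    for t in event_times:
        n_events = sum(1 for ti, di in zip(times, events) if di and ti == t)
        at_risk = sum(s for ti, s in zip(times, scores) if ti >= t)
        h0 += n_events / at_risk
    return h0


def absolute_risk(h0: float, lp: float, beta: float = 1.0) -> float:
    """Probability of diagnosis by the horizon: 1 - exp(-H0 * exp(beta * lp))."""
    return 1.0 - math.exp(-h0 * math.exp(beta * lp))


# Toy data: follow-up in years, event indicator, Bach linear predictor
h0 = breslow_cumhaz([0.5, 1.0, 1.0, 2.0], [1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0])
print(h0, absolute_risk(h0, lp=0.0))
```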
Prediction of lung cancer risk after first screening round
We next developed a predictive model, based on results of the baseline CT, to stratify screened individuals according to their probability of being diagnosed with lung cancer during subsequent screening CTs. We excluded cases diagnosed in the first screening round; observation for the model started at the time of the second screening CT and continued up to the date of lung cancer diagnosis, or the date of the latest CT (for noncases). Multivariable Cox proportional hazards regression analysis was used to construct the model, with nodule type as a categorical covariate (none/calcified, solid, partially solid, and nonsolid) and the square root of the diameter of the largest detected nodule (mm) as a continuous covariate. We used a square root transformation of the diameter to satisfy the Cox model assumption of a linear relationship between nodule diameter and the log hazard of the event, and to limit the influence of extreme values in the fitting process, as the nodule diameter distribution was highly right skewed. Improvements in risk prediction when the covariates age, sex, asbestos exposure history, and smoking history were incorporated into the model were also assessed. To avoid overfitting, these covariates were incorporated by means of the linear predictor derived from the Bach model, instead of estimating their coefficients from our own data (28).
The discriminatory ability of the various predictive models fitted was assessed by Harrell's concordance statistic (c-index; ref. 29), which is the probability that, for 2 randomly selected participants, the survival time predicted by the model is greater for the participant who survived longer. When analyzing censored data, the c-index was calculated using all possible pairs of participants at least one of whom developed lung cancer. A c-index of 1 indicates perfect concordance and 0.5 indicates concordance no better than chance. The c-indexes of models with and without the linear predictor were compared using bootstrap.
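Harrell's c-index as described above can be sketched in a few lines. Assumptions: a higher risk score means earlier expected diagnosis; a pair is usable only when the participant with the shorter follow-up was diagnosed; pairs with tied follow-up times are skipped for simplicity; tied risk scores count as one half. The function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's c-index; a higher risk score should mean earlier diagnosis."""
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # a = participant with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the earlier time is censored: not usable
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A model that ranks every diagnosed participant above everyone followed longer scores 1.0, while a random ordering tends toward 0.5.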
We also compared predictive accuracy among models using the Net Reclassification Improvement (NRI) index, originally proposed by Pencina and colleagues (30) to evaluate individual risk predictions derived from regression models for binary outcomes and subsequently extended to time-to-event outcomes by Liu and colleagues (31). The standard error of NRI was estimated using bootstrap.
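The reclassification idea can be illustrated with the simpler binary-outcome NRI of Pencina and colleagues; this is not the time-to-event extension by Liu and colleagues that was used for the 3-year NRI values reported here, and the bootstrap standard error is omitted. The risk cutoffs and the function name are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def categorical_nri(old_risk: Sequence[float], new_risk: Sequence[float],
                    events: Sequence[int], cutoffs: Sequence[float]) -> float:
    """Net reclassification improvement for a binary outcome.

    cutoffs are sorted risk thresholds that define the risk categories.
    """
    up_e = down_e = up_ne = down_ne = n_e = n_ne = 0
    for p_old, p_new, y in zip(old_risk, new_risk, events):
        move = bisect_right(cutoffs, p_new) - bisect_right(cutoffs, p_old)
        if y:  # participant diagnosed with lung cancer
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_ne += 1
            up_ne += move > 0
            down_ne += move < 0
    # Correct moves: events up a category, non-events down a category
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne
```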
We used the graphical representation proposed by Pepe and colleagues (32) to illustrate the predictive and classification performance of our model in an integrated plot. The analyses were done with SAS software version 8.02 (SAS). All cited P values are 2-sided, with the exception of those derived from significance tests comparing discriminatory and predictive abilities among models (1-sided).
Results
The COSMOS trial
Between October 2004 and October 2005, 3,439 (66%) men and 1,764 (34%) women (total 5,203) were enrolled in COSMOS. Median age at baseline was 57 years (range 50–84), similar for men and women; 4,175 (80%) were current heavy smokers, and 1,028 (20%) were former smokers who had quit a median of 3 years earlier (range 0–10 years). The men reported smoking a median of 47 pack-years (range 20–260) and had started on average at age 16. The women had smoked a median of 40 pack-years (range 20–180), starting at an average age of 18.
Four hundred eighty-two participants were recalled for repeat CT (recall rate 9.3%), and 75 (1.4%) for second repeat CT during the first year; 160 (3.1%) CT-PET scans were also done, with a total of 525 (10.1%) participants being recalled for CT, CT-PET or both, in the first year. As a result of these first round investigations, 62 participants underwent surgery. Lung cancer was found in 55, and a benign lesion (false positive) in 7 (Fig. 1).
A total of 4,822 (92.7%) participants returned for the second CT round, 4,582 (88.1%) for the third round, and 4,383 (84.2%) for the fourth round (Fig. 1). During this time an additional 107 participants underwent surgery for presumed lung cancer, 22 of whom were false positives. The recall rate fell to 3.9% at the second round, then rose to 5.1% at the third round and 6.5% at the fourth round (Fig. 1).
Overall, 162 lung cancers were detected: adenocarcinoma in 116 (71.6%) patients, squamous cell carcinoma in 19 (11.7%), small cell carcinoma in 10 (6.2%), carcinoid tumor in 4 (2.5%), bronchioloalveolar carcinoma in 2 (1.2%), large cell neuroendocrine tumor in 1 (0.6%), and other non–small cell types in 10 (6.2%).
At diagnosis, 115 patients (71.0%) were stage I, 7 (4.3%) stage II, 25 (15.4%) stage III, and 15 (9.3%) stage IV. The resectability rate was 89%. Interval cancers developed in 3 (1.8%) cases.
Assessment of lung cancer risk in COSMOS participants
The 162 lung cancers were detected in 18,095 person-years of observation from baseline to the end of the fourth screening round, giving a lung cancer detection rate of 0.90 per 100 person-years (Supplementary Material: Table S1). The detection rate was approximately constant over time, and cancer cases were usually diagnosed within 6 months of a screening CT (Fig. 1). Detection rates (per 100 person-years) were slightly higher in men (0.95) than in women (0.78), and in current (0.92) than in former (0.79) smokers, but the differences were not significant. By contrast, the detection rate was strongly (P < 0.0001) dependent on age, increasing from 0.50 in those under 55 years at entry to 1.64 in those more than 65 years at entry (Supplementary Material: Table S1). The lung cancer rate did not vary much with age at starting smoking or years since quitting, but correlated strongly with the duration of smoking and cigarette consumption. The rate doubled in those who smoked for 35 to 40 years compared with those who smoked for less than 35 years, and was more than 6 times higher (RR = 6.27; 95% CI = 3.14–12.5) in those who smoked for more than 50 years. Similarly, in those who smoked more than 40 cigarettes per day the lung cancer rate was about double that in those who smoked less than 20 per day (RR = 1.91; 95% CI = 1.08–3.35). No other demographic, lifestyle (including body mass index, fruit and vegetable consumption pattern, alcohol, and passive smoking), or exposure factor was significantly associated with lung cancer risk (Supplementary Material: Table S1), although exposure to asbestos was nonsignificantly associated with the disease (RR = 3.05; 95% CI = 0.42–22.4). The lung cancer rate was also higher among participants who reported a history of chronic obstructive pulmonary disease (RR = 1.60; 95% CI = 1.10–2.33), but was unrelated to other medical conditions.
Dyspnea was reported by more than a third of participants and was associated with a significantly greater risk of lung cancer compared with those who did not report dyspnea (RR = 1.39; 95% CI = 1.00–1.93).
Spirometry was performed at baseline in about half the participants. Participants with a forced expiratory flow (FEF25–75) less than 50% of predicted had a significantly greater risk of lung cancer (RR = 2.03; 95% CI = 1.13–3.62) than those with FEF25–75 80% or more, but lung cancer was unrelated to FVC (FVC% of predicted), forced expiratory volume in 1 second (FEV1% of predicted), or the FEV1/FVC ratio. However, with the nonstandard cutoff of 90% (24), those with FEV1 less than 90% of predicted had twice the risk of lung cancer of those with FEV1 90% or more of predicted (RR = 2.09; 95% CI = 1.34–3.26). In the multivariable analysis, only age, smoking duration, number of cigarettes smoked, and predicted FEV1 (90% cutoff) were independently associated with lung cancer risk (Table 1). The cumulative incidence of screening-detected lung cancer according to age, smoking duration, number of cigarettes smoked, and predicted FEV1 is shown in Figure 2A and C.
Characteristic (participants) | Lung cancers | Univariate analysis, HRa (95% CI) | Multivariable analysis, HRa (95% CI) |
---|---|---|---|
Age | | | |
<55 y (n = 1,759) | 31 | 1.00 | 1.00 |
55–59 y (n = 1,725) | 49 | 1.60 (1.02–2.51) | 1.41 (0.86–2.31) |
60–64 y (n = 1,061) | 46 | 2.50 (1.59–3.94) | 2.22 (1.27–3.88) |
≥65 y (n = 658) | 36 | 3.30 (2.04–5.33) | 2.16 (1.11–4.21) |
Smoking duration | | | |
<35 y (n = 865) | 11 | 1.00 | 1.00 |
35–40 y (n = 1,623) | 42 | 2.01 (1.03–3.90) | 1.89 (0.97–3.69) |
40–44 y (n = 1,510) | 54 | 2.77 (1.45–5.29) | 2.02 (1.01–4.01) |
45–49 y (n = 798) | 24 | 2.40 (1.18–4.91) | 1.39 (0.63–3.06) |
≥50 y (n = 407) | 31 | 6.47 (3.25–12.9) | 3.45 (1.51–7.90) |
Number of cigarettes smoked | | | |
<20/d (n = 911) | 20 | 1.00 | 1.00 |
20–24/d (n = 2,011) | 54 | 1.21 (0.73–2.02) | 1.28 (0.77–2.15) |
25–29/d (n = 557) | 16 | 1.27 (0.66–2.46) | 1.35 (0.70–2.61) |
30–39/d (n = 973) | 41 | 1.91 (1.12–3.26) | 2.10 (1.22–3.60) |
≥40/d (n = 743) | 31 | 1.91 (1.09–3.35) | 2.05 (1.16–3.61) |
Dyspnea | | | |
No (n = 2,956) | 82 | 1.00 | 1.00 |
Yes (n = 1,686) | 64 | 1.41 (1.02–1.95) | 1.20 (0.86–1.68) |
Chronic obstructive pulmonary disease | | | |
No (n = 4,384) | 126 | 1.00 | 1.00 |
Yes (n = 819) | 36 | 1.60 (1.10–2.32) | 1.17 (0.80–1.73) |
Forced expiratory flow, FEF25–75 (% of predicted) | | | |
≥80% (n = 1,523) | 42 | 1.00 | 1.00 |
50–80% (n = 677) | 28 | 1.49 (0.92–2.40) | 1.10 (0.65–1.86) |
<50% (n = 281) | 16 | 2.05 (1.16–3.65) | 1.05 (0.54–2.02) |
Forced expiratory volume in 1 s, FEV1 (% of predicted) | | | |
≥90% (n = 1,300) | 30 | 1.00 | 1.00 |
<90% (n = 1,180) | 56 | 2.12 (1.36–3.30) | 1.74 (1.03–2.94) |
NOTE: Forced expiratory flow25–75 and forced expiratory volume in 1 second are missing for 2,722 participants.
aHR with 95% CI obtained from univariate and multivariable Cox proportional hazards regression models.
After excluding the 55 lung cancers detected during the first screening round, we assessed the extent to which the results of the baseline screening CT predicted cancer diagnoses in subsequent CT rounds. This analysis was restricted to the 4,596 persons who participated in the second screening round, with the observation period starting at the date of the second screening CT (Supplementary Material: Table S2). Visual evidence of emphysema on baseline CT more than doubled the risk (RR = 2.36; 95% CI = 1.59–3.49) of screening-detected lung cancer. Presence of a solid (RR = 2.00; 95% CI = 1.22–3.27), partially solid (RR = 3.43; 95% CI = 1.91–6.16), or nonsolid (RR = 10.1; 95% CI = 5.57–18.5) noncalcified nodule was also associated with increased cancer risk. Largest nodule size was also related to risk: participants with a nodule more than 8 mm at baseline (but not considered to be malignant by the diagnostic work-up protocol) had a 10-fold higher risk (RR = 9.89; 95% CI = 5.84–16.8) of being diagnosed with lung cancer at subsequent CT than those with smaller (<5 mm) nodule(s) (Supplementary Material: Table S2). Figure 2D and F show the cumulative incidence of screening-detected lung cancer according to baseline CT findings (emphysema, nodule type, and nodule size). Supplementary Material: Table S3 shows the distribution of participants and cancers detected at baseline CT and at subsequent CTs according to baseline CT findings (i.e., by type of lesion in relation to size of the largest lesion).
Prediction of lung cancer risk at first screening round
As in the model originally developed by Bach and colleagues in a nonscreened population (15), we found that age, smoking duration, and number of cigarettes smoked per day were the main determinants of lung cancer risk, while time since quitting and exposure to asbestos were not significantly associated with cancer, most likely because of the limited number of exposed participants. The Bach model estimated that 21 COSMOS participants would develop symptomatic lung cancer during the first year, whereas 55 lung cancers were detected in the first screening round (standardized incidence ratio = 2.62; 95% CI = 1.97–3.41). We therefore modified (recalibrated) the model using the average baseline risk calculated from our first-round experience (Supplementary Material: recalibrated BACH model). Figure 3 plots the observed incidence versus the probability of COSMOS participants developing lung cancer, according to the original Bach model and the recalibrated Bach model. Although the observed incidence of lung cancer in COSMOS participants was higher than that predicted by the original Bach model (Hosmer–Lemeshow χ2 test = 70.7; P < 0.0001), the recalibrated Bach model accurately predicted the observed incidence (Hosmer–Lemeshow χ2 test = 6.2; P = 0.63).
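The standardized incidence ratio above is observed over expected cases (55/21). The text does not say how its CI was computed; Byar's approximation to the Poisson interval, used in the sketch below, is one standard choice, and with O = 55 and E = 21 it gives values that agree with those reported to two decimals. The function name is hypothetical.

```python
import math


def sir_byar(observed: int, expected: float, z: float = 1.959964):
    """Standardized incidence ratio O/E with Byar's approximate Poisson CI."""
    o = observed
    lower = 0.0 if o == 0 else o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3
    upper = (o + 1) * (1 - 1 / (9 * (o + 1)) + z / (3 * math.sqrt(o + 1))) ** 3
    return o / expected, lower / expected, upper / expected


sir, lo, hi = sir_byar(55, 21)
print(f"SIR = {sir:.2f}; 95% CI = {lo:.2f}-{hi:.2f}")  # SIR = 2.62; 95% CI = 1.97-3.41
```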
Prediction of lung cancer risk after first screening round
After excluding lung cancers diagnosed during the first screening round, we found that presence of emphysema, nodule type, and nodule size strongly influenced the risk of being diagnosed with lung cancer at subsequent screening. A multivariable model incorporating these CT variables (using the square root of the largest nodule size) had a c-index of 0.744 (Table 2, model A). The discriminatory ability of this model did not improve when FEV1 (<90% vs. ≥90% of predicted) was incorporated (c-index = 0.747; P for difference = 0.87; Table 2, model B), but increased somewhat after incorporating information about the background risk of each participant based on the Bach model (c-index = 0.763; P for difference from previous model = 0.11; Table 2, model C). A statistically significant improvement in reclassification of lung cancer risk was also observed when comparing model C with model A (NRI evaluated 3 years after first screening round = 0.325, P = 0.0001).
Variable | Model A (CT only): HR (95% CI) | P | Model B (CT + functional): HR (95% CI) | P | Model C (CT + functional + epidemiologic): HR (95% CI) | P | Model D (CT + epidemiologic): HR (95% CI) | P |
---|---|---|---|---|---|---|---|---|
Emphysema: yes vs. no | 1.99 (1.33–2.95) | 0.0007 | 1.92 (1.29–2.86) | 0.0014 | 1.71 (1.14–2.56) | 0.0092 | 1.76 (1.17–2.63) | 0.0062 |
Size of largest noncalcified nodule, mm (square root) | 1.85 (1.56–2.19) | <0.0001 | 1.86 (1.56–2.20) | <0.0001 | 1.78 (1.50–2.10) | <0.0001 | 1.76 (1.49–2.08) | <0.0001 |
Nodule typea (overall) | | <0.0001 | | <0.0001 | | <0.0001 | | <0.001 |
Solid vs. none/calcified | 0.49 (0.26–0.92) | | 0.48 (0.25–0.90) | | 0.49 (0.26–0.93) | | 0.51 (0.27–0.96) | |
Partially solid vs. none/calcified | 0.72 (0.34–1.55) | | 0.69 (0.32–1.48) | | 0.73 (0.34–1.55) | | 0.76 (0.36–1.61) | |
Nonsolid vs. none/calcified | 1.91 (0.86–4.23) | | 1.83 (0.83–4.07) | | 1.93 (0.87–4.28) | | 2.06 (0.93–4.54) | |
FEV1% of predicted: <90% vs. ≥90% | | | 2.11 (1.20–3.70) | 0.0096 | 2.01 (1.14–3.53) | 0.0152 | | |
Bach model: linear predictor (1 unit) | | | | | 1.79 (1.35–2.37) | <0.0001 | 1.8 (1.37–2.38) | <0.0001 |
c-index | 0.744 | | 0.747 | | 0.763 | | 0.759 | |

Comparison | P for difference between c-indexes |
---|---|
A vs. B | 0.866 |
A vs. C | 0.087 |
A vs. D | 0.267 |
B vs. C | 0.113 |
B vs. D | 0.448 |
C vs. D | 0.663 |
NOTE: HR and 95% CI obtained from multivariable Cox proportional hazards models, with observation starting from the date of the 2nd CT.
aSolid means only solid nodule(s) present; partially solid means at least 1 partially solid nodule with possibly solid nodule(s), but no nonsolid nodule(s); nonsolid means at least 1 nonsolid nodule with possibly solid or partially solid nodule(s) present.
Finally, elimination of information about FEV1 from model C did not substantially change the discriminatory ability of this model (c-index = 0.759; P for difference from previous model = 0.66; Table 2, model D). Improvement in lung cancer risk reclassification was positive for model D when compared with model C (NRI evaluated 3 years after first screening round = 0.11), but the difference was not statistically significant (P = 0.135). In all models, the assumption of proportional hazards was satisfied.
Because FEV1 assessment may not always be available, model D, which incorporates readily available information on the epidemiologic and baseline CT characteristics of participants, promises to be the most useful for deciding the CT screening frequency for a smoker enrolled in a screening trial (Supplementary Material: risk model for screening detected lung cancer at subsequent CT). The predictive and classification performance of the adopted model D is shown in Figure 4.
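As an illustration of how the model D covariates combine, the sketch below computes a relative hazard from the rounded hazard ratios in Table 2. It is not the published risk calculator: the baseline hazard needed for absolute risk is in the Supplementary Material and is not reproduced; the rounded HRs make results approximate; and coding the largest noncalcified nodule as 0 mm when none is present is an assumption. All names are hypothetical.

```python
import math

# Model D hazard ratios as printed (rounded) in Table 2
HR_EMPHYSEMA = 1.76
HR_SQRT_SIZE = 1.76   # per 1-unit increase in sqrt(largest nodule diameter, mm)
HR_TYPE = {"none/calcified": 1.0, "solid": 0.51,
           "partially solid": 0.76, "nonsolid": 2.06}
HR_BACH_LP = 1.8      # per 1-unit increase in the Bach linear predictor


def relative_hazard(emphysema: bool, nodule_type: str,
                    largest_mm: float, bach_lp: float) -> float:
    """Hazard relative to a hypothetical reference participant with no
    emphysema, no noncalcified nodule (0 mm, assumed), and Bach LP of 0."""
    log_hr = ((math.log(HR_EMPHYSEMA) if emphysema else 0.0)
              + math.log(HR_SQRT_SIZE) * math.sqrt(largest_mm)
              + math.log(HR_TYPE[nodule_type])
              + math.log(HR_BACH_LP) * bach_lp)
    return math.exp(log_hr)


# Hypothetical participant: emphysema, 9 mm nonsolid nodule, Bach LP of 0.5
print(relative_hazard(True, "nonsolid", 9.0, 0.5))
```

Because size enters on the square-root scale, a 9 mm nodule carries 1.76 times the hazard of a 4 mm nodule of the same type (square roots 3 vs. 2).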
Supplementary Material: Table S4 shows, for COSMOS volunteers starting the second screening CT, the numbers of screening and diagnostic procedures carried out, and the numbers of lung cancers diagnosed, according to deciles of risk calculated from model D. Among the 459 persons in the first decile of risk (lowest risk group), 36 had recall CTs and 3 had CT-PETs; one was operated on for a benign nodule (false positive), but none were diagnosed with lung cancer over 3 CT rounds. Because none of these 459 persons was diagnosed with lung cancer, they could have avoided the 3 subsequent annual screening CT scans (and the recall scans and consequent radiation exposure). This amounted to 1,340 screening CTs, 36 recall CTs, and 3 CT-PET scans that could have been avoided.
At the other extreme, among the 459 participants in the top-risk decile (highest risk group), 177 received recall CTs and 66 CT-PET scans, 13 were operated on for benign nodule (false positive), but 45 were diagnosed with lung cancer during than 3 years (35 stage I, 6 stage II, and 4 stage IV).
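A decile table like Table S4 can be built by sorting participants on predicted risk, cutting them into ten near-equal groups, and totaling procedures and cancers in each. A minimal Python sketch under those assumptions; the `Participant` record and its field names are hypothetical, and ties in predicted risk are broken by input order:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Participant:
    risk: float          # predicted risk, e.g. from a model such as model D
    cancer: bool         # lung cancer diagnosed during follow-up
    recall_cts: int = 0  # recall CTs performed
    ct_pets: int = 0     # CT-PET scans performed


def tabulate_by_risk_group(people: List[Participant],
                           n_groups: int = 10) -> List[Dict[str, int]]:
    """Sort by predicted risk, split into n_groups near-equal groups, total each group."""
    ordered = sorted(people, key=lambda p: p.risk)  # stable sort: ties keep input order
    n = len(ordered)
    rows = []
    for g in range(n_groups):
        group = ordered[g * n // n_groups:(g + 1) * n // n_groups]
        rows.append({
            "group": g + 1,  # 1 = lowest predicted risk
            "n": len(group),
            "cancers": sum(p.cancer for p in group),
            "recall_cts": sum(p.recall_cts for p in group),
            "ct_pets": sum(p.ct_pets for p in group),
        })
    return rows
```

Summing per group is what makes statements such as "none of the 459 persons in the first decile was diagnosed with lung cancer" directly readable from the table.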
Development of a prediction tool
To estimate the risk of an individual being diagnosed with lung cancer at screening entry, we developed a risk assessment calculator (Supplementary Material) based on the recalibrated Bach model. To estimate the risk of an individual being diagnosed with lung cancer after baseline CT, we used prediction model D (Supplementary Material: Table S2) as the risk calculator (Supplementary Material: Risk Calculator). Both calculators are incorporated into a single computer application. In the first case, individual risk factors are entered into the calculator; in the second, individual risk factors plus radiological findings at baseline CT are entered. In each case, the output is an estimate of the probability that the individual will develop lung cancer within a defined period (typically 1 year).
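Both calculators ultimately turn a linear predictor into a probability over a fixed horizon. For a proportional hazards model this is commonly done as risk(t) = 1 − S0(t)^exp(lp), where S0(t) is the baseline survival. A minimal Python sketch assuming a Cox-type model; the baseline survival, covariate names, binary encodings, and coefficients below are all hypothetical placeholders, not the published model D values in Table S2:

```python
import math
from typing import Mapping

# HYPOTHETICAL placeholder values for illustration only. They are NOT the
# published model D coefficients (Supplementary Table S2) or its baseline survival.
BASELINE_SURVIVAL_1Y = 0.995  # assumed S0(1 year) for a reference participant (lp = 0)
COEFFICIENTS = {
    "bach_lp": 1.0,          # centered linear predictor of the recalibrated Bach model
    "nonsolid_nodule": 2.0,  # 1 if any nonsolid nodule at baseline CT, else 0
    "nodule_over_8mm": 2.0,  # 1 if the largest nodule exceeds 8 mm, else 0
    "emphysema": 0.8,        # 1 if emphysema is seen at baseline CT, else 0
}


def one_year_risk(covariates: Mapping[str, float],
                  s0: float = BASELINE_SURVIVAL_1Y,
                  coefs: Mapping[str, float] = COEFFICIENTS) -> float:
    """Probability of lung cancer within 1 year under a proportional hazards model."""
    # Missing covariates default to 0 (the reference level).
    lp = sum(beta * covariates.get(name, 0.0) for name, beta in coefs.items())
    # S(t | x) = S0(t) ** exp(lp), so the risk by time t is 1 - S0(t) ** exp(lp).
    return 1.0 - s0 ** math.exp(lp)
```

With all covariates at their reference level the sketch returns 1 − S0, here 0.5%; adding a positive-coefficient finding such as emphysema raises the estimate.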
Discussion
The COSMOS trial confirmed that screening is useful in detecting lung cancer at an early stage (21), and gave us the opportunity to construct our models to predict the risk of lung cancer in high-risk individuals undergoing screening.
After workup of baseline screening data, 55 participants (1.1%) were diagnosed with lung cancer. Because only asymptomatic smokers were enrolled, most cancers were at an early stage, and curative resection was possible. As expected, the number of screening-detected cancers during the first screening round was higher than predicted by the Bach model (15), but was similar to that in other lung cancer screening trials (33).
Although this excess is mostly attributable to anticipated (earlier) diagnosis, a small fraction may still reflect the documented (16) underestimation of lung cancer risk by the Bach model, or overdiagnosis, as occurs in other forms of screening such as screening for prostate cancer and even melanoma (34).
Although the lung cancer detection rate dropped slightly from the first to the second screening round, it remained relatively constant thereafter (Fig. 1), suggesting that preexisting nodules not considered clinically significant at previous CT were progressing to malignancy, or that new malignancies were developing rapidly.
Data from nonrandomized trials suggest that screening might reduce lung cancer mortality by 20% to 40% (35, 36). The benefit of screening in smokers and former smokers has recently been confirmed by data from the randomized US National Lung Cancer Screening Trial (6). Despite these landmark favorable results, high costs, high radiation exposure, false positive findings, and deaths from competing events may still make screening unrealistic unless an optimal target population can be identified and the screening interval can be optimized on the basis of individual risk and of the cost-effectiveness analyses of randomized trial data that are still awaited (37–40).
Most lung cancer screening trials recruited heavy smokers, the most readily identifiable high-risk category, but participants were not assessed individually for lung cancer risk (5–11). Furthermore, current models to predict individual risk are known to underestimate the risk among heavy smokers (15–18).
We found that the predictors identified by Bach and colleagues (15) in the general population also predicted screening-detected lung cancer in our screened population. However, the Bach model underestimated overall risk (Supplementary Material: Fig. S1, left panel). We therefore recalibrated it to make it applicable to screening, and in particular capable of selecting participants for future lung cancer screening and chemoprevention studies, in the same way that the Gail model can select high-risk participants for breast cancer chemoprevention trials (13, 14, 41).
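One simple way to correct systematic underestimation while keeping the model's ranking of participants is to rescale every predicted cumulative hazard by a common factor c chosen so that expected and observed case counts match (calibration-in-the-large). A minimal Python sketch of that idea, offered only as an illustration; the paper's recalibration follows van Houwelingen (28) and may differ, and the function names are hypothetical:

```python
from typing import List


def hazard_scale_factor(pred_risks: List[float], observed_cases: int,
                        tol: float = 1e-10) -> float:
    """Find c such that sum_i [1 - (1 - p_i) ** c] equals the observed case count.

    Writing p_i = 1 - exp(-H_i), the factor c rescales each cumulative hazard H_i,
    which preserves the ranking of participants while fixing overall calibration.
    """
    if not all(0.0 < p < 1.0 for p in pred_risks):
        raise ValueError("predicted risks must lie strictly between 0 and 1")
    if not 0 < observed_cases < len(pred_risks):
        raise ValueError("observed_cases must be between 0 and the number of people")

    def expected(c: float) -> float:
        return sum(1.0 - (1.0 - p) ** c for p in pred_risks)

    lo, hi = 0.0, 1.0
    while expected(hi) < observed_cases:  # expected() increases with c: widen the bracket
        hi *= 2.0
    while hi - lo > tol:  # bisection on the monotone function expected(c)
        mid = (lo + hi) / 2.0
        if expected(mid) < observed_cases:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def recalibrate(pred_risks: List[float], c: float) -> List[float]:
    """Apply the hazard scale factor c to each predicted risk."""
    return [1.0 - (1.0 - p) ** c for p in pred_risks]
```

For instance, if 1,000 people each had a predicted risk of 1% but 20 cases were observed, c is about 2.01 and each recalibrated risk becomes about 2%. A fuller recalibration would also refit the slope on the linear predictor.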
We next investigated the effect of lung function on lung cancer risk. Dyspnea and FEF25–75 less than 50% of predicted were associated with lung cancer, but these associations disappeared after adjustment for smoking. Using the nonstandard FEV1 cutoff of 90% (24), however, participants with FEV1 less than 90% of predicted had a significantly higher lung cancer risk after adjustment for other risk factors, including age and smoking history. We nevertheless decided not to include this variable in our final model, as spirometry results are not routinely available. Of interest, another recently proposed risk prediction model (42) adds data on pulmonary function and sputum DNA image cytometry to a validated risk prediction model developed using Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial data (43). In that model, declining FEV1% was also associated with lung cancer risk, and adding FEV1% to the other base predictors increased the predictive value of the model.
Of the CT findings at baseline, the presence of noncalcified nodules and the size of the largest nodule were the strongest predictors of subsequent lung cancer risk, excluding cancers diagnosed during the first CT round. For growing nodules that were already present at baseline CT but were not initially considered malignant, these radiological findings represent early features of lung cancer and cannot be considered strict risk factors. However, to ensure that these nodules were not malignant, we followed a detailed protocol, which included antibiotics to rule out inflammatory disease, repeat CT at 3 or 6 months to assess the growth of suspicious nodules, and PET to quantify their metabolic activity. We recognize that our protocol may have failed to identify some early lung cancers, but its sensitivity (91%) and specificity (99.7%) are very high (22). Naturally, we did not remove nonsolid or large nodules that showed no signs of progression at repeat CT. The current lack of noninvasive tools to ascertain the malignant potential of such suspicious nodules is precisely why we constructed our second model, which combines traditional risk factors with radiological features to better quantify the probability that a participant will be diagnosed with lung cancer (or have lung cancer ascertained) at subsequent screening rounds. Although the nature of CT-detected nonsolid nodules or ground-glass opacities cannot be ascertained without histologic analysis, there is evidence that these nodules are the most likely to harbor atypical alveolar hyperplasia, the putative precursor of pulmonary adenocarcinomas (44–46). We also found that emphysema was an independent risk factor for lung cancer, in agreement with previous prospective studies (22, 47–51).
Although a quantitative assessment of emphysema using densitometry might have provided a more reproducible evaluation of the extent of pulmonary emphysema and of its evolution over time (52), we used a visual qualitative assessment, which is nonetheless a valid and reliable tool for characterizing emphysema on CT (53).
We therefore constructed a new lung cancer risk model to quantify individual risk on the basis of the individual risk factors used in the Bach model plus the results of the baseline CT, so as to further stratify our already high-risk participants according to the annual lung cancer detection rate in the subsequent 3 years. We preferred to use the linear predictor obtained from the Bach model, rather than estimating the individual effects of age, sex, and smoking from our own data, and to add new explanatory variables only if they significantly improved the calibrated literature model, as proposed by van Houwelingen (28). We believe that a model based entirely on our data might have been less reliable, because we observed only 97 events during follow-up CTs. Furthermore, because the COSMOS trial did not include long-term former smokers and only a few participants were exposed to asbestos, these 2 important factors would not have been selected, limiting the generalizability of the model. We acknowledge, however, that we carried out only internal validation of our models and that external validation in different populations and settings is required. Other measures, such as the Integrated Discrimination Improvement (IDI) index, have been proposed to compare risk model performance (30, 31). Because the IDI and the continuous version of the NRI that we used performed quite similarly in our application, as Liu and colleagues (31) also observed in their simulated data, we report only the NRI for the sake of simplicity.
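The continuous (category-free) NRI counts how often the new model moves predicted risk in the right direction: up for participants who develop cancer, down for those who do not. The IDI instead compares the average change in predicted risk between the two groups. A minimal Python sketch for a binary outcome, assuming complete follow-up; it ignores censoring, unlike the 3-year survival-based NRI reported in the paper, and the function name is hypothetical:

```python
from typing import Sequence, Tuple


def continuous_nri_and_idi(old: Sequence[float], new: Sequence[float],
                           event: Sequence[bool]) -> Tuple[float, float]:
    """Category-free NRI and IDI comparing two risk models on a binary outcome."""
    if not (len(old) == len(new) == len(event)):
        raise ValueError("inputs must have equal length")
    cases = [(o, n) for o, n, e in zip(old, new, event) if e]
    controls = [(o, n) for o, n, e in zip(old, new, event) if not e]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")

    def share(pairs, moved_up: bool) -> float:
        # Fraction of participants whose predicted risk moved up (or down).
        hits = sum(1 for o, n in pairs if (n > o if moved_up else n < o))
        return hits / len(pairs)

    def mean_change(pairs) -> float:
        return sum(n - o for o, n in pairs) / len(pairs)

    # NRI: net upward movement among events plus net downward movement among non-events.
    nri = (share(cases, True) - share(cases, False)) + \
          (share(controls, False) - share(controls, True))
    # IDI: mean risk gain among events minus mean risk gain among non-events.
    idi = mean_change(cases) - mean_change(controls)
    return nri, idi
```

The continuous NRI ranges from −2 to 2; it reaches 2 only when every event moves up and every non-event moves down, whatever the size of the moves, whereas the IDI weights each participant by how far the risk actually moved.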
Specifically, our model provided estimates of annual lung cancer detection rates ranging from 0 (first decile of participants) to more than 3% (top decile). Although our population consisted of heavy smokers, 40% of the participants had a predicted annual lung cancer risk below 0.3%, which approximately corresponds to the baseline lung cancer risk of an unselected population older than 50 years (Fig. 3 and Supplementary Material: Table S3). However, only 10% of all lung cancers were diagnosed in this group during 3 years of follow-up. Increasing the screening CT interval to 3 years in this group of 1,838 participants would have saved approximately 4,000 CT scans (including recalls) and avoided surgery for benign nodules in 7 participants, but would have delayed lung cancer diagnosis in 10 participants. By contrast, annual CT screening of those with a predicted annual lung cancer risk above 0.3% detected 90% of all lung cancers diagnosed, at the same per-person costs and side effects, and with a comparable false positive rate, as in the low-risk group.
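The stratification above amounts to a threshold rule on predicted annual risk. A minimal Python sketch of such a rule and of the routine-scan arithmetic, assuming annual CT at or above 0.3% and one routine CT per 3-year interval below it; the policy functions are hypothetical, exact ties at 0.3% are assigned to annual screening by assumption, and recall CTs and PET scans are not modeled:

```python
from typing import Iterable

RISK_THRESHOLD = 0.003  # 0.3% predicted annual risk, the cut-off discussed in the text


def screening_interval_years(annual_risk: float, threshold: float = RISK_THRESHOLD) -> int:
    """Hypothetical policy: annual CT at or above the threshold, every 3 years below it."""
    return 1 if annual_risk >= threshold else 3


def routine_scans_saved(annual_risks: Iterable[float], horizon_years: int = 3,
                        threshold: float = RISK_THRESHOLD) -> int:
    """Routine screening CTs avoided over the horizon versus annual CT for everyone."""
    saved = 0
    for risk in annual_risks:
        interval = screening_interval_years(risk, threshold)
        # Annual screening gives horizon_years scans; the policy gives horizon // interval.
        saved += horizon_years - horizon_years // interval
    return saved
```

Under these assumptions each low-risk participant skips 2 of 3 routine scans over 3 years, so 1,838 participants would skip 3,676 routine CTs. This is in the same range as the text's estimate of about 4,000 scans including recalls, although the paper's exact counting rule is not stated.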
We therefore propose our model as a useful tool for identifying lower risk persons in whom the interval to the next screening CT can be safely increased, reducing unnecessary radiation exposure and lowering the risk of intervention for false positive findings. Such persons would be relatively young, moderate smokers with no CT evidence of emphysema or noncalcified nodules. Lowering the radiation exposure from screening CT, and from the associated recall CT and PET, is important because annual low-dose screening CT from age 50 to 75 years has been estimated to incur a radiation-related lifetime lung cancer risk of 0.23% in men and 0.85% in women (37). In individuals at moderate-to-low lung cancer risk who start screening at a young age, the reduction in lung cancer mortality due to CT screening is unlikely to outweigh this added risk (54).
Like any new model, these first risk prediction tools specifically applicable to screening will require external validation before they can be applied on a large scale. They have the strength, however, of being based on the validated Bach model (15, 16), which itself relies on clinical and epidemiologic factors known to be associated with lung cancer risk (22, 47–51). In addition, our models were derived from the prospective COSMOS study, conducted at a single center following a strict protocol, with dedicated staff including surgeons and pathologists, in which participant compliance was high.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
We thank Giovanna Ciambrone for general management of COSMOS volunteers and Don Ward for help with the English.
Grant Support
This study was supported by the Italian Association for Cancer Research (AIRC), the Italian Foundation for Cancer Research (FIRC) and the American-Italian Cancer Foundation.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.