Abstract
Chronic obstructive pulmonary disease (COPD) is a heterogeneous condition with respect to onset, progression, and response to therapy. Incorporating clinical- and imaging-based features to refine COPD phenotypes provides valuable information beyond that obtained from traditional clinical evaluations. We characterized the spectrum of COPD-related phenotypes in a sample of former and current smokers and evaluated how these subgroups differ with respect to sociodemographic characteristics, COPD-related comorbidities, and subsequent risk of lung cancer.
White (N = 659) and African American (N = 520) male and female participants without lung cancer (controls) in the INHALE study who completed a chest CT scan, interview, and spirometry test were used to define distinct COPD-related subgroups based on hierarchical clustering. Seven variables were used to define clusters: pack years, quit years, FEV1/FVC, % predicted FEV1, and from quantitative CT (qCT) imaging, % emphysema, % air trapping, and mean lung density ratio. Cluster definitions were then applied to INHALE lung cancer cases (N = 576) to evaluate lung cancer risk.
Five clusters were identified that differed significantly with respect to sociodemographic (e.g., race, age) and clinical (e.g., BMI, limitations due to breathing difficulties) characteristics. Increased risk of lung cancer was associated with increasingly detrimental lung function clusters (when ordered from most detrimental to least detrimental).
Measures of lung function vary considerably among smokers and are not fully explained by smoking intensity.
Combining clinical (spirometry) and radiologic (qCT) measures of COPD defines a spectrum of lung disease that predicts lung cancer risk differentially among patient clusters.
Introduction
Chronic obstructive pulmonary disease (COPD) is the third most common cause of death in the United States and is also a major risk factor for lung cancer, the leading cause of cancer-related deaths. Spirometry is used to diagnose COPD but quantifies only a single aspect of COPD pathophysiology, a heterogeneous disease with respect to onset, spatial distribution, and progression. Other factors, such as exacerbations, comorbidities, and physical attributes such as body mass index (BMI) or exercise endurance, have been explored as a means to further refine COPD subphenotypes (1–4). Quantitative imaging based on low-dose chest CT scans allows objective measurement of radiologic features of COPD, and the regional distribution of COPD captured by quantitative CT (qCT) measures has been shown to provide clinically relevant information beyond spirometry. Whole-lung qCT measures of emphysema and air trapping correlate with spirometric measures of lung function, even in presymptomatic individuals, and predict risk of lung cancer (5–10).
The INHALE study was designed to evaluate different measures of COPD in relation to lung cancer risk in lung cancer cases and population-based controls. In this study, we explored COPD phenotyping in 1,179 ever-smoking controls (520 African American and 659 white) and 576 ever-smoking lung cancer cases (228 African American and 348 white) in INHALE who completed an interview, spirometry, and a chest CT scan. Hierarchical clustering was performed in controls to characterize the spectrum of lung phenotypes and to identify lung function subgroups using both clinical (i.e., pack years, spirometry) and radiologic (qCT) measures. We then evaluated the association of these cluster-defined subgroups with race, other sociodemographic and clinical characteristics, COPD-related comorbidities, and lung cancer risk.
Materials and Methods
Study participants
The INHALE study was initiated in 2012 and has been described previously (10). Briefly, lung cancer cases were enrolled at the Karmanos Cancer Institute (Detroit, MI) or Henry Ford Health System (HFHS, Detroit, MI) within 12 months of diagnosis. Volunteer controls were enrolled from the metropolitan Detroit area. Cases and controls were 21–89 years of age, were able to complete the CT scan, had never taken amiodarone, and had never been diagnosed with bronchiectasis or cystic fibrosis. In addition, controls had never undergone surgical removal of any portion of either lung or been diagnosed with lung cancer, and were required to carry health insurance (in case medical follow-up was required because of a clinical finding on the CT scan or spirometry). Written informed consent was obtained from all subjects prior to participation. Participants completed an interview, a low-dose chest CT scan, and a pulmonary function test (PFT). PFTs consisted of spirometry performed at enrollment or, for cases unable to complete spirometry at the time of interview, results abstracted from medical records around the time of diagnosis. The Wayne State University (Detroit, MI) and HFHS (Detroit, MI) Institutional Review Boards approved the procedures used in collecting and processing participant information, which were in accordance with the Declaration of Helsinki. The analyses presented here included only white and African American current or former smoking control participants and lung cancer cases.
Trait definitions
Demographic data, smoking history, medication use, and other clinical characteristics (e.g., weight/height, physical activity, and diet) were ascertained from interviews. Pack years were calculated as the number of years smoked multiplied by the average number of cigarettes smoked per day, divided by 20 (cigarettes per pack). Family history of lung cancer was recorded as “yes” if the participant reported at least one first-degree relative with a diagnosis of lung cancer. PFTs were performed by trained technicians in accordance with American Thoracic Society (ATS) guidelines (11). FEV1 and FVC were measured, and the FEV1/FVC ratio was calculated. Predicted normal values were calculated according to sex, age, height, and race using reference equations from the Third National Health and Nutrition Examination Survey (NHANES III) (12). Board-certified pulmonologists, blinded to study group, reviewed the spirometry results for quality assurance. Severity of COPD was classified according to Global Initiative for Chronic Obstructive Lung Disease (GOLD) staging (13).
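The pack-year formula and spirometric GOLD grading described above can be sketched as follows. This is a minimal Python illustration, not the study's R code: the function names are hypothetical, % predicted FEV1 is taken as an input (the NHANES III reference equations are not reproduced), and grade 0 is assumed to mean a fixed-ratio FEV1/FVC of at least 0.70, matching the “0 (none)” category in Table 1. Grades 1–4 use the standard GOLD cut points of 80%, 50%, and 30% of predicted FEV1.

```python
def pack_years(years_smoked: float, cigarettes_per_day: float) -> float:
    """Pack years = years smoked x (cigarettes per day / 20 cigarettes per pack)."""
    return years_smoked * cigarettes_per_day / 20.0


def gold_grade(fev1: float, fvc: float, fev1_pct_predicted: float) -> int:
    """Spirometric GOLD grade (hypothetical helper).

    Assumption: grade 0 ("none") means no airflow obstruction by the fixed
    ratio, i.e. FEV1/FVC >= 0.70. Grades 1-4 apply only when FEV1/FVC < 0.70.
    """
    if fev1 / fvc >= 0.70:
        return 0  # no spirometric obstruction
    if fev1_pct_predicted >= 80:
        return 1  # mild
    if fev1_pct_predicted >= 50:
        return 2  # moderate
    if fev1_pct_predicted >= 30:
        return 3  # severe
    return 4      # very severe


print(pack_years(30, 20))        # 30.0 (one pack per day for 30 years)
print(gold_grade(1.8, 3.0, 62))  # 2 (ratio 0.60, 62% predicted)
```

Keeping % predicted FEV1 as an argument separates the grading rule from the race-, sex-, age-, and height-specific reference equations, which would be computed upstream.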
CT scans were acquired at both full inspiration and full expiration under a protocol standardized across scanners (14). Radiologic (qCT) measures of COPD were generated by VIDA Diagnostics software (www.vidadiagnostics.com). Whole-lung qCT measures included percent air trapping (qCT % air trapping), defined as the percentage of lung voxels below −856 Hounsfield units (HU) on expiration; percent emphysema (qCT % emphysema), defined as the percentage of lung voxels below −950 HU on inspiration; and the mean lung density (MLD) ratio, defined as expiratory MLD divided by inspiratory MLD. The MLD ratio is considered a scanner-independent measure of air trapping associated with emphysema (15).
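Once the lungs have been segmented, the three whole-lung qCT measures reduce to simple voxel summaries. The sketch below assumes hypothetical flattened lists of lung-voxel attenuation values (in HU) from the inspiratory and expiratory scans; it does not reproduce VIDA Diagnostics' segmentation or processing, and the helper names are illustrative only.

```python
from statistics import fmean


def percent_below(hu_values, threshold_hu):
    """Percentage of lung voxels with attenuation strictly below threshold_hu."""
    below = sum(1 for v in hu_values if v < threshold_hu)
    return 100.0 * below / len(hu_values)


def qct_summary(inspiratory_hu, expiratory_hu):
    """Whole-lung qCT measures from lung-masked voxel HU values (assumed input)."""
    return {
        "pct_emphysema": percent_below(inspiratory_hu, -950),    # inspiration, < -950 HU
        "pct_air_trapping": percent_below(expiratory_hu, -856),  # expiration, < -856 HU
        # Expiratory MLD / inspiratory MLD; both means are negative HU values.
        "mld_ratio": fmean(expiratory_hu) / fmean(inspiratory_hu),
    }


# Toy voxel values (hypothetical, 10 voxels per scan).
INSP_EXAMPLE = [-980, -960, -900, -870, -850, -820, -800, -780, -760, -740]
EXP_EXAMPLE = [-900, -870, -860, -850, -800, -780, -760, -740, -720, -700]
print(qct_summary(INSP_EXAMPLE, EXP_EXAMPLE))
```

Because both mean lung densities are negative HU values, the ratio lies below 1 and approaches 1 when the lung deflates little on expiration, which is why higher MLD ratios indicate more air trapping.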
Statistical analysis
Tests of homogeneity by race or by cluster were performed using χ² tests or Fisher exact tests for categorical variables and t tests or ANOVA for continuous variables. Because of skewness, the median and interquartile range were calculated for percent emphysema, and homogeneity was evaluated with the nonparametric Wilcoxon rank sum test. Hierarchical clustering was performed to identify groups of individuals who are more similar to each other than to other groups, based on seven lung disease–related variables (pack years, quit years, FEV1/FVC, FEV1 % predicted, qCT % emphysema, qCT % air trapping, and MLD ratio). Because the measures differ in scale and variability, all variables were standardized before clustering. Clusters were merged using Ward's method, which defines the distance between two groups as the increase in the within-group sum of squares that results from merging them. This merging cost was used as an objective measure to aid in selecting the optimal number of clusters. Linear discriminant analysis was used to evaluate separation among clusters; only the first two linear discriminants were used to plot this separation. Statistical significance was defined as P < 0.05.
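Ward's criterion can be made concrete with a small, deliberately naive sketch: standardize each variable, then repeatedly merge the pair of clusters whose union increases the within-cluster sum of squares the least, recording that increase (the merging cost). This is an illustration in pure Python under that definition, not the authors' R implementation; it is O(n³) and far too slow for 1,179 participants, where library routines such as R's `hclust(..., method = "ward.D2")` or SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` would be used (they report merge heights on a distance scale rather than as raw sum-of-squares increases).

```python
import math
from statistics import mean, stdev


def standardize(rows):
    """Z-score each column (variable) using its sample mean and SD."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def centroid(points):
    return [mean(col) for col in zip(*points)]


def ward_merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def ward_clustering(points, k):
    """Naive agglomerative clustering with Ward's criterion.

    Returns the partition (sorted lists of row indices) at k clusters and the
    merging cost of every successive merge.
    """
    clusters = [[i] for i in range(len(points))]
    partition = [list(c) for c in clusters] if k >= len(points) else None
    costs = []
    while len(clusters) > 1:
        best = (math.inf, -1, -1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_merge_cost([points[p] for p in clusters[i]],
                                       [points[p] for p in clusters[j]])
                if cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        costs.append(cost)
        if len(clusters) == k:
            partition = [sorted(c) for c in clusters]
    return partition, costs


# Toy data: two well-separated groups of three points (hypothetical).
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
partition, costs = ward_clustering(standardize(toy), k=2)
print(sorted(partition))  # [[0, 1, 2], [3, 4, 5]]
```

A large jump between consecutive entries of `costs` near the end of the merge sequence suggests a natural number of clusters, which is the kind of merging-cost evidence used to shortlist the five- and seven-cluster solutions.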
The clusters in the optimal solution were ordered from “most detrimental” to “least detrimental” by a simple weighted average of the cluster-mean “lung disease” Z-scores. Each variable received a weight of 1 if higher values indicate worse lung health (pack years and the qCT measures) or −1 if higher values indicate better lung health (quit years and the spirometry measures), so that a higher score corresponded to a more detrimental lung disease profile. To apply the clusters to lung cancer cases, each case's measures were standardized using the corresponding distributions in controls, the Euclidean distance from the case to each cluster was estimated, and the case was assigned to the control-based cluster at minimum distance. Logistic regression modeling was then used to estimate the risk of lung cancer associated with the (ordered) lung function subgroups.
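The ordering score and the case-assignment rule can be sketched as below. Two assumptions are labeled explicitly: the variable order and function names are hypothetical, and “distance to each cluster” is taken to mean distance to the cluster centroid (the mean standardized profile of its control members), which the text does not state outright.

```python
from statistics import mean

# Hypothetical variable order: pack years, quit years, FEV1/FVC, % predicted FEV1,
# qCT % emphysema, qCT % air trapping, MLD ratio.
# +1: higher value = worse lung health; -1: higher value = better lung health.
WEIGHTS = [1, -1, -1, -1, 1, 1, 1]


def severity_score(centroid_z, weights=WEIGHTS):
    """Average of sign-weighted cluster-mean Z-scores; higher = more detrimental."""
    return mean(w * z for w, z in zip(weights, centroid_z))


def order_clusters(centroids_z):
    """Cluster labels sorted from most to least detrimental profile."""
    return sorted(centroids_z, key=lambda label: severity_score(centroids_z[label]),
                  reverse=True)


def assign_case(raw_values, control_means, control_sds, centroids_z):
    """Standardize a case with control means/SDs, return the nearest centroid's label.

    Assumption: distance is measured to each cluster's centroid.
    """
    z = [(x - m) / s for x, m, s in zip(raw_values, control_means, control_sds)]

    def squared_distance(label):
        # Squared Euclidean distance has the same minimizer as Euclidean distance.
        return sum((a - b) ** 2 for a, b in zip(z, centroids_z[label]))

    return min(centroids_z, key=squared_distance)


# Toy centroids in Z-score units (hypothetical).
toy_centroids = {"A": [1, -1, -1, -1, 1, 1, 1], "B": [0] * 7, "C": [-1, 1, 1, 1, -1, -1, -1]}
print(order_clusters(toy_centroids))  # ['A', 'B', 'C']
```

Standardizing cases with the control means and SDs, rather than their own, keeps cases on the same scale as the control-derived centroids, so a case's assignment does not depend on the composition of the case sample.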
All analyses were performed using R v3.4 statistical software.
Results
Sample description
A description of 659 white and 520 African American INHALE controls is presented in Tables 1 and 2. Whites and African Americans differed significantly by educational level (≤high school/>high school), BMI, smoking status (former/current), frequent physical activity (<3 times per week/≥3 times per week), reported limitations due to breathing difficulties (yes/no), and regular aspirin/NSAID use (yes/no). Of the seven variables used for clustering (Table 2), pack years, quit years, qCT % emphysema, and MLD ratio differed significantly by race.
Table 1. Characteristics of INHALE controls used for clustering (N = 1,179)
Variable | Whites (n = 659) | African Americans (n = 520) | P (test of homogeneity) |
---|---|---|---|
Gender | | | |
Male | 312 (47.3) | 222 (42.7) | 0.111 |
Female | 347 (52.7) | 298 (57.3) | |
Age (years) | 61.1 (9.6) | 60.1 (8.9) | 0.057 |
Education | | | |
≤High school | 191 (29.0) | 233 (44.8) | <0.001 |
>High school | 468 (71.0) | 287 (55.2) | |
BMI (kg/m²) | 28.6 (5.8) | 29.8 (6.7) | 0.002 |
Smoking status | | | |
Former | 302 (45.8) | 138 (26.5) | <0.001 |
Current | 357 (54.2) | 382 (73.5) | |
Family history of lung cancer | | | |
No | 547 (83.0) | 442 (85.2) | 0.316 |
Yes | 112 (17.0) | 77 (14.8) | |
Frequent physical activityᵃ | | | |
<3×/week | 288 (43.7) | 259 (49.9) | 0.034 |
≥3×/week | 371 (56.3) | 260 (50.1) | |
Limitations due to breathing difficultiesᵇ | | | |
No | 524 (79.6) | 356 (68.7) | <0.001 |
Yes | 134 (20.4) | 162 (31.3) | |
Regular aspirin/NSAID useᶜ | | | |
No | 179 (27.3) | 189 (36.6) | 0.001 |
Yes | 478 (72.7) | 328 (63.4) | |
Alcohol consumption (drinks/week) | 4.4 (8.0) | 4.3 (10.6) | 0.835 |
GOLD score | | | |
0 (none) | 461 (70.0) | 331 (63.6) | 0.211 |
1 (mild) | 30 (4.5) | 30 (5.8) | |
2 (moderate) | 107 (16.2) | 107 (20.6) | |
3 (severe) | 54 (8.2) | 45 (8.6) | |
4 (very severe) | 7 (1.1) | 7 (1.4) | |
NOTE: Categorical measures presented as N (%), continuous measures presented as mean (SD).
ᵃFrequent physical activity was defined as activities or exercises (other than work) performed ≥3×/week for 1 month or more in the past year.
ᵇAny reported limitation in usual activities due to breathing difficulties or shortness of breath.
ᶜRegular aspirin/NSAID use was defined as taking adult aspirin, baby aspirin, or an NSAID ≥3×/week for 1 month or more.
Table 2. Description of variables used in hierarchical clustering of white and African American INHALE controls (N = 1,179)
Variable | Whites (n = 659) | African Americans (n = 520) | P (test of homogeneity) |
---|---|---|---|
Pack years | 38.0 (26.2) | 27.2 (19.4) | <0.001 |
Quit years (former smokers) | 17.0 (12.2) | 14.6 (11.2) | 0.049 |
FEV1/FVC | 0.72 (0.11) | 0.72 (0.13) | 0.497 |
% Predicted FEV1 | 77.3 (20.1) | 76.6 (21.1) | 0.564 |
qCT % Emphysemaᵃ, median (IQR) | 1.2 (2.3) | 0.9 (2.2) | 0.001 |
qCT % Air trappingᵇ | 15.6 (15.8) | 15.9 (17.7) | 0.756 |
qCT MLD ratioᶜ | 0.87 (0.06) | 0.88 (0.07) | 0.038 |
NOTE: Variables summarized as mean (SD) except where noted.
ᵃPercent lung voxels < −950 HU on inspiration across both lungs on qCT.
ᵇPercent lung voxels < −856 HU on expiration across both lungs on qCT.
ᶜMLD ratio = expiratory MLD/inspiratory MLD on qCT.
Subgroups of smokers based on clustering in controls
We evaluated COPD phenotypes through clustering analysis in the 1,179 INHALE participants who were free of lung cancer (controls). On the basis of the increase in merging cost associated with each consecutive clustering event, five- and seven-cluster solutions were both considered (Supplementary Fig. S1). After examining how spirometry- and qCT-based measures separated in each solution, the five-cluster solution was selected as optimal. Separation of the five clusters according to the first two linear discriminants is presented in Supplementary Fig. S2; the eigenvalues of these first two discriminants together accounted for 77.0% of the between-cluster variance.
To gain insight into the relative position of these clusters along the spectrum of lung disease, a weighted average of the variable Z-score means within each cluster was used to rank clusters from most detrimental (cluster 1) to least detrimental (cluster 5). Cluster profiles are depicted in a mean trends chart in Figure 1, and the actual mean values are listed in Table 3.
Figure 1. Hierarchical clustering results for 1,179 INHALE controls. Mean trends are shown for the variables used in clustering. Clusters were ordered and numbered from the most detrimental lung disease profile (cluster 1) to the least detrimental profile (cluster 5), based on variable means within each cluster. Arrows indicate standardized group means for each variable relative to the overall mean (μ = 0), as follows: ↑/↓, 0.2–1 SD above/below the overall mean; ↑↑/↓↓, 1–1.9 SDs above/below the overall mean; ↑↑↑/↓↓↓, >2 SDs above/below the overall mean; –, within 0.1 SD of the overall mean. Red, detrimental mean trend; blue, beneficial mean trend.
Table 3. Hierarchical clustering of INHALE current/former smokers with qCT and spirometry data (N = 1,179)
Variable | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
---|---|---|---|---|---|
N | 64 | 232 | 73 | 609 | 201 |
Pack years | 47.5 (24.1) | 38.4 (31.4) | 26.0 (18.7) | 34.2 (20.9) | 22.5 (19.6) |
Quit years | 2.6 (4.9) | 1.1 (3.1) | 6.2 (10.1) | 1.8 (3.6) | 25.9 (9.9) |
FEV1/FVC | 0.53 (0.11) | 0.69 (0.10) | 0.47 (0.11) | 0.77 (0.07) | 0.77 (0.07) |
% Predicted FEV1 | 53.9 (18.6) | 67.0 (16.6) | 49.1 (16.2) | 83.5 (16.9) | 86.5 (17.6) |
qCT % Emphysemaᵃ | 16.3 (7.3) | 2.6 (2.4) | 1.5 (1.6) | 1.2 (1.4) | 2.4 (2.5) |
qCT % Air trappingᵇ | 51.3 (13.6) | 30.6 (16.9) | 11.2 (9.6) | 7.1 (6.7) | 14.9 (13.5) |
qCT MLD ratioᶜ | 0.94 (0.04) | 0.95 (0.05) | 0.86 (0.06) | 0.84 (0.05) | 0.86 (0.06) |
NOTE: Clustering was performed on standardized variables; actual values (cluster means and SDs) are presented. Clusters are ordered from most detrimental lung disease profile (cluster 1) to least detrimental (cluster 5).
ᵃPercent lung voxels < −950 HU on inspiration across both lungs.
ᵇPercent lung voxels < −856 HU on expiration across both lungs.
ᶜMLD ratio = expiratory MLD/inspiratory MLD.
Cluster 1 (N = 64, 5.4%) was defined by heavier smoking (higher pack years, lower quit years), very poor lung function on spirometry (FEV1/FVC < 0.70 and very low % predicted FEV1), and a very poor lung imaging phenotype (very high qCT % emphysema, % air trapping, and MLD ratio).
Cluster 2 (N = 232, 19.7%) included individuals with greater smoking intensity (above-average pack years and very low average quit years), poor lung function on spirometry, and a poor imaging phenotype.
Cluster 3 (N = 73, 6.2%) was defined by relatively light smokers with very poor lung function on spirometry but below-average levels of qCT air trapping and emphysema.
Cluster 4 (N = 609, 51.7%) was defined by moderate smoking intensity with little evidence of impaired lung function on either spirometry (FEV1/FVC > 0.70 and above-average % predicted FEV1) or qCT (low % emphysema, % air trapping, and MLD ratio).
Cluster 5 (N = 201, 17.0%) included former smokers with low pack years and above-average lung function on spirometry (FEV1/FVC > 0.70 and high mean % predicted FEV1) but average levels of qCT emphysema, air trapping, and MLD ratio.
Sociodemographic and clinical cluster profiles
Cluster sociodemographic and clinical characteristics are presented in Table 4.
Table 4. Sociodemographic and clinical characteristics by cluster of INHALE current/former smokers with qCT and spirometry data (N = 1,179)
Variable | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | P for homogeneity |
---|---|---|---|---|---|---|
N | 64 | 232 | 73 | 609 | 201 | |
Race (African American) | 31 (48.4) | 117 (50.4) | 41 (56.2) | 274 (45.0) | 57 (28.4) | <0.001 |
Gender (female) | 27 (42.2) | 119 (51.3) | 36 (49.3) | 347 (57.0) | 116 (57.7) | 0.092 |
Age (years) | 68.2 (7.7) | 62.9 (8.2) | 59.1 (9.4) | 58.1 (8.8) | 64.0 (9.4) | <0.001 |
Education (>high school) | 36 (56.3) | 124 (53.5) | 50 (68.5) | 392 (64.4) | 153 (76.1) | <0.001 |
Family history of lung cancer | 11 (17.2) | 34 (14.7) | 8 (11.0) | 101 (16.6) | 35 (17.4) | 0.703 |
History of asthma | 7 (11.7) | 40 (17.5) | 10 (13.7) | 94 (15.5) | 53 (26.4) | 0.005 |
Current smoker | 42 (65.6) | 195 (84.1) | 44 (60.3) | 458 (75.2) | 0 (0.0) | <0.001 |
Alcohol consumption (drinks/week) | 3.5 (5.3) | 4.1 (7.5) | 5.5 (9.6) | 4.9 (11.1) | 2.8 (4.7) | 0.043 |
NOTE: Clusters are ordered from most detrimental lung disease profile (cluster 1) to least detrimental (cluster 5). Dichotomous variables presented as N (%), continuous variables presented as mean (SD).
The proportion of African Americans differed significantly across clusters, ranging from 28.4% in cluster 5 (N = 57) to 56.2% in cluster 3 (N = 41; P < 0.001). Age and education level (≤high school vs. >high school) also differed significantly across clusters (P < 0.001 for both), with the proportion of participants educated beyond high school generally increasing with less severe lung disease profiles. The proportion of individuals reporting a history of asthma also differed across clusters (P = 0.005): it was lowest in the most detrimental cluster (11.7% in cluster 1) and highest in the least detrimental cluster (26.4% in cluster 5). Consistent with trends in quit years among clusters, the proportion of current smokers was lowest in cluster 5 (no current smokers; highest mean quit years, 25.9) and highest in cluster 2 (84.1% current smokers; lowest mean quit years, 1.1).
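As a worked check of the race-by-cluster comparison, the Pearson χ² test of homogeneity can be recomputed from the Table 4 counts. This is an illustrative sketch, not the authors' code: the white counts per cluster are obtained by subtracting the African American counts from each cluster's N, and the upper-tail P value uses the closed form of the χ² survival function, which is valid only for an even number of degrees of freedom (here (5 − 1) × (2 − 1) = 4).

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_sf_even_df(x, df):
    """Upper-tail chi-square P value; closed form valid for even df only."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# Rows: clusters 1-5; columns: [African American, white (= N - African American)].
RACE_BY_CLUSTER = [[31, 33], [117, 115], [41, 32], [274, 335], [57, 144]]
stat, df = chi_square_homogeneity(RACE_BY_CLUSTER)
p = chi_square_sf_even_df(stat, df)
print(f"chi2 = {stat:.2f}, df = {df}, P = {p:.1e}")
```

The statistic comes out at roughly 29 on 4 degrees of freedom, with P well below 0.001, consistent with the reported result.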
Trends in COPD-related comorbidities among clusters
Among COPD-related comorbidities available from INHALE data, BMI, limitations due to breathing difficulty, and inflammatory conditions (gout, lupus, rheumatoid arthritis, and sarcoidosis) differed significantly across clusters, whereas differences in chronic pain conditions (osteoarthritis and fibromyalgia) and diabetes were marginally significant (Supplementary Table S1). Mean BMI was lowest in cluster 1 (mean = 24.6, SD = 3.8) and highest in cluster 4 (mean = 30.0, SD = 6.3). The proportion of individuals with physical limitations due to breathing difficulties decreased with less detrimental lung disease clusters, from 42.2% (N = 27) in cluster 1 to 12.9% (N = 26) in cluster 5.
Lung cancer risk among lung disease clusters
After cluster definitions were applied to INHALE lung cancer cases, there was a trend in the crude OR of lung cancer across clusters, with higher odds of lung cancer in more detrimental lung disease clusters, using cluster 5 as the reference group. These trends persisted after adjustment for age, race, gender, and BMI (Table 5). The odds of lung cancer in cluster 1 were 2.65 times the odds in cluster 5 [OR = 2.65; 95% confidence interval (CI), 1.71–4.10], and the odds in cluster 2 were 2.33 times the odds in cluster 5 (OR = 2.33; 95% CI, 1.66–3.26). Risk of lung cancer was also significantly increased in cluster 3 compared with cluster 5 (OR = 1.68; 95% CI, 1.04–2.73), but did not differ significantly between cluster 4 and cluster 5 (OR = 0.92; 95% CI, 0.67–1.29). When cluster order was treated as a continuous variable, the odds of lung cancer decreased by 28% with each one-step move toward a less detrimental cluster (OR ≈ 0.72 per cluster; P < 0.0001).
Table 5. Control cluster definitions applied to INHALE current/former smoker lung cancer cases with qCT and spirometry/PFT data (N = 576)
Variable | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
---|---|---|---|---|---|
N | 77 | 205 | 40 | 177 | 77 |
Pack years | 54.4 (33.9) | 53.0 (31.4) | 34.9 (18.5) | 42.7 (25.7) | 22.3 (16.8) |
Quit years | 5.5 (9.2) | 2.6 (5.9) | 5.3 (9.2) | 2.3 (4.1) | 29.4 (10.5) |
FEV1/FVC | 0.51 (0.10) | 0.66 (0.08) | 0.52 (0.09) | 0.74 (0.06) | 0.74 (0.08) |
% Predicted FEV1 | 49.7 (17.4) | 67.4 (17.2) | 48.5 (11.4) | 80.4 (18.6) | 80.7 (19.2) |
qCT % Emphysemaᵃ | 16.4 (8.3) | 3.0 (2.6) | 2.7 (2.2) | 1.3 (1.7) | 2.0 (2.0) |
qCT % Air trappingᵇ | 54.9 (14.1) | 31.1 (15.2) | 17.3 (12.5) | 7.3 (5.4) | 16.2 (12.9) |
qCT MLD ratioᶜ | 0.95 (0.04) | 0.94 (0.04) | 0.88 (0.06) | 0.84 (0.05) | 0.87 (0.07) |
Crude OR (95% CI)ᵈ | **3.14 (2.06–4.79)** | **2.31 (1.67–3.19)** | 1.43 (0.90–2.28) | 0.76 (0.56–1.04) | 1.00 |
Adjusted OR (95% CI)ᵉ | **2.65 (1.71–4.10)** | **2.33 (1.66–3.26)** | **1.68 (1.04–2.73)** | 0.92 (0.67–1.29) | 1.00 |
NOTE: Clusters were assigned based on standardized values; actual values (means and SDs) are presented here. Bold text indicates statistical significance at α = 0.05.
ᵃPercent lung voxels < −950 HU on inspiration across both lungs.
ᵇPercent lung voxels < −856 HU on expiration across both lungs.
ᶜMLD ratio = expiratory MLD/inspiratory MLD.
ᵈUnadjusted OR comparing odds of lung cancer in each cluster to odds of lung cancer in cluster 5 (reference).
ᵉOR comparing odds of lung cancer in each cluster to odds of lung cancer in cluster 5 (reference), adjusted for age, race, gender, and BMI.
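The crude ORs in Table 5 can be checked by hand from the cluster sizes alone. With cluster as the only predictor, an unadjusted logistic regression reproduces the cross-product odds ratio against the reference cluster and its Wald interval on the log scale (the Woolf interval). The sketch below uses case counts from Table 5 and control counts from Table 3; the function name is hypothetical, and the adjusted ORs cannot be reproduced this way because they require individual-level covariates.

```python
import math


def crude_or(case_k, control_k, case_ref, control_ref, z=1.96):
    """Odds ratio (cluster k vs. reference) with a Woolf (log-scale) 95% CI."""
    odds_ratio = (case_k / control_k) / (case_ref / control_ref)
    se = math.sqrt(1 / case_k + 1 / control_k + 1 / case_ref + 1 / control_ref)
    lo = math.exp(math.log(odds_ratio) - z * se)
    hi = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lo, hi


CASES = {1: 77, 2: 205, 3: 40, 4: 177, 5: 77}       # Table 5 (lung cancer cases)
CONTROLS = {1: 64, 2: 232, 3: 73, 4: 609, 5: 201}   # Table 3 (controls)

for k in (1, 2, 3, 4):
    or_k, lo_k, hi_k = crude_or(CASES[k], CONTROLS[k], CASES[5], CONTROLS[5])
    print(f"cluster {k}: OR {or_k:.2f} ({lo_k:.2f}-{hi_k:.2f})")
```

Rounded to two decimals, these values match the crude OR row of Table 5; for example, cluster 1 gives (77/64)/(77/201) = 201/64 ≈ 3.14.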
Discussion
The spectrum of lung disease in this population-based sample of former and current smokers was defined by five distinct combinations of smoking history, spirometry, and quantitative imaging phenotypes. We found significant evidence of racial heterogeneity across these clusters, consistent with the overall racial differences observed in smoking history and qCT measures. Although African Americans smoked fewer pack years than whites on average, the proportion of African Americans in cluster 1 (most detrimental, heaviest smoking) was similar to their overall proportion (48% in cluster 1 vs. 44% overall), whereas the lightest-smoking cluster (cluster 5, least detrimental) had the lowest proportion of African Americans (28%) among all clusters. The highest proportion of African Americans was found in cluster 3 (56%), which consisted of relatively light smokers (mean pack years = 26) with very poor spirometry measures. Cluster 3 was small (N = 73) but notable because it included individuals with the poorest lung function on spirometry, yet its qCT measures were below average and even slightly lower than those of cluster 5 (very low % emphysema, % air trapping, and MLD ratio). Participants in this cluster were more likely than those in cluster 5 to be younger and African American. These observations suggest that smoking intensity alone is not a sufficient indicator of overall lung disease, especially among African Americans, and that spirometric evidence of poor lung function may precede qCT evidence of disease among lighter-smoking African Americans. These findings are consistent with results from the National Emphysema Treatment Trial (NETT), which also found that African Americans had lower qCT measures of emphysema despite similarly poor spirometry values compared with whites (16). As in our study, African Americans in NETT were younger and smoked less, on average, than white participants, although African American enrollment in NETT was very limited (N = 42).
We note that race contributes approximately 1.3% of the variability in cluster assignment, which is similar to other covariates significantly associated with cluster (e.g., age, 1.7%; education, 1.6%), suggesting that race is only one of many factors contributing to the cluster result.
Cluster 4, which consisted mostly of current smokers (75%) with fairly heavy smoking histories (mean pack years = 34), had lung function on spirometry comparable with that of cluster 5 and better qCT measures than cluster 5. This was the largest cluster, comprising 52% of the total sample (609/1,179), and its members were younger, on average, than those in cluster 5. This difference may indicate that airway damage as captured by qCT progresses with age, even after smoking cessation.
Although COPD is widely recognized as a heterogeneous disease, COPD subphenotyping continues to evolve. The most recent GOLD executive summary requires spirometry for diagnosis but recommends that other factors, such as symptoms and comorbidities, be considered when assigning patients to risk categories (13). CT imaging has been used in specific subsets of patients with severe disease when making treatment decisions, but has not been routinely performed to aid in the diagnosis or treatment of COPD (17). This is despite evidence that CT measures of emphysema, air trapping, and airway morphology correlate strongly with COPD severity (8, 18–20). The cluster results presented here suggest that imaging data can aid clinicians in identifying individuals at risk for developing COPD-related comorbidities, even in the absence of traditional risk factors such as high pack years/low quit years (i.e., cluster 5). In addition, these clusters may prove useful for stratifying patients for treatment and disease-related outcomes in COPD, although this avenue of research is still in its infancy.
Other groups have employed clustering approaches among those with COPD or symptoms of airway obstruction using spirometry and qCT measures. COPDGene investigators identified a group of resistant smokers (low emphysema/airflow obstruction), a group of smokers with severe emphysema, and two discordant groups in relation to emphysema and airflow obstruction (21). Another cluster analysis of NETT patients with COPD also found four clusters with heterogeneous and discordant profiles (22). A study of 2,164 GOLD stage II–IV patients with COPD found significant differences in mortality, hospitalizations, and exacerbations among five distinct clusters using 13 variables, including COPD symptoms and inflammation markers in addition to spirometry and qCT measures (23). These and other studies are consistent with our results in highlighting the complexity of COPD subphenotyping and the disparate information provided by qCT and spirometry measures (4). Our study extends these subphenotyping approaches by evaluating their association with lung cancer risk.
The cluster results were significantly predictive of lung cancer risk. Even after adjusting for covariates, the spectrum of lung disease across the clusters is strongly associated with lung cancer risk, such that the most detrimental cluster has the highest odds of lung cancer and risk decreases with less detrimental lung clusters. In addition, the INHALE study tracks control subjects for subsequent diagnoses of lung cancer (diagnosis > 1 year post-CT). There are only 9 controls in this sample who were subsequently diagnosed with lung cancer; however, it is worth noting that 5 of these subjects were in cluster 2, a group with mild evidence of COPD on spirometry (on average) yet high levels of qCT air trapping and elevated MLD ratio. Furthermore, a disproportionate percentage of these control-to-case subjects (7/9 = 78%) were members of the two most detrimental clusters (clusters 1 and 2), which represent only 25% of the total sample (Fisher exact test P = 0.001). The remaining 2 subjects were in cluster 4, the largest cluster. These findings demonstrate that (i) qCT measures contribute to predicting lung cancer risk beyond that provided by spirometry/PFT and (ii) lung cancer risk differs across COPD subphenotypes, suggesting qCT measures could aid in identifying patients with COPD at greatest risk of developing lung cancer.
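The Fisher exact test for the control-to-case subjects can be reproduced with the hypergeometric distribution. The article does not give the 2×2 layout, so this sketch assumes one plausible arrangement: the 9 controls later diagnosed with lung cancer versus the other 1,170 controls, cross-classified as clusters 1–2 (64 + 232 = 296 controls in total) versus clusters 3–5. The two-sided P value sums the probabilities of all tables with the same margins that are no more probable than the observed one.

```python
from math import comb


def hypergeom_pmf(x, population, successes, draws):
    """P(X = x) when drawing `draws` items without replacement."""
    return comb(successes, x) * comb(population - successes, draws - x) / comb(population, draws)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P for the 2x2 table [[a, b], [c, d]]."""
    population = a + b + c + d
    successes = a + c   # first-column total
    draws = a + b       # first-row total
    p_obs = hypergeom_pmf(a, population, successes, draws)
    lo = max(0, draws - (population - successes))
    hi = min(draws, successes)
    p_value = 0.0
    for x in range(lo, hi + 1):
        p_x = hypergeom_pmf(x, population, successes, draws)
        if p_x <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p_x
    return p_value


# Assumed layout. Row 1: 9 control-to-case subjects (7 in clusters 1-2, 2 in 3-5).
# Row 2: remaining 1,170 controls (296 - 7 = 289 in clusters 1-2, 881 in 3-5).
p = fisher_exact_two_sided(7, 2, 289, 881)
print(f"two-sided Fisher exact P = {p:.4f}")
```

Under this layout the two-sided P value is approximately 0.0013, consistent with the reported P = 0.001.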
We note that while our study includes a large, diverse, population-based sample of current and former smokers, clustering results are inherently data-dependent and difficult to generalize. To validate the selected clusters, variables not used in the clustering, such as physical attributes (BMI, physical activity) and COPD-related comorbidities, were evaluated and correlated with the clusters as expected. Validation of these results in additional smoking populations and monitoring for lung cancer diagnoses are critical next steps, and future efforts will focus on expanding the set of lung disease features considered to identify subphenotypes specific to whites and African Americans.
Subphenotyping of COPD is difficult due in part to the often-lengthy course of the disease. Utilizing a variety of measures related to COPD, including smoking intensity, spirometry, and quantitative imaging, reveals a spectrum of lung disease in a population of current and former smokers. Evidence presented here suggests that radiologic measures based on quantitative imaging analysis add information on lung physiology beyond that captured by spirometry. We have demonstrated that these COPD subphenotypes are clinically relevant predictors of lung cancer risk.
Disclosure of Potential Conflicts of Interest
J.C. Sieren has provided expert testimony for Vidadiagnostic. M.J. Simoff is a consultant/advisory board member for Auris Surgical Robotics, Intuitive Surgical Robotics, and Gongwin Biopharm. S. Gadgeel has received speakers bureau honoraria from AstraZeneca, Genentech/Roche, Bristol-Myers Squibb, Pfizer, AbbVie, and Takeda. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: C.M. Lusk, C. Neslund-Dudas, D. Spizarny, A.O. Soubani, A.G. Schwartz
Development of methodology: C.M. Lusk, D. Watza, J.C. Sieren, D. Spizarny, M.J. Simoff, A.O. Soubani
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A.S. Wenzlaff, N. Robinette, M. Petrich, C. Neslund-Dudas, M.J. Flynn, T. Song, D. Spizarny, M.J. Simoff, A.O. Soubani, S. Gadgeel, A.G. Schwartz
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C.M. Lusk, D. Watza, N. Robinette, M. Petrich, S. Gadgeel, A.G. Schwartz
Writing, review, and/or revision of the manuscript: C.M. Lusk, D. Watza, J.C. Sieren, N. Robinette, C. Neslund-Dudas, M.J. Flynn, T. Song, M.J. Simoff, A.O. Soubani, S. Gadgeel, A.G. Schwartz
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C.M. Lusk, A.S. Wenzlaff, M.J. Flynn, A.G. Schwartz
Study supervision: C. Neslund-Dudas, S. Gadgeel, A.G. Schwartz
Other (radiologist; interpreted CT scans used in this study): G. Walworth
Acknowledgments
This work was funded by the NIH (grant nos. R01CA141769, P30CA022453, and HHSN261201300011I) and the Herrick Foundation.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.