Background:

The high disease burden makes it desirable to identify high-risk Asian never-smoking females (NSF) who may benefit from low-dose CT (LDCT) screening. In North America, a person is eligible for LDCT screening if they satisfy the U.S. Preventive Services Task Force (USPSTF) criteria or have a model-estimated 6-year risk greater than 0.0151. According to two U.S. reports, only 36.6% of female patients with lung cancer met the USPSTF criteria, while 38% of the ever-smokers ages 55 to 74 years met the USPSTF criteria.

Methods:

Using data on NSFs in the Taiwan Genetic Epidemiology Study of Lung Adenocarcinoma and the Taiwan Biobank before August 2016, we formed an age-matched case–control study consisting of 1,748 patients with lung cancer and 6,535 controls. Using these and an estimated age-specific lung cancer 6-year incidence rate among Taiwanese NSFs, we developed the Taiwanese NSF Lung Cancer Risk Models using genetic information and simplified questionnaire (TNSF-SQ). Performance evaluation was based on the newer independent datasets: Taiwan Lung Cancer Pharmacogenomics Study (LCPG) and Taiwan Biobank data after August 2016 (TWB2).

Results:

The AUC based on the NSFs ages 55 to 70 years in LCPG and TWB2 was 0.714 [95% confidence interval (CI), 0.660–0.768]. For women in TWB2 ages 55 to 70 years, 3.94% (95% CI, 2.95–5.13) had risk higher than 0.0151. For women in LCPG ages 55 to 74 years, 27.03% (95% CI, 19.04–36.28) had risk higher than 0.0151.

Conclusions:

TNSF-SQ demonstrated good discriminative power. The ability to identify 27.03% of high-risk Asian NSFs ages 55 to 74 years deserves attention.

Impact:

TNSF-SQ seems potentially useful in selecting Asian NSFs for LDCT screening.

This article is featured in Highlights of This Issue, p. 265

The National Lung Screening Trial (NLST) reported that a 20% decrease in mortality from lung cancer was observed in the arm screened using low-dose CT (LDCT) compared with the arm using chest radiography. Its success depends critically on its application of screening to high-risk individuals. Participants in NLST were 55–74 years of age, smoked no less than 30 pack-years, and had no more than 15 years of smoking quit time (1). Subsequently, the U.S. Preventive Services Task Force (USPSTF) recommended annual LDCT lung cancer screening of high-risk populations; namely, those who are 55–80 years of age and have the same smoking experience as those in NLST (2).

Several risk prediction models have since been considered to deal with the important problem of selecting high-risk individuals for LDCT screening (3–6). For example, the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) 2012 model (PLCOM2012), which estimates the probability of a smoker developing lung cancer in a 6-year period, was used to show that the mortality rates in the NLST LDCT arm were consistently below those in the chest X-ray arm among individuals with PLCOM2012 risk ≥ 0.0151 (7). Model-based 6-year risk thresholds of 0.0151 and 0.02 have been considered in selecting individuals for LDCT screening for lung cancer (8, 9).

The USPSTF or NLST criteria have been examined continuously in practical situations. For example, only about 36.6% of female patients with lung cancer diagnosed during 2005–2011 met the USPSTF criteria for screening and this proportion had been decreasing (10), while about 38% (28,401/74,218) of the ever-smoking participants, or about 20% [28,401/(74,218 + 65,711)] of all the participants, in the PLCO were eligible for screening according to NLST criteria (4, 7, 9). In addition, selection of lung cancer screenees using PLCOM2012 risk higher than 1.5% or 2% has been compared extensively with selection using the USPSTF/NLST criteria (4, 7, 9). In particular, PLCOM2012 would select 8.8% fewer persons and identify 12.4% more cases of lung cancer than USPSTF criteria.
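The two PLCO proportions quoted above follow directly from the counts given in the text. A short check, with variable names chosen here for illustration:

```python
# Counts quoted in the text for PLCO participants.
eligible_by_nlst = 28_401   # ever-smokers meeting the NLST criteria
ever_smokers = 74_218
never_smokers = 65_711

share_of_ever_smokers = eligible_by_nlst / ever_smokers
share_of_all = eligible_by_nlst / (ever_smokers + never_smokers)

print(f"eligible among ever-smokers: {share_of_ever_smokers:.1%}")
print(f"eligible among all participants: {share_of_all:.1%}")
```

The second ratio is lower because never-smokers enter the denominator but, by definition, cannot meet smoking-based criteria.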

The recent NELSON trial observed a 26% mortality reduction in male patients with lung cancer and up to a 61% mortality reduction in female patients with lung cancer among those screened using LDCT compared with those having no screening (11). This, together with a Korean report (12) that using LDCT screening, only 5% of the lung cancers detected in never-smokers were interval cancers, compared with 43% of those in ever-smokers, suggests that LDCT screening may be effective in reducing lung cancer–related mortality in Asian never-smoking females (NSF).

About 25% of lung cancer cases arise in never-smokers, and lung cancer in never-smokers (LCINS) ranks as the seventh most common cause of cancer-related death worldwide (13–15). The proportion of LCINS has been increasing over time, and about 60%–80% of female patients with lung cancer in Asia are never-smokers, much higher than in the United States and Europe (15, 16). LCINS exhibits distinct molecular characteristics, and the incidence of lung cancer in Asian NSFs is particularly high (17, 18). In Taiwan, 55% of the lung cancers are in never-smokers, lung cancer is the leading cause of cancer-related death among women, and over 90% of female patients with lung cancer are never-smokers (19). Recently, there has been interest in LDCT screening for lung cancer among female never-smokers in China, Japan, Korea, and Taiwan (20–25). In particular, a lung cancer screening program for never-smokers in Taiwan is ongoing (ClinicalTrials.gov Identifier: NCT02611570); its eligibility criteria are never-smokers aged between 55 and 75 years with one of the following risks: family history of lung cancer within the third degree, passive smoking exposure, TB/chronic obstructive pulmonary disease, and high cooking index without using a ventilator during cooking (24, 25).

The above observations suggest that it is crucial to be able to identify high-risk Asian NSFs who may benefit from LDCT screening. To this end, developing lung-cancer risk prediction tools for Asian NSFs based on risk factors consistently identified in previous studies becomes a priority (26). However, this is difficult and challenging. Unlike the situation of tobacco-driven lung cancer, there are no established risk factors dominating the development of lung cancer among never-smokers. Numerous risk factors have been suggested and their effects vary greatly by geographic region (15, 18, 27–31). We note that PLCO models do not seem to be useful for Asian NSFs because PLCO included only about 2,000 never-smokers of Asian ethnicity and only seven lung cancers (7, 26, 32). Indeed, none of the never-smokers in the PLCO (n = 65,711) had a 6-year risk >0.0151, using the PLCOM2014 that is analogous to PLCOM2012 and included never-smokers (7).

The main purpose of this study was to propose models to estimate the risk of lung cancer diagnosis over a 6-year period in a Taiwanese NSF and to examine whether they are useful in identifying high-risk never-smoking women who may benefit from LDCT screening. In particular, we evaluated the criteria of using a 6-year risk higher than 0.0151 or 0.02 as thresholds to select NSFs for LDCT screening. The evaluation is based on datasets obtained chronologically later than the datasets used in the model development. We believe this study is useful and timely from both the public and personal perspectives. In particular, it is hoped that this study could improve the implementation of LDCT lung cancer screening programs in Asian NSFs.

The model was developed on the basis of an age-matched case–control study (AMCCS) and the age-specific six-year lung cancer incidence rate (ASSIR) in Taiwanese NSFs. The AMCCS included the Genetic Epidemiology Study of Lung Adenocarcinoma in Taiwan (GELAC), which has been used to study the genetic and environmental risk factors for lung cancer in Asian NSFs (33–39). The core risk factors included in our final model are age, body mass index (BMI), chronic obstructive pulmonary disease (COPD), education, family history of lung cancer, and SNPs reported in the genome-wide association studies (GWAS) of lung cancer in Asian NSFs, although PM2.5 did help in prediction.

AMCCS and selection of risk factors

Model construction began using the questionnaire and genetic information of 2,105 patients with lung cancer and 1,405 healthy controls from the NSFs in the case–control study component of the GELAC (33, 35, 38), who were recruited during 2000–2015, and 7,687 healthy NSFs from the Taiwan Biobank data granted before August 2016 (TWB1) having SNP array data. Cases in GELAC were incident patients with lung cancer. Supplementary Text S1 in the Supplementary Materials has more information about GELAC and Taiwan Biobank; some of the Taiwan Biobank participants were administered simplified questionnaires instead of the complete ones. Implementing quality control on these datasets by using both SNP array data and questionnaire information resulted in a dataset, referred to as the initial dataset, consisting of 7,094 females from TWB1 and 2,085 cases and 1,365 controls from GELAC. Any two individuals in this dataset genotyped by SNP array were unrelated, with a relatedness coefficient (PI-HAT) less than 0.05, and the sex inferred from each SNP array was female.

Because 35.6% in TWB1 used the simplified questionnaire (SQ) and the rest used the complete one, and because we considered only risk factors having identical meanings in the Taiwan Biobank and GELAC questionnaires, we included only age, BMI, COPD, education, family history of lung cancer, the GWAS-identified SNPs, and PM2.5*I(Age≥55) in developing our main model.

The Taiwan Biobank questionnaire records only the district in which each participant resided at enrollment, and the Taiwan government has measured PM2.5 exposure at 76 stations across Taiwan's main island only since the beginning of 2006. For an individual in this article, PM2.5*I(Age≥55) therefore equals the average PM2.5 exposure during 2006–2008, interpolated at the district office of the individual's residence, if the individual was aged 55 or more, and 0 otherwise. There are a total of 352 districts on Taiwan's main island; see Text S2 for more details. We considered PM2.5 for those aged 55 or more to reflect its long-term effect (40). Its concentration was measured in μg/m³.
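The definition of this exposure term can be restated compactly. The symbol d (the district of residence at enrollment) and the overlined average are notation introduced here for illustration only; they do not appear in the original text.

```latex
\mathrm{PM}_{2.5}\cdot I(\mathrm{Age}\ge 55)=
\begin{cases}
\overline{\mathrm{PM}}_{2.5}^{\,2006\text{--}2008}(d), & \mathrm{Age}\ge 55,\\[2pt]
0, & \text{otherwise,}
\end{cases}
```

where the overlined quantity is the 2006–2008 average PM2.5 concentration (in μg/m³) interpolated at the office of district d.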

On the basis of the initial dataset, we formed the AMCCS as follows. For each case, we age-matched (±1 year, age at recruitment) healthy controls selected from GELAC or TWB1 until each case had 1–5 matched controls. This resulted in the AMCCS that consisted of 1,748 age-matched groups. Figure 1 details these procedures.
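The matching step can be illustrated with a small sketch. It assumes a greedy pass over cases, matching without replacement, and a tie-break by age difference; none of these details is stated in the text, and the `Person` class and `match_controls` function are hypothetical names. Only the ±1 year caliper, the limit of five controls per case, and the preference for controls with fewer missing values come from the article.

```python
from dataclasses import dataclass


@dataclass
class Person:
    pid: str
    age: int            # age at recruitment, in years
    n_missing: int = 0  # number of missing risk-factor values


def match_controls(cases, controls, caliper=1, max_per_case=5):
    """Greedily assign each case up to `max_per_case` unused controls
    whose age is within `caliper` years of the case's age."""
    available = {c.pid: c for c in controls}
    groups = {}
    for case in cases:
        candidates = [c for c in available.values()
                      if abs(c.age - case.age) <= caliper]
        # Prefer controls with fewer missing values, then closer age.
        candidates.sort(key=lambda c: (c.n_missing, abs(c.age - case.age), c.pid))
        chosen = candidates[:max_per_case]
        if chosen:  # a case with no eligible control forms no group
            groups[case.pid] = [c.pid for c in chosen]
            for c in chosen:
                del available[c.pid]
    return groups


cases = [Person("case1", 60), Person("case2", 40)]
controls = [Person("c1", 59, n_missing=2), Person("c2", 61), Person("c3", 70)]
groups = match_controls(cases, controls)
print(groups)
```

In this toy input, the second case has no control within one year of age and therefore forms no matched group, which is one way the number of matched groups can end up smaller than the number of cases.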

Figure 1.

The procedures to form the AMCCS based on the simplified questionnaire. For the matching procedure from the initial dataset (box c) to the AMCCS (box d), priority was given to controls having fewer missing values among the risk factors of age, BMI, education, family history, COPD, 11 SNPs, and PM2.5. In box a, 966 among the 2,085 cases and 646 among the 1,365 controls were ages 55 to 70 years. In box b, 2,377 among the 7,094 controls were ages 55 to 70 years. In box e, 648 of the 1,341 cases and 2,561 of the 5,343 controls were ages 55 to 70 years.


On the basis of the AMCCS, we assessed a variety of risk factors, shown in Supplementary Table S0, Supplementary Materials. Supplementary Table S1, Supplementary Materials, provides information on the 11 SNPs.

Approach to absolute risk

We built our risk prediction models by combining relative risk models with population incidence rates. This approach was used in breast cancer risk models and the Liverpool Lung Project risk model (41–43); additional efforts were made to take into consideration the increase of female lung cancer incidence rate in Taiwan and competing causes of death.

This is a two-stage approach to fitting a logistic regression model that includes an intercept, age, and other risk factors. In the first stage, we fitted multivariate logistic regression models, using a conditional likelihood approach (44), to the AMCCS data to obtain the ORs of risk factors other than age. The subset of samples of the AMCCS used in the final step of this fitting procedure, referred to as selected AMCCS (SAMCCS), was used in the second stage.

Taking into account competing causes of death and the increase of female lung cancer incidence rate in Taiwan (Fig. 5b in Chien and colleagues; ref. 45), we constructed the 2011 age-specific rates of developing lung cancer in the next 6 years among Taiwanese NSFs, based on population data prior to 2011, including the Taiwan Cancer Registry, Taiwan Cause of Death Database, Monthly Bulletin of Interior Statistics from the Taiwan Ministry of the Interior, and the 2010 Taiwan life table. Details are in Supplementary Text S2 and Supplementary Tables S2 and S3, which report the ASSIR.

We assume that the healthy controls in the AMCCS are representatives of NSFs in Taiwan conditional on the matching variable, age, and that a logit relationship holds between risk factors and the probability of developing cancer in the subsequent 6-year period. In the second stage, we estimated the effect of age and intercept using 2011 ASSIR, conditional on the ORs of the risk factors from the first stage, to obtain the risk prediction model. Details are in Supplementary Text S2, Supplementary Materials.
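The model that results from the two stages can be summarized in a single expression. The cubic form in age is inferred from the age terms reported in Table 1; the symbols β and x_j are notation introduced here, and the exact covariate coding is given in Supplementary Text S2 rather than in this text.

```latex
\operatorname{logit}\,\Pr(\text{lung cancer within 6 years}\mid \mathrm{Age}, x)
= \beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{Age}^2 + \beta_3\,\mathrm{Age}^3
+ \sum_{j} \log(\mathrm{OR}_j)\, x_j
```

Here the log(OR_j) are taken from the first-stage conditional logistic fit, and the intercept and age coefficients are estimated in the second stage from the 2011 ASSIR with the first-stage ORs held fixed.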

Chronologically later datasets for performance evaluation

We used the data of the NSFs in the Taiwan Lung Cancer Pharmacogenomics Study (LCPG) and those of the NSFs with SNP array genotype in the Taiwan Biobank granted from September 2016 to October 2018 (TWB2) as independent cohorts for performance evaluation. LCPG recruited patients with late-stage lung cancer whose first-line treatments were chemotherapy or targeted therapy in the period 2015–2017 from five hospitals in Taiwan. LCPG recruited 233 NSFs. Detailed information for TWB2 and LCPG is in Supplementary Text S1, Supplementary Materials.

We used LCPG and TWB2 to estimate the percentage of individuals with risk higher than certain thresholds and the AUC ROC (46). Participants in LCPG and lung cancer cases in GELAC are beyond risk, as they had lung cancer at enrollment. However, they serve to estimate the sensitivity of the risk prediction models if screening were available prior to clinical diagnosis. In fact, the percentage of LCPG higher than a threshold approximated the sensitivity of a slightly lower threshold, because their risks were based on questionnaire administered at diagnosis and risk usually increases with age; similarly, the percentage of TWB2 higher than a threshold approximated the 1 − specificity of a slightly lower threshold. With this understanding, we calculated these quantities and the AUC ROC.
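The quantities described above can be sketched as follows: the share of a cohort at or above a risk threshold (sensitivity when the cohort consists of cases, 1 − specificity when it consists of controls) and the AUC computed as the probability that a randomly chosen case has a higher risk than a randomly chosen control. The risk values below are made up for illustration, and the function names are hypothetical; the authors' R code is not reproduced here.

```python
def fraction_at_or_above(risks, threshold):
    """Share of individuals whose estimated risk is >= threshold."""
    return sum(r >= threshold for r in risks) / len(risks)


def auc(case_risks, control_risks):
    """Mann-Whitney estimate of the AUC; ties count as one half."""
    wins = 0.0
    for rc in case_risks:
        for rn in control_risks:
            if rc > rn:
                wins += 1.0
            elif rc == rn:
                wins += 0.5
    return wins / (len(case_risks) * len(control_risks))


# Hypothetical 6-year risks, not data from the study.
case_risks = [0.030, 0.016, 0.009, 0.021]
control_risks = [0.004, 0.012, 0.016, 0.007, 0.002]

sensitivity = fraction_at_or_above(case_risks, 0.0151)
one_minus_specificity = fraction_at_or_above(control_risks, 0.0151)
print(sensitivity, one_minus_specificity, auc(case_risks, control_risks))
```

The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of the size reported here; a rank-based formula would be the usual choice for larger data.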

We built our models for women ages 30–76 and mainly evaluated on women ages 55–70 or 55–74 because the Taiwan Biobank individuals were under 70 years of age and 55–74 was the age range in the NLST. We focused on two models; one included genetic information and the other did not.

We considered three risk levels: 0.0134, 0.0151, and 0.02. The level 0.0134 is the risk at which PLCOM2012 finds eligible the same number of individuals as the NLST criteria would find eligible in PLCO data (7). The level 0.02 was used to recruit participants for the PanCan study (8). All calculations were implemented in R.

This study was approved by the Institutional Review Board of the National Health Research Institutes (Zhunan, Taiwan) with the approval number EC1000902. Written consent was obtained from each participant in this study.

Characteristics of the AMCCS

The epidemiologic characteristics of the AMCCS are shown in Supplementary Table S0, Supplementary Materials. Supplementary Table S0 is suggestive of risk factors that might be important as potential candidates for entry into the prediction models. On the basis of the univariate analyses at 0.05 significance level, education, family history of lung cancer, COPD, cooking time-year (CTY), CTY without fume extractor when cooking, hormone replacement therapy, and PM2.5*I(Age≥55) were associated with lung cancer risk. Nine of the GWAS-identified 11 SNPs showed P values less than 0.05; the two that did not, rs3817963 and rs2179920, had the largest P values in Seow and colleagues' study (37).

Risk model using the simplified questionnaire

We first obtained the effects of BMI, COPD, education, family history, PM2.5*I(Age≥55), and the 11 SNPs by fitting to the AMCCS data using conditional likelihood. Conditional on these effects and using the 2011 ASSIR, we then obtained the proposed risk prediction model. In fact, we constructed two such risk models; one involved variable selection in the first estimation stage; the other did not. The former is termed the Taiwan NSF Lung Cancer Risk Model using SQ (TNSF-SQ); the latter is termed TNSF-SQ1. Table 1 reports the ORs for TNSF-SQ except for age, whose effects are given in Supplementary Fig. S1. Risk calculators are given in Supplementary Text S2, Supplementary Materials, where the coefficients are shown to more decimal places. These two models are similar, except that PM2.5*I(Age≥55) was included in TNSF-SQ1 only. In both models, education was protective and had a strong effect; both family history of lung cancer and COPD also had large ORs. For BMI, those with the lowest BMI were at the highest risk. Nine of the 11 SNPs from GWAS were included in the model. We considered TNSF-SQ our main model for reasons to be given in Discussion.

Table 1.

Risk factors and their ORs in the Taiwan TNSF-SQ.

SAMCCS (6,684 = 1,341 + 5,343)

Variable                         OR (95% CI)            P^a
Education                        0.543 (0.511–0.577)    <2.00E-16
BMI (kg/m²)
  BMI < 18.5                     1.581 (1.115–2.243)    1.02E-02
  18.5 ≤ BMI < 24                Reference
  24 ≤ BMI < 27                  0.886 (0.753–1.041)    1.41E-01
  27 ≤ BMI < 30                  0.656 (0.521–0.824)    3.06E-04
  BMI ≥ 30                       0.742 (0.542–1.015)    6.18E-02
Family history of lung cancer    2.067 (1.643–2.600)    5.72E-10
COPD                             2.204 (1.382–3.513)    8.96E-04
rs10937405 (A)^b                 0.738 (0.663–0.822)    3.33E-08
rs2736100 (C)                    1.360 (1.235–1.498)    3.79E-10
rs2395185 (A)                    1.205 (1.093–1.329)    1.85E-04
rs2495239 (A)                    1.158 (1.053–1.273)    2.54E-03
rs9387478 (A)                    0.867 (0.789–0.953)    3.11E-03
rs72658409 (T)                   0.755 (0.620–0.919)    5.15E-03
rs7086803 (A)                    1.265 (1.144–1.398)    4.64E-06
rs11610143 (G)                   0.857 (0.768–0.956)    5.61E-03
rs7216064 (G)                    0.874 (0.790–0.967)    9.10E-03

Variable                         Coefficient            P
Model constant                   −19.45842              5.98E-14
  Age^c                          0.69922                7.80E-08
  (Age)²                         −0.01018               1.74E-05
  (Age)³                         0.00005                4.67E-04

^a Except for age-related variables (i.e., Age, Age², and Age³), the P values were obtained from the multivariate logistic regression using conditional likelihood. The P values for age-related variables were obtained from a linear regression analysis in the second stage (see Materials and Methods).

^b The genetic variables take values 0, 1, and 2 according to the number of the minor alleles the individuals have at the SNP. Here, the minor alleles, in the parentheses, are those reported in the literature.

^c The age effect can be visualized in Supplementary Fig. S1, Supplementary Materials.
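A minimal sketch of how the entries of Table 1 could combine into a 6-year risk under the logistic form described in Materials and Methods. Several points are assumptions not confirmed by the text: the integer coding of education, that covariates enter uncentered, and all function and variable names. The published coefficients are rounded, and the rounding of the cubic age coefficient alone can shift the result substantially, so the output is illustrative only; the risk calculators in Supplementary Text S2 carry the full-precision coefficients.

```python
import math

# Rounded values copied from Table 1.
INTERCEPT = -19.45842
AGE_COEFS = (0.69922, -0.01018, 0.00005)   # Age, Age^2, Age^3
OR_EDUCATION = 0.543                        # per education level (coding assumed)
OR_BMI = {"<18.5": 1.581, "18.5-24": 1.0, "24-27": 0.886,
          "27-30": 0.656, ">=30": 0.742}
OR_FAMILY_HISTORY = 2.067
OR_COPD = 2.204
OR_SNP = {"rs10937405": 0.738, "rs2736100": 1.360, "rs2395185": 1.205,
          "rs2495239": 1.158, "rs9387478": 0.867, "rs72658409": 0.755,
          "rs7086803": 1.265, "rs11610143": 0.857, "rs7216064": 0.874}


def six_year_risk(age, education_level, bmi_category,
                  family_history, copd, minor_allele_counts):
    """Illustrative logistic risk; `education_level` coding is an assumption."""
    b1, b2, b3 = AGE_COEFS
    logit = INTERCEPT + b1 * age + b2 * age ** 2 + b3 * age ** 3
    logit += education_level * math.log(OR_EDUCATION)
    logit += math.log(OR_BMI[bmi_category])
    if family_history:
        logit += math.log(OR_FAMILY_HISTORY)
    if copd:
        logit += math.log(OR_COPD)
    for snp, count in minor_allele_counts.items():   # count is 0, 1, or 2
        logit += count * math.log(OR_SNP[snp])
    return 1.0 / (1.0 + math.exp(-logit))


def odds(p):
    return p / (1.0 - p)


base = six_year_risk(60, 2, "18.5-24", False, False, {})
with_history = six_year_risk(60, 2, "18.5-24", True, False, {})
# The odds ratio between the two profiles recovers the tabulated OR.
print(odds(with_history) / odds(base))
```

Comparing two profiles that differ in one factor, as in the last lines, is less sensitive to the rounding issue than reading off an absolute risk, because the intercept and age terms cancel in the odds ratio.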

Among women ages 55–70, the AUC based on TWB2 and LCPG was 0.714 [95% confidence interval (CI), 0.660–0.768]; among these women in TWB2, 3.94% (95% CI, 2.95–5.13) had risk higher than 0.0151. For women in LCPG ages 55–74, 27.03% (95% CI, 19.04–36.28) had risk higher than 0.0151 (Table 2A).

Table 2.

Number and percentage of people with 6-year risk higher than certain thresholds in TWB1, GELAC cases, TWB2, and LCPG.

                Ages 55–70 years                                          Ages 55–70 years                                          Ages 55–74 years
                TWB1                         GELAC cases                  TWB2                         LCPG                         LCPG
                1 − Specificity              Sensitivity                  1 − Specificity              Sensitivity                  Sensitivity
Risk threshold  N^a    % (95% CI^b)          N      % (95% CI)            N      % (95% CI)            N      % (95% CI)            N      % (95% CI)

(A) TNSF-SQ
≥0              2,350  100                   718    100                   1,321  100                   96     100                   111    100
≥0.0134         191    8.13 (7.05–9.31)      301    41.92 (38.28–45.63)   70     5.3 (4.15–6.65)       24     25 (16.72–34.88)      33     29.73 (21.43–39.15)
≥0.0151         145    6.17 (5.23–7.22)      259    36.07 (32.55–39.71)   52     3.94 (2.95–5.13)      22     22.92 (14.95–32.61)   30     27.03 (19.04–36.28)
≥0.02           58     2.47 (1.88–3.18)      174    24.23 (21.14–27.54)   24     1.82 (1.17–2.69)      10     10.42 (5.11–18.32)    17     15.32 (9.18–23.39)
AUC (95% CI)    0.770 (0.749–0.791)                                       0.714 (0.660–0.768)

(B) TNSF-SQNG
≥0              2,351  100                   822    100                   1,323  100                   101    100                   117    100
≥0.0134         148    6.3 (5.35–7.35)       289    35.16 (31.89–38.53)   51     3.85 (2.88–5.04)      19     18.81 (11.72–27.81)   28     23.93 (16.53–32.7)
≥0.0151         82     3.49 (2.78–4.31)      232    28.22 (25.17–31.44)   23     1.74 (1.11–2.60)      11     10.89 (5.56–18.65)    19     16.24 (10.07–24.19)
≥0.02           50     2.13 (1.58–2.79)      151    18.37 (15.78–21.19)   15     1.13 (0.64–1.86)      7      6.93 (2.83–13.76)     14     11.97 (6.70–19.26)
AUC (95% CI)    0.754 (0.734–0.775)                                       0.694 (0.637–0.751)

^a The number in TWB1 is the number of individuals in box b, Fig. 1, ages 55 to 70 years, with all the variables available for the model. The number in GELAC cases is the number of patients with lung cancer in box a, Fig. 1, ages 55 to 70 years, with all the variables available for the model. The number in TWB2 and LCPG is the number of individuals with all the variables available for the model.

^b The CIs for the proportion of the high-risk group are computed using binomial exact CI.
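An exact binomial interval of the kind named in footnote b can be sketched with the standard library alone. The version below is the Clopper–Pearson construction solved by bisection; that the authors used this particular exact method is an assumption, and the function names are hypothetical. The inputs are counts from Table 2A, so the printed intervals can be compared with the tabulated ones.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def solve_increasing(f, target, iters=60):
    """Bisection on (0, 1) for f(p) = target, with f increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_ci(k, n, level=0.95):
    """Clopper-Pearson interval for a binomial proportion k / n."""
    alpha = 1.0 - level
    # Lower limit: P(X >= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2.0)
    # Upper limit: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2.0)
    return lower, upper


for k, n in [(52, 1321), (24, 96)]:   # TWB2 at >=0.0151; LCPG at >=0.0134
    lo, hi = exact_ci(k, n)
    print(f"{k}/{n}: {100 * k / n:.2f}% ({100 * lo:.2f}-{100 * hi:.2f})")
```

Bisection is used here only to avoid a dependency on a statistics library; a beta-quantile function gives the same limits directly.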

To compare the discriminative power of the risk factors, we similarly developed five additional models ignoring each of the covariates: education, COPD, family history of lung cancer, BMI, and SNPs, starting with the SAMCCS in box e, Fig. 1. Their AUCs and sensitivities are given in Supplementary Table S4, suggesting that education had the strongest discriminative power and that although some of these models had higher AUCs, their sensitivities were lower than TNSF-SQ's.

Risk model with no genetic variants

Because of its wider applicability, the risk model without using the SNPs, termed TNSF-SQ with no genetic variants (TNSF-SQNG), deserves attention. Among women ages 55–70, the AUC was 0.694 (95% CI, 0.637–0.751); among these women in TWB2, 1.74% (95% CI, 1.11–2.60) had risks higher than 0.0151. For women in LCPG ages 55–74, 16.24% (95% CI, 10.07–24.19) had risks higher than 0.0151 (Table 2B).

Other risk models

To assess the usefulness of other risk factors, we used the same method and the same AMCCS in box d, Fig. 1 to develop the model using genetic variants and age only (TNSF-G), the model using all the risk factors common to both the Taiwan Biobank and GELAC questionnaire (TNSF), and the model using all the risk factors common to both but without SNPs (TNSF-NG). Their performance is presented in Supplementary Table S5.

Comparison of training and validation datasets

Table 2 and Supplementary Table S5D show that the percentages of people with high risks in GELAC cases were higher than those in LCPG under all the models except under TNSF-G. To better understand this phenomenon, we present in Supplementary Table S6 the distributions of risk factors in GELAC cases, LCPG, TWB1, and TWB2. It shows that differences seemed to exist between GELAC cases and LCPG for certain factors but not for 10 of the 11 SNPs; similar remarks hold for TWB1 and TWB2, where none of the 11 SNPs showed difference. The 6-year risk distributions in these four cohorts under TNSF-SQ, TNSF-SQNG, and TNSF-G are given in Supplementary Fig. S2, Supplementary Materials, suggesting that the similarity between the 6-year risk distribution in TWB1 and that in TWB2 is higher than the similarity between that in GELAC cases and that in LCPG; the similarity between that in GELAC cases and that in LCPG is higher under TNSF-G than under the other two models. The number of participants in each of these cohorts under these studies is shown in Supplementary Table S7.

To provide more detail on the risk factors, we present in Supplementary Table S8 the characteristics of the AMCCS restricted to GELAC.

Performance of TNSF-SQ

TNSF-SQ seems to be the first model in the literature based on standard risk factors to address the need to identify high-risk Asian NSFs who may benefit from LDCT lung cancer screening (26). Our performance evaluation is based on TWB2 and LCPG, which were formed chronologically later than TWB1 and GELAC, and hence is realistic. In addition to the AUC of 0.714, the percentages of NSFs in TWB2 and LCPG having risks higher than 0.0151 or 0.02 deserve attention.

Given that there are no established risk factors dominating the lung cancer development among never-smokers, the model TNSF-SQ seems to represent a major step in identifying high-risk Asian NSFs for lung cancer screening.

Table 2A suggests that about 3.94% of all the healthy NSFs ages 55–70 have TNSF-SQ risk higher than 0.0151. If we screened all these women at this moment, then among all the healthy NSFs ages 55–70 who will develop lung cancer in the subsequent 6-year period, about 23% would be among the screened. This percentage would be about 27% if we screened all those ages 55–74 with risk higher than 0.0151. To put this into perspective, we consider the observations regarding the USPSTF and NLST criteria. First, only about 36.6% of U.S. female patients with lung cancer diagnosed during 2005–2011 met the USPSTF criteria for screening, and the proportion had been decreasing (2, 10). Second, about 38% of all the smokers ages 55–74 in the United States were eligible for screening according to NLST criteria (4, 7, 9). Comparison with the USPSTF or NLST criteria for ever-smokers helps in appreciating the usefulness of TNSF-SQ in selecting NSFs for LDCT screening.

For Asian NSFs, both PLCOM2014 and TNSF-SQ used the same risk factors, except that the former also included a history of cancer and the latter included SNPs. However, the effects of these common risk factors differ. Although the effect of BMI in TNSF-SQ is in line with that in PLCOM2014 and with the literature, it is worth noting that our estimate pertains to Asian NSFs (47). That education had a larger effect in TNSF-SQ seems reasonable; our preliminary studies suggest that higher education correlates with shorter CTY, lower environmental tobacco smoking (ETS) at home, and lower incense burning in worship. Further studies are needed.

Other risk factors

TNSF-SQ1 performed slightly better than TNSF-SQ (Supplementary Table S5C). However, PM2.5 in TNSF-SQ1 varies only by district, a district may have a population in the hundreds of thousands, and actual exposure varies greatly within each district; these facts may raise concerns about the implementation of TNSF-SQ1. Hence, we consider TNSF-SQ our main model for practical reasons. In fact, PM2.5 exposure has become a topic of debate in Taiwan's newspapers. Our discussions about PM2.5*I(Age≥55) are preliminary; further studies are needed.

It is worth noting that TNSF-SQNG could be implemented on a wider scale and in lower income regions where SNP genotyping is not available. It can also be used as a preliminary tool; individuals with high TNSF-SQNG risks could be advised to obtain genotype information and calculate their TNSF-SQ risk.

A recent risk prediction model for developing lung cancer among never-smokers in Taiwan had a high AUC of 0.806 (48). Their model included maximum mid-expiratory flow, carcinoembryonic antigen, and alpha fetoprotein. With medical prescriptions, information on these covariates can be obtained through Taiwan's National Health Insurance Program. In general, information on these covariates and that on SNPs can be obtained at an individual's expense through health care providers and hence are not as readily available as that on the risk factors in TNSF-SQNG.

The findings that TNSF-SQ performed better than TNSF-SQNG and that TNSF-G was robust suggest the usefulness of GWAS-identified SNPs. Because the utility of polygenic risk prediction models depends on the training dataset cohort size, underlying genetic architecture, and other risk factors (49), we believe that larger GWAS for lung cancer in Asian NSFs might lead to more predictive models.

Uncertainty

Underestimation of uncertainty in a two-stage estimation is a general concern. One may study the information matrix analytically by considering a probability structure in the second stage of the estimation (50). In this study, we described a bootstrap approach in Supplementary Text S2B; in particular, the CIs of the age effects presented in Supplementary Fig. S1 were obtained by the bootstrap. Further studies to compare the analytic approach and various bootstrap strategies are warranted; see Supplementary Text S2B for details. Because the imputed genotype data used in our model development seemed to be of good quality, we did not consider its uncertainty in this study; see Supplementary Text S1A for more discussion.

It is desirable to assess the calibration of TNSF-SQ in a prospective cohort like Taiwan Biobank (51, 52). However, Taiwan Biobank follow-up data are limited at this moment; see Supplementary Text S1. Here are some remarks on the representativeness of our case–control studies.

In view of the post-1960 industrialization in Taiwan, the distribution of certain risk factors changed with time, as shown in Supplementary Table S6. These changes are in line with the recruitment periods of GELAC (2002–2015), LCPG (2015–2017), TWB1 (2008–2015), and TWB2 (2008–2016). They suggest that good calibration could be expected if the recruitment period for the training set and that for the validation set are comparable, and that there is a need for continuous development and evaluation of our models to take into account the change in risk factor distributions (53, 54).

Limitation

A limitation of this study is that the models were constructed from a two-stage design rather than from a prospective cohort study, and that our choices of the thresholds 0.0134, 0.0151, and 0.02 were based on prospective studies of smokers in North America. A prospective study is needed to build and evaluate the model, to decide the thresholds, and to conduct a calibration study.
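
To make the role of these thresholds concrete, the following is a minimal sketch of how a set of model-estimated 6-year risks could be summarized against each cut-off. The function names and the risk values are hypothetical and chosen only for illustration; they are not output of TNSF-SQ.

```python
# 6-year risk cut-offs named in the text.
THRESHOLDS = (0.0134, 0.0151, 0.02)


def fraction_above(risks, threshold):
    """Share of individuals whose estimated risk is strictly greater than threshold."""
    if not risks:
        raise ValueError("risks must be non-empty")
    return sum(r > threshold for r in risks) / len(risks)


def screening_summary(risks, thresholds=THRESHOLDS):
    """Map each threshold to the fraction of individuals exceeding it."""
    return {t: fraction_above(risks, t) for t in thresholds}


# Hypothetical estimated 6-year risks for eight individuals.
toy_risks = [0.004, 0.009, 0.0134, 0.014, 0.0151, 0.016, 0.021, 0.030]

summary = screening_summary(toy_risks)
for threshold, fraction in summary.items():
    print(f"risk > {threshold}: {fraction:.1%} of individuals")
```

The comparison is strict, so an individual whose risk equals a threshold exactly is not counted as exceeding it.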

Another limitation concerns information on ETS. The prevalence of ETS at home was 68% among GELAC cases and 56% among GELAC controls, a statistically significant difference (Supplementary Table S8). Compared with GELAC, Taiwan Biobank provides very limited and somewhat different information on ETS, making it hard to use in model development. GELAC provides more solid information, such as the number of cigarettes consumed daily by a participant's spouse or father and the ETS exposure periods. To make ETS useful in risk prediction models, we suggest collecting detailed information and standardizing ETS exposure measurement in future studies.

Our study may be useful for policy makers designing screening programs when government budgets are limited. It might also help Taiwanese NSFs or their doctors gauge their risk of lung cancer and decide whether they might benefit from LDCT lung cancer screening. Like GELAC in Taiwan, several cohorts in mainland China, South Korea, Japan, Hong Kong, and Singapore have jointly taken part in the GWAS of lung cancer in Asian NSFs (35, 37); these cohorts, together with their age-specific local incidence rates, could be used to build risk prediction models for the regions in which the cohorts were established. Given that these regions are similar in environment, lifestyle, and genetic architecture and that more cases and controls are available, we expect similar but more useful risk prediction models for lung cancer among Asian NSFs to appear in the near future.

K.-Y. Chen reports receiving speakers bureau honoraria from AstraZeneca, Roche, Boehringer Ingelheim, Pfizer, Novartis, Merck Sharp & Dohme, Ono Pharmaceutical, and Bristol-Myers Squibb and other remuneration (travel/accommodation/meeting expenses) from Merck Sharp & Dohme, Boehringer Ingelheim, and Pfizer. S.-K. Liang reports receiving speakers bureau honoraria from Roche, AstraZeneca, Pfizer, Merck Sharp & Dohme, Novartis, and Boehringer Ingelheim. No potential conflicts of interest were disclosed by the other authors.

The funding source had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the article; or decision to submit the article for publication.

Conception and design: L.-H. Chien, C.-L. Wang, P.-C. Yang, C.-J. Chen, I-S. Chang, C.A. Hsiung

Development of methodology: L.-H. Chien, C.-H. Chen, T.-Y. Chen, C.-J. Chen, I-S. Chang, C.A. Hsiung

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): G.-C. Chang, Y.-H. Tsai, K.-Y. Chen, W.-C. Su, M.-S. Huang, Y.-M. Chen, C.-Y. Chen, S.-K. Liang, C.-Y. Chen, C.-L. Wang, J.-W. Hu, S.J. Chanock, N. Rothman, Q. Lan, P.-C. Yang, C.-J. Chen, C.A. Hsiung

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): L.-H. Chien, C.-H. Chen, T.-Y. Chen, W.-C. Wang, Y.-M. Chen, S.-K. Liang, R.-H. Chung, F.-Y. Tsai, N. Chatterjee, S.J. Chanock, N. Rothman, Q. Lan, P.-C. Yang, C.-J. Chen, I-S. Chang, C.A. Hsiung

Writing, review, and/or revision of the manuscript: L.-H. Chien, T.-Y. Chen, W.-C. Wang, Y.-M. Chen, M.-H. Lee, H.A. Katki, N. Chatterjee, S.J. Chanock, N. Rothman, Q. Lan, C.-J. Chen, I-S. Chang, C.A. Hsiung

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C.-F. Hsiao, Y.-M. Chen, P.-C. Yang, C.A. Hsiung

Study supervision: I-S. Chang, C.A. Hsiung

The authors thank Dr. Christine D. Berg for her valuable comments on an earlier version of this article. The authors also thank Ms. Hsiao-Han Hung, Wan-Shan Hsieh, and Hsin-Fang Jiang for technical assistance. This study was supported by grants from the Ministry of Health and Welfare (DOH100-TD-PB-111-TM013 to C.A. Hsiung, DOH101-TD-PB-111-TM015 to C.A. Hsiung, DOH102-TD-PB-111-TM024 to C.A. Hsiung, MOHW103-TDU-PB-211-144003 to C.A. Hsiung, and MOHW105-TDU-B-212-134013 to I-S. Chang) and the Ministry of Science and Technology (MOST 103-2325-B-400-023 to C.A. Hsiung, MOST 104-2325-B-400-012 to C.A. Hsiung and I-S. Chang, MOST 105-2325-B-400-010 to C.A. Hsiung and I-S. Chang, MOST 106-2319-B400-001 to C.A. Hsiung and I-S. Chang, and MOST 107-2319-B-400-001 to C.A. Hsiung and I-S. Chang).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. National Lung Screening Trial Research Team, Aberle DR, Adams AM, Berg CD, Black WC, Clapp JD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365:395–409.
2. Moyer VA. Screening for lung cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med 2014;160:330–8.
3. Raji OY, Duffy SW, Agbaje OF, Baker SG, Christiani DC, Cassidy A, et al. Predictive accuracy of the Liverpool Lung Project risk model for stratifying patients for computed tomography screening for lung cancer: a case-control and cohort validation study. Ann Intern Med 2012;157:242–50.
4. Tammemagi MC, Katki HA, Hocking WG, Church TR, Caporaso N, Kvale PA, et al. Selection criteria for lung-cancer screening. N Engl J Med 2013;368:728–36.
5. Katki HA, Kovalchik SA, Berg CD, Cheung LC, Chaturvedi AK. Development and validation of risk models to select ever-smokers for CT lung cancer screening. JAMA 2016;315:2300–11.
6. Katki HA, Kovalchik SA, Petito LC, Cheung LC, Jacobs E, Jemal A, et al. Implications of nine risk prediction models for selecting ever-smokers for computed tomography lung cancer screening. Ann Intern Med 2018;169:10–9.
7. Tammemagi MC, Church TR, Hocking WG, Silvestri GA, Kvale PA, Riley TL, et al. Evaluation of the lung cancer risks at which to screen ever- and never-smokers: screening rules applied to the PLCO and NLST cohorts. PLoS Med 2014;11:e1001764.
8. Tammemagi MC, Schmidt H, Martel S, McWilliams A, Goffin JR, Johnston MR, et al. Participant selection for lung cancer screening by risk modelling (the Pan-Canadian Early Detection of Lung Cancer [PanCan] study): a single-arm, prospective study. Lancet Oncol 2017;18:1523–31.
9. Tammemagi MC. Selecting lung cancer screenees using risk prediction models-where do we go from here. Transl Lung Cancer Res 2018;7:243–53.
10. Wang Y, Midthun DE, Wampfler JA, Deng B, Stoddard SM, Zhang S, et al. Trends in the proportion of patients with lung cancer meeting screening criteria. JAMA 2015;313:853–5.
11. De Koning H, Van Der Aalst C, Ten Haaf K, Oudkerk M. Effects of volume CT lung cancer screening: mortality results of the NELSON randomised-controlled population based trial. J Thorac Oncol 2018:S185.
12. Kang HR, Cho JY, Lee SH, Lee YJ, Park JS, Cho YJ, et al. Role of low-dose computerized tomography in lung cancer screening among never-smokers. J Thorac Oncol 2019;14:436–44.
13. Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer 2010;127:2893–917.
14. Parkin DM, Bray F, Ferlay J, Pisani P. Global cancer statistics, 2002. CA Cancer J Clin 2005;55:74–108.
15. Sun S, Schiller JH, Gazdar AF. Lung cancer in never smokers–a different disease. Nat Rev Cancer 2007;7:778–90.
16. Yano T, Miura N, Takenaka T, Haro A, Okazaki H, Ohba T, et al. Never-smoking nonsmall cell lung cancer as a separate entity: clinicopathologic features and survival. Cancer 2008;113:1012–8.
17. Thun MJ, Hannan LM, Adams-Campbell LL, Boffetta P, Buring JE, Feskanich D, et al. Lung cancer occurrence in never-smokers: an analysis of 13 cohorts and 22 cancer registry studies. PLoS Med 2008;5:e185.
18. Couraud S, Zalcman G, Milleron B, Morin F, Souquet PJ. Lung cancer in never smokers–a review. Eur J Cancer 2012;48:1299–311.
19. Tseng CH, Tsuang BJ, Chiang CJ, Ku KC, Tseng JS, Yang TY, et al. The relationship between air pollution and lung cancer in nonsmokers in Taiwan. J Thorac Oncol 2019;14:784–92.
20. Kondo R, Yoshida K, Kawakami S, Shiina T, Kurai M, Takasuna K, et al. Efficacy of CT screening for lung cancer in never-smokers: analysis of Japanese cases detected using a low-dose CT screen. Lung Cancer 2011;74:426–32.
21. Wu FZ, Huang YL, Wu CC, Tang EK, Chen CS, Mar GY, et al. Assessment of selection criteria for low-dose lung screening CT among Asian ethnic groups in Taiwan: from mass screening to specific risk-based screening for non-smoker lung cancer. Clin Lung Cancer 2016;17:e45–e56.
22. Luo X, Zheng S, Liu Q, Wang S, Li Y, Shen L, et al. Should nonsmokers be excluded from early lung cancer screening with low-dose spiral computed tomography? Community-based practice in Shanghai. Transl Oncol 2017;10:485–90.
23. Kim HY, Jung KW, Lim KY, Lee SH, Jun JK, Kim J, et al. Lung cancer screening with low-dose CT in female never smokers: retrospective cohort study with long-term national data follow-up. Cancer Res Treat 2018;50:748–56.
24. Yang PC. Taiwan lung cancer screening program for never-smokers. Respirology 2018;32:69.
25. Yang PC. National lung screening program in Taiwan. J Thorac Oncol 2018;13:S274–5.
26. Lam S. Lung cancer screening in never-smokers. J Thorac Oncol 2019;14:336–7.
27. Toh CK, Gao F, Lim WT, Leong SS, Fong KW, Yap SP, et al. Never-smokers with lung cancer: epidemiologic evidence of a distinct disease entity. J Clin Oncol 2006;24:2245–51.
28. Subramanian J, Govindan R. Lung cancer in never smokers: a review. J Clin Oncol 2007;25:561–70.
29. Pope CA III, Burnett RT, Thun MJ, Calle EE, Krewski D, Ito K, et al. Lung cancer, cardiopulmonary mortality, and long-term exposure to fine particulate air pollution. JAMA 2002;287:1132–41.
30. Turner MC, Krewski D, Pope CA III, Chen Y, Gapstur SM, Thun MJ. Long-term ambient fine particulate matter air pollution and lung cancer in a large cohort of never-smokers. Am J Respir Crit Care Med 2011;184:1374–81.
31. Sisti J, Boffetta P. What proportion of lung cancer in never-smokers can be attributed to known risk factors? Int J Cancer 2012;131:265–75.
32. ten Haaf K, de Koning HJ. Should never-smokers at increased risk for lung cancer be screened? J Thorac Oncol 2015;10:1285–91.
33. Hsiung CA, Lan Q, Hong YC, Chen CJ, Hosgood HD, Chang IS, et al. The 5p15.33 locus is associated with risk of lung adenocarcinoma in never-smoking females in Asia. PLoS Genet 2010;6. doi: 10.1371/journal.pgen.1001051.
34. Hosgood HD III, Wang WC, Hong YC, Wang JC, Chen K, Chang IS, et al. Genetic variant in TP63 on locus 3q28 is associated with risk of lung adenocarcinoma among never-smoking females in Asia. Hum Genet 2012;131:1197–203.
35. Lan Q, Hsiung CA, Matsuo K, Hong YC, Seow A, Wang Z, et al. Genome-wide association analysis identifies new lung cancer susceptibility loci in never-smoking women in Asia. Nat Genet 2012;44:1330–5.
36. Wang Z, Seow WJ, Shiraishi K, Hsiung CA, Matsuo K, Liu J, et al. Meta-analysis of genome-wide association studies identifies multiple lung cancer susceptibility loci in never-smoking Asian women. Hum Mol Genet 2016;25:620–9.
37. Seow WJ, Matsuo K, Hsiung CA, Shiraishi K, Song M, Kim HN, et al. Association between GWAS-identified lung adenocarcinoma susceptibility loci and EGFR mutations in never-smoking Asian women, and comparison with findings from Western populations. Hum Mol Genet 2017;26:454–65.
38. Lo YL, Hsiao CF, Chang GC, Tsai YH, Huang MS, Su WC, et al. Risk factors for primary lung cancer among never smokers by gender in a matched case-control study. Cancer Causes Control 2013;24:567–76.
39. Chang IS, Jiang SS, Yang JC, Su WC, Chien LH, Hsiao CF, et al. Genetic modifiers of progression-free survival in never-smoking lung adenocarcinoma patients treated with first-line tyrosine kinase inhibitors. Am J Respir Crit Care Med 2017;195:663–73.
40. Raaschou-Nielsen O, Andersen ZJ, Beelen R, Samoli E, Stafoggia M, Weinmayr G, et al. Air pollution and lung cancer incidence in 17 European cohorts: prospective analyses from the European Study of Cohorts for air pollution effects (ESCAPE). Lancet Oncol 2013;14:813–22.
41. Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 1989;81:1879–86.
42. Chen J, Pee D, Ayyagari R, Graubard B, Schairer C, Byrne C, et al. Projecting absolute invasive breast cancer risk in white women with a model that includes mammographic density. J Natl Cancer Inst 2006;98:1215–26.
43. Cassidy A, Myles JP, van Tongeren M, Page RD, Liloglou T, Duffy SW, et al. The LLP risk model: an individual risk prediction model for lung cancer. Br J Cancer 2008;98:270–6.
44. Gail MH, Lubin JH, Rubinstein LV. Likelihood calculations for matched case-control studies and survival studies with tied death times. Biometrika 1980;68:703–7.
45. Chien LC, Wu YJ, Hsiung CA, Wang LH, Chang IS. Smoothed lexis diagrams with applications to lung and breast cancer trends in Taiwan. J Am Stat Assoc 2015;110:1000–12.
46. Pepe MS, Gu JW, Morris DE. The potential of genes and other markers to inform about risk. Cancer Epidemiol Biomarkers Prev 2010;19:655–65.
47. Zhu H, Zhang S. Body mass index and lung cancer risk in never smokers: a meta-analysis. BMC Cancer 2018;18:635.
48. Wu X, Wen CP, Ye Y, Tsai M, Wen C, Roth JA, et al. Personalized risk assessment in never, light, and heavy smokers in a prospective cohort in Taiwan. Sci Rep 2016;6:36482.
49. Chatterjee N, Wheeler B, Sampson J, Hartge P, Chanock SJ, Park JH. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat Genet 2013;45:400–5.
50. Benichou J, Gail MH. Methods of inference for estimates of absolute risk derived from population-based case-control studies. Biometrics 1995;51:182–94.
51. Tammemagi MC. Application of risk prediction models to lung cancer screening: a review. J Thorac Imaging 2015;30:88–100.
52. Muller DC, Johansson M, Brennan P. Lung cancer risk prediction model incorporating lung function: development and validation in the UK Biobank prospective cohort study. J Clin Oncol 2017;35:861–9.
53. Chatterjee N, Shi J, Garcia-Closas M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat Rev Genet 2016;17:392–406.
54. Maas P, Barrdahl M, Joshi AD, Auer PL, Gaudet MM, Milne RL, et al. Breast cancer risk from modifiable and nonmodifiable risk factors among white women in the United States. JAMA Oncol 2016;2:1295–302.

Supplementary data