Background:

Polygenic risk scores (PRS), which summarize individuals’ genetic risk profiles, may enhance targeted colorectal cancer screening. A critical step towards clinical implementation is rigorous external validation in large community-based cohorts. This study externally validated a PRS-enhanced colorectal cancer risk model comprising 140 known colorectal cancer loci to provide a comprehensive assessment of prediction performance.

Methods:

The model was developed using 20,338 individuals and externally validated in a community-based cohort (n = 85,221). We validated predicted 5-year absolute colorectal cancer risk, including calibration using expected-to-observed case ratios (E/O) and calibration plots, and discriminatory accuracy using time-dependent AUC. The PRS-related improvements in AUC, sensitivity, and specificity were assessed in individuals aged 45 to 74 years (screening-eligible age group) and in those aged 40 to 49 years with no endoscopy history (younger-age group).

Results:

In European-ancestral individuals, the predicted 5-year risk calibrated well [E/O = 1.01; 95% confidence interval (CI), 0.91–1.13] and had high discriminatory accuracy (AUC = 0.73; 95% CI, 0.71–0.76). Adding the PRS to a model with age, sex, family and endoscopy history improved the 5-year AUC by 0.06 (P < 0.001) and 0.14 (P = 0.05) in the screening-eligible age and younger-age groups, respectively. Using a risk-threshold of 5-year SEER colorectal cancer incidence rate at age 50 years, adding the PRS had a similar sensitivity but improved the specificity by 11% (P < 0.001) in the screening-eligible age group. In the younger-age group it improved the sensitivity by 27% (P = 0.04) with similar specificity.

Conclusions:

The proposed PRS-enhanced model provides a well-calibrated 5-year colorectal cancer risk prediction and improves discriminatory accuracy in the external cohort.

Impact:

The proposed model has potential utility in risk-stratified colorectal cancer prevention.

This article is featured in Highlights of This Issue, p. 287

Colorectal cancer is among the leading causes of cancer death (1). Despite decreasing colorectal cancer incidence overall, the incidence rate in individuals aged < 50 years has been increasing over the last decades (2), leading to a recent recommendation by the US Preventive Services Task Force (USPSTF) to lower the age at screening initiation to 45 years for individuals at average risk (3). However, given the enormous burden of nearly 22 million additional people becoming eligible for screening and that colorectal cancer remains a rare event in younger individuals, targeted screening based on an individual's risk factors has received much attention and may be an appealing alternative to a universal change of the screening age (4–10). Colorectal cancer risk prediction models summarize individuals’ colorectal cancer risk based on their risk profile and quantitatively position them on the risk spectrum. There is a growing interest in developing accurate and precise colorectal cancer risk prediction models to achieve risk stratification and targeted screening.

Polygenic risk scores (PRS), which uniquely summarize individuals’ genetic risk profile, have shown promising potential in colorectal cancer risk prediction (11–20). A PRS-enhanced colorectal cancer risk model is a model that adds genetic information via a PRS to other known colorectal cancer risk predictors, such as age and family history. Prior studies investigated the impact of PRS (comprising subsets of currently known colorectal cancer loci) on colorectal cancer risk (11, 12, 14) and assessed the contribution of these PRS to model discrimination, which is the ability to assign higher risk scores to cases than non-cases (11–13, 15–17, 19, 20). However, due to the lack of external cohorts with sufficient events and available genetic information, earlier validation studies relied on either internal validation in a reserved subgroup or cross-validation, or external validation using UK Biobank (16, 17, 20), which has been included in recent genome-wide association study (GWAS) discoveries (21). Therefore, existing data are not adequate for externally validating a model incorporating the colorectal cancer loci discovered in these recent GWAS. Despite supportive findings from some of these studies, more real-world evidence on the validity of PRS-enhanced colorectal cancer risk models in large and diverse independent cohorts, particularly those reflecting the sociodemographic diversity of the census population, is warranted before the inclusion of PRS in colorectal cancer risk stratification can be considered at a population scale. In addition, limited consideration has been given to risk calibration, the closeness of the predicted and the observed risks. Without this crucial step in validation, it remains unclear if PRS-enhanced models reliably predict colorectal cancer risks observed in practice.

The Genetic Epidemiology Research on Adult Health and Aging cohort (GERA) established at Kaiser Permanente Northern California (KPNC) provides an opportunity, often rare in the era of large GWAS consortia, to externally validate PRS-enhanced colorectal cancer risk models as it has not been used in any colorectal cancer GWAS discoveries. KPNC, one of the large integrated health care systems serving 30% to 40% of the general population in northern California including Medicare and Medicaid patients, has a member cohort broadly representative of the regional census population's demographics (22). The GERA, with its large sample size, community-based and sociodemographically diverse population, and detailed clinical and genetic information, is uniquely positioned for real-world community-based validation of PRS-enhanced colorectal cancer risk models.

In this study, we extended a PRS-enhanced model (15) using individual-level demographic, clinical and genetic information, and externally validated the proposed model in the GERA, including a comprehensive assessment of absolute risk calibration and discriminatory accuracy. Our model includes an updated PRS comprising 140 colorectal cancer risk loci [including 77 additional loci from recent GWAS (21, 23, 24) compared with the PRS validated in Jeon and colleagues 2018 (15)] in addition to age, sex, first-degree family history of colorectal cancer (hereafter termed as family history) and endoscopy (including colonoscopy and sigmoidoscopy) history. We conducted a time-dependent validation under a time-to-event framework accounting for competing risk of mortality, which is important as the onset of colorectal cancer occurs generally later in a person's life. In addition, we evaluated the gain in prediction accuracy of including PRS in colorectal cancer risk prediction models in screening-eligible individuals with age 45 to 74 years, reflecting a recent USPSTF recommendation on colorectal cancer screening age (3), and individuals with age 40 to 49 years and no endoscopy history as the latter group may benefit from a risk-stratified screening-initiation strategy using the PRS-enhanced model.

Study population: model building

We extended the risk prediction model using a subset of Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) and Colorectal Cancer Transdisciplinary Study (CORECT; 9,748 colorectal cancer cases and 10,590 controls of European ancestry) with detailed harmonized data on age, family history, endoscopy history, and genetic risk factors [Supplementary Materials in Jeon and colleagues (15) and Supplementary Table S1] along with colorectal cancer incidence rates in SEER data (https://seer.cancer.gov/, registry18, 2007–2015, RRID:SCR_006902; ref. 25).

Study population: model validation

External validation was carried out in the GERA (Supplementary Methods and Supplementary Table S2). Briefly, the cohort comprises study participants (survey respondents) from the Research Program on Genes, Environment, and Health (RPGEH) and California Men's Health Study (CMHS), both nested within the member population of KPNC (26, 27). In total, 110,266 consenting participants who provided saliva samples were selected for genotyping. All racial and ethnic minority participants who provided saliva samples (n = 20,935, 19%) were selected for genotyping to maximize diversity and a random subset was selected from the approximately 140,000 available non-Hispanic White participants with biospecimens. A total of 94,938 participants passed genotype quality control and had a valid PRS. The demographic characteristics of the GERA are generally representative of the RPGEH and CMHS survey respondents; compared with the KPNC member population, the GERA participants are on average older (average age at time of sample collection: 63 years), have a slightly greater proportion of non-Hispanic Whites, and have higher levels of education and income. In the validation dataset, prevalent colorectal cancer cases were excluded from the analyses (n = 865). Participants who entered the cohort at ages < 40 years (n = 6,622) and ≥ 85 years (n = 2,230) were excluded because few colorectal cancer cases were diagnosed at ages < 40 years and aggregated SEER colorectal cancer rates for ages ≥ 85 years hindered a reliable estimation of colorectal cancer risk in this age group. The number of participants after exclusion was 85,221. Due to the relatively small number of colorectal cancer cases in non-European ancestry groups, we restricted our primary analysis to European-ancestry participants and secondarily evaluated the calibration of relative risk (RR) of PRS in other racial and ethnic groups. A comparison of risk factors between the GERA and GECCO/CORECT is shown in Supplementary Table S3.

All study participants provided written informed consent and the study was approved by the KPNC and Fred Hutchinson Cancer Research Center Institutional Review Board (IRB).

Developing risk prediction models in GECCO and CORECT

We estimated sex-specific ORs of PRS, family history and endoscopy history associated with colorectal cancer risk using logistic regression models, adjusting for study and age, using pooled GECCO and CORECT data. The PRS was calculated as a weighted sum of numbers of the effective alleles of the 140 known colorectal cancer loci (Supplementary Table S4 and S4a) with the marginal log ORs estimated from > 125,000 samples, predominantly of European ancestry, as weights (21). Baseline colorectal cancer hazard rates were derived using colorectal cancer incidence rates in SEER18 (2007–2015). The absolute risk estimates for colorectal cancer were calculated on the basis of estimated sex-specific baseline colorectal cancer hazard rates and ORs (Supplementary Table S5), while accounting for competing risks from death (Supplementary Methods; refs. 25, 28).
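
As an illustration of the weighted sum described above, the following is a minimal Python sketch, not the authors' code. The variant identifiers, dosages, and log OR weights are hypothetical placeholders; the actual 140 loci and weights are those listed in Supplementary Table S4.

```python
from typing import Mapping


def polygenic_risk_score(
    dosages: Mapping[str, float],
    log_odds_ratios: Mapping[str, float],
) -> float:
    """Return the weighted sum of effect-allele counts.

    dosages: effect-allele count (0 to 2, possibly imputed) per variant.
    log_odds_ratios: marginal log OR per variant, used as the weight.
    """
    missing = set(log_odds_ratios) - set(dosages)
    if missing:
        raise KeyError(f"missing dosages for: {sorted(missing)}")
    return sum(log_odds_ratios[v] * dosages[v] for v in log_odds_ratios)


# Hypothetical three-variant example (illustrative values only).
weights = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}
person = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
score = polygenic_risk_score(person, weights)
```

In this toy example the score is 2 × 0.10 + 1 × (−0.05) + 0 × 0.20, that is, approximately 0.15 on the log OR scale.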

Projecting risk in the GERA

The 5- and 10-year absolute colorectal cancer risks for each GERA participant were estimated from the model based on age, sex, family history, endoscopy history, and PRS of the 140 known loci (Supplementary Methods). To account for the diminishing protective impact of endoscopy over time, we classified patients who had received their last endoscopy greater than 10 years ago as no endoscopy. We chose 10 years based on the current recommended screening interval following a colonoscopy.
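
The endoscopy recoding rule can be written as a small Python helper. This is a sketch under the assumption that the input is the number of years since the last endoscopy (or `None` if the participant never had one); the function and constant names are hypothetical.

```python
from typing import Optional

# From the text: an endoscopy more than 10 years ago is treated as no endoscopy.
ENDOSCOPY_WINDOW_YEARS = 10.0


def has_recent_endoscopy(years_since_last_endoscopy: Optional[float]) -> bool:
    """Return True only if the last endoscopy was within the 10-year window."""
    if years_since_last_endoscopy is None:
        return False  # never had an endoscopy
    return years_since_last_endoscopy <= ENDOSCOPY_WINDOW_YEARS
```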

Statistical analysis

We primarily focused on the predicted 5-year risk and secondarily on the 10-year risk given that both family history and endoscopy history in our model varied over time, and risk prediction every 5 years was likely more accurate. All analyses were conducted under a survival analysis framework to account for varying follow-up duration across participants, with the observed time-to-event defined as the time from study entry to the earliest of the following events: colorectal cancer diagnosis, death, last follow-up, or 6 months after the first colonoscopy post study entry (Supplementary Methods).

Ethics declaration

The study was approved by the Fred Hutchinson Cancer Research Center and KPNC IRB. All study participants provided written informed consent as required by the IRBs. The data was de-identified.

Calibration (29)

We compared model-based expected colorectal cancer cases (E) with observed colorectal cancer cases in the GERA (O) within t years (t = 5 and 10) following study entry, where E was the sum of individuals’ absolute risk predictions at either the observed time-to-event or t-year, whichever comes first. The 95% confidence interval (CI) was calculated by using a normal approximation to the Poisson distribution as E/O × exp(±1.96 × √(1/O)). We assessed t-year E/O ratios overall and in clinically meaningful subgroups defined by age (age 40–49, 50–59, and ≥ 60), sex (men, women), endoscopy history within 10 years prior to study entry (yes, no), and family history (yes, no). We further evaluated the consistency between E and O across 10 risk-based deciles based on the predicted 5- and 10-year colorectal cancer risks using calibration plots (Supplementary Methods). O in the jth decile for the t-year risk was estimated by 1-Sj(t), where Sj(t) is the Kaplan–Meier estimator for colorectal cancer–free probability accounting for right-censoring and competing risk (30, 31). As a secondary analysis, we evaluated how the RR of PRS calibrates in African American, Asian, European-ancestry, and Latinx groups by comparing the predicted and observed RR in 7 PRS-based strata in individual ancestral groups (Supplementary Methods). The stratum that included the sample median of the PRS was set as the reference group in the RR calculation. The observed RR of a PRS stratum was estimated by fitting a Cox model with a 0–1 stratum indicator as the covariate using individuals included in this specific stratum and the reference stratum.
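
The E/O confidence interval formula can be checked with a short Python sketch. The function name is hypothetical; the inputs in the example are the overall 5-year expected and observed counts reported in Table 1.

```python
import math
from typing import Tuple


def expected_observed_ratio(
    expected: float, observed: int, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (E/O, lower, upper) using E/O * exp(+/- z * sqrt(1/O))."""
    if observed <= 0:
        raise ValueError("observed case count must be positive")
    ratio = expected / observed
    factor = math.exp(z * math.sqrt(1.0 / observed))
    return ratio, ratio / factor, ratio * factor


# Overall 5-year counts from Table 1: E = 313.56, O = 309.
ratio, lower, upper = expected_observed_ratio(313.56, 309)
print(f"{ratio:.2f} ({lower:.2f}-{upper:.2f})")  # prints "1.01 (0.91-1.13)"
```

The printed interval agrees with the overall 5-year E/O ratio reported in Table 1.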

Discriminatory accuracy

We evaluated the discriminatory accuracy of 5- and 10-year risk predictions by time-dependent AUC, accommodating time-varying colorectal cancer status and predictors during follow-up. We accounted for competing risks of death (32) because 8.2% of participants experienced death without prior colorectal cancer diagnosis.

We compared the AUC of the proposed PRS-enhanced model with two reduced models that excluded PRS, Model 1 (age and first-degree family history; Supplementary Methods) as this is currently used to inform screening and Model 2 (age, first-degree family history, sex, and endoscopy history; Supplementary Methods) in all European-ancestry individuals and those aged 45 to 74 years eligible for screening (termed screening-eligible group hereafter). We also compared the AUC in individuals aged 40 to 49 years without endoscopy history (termed younger-age group hereafter) to explore whether the proposed PRS-enhanced model can better inform strategic screening initiation. In addition, we compared the 5-year risk prediction of these models in terms of two clinically relevant measures, sensitivity and specificity of identifying a high 5-year risk group with a risk threshold of 0.29%, the SEER 5-year colorectal cancer incidence rate at age 50 when colorectal cancer screening initiation was recommended conventionally for people at average colorectal cancer risk in the US.
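
To make the threshold-based measures concrete, the sketch below dichotomizes predicted 5-year risks at 0.29% and computes sensitivity and specificity. It is a deliberately simplified illustration: it treats case status as fully observed and ignores the censoring and competing-risk adjustments used in the actual analysis. The function name and the toy data are hypothetical.

```python
from typing import Sequence, Tuple

RISK_THRESHOLD = 0.0029  # 0.29%, the 5-year SEER incidence at age 50 (from the text)


def sensitivity_specificity(
    predicted_risks: Sequence[float],
    is_case: Sequence[bool],
    threshold: float = RISK_THRESHOLD,
) -> Tuple[float, float]:
    """Classify risk > threshold as high risk; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for risk, case in zip(predicted_risks, is_case):
        high_risk = risk > threshold
        if case and high_risk:
            tp += 1
        elif case:
            fn += 1
        elif high_risk:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (illustrative only): two cases and four non-cases.
risks = [0.0010, 0.0035, 0.0050, 0.0020, 0.0029, 0.0040]
cases = [False, True, False, False, True, False]
sens, spec = sensitivity_specificity(risks, cases)  # (0.5, 0.5)
```

Note that a risk exactly equal to the threshold is classified as low risk, matching the high (>0.29%) versus low (≤0.29%) dichotomy used in Table 4.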

We obtained the 95% CIs and P values by bootstrapping with 500 resamples. A two-sided P ≤ 0.05 was considered statistically significant. All analyses were performed using R 3.6 (33) and plots were generated using the R package ‘ggplot2’ (34).
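
A generic percentile bootstrap with 500 resamples can be sketched in Python as follows. This is not the authors' R code, and the text does not state which bootstrap interval or test was used, so the percentile method here is an assumption. The function names and the paired toy data (high-risk flags from two models among non-cases) are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bootstrap_percentile_ci(
    data: Sequence[T],
    statistic: Callable[[List[T]], float],
    n_resamples: int = 500,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample participants with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_index], estimates[upper_index]


def specificity_difference(records: List[Tuple[bool, bool]]) -> float:
    """records: (high risk by model A, high risk by model B) for non-cases."""
    n = len(records)
    spec_a = sum(1 for high_a, _ in records if not high_a) / n
    spec_b = sum(1 for _, high_b in records if not high_b) / n
    return spec_a - spec_b


# Toy paired data: model A flags no one, model B flags 60 of 200 non-cases.
records = [(False, True)] * 60 + [(False, False)] * 140
point_estimate = specificity_difference(records)  # about 0.30
ci_low, ci_high = bootstrap_percentile_ci(records, specificity_difference)
```

Resampling whole records keeps the two models' classifications paired within each participant, which is what allows the interval to describe the difference between models rather than each model separately.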

Data availability

Access to GERA data used in this study may be obtained by application to the Kaiser Permanente Research Bank (KPRB) via [email protected]. A subset of the GERA consented for public use can be found at NIH/dbGaP: phs000674. Genotype data from CORECT and GECCO are deposited at NIH/dbGaP: phs001415.v1.p1, phs001315.v1.p1, phs001078.v1.p1, phs001499.v1.p1, phs001903.v1.p1, phs001856.v1.p1, phs001045.v1.p1, and phs001499.v1.p1.

Our primary analysis was restricted to GERA participants of European ancestry (n = 66,282, 78%). Among them, 57.7% were women, 57.5% underwent endoscopy before study entry, and 9.9% had a positive family history. The mean age at study entry was 63.2 years (Supplementary Table S2). 30% of men were enrolled in 2002–2003, earlier than the rest of the GERA. However, no appreciable difference in colorectal cancer rate was observed between this subgroup and the rest of the GERA. Overall, men and women were comparable in age, family history, endoscopy history, and PRS (Supplementary Table S2). The sex-specific OR estimates of these risk factors are provided in Supplementary Table S5.

Calibration

The PRS-enhanced model accurately predicted the number of colorectal cancer cases in a 5-year window. The overall E/O ratio for 5-year risk was 1.01 (95% CI, 0.91–1.13); the subgroup E/O ratios were close to 1 with 95% CIs including 1 (Table 1). The calibration for 10-year risk performed generally well (overall E/O ratio 1.00; 95% CI, 0.92–1.09; Supplementary Table S6), except for an underestimation in individuals with a positive family history (E/O ratio 0.77; 95% CI, 0.61–0.98), and an overestimation in the age subgroup of 50 to 59 years (E/O ratio 1.42; 95% CI, 1.11–1.83). The risk calibration stratified by the risk-based deciles (Fig. 1, left; Supplementary Figures S1–S5) demonstrated the same patterns of good calibration.

Table 1.

5-year E/O colorectal cancer case ratio in the GERA European-ancestry participants.

| Stratum | E | O | E/O ratio (95% CI) |
|---|---|---|---|
| Overall | 313.56 | 309 | 1.01 (0.91–1.13) |
| Age at study entry, years | | | |
| 40–49 | 13.96 | 11 | 1.27 (0.70–2.29) |
| 50–59 | 48.40 | 37 | 1.31 (0.95–1.81) |
| ≥60 | 251.19 | 261 | 0.96 (0.85–1.09) |
| First-degree colorectal cancer family historyᵃ | | | |
| Negative | 279.24 | 270 | 1.03 (0.92–1.17) |
| Positive | 34.32 | 39 | 0.88 (0.64–1.20) |
| Sex | | | |
| Women | 160.66 | 173 | 0.93 (0.80–1.08) |
| Men | 152.90 | 136 | 1.12 (0.95–1.33) |
| Endoscopy historyᵇ | | | |
| No | 136.55 | 139 | 0.98 (0.83–1.16) |
| Yes | 177.01 | 170 | 1.04 (0.90–1.21) |

Note: We present the comparison of expected (E) and observed (O) colorectal cancer cases based on the PRS-enhanced model (including age, first-degree colorectal cancer family history, sex, endoscopy history and PRS) over 5 years of follow-up via the E/O ratio and the corresponding 95% CIs in GERA participants of European ancestry. The number of expected cases was calculated on the basis of age, first-degree family history, sex, endoscopy history and PRS; the number of observed cases was based on diagnosed cases in the cohort.

ᵃ First-degree family history of colorectal cancer, ascertained from study survey questionnaires and electronic health records.

ᵇ Endoscopy history in the 10 years prior to study entry.

Figure 1.

Calibration plots of 5-year absolute risk and RR of PRS with risk-based and PRS-based stratification. We demonstrated the calibration of 5-year absolute risk based on the PRS-enhanced model [1(A), left] and RR of PRS [1(B), right] stratified by deciles of predicted risk (left) and PRS (right) in GERA participants of European-ancestry. The x-axis is the average of predicted absolute (left) and relative (right) risk values by deciles. The y-axis is the average observed absolute (left) and relative (right) risks by deciles of predicted absolute risk (left) and PRS (right), along with their 95% CIs indicated by the vertical error bars.

The model-based RR of PRS calibrated well across the PRS range in the European-ancestry group (Fig. 1, right). For the non-European ancestry groups, the model-based RR of PRS were generally lower than the observed RR (Supplementary Fig. S6). However, the 95% CIs of the observed RR were wide due to the limited non-European ancestry colorectal cancer cases and covered the model-based RR.

Discriminatory accuracy

The overall 5-year AUC of the PRS-enhanced risk model was 0.73 (95% CI, 0.71–0.76; Table 2). The AUC did not vary by sex or endoscopy history. The AUC among individuals with a positive family history was 0.78 (95% CI, 0.70–0.86), 5% (95% CI, –3% to 13%) higher than the AUC in those without (0.73; 95% CI, 0.70–0.75). The AUC was 0.77 (95% CI, 0.70–0.84) in individuals aged 40 to 49 years and 0.72 (95% CI, 0.67–0.77) in 50 to 59 years, 10% (95% CI, 2%–17%) and 5% (95% CI, −1% to 11%) higher than in those aged 60 years or older (0.67; 95% CI, 0.64–0.71), respectively. The 10-year AUCs generally followed a similar pattern (Supplementary Table S7).

Table 2.

5-year time-dependent AUC estimates of the PRS-enhanced model in the GERA European-ancestry participants.

| Stratum | AUC (95% CI) | Difference to the lowest AUC in each stratification (95% CI) |
|---|---|---|
| Overall | 0.73 (0.71–0.76) | Not available |
| Age at study entry, years | | |
| 40–49 | 0.77 (0.70–0.84) | 10% (2%–17%) |
| 50–59 | 0.72 (0.67–0.77) | 5% (–1% to 11%) |
| ≥60 | 0.67 (0.64–0.71) | Reference |
| First-degree colorectal cancer family history | | |
| Negative | 0.73 (0.70–0.75) | Reference |
| Positive | 0.78 (0.70–0.86) | 5% (–3% to 13%) |
| Sex | | |
| Women | 0.73 (0.70–0.76) | Reference |
| Men | 0.74 (0.70–0.79) | 1% (–4% to 6%) |
| Endoscopy history | | |
| No | 0.73 (0.69–0.77) | Reference |
| Yes | 0.73 (0.70–0.77) | 0% (–5% to 5%) |

Note: The time-dependent AUC estimates (95% CI) of predicted 5-year absolute risk are based on the PRS-enhanced model including age, first-degree colorectal cancer family history, sex, endoscopy history and PRS.

The 5-year AUC of the PRS-enhanced model in the screening-eligible group was 0.70 (95% CI, 0.67–0.74), which was 6% (P < 0.001) higher than that of Model 2 (AUC 0.64; including age, family history, sex, and endoscopy history; Table 3). An improvement was also observed in those aged 40 to 49 years without endoscopy history (0.76 vs. 0.62; P = 0.04) and in all European-ancestry participants (0.73 vs. 0.71; P = 0.04). The comparison of the PRS-enhanced model to Model 1 (age and family history) showed very similar results. The 10-year AUC (Supplementary Table S8) also showed a similar pattern, except that the AUC improvements were not significant in the younger-age group.

Table 3.

Sequential model comparison using 5-year time-dependent AUC in the GERA European-ancestry participants.

| Model | AUC | Pᵃ |
|---|---|---|
| All participants eligible for screening with ageᵇ 45 to 74 years | | |
| Model 1 | 0.64 | <0.001 |
| Model 2 | 0.64 | <0.001 |
| PRS-enhanced model | 0.70 | Reference |
| All participants with ageᵇ 40 to 49 years and having no endoscopy history | | |
| Model 1 | 0.60 | 0.04 |
| Model 2 | 0.62 | 0.05 |
| PRS-enhanced model | 0.76 | Reference |
| All participants | | |
| Model 1 | 0.71 | 0.04 |
| Model 2 | 0.71 | 0.01 |
| PRS-enhanced model | 0.73 | Reference |

Note: We compared time-dependent AUCs of 5-year absolute risks based on Model 1 (age and first-degree colorectal cancer family history) and Model 2 (age, first-degree colorectal cancer family history, sex, and endoscopy history) to the proposed PRS-enhanced model (age, first-degree colorectal cancer family history, sex, endoscopy history and PRS) in GERA European-ancestry participants.

ᵃ P value for comparing the AUC estimates of a reduced model to the PRS-enhanced model.

ᵇ Age at study entry.

Using a risk threshold of 0.29% to identify a high 5-year risk group in the screening-eligible group, no appreciable difference in sensitivity was observed across models, but specificity of the proposed PRS-enhanced model was 31.0% (95% CI, 30.6%–31.4%), which was 15% (P < 0.001) and 11% (P < 0.001) higher than Model 1 and Model 2, respectively (Table 4). A similar pattern was observed in all European-ancestry individuals. In the younger-age group, using the PRS-enhanced model to define a high 5-year risk group demonstrated a sensitivity of 64% (95% CI, 35%–92%), which was 55% (P < 0.001) and 27% (P = 0.04) higher than Models 1 and 2, respectively, and had a specificity of 80% (95% CI, 79%–81%), which was 17% (P < 0.001) lower than Model 1 and 1% (P = 0.05) higher than Model 2.

Table 4.

Sequential model comparison using 5-year sensitivity/specificity in the GERA European-ancestry participants.

| Model | Measure estimateᵃ (95% CI) | Difference to PRS-enhanced model (95% CIᵇ) | Pᵇ |
|---|---|---|---|
| All participants eligible for screening with ageᶜ 45 to 74 years | | | |
| Sensitivity (177 individuals) | | | |
| Model 1 | 93.8% (85.2%–96.9%) | 0% (−5.2% to 3.9%) | P > 0.99 |
| Model 2 | 90.4% (85.1%–94.3%) | −3.4% (−7.5% to 0.8%) | P = 0.11 |
| PRS-enhanced model | 93.8% (85.2%–96.9%) | Reference | Reference |
| Specificity (52,890 individuals) | | | |
| Model 1 | 16.4% (16.1%–16.7%) | −14.6% (−15.0% to −14.2%) | P < 0.001 |
| Model 2 | 20.1% (19.8%–20.4%) | −10.9% (−11.3% to −10.6%) | P < 0.001 |
| PRS-enhanced model | 31.0% (30.6%–31.4%) | Reference | Reference |
| All participants with ageᶜ 40 to 49 years and having no endoscopy history | | | |
| Sensitivity (11 individuals) | | | |
| Model 1 | 9% (0%–26%) | −55% (−86% to −25%) | P < 0.001 |
| Model 2 | 36% (8%–65%) | −27% (−54% to 0%) | P = 0.04 |
| PRS-enhanced model | 64% (35%–92%) | Reference | Reference |
| Specificity (6,889 individuals) | | | |
| Model 1 | 97% (96%–97%) | 17% (16%–18%) | P < 0.001 |
| Model 2 | 79% (78%–80%) | −1% (−2% to 0%) | P = 0.05 |
| PRS-enhanced model | 80% (79%–81%) | Reference | Reference |
| All participants: age 40–84 years | | | |
| Sensitivity (309 individuals) | | | |
| Model 1 | 96.1% (94.0%–98.3%) | 0% (−2.7% to 2.7%) | P > 0.99 |
| Model 2 | 94.2% (91.6%–96.8%) | −1.9% (−4.4% to 0.3%) | P = 0.09 |
| PRS-enhanced model | 96.1% (94.0%–98.3%) | Reference | Reference |
| Specificity (65,973 individuals) | | | |
| Model 1 | 17.1% (16.8%–17.4%) | −11.7% (−12.0% to −11.4%) | P < 0.001 |
| Model 2 | 20.0% (19.7%–20.4%) | −8.8% (−9.0% to −8.4%) | P < 0.001 |
| PRS-enhanced model | 28.8% (28.4%–29.1%) | Reference | Reference |

Note: We presented the comparison across Model 1 (age and first-degree colorectal cancer family history), Model 2 (age, first-degree colorectal cancer family history, sex, and endoscopy history) and the PRS-enhanced model (age, first-degree colorectal cancer family history, sex, endoscopy history and PRS) on sensitivity and specificity with a risk threshold of 0.29% among GERA participants of European ancestry.

ᵃ We dichotomized 5-year risks derived from each model into high- (>0.29%) and low-risk (≤0.29%) categories. The risk threshold of 0.29% represents the 5-year colorectal cancer incidence rate for average-risk individuals at age 50 based on SEER18 colorectal cancer incidence rates (2007–2015).

ᵇ The 95% CIs of the differences and the corresponding P values were obtained by bootstrapping with 500 resamples.

ᶜ Age at study entry.

The current analysis, using large training and validation datasets, substantially extends existing colorectal cancer risk prediction knowledge by incorporating an updated PRS and conducting a comprehensive external validation in a community-based cohort. The predicted 5- and 10-year colorectal cancer risks from the proposed PRS-enhanced risk model calibrated well overall in European-ancestry participants. The model accurately discriminated colorectal cancer cases and controls. In addition, the inclusion of the PRS in colorectal cancer risk models significantly improved the discriminatory accuracy of 5-year predicted risk in the screening-eligible group and the younger-age group. Using the average 5-year colorectal cancer risk at age 50 years as a threshold, adding the PRS led to a significant improvement in specificity in the screening-eligible age group and in sensitivity in the younger-age group. The latter group, with its low prevalence of colorectal cancer, would likely benefit substantially from risk-stratified screening. Our study provides empirical support for future studies evaluating PRS-enhanced risk models in younger populations.

The PRS-enhanced model calibrated well overall and in subgroups defined by sex, family history, endoscopy history and age, with some overestimation of the 10-year risk in participants aged 50 to 59 years and underestimation among those with a family history of colorectal cancer. A possible explanation for the overestimation in those aged 50 to 59 years is that there were two spikes in the SEER colorectal cancer incidence rates around ages 50 and 65 years (Supplementary Fig. S7), which were used to calculate baseline risk. These possibly result from increased screen-detected colorectal cancer as people tend to undergo screening at these ages, which include the historical initiation of screening at age 50 and the beginning of insurance coverage with Medicare at age 65. The underestimation of the 10-year risk among those with a family history is likely due to the underestimated effect of family history on colorectal cancer risk in GECCO/CORECT compared with that in the GERA. By fitting a Cox proportional hazards model on colorectal cancer with endoscopy history, family history and PRS in the GERA, we observed that positive family history is associated with 2.2- and 1.6-times higher colorectal cancer HRs, in men and women, respectively, greater than the observed increased risk of 1.3 and 1.2 in men and women in GECCO/CORECT. Because it is difficult to reliably predict long-term (such as 10-year or lifetime) risk, one may consider updating risk prediction periodically using predicted 5-year risk based on one's most recent risk profiles, which is more accurate (35) and reflective of the clinical needs of a risk model (36) when a recommendation on screening/intervention for the near future is needed.

Prior studies have validated other colorectal cancer risk models incorporating genetic components (11–13, 15–17, 19, 37–39). The AUCs of our model in external validation are comparable with or better than the best AUCs in these studies, especially considering that internal validation, as used in prior studies, tends to yield more optimistic results (40). Calibration of the absolute colorectal cancer risk predicted by PRS-enhanced models has received little emphasis in the literature, yet it is crucial for determining whether a model provides reliable predictions at the individual level. This study examined the absolute risk calibration of the proposed PRS-enhanced risk model among European-ancestry participants, overall and in several subgroups, and demonstrated that it provides accurate predictions. In an exploratory analysis comparing risk models with and without endoscopy history, we also noted that endoscopy history did not show strong predictive value in terms of AUC improvement (Supplementary Table S9), despite its significant association with colorectal cancer risk observed in the GERA [HR 0.69 (P = 0.004) in men and 0.73 (P = 0.009) in women, estimated from a Cox model] and in an earlier study (41). This observation reflects the phenomenon that statistically significant variables do not guarantee improved prediction, as discussed by Lo and colleagues (42).

The improved discrimination of the PRS-enhanced model in the younger-age group, although based on only 11 colorectal cancer cases in our study, demonstrates the model's potential for risk stratification and warrants further evaluation in this age group, with a sufficient sample size and number of colorectal cancer cases, to assess model-based triage of screening initiation. The result also supports our prior finding of a stronger predictive value at younger ages for a PRS comprising 95 known loci (43). The USPSTF recently recommended lowering the starting age for screening to 45 years (3), in response to rising incidence rates of early-onset colorectal cancer over the last two decades (44). However, concerns remain about this new recommendation, including the lack of outcome data supporting earlier screening and whether existing screening resources can absorb the nearly 22 million newly screening-eligible adults. As such, this risk model could be a starting point for identifying young individuals at higher colorectal cancer risk for earlier screening.

Advances in PRS development have invigorated interest in using PRS-enhanced models for risk stratification to facilitate targeted screening or intervention. The decreasing cost of genome-wide genotyping has made it affordable to implement PRS broadly. However, barriers to implementing PRS in healthcare remain and need further study, including tailoring PRS to different ancestries, educating clinicians and the public, and adapting healthcare systems to manage and use continuously improving PRS (45). Regarding the first barrier, a PRS developed in cohorts of predominantly European-ancestry individuals may not predict well in non-European ancestral groups because different linkage disequilibrium patterns likely result in attenuated genetic associations (46–49). We anticipate an expansion of PRS development and validation to include more ancestrally diverse populations to ensure equity in targeted screening across all populations. The second barrier will require more promotion and education to raise clinician and public awareness of using genetic profiles to inform complex disease risk. The last barrier relates to the continuing discovery of colorectal cancer–associated loci and updates to the scoring algorithms for the PRS. Such updates, however, do not require repeated genome-wide genotyping of an individual, because one's genetic profile is time-invariant and genome-wide information can be stored digitally once it is collected (45). Before updating the PRS in a healthcare implementation, a cost–benefit evaluation may be needed to weigh the cost of a system update to adopt a new scoring algorithm against the improvement in predictive performance of the updated PRS.

Our study has several strengths. First, the proposed model includes an updated PRS comprising 140 known colorectal cancer loci. Since the publication of earlier studies on colorectal cancer risk models with PRS (11–16, 19), a sizable number of colorectal cancer risk loci have been discovered in large-scale GWAS (21, 23, 24). Our work provides empirical evidence that adding this updated PRS to a risk model improves risk stratification. Second, this is the first external validation of the updated PRS-enhanced colorectal cancer risk model. Assessing model performance in an independent validation cohort, as in our study, minimizes the impact of optimism, which is usually a concern in internal validation (40). Third, the GERA provides a rare opportunity, with detailed demographic, clinical, and genetic information, to externally validate the proposed PRS-enhanced model in a community-based setting. The comprehensive assessment in this study, including absolute risk calibration and model discrimination, is more generalizable to the census population than assessments obtained in typical research cohorts. Overall, our study provides evidence that supports moving the inclusion of PRS in colorectal cancer screening further toward clinical consideration.

Our study has some limitations. First, the GERA included participants from two large cohorts of KPNC members recruited in 2002–2003 and 2007–2008. As the baseline colorectal cancer hazard was derived using SEER18 2007–2015, the predictions for participants in the earlier cohort may be less accurate. However, there was no appreciable difference between the empirical colorectal cancer incidence rates in these two cohorts. Second, the GERA was drawn from a single US geographic region and may not reflect the overall US population. Additional validations are needed to comprehensively assess model performance in the overall US population and more globally. Third, only 11 European-ancestry individuals in the younger-age group developed colorectal cancer within 5 years of study entry. Because of this limited number of cases, our finding of greater AUC improvement in this subgroup may not be robust or generalizable to other cohorts. Further study focusing on individuals aged 40 to 49 years is warranted to confirm these results. Fourth, our model did not include lifestyle and environmental risk factors such as smoking, diet, and nonsteroidal anti-inflammatory drug use. These factors are difficult to measure precisely in clinical practice because they are commonly obtained using questionnaires and are hence prone to recall bias and measurement error (12). Further research is needed before incorporating these factors in risk prediction models to ensure accuracy.

In external validation in a large, community-based cohort, the PRS-enhanced colorectal cancer risk model was well calibrated and had high predictive performance among European-ancestry individuals. Our findings demonstrated improvement in risk discrimination and classification accuracy with the addition of the updated PRS in individuals eligible for screening and potentially in those aged < 50 years. The proposed PRS-enhanced model may aid in developing targeted screening strategies to improve screening efficiency and further enhance colorectal cancer prevention.

Y.R. Su reports grants from NIH during the conduct of the study. L.C. Sakoda reports grants from NCI during the conduct of the study; grants from AstraZeneca outside the submitted work. I. Lansdorp-Vogelaar reports grants from NCI during the conduct of the study. E.F.P. Peterse reports joining OPEN Health, a consultancy providing services in health economics and outcome research, market access and medical communication, after contributing to this study. A.G. Zauber reports grants from NIH outside the submitted work. J.A. Baron reports grants from NCI during the conduct of the study. E.L. Barry reports grants from NIH/NCI during the conduct of the study. G. Casey reports grants from NIH outside the submitted work. A.T. Chan reports grants and personal fees from Pfizer Inc.; personal fees from Boehringer Ingelheim, Bayer Pharma AG; grants from Zoe Ltd.; and grants from Freenome outside the submitted work. G.G. Giles reports grants from National Health and Medical Research Council during the conduct of the study. S.B. Gruber reports other support from Brogent International LLC outside the submitted work. J. Hampe reports grants from German BmBF during the conduct of the study. H. Hampel reports personal fees from Invitae, Genome Medical, Promega, GI OnDemand, 23andMe; and personal fees from Natera outside the submitted work. M.A. Jenkins reports grants from University of Melbourne during the conduct of the study; in addition, M.A. Jenkins has a patent for AU2016900254 issued to University of Melbourne. L. Li reports grants from NIH during the conduct of the study. V. Moreno reports grants from Agency for Management of University and Research Grants (AGAUR), Instituto de Salud Carlos III, co-funded by FEDER funds; and grants from Spanish Association Against Cancer (AECC) Scientific Foundation during the conduct of the study. P.D.P. Pharoah reports grants from Cancer Research UK during the conduct of the study. E.A. Platz reports being the editor-in-chief of CEBP. R.E. Schoen reports grants from Freenome, Immunovia; and grants from Exact outside the submitted work. M.L. Slattery reports grants from NCI during the conduct of the study. A. Wolk reports grants from Swedish Research Council; and grants from Swedish Cancer Foundation during the conduct of the study. R.B. Hayes reports grants from NCI during the conduct of the study. D.A. Corley reports grants from NCI during the conduct of the study. The Editor-in-Chief of Cancer Epidemiology, Biomarkers & Prevention is an author on this article. In keeping with AACR editorial policy, a senior member of the Cancer Epidemiology, Biomarkers & Prevention editorial team managed the consideration process for this submission and independently rendered the final decision concerning acceptability. No disclosures were reported by the other authors.

The NIH had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Y.-R. Su: Conceptualization, software, formal analysis, investigation, visualization, methodology, writing–original draft. L.C. Sakoda: Conceptualization, formal analysis, investigation, writing–original draft. J. Jeon: Conceptualization, formal analysis, investigation, methodology, writing–review and editing. M. Thomas: Data curation, investigation, writing–review and editing. Y. Lin: Data curation, investigation, writing–original draft. J.L. Schneider: Investigation, project administration. N. Udaltsova: Data curation, formal analysis, investigation. J.K. Lee: Conceptualization, funding acquisition, investigation, writing–review and editing. I. Lansdorp-Vogelaar: Conceptualization, funding acquisition, investigation, writing–review and editing. E.F.P. Peterse: Investigation, writing–review and editing. A.G. Zauber: Funding acquisition, investigation, writing–review and editing. J. Zheng: Conceptualization, investigation, methodology, writing–review and editing. Y. Zheng: Conceptualization, investigation, methodology, writing–review and editing. E. Hauser: Investigation, writing–review and editing. J.A. Baron: Funding acquisition, investigation, writing–review and editing. E.L. Barry: Investigation, writing–review and editing. D.T. Bishop: Funding acquisition, investigation, writing–review and editing. H. Brenner: Funding acquisition, investigation, writing–review and editing. D.D. Buchanan: Funding acquisition, investigation, writing–review and editing. A. Burnett-Hartman: Investigation, writing–review and editing. P.T. Campbell: Funding acquisition, investigation, writing–review and editing. G. Casey: Funding acquisition, investigation, writing–review and editing. S. Castellví-Bel: Investigation, writing–review and editing. A.T. Chan: Funding acquisition, investigation, writing–review and editing. J. Chang-Claude: Funding acquisition, investigation, writing–review and editing. J.C. Figueiredo: Funding acquisition, investigation, writing–review and editing. S.J. Gallinger: Funding acquisition, investigation, writing–review and editing. G.G. Giles: Funding acquisition, investigation, writing–review and editing. S.B. Gruber: Funding acquisition, investigation, writing–review and editing. A. Gsur: Funding acquisition, investigation, writing–review and editing. M.J. Gunter: Funding acquisition, investigation, writing–review and editing. J. Hampe: Funding acquisition, investigation, writing–review and editing. H. Hampel: Funding acquisition, investigation, writing–review and editing. T.A. Harrison: Funding acquisition, investigation, project administration, writing–review and editing. M. Hoffmeister: Funding acquisition, investigation, writing–review and editing. X. Hua: Investigation, writing–review and editing. J.R. Huyghe: Funding acquisition, investigation, writing–review and editing. M.A. Jenkins: Funding acquisition, investigation, writing–review and editing. T.O. Keku: Funding acquisition, investigation, writing–review and editing. L. Le Marchand: Funding acquisition, investigation, writing–review and editing. L. Li: Funding acquisition, investigation, writing–review and editing. A. Lindblom: Funding acquisition, investigation, writing–review and editing. V. Moreno: Funding acquisition, investigation, writing–review and editing. P.A. Newcomb: Funding acquisition, investigation, writing–review and editing. P.D.P. Pharoah: Funding acquisition, investigation, writing–review and editing. E.A. Platz: Funding acquisition, investigation, writing–review and editing. J.D. Potter: Funding acquisition, investigation, writing–review and editing. C. Qu: Data curation, investigation. G. Rennert: Funding acquisition, investigation, writing–review and editing. R.E. Schoen: Funding acquisition, investigation, writing–review and editing. M.L. Slattery: Funding acquisition, investigation, writing–review and editing. M. Song: Funding acquisition, investigation, writing–review and editing. F.J.B. van Duijnhoven: Funding acquisition, investigation, writing–review and editing. B. Van Guelpen: Funding acquisition, investigation, writing–review and editing. P. Vodicka: Funding acquisition, investigation, writing–review and editing. A. Wolk: Funding acquisition, investigation, writing–review and editing. M.O. Woods: Funding acquisition, investigation, writing–review and editing. A.H. Wu: Funding acquisition, investigation, writing–review and editing. R.B. Hayes: Conceptualization, formal analysis, supervision, funding acquisition, investigation, writing–original draft. U. Peters: Conceptualization, formal analysis, supervision, funding acquisition, investigation, writing–original draft. D.A. Corley: Conceptualization, formal analysis, supervision, funding acquisition, investigation, writing–original draft. L. Hsu: Conceptualization, software, formal analysis, supervision, funding acquisition, investigation, methodology, writing–original draft.

This work was primarily supported by the NIH [grant numbers R01CA206279 (MPI: U. Peters; D.A. Corley; R.B. Hayes), R01CA195789 (PI: L. Hsu), R01CA189532 (PI: L. Hsu), UM1CA222035 (MPI: D.A. Corley; J.K. Lee), K07CA188142 (PI: L.C. Sakoda), R03CA215775 (PI: R.B. Hayes), and K07CA212057 (PI: J.K. Lee)]. The Scientific Computing Infrastructure at Fred Hutch supporting the analyses in this study was funded by NIH Office of Research Infrastructure Programs grant S10OD028685. In addition, individual cohorts and studies in GECCO/CORECT/GERA were supported by other funding sources as listed in the supplement. The authors greatly thank the participants, interviewers, coordinators, data management staff, and all researchers in the GERA, GECCO, and CORECT. Y.R. Su thanks Kaiser Permanente Washington for supporting the writing of this manuscript.

The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).

2. Murphy CC, Singal AG, Baron JA, Sandler RS. Decrease in incidence of young-onset colorectal cancer before recent increase. Gastroenterology 2018;155:1716–9.
3. US Preventive Services Task Force. Screening for colorectal cancer: US Preventive Services Task Force recommendation statement. JAMA 2021;325:1965.
4. Lieberman DA. Targeted colon cancer screening: a concept whose time has almost come. Am J Gastroenterol 1992;87:1085–93.
5. Knudsen AB, Zauber AG, Rutter CM, Naber SK, Doria-Rose VP, Pabiniak C, et al. Estimation of benefits, burden, and harms of colorectal cancer screening strategies: modeling study for the US Preventive Services Task Force. JAMA 2016;315:2595–609.
6. Campos FG. Colorectal cancer in young adults: a difficult challenge. World J Gastroenterol 2017;23:5041–4.
7. Rex DK, Boland CR, Dominitz JA, Giardiello FM, Johnson DA, Kaltenbach T, et al. Colorectal cancer screening: recommendations for physicians and patients from the U.S. multi-society task force on colorectal cancer. Am J Gastroenterol 2017;112:1016–30.
8. Corley DA, Peek RM. When should guidelines change? A clarion call for evidence regarding the benefits and risks of screening for colorectal cancer at earlier ages. Gastroenterology 2018;155:947–9.
9. Wolf AMD, Fontham ETH, Church TR, Flowers CR, Guerra CE, LaMonte SJ, et al. Colorectal cancer screening for average-risk adults: 2018 guideline update from the American Cancer Society. CA Cancer J Clin 2018;68:250–81.
10. Weinberg BA, Marshall JL. Colon cancer in young adults: trends and their implications. Curr Oncol Rep 2019;21:3.
11. Dunlop MG, Tenesa A, Farrington SM, Ballereau S, Brewster DH, Koessler T, et al. Cumulative impact of common genetic variants and other risk factors on colorectal cancer risk in 42,103 individuals. Gut 2013;62:871–81.
12. Ibáñez-Sanz G, Díez-Villanueva A, Alonso MH, Rodríguez-Moranta F, Pérez-Gómez B, Bustamante M, et al. Risk model for colorectal cancer in Spanish population using environmental and genetic factors: results from the MCC-Spain study. Sci Rep 2017;7:43263.
13. Hsu L, Jeon J, Brenner H, Gruber SB, Schoen RE, Berndt SI, et al. A model to determine colorectal cancer risk using common genetic susceptibility loci. Gastroenterology 2015;148:1330–9.
14. Weigl K, Chang-Claude J, Knebel P, Hsu L, Hoffmeister M, Brenner H. Strongly enhanced colorectal cancer risk stratification by combining family history and genetic risk score. Clin Epidemiol 2018;10:143–52.
15. Jeon J, Du M, Schoen RE, Hoffmeister M, Newcomb PA, Berndt SI, et al. Determining risk of colorectal cancer and starting age of screening based on lifestyle, environmental, and genetic factors. Gastroenterology 2018;154:2152–64.
16. Smith T, Gunter MJ, Tzoulaki I, Muller DC. The added value of genetic information in colorectal cancer risk prediction models: development and evaluation in the UK Biobank prospective cohort study. Br J Cancer 2018;119:1036–9.
17. Saunders GL, Kilian B, Thompson DJ, McGeoch LJ, Griffin SJ, Antoniou AC, et al. External validation of risk prediction models incorporating common genetic variants for incident colorectal cancer using UK Biobank. Cancer Prev Res 2020;13:509–20.
18. McGeoch L, Saunders CL, Griffin SJ, Emery JD, Walter FM, Thompson DJ, et al. Risk prediction models for colorectal cancer incorporating common genetic variants: a systematic review. Cancer Epidemiol Biomarkers Prev 2019;28:1580–93.
19. Iwasaki M, Tanaka-Mizuno S, Kuchiba A, Yamaji T, Sawada N, Goto A, et al. Inclusion of a genetic risk score into a validated risk prediction model for colorectal cancer in Japanese men improves performance. Cancer Prev Res 2017;10:535–41.
20. Kachuri L, Graff RE, Smith-Byrne K, Meyers TJ, Rashkin SR, Ziv E, et al. Pan-cancer analysis demonstrates that integrating polygenic risk scores with modifiable risk factors improves risk prediction. Nat Commun 2020;11:6084.
21. Huyghe JR, Bien SA, Harrison TA, Kang HM, Chen S, Schmit SL, et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat Genet 2019;51:76–87.
22. Gordon N, Lin T. The Kaiser Permanente Northern California adult member health survey. Perm J 2016;20:15–225.
23. The PRACTICAL consortium, Law PJ, Timofeeva M, Fernandez-Rozadilla C, Broderick P, Studd J, Fernandez-Tajes J, et al. Association analyses identify 31 new risk loci for colorectal cancer susceptibility. Nat Commun 2019;10:2154.
24. Lu Y, Kweon SS, Tanikawa C, Jia WH, Xiang YB, Cai Q, et al. Large-scale genome-wide association study of East Asians identifies loci associated with risk for colorectal cancer. Gastroenterology 2019;156:1455–66.
25. Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 1989;81:1879–86.
26. Banda Y, Kvale MN, Hoffmann TJ, Hesselson SE, Ranatunga D, Tang H, et al. Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort. Genetics 2015;200:1285–95.
27. Kvale MN, Hesselson S, Hoffmann TJ, Cao Y, Chan D, Connell S, et al. Genotyping informatics and quality control for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort. Genetics 2015;200:1051–60.
28. Freedman AN, Slattery ML, Ballard-Barbash R, Willis G, Cann BJ, Pee D, et al. Colorectal cancer risk prediction tool for white men and women without known susceptibility. J Clin Oncol 2009;27:686–93.
29. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010;21:128–38.
30. Aalen O. Nonparametric estimation of partial transition probabilities in multiple decrement models. Ann Statist 1978;6:534–45.
31. Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. John Wiley & Sons, Inc.; 2002.
32. Saha P, Heagerty PJ. Time-dependent predictive accuracy in the presence of competing risks. Biometrics 2010;66:999–1011.
33. R Core Team. R: A language and environment for statistical computing. Available from: https://www.R-project.org/.
34. Wickham H. Ggplot2. Springer: New York; 2009.
35. MacInnis RJ, Knight JA, Chung WK, Milne RL, Whittemore AS, Buchsbaum R, et al. Comparing 5-year and lifetime risks of breast cancer using the prospective family study cohort. J Natl Cancer Inst 2021;113:785–91.
36. Etzioni R, Shen Y, Shih YCT. Identifying preferred breast cancer risk predictors: a holistic perspective. J Natl Cancer Inst 2021;113:660–1.
37. Jo J, Nam CM, Sull JW, Yun JE, Kim SY, Lee SJ, et al. Prediction of colorectal cancer risk using a genetic risk score: The Korean Cancer Prevention Study-II (KCPS-II). Genomics Inform 2012;10:175–83.
38. Wang HM, Chang TH, Lin FM, Chao TH, Huang WC, Liang C, et al. A new method for post genome-wide association study (GWAS) analysis of colorectal cancer in Taiwan. Gene 2013;518:107–13.
39. Yarnall JM, Crouch DJM, Lewis CM. Incorporating non-genetic risk factors and behavioral modifications into risk prediction models for colorectal cancer. Cancer Epidemiol 2013;37:324–9.
40. Collins GS, Reitsma JB, Altman DG, Moons K. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med 2015;13:1.
41. Carr PR, Weigl K, Edelmann D, Jansen L, Chang-Claude J, Brenner H, et al. Estimation of absolute risk of colorectal cancer based on healthy lifestyle, genetic risk, and colonoscopy status in a population-based study. Gastroenterology 2020;159:129–38.
42. Lo A, Chernoff H, Zheng T, Lo SH. Why significant variables aren't automatically good predictors. Proc Natl Acad Sci USA 2015;112:13892–7.
43. Archambault AN, Su YR, Jeon J, Thomas M, Lin Y, Conti DV, et al. Cumulative burden of colorectal cancer–associated genetic variants is more strongly associated with early-onset vs late-onset cancer. Gastroenterology 2020;158:1274–86.
44. Bailey CE, Hu CY, You YN, Bednarski BK, Rodriguez-Bigas MA, Skibber JM, et al. Increasing disparities in the age-related incidences of colon and rectal cancers in the United States, 1975–2010. JAMA Surg 2015;150:17–22.
45. Slunecka JL, van der Zee MD, Beck JJ, Johnson BN, Finnicum CT, Pool R, et al. Implementation and implications for polygenic risk scores in healthcare. Hum Genomics 2021;15:46.
46. Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, et al. Human demographic history impacts genetic risk prediction across diverse populations. Am J Hum Genet 2017;100:635–49.
47. Hindorff LA, Bonham VL, Brody LC, Ginoza MEC, Hutter CM, Manolio TA, et al. Prioritizing diversity in human genomics research. Nat Rev Genet 2018;19:175–85.
48. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet 2019;51:584–91.
49. Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, Gignoux CR, et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 2019;570:514–8.
