Abstract
Serrated polyps (SP) are precursors for colorectal cancer and contribute disproportionately to postcolonoscopy cancers. Leveraging three U.S. cohorts (43,974 women and 5,322 men), we developed prediction models for high-risk SPs (sized ≥10 mm or ≥3) among individuals undergoing their first colonoscopy screening. We then validated the model in the Partners Colonoscopy Cohort (51,203 women and 39,077 men). We evaluated discrimination and calibration using the C-statistic and Hosmer–Lemeshow test, respectively. The age and family history model generated a C-statistic [95% confidence interval (CI)] of 0.57 (0.56–0.58) in women and 0.58 (0.55–0.61) in men. Further inclusion of smoking, alcohol, and body mass index (the simple model) increased the C-statistic (95% CI) to 0.68 (0.67–0.69) in women and 0.68 (0.66–0.71) in men (all P < 0.001). Adding more predictors did not provide much incremental predictivity. In the validation cohort, moderate discrimination was observed in both women (0.60, 0.58–0.61) and men (0.60, 0.59–0.62). Notably, the simple model also yielded similar C-statistics for a composite endpoint of SPs and high-risk conventional adenomas (women, 0.62, 0.62–0.63; men, 0.63, 0.61–0.64). The model was adequately calibrated in both sets of cohorts. In summary, we developed and externally validated a simple prediction model based on five major risk factors for high-risk SPs that may be useful for healthy lifestyle recommendations and tailored colorectal cancer screening.
On the basis of four prospective studies in the United States, we developed and externally validated a simple risk prediction model for high-risk SPs in the setting of colonoscopy screening. Our model showed moderate discriminatory accuracy and has potential utility for individualized risk assessment, healthy lifestyle recommendations, and tailored colorectal cancer prevention.
Introduction
Colorectal cancer is the third leading cause of cancer-related death in the United States (1). Increasing uptake of colonoscopy has been suggested to contribute to half of the decline in colorectal cancer incidence and mortality between 1975 and 2000 in the United States by the removal of premalignant lesions and early cancers (2–4). However, only about 60% of adults undergo screening as recommended and the uptake is much lower among disadvantaged individuals (5). Fewer than 10% of adults who undergo colonoscopy are diagnosed with advanced neoplasia, so more than 90% undergo an invasive procedure with no significant direct benefit (6). There is a high unmet need to identify high-risk individuals for tailored colonoscopy screening.
In addition to conventional adenomas arising from chromosomal instability, there is another group of premalignant lesions for colorectal cancer named serrated polyps (SP; ref. 7). SPs have been shown to contribute to approximately 20%–30% of colorectal cancer cases (8). Because SPs are more frequently found in the proximal colon where the detection capability of colonoscopy is limited, SPs are more likely to be missed or incompletely resected, resulting in their disproportionate contribution to postcolonoscopy colorectal cancers (9, 10). SPs sized at least 10 mm or multiple SPs have been shown to increase colorectal cancer risk, and SPs satisfying either of these criteria may be considered high-risk SPs (11–13). Thus, identifying individuals at high risk of developing high-risk SPs is critical to improving the effectiveness of colonoscopy screening for better prevention of colorectal cancer (14). To our knowledge, no risk prediction models for SPs have yet been developed. Therefore, based on a thorough review of the existing epidemiologic evidence, we developed and externally validated a set of simple risk prediction models for high-risk SPs in independent large prospective U.S. cohorts.
Materials and Methods
Study population
We used data from two independent sets of prospective cohorts to develop and externally validate risk prediction models following the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guideline (15) and the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement (16).
For model development, we used data from three U.S. cohorts as follows. The Nurses’ Health Study (NHS) included 121,700 U.S. female nurses ages 30–55 years at enrollment in 1976. The NHS2 included 116,430 registered U.S. female nurses ages 25–42 years at enrollment in 1989. The Health Professionals Follow-up Study (HPFS) enrolled 51,529 male health professionals ages 40–75 years at study entry in 1986. More details about the three cohorts were described previously (11, 17). In brief, participants were mailed questionnaires biennially until 2012 to collect epidemiologic information and disease outcomes, including the diagnosis of colorectal cancer and polyps. Diet was assessed by a validated food frequency questionnaire every 4 years. The average follow-up rate has been greater than 90% in all three cohorts. Because detailed histologic information on polyps was not collected until 1992 for the NHS/HPFS and 1991 for the NHS2, these years were used as the baseline for the current study. At baseline, participants who were free of cancer (except nonmelanoma skin cancer), colorectal polyps, and inflammatory bowel disease and who reported having undergone a first-time screening endoscopy for colorectal cancer were included. Individuals who reported occult or visible blood, diarrhea, constipation, or abdominal pain as indication(s) for colonoscopy or reported sigmoidoscopy as their first-time screening endoscopy were excluded. Participants with missing values in predictors were also excluded. A total of 13,200 eligible participants in the NHS, 30,774 in the NHS2, and 5,322 in the HPFS were included in the current analysis (Supplementary Fig. S1). This study was conducted in accordance with the principles of the Declaration of Helsinki. All participants signed an informed consent form. The study was approved by the Institutional Review Boards at the Brigham and Women's Hospital and Harvard T.H. Chan School of Public Health, and those of participating registries as required.
For external validation, we used data from the Partners Colonoscopy Cohort (18), which included all colonoscopies performed in patients older than 18 years from 2007 to 2018 at the Brigham and Women's Hospital, Brigham and Women's Faulkner Hospital, and Massachusetts General Hospital. In brief, utilizing natural language processing, we extracted and merged endoscopic data (including the polyp location, polyp number for each location, histology, and dysplasia), pathologic data, demographic data, colorectal data, and genetic data. In the current study, we restricted the validation to participants who underwent screening colonoscopy. This study was conducted in accordance with the principles of the Declaration of Helsinki. All participants signed an informed consent form. The study was approved by the Partners Human Research Committee (Institutional Review Board Protocol 2018P002052).
Ascertainment of colorectal polyps
On each biennial questionnaire, participants were asked whether they had undergone a colonoscopy, and whether any polyps had been diagnosed in the past 2 years. For those who reported yes, informed consent was obtained to collect endoscopy and pathology reports, and relevant histopathologic data on type, anatomic location, number, and size (mm) of polyps were extracted. Although a systematic record review was not conducted for individuals who reported having undergone an endoscopy with no polyps found, our previous validation study in a random sample of 114 women indicated an accuracy of 97% for self-reported negative endoscopies (19).
According to the World Health Organization classification schema (20), SPs included hyperplastic polyps (HP), sessile serrated lesions (SSL), and traditional serrated adenomas (TSA). Because all initial endoscopies in the current study were performed before 2010, when there was no consensus on diagnostic criteria for specific SP subtypes and recommendations were still evolving with the pathologic diagnostic criteria (21), we could not distinguish HPs from SSLs and TSAs. Thus, the definition of SP includes HPs, TSAs, and SSLs in this study. If subjects had both SPs and conventional adenomas at endoscopy, each type of polyp was recorded separately. If more than one SP was diagnosed in the same anatomic region, the size of the largest was used. In the current study, we focused on high-risk SPs, defined as having at least one SP ≥ 1 cm in diameter, or ≥3 SPs. Participants diagnosed with high-risk SPs were considered cases, and those with no polyps were used as noncases.
Assessment of risk factors
We thoroughly reviewed the existing evidence and considered the simplicity of each factor to ensure our model could be implemented in general settings. In the development cohorts, for risk factors with periodic updates, the most updated information before the first colonoscopy was used.
On the basis of the literature review, we selected and assessed the following 16 established/potential risk factors for SPs, adenomas, and colorectal cancer: age, family history of colorectal cancer in a first-degree relative, body mass index (BMI), smoking, alcohol consumption, regular aspirin use, physical activity, multivitamin intake, processed red meat, beef/pork/lamb as main dish or servings, total fiber, total folate, total vitamin D, total calcium, and marine omega-3 fatty acid. For women, menopausal status and postmenopausal hormone use were also evaluated. The details of assessment and validation for each of the risk factors, including foods and nutrients, have been described previously (17, 22, 23). For missing data on the studied risk factors (all less than 10%), we carried forward the most recently available information from prior questionnaires (22).
Statistical analysis
Because SPs are more commonly diagnosed in women than men, we developed models for each sex separately. We considered three sets of models: (i) the age and family history model based on age and family history of colorectal cancer (defined as a diagnosis of colorectal cancer in parents or siblings) only, (ii) the simple model based on the five established risk factors for SPs, including age, family history of colorectal cancer, BMI, smoking, and alcohol consumption, and (iii) the full model assessing 12 additional dietary/lifestyle factors through stepwise multivariable-adjusted logistic regression (P for entry = 0.15 and P for stay = 0.20). We calculated ORs and 95% confidence intervals (CI) of high-risk SPs. We also calculated predicted risk by exp(β0 + ΣβiXi)/[1 + exp(β0 + ΣβiXi)], where β0 was the intercept, and βi was the regression coefficient for risk factor Xi.
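The predicted-risk formula above is the inverse logit of the linear predictor. As a minimal sketch of that calculation, the following Python snippet evaluates it for one person. The predictor names, coefficient values, and intercept are hypothetical placeholders chosen only for illustration; they are not the fitted estimates from this study.

```python
import math

def predicted_risk(intercept, coefficients, predictors):
    """Inverse-logit risk: exp(b0 + sum(bi*xi)) / (1 + exp(b0 + sum(bi*xi)))."""
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in predictors.items()
    )
    # Algebraically identical to exp(lp) / (1 + exp(lp)); the branch avoids
    # overflow in math.exp when the linear predictor is large and positive.
    if linear_predictor >= 0:
        return 1.0 / (1.0 + math.exp(-linear_predictor))
    e = math.exp(linear_predictor)
    return e / (1.0 + e)

# Hypothetical log odds ratios and intercept, for illustration only.
example_intercept = -5.0
example_coefficients = {
    "age_60_or_older": 0.40,
    "family_history_crc": 0.35,
    "bmi_30_or_higher": 0.30,
    "current_smoker": 1.10,
    "alcohol_28g_or_more": 0.50,
}

# One hypothetical person, coded with 0/1 indicators.
person = {
    "age_60_or_older": 1,
    "family_history_crc": 0,
    "bmi_30_or_higher": 1,
    "current_smoker": 1,
    "alcohol_28g_or_more": 0,
}

risk = predicted_risk(example_intercept, example_coefficients, person)
print(f"Predicted risk: {risk:.4f}")
```

Categorical predictors are represented here as indicator variables, which matches a logistic model fitted on categories; a continuous coding would simply pass the measured value instead of 0 or 1.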
Model discrimination was assessed using ROC curves and the C-statistic. To facilitate clinical translation of our findings, we evaluated the predictive ability of risk prediction models according to age group (<55, or ≥55 years), SP subgroups (≥1 cm in diameter, or ≥3), and period (<2000, 2000–2005, or ≥2005). In addition, for the simple model, we performed internal validation by 10-fold cross-validation. Reclassification was assessed by the integrated discrimination improvement (IDI) and the net reclassification improvement (NRI; ref. 24). Model calibration was assessed by the Hosmer–Lemeshow goodness-of-fit test, which compares the observed and predicted probabilities across deciles of the predicted risk. P values from the Hosmer–Lemeshow test (PHL) >0.05 were taken to indicate adequate calibration.
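For a binary outcome, the C-statistic equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen noncase, with ties counted as one half. The sketch below computes it by direct pairwise comparison in plain Python. It is illustrative only (the analyses in this study were run in SAS), the toy risks and outcomes are invented, and the quadratic loop is meant for clarity rather than for cohorts of this size.

```python
def c_statistic(predicted_risks, outcomes):
    """Concordance: P(risk of a case > risk of a noncase); ties count as 0.5."""
    case_risks = [r for r, y in zip(predicted_risks, outcomes) if y == 1]
    noncase_risks = [r for r, y in zip(predicted_risks, outcomes) if y == 0]
    if not case_risks or not noncase_risks:
        raise ValueError("Need at least one case and one noncase.")
    concordant = 0.0
    for case_risk in case_risks:
        for noncase_risk in noncase_risks:
            if case_risk > noncase_risk:
                concordant += 1.0
            elif case_risk == noncase_risk:
                concordant += 0.5
    return concordant / (len(case_risks) * len(noncase_risks))

# Invented toy data: 3 cases and 3 noncases.
toy_risks = [0.01, 0.02, 0.02, 0.05, 0.08, 0.10]
toy_outcomes = [0, 0, 1, 0, 1, 1]
print(c_statistic(toy_risks, toy_outcomes))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from noncases.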
To test the broad utility of our models for tailored colorectal cancer screening, in secondary analysis, we considered a composite endpoint defined as having either high-risk SPs or high-risk conventional adenomas [HRA, defined as having at least one adenoma ≥1 cm in diameter, or with advanced histology (tubulovillous or villous histologic features or high grade or severe dysplasia), or ≥3 adenomas]. We conducted sensitivity analyses using continuous instead of categorical predictors and after excluding individuals with ≥3 small distal HPs from high-risk SPs.
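The endpoint definitions used in this study can be written as simple rules. The following sketch is one way to encode them; the `Polyp` record and its field names are hypothetical, and the sketch does not reproduce the per-region size recording or the sensitivity analysis that excluded individuals with ≥3 small distal HPs.

```python
from dataclasses import dataclass

@dataclass
class Polyp:
    kind: str                         # "serrated" or "adenoma" (hypothetical labels)
    size_mm: float
    advanced_histology: bool = False  # tubulovillous/villous features or high-grade/severe dysplasia

def is_high_risk_sp(polyps):
    """High-risk SP: at least one SP >= 10 mm in diameter, or >= 3 SPs."""
    sps = [p for p in polyps if p.kind == "serrated"]
    return len(sps) >= 3 or any(p.size_mm >= 10 for p in sps)

def is_hra(polyps):
    """HRA: at least one adenoma >= 10 mm or with advanced histology, or >= 3 adenomas."""
    adenomas = [p for p in polyps if p.kind == "adenoma"]
    return len(adenomas) >= 3 or any(
        p.size_mm >= 10 or p.advanced_histology for p in adenomas
    )

def composite_endpoint(polyps):
    """Composite endpoint: high-risk SPs or high-risk conventional adenomas."""
    return is_high_risk_sp(polyps) or is_hra(polyps)

# Example: one small SP plus one large adenoma meets the composite endpoint only.
findings = [Polyp("serrated", 4), Polyp("adenoma", 15)]
print(is_high_risk_sp(findings), composite_endpoint(findings))
```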
All the statistical analyses were conducted using SAS version 9.4 (SAS Institute). All statistical tests were two sided. A P value of <0.05 was considered statistically significant.
Role of funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
Data availability statement
The data generated in this study are not publicly available but are available upon reasonable request from the corresponding author.
Results
The characteristics of participants at their first screening colonoscopy are shown in Table 1. The analysis included 43,974 women from the NHS and NHS2 and 5,322 men from the HPFS. Among women, we documented 2,495 (5.7%) SP cases and 508 (1.2%) high-risk SP cases. The detection rates of SPs (7.9%) and high-risk SPs (2.1%) were higher in men than in women. Compared with participants without polyps, high-risk SP cases were more likely to have a family history of colorectal cancer, smoke, and drink alcohol; had a higher BMI; and consumed more processed red meat and less fiber, folate, and vitamin D. The characteristics of participants in the NHS and NHS2 separately are shown in Supplementary Table S1.
Table 1. Characteristics of study participants by sex and high-risk SPs.a
Characteristic | Women, overall population | Women, non-polyps | Women, high-risk SPs | Men, overall population | Men, non-polyps | Men, high-risk SPs |
---|---|---|---|---|---|---|
Number of participants | 43,974 | 43,466 | 508 | 5,322 | 5,211 | 111 |
Age, y, mean (SD) | 55.75 ± 8.76 | 55.73 ± 8.75 | 57.17 ± 9.2 | 63.29 ± 7.95 | 63.26 ± 7.96 | 65 ± 7.16 |
<60, n (%) | 32,256 (73.35) | 31,929 (73.46) | 327 (64.38) | 1,911 (35.91) | 1,886 (36.19) | 25 (22.52) |
60–64, n (%) | 3,893 (8.85) | 3,824 (8.80) | 69 (13.58) | 1,264 (23.75) | 1,231 (23.62) | 33 (29.73) |
65–69, n (%) | 3,922 (8.92) | 3,869 (8.90) | 53 (10.43) | 964 (18.11) | 943 (18.10) | 21 (18.92) |
≥70, n (%) | 3,903 (8.88) | 3,844 (8.84) | 59 (11.61) | 1,183 (22.23) | 1,151 (22.09) | 32 (28.83) |
Race, n (%) | ||||||
White | 42,156 (95.87) | 41,654 (95.83) | 502 (98.82) | 5,081 (95.47) | 4,973 (95.43) | 108 (97.30) |
Non-White | 1,818 (4.13) | 1,812 (4.17) | 6 (1.18) | 241 (4.53) | 238 (4.57) | 3 (2.70) |
Family history of CRC, n (%) | ||||||
No | 36,122 (82.14) | 35,743 (82.23) | 379 (74.61) | 4,533 (85.17) | 4,438 (85.17) | 95 (85.59) |
Yes | 7,852 (17.86) | 7,723 (17.77) | 129 (25.39) | 789 (14.83) | 773 (14.83) | 16 (14.41) |
BMI, kg/m2, mean (SD) | 26.91 ± 5.82 | 26.9 ± 5.82 | 27.99 ± 5.8 | 26.42 ± 3.68 | 26.39 ± 3.68 | 27.27 ± 3.86 |
<22.5, n (%) | 10,016 (22.77) | 9,930 (22.85) | 86 (16.93) | 552 (10.37) | 547 (10.49) | 5 (4.50) |
22.5–24.9, n (%) | 9,659 (21.97) | 9,577 (22.03) | 82 (16.14) | 1,386 (26.04) | 1,358 (26.06) | 28 (25.23) |
25.0–27.4, n (%) | 8,497 (19.32) | 8,386 (19.29) | 111 (21.85) | 1,697 (31.89) | 1,666 (31.97) | 31 (27.93) |
27.5–29.9, n (%) | 5,185 (11.79) | 5,112 (11.76) | 73 (14.37) | 930 (17.47) | 907 (17.41) | 23 (20.72) |
30.0–34.9, n (%) | 6,512 (14.81) | 6,410 (14.75) | 102 (20.08) | 613 (11.52) | 594 (11.40) | 19 (17.12) |
≥35.0, n (%) | 4,105 (9.34) | 4,051 (9.32) | 54 (10.63) | 144 (2.71) | 139 (2.67) | 5 (4.50) |
Pack-years of smoking, mean (SD) | 6.52 ± 12.94 | 6.44 ± 12.85 | 12.95 ± 17.63 | 9.95 ± 15.8 | 9.81 ± 15.66 | 16.81 ± 20.11 |
Never smokers, n (%) | 26,766 (60.87) | 26,568 (61.12) | 198 (38.98) | 2,826 (53.09) | 2,786 (53.47) | 40 (36.03) |
Past smokers, n (%) | ||||||
<30 pack-years | 13,351 (30.36) | 13,160 (30.28) | 191 (37.6) | 1,766 (33.18) | 1,726 (33.12) | 40 (36.04) |
≥30 pack-years | 1,949 (4.43) | 1,899 (4.37) | 50 (9.84) | 516 (9.70) | 502 (9.63) | 14 (12.61) |
Current smokers, n (%) | ||||||
<30 pack-years | 828 (1.88) | 809 (1.86) | 19 (3.74) | 78 (1.47) | 72 (1.38) | 6 (5.41) |
≥30 pack-years | 1,080 (2.46) | 1,030 (2.37) | 50 (9.84) | 136 (2.56) | 125 (2.40) | 11 (9.91) |
Alcohol intake, g/d, mean (SD) | 5.55 ± 9.27 | 5.53 ± 9.24 | 7.39 ± 11.24 | 12.09 ± 15.4 | 12.05 ± 15.4 | 13.67 ± 15.66 |
<7.0, n (%) | 32,810 (74.61) | 32,465 (74.7) | 345 (67.92) | 2,657 (49.93) | 2,613 (50.15) | 44 (39.64) |
7.0–13.9, n (%) | 5,704 (12.97) | 5,635 (12.96) | 69 (13.58) | 998 (18.75) | 971 (18.63) | 27 (24.32) |
14.0–27.9, n (%) | 3,829 (8.71) | 3,775 (8.68) | 54 (10.63) | 982 (18.45) | 963 (18.48) | 19 (17.12) |
≥28.0, n (%) | 1,631 (3.71) | 1,591 (3.66) | 40 (7.87) | 685 (12.87) | 664 (12.74) | 21 (18.92) |
Regular aspirin user, n (%)b | ||||||
No | 34,538 (78.54) | 34,141 (78.55) | 397 (78.15) | 2,872 (53.96) | 2,820 (54.12) | 52 (46.85) |
Yes | 9,436 (21.46) | 9,325 (21.45) | 111 (21.85) | 2,450 (46.04) | 2,391 (45.88) | 59 (53.15) |
Physical activity, MET-hours/week, mean (SD)c | 21.58 ± 26.38 | 21.59 ± 26.39 | 19.87 ± 25.01 | 34.65 ± 32.17 | 34.72 ± 32.22 | 30.78 ± 29.7 |
<7.5, n (%) | 14,248 (32.40) | 14,069 (32.37) | 179 (35.23) | 915 (17.20) | 897 (17.21) | 18 (16.22) |
7.5–14.9, n (%) | 8,777 (19.96) | 8,678 (19.97) | 99 (19.49) | 705 (13.25) | 682 (13.09) | 23 (20.72) |
15.0–29.9, n (%) | 10,369 (23.58) | 10,241 (23.56) | 128 (25.20) | 1,327 (24.93) | 1,299 (24.93) | 28 (25.23) |
30.0–59.9, n (%) | 7,573 (17.22) | 7,500 (17.25) | 73 (14.37) | 1,477 (27.75) | 1,450 (27.83) | 27 (24.32) |
≥60, n (%) | 3,007 (6.84) | 2,978 (6.85) | 29 (5.71) | 898 (16.87) | 883 (16.94) | 15 (13.51) |
Multivitamin user, n (%) | ||||||
No | 16,710 (38.00) | 16,512 (37.99) | 198 (38.98) | 2,055 (38.61) | 2,012 (38.61) | 43 (38.74) |
Yes | 27,264 (62.00) | 26,954 (62.01) | 310 (61.02) | 3,267 (61.39) | 3,199 (61.39) | 68 (61.26) |
Menopausal and hormone use status, n (%) | ||||||
Premenopausal | 16,480 (37.48) | 16,304 (37.51) | 176 (34.64) | |||
Postmenopausal, never | 10,481 (23.83) | 10,333 (23.77) | 148 (29.13) | |||
Postmenopausal, past | 6,823 (15.52) | 6,758 (15.55) | 65 (12.80) | |||
Postmenopausal, current | 10,190 (23.17) | 10,071 (23.17) | 119 (23.43) | |||
Processed red meat, servings/wk, mean (SD) | 1.47 ± 1.81 | 1.47 ± 1.81 | 1.58 ± 1.7 | 1.57 ± 2.07 | 1.56 ± 2.02 | 2.07 ± 3.66 |
Beef/pork/lamb as main dish, servings/wk, mean (SD) | 0.92 ± 0.97 | 0.92 ± 0.97 | 0.99 ± 0.95 | 1.13 ± 1.19 | 1.13 ± 1.19 | 1.19 ± 1.09 |
Total fiber intake, g/day, mean (SD) | 20.63 ± 6.14 | 20.64 ± 6.14 | 19.92 ± 5.74 | 23.38 ± 7.17 | 23.4 ± 7.16 | 22.36 ± 7.58 |
Total folate intake, μg/day, mean (SD) | 676.31 ± 316.29 | 676.44 ± 316.43 | 664.84 ± 303.69 | 735.53 ± 370.48 | 736.19 ± 370.18 | 704.67 ± 384.64 |
Vitamin D intake, IU/day, mean (SD) | 497.59 ± 340.97 | 497.94 ± 341.10 | 467.15 ± 329.10 | 467.14 ± 296.86 | 467.96 ± 297.32 | 428.32 ± 272.67 |
Calcium intake, mg/day, mean (SD) | 1397.63 ± 637.44 | 1398.83 ± 637.81 | 1294.6 ± 596.86 | 1059.26 ± 487.61 | 1058.82 ± 487.52 | 1079.88 ± 493.66 |
Marine omega-3 fatty acid intake, g/day, mean (SD) | 0.25 ± 0.28 | 0.25 ± 0.28 | 0.24 ± 0.25 | 0.36 ± 0.36 | 0.36 ± 0.36 | 0.37 ± 0.34 |
Abbreviations: BMI, body mass index; CRC, colorectal cancer; MET, metabolic equivalent task; MO, month; NHS2, the Nurses’ Health Study 2; SPs, Serrated polyps; SD, standard deviation; HPFS, the Health Professionals Follow-up Study; NHS, the Nurses’ Health Study; WK, week.
aThe presented data are based on the most recent information for each participant up to polyp diagnosis (for cases) or the end of the follow-up period (for controls). Mean (SD) is presented for continuous variables and percentages for categorical variables. All variables except age itself are age-adjusted. High-risk SPs were defined as having at least one SP ≥ 1 cm in diameter, or ≥ 3 SPs.
bA standard tablet contains 325 mg aspirin, and regular users were defined as those who used at least two tablets per week.
cPhysical activity is represented by the product sum of the MET of each specific recreational activity and hours spent on that activity per week.
Supplementary Table S2 presents the ORs (95% CI) for each predictor. We found a positive association with high-risk SPs for age in men, family history of colorectal cancer in women, and BMI, smoking, and alcohol intake in both sexes. In addition, we found a positive association with high-risk SPs for processed red meat, beef/pork/lamb as main dish, vitamin D, and calcium in women in univariate analysis. The associations of these predictors with high-risk SPs in the NHS and NHS2 separately are shown in Supplementary Table S3.
Table 2 shows the discriminative performance of models assessed by C-statistics. Compared with the model including age and family history only (C-statistic: women, 0.57, 0.56–0.58; men, 0.58, 0.55–0.61), the simple model generated statistically significantly higher C-statistics of 0.68 (0.67–0.69) in women and 0.68 (0.66–0.71) in men (all P < 0.001; Fig. 1). In 10-fold cross-validation, the mean C-statistic of the simple model across the 10 test sets was 0.67 for women and 0.63 for men. Stratified analysis by age showed that the C-statistic of the simple model was higher among men <55 years (0.70, 0.60–0.80) than among men ≥55 years (0.68, 0.65–0.70). When examined by the defining features of high-risk SPs, the simple model yielded a higher C-statistic in women for participants with ≥3 SPs (0.71, 0.69–0.72) than for those with SPs of ≥1 cm (0.66, 0.64–0.67), whereas in men the two estimates were similar (≥3 SPs: 0.72, 0.70–0.75; SPs of ≥1 cm: 0.73, 0.69–0.77).
Table 2. C-statistics of the risk prediction models for high-risk SPs.a
Subgroup | Women, case/control | Women, age and family history model, C-statistic (95% CI) | Women, simple modelb, C-statistic (95% CI) | Women, Pc | Men, case/control | Men, age and family history model, C-statistic (95% CI) | Men, simple modelb, C-statistic (95% CI) | Men, Pc |
---|---|---|---|---|---|---|---|---|
Overall | 508/43,466 | 0.57 (0.56–0.58) | 0.68 (0.67–0.69) | <0.001 | 111/5,211 | 0.58 (0.55–0.61) | 0.68 (0.66–0.71) | <0.001 |
By age, y | ||||||||
<55 | 271/26,053 | 0.54 (0.52–0.56) | 0.67 (0.66–0.69) | <0.001 | 7/612 | 0.54 (0.44–0.65) | 0.70 (0.60–0.80) | 0.272 |
≥55 | 237/17,413 | 0.58 (0.56–0.60) | 0.67 (0.66–0.69) | <0.001 | 104/4,599 | 0.57 (0.54–0.59) | 0.68 (0.65–0.70) | <0.001 |
By type | ||||||||
≥1 cm in diameter | 261/43,466 | 0.57 (0.55–0.59) | 0.66 (0.64–0.67) | <0.001 | 36/5,211 | 0.57 (0.52–0.62) | 0.73 (0.69–0.77) | 0.002 |
≥3 | 299/43,466 | 0.59 (0.57–0.60) | 0.71 (0.69–0.72) | <0.001 | 87/5,211 | 0.61 (0.58–0.64) | 0.72 (0.69–0.75) | <0.001 |
By time, y | ||||||||
<2000 | 48/4,744 | 0.58 (0.54–0.62) | 0.70 (0.67–0.74) | <0.001 | 15/1,179 | 0.58 (0.51–0.65) | 0.66 (0.59–0.73) | 0.310 |
2000–2005 | 227/14,940 | 0.56 (0.54–0.58) | 0.68 (0.66–0.69) | <0.001 | 53/2,176 | 0.59 (0.55–0.63) | 0.69 (0.65–0.72) | <0.001 |
≥2005 | 233/23,782 | 0.58 (0.56–0.60) | 0.67 (0.66–0.69) | <0.001 | 43/1,856 | 0.54 (0.50–0.59) | 0.68 (0.64–0.72) | 0.005 |
aHigh-risk SPs were defined as having at least one SP ≥ 1 cm in diameter, or ≥ 3 SPs.
bSimple model: further included BMI, smoking, and alcohol intake.
cP values comparing the simple models with the age and family history model.
Figure 1. ROC curves for prediction of high-risk SPs at the first-time screening colonoscopy based on the model with age and family history of colorectal cancer only, and the simple model (including age, family history of colorectal cancer, BMI, smoking, and alcohol consumption), among women (A) and men (B).
Supplementary Table S4 presents the reclassification performance when smoking, alcohol drinking, and BMI were added to the age and family history models. Compared with the models with only age and family history, the simple models showed statistically significantly higher NRI (women, 47.6; men, 48.7) and IDI (women, 0.006; men, 0.014; all P < 0.001).
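To make the two reclassification measures concrete, the sketch below computes the IDI and a category-free (continuous) NRI from paired predicted risks under an old and a new model. The text does not state whether a categorical or category-free NRI was used, so the category-free form here is an assumption, and the toy risks are invented.

```python
def _split_indices(outcomes):
    cases = [i for i, y in enumerate(outcomes) if y == 1]
    noncases = [i for i, y in enumerate(outcomes) if y == 0]
    return cases, noncases

def idi(old_risks, new_risks, outcomes):
    """IDI: mean risk gain among cases minus mean risk gain among noncases."""
    def mean_gain(indices):
        return sum(new_risks[i] - old_risks[i] for i in indices) / len(indices)
    cases, noncases = _split_indices(outcomes)
    return mean_gain(cases) - mean_gain(noncases)

def continuous_nri(old_risks, new_risks, outcomes):
    """Category-free NRI: net upward movement in cases minus that in noncases."""
    def net_up(indices):
        up = sum(1 for i in indices if new_risks[i] > old_risks[i])
        down = sum(1 for i in indices if new_risks[i] < old_risks[i])
        return (up - down) / len(indices)
    cases, noncases = _split_indices(outcomes)
    return net_up(cases) - net_up(noncases)

# Invented toy data: the new model raises risk for both cases
# and lowers it for both noncases.
old = [0.02, 0.02, 0.02, 0.02]
new = [0.05, 0.01, 0.01, 0.03]
y = [1, 0, 0, 1]
print(idi(old, new, y), continuous_nri(old, new, y))
```

The category-free NRI ranges from −2 to 2 and is often reported as a percentage, while the IDI is on the scale of predicted probability.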
Supplementary Figure S2 shows the calibration results. The simple models were calibrated well across deciles of the predicted risk (women: χ2 = 5.60, PHL = 0.69; men: χ2 = 8.25, PHL = 0.41). The predicted risks of high-risk SPs in the top versus bottom deciles were 3.4% versus 0.4% in women (OR, 8.72; 95% CI, 5.27–14.44), and 5.4% vs. 0.6% in men (OR, 10.25; 95% CI, 3.10–33.85).
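The reported Hosmer–Lemeshow P values can be checked against the reported χ2 statistics. The sketch below assumes the conventional 8 degrees of freedom for a test over deciles (10 groups minus 2), which the text does not state explicitly. It uses the closed-form upper tail of the chi-square distribution for even degrees of freedom, so no statistics library is needed, and also shows how the statistic itself is formed from observed and expected counts per risk group.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("This closed form requires a positive even df.")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow_statistic(observed_cases, expected_cases, group_sizes):
    """Sum over risk groups of (O - E)^2 / E for cases and for noncases."""
    stat = 0.0
    for o, e, n in zip(observed_cases, expected_cases, group_sizes):
        stat += (o - e) ** 2 / e + (o - e) ** 2 / (n - e)
    return stat

# Reported chi-square statistics, with df = 8 assumed.
p_women = chi2_sf_even_df(5.60, 8)
p_men = chi2_sf_even_df(8.25, 8)
print(round(p_women, 2), round(p_men, 2))  # 0.69 0.41
```

Under that assumption, both values match the PHL figures given above for women and men.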
Table 3 shows the results of the secondary analysis for the composite endpoint of high-risk SPs and HRAs. The simple models demonstrated moderate discrimination, with C-statistics of 0.62 (0.62–0.63) in women and 0.63 (0.61–0.64) in men. The simple models also showed good calibration. The predicted risk of high-risk SPs or HRAs was 6.4% in the top decile versus 1.7% in the bottom decile for women (OR, 3.96; 95% CI, 3.05–5.14), and 15.6% versus 4.1% for men (OR, 4.02; 95% CI, 2.47–6.54; Supplementary Fig. S3).
Table 3. C-statistics of the risk prediction models for a composite endpoint of high-risk SPs and HRAs (a).
Subgroup | Women: Case/Control | Women: Age and family history model, C-statistic (95% CI) | Women: Simple model (b), C-statistic (95% CI) | Women: P (c) | Men: Case/Control | Men: Age and family history model, C-statistic (95% CI) | Men: Simple model (b), C-statistic (95% CI) | Men: P (c) |
---|---|---|---|---|---|---|---|---|
Overall | 1,365/42,609 | 0.59 (0.58–0.59) | 0.62 (0.62–0.63) | <0.001 | 425/4,897 | 0.58 (0.56–0.59) | 0.63 (0.61–0.64) | <0.001 |
By age, y | ||||||||
<55 | 682/25,642 | 0.55 (0.53–0.56) | 0.61 (0.59–0.62) | <0.001 | 21/598 | 0.48 (0.42–0.54) | 0.52 (0.45–0.58) | 0.674 |
≥55 | 683/16,967 | 0.58 (0.57–0.59) | 0.61 (0.60–0.62) | <0.001 | 404/4,299 | 0.56 (0.54–0.57) | 0.61 (0.60–0.63) | <0.001 |
By type | ||||||||
≥1 cm in diameter | 911/42,609 | 0.59 (0.58–0.60) | 0.61 (0.61–0.62) | <0.001 | 293/4,897 | 0.58 (0.56–0.60) | 0.62 (0.60–0.63) | 0.011 |
Advanced histology | 470/42,609 | 0.63 (0.62–0.64) | 0.66 (0.65–0.67) | <0.001 | 168/4,897 | 0.62 (0.60–0.64) | 0.64 (0.62–0.66) | 0.115 |
≥3 | 425/42,609 | 0.59 (0.58–0.61) | 0.68 (0.67–0.69) | <0.001 | 185/4,897 | 0.59 (0.57–0.61) | 0.67 (0.65–0.69) | <0.001 |
By time, y | ||||||||
<2000 | 220/4,572 | 0.58 (0.56–0.60) | 0.60 (0.58–0.62) | 0.121 | 97/1,097 | 0.67 (0.64–0.70) | 0.70 (0.67–0.73) | 0.315 |
2000–2005 | 585/14,582 | 0.58 (0.57–0.59) | 0.62 (0.61–0.63) | <0.001 | 192/2,037 | 0.59 (0.56–0.61) | 0.62 (0.60–0.64) | <0.001 |
≥2005 | 560/23,455 | 0.57 (0.56–0.58) | 0.61 (0.60–0.62) | <0.001 | 136/1,763 | 0.52 (0.49–0.54) | 0.60 (0.57–0.62) | <0.001 |
aIncluding high-risk SPs (defined as having at least one SP ≥ 1 cm in diameter, or ≥ 3 SPs) and HRAs (defined as having at least one adenoma ≥1 cm in diameter, or with advanced histology [tubulovillous or villous histologic features or high grade or severe dysplasia], or ≥3 adenomas).
bSimple model: further included BMI, smoking, and alcohol intake.
cP values comparing the simple models with the age and family history model.
Table 4 shows the results of the external validation in the Partners Colonoscopy Cohort. The simple models demonstrated moderate discrimination for high-risk SPs, with C-statistics of 0.60 (0.58–0.61) in women and 0.60 (0.59–0.62) in men. The simple models also yielded similar discrimination for the composite endpoint of high-risk SPs and HRAs, with C-statistics of 0.57 (0.56–0.58) in women and 0.60 (0.59–0.61) in men.
Table 4. C-statistics of the risk prediction models in the validation database (Partners Colonoscopy Cohort).
Outcome | Women: Case/Control | Women: Age and family history model, C-statistic (95% CI) | Women: Simple model (b), C-statistic (95% CI) | Women: P (c) | Men: Case/Control | Men: Age and family history model, C-statistic (95% CI) | Men: Simple model (b), C-statistic (95% CI) | Men: P (c) |
---|---|---|---|---|---|---|---|---|
High-risk SPs | ||||||||
Overall | 2,296/48,907 | 0.52 (0.51–0.53) | 0.60 (0.58–0.61) | <0.001 | 1,888/37,189 | 0.50 (0.49–0.52) | 0.60 (0.59–0.62) | <0.001 |
Large SP (≥1 cm in diameter) | 723/50,480 | 0.52 (0.50–0.54) | 0.61 (0.59–0.63) | <0.001 | 524/38,553 | 0.52 (0.49–0.54) | 0.62 (0.60–0.65) | <0.001 |
Multiple (≥3) SPs | 1,763/49,440 | 0.52 (0.50–0.53) | 0.59 (0.58–0.61) | <0.001 | 1,535/37,542 | 0.50 (0.50–0.50) | 0.60 (0.59–0.62) | <0.001 |
HRAs | ||||||||
Overall | 3,283/47,920 | 0.53 (0.52–0.54) | 0.56 (0.55–0.57) | <0.001 | 4,214/34,863 | 0.56 (0.55–0.57) | 0.60 (0.59–0.61) | <0.001 |
Large HRAs (≥1 cm in diameter) | 1,553/49,650 | 0.51 (0.50–0.53) | 0.56 (0.55–0.58) | <0.001 | 1,894/37,183 | 0.53 (0.52–0.55) | 0.58 (0.57–0.60) | <0.001 |
Advanced histology | 606/50,597 | 0.52 (0.49–0.54) | 0.55 (0.53–0.57) | 0.007 | 707/38,370 | 0.54 (0.52–0.56) | 0.58 (0.56–0.60) | <0.001 |
Multiple HRAs (≥3) | 1,906/49,297 | 0.55 (0.54–0.56) | 0.56 (0.55–0.57) | 0.276 | 2,758/36,319 | 0.58 (0.57–0.59) | 0.62 (0.61–0.63) | <0.001 |
High-risk SPs or HRAs (a) | ||||||||
Overall | 4,876/46,327 | 0.53 (0.52–0.53) | 0.57 (0.56–0.58) | <0.001 | 5,332/33,745 | 0.55 (0.54–0.55) | 0.60 (0.59–0.61) | <0.001 |
Large (≥1 cm in diameter) | 1,963/49,240 | 0.51 (0.50–0.53) | 0.57 (0.56–0.58) | <0.001 | 2,109/36,968 | 0.53 (0.52–0.54) | 0.59 (0.57–0.60) | <0.001 |
Multiple (≥3) | 3,269/47,934 | 0.54 (0.53–0.54) | 0.57 (0.56–0.58) | <0.001 | 3,793/35,284 | 0.55 (0.54–0.56) | 0.61 (0.60–0.62) | <0.001 |
aIncluding high-risk SPs (defined as having at least one SP ≥ 1 cm in diameter, or SPs located in the proximal colon, or ≥ 3 SPs) and HRAs [defined as having at least one adenoma ≥1 cm in diameter, or with advanced histology (tubulovillous or villous histologic features or high grade or severe dysplasia), or ≥3 adenomas].
bSimple model: further included BMI, smoking, and alcohol intake.
cP values comparing the simple models with the age and family history model.
In a sensitivity analysis, we obtained essentially the same C-statistics using continuous instead of categorical variables in our simple prediction models (women, 0.67, 0.65–0.69; men, 0.65, 0.60–0.70). Similar results were found when we excluded small distal SPs from the high-risk SP group (women, 0.60, 0.59–0.61; men, 0.63, 0.61–0.65).
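For readers less familiar with the C-statistic used throughout these results, it equals the proportion of case–control pairs in which the case receives the higher predicted risk, with ties counted as one half. A minimal Python sketch follows. The helper `c_statistic` is a hypothetical name, the pairwise loop is quadratic in sample size and is meant only to show the definition, and the sketch does not reproduce the confidence intervals or model comparisons reported here.

```python
def c_statistic(y, p):
    """Concordance (C) statistic for binary outcomes.

    y : 0/1 outcomes (1 = case, 0 = control)
    p : predicted risks, same length as y
    """
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0   # case ranked above control
            elif case_risk == control_risk:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    print(c_statistic([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls, which is the scale on which the reported estimates of roughly 0.60 to 0.68 are described as moderate.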
Discussion
Using data from four prospective U.S. cohorts, we developed and externally validated a set of simple risk prediction models for high-risk SPs among adults who underwent colonoscopy as their first routine colorectal cancer screening. The models showed moderate discrimination and goodness-of-fit for both SPs and HRAs, and may be useful to identify high-risk individuals for healthy lifestyle recommendations and more intensive colorectal cancer screening.
In the current study, all the predictors have been linked to the risk of both SPs and colorectal cancer. Consistent with previous evidence (25–29), a statistically significant positive association with the risk of high-risk SPs was observed for family history of colorectal cancer, BMI, smoking, and alcohol consumption, whereas a significant inverse association was observed for regular aspirin use. Diet plays an important role in the development of colorectal neoplasia (30). In the current study, we confirmed previous findings from a meta-analysis that intake of total vitamin D and folate was inversely associated with the risk of SPs.
In recent years, several risk scores have been developed for colorectal neoplasia (31). The models vary in terms of the outcome of interest (colorectal cancer or high-risk adenoma), source population, predictors, and risk threshold. To the best of our knowledge, the current study represents the first effort to develop a prediction model for high-risk SPs among asymptomatic U.S. adults who undergo their first colonoscopy screening. Compared with other colorectal cancer precursor lesions, the endoscopic detection and complete resection of SPs are more difficult, indicating the need for developing effective risk assessment tools. Importantly, although we focused on high-risk SPs, given the overlapping risk factor profiles with conventional adenomas, our models also demonstrated moderate to good discriminative ability for high-risk adenomas, with C-statistic estimates (0.62–0.63) comparable to those of previously published models (32). These findings indicate the potential clinical utility of our models for risk stratification of both high-risk SPs and adenomas.
Compared with other endoscopic and histopathologic features of SPs, large size has demonstrated better sensitivity in identifying high-risk lesions. The presence of large SPs has been consistently associated with an increased risk of synchronous advanced neoplasia and colorectal cancer (11, 12, 33, 34). In this study, we found the discrimination of our models for large SPs was slightly higher than that for multiple SPs. Also, we found that the C-statistics of our models did not differ by time, suggesting that our results are unlikely to be substantially influenced by the misdiagnosis of SPs in earlier years.
Our study has several strengths, including the large sample size, comprehensive profiling of colorectal cancer risk factors, confirmation of polyp diagnosis based on pathology reports, as well as external validation. Moreover, our risk estimation was based on incident SPs rather than projections of the future development of colorectal cancer, enhancing the potential utility of our models in the setting of colonoscopy screening. We also provided separate discrimination estimates for clinically significant subgroups of high-risk SPs and high-risk adenomas. We compared different sets of models and found that, compared with the simple model based on five major risk factors, adding more predictors that are more difficult to assess in routine clinical practice (e.g., diet) did not provide much incremental predictive value.
Our study has a few limitations. All the lifestyle data were self-reported and subject to measurement error. However, rigorous validation of our questionnaires has been performed previously. Moreover, given the prospective design, any error in exposure assessment would have likely been attenuated. Detailed information on bowel preparation and colonoscopy quality was not collected. Moreover, we were unable to distinguish HPs from SSA/Ps and TSAs, which may have compromised the discrimination of the model. Because our study participants are all health professionals and largely White, our models should be validated in more diverse populations in future studies. The pathology of polyps was not centrally rereviewed, due to cost and time constraints.
Despite the limitations, our study suggests the great potential of identifying high-risk individuals for better detection of SPs using a simple risk prediction tool. It has been reported that most primary care providers are willing to use such a risk prediction model, assuming that it adds minimal burden to routine clinical workflow, is easy to use, and requires minimal time to complete (35). The simple model developed in the current study meets these requirements by utilizing risk factors that can be easily assessed in a clinical setting and potentially automated in the electronic health record system. Nevertheless, we acknowledge the modest discriminatory ability of our model, and future studies are warranted to assess the potential of integrating more predictors, including markers, for improved accuracy. On the other hand, our results highlight the importance of risk factor reduction for colorectal cancer prevention. Primary care providers may motivate high-risk patients to modify their lifestyles (e.g., weight loss, smoking cessation, and alcohol cessation) to reduce colorectal cancer incidence.
In conclusion, we developed and externally validated a simple risk prediction model for high-risk SPs in the setting of colonoscopy screening. Our model showed adequate discriminatory accuracy and has implications for shared decision-making and individualized risk assessment for tailored colorectal cancer screening.
Authors' Disclosures
K. Wu is currently a stakeholder and an employee at Vertex Pharmaceuticals. This study was not funded by this commercial entity. B. Rosner reports grants from NIH during the conduct of the study. A.T. Chan reports personal fees from Bayer Pharma AG and Boehringer Ingelheim, grants and personal fees from Pfizer Inc., grants from Zoe Ltd and Freenome outside the submitted work. S. Ogino reports grants from NIH during the conduct of the study. No disclosures were reported by the other authors.
Authors' Contributions
Z. Lyu: Conceptualization, data curation, formal analysis, visualization, methodology, writing–original draft, writing–review and editing. D. Hang: Conceptualization, formal analysis, methodology, writing–review and editing. X. He: Data curation, supervision, methodology, writing–review and editing. K. Wu: Data curation, supervision, funding acquisition, project administration, writing–review and editing. Y. Cao: Data curation, formal analysis, supervision, visualization, writing–review and editing. B. Rosner: Data curation, software, formal analysis, supervision, writing–review and editing. A.T. Chan: Data curation, supervision, validation, methodology, writing–review and editing. S. Ogino: Conceptualization, supervision, funding acquisition, writing–review and editing. N. Li: Software, supervision, methodology, writing–review and editing. M. Dai: Software, supervision, methodology, writing–review and editing. E.L. Giovannucci: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, project administration, writing–review and editing. M. Song: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, project administration, writing–review and editing.
Acknowledgments
This work was supported by grant MRSG-17-220-01-NEC from the American Cancer Society (M. Song), U.S. NIH grants (R01 CA176726, to A.T. Chan; R00 CA215314, to M. Song; R35 CA253185, to A.T. Chan; R03 CA197879, to K. Wu; R21 CA230873, to K. Wu; R21 CA222940, to K. Wu), grant 81973127 from the National Natural Science Foundation of China (D. Hang), and grant BK20190083 from the National Science Foundation of Jiangsu Province (D. Hang). A.T. Chan is an American Society Clinical Research professor and Suzanne Steele MGH Research Scholar. We would also like to thank the participants and staff of the NHS, the NHS2, and the HPFS for their valuable contributions as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY. The authors assume full responsibility for analyses and interpretation of these data.
The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Note: Supplementary data for this article are available at Cancer Prevention Research Online (http://cancerprevres.aacrjournals.org/).