Abstract
Colorectal cancer is the second leading cause of cancer-related death in Europe and the United States. Survival is strongly related to stage at diagnosis and population-based screening reduces colorectal cancer incidence and mortality. Stratifying the population by risk offers the potential to improve the efficiency of screening. In this systematic review we searched Medline, EMBASE, and the Cochrane Library for primary research studies reporting or validating models to predict future risk of primary colorectal cancer for asymptomatic individuals. A total of 12,808 papers were identified from the literature search and nine through citation searching. Fifty-two risk models were included. Where reported (n = 37), half the models had acceptable-to-good discrimination (the area under the receiver operating characteristic curve, AUROC >0.7) in the derivation sample. Calibration was less commonly assessed (n = 21), but overall acceptable. In external validation studies, 10 models showed acceptable discrimination (AUROC 0.71–0.78). These include two with only three variables (age, gender, and BMI; age, gender, and family history of colorectal cancer). A small number of prediction models developed from case–control studies of genetic biomarkers also show some promise but require further external validation using population-based samples. Further research should focus on the feasibility and impact of incorporating such models into stratified screening programmes. Cancer Prev Res; 9(1); 13–26. ©2015 AACR.
See related article by Frank L. Meyskens, Jr., p. 11
Introduction
Colorectal cancer is the second leading cause of cancer-related death in Europe and the United States (1). Survival is strongly related to stage at diagnosis (2), and population-based screening has been shown to significantly reduce colorectal cancer incidence and mortality (3–6). Stratifying the population into risk categories offers the potential to improve the efficiency of this screening by tailoring the intensity of screening, or preventive approaches, to the predicted level of risk. Providing patients and practitioners with a personalized risk assessment may also encourage engagement in risk-reducing behaviors, including participation in screening or prevention programmes and lifestyle changes to reduce incidence of disease (7).
A number of risk prediction models for colorectal cancer have been developed, and two previous reviews of these have been published (8, 9). However, neither was comprehensive, and several new risk models have been developed since those reviews were published. This article provides the first comprehensive analysis of tools for predicting the risk of primary colorectal cancer in asymptomatic individuals within the general population. It analyzes the 87 variables, in addition to genes and SNPs, included across the models; the predictive ability of the different risk models; and their potential applicability and practical use for population-based stratification.
Materials and Methods
We performed a systematic literature review following an a priori established study protocol (available on request).
Search strategy
We performed an electronic literature search of Medline, EMBASE, and the Cochrane Library from January 2000 to March 2014 with no language limits, using a combination of subject headings incorporating “colorectal cancer,” “risk/risk factor/risk assessment/chance,” and “prediction/model/score” (see Supplementary File 1 for the complete search strategy for Medline and EMBASE). We then manually screened the reference lists of all included papers.
Study selection
We included studies if they fulfilled all of the following criteria: (i) were published as a primary research paper in a peer-reviewed journal; (ii) identified risk factors for developing colon, rectal or colorectal cancer, or advanced colorectal neoplasia at the level of the individual; (iii) provided a measure of relative or absolute risk using a combination of two or more risk factors that allows identification of people at higher risk of colon and/or rectal cancer; and (iv) were applicable to the general population. Studies including only highly selected groups, for example immunosuppressed patients, organ transplant recipients, or those with a previous history of colon and/or rectal cancer, were excluded. Conference proceedings were also excluded after we contacted the authors to confirm that the results had not been published elsewhere in a peer-reviewed journal.
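The selection rule above can be read as a simple conjunction of the four inclusion criteria combined with the two exclusion conditions. The following is only an illustrative sketch of that logic, not the procedure the reviewers actually used (screening was performed by human reviewers); the class name, field names, and function name are hypothetical labels introduced here.

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    # All field names are hypothetical labels for the criteria in the text.
    peer_reviewed_primary_research: bool      # criterion (i)
    individual_level_risk_factors: bool       # criterion (ii)
    n_risk_factors_combined: int              # criterion (iii): two or more
    reports_relative_or_absolute_risk: bool   # criterion (iii)
    applicable_to_general_population: bool    # criterion (iv)
    highly_selected_group: bool = False       # e.g. organ transplant recipients
    conference_proceeding_only: bool = False  # not published in a journal


def is_eligible(study: StudyRecord) -> bool:
    """Return True only if all four criteria hold and no exclusion applies."""
    meets_all_criteria = (
        study.peer_reviewed_primary_research
        and study.individual_level_risk_factors
        and study.reports_relative_or_absolute_risk
        and study.n_risk_factors_combined >= 2
        and study.applicable_to_general_population
    )
    excluded = study.highly_selected_group or study.conference_proceeding_only
    return meets_all_criteria and not excluded
```

For example, a record that meets every criterion but combines only one risk factor would be rejected under criterion (iii).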
One reviewer (J.U.S.) performed the search and screened the titles and abstracts to exclude papers that were clearly not relevant. Two reviewers (F.W. and S.G.) independently assessed a random selection of 5% of the papers each. The full text was examined where a definite decision to reject could not be made based on title and abstract alone. Two reviewers (J.U.S. and F.W./S.G./J.E.) independently assessed all full-text papers, and those deemed not to meet inclusion criteria by both researchers were excluded. Papers for which it was unclear whether the inclusion criteria were met were discussed at consensus meetings. Papers written in languages other than English were translated into English for assessment and subsequent data extraction.
Data extraction and synthesis
Data were extracted independently by two researchers (J.U.S. and F.W./S.G./J.E.) using a standardized form to minimize bias. The form included details on: (i) the development of the model, including potential risks of bias such as the study design, selection of participants, and the variables considered for inclusion in the model and how they were selected; (ii) the risk model itself, including the variables included and requirement for data collection; (iii) the performance of the risk model in the development population; and (iv) any validation studies of the risk model and/or data collection tool, including the study design and performance of the risk model. In this process, the methods of studies published for each risk model were classified according to the TRIPOD guidelines (10) and tabulation of the methods allowed assessment of bias. For studies which included multiple different models, for example separate models for men and women or for self-assessment and physician assessment, all were included separately.
Results
Identified risk models
After duplicates were removed, the search identified 12,808 papers. Of these, 12,727 were excluded at title and abstract level and a further 50 after full-text assessment. After title and abstract screening by the first reviewer (J.U.S.), no additional papers met the inclusion criteria in the random 10% screened by a second reviewer (F.W./S.G.). The most common reasons for exclusion at full-text level were that the papers included symptomatic populations, were conference abstracts or did not include a risk score (Fig. 1). Four were excluded as they included circulating biomarkers that were felt to detect prevalent undiagnosed disease rather than estimate future risk (11–14).
Nine further papers were identified through citation searching, giving 40 papers describing 52 risk models for inclusion in the analysis and six external validation studies (15–20). Table 1 summarizes these 52 risk models. Thirteen have advanced colonic neoplasia (defined as invasive cancer, an adenoma of 10 mm or more, a villous adenoma (at least 25% villous), or an adenoma with high-grade dysplasia) as the outcome (21–32), 13 colon cancer (33–41), 20 colorectal cancer (31, 36, 38, 39, 41–54), and six rectal cancer (37–39). Most include both men and women, but 16 are specific to either men or women. Six include only variables that are available in routine medical records. The majority (n = 32) include variables obtained via a self-completed questionnaire. These range from questionnaires with only one or two simple questions concerning family history (26, 27, 32, 50, 52), diet (44), or physical activity (38) to those covering detailed dietary habits, aspirin/NSAID use, estrogen and HRT use, inflammatory bowel disease, previous colonoscopy or sigmoidoscopy, and polyp history; the most complex includes 15 variables (35). Six, all from the same study, use data from a self-completed questionnaire and results of blood tests for fasting plasma glucose and total cholesterol (39), four a blood test alone for genetic biomarkers (45, 48, 49, 51), and four a self-completed questionnaire and genetic biomarkers (43, 46, 54). Between them, the authors of the 52 risk models considered 87 different risk factors (Table 2).
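As a quick consistency check, the counts reported in this section can be reproduced with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts taken directly from the text of this section.
identified_after_deduplication = 12_808
excluded_at_title_abstract = 12_727
excluded_at_full_text = 50
added_from_citation_search = 9

# Papers that went forward to full-text assessment.
assessed_at_full_text = identified_after_deduplication - excluded_at_title_abstract

# Papers retained from the database search after full-text assessment.
retained_from_search = assessed_at_full_text - excluded_at_full_text

# Total papers included once citation searching is added.
papers_included = retained_from_search + added_from_citation_search

# Number of risk models by outcome, as reported in the text.
models_by_outcome = {
    "advanced colonic neoplasia": 13,
    "colon cancer": 13,
    "colorectal cancer": 20,
    "rectal cancer": 6,
}
total_models = sum(models_by_outcome.values())

print(assessed_at_full_text, retained_from_search, papers_included, total_models)
```

The totals agree with the 40 papers and 52 risk models reported above.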
Author, year | Country | Outcome | Factors included in score | Factors considered but not included | TRIPOD level^a | Data source |
---|---|---|---|---|---|---|
Betes 2003a (21) | Spain | ACN+ | Age, gender, BMI | NSAIDs, nonspecific abdominal pain, bowel habit (1–2 movements/day; diarrhea-alternate; chronic constipation), cholesterol, triglycerides, form of recruitment | 1a | Medical records |
Betes 2003b (21) | Spain | ACN | Age, gender, BMI | NSAIDs, nonspecific abdominal pain, bowel habit (1–2 movements/day; diarrhoea-alternate; chronic constipation), cholesterol, triglycerides, form of recruitment | 1a, 4 | Medical records |
Cai 2012 (22) | China | ACN | Age, gender, smoking, diabetes mellitus, green vegetables, pickled food, fried food, white meat | BMI, hypertension, hypertriglyceridaemia, alcohol intake, calcium or vitamin D supplementation, aspirin or NSAIDs, fresh fruit, eggs, milk, red meat | 2a, 4 | Questionnaire |
Chen 2013 (23) | China | ACN | Age, smoking, alcohol | Gender, history of CVD, egg intake, defecation frequency, education level, hypertension, diabetes, hyperlipidemia, gastric/gallbladder/appendix operations history, aspirin, tea drinking, physical activity, green vegetable/fruit/milk/pickled food/fried or smoked food/bamboo root/red meat/white meat intake | 1b | Medical records |
Chen 2014 (24) | China | ACN | Age, gender, history of CHD, egg intake, defecation frequency | Education level, hypertension, diabetes, hyperlipidemia, gastric/gallbladder/appendix operations history, aspirin use, smoking, alcohol, tea drinking, physical activity, green vegetable/fruit/milk/pickled food/fried or smoked food/bamboo root/red meat intake/white meat intake | 1b | Questionnaire |
Hassan 2013 (25) | Italy | ACN | Age, gender | Family history, BMI | 1b | Medical records |
Kaminski 2014 (26) | Poland | ACN | Age, gender, BMI, smoking, number and age affected of first degree relatives with CRC | Diabetes, regular aspirin use | 2a | Questionnaire |
Lin 2006 (27) | USA | ACN | Age, gender, first degree relative with CRC or second degree relative with adenoma | None | 1a, 4 | Questionnaire |
Lin 2013 (28) | USA | ACN | Age, BMI, smoking, number of first degree relatives with CRC, previous sigmoidoscopy or colonoscopy, polyp history in past 10 years, physical activity, vegetable consumption, NSAID use, estrogen use | None | 1b | Questionnaire |
Stegeman 2013 (30) | Netherlands | ACN | Age, gender, BMI, first degree relative with CRC, menopausal status (women), smoking, sleep, vigorous exercise, alcohol, fiber intake, calcium intake, red meat intake, aspirin/NSAID use | None | 1a | Questionnaire |
Stegeman 2014 (29) | Netherlands | ACN | Age, smoking, first degree relative with CRC, fecal immunochemical test, calcium intake | BMI, menopausal status, aspirin/NSAID use, fiber/red meat intake | 1a | Questionnaire |
Tao 2014a (31) | Germany | ACN | Age, gender, smoking, first-degree relative with CRC, alcohol, previous polyp, red meat consumption, NSAIDS, previous colonoscopy | BMI, physical activity, vegetable/fruit intake, HRT | 3 | Questionnaire |
Yeoh 2011 (32) | Asia | ACN | Age, gender, smoking, first-degree relative with CRC | Alcohol, diabetes | 3 | Questionnaire |
Almurshed 2009 (33) | Saudi Arabia | CC | Region, marital status, education level, employment status, activity level, physical activity, knowledge of high-fiber diet | None | 1a | Questionnaire |
Camp 2002 (34) | USA | CC | Age, BMI, first degree relative with CRC, NSAID use, long term vigorous physical activity, Western diet, folic acid, calcium intake, lutein intake, refined grain intake, Prudent dietary pattern | Sex, hormone replacement therapy, smoking history, calorific intake, dietary fiber, total vegetable/fat intake, glycemic index of intake, mutagen index, alcohol consumption | 1a | Questionnaire |
Colditz 2000 (35) | USA | CC | BMI, first degree relative with CRC, fecal occult blood test or sigmoidoscopy, aspirin, IBD, folate, vegetables, alcohol, height, physical activity, estrogen replacement, fruits, fiber, saturated fat, smoking | None | 1a, 4 | Questionnaire |
Driver 2007a (36) | USA | CC | Age, BMI, history of smoking | Weekly or daily alcohol use, intake of vegetables, intake of multivitamins, vitamin C, vitamin E, intake of cold cereal, physical activity, history of diabetes | 1a | Medical records |
Ma 2010a (38) | Japan | CC | Age, BMI, smoking, alcohol, physical activity | FH CRC, diabetes | 3 | Questionnaire |
Wei E 2009 (40) | USA | CC | Age, BMI, smoking, current or past HRT, height, first degree relative with colon cancer, processed meat consumption, folate intake, physical activity, aspirin use, sigmoidoscopy or colonoscopy during follow up | None | 1a | Questionnaire |
Wei E 2004a (41) | USA | CC | Age, gender, BMI, smoking, alcohol, first-degree relative with colon cancer, physical activity, height, processed meat, servings of beef, pork or lamb, folate intake, calcium intake | None | 1a | Questionnaire |
Shin 2014a (39) | Korea | CC (male) | Age, BMI, family history of cancer, height, fasting serum glucose, total serum cholesterol, alcohol, meat consumption | Smoking, exercise | 3 | Questionnaire and blood test |
Shin 2014d (39) | Korea | CC (female) | Age, family history of cancer, height, fasting serum glucose, meat consumption | BMI, alcohol, smoking, exercise, female reproductive factors | 3 | Questionnaire and blood test |
Freedman 2009b (37) | USA | Distal CC (male) | BMI, number of first-degree relatives with CRC, prior negative sigmoidoscopy/colonoscopy, polyp history, aspirin and NSAID use | FOBT, multivitamin use, red meat/fruit/vegetable intake, alcohol intake, physical activity, smoking, age | 1a, 4^b | Questionnaire |
Freedman 2009e (37) | USA | Distal CC (female) | Age, BMI, number of relatives with CRC, prior negative sigmoidoscopy/colonoscopy, polyp history, aspirin and NSAID use, estrogen use in last 2 years | FOBT, multivitamin use, red meat/fruit/vegetable intake, alcohol intake, physical activity | 1a, 4^b | Questionnaire |
Freedman 2009a (37) | USA | Proximal CC (male) | BMI, smoking, number of first-degree relatives with CRC, prior negative sigmoidoscopy/colonoscopy, polyp history, aspirin and NSAID use, vegetable consumption | FOBT, multivitamin use, red meat/fruit intake, alcohol intake, physical activity, age | 1a, 4^b | Questionnaire |
Freedman 2009d (37) | USA | Proximal CC (female) | Number of first degree relatives with CRC, prior negative sigmoidoscopy/colonoscopy, polyp history, physical activity, aspirin and NSAID use, vegetable consumption, estrogen use in last 2 years | FOBT, multivitamin use, red meat/fruit intake, alcohol intake, BMI, age | 1a, 4^b | Questionnaire |
Bener 2010 (42) | Qatar | CRC | BMI, smoking, family history of CRC, consumption of bakery products, consumption of soft drinks | Smoking of Sheesha, fresh fruit/fresh vegetable/green salad/frozen meat/chicken/fast food/processed food intake, consanguinity | 1a | Questionnaire |
Driver 2007b (36) | USA | CRC | Age, BMI, history of smoking, weekly or daily alcohol use | Intake of vegetables, intake of multivitamins, vitamin C, vitamin E, intake of cold cereal, physical activity, history of diabetes | 1b | Medical records |
Dunlop 2013 (43) | Worldwide | CRC | Age, gender, first-degree relative with CRC, 10 SNPs^c | None | 3 | Questionnaire and blood test for genetics |
Guesmi 2010 (44) | Tunisia | CRC | Age, meat consumption, milk consumption | Gender, anemia, smoking, physical activity, fruit/fried food intake, urban or rural living, olive oil consumption, walking | 1a | Questionnaire |
Han 2008 (45) | Not given | CRC | Five genes^d | Affymetrix U133 Plus 2.0 chip | 3 | Blood test for genetics |
Johnson 2013 (47) | Worldwide | CRC | BMI, smoking, first-degree relative with CRC, physical activity, alcohol, IBD, hormone therapy (current or former), aspirin/NSAIDs, processed meat/red meat/fruit/vegetable intake | None | 1a | Questionnaire |
Lubbe 2012 (48) | UK | CRC | 14 SNPs^e | None | 1a | Blood test for genetics |
Ma 2010c (38) | Japan | CRC | Age, BMI, smoking, physical activity, alcohol | FH CRC, diabetes | 3 | Questionnaire |
Marshall 2010 (49) | Canada and USA | CRC | Seven genes^f | 38 genes | 2b | Blood test for genetics |
Tao 2014b (31) | Germany | CRC | Age, gender, smoking, first-degree relative with CRC, alcohol, previous polyp, red meat consumption, NSAIDS, previous colonoscopy | BMI, physical activity, vegetable/fruit intake, HRT | 3 | Questionnaire |
Taylor 2011 (50) | USA | CRC | Age, first-, second- and third-degree relatives with CRC | None | 1a | Questionnaire |
Wang 2013 (51) | Taiwan | CRC | 16 SNPs^g | 10 additional SNPs | 1b | Blood test for genetics |
Yarnall 2013 (54) | UK data | CRC | BMI, smoking, alcohol, fiber intake, red meat intake, physical activity, 14 SNPs^h | None | 1a | Questionnaire and blood test for genetics |
Wei Y 2009 (52) | China | CRC | BMI, smoking, first- or second-degree relative with CRC, alcohol | None | 1a | Questionnaire |
Shin 2014c (39) | Korea | CRC (male) | Age, BMI, family history of cancer, height, fasting serum glucose, total serum cholesterol, alcohol, meat consumption | Smoking, exercise | 3 | Questionnaire and blood test |
Wells 2014b (53) | California and Hawaii | CRC (male) | Age, BMI, smoking, first-degree relative with CC, race/ethnicity, alcohol, years of education, regular use of aspirin, multivitamins, red meat intake, history of diabetes, physical activity | History of cancer, regular use of NSAIDs, preference for well-done meat | 1b | Questionnaire |
Jo 2012b (46) | Korea | CRC (male) | Three SNPs^i, age, family history of CRC | From 426,019 SNPs | 1b | Questionnaire and blood test for genetics |
Shin 2014f (39) | Korea | CRC (female) | Age, family history of cancer, height, fasting serum glucose, meat consumption | BMI, alcohol, smoking, exercise, female reproductive factors | 3 | Questionnaire and blood test |
Wells 2014a (53) | California and Hawaii | CRC (female) | Age, BMI, smoking, first degree relative with CC, race/ethnicity, alcohol, years of education, regular use of NSAIDs, multivitamins, history of diabetes, use of oestrogen | Preference for well done meat, physical activity, regular use of aspirin, red meat intake, history of cancer | 1b | Questionnaire |
Jo 2012a (46) | Korea | CRC (female) | Age, family history of CRC, five SNPs^j | From 426,019 SNPs | 1b | Questionnaire and blood test for genetics |
Ma 2010b (38) | Japan | Rectal cancer | Age, BMI, physical activity, alcohol | FH CRC, diabetes, smoking | 3 | Questionnaire |
Wei E 2004b (41) | USA | Rectal cancer | Age, BMI, smoking, first-degree relative with rectal cancer, alcohol, physical activity, height, processed meat, gender, servings of beef, pork or lamb, folate intake, calcium intake | None | 1a | Questionnaire |
Freedman 2009c (37) | USA | Rectal cancer (male) | Number of first degree relatives with CRC, prior negative sigmoidoscopy/colonoscopy, polyp history, NSAID use, physical activity | FOBT, multivitamin use, red meat/fruit/vegetable intake, alcohol intake, smoking, BMI, age | 1a, 4^b | Questionnaire |
Shin 2014b (39) | Korea | Rectal cancer (male) | Age, BMI, family history of cancer, height, fasting serum glucose, total serum cholesterol, alcohol, meat consumption | Smoking, exercise | 3 | Questionnaire and blood test |
Freedman 2009f (37) | USA | Rectal cancer (female) | BMI, number of first-degree relatives with CRC, prior negative sigmoidoscopy/colonoscopy, polyp history, physical activity, NSAID use, estrogen use in last 2 years | FOBT, multivitamin use, red meat/fruit/vegetable intake, alcohol intake, age | 1a, 4^b | Questionnaire |
Shin 2014e (39) | Korea | Rectal cancer (female) | Age, family history of cancer, height, fasting serum glucose, meat consumption | BMI, alcohol, smoking, exercise, female reproductive factors | 3 | Questionnaire and blood test |
Abbreviations: ACN+, advanced colorectal neoplasia including moderate dysplasia; ACN, advanced colorectal neoplasias; CC, colon cancer; CRC, colorectal cancer; BMI, body mass index; NSAIDs, nonsteroidal anti-inflammatory drugs; CVD, cardiovascular disease; CHD, coronary heart disease; HRT, hormone replacement therapy; IBD, inflammatory bowel disease; FOBT, fecal occult blood test; FH, family history.
aTypes of prediction model studies for each model defined according to the TRIPOD guidelines. 1a, development only; 1b, development and validation using resampling; 2a, random split-sample development and validation; 2b, nonrandom split-sample development and validation; 3, development and validation using separate data; 4, validation study.
bThe validation was for colon and rectal cancer combined.
crs6983267, rs4779584, rs4939827, rs3802842, rs10795668, rs16892766, rs4444235, rs9929218, rs10411210, rs961253.
dBANK1, B-cell scaffold protein with ankyrin repeats 1; BCNP1, B-cell novel protein 1; CDA, cytidine deaminase; MGC20553, FERM domain containing 3; MS4A, membrane-spanning four domains.
e14 SNPs localizing to 14 chromosome regions—1q41, 3q26.2, 8q23.3, 8q24.21, 10p14, 11q23.1, 12q13.13, 14q22.2, 15q13.3, 16q22.1, 18q21.1, 19q13.11, 20p12.3, 20q13.33.
fANXA3, Annexin A3; CLEC4D, C-type lectin domain family 4, member D; IL2RB, Interleukin 2 receptor, beta; LMNB1, Lamin B1; PRRG4, Proline rich Gla 4; TNFAIP6, Tumor necrosis factor, alpha-induced protein 6; VNN1, Vanin 1.
grs1983891, rs869736, rs3214050, rs10411210, rs3731055, rs231775, rs1412829, rs1572072, rs6983267, rs1799782, rs712221, rs160277, rs11721827, rs2736100, rs3135967, rs1760944.
hrs6691170, rs10936599, rs16892766, rs6983267, rs10795668, rs3802842, rs11169552, rs4444235, rs4779584, rs9929218, rs4939827, rs10411210, rs961253, rs4925386.
irs17391002, rs9549448, rs254833.
jrs10083736, rs16987827, rs8046516, rs9926182, rs175237.
Personal characteristics | Diet |
Age | Fiber intake |
BMI | Meat |
Gender | Red meat |
Consanguinity | Processed meat |
Family history of colorectal cancer | Servings of beef, pork or lamb |
Height | White meat |
Race/ethnicity | Frozen meat/chicken |
Marital status | Preference for well-done meat |
Education level | Vegetables |
Employment status | Fresh vegetables |
Knowledge of high-fiber diet | Green vegetables |
Years of education | Green salad |
Urban or rural living | Fruit |
Personal medical history | Fast food |
Gastric operation history | Processed food |
Gallbladder operation history | Pickled food |
Appendix operations | Fried food |
Hypertension | Smoked food |
Diabetes or history of diabetes | Eggs |
Inflammatory bowel disease | Milk |
History of coronary heart disease | Fat |
History of cardiovascular disease | Saturated fat |
Polyp history | Bakery products |
History of cancer | Refined grain |
Defecation frequency | Tea |
Non-specific abdominal pain | Olive oil |
Female hormonal factors | Soft drinks |
HRT (ever, current or past) | Bamboo root intake |
Estrogen use | Cold cereal |
Menopausal status | Glycemic index of intake |
Age at menarche | Western diet |
Age at first childbirth | Prudent dietary pattern |
Age at menopause | Calorific intake |
Lifestyle | Mutagen indexa |
Smoking (tobacco or Sheesha) | Calcium intake |
Alcohol | Folic acid intake |
Physical activity | Lutein intake |
Sleep | Biomarkers |
Drug and vitamin supplementation | Fasting glucose |
NSAID use | Hyperlipidemia |
Aspirin use | Cholesterol |
Multivitamin use | Triglycerides |
Calcium supplementation | Hemoglobin |
Vitamin D supplementation | Other tests |
Vitamin C supplementation | Fecal immunochemical test |
Vitamin E supplementation | Fecal occult blood test |
Prior sigmoidoscopy or colonoscopy |
aMutagen index considers cooking temperature, frequency of consumption of meats cooked at high temperature and how well-done the meats are.
Development of the risk models
Further details of the development of each model and the risks of bias are given in Supplementary Tables S1a to S1d. Seventeen were developed from case–control studies with cases identified from hospitals or cancer disease registries and controls from primary care (n = 1), hospitals (n = 5), other research studies (n = 2), random-digit dialing (n = 7), spouses (n = 1), and healthy individuals or blood donors (n = 1). Seventeen were developed from cohort studies with between 21,581 and 1,326,058 participants and most identifying cases of cancer through cancer registries over a 10- to 20-year follow-up period. Fourteen were cross-sectional studies of participants attending for screening colonoscopy and all but one had advanced colorectal neoplasia as the outcome. Three risk models were developed from a review of the literature (35), a meta-analysis of risk factors (47) or modeling in a simulated population (54).
Discrimination of the risk models
The performance of 42 of the 52 models was reported in at least one of the following: the development population (n = 31), bootstrapping or cross-validation (n = 13), a subset of the initial development population (n = 3), or an external population (n = 21). Details of the discrimination, calibration, and accuracy are given in Table 3, and details of the methods for those using a subset of the initial population or external populations are given in Supplementary Table S2.
 | | Performance in development population | Performance in bootstrap or cross-validation | Performance in subset of population | Performance in external population | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Author, year | Outcome | Discrimination AUROC (95% CI) | Calibration | Accuracy | Discrimination AUROC (95% CI) | Calibration | Accuracy | Discrimination AUROC (95% CI) | Calibration | Accuracy | Discrimination AUROC (95% CI) | Calibration | Accuracy | External reference |
Betes 2003a (21) | ACN+ | 0.65 | PPV 12.0-50.0 | |||||||||||
Betes 2003b (21) | ACN | 0.67 | PPV 7.3-33.3 | 0.65 (0.61–0.69) | Cai 2012 (22) | |||||||||
0.71 (0.64–0.78) | Chen 2014 (24) | |||||||||||||
Cai 2012 (22) | ACN | 0.74 (0.72–0.77) | Sens 82.8; Spec 50.8 | 0.74 (0.72–0.77) | 0.74 (0.70–0.78) | H-L P = 0.77 | Sens 80.3; Spec 51.2 | 0.65 (0.58–0.72) | Chen 2014 (24) | |||||
Chen 2013 (23) | ACN | 0.65 (0.61–0.69) | H-L P = 0.093 | Sens 65.1; Spec 57.2; PPV 44.4; NPV 75.7 | 0.66 (0.62–0.68) | |||||||||
Chen 2014 (24) | ACN | 0.75 (0.69–0.82) | H-L P = 0.205 | Sens 93.8; Spec 47.6; PPV 9.1; NPV 99.3 | 0.75 (0.70–0.82) | |||||||||
Hassan 2013 (25) | ACN | H-L P = 0.30 | ||||||||||||
Kaminski 2014 (26) | ACN | 0.64a | H-L P = 0.74a | 0.62 (0.60–0.64) | E/O ratio 1 (0.95–1.06). H-L P = 0.16a | Sens 92.4, Spec 13.9, PPV 7.55, NPV 96.0; | ||||||||
Lin 2006 (27) | ACN | 0.65 (0.61–0.70) | Cai 2012 (22) | |||||||||||
0.71 (0.64–0.77) | Chen 2014 (24) | |||||||||||||
Lin 2013 (28) | ACN | Men 0.61 (0.58–0.65), Women 0.62 (0.58–0.66) | ||||||||||||
Stegeman 2014 (29) | ACN | 0.76 | H-L P = 0.94 | Sens 40; Spec 93 | ||||||||||
Tao 2014a (31) | ACN | 0.67 (0.65–0.69) | H-L P = 0.21 | 0.66 (0.63–0.69) | H-L P = 0.65 | Tao 2014 (31) | ||||||||
Yeoh 2011 (32) | ACN | 0.66 (0.62–0.70) | H-L P = 0.29 | 0.64 (0.60–0.68) | H-L P = 0.49 | |||||||||
Colditz 2000 (34) | CC | Women 0.67 (0.64–0.70); Men 0.71 (0.68–0.74)b | Kim 2004 (17) | |||||||||||
0.6 | Schroy 2012 (19) | |||||||||||||
Driver 2007a (36) | CC | 0.72 | H-L P = 0.43 | |||||||||||
Ma 2010a (38) | CC | 0.71 (0.68–0.74) | 0.66 (0.62–0.70) | χ2 P = 0.20; E/O 1.19 (1.03–1.37) | Ma 2010 (38) | |||||||||
Wei E 2009 (40) | CC | 0.61 (0.59–0.63) | ||||||||||||
Shin 2014a (39) | CC (male) | 0.77 (0.76–0.78) | χ2 P = 0.22 | 0.77 (0.75–0.79) | χ2 P = 0.029 | |||||||||
Shin 2014d (39) | CC (female) | 0.71 (0.69–0.73) | χ2 P = 0.73 | 0.72 (0.70–0.74) | χ2 P = 0.49 | |||||||||
Driver 2007b (36) | CRC | 0.70 | H-L P = 0.91 | 0.69 | ||||||||||
Dunlop 2013 (43) | CRC | 0.59 | PPV 0.71; NPV 0.51 | 0.57 | Dunlop 2013 (43) | |||||||||
Han 2008 (45) | CRC | 0.88 (0.81–0.94) | Sens 94; Spec 77 PPV 82, NPV 92 | 79% (71.5–86.5) | Sens 88; Spec 64. PPV 67; NPV 87 | |||||||||
Ma 2010c (38) | CRC | 0.70 (0.68–0.72) | 0.64 (0.61–0.67) | χ2 P = 0.08; E/O 1.09 (0.98–1.23) | Ma 2010 (38) | |||||||||
Marshall 2010 (49) | CRC | 0.80 (0.74–0.85) | Sens 82; Spec 64; PPV 68, NPV 79 | 0.80 (0.76–0.84) | Sens 72; Spec 70; PPV 70, NPV 72 | 0.76 (0.70–0.82) | Sens 71.7; Spec 71.2 | Yip 2010 (20) | ||||||
Tao 2014b (31) | CRC | 0.71 (0.67–0.75) | 0.68 (0.57–0.79) | Tao 2014 (31) | ||||||||||
Taylor 2011 (50) | CRC | 0.67 | ||||||||||||
Wang 2013 (51) | CRC | 0.77 | 0.72 | |||||||||||
Yarnall 2013 (54) | CRC | 0.63 | ||||||||||||
Freedman 2009a,b,c (37) | CRC (male) | 0.61 (0.60–0.62) | E/O ratio 0.99 (0.96–1.04) | Park 2008 (18) | ||||||||||
Jo 2012b (46) | CRC (male) | 0.73 (0.68–0.77) | 0.70 (0.65–0.74) | |||||||||||
Shin 2014c (39) | CRC (male) | 0.76 (0.76–0.77) | χ2 P = 0.1035 | 0.78 (0.77–0.79) | χ2 P = 0.0003 | |||||||||
Wells 2014b (53) | CRC (male) | 0.69 | 0.68 (0.67–0.69) | |||||||||||
Freedman 2009d,e,f (37) | CRC (female) | 0.61 (0.59–0.62) | E/O ratio 1.05 (0.98–1.11) | Park 2008 (18) | ||||||||||
Jo 2012a (46) | CRC (female) | 0.65 (0.62–0.68) | 0.60 (0.56–0.64) | |||||||||||
Shin 2014f (39) | CRC (female) | 0.71 (0.70–0.72) | χ2 P = 0.6123 | 0.73 (0.71–0.74) | χ2 P = 0.1569 | |||||||||
Wells 2014a (53) | CRC (female) | 0.69 | 0.68 (0.67–0.69) | |||||||||||
Ma 2010b (38) | Rectal cancer | 0.68 (0.64–0.71) | 0.62 (0.57–0.66) | χ2 P = 0.19; E/O 0.94 (0.78–1.12) | Ma 2010 (38) | |||||||||
Shin 2014b (39) | Rectal cancer (male) | 0.75 (0.74–0.76) | χ2 P = 0.29 | 0.78 (0.77–0.79) | χ2 P = 0.0003 | |||||||||
Shin 2014e (39) | Rectal cancer (female) | 0.70 (0.68–0.71) | χ2 P = 0.084 | 0.72 (0.70–0.74) | χ2 P = 0.198 |
Abbreviations: ACN+, advanced colorectal neoplasia including moderate dysplasia; ACN, advanced colorectal neoplasia defined as invasive cancer, an adenoma 10 mm or more, a villous adenoma (at least 25% villous) or an adenoma with high-grade dysplasia; CC, colon cancer; CRC, colorectal cancer; AUROC, area under the receiver operating characteristic curve. Values given as mean and 95% confidence intervals; Sens, sensitivity; Spec, specificity; PPV, positive predictive value; NPV, negative predictive value; H-L, Hosmer–Lemeshow test; E/O, expected over observed ratio.
aThese values are from the model prior to conversion of the coefficients to scores (Kaminski).
bRemoved aspirin use from men and history of chronic IBD from both genders as not available so actually not validating original score.
Discrimination, as measured by the area under the receiver operating characteristic curve (AUROC), was reported for 37 of the risk models, and these values are summarized in Fig. 2 in which the models are grouped into five groups according to the type of variables included (routine data only, self-completed questionnaire, self-completed questionnaire and nongenetic biomarkers, genetic biomarkers, and self-completed questionnaires plus genetic biomarkers). Within each group the models are ordered according to the number of variables included. The models on the left are, therefore, those with the fewest and most easily obtained variables and the more complex models are towards the right of the figure. Most models have acceptable to good discrimination with AUROCs between 0.65 and 0.75.
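As a reminder of what the AUROC values summarized above measure, the following is a minimal sketch, not taken from any of the included studies. It computes the AUROC as the proportion of case/non-case pairs in which the case receives the higher predicted risk, with ties counted as half. The function name `auroc` and the example risk scores and outcomes are hypothetical.

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen case has a higher
    score than a randomly chosen non-case (ties count as one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                concordant += 1.0
            elif case_score == non_case_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical predicted risks and outcomes (1 = developed disease).
risk_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]
example_auroc = auroc(risk_scores, outcomes)
```

On this reading, a value of 0.5 corresponds to a model that ranks cases and non-cases no better than chance, and a value of 1.0 to a model that ranks every case above every non-case.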
Among those models including only routinely available data, the best performing and validated model for advanced colorectal neoplasia was developed by Betes and colleagues among 2,210 asymptomatic individuals attending routine colorectal cancer screening in Spain (21). It includes only age, gender, and BMI and has AUROCs of 0.65 (95% CI, 0.61–0.69) and 0.71 (95% CI, 0.64–0.78) in external validation studies in China (22, 24). The only risk scores using routine data for colon cancer and colorectal cancer were developed by Driver and colleagues from a cohort of 21,581 men in the United States (36). The score for colon cancer includes age, BMI, and history of smoking and has an AUROC of 0.72 in that population and the score for colorectal cancer includes those variables plus alcohol consumption and has similar discrimination in bootstrap analysis (AUROC = 0.69; ref. 36).
The second group of risk models used self-completed questionnaires and, as illustrated by the absence of any clear trend in the AUROC within that group in Fig. 2, there is no clear improvement in discrimination as increasing numbers of variables are added from self-completed questionnaires to routine data. There is a suggestion from the third group of risk models that adding fasting serum glucose and total cholesterol to self-completed questionnaire variables might improve the discrimination in the scores developed by Shin and colleagues using a South Korean population of men (39), but this same improvement above other risk models containing only routine or questionnaire data was not seen in women.
The two models with the highest discrimination are both in the group based entirely on genetic biomarkers and were developed from small case–control studies. The model by Han and colleagues (2008) includes five genes (BANK1, B-cell scaffold protein with ankyrin repeats 1; BCNP1, B-cell novel protein 1; CDA, cytidine deaminase; MGC20553, FERM domain containing 3; MS4A, membrane-spanning 4 domains) identified from a case–control study including 58 patients with colorectal cancer and 57 disease-free controls using hierarchical cluster analysis and logistic regression (45). In that development population, the biomarker panel has an AUROC of 0.88 (95% CI, 0.81–0.94). It has yet to be externally validated. The model developed by Marshall and colleagues (2010) includes seven genes (ANXA3, Annexin A3; CLEC4D, C-type lectin domain family 4, member D; IL2RB, interleukin 2 receptor, beta; LMNB1, Lamin B1; PRRG4, proline rich Gla 4; TNFAIP6, tumor necrosis factor, alpha-induced protein 6; VNN1, Vanin 1) similarly identified from a case–control study with 112 patients with colorectal cancer and 120 disease-free controls from hospitals in Canada and the United States (49). In that population, the model has an AUROC of 0.80 (95% CI, 0.74–0.85) and in a separate sample of 99 patients with colorectal cancer and 111 controls in Malaysia the AUROC was reported as 0.76 (95% CI, 0.70–0.82). The third risk model based entirely on genetic biomarkers also has acceptable discrimination. It was developed by Wang and colleagues in Taiwan, again from a case–control study, and includes 16 SNPs from a GWAS in Asian people (51). It has an AUROC of 0.77 in the development population and 0.72 in cross-validation.
The final group of four risk models, which include both genetic biomarkers and phenotypic variables, does not, however, have such good discrimination, and adding variable numbers of different SNPs to data available from self-completed questionnaires does not appear to improve discrimination. The addition of 10 SNPs to age, gender, and family history (43) or three SNPs to age and family history (46) in case–control studies does not improve discrimination over age, gender, and family history alone [ref. 50; AUROC 0.57 and 0.73 (95% CI, 0.68–0.77) (male), 0.65 (95% CI, 0.62–0.68) (female) compared to 0.67]. A model with 14 SNPs added to BMI, smoking, alcohol, fiber intake, red meat intake, and physical activity (54) has an AUROC of 0.63 in a simulated population, which is no better than those models using only routinely available data in cross-sectional or cohort studies.
Calibration of the risk models
Calibration was reported for 21 of the 52 models. In most cases it was reported as the Hosmer–Lemeshow statistic (n = 9) or chi-squared test (n = 6) with P-values ranging from P = 0.0003 to 0.94. Where expected:observed ratios were given the confidence intervals all cross one except for the model by Ma and colleagues (2010) for colon cancer where it is 1.19 (95% CI, 1.03–1.37; ref. 38).
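The expected:observed (E/O) ratio described above can be illustrated with a short sketch. It assumes, as one common approximation, that the observed number of cases follows a Poisson distribution, so that the 95% confidence interval is E/O × exp(±1.96 × √(1/O)); the included studies may have used other methods. The function name and the counts are hypothetical and are not taken from any study in this review.

```python
import math


def expected_observed_ratio(expected, observed, z=1.96):
    """Return the E/O ratio with an approximate confidence interval.

    Assumption: the observed count is Poisson distributed, so the
    standard error of log(E/O) is sqrt(1 / observed).
    """
    if expected <= 0 or observed <= 0:
        raise ValueError("expected and observed counts must be positive")
    ratio = expected / observed
    half_width = z * math.sqrt(1.0 / observed)
    lower = ratio * math.exp(-half_width)
    upper = ratio * math.exp(half_width)
    return ratio, lower, upper


# Hypothetical validation cohort: the model predicts 119 cases, 100 occur.
ratio, lower, upper = expected_observed_ratio(expected=119.0, observed=100)

# An interval that includes 1 is compatible with good overall calibration.
crosses_one = lower <= 1.0 <= upper
```

A ratio above one means the model predicts more cases than were observed (overestimation), and a ratio below one means it predicts fewer (underestimation).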
Sensitivity and specificity of the risk models
Sensitivity and specificity were reported for only seven models. Two of these were the genetic models developed by Han and Marshall, which have sensitivities of 88% and 71.7% and specificities of 64% and 71.2% in external populations, respectively (45, 49). The other five were all risk models for advanced colorectal neoplasia and range from high sensitivity (92.4%) and low specificity (13.9%) in Kaminski (26) to low sensitivity (40%) and high specificity (93%) in Stegeman (29).
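When comparing the accuracy measures in Table 3, it may help to recall that PPV and NPV depend on how common the outcome is in the population studied, whereas sensitivity and specificity do not. The sketch below applies Bayes' theorem to show this; the function name and all input values are hypothetical and are not drawn from any included study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test with the given sensitivity and
    specificity applied in a population with the given prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive values are undefined for these inputs")

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test (90% sensitivity, 50% specificity) applied at
# two hypothetical prevalences of advanced neoplasia.
ppv_low, npv_low = predictive_values(0.9, 0.5, prevalence=0.05)
ppv_high, npv_high = predictive_values(0.9, 0.5, prevalence=0.20)
```

With these inputs the PPV is higher in the higher-prevalence population even though sensitivity and specificity are unchanged, which is one reason PPV and NPV from case–control samples should be interpreted with caution.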
Comparison of different outcomes
Five studies (31, 36–39) simultaneously developed risk models for more than one of advanced colorectal neoplasia, colorectal cancer, colon cancer, and rectal cancer. All showed that beta-coefficients and included variables differed slightly between different sites but only two provided any comparative data. Tao reported the performance of the same model for predicting advanced colorectal neoplasia or colorectal cancer and showed that the discrimination was similar (AUROC 0.68 for advanced colorectal neoplasia and 0.66 for colorectal cancer; ref. 31). Driver showed that the AUROC of a predictive model developed for colon cancer was only slightly superior to the model developed for colorectal cancer when predicting colon cancer risk (0.717 vs. 0.695), but the goodness-of-fit test showed it to perform less well than the colorectal cancer model (Hosmer–Lemeshow statistic 0.43 vs. 0.91; ref. 36).
Discussion
Principal findings
To our knowledge this is the first comprehensive systematic review of risk prediction models for colorectal cancer. It shows that multiple risk models exist for predicting the risk of developing colorectal cancer, colon cancer, rectal cancer, or advanced colorectal neoplasia in asymptomatic populations, and that they have the potential to identify individuals at high risk of disease. The discrimination of the models, as measured by AUROC, compares favorably with risk models used for other cancers, including breast cancer (55) and melanoma (56), and several include only variables recorded in routine medical records and so could be implemented into clinical practice without the need for further data collection. Grouping risk models according to type and number of variables included (Fig. 2) also shows that there is no clear improvement in discrimination as increasing numbers of variables are added from self-completed questionnaires to routine data, or in studies in which genetic biomarkers are added to data from self-completed questionnaires. A small number of risk models developed from case–control studies of genetic biomarkers alone show some promise but require further external validation in population-based samples.
Strengths and weaknesses
The main strength of this review is our use of a broad search strategy and careful screening of possible papers for inclusion. Although we cannot exclude publication bias or the possibility that there are other risk models that we did not identify, using this systematic approach enabled us to identify over three times as many risk models as reported in previous reviews in this area (8, 9). This review is, therefore, the most comprehensive to date and the inclusion of less well cited risk models allows us to demonstrate for the first time the relative performance of simple and more complex models. However, as we included only those risk models applicable to asymptomatic individuals from the general population, these models are not applicable to those with familial syndromes, such as Lynch syndrome or familial adenomatous polyposis, or those with existing cancer. Most of the risk models were developed from predominantly white populations in Europe or America or Asian populations in China, Japan, Taiwan, and Korea, with only two from Arabic countries and none from Australasia. There is a well-recognized high degree of heterogeneity by nationality in colorectal cancer incidence with an up to 10-fold difference internationally (57). Much of this variation is thought to be due to differences in environmental risk factors as the incidence rate of colorectal cancer in migrants approaches that of the host country within one or two generations (58). The risk models in this review may, therefore, be less applicable to these less well represented populations.
Implications for clinicians and policy makers
There is now substantial evidence that the incidence of, and mortality from, colorectal cancer can be reduced by screening with fecal occult blood testing (59–61), flexible sigmoidoscopy (62, 63), or colonoscopy (64–66), and multiple economic analyses support the cost-effectiveness of population-based colorectal cancer screening (67–69). This review shows that risk models exist that have the potential to stratify the general population into risk categories and allow screening and preventive strategies to be targeted at those most likely to benefit while leaving those at low risk of disease unexposed to direct and indirect harms of screening programmes. This might improve the cost-effectiveness of colorectal cancer screening (70) and would address concerns about demand and capacity for colonoscopy (71, 72). It would also provide an opportunity to implement potential chemo-preventive medications such as nonsteroidal anti-inflammatory drugs. These drugs are currently not recommended for asymptomatic adults at average risk for colorectal cancer (73), but both the United States Preventive Services Task Force (74) and a recent international consensus panel (75) advocate additional research into the use of aspirin in high-risk individuals for whom the benefits might outweigh the harms. The use of risk prediction models would also potentially increase uptake of screening and provide an opportunity to give information to encourage lifestyle changes. Despite the known mortality benefit of colorectal cancer screening, large numbers of eligible people do not participate in colorectal cancer screening programs (76, 77). Although the reasons for nonparticipation are complex, several studies have suggested that high-risk individuals are more likely to be up-to-date with colorectal cancer screening and adhere to physician recommendations (77–80). 
Knowledge of their risk, both within and outside screening programmes, may also encourage adoption of more healthy lifestyles which might further improve outcomes: it is estimated that between 30% and 70% of the overall burden of colon cancers in the U.S. and U.K. populations could be prevented through moderate changes in diet and lifestyle (81, 82), and information about individualized colon cancer risk has been shown to lead to a reduction in multiple behavioral risk factors in patients with a history of colon adenoma (83).
Several barriers, however, exist to the incorporation of risk prediction models into clinical practice. The main one is the practical challenge of collecting the necessary risk factor information. Many of the included risk scores used data collected from food frequency questionnaires. Although this allows accurate estimates for research, their application is unlikely to be practical at the population level. Similarly, risk scores including genetic biomarkers require sample collection and processing and some means of feeding back results to individuals. Although from Fig. 2 it appears as if the two models with the highest reported discrimination were both based on genes, these were developed in small case–control studies, which will tend to overestimate performance in the general population. Several risk models including genetic biomarkers also performed no better than those based on routine information and GWA studies of colorectal cancer have shown that the colorectal cancer risks associated with each of the variants are at best modest (relative risks of 1.1–1.3), with the distribution of risk alleles following a normal distribution in both colorectal cancer cases and controls (84). As our understanding of these genetic biomarkers increases, and point-of-care genetic profiling becomes more widely available, more accurate models incorporating genomic markers will become easier to implement. A risk model that is able to predict colorectal cancer, colon cancer, rectal cancer, and advanced colorectal neoplasia would also clearly have more utility in the clinical setting than separate models for each and this review also shows that to be possible: where studies developed separate risk models for different sites, the final models did include different variables, but these differences tended to be small and the performance of the models similar (31, 36).
Unanswered questions and future research
Although the potential clinical and economic benefits of successfully integrating a risk prediction model for colorectal cancer into clinical practice could be substantial, it remains to be defined what role the currently available and emerging models can have in practice, and a number of steps are required to establish a viable, usable risk profile. First, this review provides comparative data on the performance of existing risk models but ideally the choice of risk model for each country would be based on validation studies in each population of interest (10). Further studies are therefore needed to compare the performance of these risk models, including those for different sites, simultaneously in large cohorts. This is particularly the case for those risk models incorporating genetic biomarkers which have mostly been developed using small case–control studies and which may perform substantially less well in population-based studies. Second, research is needed to establish the optimal implementation strategies. This includes consensus meetings with expert groups and modeling studies comparing the impact on morbidity, mortality, and cost-effectiveness of different implementation strategies with that of current programmes based on age alone. Third, qualitative research with members of the public and practitioners is needed to determine how best to communicate the risk output and to assess the feasibility and acceptability of any risk-based programme. Finally, before any risk model is introduced into routine clinical practice, implementation studies, ideally randomized controlled trials, are needed to assess the benefits and potential adverse consequences of applying these models in practice.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: J.A. Usher-Smith, F.M. Walter, J. Emery, A.K. Win, S.J. Griffin
Development of methodology: J.A. Usher-Smith, F.M. Walter, J. Emery, S.J. Griffin
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J.A. Usher-Smith, F.M. Walter
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J.A. Usher-Smith, F.M. Walter, J. Emery, A.K. Win, S.J. Griffin
Writing, review, and/or revision of the manuscript: J.A. Usher-Smith, F.M. Walter, J. Emery, A.K. Win, S.J. Griffin
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J.A. Usher-Smith
Study supervision: S.J. Griffin
Acknowledgments
The authors thank Isla Kuhn, Reader Services Librarian, University of Cambridge Medical Library, for her help developing the search strategy.
Grant Support
J.A. Usher-Smith is funded by a National Institute for Health Research (NIHR) Clinical Lectureship and F.M. Walter by an NIHR Clinician Scientist award. J.D. Emery is funded by an Australian National Health and Medical Research Council (NHMRC) Practitioner Fellowship. A.K. Win has an NHMRC Early Career Fellowship. The views expressed in this publication are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. All researchers were independent of the funding body, and the study sponsors and funder had no role in study design; data collection, analysis, and interpretation of data; in the writing of the report; or decision to submit the article for publication.