Abstract
Most risk models for cancer are either specific to individual cancers or include complex or predominantly non-modifiable risk factors.
We developed lifestyle-based models for the five cancers for which the most cases are potentially preventable through lifestyle change in the UK (lung, colorectal, bladder, kidney, and esophageal for men and breast, lung, colorectal, endometrial, and kidney for women). We selected lifestyle risk factors from the European Code against Cancer and obtained estimates of relative risks from meta-analyses of observational studies. We used mean values for risk factors from nationally representative samples and mean 10-year estimated absolute risks from routinely available sources. We then assessed the performance of the models in 23,768 participants in the EPIC-Norfolk cohort who had no history of the five selected cancers at baseline.
In men, the combined risk model showed good discrimination [AUC, 0.71; 95% confidence interval (CI), 0.69–0.73] and calibration. Discrimination was lower in women (AUC, 0.59; 95% CI, 0.57–0.61), but calibration was good. In both sexes, the individual models for lung cancer had the highest AUCs (0.83; 95% CI, 0.80–0.85 for men and 0.82; 95% CI, 0.76–0.87 for women). The lowest AUCs were for breast cancer in women and kidney cancer in men.
The discrimination and calibration of the models are both reasonable, with the discrimination for individual cancers comparable with or better than that of many other published risk models.
These models could be used to demonstrate the potential impact of lifestyle change on risk of cancer to promote behavior change.
This article is featured in Highlights of This Issue, p. 1
Introduction
Previous research has shown that providing cancer risk information to individuals can improve accuracy of risk perception (1–3), enhance response efficacy (4), and increase intention to have cancer screening (5, 6). In addition, in the only trial assessing the impact of cancer risk tools in primary care on lifestyle behaviors, participants in the intervention group were significantly more likely to report increased daily fruit and vegetable intake and physical activity after 6 months (7).
Providing individualized estimates of risk of cancer in primary care settings, alongside demonstration of the impact of lifestyle change on that risk, may therefore help motivate change among individuals and complement wider collective approaches to shifting population distributions of behavior and risk factors. Studies of healthcare professionals and members of the public in the UK have shown that both groups support provision of cancer risk information in primary care (8, 9). However, in order to successfully incorporate such risk information into practice, there is a need for risk algorithms which include modifiable risk factors that are either routinely available in electronic medical records or can easily be obtained at new patient registration or within consultations.
The incidence of individual cancers is also low compared with other conditions, such as cardiovascular disease. Consequently, in one study, the first reaction of almost all participants to being presented with their 10-year absolute risk of an individual cancer was that it was low and not concerning (9). Providing context for the risk estimates through comparison with other people was also needed. There is, therefore, a need for models that can estimate an individual's combined absolute risk of a number of cancers based on current values of lifestyle risk factors, and which can be used to calculate relative risk comparing current values with either average or recommended values of the risk factors.
A number of risk models for cancer already exist. However, most are specific to individual cancers (10–13), and although two collections of models exist for multiple cancers, the QCancer10 models (14) and the Disease Risk Index (15), to our knowledge, no models that predict risk of multiple cancers together have been published. In addition, the risk models for individual cancers often include multiple complex risk factors, such as breast density, exposure to asbestos, or a past history of colorectal adenomas, or include few modifiable risk factors. Of the 17 models for breast cancer identified in one systematic review (10), for example, fewer than half included body mass index (BMI) and only one included physical activity.
We therefore aimed to develop and validate lifestyle-based prediction models for the five most common preventable cancers in men and women.
Materials and Methods
We developed and validated risk prediction models in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis guidelines (16).
Cancer outcomes
We included as outcomes the five cancers for which the most cases are potentially preventable through lifestyle change in the UK for men and women from Cancer Research UK data (http://www.cancerresearchuk.org/health-professional/cancer-statistics/risk/preventable-cancers). We excluded melanoma as the evidence to date suggests that risk is influenced by exposure to sun as a child rather than sun protection habits in adulthood (17). For men, these are lung, colorectal, bladder, kidney, and esophageal cancer. Together, they account for 38% of cancer cases among men in the UK, and across the five cancers, 45,000 cases are estimated to have been preventable in 2011, 25% of all cancer cases in men. For women, these are breast, lung, colorectal, endometrial, and kidney cancer. Together, they account for 61% of cancer cases among women in the UK, and across the five cancers, 48,000 cases are estimated to have been preventable in 2011, 28% of all cancer cases in women.
Development of risk models
Risk factors.
We selected lifestyle risk factors for each of these cancers based on the European Code against Cancer 4th Edition (18–22). For each cancer, we included all risk factors considered convincing or probable by the panels of experts cited within the European Code against Cancer 4th Edition, except for: dietary fiber for colorectal cancer, in view of the difficulty of obtaining reliable self-report measures for fiber intake using a single question; radon levels for lung cancer, as this may not always be under the control of the individual; breastfeeding for breast cancer, as in most cases this will not be modifiable for the target age range for these risk models (40–79 years); and hormone replacement therapy (HRT) for breast and endometrial cancer, because the relationship between HRT use and cancer depends on multiple factors, including type of HRT, route of administration (23), and age at initiation, and because the decision to take HRT is complex, requires a balance of risks and benefits, and should not be made on the basis of future cancer risk alone (24). We then obtained estimates of the association of each lifestyle factor with each of the relevant cancers from published meta-analyses of observational studies (25–36). For all associations except alcohol with colorectal cancer, we assumed a log-linear relationship between exposure and risk. To incorporate the increasing evidence that BMI is associated with increased risk of postmenopausal breast cancer but inversely associated with breast cancer before the menopause (30), we included different estimates of the association between BMI and breast cancer in those <50 years of age and those ≥50 years. No other significant interactions between the risk factors included for the chosen cancers have been reported (30).
Estimates of average values of risk factors.
We estimated average population values of each risk factor in 10-year age groups (40–49, 50–59, 60–69, and 70–79 years) in men and women using nationally representative population surveys. As the latent period between “exposure” to the lifestyle factors and the subsequent increase in cancer risk is not known, we assumed this would be, on average, 10 years as in previous publications (26, 37–41). We, therefore, used the Health Survey for England (HSE) 2005 (https://digital.nhs.uk/data-and-information/publications/statistical/health-survey-for-england) to obtain population average values for BMI, smoking status, and fruit and vegetable consumption and data from the National Diet and Nutrition survey (NDNS) years 1 to 4 (2008–2012; https://discover.ukdataservice.ac.uk/catalogue/?sn=6533), the closest years to 2005 that included adults up to age 79, for red and processed meat consumption, alcohol intake, and physical activity.
The sampling design and methods of both datasets are described in detail alongside the data (see https://discover.ukdataservice.ac.uk/catalogue/?sn=6533). In brief, the HSE is an annual survey designed to measure health and health-related behaviors in a nationally representative sample of adults and children living in private households in England. In 2005, it also included an additional nationally representative general population sample of adults ages 65 years and over. For this analysis, we used data on self-reported smoking status, BMI, and portions of fruit and vegetables consumed per day. Each portion of fruit or vegetables was considered 80 g based on the British Dietetic Association portion sizes food factsheet (available from: https://www.bda.uk.com/foodfacts/portionsizesfoodfactsheet.pdf), and all analyses were performed with weighting to adjust for non-contact and non-response at both household and individual levels.
The NDNS is an annual cross-sectional survey that began in 2008 and covers a representative sample of approximately 1,000 people living in private households in the UK per year. For this analysis, red meat, processed meat, and alcohol consumption (g/day) were obtained from responses to a self-completed food diary. Eight grams of alcohol were estimated as one unit. Physical activity in metabolic equivalents (MET) was derived from time spent at moderate or vigorous physical activity, calculated from the recent physical activity questionnaire (42). Ten minutes of activity was considered 1 MET (43). As for the HSE, all analyses were performed with weighting to adjust for non-contact and non-response.
Estimates of relative risk comparing observed with average values of all the risk factors for separate cancers.
For each of the five cancers, we used the estimates of associations between each risk factor and each cancer to create a formula for the relative risk comparing observed with average values of all the risk factors. For continuous risk factors, the relative risk was given by:
$$\text{Relative risk}(x) = (\text{risk per unit})^{(\text{observed value} - \text{average value})}$$
For smoking, the other categorical risk factor, the relative risk was given by:
Relative risk (smoking) = (risk for current smokers × proportion of population who are current smokers) + (risk for ex-smokers × proportion of population who are ex-smokers) + (risk for nonsmokers × proportion of population who are nonsmokers).
Risk factors were assumed to act multiplicatively. The risk of developing endometrial cancer relative to a person with average values of all the risk factors, for example, was calculated by:
$$\text{Relative risk} = (\text{risk per kg/m}^{2})^{(\text{observed BMI} - \text{average BMI})} \times (\text{risk per MET})^{(\text{observed METs} - \text{average METs})}$$
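The formulas above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the per-unit relative risks are taken from Table 1, but the population averages and smoking proportions are hypothetical placeholders (the real values are in Supplementary Table S3, which is not reproduced here). The final step, dividing an individual's smoking risk by the population-weighted risk, is our assumption about how the weighted sum is used.

```python
# Per-unit relative risks for endometrial cancer, copied from Table 1.
RR_PER_BMI_UNIT = 1.034   # per 1 kg/m^2 of BMI
RR_PER_MET_HOUR = 0.9933  # per 1 MET-hour per week of physical activity


def continuous_rr(rr_per_unit, observed, average):
    """Relative risk versus the population average for a log-linear factor."""
    return rr_per_unit ** (observed - average)


def endometrial_rr(bmi, met_hours, avg_bmi, avg_met_hours):
    """Multiply the factor-specific relative risks (multiplicative assumption)."""
    return (continuous_rr(RR_PER_BMI_UNIT, bmi, avg_bmi)
            * continuous_rr(RR_PER_MET_HOUR, met_hours, avg_met_hours))


def average_smoking_risk(rr_by_status, proportion_by_status):
    """Population-weighted smoking risk, following the smoking formula above."""
    return sum(rr_by_status[s] * proportion_by_status[s] for s in rr_by_status)


# HYPOTHETICAL population averages and smoking proportions (not from the paper).
AVG_BMI, AVG_MET_HOURS = 27.0, 10.0
PROPORTIONS = {"current": 0.2, "former": 0.4, "never": 0.4}
# Lung cancer relative risks from Table 1; never smokers are the reference (1.0).
LUNG_RR = {"current": 8.96, "former": 3.85, "never": 1.0}

rr_example = endometrial_rr(30.0, 5.0, AVG_BMI, AVG_MET_HOURS)
avg_smoking = average_smoking_risk(LUNG_RR, PROPORTIONS)
# ASSUMPTION: an individual's smoking relative risk versus the average person is
# the risk for their own status divided by the population-weighted risk.
rr_current_vs_average = LUNG_RR["current"] / avg_smoking
print(round(rr_example, 3), round(avg_smoking, 3), round(rr_current_vs_average, 3))
```

A person whose values equal the population averages gets a relative risk of exactly 1 for every continuous factor, which is a convenient sanity check on any implementation.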
Estimates of average 10-year absolute risk for separate cancers.
We then calculated estimates of average 10-year risk for each cancer in men and women in England in the same 10-year age groups from 40 to 79 years using the “Current Probability” method (44). This uses a life-table approach for calculating the risk of developing cancer and takes into account the probability of death from other causes. To obtain the data required for the current probability calculations, we used the age- and sex-specific cancer incidence and mortality rates, age- and sex-specific all-cause mortality, and mid-year population size in England during 2015 reported by the Office for National Statistics (ONS) (https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/cancerregistrationstatisticscancerregistrationstatisticsengland and https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationestimatesanalysistool).
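The idea of a life-table calculation that allows for death from other causes can be illustrated with a deliberately simplified sketch. It is not the exact "Current Probability" procedure of reference (44): it assumes annual probabilities (rather than rates in 5-year bands), treats a cancer diagnosis and death from other causes as the only ways of leaving the at-risk group, and uses hypothetical input values.

```python
def ten_year_risk(annual_incidence, annual_other_mortality):
    """Simplified life-table estimate of the risk of developing a cancer.

    Both arguments are sequences of annual probabilities, one entry per year
    of age in the 10-year window. Each year, only people who are still alive
    and cancer-free can become new cases.
    """
    at_risk = 1.0   # proportion alive and cancer-free at the start of the window
    risk = 0.0
    for incidence, mortality in zip(annual_incidence, annual_other_mortality):
        risk += at_risk * incidence            # new cases this year
        at_risk *= 1.0 - incidence - mortality  # remove cases and other deaths
    return risk


# HYPOTHETICAL inputs: 1% annual incidence, with and without 2% annual
# mortality from other causes.
no_competing = ten_year_risk([0.01] * 10, [0.0] * 10)
with_competing = ten_year_risk([0.01] * 10, [0.02] * 10)
print(no_competing, with_competing)
```

With no competing mortality the result reduces to one minus the probability of staying cancer-free for all 10 years; adding other-cause mortality lowers the estimate because fewer people remain at risk.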
Estimates of 10-year absolute risk with observed values for separate cancers.
To estimate absolute cancer risk over the following 10 years for each separate cancer for an individual with observed values of the risk factors, we multiplied the estimates derived above of the relative risk comparing observed with average values of the risk factors by the estimated average 10-year absolute risk for an individual in the same age group and of the same sex as the observed individual, so that:
10-year absolute risk for an individual = relative risk comparing individual values to average values of lifestyle factors × average 10-year absolute risk for sex and age group.
Estimates for the combined cancers.
To estimate the relative risk comparing observed with average values of the risk factors for the five cancers combined, we calculated a weighted average of the five cancer-specific relative risks, using the average 10-year estimated absolute risk of developing each cancer in 10-year age groups (40 years to 79 years) as weights. The 10-year estimated absolute risk for the combined cancers was calculated by summing the 10-year estimated absolute risks of the five separate cancers, assuming independence of each cancer.
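The two steps above (scaling each average absolute risk by the individual's relative risk, then combining across cancers) can be sketched as follows. The relative risks and average 10-year risks are hypothetical numbers chosen for illustration; the function names are ours.

```python
def absolute_risk(relative_risk, average_risk):
    """10-year absolute risk = relative risk versus average x average 10-year risk."""
    return relative_risk * average_risk


def combined_estimates(relative_risks, average_risks):
    """Return (combined relative risk, combined 10-year absolute risk).

    The combined relative risk is the average of the cancer-specific relative
    risks weighted by each cancer's average 10-year absolute risk. The combined
    absolute risk is the sum of the cancer-specific absolute risks.
    """
    total_average = sum(average_risks.values())
    combined_abs = sum(absolute_risk(relative_risks[c], average_risks[c])
                       for c in average_risks)
    combined_rr = combined_abs / total_average
    return combined_rr, combined_abs


# HYPOTHETICAL values for one man in one age group (not taken from the paper).
rr = {"lung": 2.4, "colorectal": 1.1, "bladder": 1.7, "kidney": 1.2, "esophageal": 1.5}
avg = {"lung": 0.020, "colorectal": 0.025, "bladder": 0.006, "kidney": 0.004, "esophageal": 0.005}

combined_rr, combined_abs = combined_estimates(rr, avg)
print(round(combined_rr, 3), round(combined_abs, 4))
```

Because the weights are the average absolute risks themselves, the combined absolute risk equals the combined relative risk multiplied by the sum of the average risks, so the two summaries stay consistent with each other.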
Estimates of relative risk comparing observed with “recommended” values of all the risk factors.
To allow us to present estimates of the change in risk if individuals followed a “recommended” lifestyle, we used the same method to calculate the relative risk comparing observed with recommended values of the risk factors. For smoking, BMI, fruit and vegetable consumption, and physical activity we used the UK Department of Health guidelines to define these (being a non-smoker, having a BMI of 25 kg/m2, eating five portions of fruit and vegetables a day, and doing 150 minutes of moderate physical activity per week). For alcohol and red and processed meat consumption which are associated with increased risk, we used zero as our recommended level in line with recommendations from the World Cancer Research Fund (https://www.wcrf-uk.org/). This decision was made to avoid appearing to encourage consumption of red or processed meat or alcohol among those consuming small amounts.
Validation of risk model
We externally validated the model in the EPIC-Norfolk cohort (45). This includes 25,639 men and women who were recruited between March 1993 and December 1997 from 27 practices across Norfolk and were aged between 40 and 79 years at the time of recruitment. Participants were extensively phenotyped at baseline. Incident cases of cancer are recorded through linkage to cancer registries. Smoking status and alcohol consumption were assessed using single questions. Alcohol consumption in grams was estimated as the total units of drinks consumed multiplied by eight. Fruit, vegetable, red meat, and processed meat consumption was estimated from responses to a previously validated food frequency questionnaire and seven-day food diary, respectively (46). Each portion of fruit or vegetables was considered 80 g, and each portion of red or processed meat 90 g, based on the British Dietetic Association portion sizes food factsheet (https://www.bda.uk.com/foodfacts/portionsizesfoodfactsheet.pdf). Physical activity was computed from the average number of hours per week that participants reported cycling or doing “other physical activity such as keep fit, aerobics, swimming or jogging.” As above, 10 minutes of activity was considered 1 MET (43).
We assessed the performance of the risk model in 23,768 participants (12,828 women and 10,940 men) in the EPIC-Norfolk cohort who had at least 10-year follow-up, data for all risk factors, and no previous history or diagnosis of any of the chosen cancers at baseline. Participants with prevalent and incident diagnoses of each cancer were identified through linkage to the National Cancer Registry. A full list of the ICD9 and ICD10 codes used for each cancer is in Supplementary Table S1. We truncated continuous variables at the 95th centile. We treated the outcome as a binary variable (developed one or more of the five cancers or did not develop any of the five cancers). For the primary analysis, we assessed discrimination by plotting ROC curves and calculating the area under the ROC curve (AUC). We assessed calibration graphically by plotting the observed risk (i.e., percentage of individuals who developed cancer) within each decile of predicted risk and calculated overall observed to expected ratios.
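The performance measures described above can be illustrated with a small pure-Python sketch. It is not the authors' Stata code: the function names are ours, the AUC is computed with the pairwise (Mann-Whitney) definition rather than by plotting a ROC curve, and the data are made up.

```python
def auc(predicted, outcomes):
    """Probability that a random case has a higher predicted risk than a
    random non-case, counting ties as one half (equals the area under the ROC
    curve). Quadratic in sample size, so suitable for illustration only."""
    cases = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_cases = [p for p, y in zip(predicted, outcomes) if y == 0]
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def observed_expected_ratio(predicted, outcomes):
    """Observed number of cases divided by the sum of predicted risks."""
    return sum(outcomes) / sum(predicted)


def calibration_table(predicted, outcomes, groups=10):
    """(mean predicted risk, observed risk) within each group of predicted risk."""
    pairs = sorted(zip(predicted, outcomes))
    n = len(pairs)
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if chunk:
            rows.append((sum(p for p, _ in chunk) / len(chunk),
                         sum(y for _, y in chunk) / len(chunk)))
    return rows


# MADE-UP predicted 10-year risks and binary outcomes for four people.
predicted = [0.1, 0.4, 0.35, 0.8]
outcomes = [0, 0, 1, 1]
print(auc(predicted, outcomes))
print(observed_expected_ratio(predicted, outcomes))
print(calibration_table(predicted, outcomes, groups=2))
```

An observed-to-expected ratio above 1 indicates that the model underestimates risk overall, and a ratio below 1 that it overestimates.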
All analyses were conducted using STATA/SE version 14.2 (47).
Results
Development of risk model
Details of the lifestyle risk factors included, units for comparison, relative risks, and their source are given in Table 1. Smoking is associated with the highest relative risk for five of the seven cancers considered across the two sexes, followed by BMI. The sites associated with the greatest number of lifestyle risk factors are colorectal and esophageal cancer. The estimated mean 10-year risk for each of the cancers is provided in Supplementary Tables S2A and S2B. In men, colorectal cancer has the highest 10-year estimated risk of the five cancers between 40 and 59 years and lung cancer the highest above age 60 years. In women, breast cancer has the highest 10-year estimated risk at all ages, with colorectal cancer the second highest up to age 60 years and lung cancer above age 60 years. The mean values used in the calculations are reported in Supplementary Table S3. Across all age groups, there were more former smokers among men than women, and men consumed more alcohol, more red and processed meat, less fruit and, except for those ages 70 to 79 years, fewer vegetables than women. In both sexes, alcohol consumption, vegetable consumption, and physical activity reduced with age, whereas fruit consumption increased up to ages 60 to 69.
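Risk factor | Unit/comparison | Cancer (source) | Relative risk
---|---|---|---
Physical activity (22) | 1 MET-hour per week | Breast (25) | 0.9970
Physical activity (22) | 1 MET-hour per week | Colorectal (26) | 0.9940
Physical activity (22) | 1 MET-hour per week | Endometrial (27) | 0.9933
Smoking (18) | Current smokers compared with nonsmokers | Lung (28) | 8.96
Smoking (18) | Current smokers compared with nonsmokers | Colorectal (28) | 1.2
Smoking (18) | Current smokers compared with nonsmokers | Esophageal (28) | 2.5
Smoking (18) | Current smokers compared with nonsmokers | Kidney (28) | 1.52
Smoking (18) | Current smokers compared with nonsmokers | Bladder (29) | 3.14
Smoking (18) | Former smokers compared with nonsmokers | Lung (28) | 3.85
Smoking (18) | Former smokers compared with nonsmokers | Colorectal (28) | 1.2
Smoking (18) | Former smokers compared with nonsmokers | Esophageal (28) | 2.03
Smoking (18) | Former smokers compared with nonsmokers | Kidney (28) | 1.25
Smoking (18) | Former smokers compared with nonsmokers | Bladder (29) | 1.83
Red meat (20) | 1 g per day | Colorectal (30) | 1.0016
Processed meat (20) | 1 g per day | Colorectal (30) | 1.0033
Alcohol (19) | 1 g per day | Breast (31) | 1.0068
Alcohol (19) | 1 g per day | Colorectal (32) | ln(RR) = 0.006992 × (g/day) − 0.00001 × (g/day)^2
Alcohol (19) | 1 g per day | Esophageal (33) | 1.0129
BMI (25) | 1 kg/m2 | Breast, postmenopausal^a (30) | 1.0229
BMI (25) | 1 kg/m2 | Breast, premenopausal^a (30) | 0.9856
BMI (25) | 1 kg/m2 | Colorectal (33) | 1.030
BMI (25) | 1 kg/m2 | Esophageal (34) | 1.087
BMI (25) | 1 kg/m2 | Kidney (33) | 1.04
BMI (25) | 1 kg/m2 | Endometrial (30) | 1.034
Fruit (20) | 1 g per day | Esophageal (35) | 0.994
Fruit (20) | 1 g per day | Lung (36) | 0.99
Vegetables (20) | 1 g per day | Esophageal (35) | 0.9972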
^a Menopausal status was defined by age: premenopausal = age < 50 years and postmenopausal = age ≥ 50 years.
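Two entries in Table 1 do not follow the simple "risk per unit" pattern: the quadratic alcohol term for colorectal cancer and the age-dependent BMI estimate for breast cancer. A minimal sketch of how they might be handled is given below. It assumes that the quadratic term applies to the square of intake in g/day and that the formula gives risk relative to zero intake, so that risk relative to an average drinker is the ratio of two such values. The function names and the average intake are hypothetical.

```python
import math


def colorectal_alcohol_rr(grams_per_day):
    """Relative risk of colorectal cancer versus zero alcohol intake, using
    ln(RR) = 0.006992 * x - 0.00001 * x^2 with x in g/day (Table 1)."""
    return math.exp(0.006992 * grams_per_day - 0.00001 * grams_per_day ** 2)


def colorectal_alcohol_rr_vs_average(observed, average):
    """ASSUMPTION: risk versus the average drinker is the ratio of the two
    risks versus zero intake."""
    return colorectal_alcohol_rr(observed) / colorectal_alcohol_rr(average)


def breast_bmi_rr_per_unit(age_years):
    """Per-unit BMI relative risk for breast cancer, using age below 50 as a
    proxy for premenopausal status (Table 1 footnote)."""
    return 0.9856 if age_years < 50 else 1.0229


# HYPOTHETICAL example: 24 g/day observed against a 12 g/day population average.
alcohol_example = colorectal_alcohol_rr_vs_average(24.0, 12.0)
print(round(alcohol_example, 4), breast_bmi_rr_per_unit(45), breast_bmi_rr_per_unit(62))
```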
Performance of risk model
From the 25,059 participants within the EPIC-Norfolk cohort who were aged 40 to 79 years at baseline with no existing diagnosis of any of the cancers of interest and who had 10-year follow-up, data for all risk factors were available for 23,768 (94.8%). Among these participants, 432 (3.95%) men and 647 (5.0%) women developed at least one of the cancers during the 10-year follow-up (Tables 2 and 3). Compared with those who did not develop at least one of the cancers, those who did were on average older and more likely to be female, have a higher BMI, report less physical activity, and be a current smoker. Differences in red meat, processed meat, fruit, and vegetable consumption were small.
Characteristic | Validation cohort: no incident cancer (n = 22,689) | Validation cohort: incident cancer (n = 1,079)
---|---|---
Age (years), mean (SD) | 58.9 (9.3) | 62.9 (8.6)
Age 40–49 years (%) | 22.2 | 8.7
Age 50–59 years (%) | 32.2 | 27.2
Age 60–69 years (%) | 30.7 | 37.4
Age 70–79 years (%) | 15.0 | 26.8
Sex, female (%) | 53.7 | 60.0
Sex, male (%) | 46.3 | 40.0
BMI (kg/m2), mean (SD) | 26.3 (3.9) | 26.9 (4.3)
Smoking status, never (%) | 42.4 | 43.0
Smoking status, former (%) | 46.1 | 42.5
Smoking status, current (%) | 11.5 | 14.5
Alcohol intake (g/day), mean (SD) | 7.5 (8.2) | 7.2 (8.2)
Physical activity (MET-h/day), mean (SD) | 9.3 (13.2) | 8.4 (13.0)
Red meat consumption (g/day), mean (SD) | 39.3 (25.3) | 39.8 (25.0)
Processed meat consumption (g/day), mean (SD) | 17.9 (12.9) | 18.0 (12.9)
Fruit consumption (g/day), mean (SD) | 251.2 (149.6) | 254.7 (154.2)
Vegetable consumption (g/day), mean (SD) | 251.2 (107.1) | 252.3 (105.6)
Cancer | Men (n = 10,940), n (%) | Women (n = 12,828), n (%)
---|---|---|
Lung | 142 (1.30) | 54 (0.42) |
Colorectal | 184 (1.68) | 138 (1.08) |
Kidney | 28 (0.26) | 16 (0.12) |
Breast | — | 367 (2.86) |
Bladder | 47 (0.43) | — |
Endometrial | — | 84 (0.65) |
Esophageal | 35 (0.32) | — |
The mean relative risk compared with the “recommended” lifestyle was 1.76 (SD, 0.94; range, 0.72–8.0). In men, there was good discrimination with an AUC for the combined model of 0.71 [95% confidence interval (CI), 0.69–0.73; Fig. 1A]. There was also reasonable agreement between the predicted absolute 10-year risk and the observed risk (Fig. 2A), although overall the risk model underestimated risk with an observed:expected ratio of 1.34 (95% CI, 1.04–1.73). Discrimination was less good in women with an AUC of 0.59 (95% CI, 0.57–0.61; Fig. 1B). Overall calibration was better in women (Fig. 2B) with the overall observed-to-expected ratio of 1.08 (95% CI, 0.90–1.30), but at higher risks, the model overestimated risk.
Figure 3A and B show the AUC for the five individual cancers as well as the combined model for men and women, respectively. In both sexes, the models for lung cancer had the highest AUCs [0.83 (95% CI, 0.80–0.85) for men and 0.82 (95% CI, 0.76–0.87) for women]. The lowest AUCs were for breast and endometrial cancer in women and kidney cancer in men.
Discussion
Key findings
We have developed and validated models in men and women for prediction of the individual or combined absolute risk of developing one or more of the five cancers for which the most cases are potentially preventable through lifestyle change. The models can also be used to present relative risks compared with either an average or a recommended lifestyle. The models include information about established lifestyle risk factors in a format that is readily obtainable from individuals or their medical records without the need for laboratory tests. The combined models had good discrimination in men (AUC 0.71) and reasonable discrimination in women (AUC 0.59). The discrimination for the individual cancers ranged from very good for lung cancer (AUC over 0.8 in both men and women), to poor for breast cancer (AUC 0.56). Overall calibration, as assessed graphically, was reasonable. The models could be used to identify those most likely to benefit from lifestyle interventions and to demonstrate the impact of change to individuals and comparison of their risk with others to contribute to decisions to change behavior.
Strengths and limitations
To our knowledge, these are the first risk models incorporating only modifiable factors alongside age and sex that have been developed for multiple cancers in a UK population. Particular strengths include: the use of the European Code against Cancer 4th Edition to identify the risk factors associated with each of the individual cancers; the use of data from systematic reviews to obtain estimates of the relative risk for each lifestyle factor and each of the relevant cancers; the use of nationally representative datasets to obtain average values for each risk factor; and assessment of the performance of the models in a large population-based UK cohort. There are, however, a number of limitations. Firstly, except for the association between alcohol and colorectal cancer, we assumed a log-linear relationship between exposure and relative risk. This is supported by the absence of significant nonlinearity reported in many of the meta-analyses but may have influenced estimates of relative risk for extreme values of each risk factor.
Estimating the absolute 10-year risk for an individual required us to estimate the average 10-year risk at varying ages for each of the cancers. We did this using the current probability method. This involves calculating the number of cases that would occur within each age band on the basis of the person-years at risk and age-specific incidence rate (44). This has the advantage over cumulative risk estimates in that it takes into account competing risks. However, when it is calculated using routine incidence data, which includes multiple primary cancers, such as the data from ONS which we used, the method is actually estimating the average number of primary cancers per person rather than the probability of a person getting cancer. As a result, the method tends to overestimate lifetime risk of getting cancer for individual cancers. However, the differences between the estimates obtained from this method and the “gold standard” estimates have been shown to be small (48).
Advantages of using the EPIC-Norfolk cohort for validation of the models include the comprehensive phenotyping, completeness of data, and linkage to national cancer registries. However, although the cohort at baseline was representative of the Health Survey for England population for age, sex, and BMI, it had fewer current smokers (45) and participants were recruited from only one geographical region in the East of England. The models therefore need to be assessed in other populations before inferences can be made about model generalizability (49).
Comparison with existing literature
As we aimed to develop models that could be used within routine practice, we have not included all factors that are predictive of each cancer. We also specifically sought to include variables in a way that they could be collected easily and quickly. Had we included greater detail for variables, for example using pack years of smoking rather than a categorical smoking status variable, model performance might have been improved. Despite this, the discrimination for the individual cancers is comparable with or better than that of many other published risk models. For example, the AUCs of 0.66 and 0.68 for colorectal cancer in men and women, respectively, are comparable with published models that also include family history and more complex variables (50); the AUC of over 0.80 for lung cancer in both men and women is better than the range of 0.61 to 0.81 reported in external validation of nine models, all of which included age and smoking behavior (51); and the AUC of 0.74 for bladder cancer in men is better than that of the only other published model (52), which includes smoking and exposure to diesel, aromatic amines, dry cleaning fluids, radioactive materials, and arsenic and had an AUC of 0.70 (95% CI, 0.67–0.73) in split-sample validation.
With an AUC of 0.55, the model for breast cancer has the lowest discrimination. This is consistent with the literature in which models incorporating a combination of age, age at menarche, first live birth and menopause, breast biopsy, and family history of breast cancer still only have AUCs between 0.59 and 0.67 in population-based cohorts (53, 54). Furthermore, a recent extension of the Rosner–Colditz prediction model incorporating 22 risk factors had an AUC of only 0.65 in split-sample validation (55). As described previously, the relatively weak predictive ability of these models may arise because risk factors with large associations such as mutations in the BRCA1 or BRCA2 genes have low prevalence, and risk factors such as early age at menarche or late age at first birth are common among women who never go on to develop breast cancer and have only modest associations with risk.
The AUCs for endometrial cancer (AUC 0.61) and esophageal cancer (AUC 0.65) were also lower than other published models with AUCs of 0.68 (95% CI, 0.66–0.70) in external validation (56) and 0.77 (0.68–0.85) in cross-validation (57) for endometrial cancer and 0.79 (95% CI, 0.75–0.83) in cross-validation (52) and 0.75 (95% CI, 0.66–0.84) in 10-fold cross-validation (58) for esophageal cancer. This may reflect the absence from our models of hormonal factors for endometrial cancer and reflux symptoms or nonsteroidal anti-inflammatory drugs for esophageal cancer. To our knowledge, no other risk models have been developed for kidney cancer alone.
The discrimination of the combined model for the five most common preventable cancers in men (AUC 0.71) is also comparable with the Healthy Heart Score, a lifestyle-based prediction model for cardiovascular disease (59). The Healthy Heart Score includes age, smoking status, BMI, physical activity, alcohol, and a composite diet score incorporating fiber, fruit, and vegetables, nuts, sugar-sweetened beverages, and red and processed meats. In split-sample validation within large U.S.-based cohorts, it had a Harrell's C-index of 0.77 (95% CI, 0.76–0.79) in men and 0.72 (95% CI, 0.71–0.74) in women. The performance of the combined model for the five most common preventable cancers in women (AUC 0.59) was, however, substantially poorer. This is in part due to the poor performance of the risk score for breast cancer which is the most common cancer in women and so has the greatest weighting within the combined model. As mentioned above, hormonal factors are also known to influence risk of cancer in women and are not included in our models.
Implications for clinicians, researchers, and policy makers
The risk models developed here are applicable to individuals age 40 to 79 years and enable presentation of information about the impact of lifestyle on future risk of cancer. By focusing on the cancers for which the most cases are potentially preventable through lifestyle change and on modifiable risk factors that can be easily obtained without the need for laboratory or imaging tests, they could be used as the first step in a multistage risk stratification program to identify those most likely to benefit from lifestyle interventions and to motivate individuals to change their behavior. For example, a 65-year-old man who weighs 80 kg, is 1.7 m tall, is a current smoker, drinks four units of alcohol per day, eats three portions of red meat and two portions of processed meat per week and one portion of fruit and vegetables per day, and does 2 hours of moderate physical activity per week, has an estimated 10-year risk of 8%. If he lost 5 kg, quit smoking, reduced alcohol to one unit per day and red and processed meat to two portions per week, increased fruit and vegetables to five portions per day, and did 3 hours of physical activity per week, his risk would reduce to 4%. Previous research has shown that public awareness of the impact of such lifestyle changes on cancer risk is low: only 3% of people are aware that being overweight can increase their risk of cancer; less than a third are aware that physical activity could help reduce risk (60–63); and one in seven people believe that lifetime risk of cancer is unmodifiable (64).
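The arithmetic behind the worked example above can be sketched as follows. The code only restates the figures given in the text (weight, height, and the two estimated 10-year risks) and derives the man's BMI before and after weight loss together with the absolute and relative risk reductions. It does not reproduce the risk models themselves, and the helper name `bmi` is an illustrative choice.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


# Values taken from the worked example in the text.
height_m = 1.7
bmi_before = bmi(80, height_m)      # current weight of 80 kg
bmi_after = bmi(80 - 5, height_m)   # after losing 5 kg

risk_before = 0.08  # estimated 10-year risk with current lifestyle
risk_after = 0.04   # estimated 10-year risk after the lifestyle changes

absolute_reduction = risk_before - risk_after
relative_reduction = absolute_reduction / risk_before

print(f"BMI: {bmi_before:.1f} -> {bmi_after:.1f} kg/m^2")
print(f"Absolute risk reduction: {absolute_reduction * 100:.0f} percentage points")
print(f"Relative risk reduction: {relative_reduction:.0%}")
```

Presenting both the absolute change (4 percentage points) and the relative change (a halving of risk) may matter when communicating with individuals, because the two framings can be perceived quite differently.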
Although communicating personalized risk information on its own is unlikely to lead to much sustained behavior change (65), these risk models could be used to illustrate the impact of lifestyle change and to allow individuals to compare their risk both with that of an average person of their age and sex and with the risk they would have if they followed a recommended lifestyle. Used in this way and combined with other established behavior change techniques (66), they may help motivate change at an individual level. This would then complement wider collective approaches to shifting population distributions of behavior and risk factors.
This risk assessment could be conducted within existing healthcare and prevention services (8) or made available online. Further research is now needed to assess the performance of the models in other populations, to develop a user-friendly interface through which these models can be incorporated into clinical practice, and to conduct implementation studies quantifying the potential benefits and harms of providing such information.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Disclaimer
The views expressed in this publication are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health. All researchers were independent of the funding body, and the funder had no role in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.
Authors' Contributions
Conception and design: J.A. Usher-Smith, S.J. Griffin
Development of methodology: J.A. Usher-Smith, S.J. Griffin
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): R. Luben
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J.A. Usher-Smith, S.J. Sharp, S.J. Griffin
Writing, review, and/or revision of the manuscript: J.A. Usher-Smith, S.J. Sharp, R. Luben, S.J. Griffin
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): R. Luben
Study supervision: S.J. Griffin
Acknowledgments
The authors would like to acknowledge the contribution of the staff and participants of the EPIC-Norfolk Study and Professor Kay-Tee Khaw for helpful comments on the article.
This paper presents independent research funded by the National Institute for Health Research School for Primary Care Research (NIHR SPCR) (reference number 342). J.A. Usher-Smith is funded by a Cancer Research UK Prevention Fellowship (C55650/A21464). S.J. Sharp is supported by the Medical Research Council (unit program no. MC_UU_12015/1). The University of Cambridge has received salary support in respect of S.J. Griffin from the NHS in the East of England through the Clinical Academic Reserve. EPIC-Norfolk and R. Luben are supported by the Medical Research Council program grants (G0401527 and G1000143) and Cancer Research UK program grants (C864/A8257 and C864/A14136). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.