Abstract
Clinical use of breast cancer risk prediction requires simplified models. We evaluate a simplified version of the validated Rosner–Colditz model and add percent mammographic density (MD) and polygenic risk score (PRS), to assess performance from ages 45–74. We validate using the Mayo Mammography Health Study (MMHS).
We derived the model in the Nurses' Health Study (NHS) based on: MD, 77 SNP PRS and a questionnaire score (QS; lifestyle and reproductive factors). A total of 2,799 invasive breast cancer cases were diagnosed from 1990–2000. MD (using Cumulus software) and PRS were assessed in a nested case–control study. We assess model performance using this case–control dataset and evaluate 10-year absolute breast cancer risk. The prospective MMHS validation dataset includes 21.8% of women age <50, and 434 incident cases identified over 10 years of follow-up.
In the NHS, MD has the highest odds ratio (OR) for 10-year risk prediction: OR per SD = 1.48 [95% confidence interval (CI): 1.31–1.68], followed by PRS, OR per SD = 1.37 (95% CI: 1.21–1.55), and QS, OR per SD = 1.25 (95% CI: 1.11–1.41). In MMHS, the AUC adjusted for age + MD + QS was 0.650 and for age + MD + QS + PRS was 0.687; the net reclassification improvement (NRI) was 6% in cases and 16% in controls.
A simplified assessment of QS, MD, and PRS performs consistently to discriminate those at high 10-year breast cancer risk.
This simplified model provides accurate estimation of 10-year risk of invasive breast cancer that can be used in a clinical setting to identify women who may benefit from chemopreventive intervention.
See related commentary by Tehranifar et al., p. 587
This article is featured in Highlights of This Issue, p. 585
Introduction
Breast cancer risk prediction tools are needed to more appropriately stratify risk for women undergoing routine mammographic screening. This demands tools that are easy to use, do not disrupt routine clinical practice, and reflect current science in understanding contributions of factors driving breast cancer risk. Studies show that measures of mammographic breast density (MD), polygenic risk score (PRS), and modifiable and nonmodifiable “life course risk factors” are independently related to risk for breast cancer (1, 2).
Over the past decade, our models were built from life course risk factors that include reproductive factors, adiposity at different points in life, age and type of menopause, menopausal hormone therapy, family history of breast cancer, personal history of benign breast disease, and alcohol intake (3–6). Investigators have added to these models to improve discrimination often measured as the area under the ROC curve (AUC), and include MD (3, 7–10), endogenous hormones (11), and/or PRS using a variable number of SNPs (12–18). While combinations of these factors have been evaluated (e.g., lifestyle factors and PRS or lifestyle factors and MD), models that combine measures from all three components (life course risk factors, MD, and PRS; refs. 19–21) show the highest AUC values at 0.71 (9, 22). However, comprehensive models present a greater patient burden and simpler models are needed for routine clinical use.
Following fundamental principles of model creation and validation (23, 24), we evaluate a simplified model of life course variables, reduced from the validated Rosner–Colditz model (25–27), to achieve a parsimonious model that retains the performance characteristics of the validated model. We then add MD and PRS to assess the combined predictive performance in women aged 45 to 74. We use 10-year absolute risk as the prediction outcome, consistent with prediction models for cardiovascular disease, with the Tyrer–Cuzick model used widely in the UK clinical setting (28, 29), and with ASCO guidelines (30). We report validation using prospective data from the Mayo Mammography Health Study.
Materials and Methods
We summarize the steps to refine a simplified questionnaire-based score (based on established lifestyle and reproductive factors) and then to add MD and PRS, using a case–control dataset drawn from the Nurses' Health Study (NHS) cohort. The NHS cohort was established in 1976 and included 121,700 female registered nurses aged 30–55 years (31). Questionnaires were mailed to women biennially to collect information on breast cancer risk factors, including age at menarche, age at first birth, parity, family history of breast cancer, height, weight, menopausal status, age at menopause, and hormone therapy use. Alcohol consumption was assessed using a validated semiquantitative food frequency questionnaire. Women included in these analyses were pre- or postmenopausal, aged 45–74, and free from cancer (other than nonmelanoma skin cancer) at the beginning of follow-up in 1990. During 10 years of follow-up through June 2000, 2,799 cases of invasive breast cancer were diagnosed during 770,679 person-years.
The study protocol was conducted in accordance with the Declaration of Helsinki and approved by the institutional review boards of the Brigham and Women's Hospital (Boston, MA) and Harvard T.H. Chan School of Public Health (Boston, MA), and those of participating registries as required. The completion and mailed return of the self-administered questionnaire were considered to imply informed consent.
Questionnaire-based score
Risk factors were defined at baseline in 1990 and were not updated during follow-up. Breast cancer risk variables are listed in Supplementary Table S1. We built on the model used to estimate population attributable risk for breast cancer from the NHS (32) and further reduced the reproductive variables, retaining age at first birth and removing parity and age at menarche. Age and menopausal status were treated as combined indicator variables to account for the rate of rising incidence per year in premenopausal women and the slower rise in incidence per year among postmenopausal women (4–6). Personal history of benign breast disease confirmed by breast biopsy and family history of breast cancer in a first-degree relative were each included. For hormone therapy, past and never users were combined as the reference group, with separate indicators for the current form of therapy (estrogen alone, estrogen plus progestin, progesterone alone, and other). Height and current body mass index (BMI) were modeled as continuous variables, and current alcohol intake as a set of indicators to better capture the increase in risk with higher intake. The questionnaire score (QS) was developed from a Cox regression model based on these risk factors to predict 10-year risk of invasive breast cancer (1990–2000) and is given by:
$$\mathrm{QS} = \sum_{i} \beta_i x_i$$
where $x_i$ is the $i$th risk factor and $\beta_i$ is the corresponding regression coefficient (Table 2).
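As a minimal sketch of how such a score could be evaluated for one woman, the snippet below sums coefficient-times-value terms using a subset of the Table 2 coefficients. The dictionary keys, the choice of terms, and the covariate coding (dividing by the "Unit" column and centering BMI at 25 kg/m2) are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: a subset of Table 2 betas, with hypothetical variable names.
BETAS = {
    "bbd": 0.318,                 # biopsy-confirmed benign breast disease
    "family_history": 0.292,      # first-degree family history
    "bmi_postmenopausal": 0.136,  # per 8 kg/m2, centered at 25 (assumed coding)
    "height": 0.198,              # per 6 inches (assumed coding)
}


def questionnaire_score(x):
    """Return sum(beta_i * x_i) over the risk factors supplied in x."""
    return sum(BETAS[name] * value for name, value in x.items())


# Hypothetical postmenopausal woman: BBD, family history, BMI 29, height 64.5 in.
woman = {
    "bbd": 1,
    "family_history": 1,
    "bmi_postmenopausal": (29.0 - 25.0) / 8,
    "height": 64.5 / 6,
}
qs = questionnaire_score(woman)
print(f"partial questionnaire score: {qs:.4f}")
```

Because the age, menopause, hormone therapy, and alcohol terms are left out here, the value is only a partial score and should not be compared directly with the scores reported in the tables.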
Mammographic breast density
In the NHS, MD is only available in a nested case–control dataset of women who provided blood samples in 1989–1990. That dataset includes assessment of MD using Cumulus software to measure percent mammographic density (33, 34), with reproducibility between readings of 0.90 (35). It comprises 533 cases of breast cancer and 633 controls matched on age and month of blood draw. We assessed MD on mammograms obtained as close as possible to the blood draw (1989–1990) used for DNA assessment and modeled it as a continuous variable (33). In a sensitivity analysis, we use the BIRADS score.
Polygenic risk score
This study also includes women with GWAS data. An externally derived PRS was estimated using GWAS data and consists of 77 SNPs (36). The PRS was defined as the sum of the number of risk alleles across 77 SNPs weighted by the independently estimated effect size of each SNP.
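The weighted allele sum described above can be sketched as follows. The three SNPs, their weights, and the function name are hypothetical and chosen only for illustration; the actual score uses 77 SNPs with externally estimated effect sizes (36).

```python
def polygenic_risk_score(risk_allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per SNP)."""
    if len(risk_allele_counts) != len(weights):
        raise ValueError("need one weight per SNP")
    return sum(w * g for g, w in zip(risk_allele_counts, weights))


# Hypothetical example with three SNPs; weights stand in for per-allele
# effect sizes (for example log odds ratios) and are not real estimates.
counts = [0, 1, 2]
weights = [0.10, 0.05, 0.20]
prs = polygenic_risk_score(counts, weights)
print(f"toy PRS: {prs:.2f}")
```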
Analysis
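We assess performance over 10 years of follow-up from the baseline assessment and risk classification using the nested case–control dataset. An unconditional logistic regression was run on the nested case–control dataset of the form:
$$\ln\left(\frac{p}{1-p}\right) = \alpha + \beta_{\mathrm{QS}}\,\mathrm{QS} + \beta_{\mathrm{MD}}\,\mathrm{MD} + \beta_{\mathrm{PRS}}\,\mathrm{PRS}$$
where $p$ = probability of breast cancer.
The summary risk score is given by
$$\text{summary risk score} = \beta_{\mathrm{QS}}\,\mathrm{QS} + \beta_{\mathrm{MD}}\,\mathrm{MD} + \beta_{\mathrm{PRS}}\,\mathrm{PRS}$$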
To estimate 10-year risk of breast cancer, we follow the approach of Gail to calibrate individual risk to the SEER population breast cancer rates (37), as applied to calibrating the breast cancer incidence model in the California Teachers Study (Supplementary Materials and Methods; ref. 25). After generating individual 10-year risk estimates, we stratify women using the UK guideline categories: below average (<2%), average (2% to <3%), above average (3% to <5%), moderately increased (5% to <8%), and high (≥8%; refs. 10, 28, 38).
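The five risk categories can be expressed as a simple lookup on the estimated 10-year risk. This is a sketch that assumes risk is supplied as a proportion between 0 and 1; the function and constant names are illustrative.

```python
from bisect import bisect_right

# Lower bounds of the "average" through "high" categories, as proportions.
CUTPOINTS = [0.02, 0.03, 0.05, 0.08]
LABELS = [
    "below average",         # <2%
    "average",               # 2% to <3%
    "above average",         # 3% to <5%
    "moderately increased",  # 5% to <8%
    "high",                  # >=8%
]


def risk_category(ten_year_risk):
    """Map a 10-year absolute risk (0-1) to its guideline category."""
    if not 0.0 <= ten_year_risk <= 1.0:
        raise ValueError("risk must be a proportion between 0 and 1")
    return LABELS[bisect_right(CUTPOINTS, ten_year_risk)]


for risk in (0.015, 0.02, 0.045, 0.08):
    print(risk, risk_category(risk))
```

Using `bisect_right` places a value that equals a cut point in the higher category, which matches the "2% to <3%" style of the category definitions.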
Validation study
The prospective Mayo Mammography Health Study (MMHS) enrolled participants undergoing screening mammography who were residents of Minnesota, Iowa or Wisconsin, from October 2003 through September 2006 at the Mayo Clinic in Rochester, MN (39). The baseline questionnaire included risk factor data. Annual follow-up used the Mayo and tristate tumor registries and mail and/or phone contacts for those who moved outside this region. This study includes incident invasive breast cancer cases through 10 years of follow-up, and an at-risk cohort representing approximately 10% of the overall cohort, 21.8% aged <50 at enrollment. MD was assessed by an expert reader using the Cumulus program. We have previously shown comparable estimates from the NHS and MMHS studies (40). PRS was also generated from the 77 published breast cancer GWAS SNPs (2). Cases and controls for analysis were limited to 434 incident cases and 898 controls with data on QS, MD, and PRS.
Validation analysis
For each woman in MMHS, we generated a QS using the beta coefficients from the NHS (Table 2) and the woman's own risk factor status. We then summarize QS, MD, and PRS for cases and controls in the independent MMHS data (Table 4). To assess performance in the independent MMHS data, we calculate the median estimated 10-year risk in each of the five risk groups in the NHS and then run a logistic regression of case–control status on median risk in the MMHS data. The Wald χ² from this regression provides a test for trend on 1 df. We compare model A (age + QS + MD) with model B (age + QS + MD + PRS) using AUC and NRI (41).
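To make the NRI comparison concrete, the sketch below computes a category-free NRI separately for cases and controls from two sets of predicted risks. The text does not state whether a category-based or category-free NRI was used, so this variant is an assumption, and the data are invented toy values.

```python
def nri_components(old_risk, new_risk, is_case):
    """Return (NRI in cases, NRI in controls) for paired risk predictions.

    Cases: proportion moving up minus proportion moving down.
    Controls: proportion moving down minus proportion moving up.
    """
    up = {True: 0, False: 0}
    down = {True: 0, False: 0}
    n = {True: 0, False: 0}
    for old, new, case in zip(old_risk, new_risk, is_case):
        case = bool(case)
        n[case] += 1
        up[case] += new > old
        down[case] += new < old
    if n[True] == 0 or n[False] == 0:
        raise ValueError("need at least one case and one control")
    nri_cases = (up[True] - down[True]) / n[True]
    nri_controls = (down[False] - up[False]) / n[False]
    return nri_cases, nri_controls


# Toy data: model A (old) versus model B (new) risks for 4 cases, 4 controls.
old = [0.04, 0.06, 0.03, 0.05, 0.03, 0.04, 0.02, 0.05]
new = [0.05, 0.07, 0.02, 0.05, 0.02, 0.03, 0.03, 0.05]
case = [1, 1, 1, 1, 0, 0, 0, 0]
nri_cases, nri_controls = nri_components(old, new, case)
print(nri_cases, nri_controls)
```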
Results
In Table 1, we describe the prevalence of risk factors in the NHS included in the QS in 1990 and the number of incident invasive breast cancer cases from 1990–2000. Distributions reflect patterns for established breast cancer risk factors. Supplementary Table S2 has the baseline prevalence of risk factors for the nested case–control dataset.
Variable categories | Nurses' Health Study cases, N (%) | Nurses' Health Study controls, N (%) | Mayo Mammography Health Study cases, N (%) | Mayo Mammography Health Study controls, N (%)
---|---|---|---|---
N | 2,799 | 75,557 | 438 | 898
Age, menopausal status, N (%) | ||||
Age 45–49 | 368 (13.1%) | 13,514 (17.9%) | 51 (11.6%) | 196 (21.8%) |
Age 50–54 | 557 (19.9%) | 16,877 (22.3%) | 78 (17.8%) | 181 (20.2%) |
Age 55–59 | 634 (22.7%) | 17,010 (22.5%) | 71 (16.2%) | 155 (17.3%) |
Age 60–64 | 683 (24.4%) | 15,540 (20.6%) | 105 (24.0%) | 127 (14.1%) |
Age 65–74 | 557 (19.9%) | 12,616 (16.7%) | 133 (30.4%) | 239 (26.6%) |
Menopausal status, N (%) | ||||
Premenopausal | 502 (17.9%) | 15,551 (20.6%) | 74 (16.9%) | 222 (24.7%) |
Postmenopausal | 2,297 (82.1%) | 60,006 (79.4%) | 364 (83.1%) | 676 (75.3%) |
Duration of postmenopause, years, mean (SD)a | 10.1 (6.2) | 10.1 (6.5) | 14.5 (8.7) | 13.4 (9.0) |
Pregnancy history | ||||
Nulliparous | 204 (7.3%) | 4,254 (5.6%) | 51 (11.8%) | 116 (13.1%) |
Age 1st birth 20–24 | 1,199 (42.8%) | 36,977 (48.9%) | 255 (59.2%) | 508 (57.3%) |
Age 1st birth 25–29 | 1,071 (38.3%) | 27,367 (36.2%) | 93 (21.6%) | 183 (20.6%) |
Age 1st birth 30+ | 325 (11.6%) | 6,959 (9.2%) | 32 (7.4%) | 80 (9.0%) |
BBD (Biopsy confirmed), N (%) | 717 (25.6%) | 14,551 (19.3%) | 158 (36.1%) | 197 (21.9%) |
Family history of breast cancer, N (%) | 422 (15.1%) | 7,672 (10.2%) | 119 (27.2%) | 174 (19.4%) |
Current menopausal hormone therapy use, N (%)b | ||||
Not current (past and never users) | 1,779 (63.6%) | 52,610 (69.6%) | 331 (75.6%) | 686 (76.4%) |
Estrogen only | 466 (16.6%) | 11,302 (15.0%) | 64 (14.6%) | 132 (14.7%) |
Estrogen & Progestin | 399 (14.3%) | 7,051 (9.3%) | 30 (6.8%) | 42 (4.7%) |
Progesterone alone | 18 (0.6%) | 487 (0.6%) | 3 (0.7%) | 3 (0.3%) |
Otherc | 137 (4.9%) | 4,107 (5.4%) | 10 (2.3%) | 35 (3.9%) |
Current BMI, premenopause, N (%) | ||||
<25 | 286 (57.0%) | 8,475 (54.5%) | 34 (45.9%) | 86 (38.7%) |
25–29.9 | 147 (29.3%) | 4,390 (28.2%) | 24 (32.4%) | 69 (31.1%) |
≥30 | 69 (13.8%) | 2,686 (17.3%) | 16 (21.6%) | 67 (30.2%) |
Mean (SD) | 25.2 (4.4) | 25.8 (5.1) | 27.1 (6.8) | 28.2 (6.6) |
Current BMI, postmenopause, N (%) | ||||
<25 | 1,143 (49.8%) | 30,904 (51.5%) | 96 (26.4%) | 233 (34.5%) |
25–29.9 | 735 (32.0%) | 18,988 (31.6%) | 125 (34.3%) | 256 (37.9%) |
≥30 | 419 (18.2%) | 10,114 (16.9%) | 143 (39.3%) | 187 (27.7%) |
Mean (SD) | 26.0 (4.9) | 25.8 (4.9) | 29.2 (6.1) | 27.9 (5.9) |
Height, inches, mean (SD) | 64.7 (2.4) | 64.5 (2.4) | 64.6 (2.5) | 64.5 (2.4) |
Alcohol, N (%) | ||||
None | 1,033 (36.9%) | 29,778 (39.4%) | 230 (52.5%) | 479 (53.3%) |
<11 gm/day | 1,257 (44.9%) | 33,289 (44.1%) | 192 (43.8%) | 374 (41.6%) |
11–21.9 gm/day | 313 (11.2%) | 7,929 (10.5%) | 16 (3.7%) | 43 (4.8%) |
>22 gm/day | 196 (7.0%) | 4,561 (6.0%) | 0 | 2 (0.2%) |
aAmong postmenopausal women.
bWomen with unknown or missing hormone therapy status were excluded.
cVaginal estrogen or vaginal progesterone or Other Brands, mixed use or Current user no brand given.
Table 2 summarizes regression coefficients, HRs, and 95% confidence intervals (CI) for the variables included in the QS for 10-year risk of breast cancer. We note that among premenopausal women, incidence rises with age. Among postmenopausal women who are 45 to 49 years of age, risk is significantly lower than for premenopausal women of the same age (HR = 0.28, 95% CI: 0.16–0.50). For postmenopausal women, risk continues to rise through age 74. At any given age for postmenopausal women, longer duration of menopause corresponds to earlier age at menopause and lower risk, HR = 0.85 (95% CI: 0.81–0.89) per 5 years. Other risk factors are consistent with prior studies.
Variable categories | Unit | Beta | SE | P | HR | 95% CI
---|---|---|---|---|---|---|
Age, menopausal status | ||||||
Age 45–49, pre (ref) | 1 | 1.0 | ||||
Age 50–59, pre | 1 | 0.242 | 0.091 | 0.01 | 1.27 | (1.07–1.52) |
Age 45–49, post | 1 | −1.277 | 0.279 | <0.0001 | 0.28 | (0.16–0.50) |
Age 50–54, post | 1 | −0.854 | 0.273 | 0.002 | 0.43 | (0.25–0.73) |
Age 55–59, post | 1 | −0.540 | 0.273 | 0.05 | 0.58 | (0.34–1.00) |
Age 60–64 | 1 | −0.188 | 0.276 | 0.50 | 0.83 | (0.48–1.42) |
Age 65–74 | 1 | −0.024 | 0.282 | 0.93 | 0.98 | (0.56–1.70) |
Duration of postmenopause (per 5 years) | 5 | −0.162 | 0.026 | <0.0001 | 0.85 | (0.81–0.89) |
Pregnancy history | ||||||
Nulliparous | 1 | 0.376 | 0.077 | <0.0001 | 1.46 | (1.25–1.69) |
Age 1st birth 20–24 (ref) | 1 | 1.0 | ||||
Age 1st birth 25–29 | 1 | 0.151 | 0.043 | 0.0004 | 1.16 | (1.07–1.27) |
Age 1st birth 30+ | 1 | 0.286 | 0.063 | <0.0001 | 1.33 | (1.18–1.51) |
BBD (Biopsy confirmed) | 1 | 0.318 | 0.044 | <0.0001 | 1.37 | (1.26–1.50) |
Family history BRCN | 1 | 0.292 | 0.060 | <0.0001 | 1.34 | (1.19–1.51) |
Current PMH usea | ||||||
No (ref) | 1 | 1.0 | ||||
Estrogen alone | 1 | 0.292 | 0.055 | <0.0001 | 1.34 | (1.20–1.49) |
E&P | 1 | 0.573 | 0.060 | <0.0001 | 1.77 | (1.58–1.99) |
Progesterone alone | 1 | 0.199 | 0.238 | 0.40 | 1.22 | (0.77–1.95) |
Otherb | 1 | 0.041 | 0.090 | 0.65 | 1.04 | (0.87–1.24) |
Current BMI (per 8 kg/m2)c | ||||||
Premenopausal | 8 | −0.122 | 0.075 | 0.101 | 0.89 | (0.77–1.02) |
Postmenopausal | 8 | 0.136 | 0.034 | <0.0001 | 1.15 | (1.07–1.23) |
Height (per 6 inches) | 6 | 0.198 | 0.047 | <0.0001 | 1.22 | (1.11–1.34) |
Current alcohol | ||||||
None (ref) | 1 | 1.0 | ||||
<11 gm/day | 1 | 0.085 | 0.042 | 0.05 | 1.09 | (1.00–1.18) |
11–21.9 gm/day | 1 | 0.105 | 0.07 | 0.12 | 1.11 | (0.98–1.26) |
>22 gm/day | 1 | 0.195 | 0.079 | 0.01 | 1.22 | (1.04–1.42) |
aWomen with unknown or missing PMH status were excluded.
bVaginal estrogen or vaginal progesterone or Other Brands, mixed use or Current user no brand given.
cCentered at 25 kg/m2.
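The HR and CI columns of Table 2 follow from the beta and SE columns by exponentiation. The sketch below assumes a normal-approximation (Wald) interval with z = 1.96 and uses the estrogen-plus-progestin row; because the published betas and SEs are rounded, other rows may differ slightly in the last digit.

```python
import math


def hazard_ratio(beta, se, z=1.96):
    """Return (HR, lower, upper) from a log-hazard coefficient and its SE."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Estrogen plus progestin (E&P) row of Table 2: beta = 0.573, SE = 0.060.
hr, lower, upper = hazard_ratio(0.573, 0.060)
print(f"HR = {hr:.2f} ({lower:.2f}-{upper:.2f})")
```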
We used the coefficients from the full model (Table 2) to estimate the QS and then ran a logistic regression in the nested case–control dataset of breast cancer risk on QS, MD, and PRS (33). The odds ratio (OR) per SD is presented in Table 3. In the mutually adjusted model, MD has the highest OR: OR per SD = 1.48 (95% CI: 1.31–1.68), followed by PRS, OR per SD = 1.37 (95% CI: 1.21–1.55), and QS, OR per SD = 1.25 (95% CI: 1.11–1.41). These component measures also have low correlations with each other (Spearman correlation, rS: QS vs. MD, 0.054; QS vs. PRS, 0.024; MD vs. PRS, 0.085), consistent with the similarity between marginal and mutually adjusted ORs. Furthermore, there was no significant interaction among QS, MD, and PRS in their effects on incident breast cancer.
Variable | SD | Beta | SE | P | OR^b | 95% CI
---|---|---|---|---|---|---
Marginal results | | | | | |
Questionnaire score | 0.3682 | 0.6573 | 0.1631 | <0.0001 | 1.27 | 1.13–1.43
Percent mammographic density | 19.2902 | 0.0219 | 0.0032 | <0.0001 | 1.53 | 1.35–1.72
Polygenic risk score | 0.9933 | 0.3432 | 0.0696 | <0.0001 | 1.41 | 1.25–1.59
Mutually adjusted results | | | | | |
Questionnaire score | 0.3682 | 0.6095 | 0.1682 | 0.0003 | 1.25 | 1.11–1.41
Percent mammographic density | 19.2902 | 0.0204 | 0.00325 | <0.0001 | 1.48 | 1.31–1.68
Polygenic risk score | 0.9933 | 0.3154 | 0.0631 | <0.0001 | 1.37 | 1.21–1.55
aBased on 533 cases and 633 controls.
bOR per SD.
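The OR per SD in Table 3 can be recovered from the per-unit beta and the SD of each measure as exp(beta × SD). The short check below uses the mutually adjusted rows; the dictionary layout is illustrative.

```python
import math

# (SD, beta) pairs from the mutually adjusted rows of Table 3.
MUTUALLY_ADJUSTED = {
    "QS": (0.3682, 0.6095),
    "MD": (19.2902, 0.0204),
    "PRS": (0.9933, 0.3154),
}

or_per_sd = {
    name: math.exp(beta * sd) for name, (sd, beta) in MUTUALLY_ADJUSTED.items()
}
for name, value in or_per_sd.items():
    print(f"{name}: OR per SD = {value:.2f}")
```

Scaling by the SD puts the three measures on a common footing, which is why MD can have the largest OR per SD despite having the smallest per-unit beta.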
We assessed performance of the model in both the NHS and MMHS (Table 4). Cases have significantly higher mean scores than controls, both for each component and for the overall risk score. In the NHS, the AUC for age alone was 0.523; for age + MD, 0.629; for age + MD + QS, 0.637; and for age + MD + QS + PRS, 0.658. There was no significant interaction between menopausal status and the components of the overall risk score. Repeating the analysis using BIRADS for breast density gave comparable AUC values (see Supplementary Table S3).
A. Nurses' Health Study
Measure | Cases (N = 533), mean ± SD | Controls (N = 633), mean ± SD | P^a | AUC (C-statistic)^b
---|---|---|---|---
Questionnaire score | 2.35 ± 0.38 | 2.26 ± 0.36 | <0.0001 |
Percent mammographic density | 32.28 ± 19.72 | 24.35 ± 18.16 | <0.0001 |
Polygenic risk score | 0.15 ± 0.98 | −0.18 ± 0.98 | <0.0001 |
Overall risk score | 2.13 ± 0.58 | 1.82 ± 0.55 | <0.0001 |
Age only | | | | 0.523
Age + percent density | | | | 0.629
Age + percent density + questionnaire score | | | | 0.637
Age + percent density + questionnaire score + polygenic risk score | | | | 0.658
B. Mayo Mammography Health Study
Measure | Cases (N = 438), mean ± SD | Controls (N = 898), mean ± SD | P^a | AUC (C-statistic)^b
---|---|---|---|---
Questionnaire score | 1.97 ± 0.46 | 1.90 ± 0.49 | 0.0063 |
Percent mammographic density | 18.36 ± 12.41 | 15.45 ± 11.17 | <0.0001 |
Polygenic risk score | 0.25 ± 0.97 | −0.15 ± 0.98 | <0.0001 |
Overall risk score | 1.62 ± 0.51 | 1.39 ± 0.51 | <0.0001 |
Age only | | | | 0.595
Age + percent density | | | | 0.635
Age + percent density + questionnaire score | | | | 0.650
Age + percent density + questionnaire score + polygenic risk score | | | | 0.687
aBased on t test.
bBased on logistic regression including strata of age and then additional risk factors as noted.
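As a consistency check on the NHS panel of Table 4, the sketch below evaluates a linear summary score at the reported component means. It assumes the overall risk score is the sum of QS, MD, and PRS weighted by the mutually adjusted Table 3 coefficients with no intercept; that form is an assumption, although the result lands close to the reported overall means (2.13 in cases, 1.82 in controls).

```python
# Mutually adjusted coefficients from Table 3 (per unit of each measure).
BETA_QS, BETA_MD, BETA_PRS = 0.6095, 0.0204, 0.3154


def summary_risk_score(qs, md, prs):
    """Assumed linear summary score: weighted sum of the three components."""
    return BETA_QS * qs + BETA_MD * md + BETA_PRS * prs


# Component means for NHS cases and controls, taken from Table 4A.
score_cases = summary_risk_score(2.35, 32.28, 0.15)
score_controls = summary_risk_score(2.26, 24.35, -0.18)
print(f"cases: {score_cases:.3f}, controls: {score_controls:.3f}")
```

Because the score is linear, its mean equals the score evaluated at the component means, so small differences from the table reflect rounding of the published means and coefficients.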
We next assessed the distribution of cases and controls according to the 10-year estimated risk of breast cancer (Table 5). Using the UK National Health Service cut points for 10-year breast cancer risk, the high-risk category (10-year risk of 8% or higher) includes 9% of the controls and 18% of the cases in the NHS. The lowest category (10-year risk less than 2%) includes 19% of the controls and 9% of the cases. Comparing the highest with the lowest category, this corresponds to an OR of 4.0, consistent with the ability of the model to separate the distribution of risk in cases and noncases when predicting 10-year risk of breast cancer. Figure 1 presents these distributions, demonstrating separation of controls to the left and cases to the right.
10-year expected incidence of breast cancer | NHS cases (N = 533) | NHS controls (N = 633) | MMHS cases (N = 438) | MMHS controls (N = 898)
---|---|---|---|---
<0.02 (below average) | 48 (9%) | 119 (19%) | 28 (6%) | 172 (19%)
0.02–0.029 (average) | 91 (17%) | 162 (26%) | 61 (14%) | 220 (24%)
0.03–0.049 (above average) | 161 (30%) | 188 (30%) | 134 (31%) | 290 (32%)
0.05–0.079 (moderately increased) | 137 (26%) | 104 (16%) | 148 (34%) | 159 (18%)
0.08+ (high) | 96 (18%) | 60 (9%) | 67 (15%) | 57 (6%)
Model performance in MMHS cohort: χ² = 85.83, P < 0.001
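The OR of 4.0 for the highest versus the lowest risk category can be reproduced from the NHS counts in Table 5. The sketch below also applies the same arithmetic to the MMHS counts; that second value is computed here from the table and is not a figure reported in the text.

```python
def odds_ratio(cases_high, controls_high, cases_low, controls_low):
    """Odds of being a case in the high category relative to the low category."""
    return (cases_high / controls_high) / (cases_low / controls_low)


# Table 5 counts: high (0.08+) versus below average (<0.02).
or_nhs = odds_ratio(96, 60, 48, 119)
or_mmhs = odds_ratio(67, 57, 28, 172)
print(f"NHS: {or_nhs:.1f}, MMHS: {or_mmhs:.1f}")
```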
Validation
In the MMHS data (Table 1), women are slightly older, with a larger proportion of cases and controls in the 65–74 year age range than in the NHS. The MMHS includes more nulliparous women, a higher proportion with first birth before age 25, and fewer with first birth after age 30 than the NHS. MMHS women also have a higher proportion with a history of benign breast disease and a family history of breast cancer. Fewer women currently use menopausal hormone therapy, more postmenopausal women are obese, and alcohol intake is lower than in the NHS.
Using the beta-coefficients from the NHS, the estimated QS is lower than in the NHS, although the mean difference between cases and controls in the MMHS (0.07, P = 0.006) is similar to the mean difference in the NHS (0.08, P < 0.0001; Table 4). MD is lower overall in the MMHS, but the case–control differences are statistically significant, and the PRS is also higher in cases than controls with significant differences overall. The overall AUC is 0.687, comparable with 0.658 in the derivation cohort. Evaluating AUC in MMHS as variables are added, we observed that AUC adjusted for age was 0.595; for age + MD 0.635; for age + MD + QS 0.650; for age + MD + QS + PRS 0.687.
Stratifying women by their overall 10-year estimated breast cancer risk category (Table 5) shows comparable proportions of cases and controls in the below average 10-year risk category in MMHS (6% of cases and 19% of controls) and NHS (9% of cases and 19% of controls). Estimated risk increased significantly with increasing risk category in the MMHS (Wald χ² = 85.83, P < 0.001). Similar distributions of women in the moderately increased and high-risk categories are seen across both studies. Figure 1A and B demonstrate the separation of the distribution of risk among cases and controls in the two cohorts. The NRI comparing model A (age + QS + MD) with model B (age + QS + MD + PRS) was 6% in cases and 16% in controls (Supplementary Table S4).
Discussion
A simplified lifestyle assessment of breast cancer risk factors, combined with MD and PRS, performs consistently to discriminate those at high risk of subsequent breast cancer from those who will remain free from breast cancer over 10 years. This combination of three established predictors of breast cancer risk is robust: after controlling for age, 18% of cases and 9% of controls fall in the high-risk category (≥8% 10-year risk). Equally important, only 9% of cases, compared with 19% of controls, fall in the low-risk category (<2% 10-year risk). This separation of absolute 10-year risk for cases and controls holds up in an independent validation dataset of over 400 incident invasive breast cancer cases. The AUC is 0.687 in the MMHS validation dataset, with an NRI of 6% in cases and 16% in controls for Model A versus Model B. Stratification of risk using a model that integrates questionnaire risk factors, PRS, and MD could usefully be adapted to clinical practice to modify the frequency and intensity of screening programs.
We note that risk assessment for colorectal cancer also performs adequately when simplified, for example by dichotomizing risk factors to reduce respondent burden (42). Truncating assessment complexity does not reduce performance, and it increases ease of use (43–47).
MD shows only modest decay over time. The longitudinal correlation of MD over 10 years is estimated at 0.91 (48), and peak percent MD is at ages 35–39 (49). Thus, a one-time assessment is adequate for long-term risk prediction (50). In addition, germline SNPs are unchanging, equally supporting the need for only a one-time assessment. The number of SNPs in PRS has increased over time from 77 (36) to 313 (15) SNPs with modest change in the AUC. Here, we use the lower count as a conservative PRS, but expect similar findings with the expanded PRS, though generating a PRS comprising 300 or more SNPs (15) remains problematic in routine screening clinics.
Risk models continue to pursue dual measures of success: better model performance based on discrimination, and easier adaptation to clinical application. We have shown that in an independent validation dataset the Rosner–Colditz model of life course questionnaire risk factors outperforms the Gail model (25). Compared with the Rosner–Colditz model, both the Gail and Tyrer–Cuzick models overpredict risk in high-risk women and underpredict risk in low-risk women (27). Both models perform less well than the Rosner–Colditz model, and all three show better discrimination among women under age 70 (27). In an assessment of the 10-year performance of the BOADICEA, BRCAPRO, Breast Cancer Risk Assessment Tool (BCRAT), and IBIS models, BCRAT and BRCAPRO underpredict risk, and BOADICEA, BCRAT, and IBIS overpredict risk in the highest-risk subgroup of women, while the models are overall well calibrated, with AUCs of 0.71 or lower (51). That analysis used data from the Breast Cancer Prospective Family Study Cohort, which comprises women without breast cancer at baseline who were identified from families oversampled for early-onset breast cancer (52).
We use SEER breast cancer incidence data from 1995 for the population estimates of average risk. This corresponds to the midpoint of the 10-year follow-up for the NHS. This anchoring could be readily updated to more recent time periods for routine clinical use.
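The anchoring step can be illustrated with a minimal sketch that converts a relative risk into an absolute 10-year risk using population incidence rates. The rates below are placeholders, not SEER values. The function name, the assumption of a constant relative risk over follow-up, and the omission of competing mortality are simplifications for illustration and should not be read as the method used in the analysis.

```python
import math


def ten_year_risk(annual_incidence, relative_risk):
    """Absolute risk over the listed years of follow-up (illustrative helper).

    annual_incidence: population-average incidence per woman-year, one value
        per year of follow-up.
    relative_risk: the woman's hazard relative to the population average.
    Competing mortality is ignored (assumption).
    """
    cumulative_hazard = sum(rate * relative_risk for rate in annual_incidence)
    return 1.0 - math.exp(-cumulative_hazard)


# Placeholder rates (NOT SEER data): 300 per 100,000 woman-years for 10 years.
placeholder_rates = [300 / 100_000] * 10
average_risk = ten_year_risk(placeholder_rates, 1.0)
elevated_risk = ten_year_risk(placeholder_rates, 3.0)
```

Under this sketch, updating the anchor to a more recent period amounts to replacing the rate vector while leaving the relative risks unchanged.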
We, like others (53), see no interaction between MD and PRS, attesting to the value of combining the measures to maximize discrimination. The BOADICEA implementation of similar combinations of lifestyle, PRS, and MD gives a broad distribution of risk for the UK population (54). Moving to routine clinical use, both to increase screening in high-risk women and to reduce screening and surveillance in low-risk women, looks increasingly possible. The NRI of 16% in controls supports adding PRS to the risk model. This is consistent with the growing call for more widespread use of PRS in routine clinical care (55, 56) and with the continuing refinement of screening recommendations that use risk of disease to modify the frequency and modality of screening (57). Saliva DNA can be collected prior to baseline mammogram to generate PRS.
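With no interaction between predictors, their effects add on the log-odds scale. The sketch below combines the per-SD odds ratios reported for the NHS in the abstract under that assumption. It assumes the three odds ratios are mutually adjusted and that each predictor is expressed as a standardized z-score. The function name is hypothetical, and age and the model intercept are omitted.

```python
import math

# Per-SD odds ratios reported for the NHS in the abstract of this article.
OR_PER_SD = {"MD": 1.48, "PRS": 1.37, "QS": 1.25}


def combined_odds_ratio(z_scores):
    """Odds ratio relative to a woman at the mean of every predictor.

    Assumes no interaction, so the log odds ratios are summed after
    weighting each by the woman's z-score (illustrative helper).
    """
    log_or = sum(math.log(OR_PER_SD[name]) * z for name, z in z_scores.items())
    return math.exp(log_or)


# A woman 1 SD above the mean on all three predictors.
one_sd_above = combined_odds_ratio({"MD": 1.0, "PRS": 1.0, "QS": 1.0})
```

Under these assumptions, a woman 1 SD above the mean on all three predictors has a combined odds ratio of 1.48 × 1.37 × 1.25, or about 2.53, relative to a woman at the mean.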
Current U.S. guidelines vary but in general recommend screening mammography from age 45 (American Cancer Society; ref. 58) or 50 (U.S. Preventive Services Task Force; ref. 59), with either annual or biennial mammography (60). Assessment and screening decisions should include shared decision making, as recommended by numerous guidelines. This assumes reliable risk assessment and shared decision-making tools. These recommendations apply to average-risk women, defined by the USPSTF to include women with a history of breast cancer in a single family member and women with dense breasts.
In summary, the combination of QS, MD, and PRS predicts 10-year risk of invasive breast cancer. Now validated, this model can be readily implemented in routine clinical services to identify women at higher risk of breast cancer and those who are at low risk and unlikely to benefit from chemoprevention interventions.
Authors' Disclosures
B. Rosner reported grants from NIH during the conduct of the study. R.M. Tamimi reported grants from NIH/NCI during the conduct of the study. P. Kraft reported grants from National Institutes of Health during the conduct of the study. Y. Mu reported grants from NIH during the conduct of the study. C.M. Vachon reported grants from NCI during the conduct of the study. G.A. Colditz reported grants from Breast Cancer Research Foundation during the conduct of the study; personal fees from GRAIL Inc outside the submitted work; in addition, G.A. Colditz had a patent for Up To Date author with royalties paid. No other disclosures were reported.
Disclaimer
The authors assume full responsibility for analyses and interpretation of these data. The funding agencies had no role in the construction or the writing of this manuscript.
Authors' Contributions
B. Rosner: Conceptualization, data curation, formal analysis, validation, methodology, writing–original draft, writing–review and editing. R.M. Tamimi: Validation, investigation. P. Kraft: Validation, investigation. C. Gao: Formal analysis, investigation. Y. Mu: Software, validation. C. Scott: Investigation. S.J. Winham: Validation, investigation. C.M. Vachon: Investigation. G.A. Colditz: Conceptualization, data curation, formal analysis, validation, investigation, writing–review and editing.
Acknowledgments
The authors would like to thank the participants and staff of the Nurses' Health Study for their valuable contributions as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY. This project was funded by a NIH cohort infrastructure grant (UM1 CA186107), and a program project grant (P01 CA87969) from the NCI, and by a grant (BCRF 20-028) to Dr. Colditz from the Breast Cancer Research Foundation. Mayo Mammography Health Study was funded by NCI R01 CA97396, R01 CA177150, and Mayo Clinic Cancer Center (Rochester, MN).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.