Abstract
SNP risk information can potentially improve the accuracy of breast cancer risk prediction. We aim to review and assess the performance of SNP-enhanced risk prediction models.
Studies that reported area under the ROC curve (AUC) and/or net reclassification improvement (NRI) for both traditional and SNP-enhanced risk models were identified. Meta-analyses were conducted to compare across all models and within similar baseline risk models.
Twenty-six of 406 studies were included. The pooled estimate of AUC improvement was 0.044 [95% confidence interval (CI), 0.038–0.049] for all 38 models, while estimates by baseline models ranged from 0.033 (95% CI, 0.025–0.041) for BCRAT to 0.053 (95% CI, 0.018–0.087) for partial BCRAT. There was no observable trend between AUC improvement and number of SNPs. One study found that the NRI was significantly larger when only intermediate-risk women were included. Two other studies showed that the majority of the risk reclassification occurred in intermediate-risk women.
Addition of SNP risk information may be more beneficial for women with intermediate risk.
Screening could be a two-step process where a questionnaire is first used to identify intermediate-risk individuals, followed by SNP testing for these women only.
This article is featured in Highlights of This Issue, p. 425
Introduction
Breast cancer is the most common cancer among women and is rising in incidence worldwide (1, 2). In this era of precision medicine, there is interest in applying tailored breast cancer screening and prevention strategies based on a woman's specific risk (3). Many risk factors have been identified and risk prediction models developed to quantify the combined effect of these factors (4). These models can be used to estimate a woman's individual risk, advise patients, inform screening and direct breast cancer research (4, 5). The models include the Breast Cancer Risk Assessment Tool (BCRAT; ref. 6), International Breast Intervention Study (IBIS) Breast Cancer Risk Evaluation Tool (also known as Tyrer–Cuzick model; ref. 7), BRCAPRO developed by Parmigiani and colleagues (8), Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA; refs. 9, 10) and the Breast Cancer Surveillance Consortium (BCSC) Breast Cancer Risk Calculator (11, 12).
Over the past few years, rapid expansion of next generation DNA sequencing has led to an increasing discovery of breast cancer predisposition genes beyond BRCA1 and BRCA2. For example, Shimelis and colleagues identified six other germline mutations (BARD1, PALB2, RAD51D, BRIP1, RAD51C, and TP53) that were associated with moderate to high risk of triple-negative breast cancer, a notoriously aggressive subtype, in a case–control study (13). Still, these genes account for only a small fraction of familial (14) and sporadic breast cancer (15, 16), leaving most of it unexplained. Research suggests that much of the missing heritability could be polygenic (14). A single SNP is associated with only low to moderate breast cancer risk. When combined, their effects could be significant as they occur at higher frequencies than high-penetrance mutations (17).
Over the past decade, many common, low-penetrance risk alleles for breast cancer have been identified by genome-wide association studies (GWAS) (14, 18, 19). The GWAS catalog, a freely available database of published SNP-trait associations, lists about 53 studies and 1,272 associations related to breast cancer (20). As many as 182 SNPs have been identified as being associated with breast cancer, increasing our understanding of its heritability (21). These SNPs have not been included in existing breast cancer risk prediction models except for BCRAT, but there is intention to include them in future versions (22). Researchers are investigating the potential of adding these SNPs to traditional risk prediction models, which results in SNP-enhanced risk prediction models.
Breast cancer screening aims for early detection of the disease and reduction in the associated mortality (23). SNP-enhanced risk prediction models may be able to estimate breast cancer risk more accurately and can translate into a more efficient risk-based screening program. Potential risk-adapted screening practices include starting screening later in women with lower risk, increasing the recommended screening interval for these women or possibly not inviting them for screening at all (24). High-risk women could be recommended to begin screening at an earlier age, attend annual screening (25) or even engage in supplemental screening beyond mammography (26).
A couple of reviews have studied the performance of breast cancer risk prediction models without SNPs. Anothaisintawee and colleagues (27) found that discriminatory accuracy was poor to fair in both internal validation [concordance statistic (c-statistic): 0.53–0.66] and external validation (c-statistic: 0.56–0.63). This could be due to insufficient knowledge about risk factors, the heterogeneous nature of breast cancer, and varying distributions of risk factors across populations. Another review by Meads and colleagues (28) found that none of the models reliably discriminated between those who did and did not develop breast cancer. Models such as BRCAPRO, BOADICEA and IBIS, which are preferred for women who come from families known to have a BRCA1 or BRCA2 mutation, were not included. One other review by Evans and colleagues (29) assessed the discriminatory accuracy of models using data from 1,933 women who attended the Family History Evaluation and Screening Program in the United Kingdom. AUC was 0.716 for the Claus model, 0.735 for BCRAT, 0.737 for BRCAPRO and 0.762 for the IBIS model.
In this systematic review and meta-analysis, we aim to identify existing SNP-enhanced breast cancer risk prediction models and assess their performance, measured by discriminatory accuracy and improvement in predictive ability (AUC and NRI). The extent of improvement in performance, from the addition of genetic information in the SNP-enhanced risk models, was also evaluated.
Materials and Methods
This review is written in accordance with the PRISMA guidelines (30, 31).
Literature search strategy
A literature search of the EMBASE, Scopus, and PubMed databases was completed on January 31, 2018, without a specified time frame for published articles. The search strategy utilized three keywords: “breast cancer,” “single nucleotide polymorphism,” “risk prediction,” and their synonyms. The exact search terms used are shown in Supplementary Table S1 of the Online Supplementary Material.
Study selection
Only studies published in English were considered. Studies were included if they fulfilled the following criteria: (i) empirical studies assessing risk prediction models for breast cancer in women, which reported outcomes using AUC (or its equivalent, the c-statistic) and/or NRI; and (ii) studies that compared SNP-enhanced risk prediction models with traditional risk prediction models. Studies excluded were genome-wide association studies, reviews, narratives, prognostic or diagnostic studies, model development studies, nonrisk prediction studies and studies that included only BRCA mutation carriers. Full texts of relevant studies were independently screened by three reviewers (Shi Xun Lee, Xin Yi Wong, and Si Ming Fung) for inclusion. Any disagreement was resolved by consensus among the reviewers.
Discriminatory accuracy and improvement in predictive ability of model
The primary outcomes in this review are AUC and NRI. The ROC curve is a plot of true positive rate against false positive rate. AUC is interpreted as the probability of assigning a higher predicted risk to a randomly selected individual with the outcome of interest than another randomly selected individual without the outcome (32) and provides a measure of discriminatory accuracy of the risk prediction model. A guideline on interpreting AUC values by Swets (33) has been widely used. It classifies the discriminatory accuracy of models into the following categories: noninformative (AUC = 0.5), low accuracy (0.5 < AUC < 0.7), moderate accuracy (0.7 < AUC < 0.9), high accuracy (0.9 < AUC < 1.0) and perfect accuracy (AUC = 1.0). However, the increase in AUC for a new model is often of a very small magnitude (34) and has been criticized as an insensitive measure of model improvement (35–37). Hence, Pencina and colleagues (34) proposed a novel index, NRI, for evaluating the improvement in predictive ability of a new model over an old model by analyzing the reclassification of subjects. NRI is defined in the following equation,
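$$\mathrm{NRI} = \left[P(\mathrm{up} \mid \mathrm{case}) - P(\mathrm{down} \mid \mathrm{case})\right] + \left[P(\mathrm{down} \mid \mathrm{control}) - P(\mathrm{up} \mid \mathrm{control})\right]$$
where P(x) represents the probability of movement x among cases or controls (estimated as the observed proportion), upward movement (up) is a change into a higher risk category based on the new model, and downward movement (down) is a change in the opposite direction. Positive NRI values indicate an overall correct reclassification in the new model, as cases were moved upwards to higher risk categories and controls were moved downwards to lower risk categories (34).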
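To make the two outcome measures concrete, the following is a minimal sketch rather than the procedure used by any included study. It computes AUC as the concordance probability described above and a categorical NRI from reclassification counts. The function names, the toy risks, and the cut-offs (1.5% and 2.0%, borrowed from the categories reported in several included studies) are illustrative assumptions.

```python
from bisect import bisect_right


def auc_concordance(case_risks, control_risks):
    """Probability that a random case has a higher predicted risk than a
    random control; ties count as one half."""
    wins = 0.0
    for case_risk in case_risks:
        for control_risk in control_risks:
            if case_risk > control_risk:
                wins += 1.0
            elif case_risk == control_risk:
                wins += 0.5
    return wins / (len(case_risks) * len(control_risks))


def risk_category(risk, cutoffs):
    """Index of the risk category; cutoffs are ascending lower bounds of
    every category above the lowest one."""
    return bisect_right(cutoffs, risk)


def categorical_nri(old_risks, new_risks, is_case, cutoffs):
    """NRI = [P(up|case) - P(down|case)] + [P(down|control) - P(up|control)]."""
    up = {True: 0, False: 0}
    down = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for old, new, case in zip(old_risks, new_risks, is_case):
        total[case] += 1
        old_cat = risk_category(old, cutoffs)
        new_cat = risk_category(new, cutoffs)
        if new_cat > old_cat:
            up[case] += 1
        elif new_cat < old_cat:
            down[case] += 1
    nri_cases = (up[True] - down[True]) / total[True]
    nri_controls = (down[False] - up[False]) / total[False]
    return nri_cases + nri_controls


# Hypothetical 5-year risks for four cases and four controls (not study data).
old = [0.012, 0.016, 0.018, 0.025, 0.011, 0.016, 0.021, 0.013]
new = [0.016, 0.022, 0.017, 0.026, 0.010, 0.014, 0.019, 0.016]
case_flags = [True, True, True, True, False, False, False, False]
cutoffs = [0.015, 0.020]  # categories: <1.5%, 1.5%-<2.0%, >=2.0%

print(categorical_nri(old, new, case_flags, cutoffs))
print(auc_concordance(new[:4], new[4:]) - auc_concordance(old[:4], old[4:]))
```

In this toy example two of four cases move up and none move down, while two of four controls move down and one moves up, so the NRI is 0.5 + 0.25 = 0.75. Real studies report far smaller values, as Table 2 shows.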
NRI has been previously computed and analyzed in studies that predicted risk of cardiovascular disease (38, 39), atrial fibrillation (40), diabetes mellitus (41, 42), end-stage liver disease (43) and end-stage renal disease (44). In the study by Paynter and colleagues (39), addition of a genotype to a prediction model based on traditional risk factors, high-sensitivity C-reactive protein and family history of premature myocardial infarction neither changed the c-index, a generalization of AUC (0.807 to 0.809), nor improved net reclassification (NRI −0.2%, P = 0.59). Yet, in the study by Lubitz and colleagues (40), a significant improvement in c-statistic contributed by premature familial atrial fibrillation [0.842 (95% confidence interval (CI), 0.826–0.858) to 0.846 (95% CI, 0.831–0.862), P = 0.004] did not correspond to a significant NRI [0.011 (95% CI, −0.021–0.042), P = 0.51].
Data extraction
The following data were extracted from the studies: publication year, number of case and control participants, age and ethnicity of participants, breast cancer subtype or hormone receptor status of case participants, country of study, study design, risk factors considered in the models, number of SNPs and the loci of SNPs included, method of incorporating SNPs in SNP-enhanced models, measure for evaluating discriminatory accuracy, and improvement in predictive ability.
Standard of reporting and quality assessment
Studies were evaluated against the 25-item checklist provided by the “Strengthening the Reporting of Genetic RIsk Prediction Studies” (GRIPS) statement (Supplementary Table S2A, Online Supplementary Material; ref. 45). The GRIPS statement suggested a standard reporting guideline for genetic risk prediction studies, but it does not serve as a quality assessment for studies.
We used the Newcastle-Ottawa Quality Assessment Scale (NOS; Supplementary Table S3A, Online Supplementary Material; ref. 46) to assess quality. NOS was identified by Cochrane (47) as a tool to assess methodological quality or risk of bias in non-randomized studies. It was based on three categories: selection of cases and controls, comparability of cases and controls and ascertainment of the exposure of interest (46). NOS scores range from 0 to 9. To the best of our knowledge, there are no established cut-offs for low, moderate and high quality. Hence, we have relied on previous literature (48) to define low quality as a score ≤5, moderate quality as a score between 6 and 7 and high quality as a score between 8 and 9.
Statistical analyses
We extracted the number of case and control participants, the improvement in AUC (AUC of SNP-enhanced model minus AUC of baseline risk model without SNPs), overall AUC of SNP-enhanced model and the corresponding 95% confidence interval (CI) for our meta-analyses. These were conducted to pool studies by similar baseline risk models. We grouped the baseline risk prediction models into six groups: (i) BCRAT (5–6 risk factors of BCRAT), (ii) partial BCRAT (2–4 risk factors of BCRAT), (iii) partial BCRAT with additional risk factors, (iv) BCSC, (v) IBIS and (vi) other models. Forest plots were simultaneously obtained to visualize the trends, within each baseline model group.
A preformatted datasheet developed by Neyeloff and colleagues (49) was used to perform the meta-analyses and generate the forest plots. The method suggested by Hanley and McNeil (32, 50) was used to calculate the standard error for the AUC of the studies in the datasheet. Heterogeneity was assessed using the I² statistic, which quantifies the extent of variation between study population estimates. If significant heterogeneity was present (I² ≥ 50%), the random-effects model was applied (51, 52).
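The pooling steps described above can be sketched as follows. This is an illustrative approximation, not a reproduction of the Neyeloff datasheet: the function names are hypothetical, the standard error follows the Hanley and McNeil (1982) approximation, and the random-effects branch assumes a DerSimonian-Laird estimate of between-study variance and a normal-theory 95% interval, neither of which is specified in the text. The input values are made up.

```python
import math


def hanley_mcneil_se(auc, n_cases, n_controls):
    """Standard error of an AUC (Hanley and McNeil, 1982 approximation)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_controls - 1) * (q2 - auc ** 2)
    ) / (n_cases * n_controls)
    return math.sqrt(variance)


def pool_estimates(estimates, std_errors, i2_threshold=50.0):
    """Inverse-variance pooling; uses random effects when I2 >= threshold."""
    weights = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Cochran's Q and I2 (percentage of variation attributed to heterogeneity).
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    model = "fixed"
    if i2 >= i2_threshold:
        # Assumption: DerSimonian-Laird between-study variance (tau squared).
        c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
        tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
        weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
        model = "random"
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    ci95 = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return {"pooled": pooled, "ci95": ci95, "i2": i2, "model": model}


# Hypothetical AUCs with case and control counts (not taken from Table 1).
studies = [(0.62, 1200, 1500), (0.66, 800, 900), (0.59, 3000, 3200)]
aucs = [auc for auc, _, _ in studies]
ses = [hanley_mcneil_se(auc, n1, n0) for auc, n1, n0 in studies]
print(pool_estimates(aucs, ses))
```

Switching models on an I² threshold mirrors the rule stated in the text; other workflows fit a random-effects model regardless of the observed heterogeneity.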
Results
Figure 1 presents the literature search process. A total of 259 unique studies were identified from the databases. Reviewing of titles and abstracts yielded 84 potentially eligible articles of which 20 were included after a full-text review. Six more studies were identified from the references of relevant studies and 26 studies were included in the quality assessment and systematic review. One study did not evaluate AUC (53), so the remaining 25 studies were included in the meta-analysis.
Study population characteristics
Table 1 summarizes the study population characteristics, risk prediction model characteristics and model performance of the 26 studies which evaluated the discriminatory accuracy and/or predictive ability of baseline risk models and SNP-enhanced models. Supplementary Table S4 (Online Supplementary Material) provides more detailed information about the method of incorporating SNPs in each model. The studies were published between 2010 and 2018. All were case-control in nature and no sample overlap was found in any study. The largest number of studies (n = 14) came from the United States (54–67). Thirteen studies evaluated risk prediction models among Caucasian or European populations (23, 24, 54–57, 59, 61–64, 68, 69), eight in Asian populations (53, 67, 70–75), three within mixed populations (58, 60, 65), one among Australians (76), and another among African Americans and Hispanics (66). Sample sizes ranged from 324 to 37,033 women. Reported measures of central tendency of age ranged from 44.2 (mean) to 64.6 (mean) years. Most studies evaluated the risk prediction models in the general population (24, 53, 55–58, 60, 61, 63, 64, 66–68, 70, 71, 73–76) while others specifically studied non-BRCA mutation carriers (54, 62, 69), post-menopausal (23, 59, 65, 72) or pre-menopausal women (72). Most studies included only patients with invasive breast cancer as case participants (23, 53, 56–61, 63, 64, 66–70, 73, 74, 76), while six included invasive or in situ breast cancer patients (24, 54, 55, 62, 71, 75) and two focused on patients with estrogen receptor (ER)-positive breast cancer (65, 72).
Risk prediction model characteristics
The number of SNPs incorporated in the risk prediction models ranged from 2 to 92. From the 25 studies, 38 risk prediction models were selected for our meta-analysis (Supplementary Table S5). Some studies evaluated more than one risk prediction model by including different risk factors in the baseline model (23, 55, 69) or different numbers of SNPs in the SNP-enhanced model (56, 66, 72). Seven models used BCRAT as the baseline model (55, 59, 66, 69, 75, 76) and another four used partial BCRAT (57, 63, 68, 74). Eleven models used partial BCRAT with additional risk factors (23, 55, 62, 64, 67, 70–73), four used BCSC (60, 61, 65), four used IBIS (24, 66, 69) and eight used other types of baseline models (54, 56, 58, 69). Risk factors included in the various traditional baseline models are summarized in Supplementary Table S6 (Online Supplementary Material). Most were designed for use in the general population (69). IBIS was developed using data from postmenopausal women and intended for use in high-risk populations (66, 69) while BRCAPRO was developed based on studies among individuals of Ashkenazi Jewish and European ancestry for pretest BRCA mutation (77). BOADICEA was developed using complex segregation analysis of breast and ovarian cancer based on a combination of families identified through population-based studies of breast cancer, and families with multiple affected individuals who had been screened for BRCA1 and BRCA2 mutations (78).
A variety of methods for incorporating SNPs into the models were observed. Among the 38 models, seven models (54, 56, 62, 63, 68) used the number of risk alleles present (0, 1, or 2) at each SNP to incorporate genetic information in the model. Ten models (59, 67, 69, 76) used the Mealiffe method (59) to incorporate the SNPs in the model. In the Mealiffe method, the polygenic risk score (PRS) was calculated as the product of genotype relative risk values. Risk allele frequencies and published estimates of odds ratio per allele were based on a log-additive model. Independence between non-genetic risk factors and SNP risk for breast cancer was assumed. In five models (61, 64, 67, 71, 75), the PRS was calculated as the sum of the product of the number of risk allele copies of the selected SNPs and the corresponding log odds ratio. Another three models used a Bayesian approach to calculate the PRS (60, 65). In four models (24, 70, 73, 74), a genetic risk score was calculated in other ways to include genetic information in the model (Supplementary Table S4, Online Supplementary Material). Two models (23) used a multiplicative penetrance model and one model (56) used a multiple log-additive model for incorporating the SNPs. Six models (55, 57, 58, 72) used other methods to include genetic information in the model.
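Two of the scoring approaches named above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the exact implementation of any included study. The log-additive score is the sum of risk allele counts weighted by log odds ratios. The multiplicative score multiplies per-SNP genotype relative risks, each scaled to a population average of 1; that scaling and the use of Hardy-Weinberg genotype frequencies are assumptions of this sketch. Function names, odds ratios, and allele frequencies are hypothetical.

```python
import math


def log_additive_prs(allele_counts, odds_ratios):
    """Sum over SNPs of (number of risk alleles) x ln(per-allele odds ratio)."""
    return sum(n * math.log(oratio) for n, oratio in zip(allele_counts, odds_ratios))


def multiplicative_prs(allele_counts, odds_ratios, risk_allele_freqs):
    """Product over SNPs of genotype relative risks scaled to a population mean of 1.

    Assumes a log-additive model per SNP (relative risks 1, OR, OR^2 for
    0, 1, 2 risk alleles), Hardy-Weinberg genotype frequencies, and
    independence between SNPs.
    """
    score = 1.0
    for n, oratio, p in zip(allele_counts, odds_ratios, risk_allele_freqs):
        # Population-average relative risk at this SNP.
        mean_risk = (1 - p) ** 2 + 2 * p * (1 - p) * oratio + p ** 2 * oratio ** 2
        score *= oratio ** n / mean_risk
    return score


# Hypothetical genotype for one woman at three SNPs (illustrative values only).
counts = [0, 1, 2]
per_allele_or = [1.20, 1.10, 1.30]
freqs = [0.30, 0.45, 0.10]

print(log_additive_prs(counts, per_allele_or))
print(multiplicative_prs(counts, per_allele_or, freqs))
```

Under these assumptions a multiplicative score above 1 indicates higher than population-average SNP risk, and the score can be multiplied with a questionnaire-based risk estimate only if genetic and non-genetic risks are independent, as the text notes.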
Standards of reporting and quality assessment
Details on the evaluation of studies against the GRIPS checklist can be found in Supplementary Table S2B (Online Supplementary Material). Many studies did not carry out internal validation or cross-validation (23, 24, 53, 55, 58, 61, 62, 64, 66–69, 72, 73, 76; Item 10). However, some used risk factors from established risk models (e.g., BCRAT) and SNPs identified from previously published GWAS and hence, the study itself might be considered a validation. Some studies did not carry out additional analyses or had stated the results without discussion (23, 53, 54, 57–60, 62, 63, 66, 67, 69, 73, 74, 76; Item 20). Still, the main results were not significantly affected, as these analyses were not considered in the primary outcome. Some studies did not state the numbers or reasons for non-participation (23, 53–56, 62, 64–67, 69–74, 76; Item 14). Most studies did not report the measures of association between risk factors and outcome except those for SNPs (23, 55, 56, 58, 61, 65–67, 73–76; Item 16). In general, the majority of the studies sufficiently reported their objectives, methods and results, and presented appropriate discussion and conclusions. As for study quality, the average NOS score obtained by the studies was 6.1 (range: 2–8), out of a maximum of 9. Out of 26 studies, nine were of low quality, 12 were of moderate quality and five were of high quality (Supplementary Table S3B). Only two studies reported that the non-response rates were the same for both cases and controls.
Discriminatory accuracy and improvement in predictive ability of model
Two models reported the AUC-equivalent c-statistic values (70, 73) while the rest provided AUC values. Using the AUC classification provided by Swets (33), four SNP-enhanced models (65, 69, 72) had moderate accuracy (0.7 < AUC < 0.9) and the other 34 SNP-enhanced models (23, 24, 54–64, 66–68, 70, 71, 73–76) showed low accuracy (0.5 < AUC < 0.7). Nonetheless, all studies showed an improvement in AUC/c-statistic from baseline risk models to SNP-enhanced models, and the improvement ranged from 0.011 to 0.15. Statistically significant improvement (P < 0.05) in either AUC or NRI was observed in 15 models (see asterisks in Table 1, last two columns).
There were 13 models (53, 56, 59, 61, 66, 69, 76) that reported the NRI of the effect of including genetic information in the SNP-enhanced models (Table 2). The reported NRI values for all the overall populations were positive, which indicates an improvement in classification when genetic information was included. However, one study found that the addition of SNPs to the baseline model did not improve classification among a subgroup of women aged 50 to 59 years (76).
Model | No. of risk factors | No. of SNPs | NRI value (95% CI)a | Classification categories | Author, year
---|---|---|---|---|---
BCRAT (5–6 risk factors of BCRAT) | 6 | 7 | Overall: 0.085 | <1.5%, 1.5%–2.0%, >2.0% | Mealiffe, 2010
BCRAT (5–6 risk factors of BCRAT) | 5 | 7 | Overall: 0.028; 35–39 years: 0.021; 40–49 years: 0.074; 50–59 years: −0.029 | <1.5%, 1.5%–2.0%, >2.0% | Dite, 2013
BCRAT (5–6 risk factors of BCRAT) | 5 | 77 | 0.066 (0.019–0.110) | <1.5%, 1.5%–2.0%, >2.0% | Dite, 2016
BCRAT (5–6 risk factors of BCRAT) | 6 | 75 | 0.033 (0.025–0.089) | <1.5%, 1.5%–<2.0%, ≥2.0% | Allman, 2015
BCRAT (5–6 risk factors of BCRAT) | 6 | 71 | 0.082 (0.003–0.162) | <1.5%, 1.5%–<2.0%, ≥2.0% | Allman, 2015
Partial BCRAT (2–4 risk factors of BCRAT) | 2 + 4 | 51 | 0.062 | <1.0%, 1.0%–<1.5%, 1.5%–<2.0%, 2.0%–<2.5%, ≥2.5% | Lee, 2014
BOADICEA | 8 | 77 | 0.040 (0.007–0.073) | <1.5%, 1.5%–2.0%, >2.0% | Dite, 2016
BRCAPRO | 8 | 77 | 0.063 (0.030–0.094) | <1.5%, 1.5%–2.0%, >2.0% | Dite, 2016
IBIS | 12 | 77 | 0.052 (0.015–0.088) | <1.5%, 1.5%–2.0%, >2.0% | Dite, 2016
IBIS | 6 | 75 | 0.060 (0.005–0.113) | <1.5%, 1.5%–<2.0%, ≥2.0% | Allman, 2015
IBIS | 6 | 71 | 0.181 (0.085–0.273) | <1.5%, 1.5%–<2.0%, ≥2.0% | Allman, 2015
BCSC | 5 | 76 | Case: 0.110 (0.070–0.150); Control: 0.020 (−0.010–0.050) | <3%, ≥3% | Vachon, 2015
Other models | 10 | 18 | 0.083 | <1%, 1%–<1.66%, 1.66%–<3.5%, >3.5% | Hüsing, 2012
Abbreviations: BCRAT, Breast Cancer Risk Assessment Tool (6); BCSC, Breast Cancer Surveillance Consortium (11, 12); BOADICEA, Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (9, 10); BRCAPRO (8); CI, confidence interval; IBIS, International Breast Intervention Study (7); NRI, net reclassification improvement.
aPositive NRI values indicate an overall correct reclassification.
Meta-analysis of AUC
We found no observable trend between improvement in AUC of the SNP-enhanced risk prediction model and number of SNPs in the model. In the test for trend in AUC improvement across ordered groups according to panel size (2 to 9 SNPs: small; 10 to 44 SNPs: medium and 71 to 92 SNPs: large), we found no significant association (P = 0.244). We also estimated the Pearson's correlation coefficient between AUC improvement and number of SNPs, which is −0.0305 (P = 0.856). Hence, we conclude that there is no linear correlation between AUC improvement and number of SNPs.
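As a rough illustration of the correlation check described above, the sketch below computes Pearson's correlation coefficient between panel size and AUC improvement and assigns each model to the small, medium, or large panel band. The data points are invented for illustration and are not the 38 models analyzed here; the function names are hypothetical, and the test for trend itself is not reproduced because its exact form is not stated in the text.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def panel_size_group(n_snps):
    """Panel-size band used in the trend test; None if outside all three bands."""
    if 2 <= n_snps <= 9:
        return "small"
    if 10 <= n_snps <= 44:
        return "medium"
    if 71 <= n_snps <= 92:
        return "large"
    return None


# Hypothetical (number of SNPs, AUC improvement) pairs, not study data.
n_snps = [7, 9, 18, 32, 76, 77]
auc_gain = [0.040, 0.052, 0.031, 0.047, 0.038, 0.050]

print(pearson_r(n_snps, auc_gain))
print([panel_size_group(n) for n in n_snps])
```

A coefficient near zero, as reported above, indicates no linear relationship; it does not rule out a non-linear one, which is one reason the ordered-group trend test is reported alongside it.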
In Fig. 2, the pooled estimate of AUC improvement is 0.044 (95% CI, 0.038–0.049) for all 38 models, while the pooled estimates of AUC improvement by baseline models ranged from 0.033 (95% CI, 0.025–0.041; Fig. 3A) for BCRAT to 0.053 (95% CI, 0.018–0.087; Fig. 3B) for partial BCRAT. The overall AUC of the SNP-enhanced models generally falls short of the cut-off of 0.7 for moderate accuracy (Supplementary Fig. S7). The pooled estimate of overall AUC was highest at 0.671 (95% CI, 0.649–0.694) for BCSC models, followed by 0.653 (95% CI, 0.632–0.674) when the partial BCRAT with additional risk factors model was used as the baseline, 0.643 (95% CI, 0.602–0.685) for partial BCRAT, 0.627 (95% CI, 0.608–0.645) for other models, 0.622 (95% CI, 0.591–0.654) for the BCRAT model and finally 0.612 (95% CI, 0.584–0.640) for the IBIS model (Supplementary Fig. S8).
Prediction of risk by breast cancer subtype
It appears that the addition of genetic information offers greater benefit when the risk models are used to predict ER-positive disease. Among studies that used partial BCRAT and additional risk factors, Guo and colleagues (72) had the largest AUC improvement and overall AUC (Fig. 3C; Supplementary Fig. S7C). That study involved only patients with ER-positive, HER-2 negative breast cancers, while other studies in the same group included invasive or in situ breast cancer cases regardless of hormone receptor status (23, 55, 62, 64, 70, 71, 73). Another study by Shieh and colleagues (65), which included only ER-positive cases, also had a larger improvement in AUC compared to the other studies within the group (Fig. 3D). Hüsing and colleagues investigated breast cancer risk by subtype and found that the predictive quality of the models was markedly better for ER-positive cancers than for ER-negative cancers (56).
However, it should be noted that the majority of breast cancer cases used in the discovery and validation of SNPs from GWAS thus far have been ER-positive. Most of the known breast cancer loci show differences in relative risk by subtype. In particular, 6 of the 14 loci associated with ER-negative disease at genome-wide significance show no evidence of association with ER-positive disease (79). A recent meta-analysis (80) identified four new susceptibility loci for ER-negative disease, and that brings the total count to only 23. Studies included in our review did not focus on ER-negative specific SNPs or ER-negative cancers. This is likely why the risk prediction models appear to perform better in ER-positive cancers.
Discussion
The findings of this study suggest that the addition of genetic information into traditional risk prediction models for breast cancer improved model performance, although only slightly. Four out of 38 models showed moderate discriminatory accuracy (0.7 < AUC < 0.9) (65, 69, 72). Of these, three were focused on ER-positive breast cancers and included endogenous hormones as risk factors in the baseline models (65, 72). All other studies displayed low discriminatory accuracy. All studies showed an overall improvement in discriminatory accuracy or reclassification when genetic information was added to the baseline risk prediction model, apart from a subgroup analysis among women aged 50 to 59 years (76). The greatest gain in AUC was in the study by Kaklamani and colleagues (58) using race, age and body mass index in the baseline model. They genotyped breast cancer cases and controls for four fat mass and obesity associated gene SNPs: rs7206790, rs8047395, rs9939609, and rs1477196 and found that these genotypes provided powerful classifiers to predict breast cancer risk. A model containing epistatic interactions further improved the prediction accuracy to an AUC of 0.68 (improvement of 0.15, from 0.53).
The greatest gain in NRI was in the study by Allman and colleagues (66) using the IBIS as baseline model (Table 2, Fig. 3E). They studied the extent to which clinical breast cancer risk prediction models are improved by including information on susceptibility SNPs in African American or Hispanic women. The addition of 71 SNPs resulted in an NRI of 0.181 in Hispanic women. The IBIS model includes an extensive set of risk factors and is one of the most sensitive models for detecting risk for breast cancer (81, 82). For instance, it includes extended family history, BRCA1/2 genetic status with non-genetic risk factors such as age, age at menarche, age at first live birth, age at menopause, parity, history of hormone replacement therapy use, history of atypical hyperplasia, history of lobular carcinoma in situ, height, and body mass index (7). Clinicians typically use models like BCRAT for women deemed at average risk and models like IBIS for women whose family history and genetic information indicate above average risk (83).
There was no observable trend between improvement in AUC of the SNP-enhanced risk prediction model and number of SNPs included in the model. Other factors, such as age, ethnicity, loci of SNPs, and method of incorporating SNPs may also affect risk prediction. While we have stratified the studies by baseline risk models, variation in the abovementioned study characteristics still exists within each group and could have confounded the findings and limited the ability to draw conclusions about the association between number of SNPs and improvement in AUC. Therefore, we looked at studies that compared incremental numbers of SNPs internally. Hüsing and colleagues (56) showed that adding SNPs using the allele count method resulted in a small but steady increase in discriminatory accuracy as the number of genetic variants increased (7, 9, and 32 SNPs; Table 1). However, when 18 SNPs were added using a multiple log-additive model, the AUC was slightly higher than that of 32 SNPs added using allele count.
In another study that included only SNPs in the risk prediction model (84), the AUC increased from 0.591 to 0.622 (10 versus 22 SNPs) and from 0.622 to 0.684 (22 versus 77 SNPs). This suggests that the more SNPs the prediction model includes, the more discriminative it becomes. However, this was not observed in the model with 153 SNPs, where AUC was 0.650, lower than that achieved by the model with 77 SNPs (84). This raises the question of whether the upper limit of the predictive power of SNPs has been reached. Also, GWAS have been primarily designed to capture common variation, and are thus underpowered to detect the effects of rare variants (85). We may need to revisit the strategies for identifying genetic variants, such as employing whole genome sequencing as the technology advances (86).
Addition of genetic information in SNP-enhanced models may not offer benefit when the risk prediction models are used among older women. Jupe and colleagues (57) reported that the improvement in AUC between the SNP-enhanced model and the baseline model among women aged 50 to 54 years did not reach statistical significance. Another study reported that the addition of genetic information did not improve classification among women aged 50 to 59 years, as the NRI for this subgroup was negative (76). Under the partial BCRAT baseline model group, both the improvement and overall AUC of the study by Jupe and colleagues were much higher than the others (Fig. 3B; Supplementary Fig. S7B, Online Supplementary Material). This could be due to the younger age of its participants (35–39 years) compared with the other three studies, which involved older women (63, 68, 74). While this may suggest that the addition of genetic information results in better prediction among younger women, it is worth noting that these women came from Marin County, California, a region with very high incidence of breast cancer (57). Also, in a study that investigated the value of using 77 SNPs as a PRS for risk stratification, Mavaddat and colleagues (87) found that the degree of attenuation of the family history odds ratio when adjusted by PRS was lower among younger women (below 40 years), due to the higher familial relative risk in this subgroup. This suggests that rarer genetic variants may be more important at young ages.
Currently, there is no clear evidence to guide the inclusion of SNP-SNP interactions in the SNP-enhanced model. Hüsing and colleagues (56) performed individual pair-wise SNP-interaction tests and found no evidence to include genetic interaction terms into the risk models. A large-scale study assessed all 2.5 billion possible two-way interactions between 70,917 breast cancer associated-SNPs and found no significant associations with breast cancer risk (88). Still, the authors cautioned that despite the large sample size, the study might have been underpowered to detect very small interaction effects. Kaklamani and colleagues (58) reported a larger than expected improvement and overall AUC (Fig. 3F; Supplementary Fig. S7F). This study found that the inclusion of epistatic interactions that were significantly associated with breast cancer risk improved the model fit and reduced out-of-sample prediction error.
Few studies reported NRI, as it was introduced only in 2008 (34). NRI should also be used in conjunction with complementary statistical measures, such as AUC. An increase in AUC was observed for all the SNP-enhanced models and positive NRI values were observed for all the overall models, which indicates overall correct reclassification by the addition of genetic information in the SNP-enhanced models. If we consider the incorporation of SNP information as a refinement to existing risk models, the benefit may be the greatest among those who are at borderline high or borderline low risk for breast cancer. This was reflected in the study by Mealiffe and colleagues (59), where the NRI value was significantly larger when only women with intermediate risk were included than when women in all risk categories were included. Two other studies in our review have shown that the majority of the risk reclassification occurred in the group of women with intermediate risk. Shieh and colleagues (60) found that both cases and controls within the BCSC 5-year average- and intermediate-risk strata (1.00% to 1.66% and 1.67% to 2.49%, respectively) were reclassified into the low-, high- and very high-risk strata when a PRS from 83 SNPs was added to the original BCSC model. Notably, the BCSCv2-PRS model classified nearly three times as many cases into the high-risk (≥3%) stratum compared with the BCSCv2 model. This points towards the possibility of administering SNP testing in women with average or intermediate risk.
Another study by Darabi and colleagues (23) found that by adding mammographic density, body mass index and genetic information from 18 SNPs, 58% of those in the intermediate-risk category under the original Swe-Gail model were reclassified into low or high risk. This is higher than the proportions reclassified in the low- and high-risk groups (24% and 41%, respectively).
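To make the reclassification measure discussed above concrete, the categorical NRI can be written as [P(up | case) − P(down | case)] + [P(down | control) − P(up | control)], where "up" and "down" denote movement to a higher or lower risk category under the new model. The following is a minimal sketch under stated assumptions: the function name, the category coding (0 = low, 1 = intermediate, 2 = high) and the toy data are hypothetical illustrations and are not taken from any of the included studies.

```python
def categorical_nri(old_cat, new_cat, is_case):
    """Categorical net reclassification improvement.

    NRI = [P(up | case) - P(down | case)] + [P(down | control) - P(up | control)]
    Categories must be ordered numbers (higher = higher predicted risk).
    """
    up_case = down_case = n_case = 0
    up_ctrl = down_ctrl = n_ctrl = 0
    for old, new, case in zip(old_cat, new_cat, is_case):
        if case:
            n_case += 1
            up_case += new > old      # correct move for a case
            down_case += new < old    # incorrect move for a case
        else:
            n_ctrl += 1
            up_ctrl += new > old      # incorrect move for a control
            down_ctrl += new < old    # correct move for a control
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")
    return (up_case - down_case) / n_case + (down_ctrl - up_ctrl) / n_ctrl


# Hypothetical toy data: 5 cases followed by 5 controls.
old = [1, 1, 1, 0, 2, 1, 1, 1, 0, 2]   # baseline model categories
new = [2, 2, 1, 0, 2, 0, 0, 1, 0, 2]   # SNP-enhanced model categories
case = [True] * 5 + [False] * 5

nri_all = categorical_nri(old, new, case)

# Restrict to women classified as intermediate risk by the baseline model.
mid = [i for i, c in enumerate(old) if c == 1]
nri_mid = categorical_nri([old[i] for i in mid],
                          [new[i] for i in mid],
                          [case[i] for i in mid])

print(f"NRI, all women: {nri_all:.3f}")
print(f"NRI, intermediate-risk women only: {nri_mid:.3f}")
```

In this toy example all reclassification happens in the intermediate-risk group, so including the unchanged low- and high-risk women dilutes the overall NRI. This only illustrates the arithmetic and is not evidence about any real population.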
A study that investigated the cost-effectiveness of a 7SNP test for breast cancer risk (89) found it most cost-effective when given to patients with an intermediate lifetime risk of breast cancer. When the authors limited the test to patients who were most likely to have their risk category changed, they found that testing those with an intermediate Gail risk near 20% was relatively efficient. Two issues arise when we consider the role of SNP testing. The first issue is whether the screening should be a two-step process where a questionnaire is first used to identify individuals at intermediate risk, followed by administration of SNP testing for these women only. The second issue is whether this marginal gain in discriminatory accuracy offers value for money, that is, whether SNP testing is cost-effective. There are several advantages to the two-step process. Women at higher risk of breast cancer may be more likely to take up SNP testing as it can guide prevention and surveillance strategies (90, 91). Testing a smaller group of women for SNPs may be more cost-effective than testing the whole population. Furthermore, fewer participants would be subjected to the potential loss of privacy of their genetic information. Still, we acknowledge that the evidence for targeted SNP testing is at an early stage and more research is warranted.
Comprehensive risk assessment in a clinic setting is limited by time constraints, leading to incomplete and variable risk evaluation (92). While a genetic test may be more efficient, it does not incorporate well-established risk factors such as hormonal and reproductive factors, family history of breast and/or ovarian cancer, and mammographic density, and cannot accurately predict a woman's breast cancer risk. Studies in our review showed that the AUCs for SNP-only models are typically in the 0.5 to 0.6 range, indicating low accuracy (56, 59, 60, 73, 74). An AUC of 0.5 indicates a classification model that performs no better than chance and is not capable of predicting cancer.
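The interpretation of the AUC given above follows from its definition as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. A minimal sketch of this pairwise calculation is shown below; the function name and the risk scores are hypothetical and are used only to illustrate the scale.

```python
def auc(case_scores, control_scores):
    """AUC as the proportion of case-control pairs ranked correctly.

    A pair counts 1 if the case has the higher score, 0.5 if tied, 0 otherwise.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, for illustration only.
uninformative = auc([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])   # score carries no information
modest = auc([0.2, 0.4, 0.6], [0.1, 0.3, 0.5])          # overlapping distributions
perfect = auc([0.7, 0.8, 0.9], [0.1, 0.2, 0.3])         # complete separation

print(f"uninformative: {uninformative:.3f}")
print(f"modest: {modest:.3f}")
print(f"perfect: {perfect:.3f}")
```

On this scale, a value between 0.5 and 0.6 means that the score ranks a case above a control only slightly more often than a coin toss would.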
A more opportune moment to assess breast cancer risk and inform women about risk reduction measures could be during the first mammography screening appointment, as highlighted by Evans and colleagues (22). The Predicting Risk Of Cancer At Screening study (93) has shown that it is feasible to collect questionnaire data from women when they attend the screening episode. In future, this may also allow the introduction of risk-stratified screening.
In the pooled estimates grouped by baseline models, overall AUC was the highest for BCSC followed by partial BCRAT and additional risk factors, partial BCRAT, other models, BCRAT and IBIS model (Supplementary Fig. S8). Researchers may keep this in mind when selecting the baseline model for their studies. BCSC was developed and validated in a multiracial and multiethnic population of over 1 million women undergoing mammography in the United States (11).
Some limitations are associated with study design. The studies in this review were all case–control in nature, of which nine were nested case–control studies (24, 53, 55, 56, 59–61, 64, 65). The calibration of absolute event rates cannot be evaluated under such a study design (59). In many studies (23, 53, 56, 58–61, 64, 65, 70, 71, 73, 74), the cases and controls were matched by age, and hence the discriminatory effect of age on breast cancer risk prediction could not be evaluated. The discrimination of the risk prediction score is likely to be higher in large, unmatched studies (65). There is a need for evaluation of model calibration in population-based cohorts so that the clinical validity of the models can be assessed further.
Another common limitation is that the SNP-enhanced risk prediction models in the studies might not be generalizable to other populations, especially those of different ethnicities. Different populations have different SNP profiles. Thus, for higher accuracy, the associations between SNPs and breast cancer should be specifically validated in the ethnic groups included in the study (94). For instance, in one study, 7 out of 8 breast cancer-associated SNPs were initially identified in previously published studies conducted in European women but were applied in the risk prediction model among Chinese women (70). The SNP-enhanced model showed low discriminatory accuracy (AUC = 0.630) and lower improvement in AUC (0.012) compared with the other included studies (70). Most studies also had missing information on some risk factors, such as history of atypical hyperplasia for BCRAT (57, 59, 63, 68, 69, 74, 76) and IBIS (69) and number of breast biopsies for BCRAT (57, 69, 76) and BCSC (65).
In this review, we obtained information from published literature which did not always provide access to primary data. Given that all included studies showed an improvement in overall discriminatory accuracy and reclassification, publication bias is likely.
Conclusion
Genetic information improved the discriminatory accuracy when added to traditional risk prediction models for breast cancer, with overall AUC being the highest in the SNP-enhanced BCSC model. SNP-enhanced models have also demonstrated an improvement in overall reclassification for risk groups. We did not observe either an association (P = 0.244 for non-parametric test for trend across ordered panel sizes) or a linear correlation (P = 0.856 for Pearson correlation coefficient) between AUC improvement and number of SNPs added (range: 2–92). In addition, we observed significant heterogeneity in the choice of baseline model, method of incorporating SNP information and population studied. To further advance knowledge in this field, guidance on appropriate study design and/or standardization of methodology may be required. To overcome the limitation of GWAS in identifying rare variants, newer technologies such as whole genome sequencing may be employed. Our findings suggest that the addition of genetic information in SNP-enhanced models may offer greater benefit when the models are used for risk prediction among subgroups, particularly women with intermediate risk. This implies that screening could be a two-step process where a questionnaire is first used to identify individuals at intermediate risk, followed by administration of SNP testing for these women only. Further research on targeted SNP testing is warranted.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
S.M. Fung was directly supported by the Singapore Ministry of Health Health Services Research Competitive Research Grant, administered by the National Medical Research Council (grant number: HSRG/13MAY006). The study sponsor had no role in the study design, collection, analysis and interpretation of data; in the writing of the manuscript; and in the decision to submit the manuscript for publication. We would like to thank Drs. Joanne Ngeow and Kelvin Bryan Tan, and members of the thesis advisory committee for X.Y. Wong for their comments on this manuscript, which is part of a thesis being prepared in fulfilment of the requirements for a Doctor of Philosophy at the National University of Singapore. We would also like to thank three anonymous reviewers for suggesting substantial improvements.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.