Abstract
Breast cancer is the most common cancer among women in the United States, with up to 30% of those diagnosed displaying a family history of breast cancer. To date, 18% of the familial risk of breast cancer can be explained by SNPs. This review summarizes the discovery of risk-associated SNPs using candidate gene and genome-wide association studies (GWAS), including discovery and replication in large collaborative efforts such as The Collaborative Oncologic Gene-environment Study and OncoArray. We discuss the evolution of GWAS studies, efforts to discover additional SNPs, and methods for identifying causal variants. We summarize findings associated with overall breast cancer, pathologic subtypes, and mutation carriers (BRCA1, BRCA2, and CHEK2). In addition, we summarize the development of polygenic risk scores (PRS) using the risk-associated SNPs and show how PRS can contribute to estimation of individual risks for developing breast cancer. Cancer Epidemiol Biomarkers Prev; 27(4); 380–94. ©2018 AACR.
See all articles in this CEBP Focus section, “Genome-Wide Association Studies in Cancer.”
Introduction
Breast cancer is the most common cancer among women in the United States, accounting for 15% of all new cancer cases. An estimated 15% to 30% of breast cancer cases are heritable or due to underlying genetic transmission, but the genetic alterations accounting for these breast cancers are not fully defined (1, 2). Mutations in dominant high and moderate penetrance breast cancer susceptibility genes, such as BRCA1, BRCA2, PALB2, ATM, and CHEK2 have been identified in 5% of breast cancer cases in the general population and only 30%–40% of cases associated with a family history of breast cancer (3–7). Over the last decade, efforts to explain the missing heritability of breast cancer have focused on discovery of other moderate and high-risk genes as well as identification of common genetic variants. Investigations of common variants by genome-wide association studies (GWAS) have successfully identified many genetic loci that are associated with breast cancer risk and explain up to 18% of the familial risk, suggesting that breast cancer is a complex, polygenic disease.
Initial Association Studies of Common Genetic Variants
Early investigations of common genetic risk factors for breast cancer focused on candidate gene and candidate variant studies using hundreds of breast cancer cases and unaffected controls. The majority of candidate variants associated with breast cancer in these studies have subsequently failed replication and have therefore been excluded as risk factors for breast cancer (8). Only a coding variant (D302H) in the caspase 8 gene (CASP8) has consistently shown associations with breast cancer, and ongoing studies have recently identified multiple independent risk–associated signals in this locus (9, 10).
Earliest Breast Cancer GWAS (2007–2013)
Beginning in 2005 with the development of large collaborative groups such as the Cancer Genetic Markers of Susceptibility (CGEMS) Breast Cancer Consortium and the Breast Cancer Association Consortium (BCAC), studies increased in size and statistical power and used GWAS to identify SNPs linked to increased breast cancer risk (11, 12). In 2007, Easton and colleagues identified five breast cancer risk loci (FGFR2—rs2981582, 8q24—rs13281615, LSP1—rs3817198, TNRC9—rs3803662, and MAP3K1—rs889312) in a two-stage GWAS study of Caucasian women followed by a third replication stage (12). The initial GWAS phase evaluated 227,876 SNPs among only 390 breast cancer cases and 364 controls from the United Kingdom, but replication in the second and third stages involved many thousands of cases and controls.
FGFR2: The First Consistently Replicated Hit
Soon after the groundbreaking publication by Easton and colleagues, additional evidence quickly accumulated implicating the FGFR2 locus in breast cancer risk (11–15). A study by CGEMS showed an increased risk of breast cancer associated with rs1219648 (in high LD with rs2981582) and a related haplotype in FGFR2 in Caucasian postmenopausal women (11). In addition, the FGFR2 rs2981582 SNP was more strongly associated with ER-positive than ER-negative disease, and was associated with increased risk of breast cancer among BRCA2 mutation carriers (13–15). To this day, SNPs in the FGFR2 locus are still among those with a strong influence on breast cancer risk (rs2981578 OR = 1.23; rs35054928 OR = 1.27; rs45631563 OR = 1.19; Table 1).
Early Strategies to Identify Additional Susceptibility Loci
Although most early GWAS studies had small sample sizes by today's standards, studies improved power by enriching for family history, performing meta-analyses, and evaluating marginally significant GWAS SNPs in larger populations. For example, 925 SNPs displaying near genome-wide significant association with breast cancer in the initial study by Easton and colleagues were further evaluated by BCAC using 37,012 cases and 40,069 controls resulting in two new susceptibility loci at 3p24 near SLC4A7 and NEK10 and at 17q23.2 near COX11 (12, 16). Additional efforts by the Shanghai Breast Cancer study in a three-stage study also identified risk-associated SNPs in the ESR1 locus; these SNPs were replicated in an independent study of European women from BCAC (1,591 cases and 1,466 controls; ref. 17). Thus, the initial phase of GWAS using limited numbers of cases and controls in one-, two-, or three-stage GWAS designs and the examination of marginally significant loci from these initial GWAS in larger sample sizes resulted in identification of the first 14 loci containing SNPs displaying genome-wide associations (P < 1 × 10−6) with breast cancer and conferring low to moderate risk of disease (OR = 1.07–1.29) in European populations (18). Furthermore, 72 potential breast cancer SNPs that did not reach genome-wide significance in pooled data from two GWAS (UK2 and British Breast Cancer Study; BBCS) were evaluated using cases and controls from 41 studies in BCAC and 9 individual GWAS. One SNP on 12p11 near PTHLH (rs10771399) was associated with overall breast cancer risk, and two others (rs1292011 on 12q24; rs2823093 on 21q21 near NRIP1) were associated with ER-positive breast cancer risk (19). Subsequently, five additional breast cancer susceptibility loci were identified in a multistage GWAS that included 3,659 cases with a family history of breast cancer and 4,897 controls (20).
Similarly, Fletcher and colleagues identified a new susceptibility locus at 9q31.2 in a GWAS including 1,694 cases with a personal and/or family history of breast cancer compared with 2,365 controls (21).
SNP Discovery: COGS Era (2012–2015)
The Collaborative Oncologic Gene-environment Study (COGS) was a multi-consortium initiative to study prostate, ovarian, and breast cancer using a customized iSelect SNP genotyping array from Illumina (iCOGS) in which more than 200,000 SNPs were tested in each sample. In 2013, Michailidou and colleagues reported on the evaluation of 199,961 SNPs across 52,675 breast cancer cases and 49,436 controls from BCAC (22). At that time, 27 breast cancer risk loci had already been identified. Twenty-three of the 27 loci were replicated in COGS, 3 were more weakly associated with breast cancer risk, and 1 SNP was not included on the iCOGS genotyping array. In addition, 41 new breast cancer susceptibility loci were identified, bringing the total number of breast cancer susceptibility loci to 68. Thirteen of the newly identified SNPs were more strongly associated with ER-positive breast cancer, while one SNP was associated with ER-negative breast cancer. The 68 susceptibility SNPs were estimated to account for approximately 14% of familial relative risk (risk of breast cancer in first-degree relatives of women with breast cancer).
By 2015, 79 breast cancer susceptibility loci had been published, and 71 of these were confirmed in a 2015 meta-analysis including the COGS data from BCAC and 11 additional GWAS (23). The meta-analysis included 62,533 breast cancer cases and 60,976 controls. Using the 1000 Genomes Project, which catalogues a haplotype map for 38 million SNPs, genotypes were imputed for approximately 11.6 million SNPs, with imputation r2 > 0.3, that had not been included on the original arrays. After excluding SNPs within 500 kb of previously identified SNPs, 15 new breast cancer susceptibility loci were identified. This increased the total number of independent signals in known susceptibility loci to 94. Variation in these loci was estimated to account for approximately 16% of familial breast cancer risk.
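Pooling per-SNP estimates across studies, as in the meta-analysis described above, is commonly done by fixed-effect inverse-variance weighting of log odds ratios. The following is a minimal sketch of that general technique only; the text above does not state which pooling method the cited meta-analysis used, and the function name and input layout are illustrative assumptions.

```python
import math


def inverse_variance_meta(estimates):
    """Pool (log_odds_ratio, standard_error) pairs from several studies.

    Fixed-effect inverse-variance weighting: each study is weighted by
    1 / SE^2. This is a generic method, assumed here for illustration.
    """
    estimates = list(estimates)
    if not estimates:
        raise ValueError("at least one study estimate is required")
    weights = [1.0 / (se ** 2) for _, se in estimates]
    total_weight = sum(weights)
    pooled_log_or = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_log_or, pooled_se


def z_score(log_or, se):
    """Wald z statistic for a pooled log odds ratio."""
    return log_or / se
```

With equally precise studies the pooled estimate is the simple average and the standard error shrinks with the square root of the number of studies, which is why adding GWAS to the COGS data can push marginal signals past the significance threshold.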
SNP Discovery: OncoArray Era (2015–Present)
The most recent and largest breast cancer GWAS used the Illumina OncoArray BeadChip, which included approximately 570,000 SNPs and was imputed to approximately 11.8 million SNPs (imputation r2 > 0.3; refs. 24, 25). From BCAC, 61,282 breast cancer cases and 45,494 controls of European ancestry were genotyped on the OncoArray platform and results were used in a meta-analysis including data from iCOGS and 11 GWAS studies (24). In total, 122,977 breast cancer cases and 105,974 controls of European ancestry were analyzed along with 14,068 breast cancer cases and 13,104 controls of East Asian ancestry. In this study, SNPs from 49 previously identified breast cancer loci were replicated with genome-wide significance (P < 5 × 10−8), and SNPs from another 50 known risk loci did not reach genome-wide significance but were associated with breast cancer (P < 0.05; Table 1). In addition, 65 new breast cancer risk loci (also genome-wide significant) were identified for women of European ancestry (Table 1). Of these, 19 SNPs were associated more strongly with ER-positive disease, and 2 SNPs were associated more strongly with ER-negative disease (Supplementary Table S1). The total 172 risk-associated SNPs account for approximately 18% of familial relative risk (Table 1).
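The two thresholds used in this section (P < 5 × 10−8 and P < 0.05) can be read as a three-way classification of each SNP. The genome-wide threshold is conventionally motivated as a Bonferroni correction of a 0.05 error rate over roughly one million independent common-variant tests. The sketch below shows that arithmetic; the one-million figure and the function name are illustrative assumptions rather than values taken from the cited studies.

```python
# Illustrative sketch: classify association P values the way this section
# describes them. The count of independent tests is the conventional
# assumption behind 5e-8, not a figure from the cited studies.
ALPHA = 0.05
ASSUMED_INDEPENDENT_TESTS = 1_000_000
GENOME_WIDE_THRESHOLD = ALPHA / ASSUMED_INDEPENDENT_TESTS  # about 5e-8


def classify_association(p_value: float) -> str:
    """Return a descriptive label for a single SNP association P value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < GENOME_WIDE_THRESHOLD:
        return "genome-wide significant"
    if p_value < ALPHA:
        return "nominally associated"
    return "not associated"
```

Under this reading, the 49 replicated loci fall in the first class and the 50 loci with P < 0.05 fall in the second.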
Divide and Conquer: SNPs Associated with Breast Cancer Subtypes and Mutation Carriers
To increase statistical power, studies have been conducted in more homogeneous populations, including BRCA1 and BRCA2 mutation carriers and women with specific breast cancer subtypes, such as ER-negative disease and triple-negative breast cancer (TNBC).
BRCA1 and BRCA2 mutation carriers
In 2007, the CIMBA consortium was formed with the goal of investigating genetic modifiers of cancer risk in BRCA1 and BRCA2 mutation carriers (26). Using up to 30,000 BRCA1 and BRCA2 mutation carriers with breast or ovarian cancer or unaffected by these diseases, CIMBA evaluated SNPs discovered in previous candidate SNP studies and GWAS including FGFR2, TNRC9, RAD51, TP53, ERCC4, CASP8, XRCC1, IRS1, and a locus on chromosome 6p22 (15, 27–39). SNP associations were not always consistent among BRCA1 and BRCA2 mutation carriers. For example, the FGFR2 locus was found to be associated with breast cancer in BRCA2 mutation carriers only, whereas TNRC9 was associated with both BRCA1 and BRCA2 breast cancer (15).
The first GWAS focusing on BRCA1 and BRCA2 mutation carriers was performed in 2010 and modeled genotype data from women diagnosed with breast cancer under the age of 40 and unaffected women over the age of 35 using a retrospective likelihood approach (40). Multiple genome-wide significant SNPs associated with two independent signals at 19p13 were identified among BRCA1 but not BRCA2 mutation carriers. These SNPs were also associated with TNBC in a subsequent analysis (41). These results were later replicated by Couch and colleagues in a study investigating associations between SNPs at 19p13.1 and ZNF365 and both breast and ovarian cancer (42). SNPs at 19p13.1 were associated with ER-negative breast cancer for BRCA1 and BRCA2 mutation carriers and increased risk of ovarian cancer. In addition, the SNP of interest in ZNF365 was associated with ER-positive breast cancer for both BRCA1 and BRCA2 mutation carriers.
As genetic susceptibility to breast cancer in BRCA1 and BRCA2 mutation carriers unfolded, it became evident that breast cancer risk for BRCA1 and BRCA2 mutation carriers may be influenced by different loci. SNPs associated with overall breast cancer risk in the general population were also associated with breast cancer risk in BRCA2 mutation carriers, whereas SNPs associated with ER-negative breast cancer in the general population were associated with breast cancer risk in BRCA1 mutation carriers. These findings are consistent with the fact that BRCA2 mutation carriers are substantially more likely to have ER-positive disease than BRCA1 mutation carriers (BRCA2 ER-positive OR = 11.4; 95% CI: 9.8–13.2; ref. 43).
Similar to the general population, CIMBA identified stronger associations with ER-positive breast cancer for eleven out of the first twelve known susceptibility loci (the exception was rs2046210, near ESR1) in 11,421 BRCA1 and 7,080 BRCA2 mutation carriers (44). In BRCA1 mutation carriers, there were significantly stronger associations with ER-positive cancer risk than ER-negative cancer risk for SNPs in FGFR2 and SLC4A7/NEK10. In addition, a SNP in TOX3/TNRC9 was associated with ER-positive disease, but not ER-negative disease, and a SNP in LSP1 was associated with ER-negative but not ER-positive disease. For BRCA2 carriers, one SNP near ESR1 was significantly different by ER status. In 2014, Kuchenbaecker and colleagues evaluated 74 known breast cancer susceptibility loci in 15,252 BRCA1 mutation carriers and 8,211 BRCA2 mutation carriers using the iCOGS array (45). This study confirmed genome-wide significant associations for SNPs in three loci previously known to modify BRCA1 breast cancer risk: 19p13, 6q25.1, and 12p11. In addition, four novel SNPs of interest were identified, with a SNP at 1q32 in MDM4 reaching genome-wide significance. Furthermore, 15 and 8 breast cancer susceptibility loci were associated (P < 0.05) with risk of ER-negative disease in BRCA1 and BRCA2 mutation carriers, respectively. With respect to ER-positive disease, 10 SNPs were associated (P < 0.05) with risk in BRCA1 carriers and 14 SNPs were associated with risk in BRCA2 carriers. More recently, associations with BRCA1 breast cancer and ER-negative disease in the general population from iCOGS and the OncoArray study have been combined in meta-analyses to increase statistical power to identify SNPs and loci associated with ER-negative disease (10, 25).
Pathologic subtypes
Following identification of breast cancer risk loci in the early GWAS, subsequent studies investigated associations between these risk loci and breast tumor histopathology. In 2008, Garcia-Closas and colleagues found that of the five SNPs previously identified by Easton and colleagues, two risk loci were associated with ER-positive breast cancer, and three were associated with ER-negative breast cancer (12, 14). Both SNPs associated with ER-positive breast cancer were also associated with lower tumor grade, but none of the SNPs were associated with lymph node status following corrections for multiple comparisons (14).
Because the overwhelming majority of breast cancer cases are ER-positive, it is not surprising that most early GWAS studies identified subtype-specific associations predominantly with ER-positive disease. For example, Stacey and colleagues evaluated Icelandic breast cancer cases (n = 1,600) and controls (n = 11,563) for 311,524 SNPs on the Illumina Infinium HumanHap 300 microarray, and then replicated the top ten hits in an independent dataset (46). Three new loci associated with ER-positive breast cancer were identified (13, 46). To date, many of the breast cancer SNPs associated with overall breast cancer are also associated with ER-positive breast cancer (Supplementary Table S1; Fig. 1).
As a result, several studies specifically focused on ER-negative breast cancer and TNBC were conducted. In 2011, Haiman and colleagues evaluated SNPs in a study including women of both European and African ancestry, and identified the TERT-CLPTM1L locus as a risk factor for both ER-negative breast cancer and TNBC (47). In addition, four novel SNPs associated with ER-negative breast cancer were identified in a meta-analysis of 3 GWAS of 4,193 ER-negative breast cancer cases and 35,194 controls, with confirmation in the iCOGS dataset using 6,514 ER-negative cases and 41,455 controls (48).
Focusing specifically on risk of triple-negative disease, the Triple Negative Breast Cancer Consortium (TNBCC) evaluated 21 variants previously associated with overall breast cancer risk in 2,980 TNBC cases and 4,978 controls in 2011 and found that 4 were strongly associated with TNBC (49). Subsequently, in 2012 SNPs in the 19p13.1 locus were associated with ER-negative and triple-negative disease using 48,869 breast cancer cases and 49,787 controls from BCAC (41). In 2014, 25 of 74 known breast cancer susceptibility SNPs were associated with triple-negative breast cancer (P < 0.05) based on data from 3,677 TNBC cases and 4,708 controls from TNBCC and iCOGS (50). In addition, Purrington and colleagues identified two SNPs associated with TNBC in ESR1, which were independent of other breast cancer risk associations in this locus (50).
In 2016, Couch and colleagues identified 4 novel loci (two at 13q22) displaying genome-wide significant associations with ER-negative breast cancer by pooling data from 11 GWAS and the iCOGS BCAC study (including 12,472 total ER-negative breast cancer cases and 56,820 controls; ref. 10). Two of the novel SNPs (rs67073037 and rs115635831) were not significantly associated with ER-positive breast cancer, whereas rs17181761 and rs6562760 were weakly associated with ER-positive disease. In addition, 15 of 94 previously reported general breast cancer risk SNPs were associated with ER-negative breast cancer at genome-wide significance in this study. Novel SNPs associated with ER-negative breast cancer were also identified in 3 previously discovered breast cancer risk loci. This study also included 7,797 BRCA1 mutation carriers from CIMBA that had been diagnosed with breast cancer and 7,455 that had not yet had cancer. Out of 19 loci that were known to be associated with ER-negative breast cancer in the general population, 15 SNPs showed some association with risk in BRCA1 mutation carriers (P < 0.05).
Milne and colleagues recently evaluated approximately 11.5 million SNPs using imputed OncoArray data from 21,468 ER-negative breast cancer cases and 100,594 controls (25). Of these, 9,655 cases and 45,494 controls served as an independent replication for 10 of the 11 SNPs previously identified in ER-negative or BRCA1 studies (Supplementary Table S1; Fig. 2). In addition, 10 new genome-wide significant ER-negative susceptibility SNPs were identified (Supplementary Table S1; Fig. 2). Five of these 20 SNPs associated with ER-negative disease were also associated with TNBC; this may be due to decreased statistical power due to a smaller sample of patients with TNBC, or it may be that HER2-driven tumors have unique genetics that have not been well-studied to date. Furthermore, 105 additional SNPs that were previously associated with overall breast cancer risk yielded associations with ER-negative breast cancer (5 × 10−8 < P < 0.05), bringing the total number of ER-negative variants to 125 (Supplementary Table S1). These SNPs account for approximately 14% of familial relative risk for ER-negative breast cancer. Of the SNPs associated with overall breast cancer risk, 24 were also associated with breast cancer risk for BRCA1 mutation carriers (25).
Race/Ethnicity
LD structures in populations
The majority of SNPs associated with breast cancer risk have been discovered in populations of European descent, presenting an etiologic and clinical challenge for non-European populations. There are known differences in LD structures between Europeans and other populations, which may influence risks associated with breast cancer for selected SNPs (51). In addition, women of African descent have an increased incidence of ER-negative breast cancer, which may be partially explained by genetic risk factors (52).
LD structures influencing GWAS results in breast cancer were first reported by Stacey and colleagues in 2008 when utilizing the Multiethnic Cohort to replicate variants associated with breast cancer risk (13). Although power was limited by small sample sizes, certain SNPs that were associated with breast cancer risk in Europeans and Latinas (rs13387042 and rs3803662) were not significantly associated with increased breast cancer risk in Hawaiians, Japanese, or African Americans. In fact, one risk SNP in Europeans (rs3803662) displayed a protective effect in African Americans, suggesting that it was in LD with a different causative SNP (13).
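One way such a reversal can arise is that the genotyped allele travels with the risk allele of an untyped causal variant in one population and with the non-risk allele in another. The sketch below computes the LD coefficient D and r2 from phased haplotypes to make this concrete. The haplotype counts are invented toy data, the "causal" SNP is hypothetical, and nothing here reproduces the analysis in the cited study.

```python
def ld_statistics(haplotypes):
    """Return (D, r_squared) for two biallelic SNPs.

    haplotypes: list of (tag_allele, causal_allele) pairs coded 0/1,
    one pair per phased chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_tag = sum(a for a, _ in haplotypes) / n
    p_causal = sum(b for _, b in haplotypes) / n
    p_both = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_both - p_tag * p_causal  # sign shows which alleles travel together
    denom = p_tag * (1 - p_tag) * p_causal * (1 - p_causal)
    if denom == 0:
        raise ValueError("one of the SNPs is monomorphic")
    return d, d * d / denom


# Toy populations (invented counts, for illustration only).
# Population A: tag allele 1 always sits on the risk haplotype.
population_a = [(1, 1)] * 40 + [(0, 0)] * 60
# Population B: tag allele 1 always sits on the non-risk haplotype.
population_b = [(1, 0)] * 40 + [(0, 1)] * 60
```

In both toy populations r2 is 1, so the tag SNP is an equally good proxy, but D is positive in population A and negative in population B. The same tag allele would therefore appear to raise risk in A and to be protective in B.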
African American population
Discovering risk loci underlying breast cancer etiology in African American women may allow investigators to further understand and address health disparities in African American women with breast cancer (53). Several studies investigating associations between SNPs in known breast cancer risk loci, candidate genes, and pathways with breast cancer risk in the African American population have been conducted (54–66). For example, in 2013, Long and colleagues evaluated 67 known breast cancer susceptibility SNPs in 1,231 cases and 2,069 controls, and found that 7 SNPs were significantly associated (P < 0.05) with breast cancer risk in African American women, 3 were borderline significant (P < 0.10), and 57 did not show association (60).
There have also been several GWAS for breast cancer in African American women. In 2011, a GWAS by Haiman and colleagues included women of African American ancestry and women of European ancestry, resulting in the identification of the TERT-CLPTM1L locus (47). To increase sample size and power, three consortia have been formed to study breast cancer in women of African ancestry: African American Breast Cancer (AABC; n = 3,153 cases), GWAS in Breast Cancer in the African Diaspora (ROOT; n = 1,657 cases), and African American Breast Cancer Epidemiology and Risk (AMBER; n = 6,000 cases; refs. 63, 66, 67). Two epidemiologic studies, Multi-Ethnic Cohort and Women's Circle of Health Study, contributed samples to both AABC and AMBER.
GWAS have been performed by both AABC and ROOT. However, no SNPs yielded genome-wide significant associations with breast cancer (66, 68). The AMBER consortium performed a two-stage meta-analysis of breast cancer GWAS in women of African descent using discovery-phase data from 4,673 breast cancer cases and 4,774 controls from the ROOT and AABC consortia followed by replication in the AMBER consortium. Three SNPs, one of which was novel (rs13074711 in 3q26.21), were associated with overall breast cancer risk (69). In subsequent analyses, 2 of the SNPs were associated with ER-negative disease and one with ER-positive disease (69). A separate study of AMBER followed by replication in an independent case–control study identified two novel loci associated with ER-positive cancer in women of African ancestry (14p16.1 and 17q25.1; ref. 70). The study also identified two loci (10q26 and 11q13) previously associated with breast cancer risk that have an excess of African ancestry. Subsequent fine-mapping studies suggested that 11q13 is associated with ER-positive breast cancer in women of African ancestry (70).
Asian populations
The breast cancer GWAS studies from COGS and OncoArray have included women of European and Asian ancestry (22–24). There have also been several independent GWAS studies in Asian women. In 2009, a GWAS of 1,505 cases and 1,522 controls from the Shanghai Breast Cancer Study was performed (17). After two further stages of replication of 29 and 4 SNPs, respectively, a SNP upstream of ESR1 was associated with breast cancer risk. The association was stronger in postmenopausal women and women with ER-negative breast cancer. These findings were subsequently confirmed in a population of European descent and provided the first glimpses into ESR1 as a complex locus (17). Studies of 23,637 East Asian breast cancer cases and 25,579 East Asian controls using the iCOGS array found that 31 of the 67 breast cancer loci at that time also conferred an increased risk of breast cancer in East Asian women (71). Similar studies of 22,780 breast cancer cases and 24,181 controls from the East Asian population identified three novel breast cancer loci (1q32.1, 5q14.3, and 15q26.1) at genome-wide significance that replicated in a European ancestry population (72).
Putting It All Together—Polygenic Risk Scores
Individually, SNPs have quite small effect sizes and may not be informative for evaluating risk of developing breast cancer. However, the recent OncoArray study suggested that all of the breast cancer susceptibility markers to date explain up to 18% of familial relative risk (24). Thus, combining these common variants together may provide some insight into individual risks of breast cancer. While the first 7 breast cancer susceptibility SNPs were shown to have little influence on risks predicted by the National Cancer Institute Breast Cancer Risk Assessment Tool (BCRAT or Gail model), subsequent studies using polygenic risk scores (PRS) based on larger numbers of SNPs have successfully demonstrated an ability to stratify or individualize breast cancer risk in a number of populations (73).
In 2015, BCAC investigators generated a PRS using 77 known breast cancer susceptibility SNPs that improved risk estimates for women in the general population (74). Specifically, lifetime risk for a woman without a family history of breast cancer in the highest quintile of the PRS was 16.6%. However, women with a first-degree family history of breast cancer and in the highest quintile had a lifetime risk of 24.4%. Separately, Vachon and colleagues showed that a 76 SNP PRS was independent from breast density as a risk factor and enhanced the breast cancer risk prediction by the Breast Cancer Surveillance Consortium breast cancer risk model (75). With the recent discovery of an additional 75 breast cancer susceptibility loci, it is anticipated that an updated PRS including all 182 breast cancer SNPs will likely stratify risk even further (24, 25). However, understanding how the PRS influences other risk factors in these models is needed. Recent analyses examining the combined effects of a 77-SNP PRS and risk factors on breast cancer risk in up to 23,000 cases showed the combined effects of the 77-SNP PRS and environmental risk factors for breast cancer are generally well described by a multiplicative model. However, there was some evidence for departure from the multiplicative model for alcohol, current menopausal hormone therapy, and height with ER-positive disease that warrants further study (76).
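A PRS of the kind discussed above is typically a weighted sum of risk-allele counts, with each SNP weighted by its per-allele log odds ratio, and the multiplicative model corresponds to multiplying the resulting odds ratio by that of another risk factor. The sketch below illustrates this general construction only. It reuses the per-allele ORs quoted earlier in this review for three FGFR2 SNPs purely as example weights; the function names are hypothetical, and the published 77-SNP score uses its own SNP set and weights.

```python
import math

# Example weights only: per-allele odds ratios quoted earlier in this
# review for three FGFR2 SNPs. A real PRS would use many more SNPs.
EXAMPLE_PER_ALLELE_OR = {
    "rs2981578": 1.23,
    "rs35054928": 1.27,
    "rs45631563": 1.19,
}


def polygenic_risk_score(dosages, per_allele_or=EXAMPLE_PER_ALLELE_OR):
    """Sum of (effect-allele count x log per-allele OR) over SNPs.

    dosages: mapping from SNP id to effect-allele count (0, 1, or 2;
    fractional values are allowed for imputed genotypes).
    """
    score = 0.0
    for snp, odds_ratio in per_allele_or.items():
        if snp not in dosages:
            raise KeyError(f"missing genotype for {snp}")
        dosage = dosages[snp]
        if not 0 <= dosage <= 2:
            raise ValueError(f"dosage for {snp} must be between 0 and 2")
        score += dosage * math.log(odds_ratio)
    return score


def combined_odds_ratio(prs, other_factor_or=1.0):
    """Odds ratio versus a woman with no effect alleles at the scored SNPs,
    multiplied by the OR of another risk factor (multiplicative model)."""
    return math.exp(prs) * other_factor_or
```

For instance, one effect allele at rs2981578 and none at the other two SNPs gives an odds ratio of 1.23 relative to the zero-allele reference; under the multiplicative assumption, a second factor with an OR of 2.0 would bring the combined value to 2.46. Departures from this model, such as those noted above for alcohol, hormone therapy, and height, would appear as combined effects that differ from this product.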
PRS have also been designed to modify the estimated breast cancer risk of women with mutations in cancer predisposition genes, such as BRCA1, BRCA2, and CHEK2 c.1100delC. CHEK2 c.1100delC is found in up to 1.5% of European individuals and is known to confer a moderate risk of breast cancer (OR = 2.0–3.0; refs. 7, 77, 78). Muranen and colleagues applied the PRS based on 77 breast cancer risk SNPs in the BCAC population with consideration for the effect of the PRS on CHEK2 c.1100delC carriers (79). For the highest quintile of the PRS, the OR for carriers was 2.03 (95% CI: 0.86–4.78), while the OR for the lowest quintile was much lower and more comparable with lifetime risk expected in the general population (OR = 0.52; 95% CI: 0.16–1.74). In the future, PRS may be developed to refine breast cancer risk estimates for mutation carriers in other high and moderate breast cancer risk genes such as PALB2 and ATM.
In 2017, Kuchenbaecker and colleagues evaluated 94 breast cancer risk SNPs in a PRS to stratify risk for BRCA1 and BRCA2 mutation carriers (80). PRS were generated for overall breast cancer risk as well as ER-negative and ER-positive breast cancer risk. All three PRS were associated with breast cancer risk for BRCA1 and BRCA2 carriers. For BRCA1 carriers, the ER-negative PRS showed the strongest association (HR = 1.27; 95% CI = 1.23–1.31). Conversely, stronger associations were observed for the overall and ER-positive PRS in BRCA2 mutation carriers (HR = 1.22; 95% CI = 1.17–1.28). Stratification or personalization of breast cancer risk is particularly important for women with BRCA1 and BRCA2 mutations because of high age-related and lifetime risks of disease. Identification of mutation carriers at particularly high risk or with reduced risk relative to the average mutation carrier may lead to changes in risk management including decisions about uptake and timing of prophylactic mastectomy and prophylactic oophorectomy, and decisions about child-bearing.
In 2016, Dite and colleagues evaluated models combining a PRS based on 77 breast cancer susceptibility SNPs and clinical breast cancer risk prediction models including BOADICEA, BRCAPRO, BCRAT, and IBIS (81). This study suggested that including the PRS improved breast cancer prediction in women younger than 50 years by more than 20%. Beginning in August 2017, Myriad Genetics, a commercial genetic testing laboratory, incorporated 82 SNPs on the MyRisk multi-gene panel. The resulting Breast Cancer riskScore that includes results from the 82 SNPs and the Tyrer–Cuzick model predicts a woman's 5-year and lifetime risk of developing breast cancer (82).
PRS may also impact clinical care by informing screening guidelines, particularly in women ages 40–49, for whom professional societies often have discordant recommendations. The Women Informed to Screen Depending On Measure of risk (WISDOM) population-based study is investigating breast cancer screening modalities in the precision medicine era (83). This study will evaluate risk thresholds and screening recommendations by integrating traditional clinical models, mutations in high and intermediate penetrance genes, and PRS. In addition, WISDOM plans to investigate prediction of breast cancer subtype-specific risk, which may result in more frequent screening for women at a higher risk of more aggressive breast cancers (e.g., ER-negative disease; ref. 83).
Breast Cancer Gene–Environment Interactions/Associations with Other Risk Factors
There are many well-established factors that influence risk for breast cancer including breast density, parity, breastfeeding, alcohol use, body mass index (BMI), exercise, hormone replacement therapy (HRT), and menstrual history. For hereditary traits that influence breast cancer risk, such as breast density and menstrual history, there may be overlap between their predisposition loci and breast cancer risk loci. In fact, using GWAS of breast density and breast cancer, both traits show evidence of a shared genetic basis that is mediated through a large number of common variants (84). Furthermore, 18% of known breast cancer susceptibility variants have been associated with mammographic density (85). Lindstrom and colleagues performed a two-stage GWAS investigating SNPs associated with mammographic density and found that four previously identified breast cancer susceptibility SNPs are also associated with mammographic density (rs12665607 in ESR1, rs10995190 in ZNF365, rs3817198 in LSP1, and rs17001868 in SGSM2/MKL1; ref. 86). Four additional SNPs (rs186749 in PRDM6, rs7816345 in 8p11.23, rs703556 in IGF1, and rs7289126 in TMEM184B) found to be associated with mammographic density were later shown to be associated with breast cancer risk (86). For nonhereditary traits, gene–environment interactions may be useful in assessing risk associated with breast cancer risk SNPs. However, sufficient power is frequently an obstacle in identifying gene–environment interactions (87). Following correction for multiple testing, five SNPs have shown an interaction with an established risk factor in their effect on breast cancer risk: two SNPs with parity (rs3817198 in LSP1 and rs11249422 in 1p11.2), one SNP with alcohol use (rs1045485 in CASP8), and two SNPs in strong LD with each other with postmenopausal BMI (rs10483028 and rs2242714 in 21q22.12; refs. 88–90).
Additional gene–environment interactions may be identified in the OncoArray study due to larger sample sizes.
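To make concrete what a test for multiplicative gene–environment interaction involves, the following is a minimal sketch and not the method used in the cited studies. It assumes a binary carrier status for the risk allele and a binary exposure, estimates the SNP odds ratio within each exposure stratum, and applies a Wald test to the ratio of the two odds ratios on the log scale. All counts, function names, and the number of tests used for the Bonferroni threshold are hypothetical.

```python
import math


def odds_ratio(case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier):
    """Odds ratio and variance of its log for one 2x2 table (Woolf's method)."""
    or_ = (case_carrier * ctrl_noncarrier) / (case_noncarrier * ctrl_carrier)
    var_log = (1 / case_carrier + 1 / case_noncarrier
               + 1 / ctrl_carrier + 1 / ctrl_noncarrier)
    return or_, var_log


def interaction_test(exposed, unexposed):
    """Ratio of SNP odds ratios (exposed / unexposed) with a two-sided Wald p-value."""
    or_exp, var_exp = odds_ratio(*exposed)
    or_unexp, var_unexp = odds_ratio(*unexposed)
    log_ratio = math.log(or_exp) - math.log(or_unexp)
    z = log_ratio / math.sqrt(var_exp + var_unexp)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(log_ratio), z, p


# Hypothetical counts per stratum:
# (case carriers, case noncarriers, control carriers, control noncarriers)
exposed = (300, 700, 200, 800)
unexposed = (250, 750, 240, 760)

ratio, z, p = interaction_test(exposed, unexposed)

# Bonferroni correction for a hypothetical number of SNP-by-risk-factor pairs.
n_tests = 100
bonferroni_threshold = 0.05 / n_tests

print(f"interaction OR = {ratio:.2f}, z = {z:.2f}, p = {p:.4f}")
print("significant after Bonferroni:", p < bonferroni_threshold)
```

Because the variance of the interaction term is the sum of the variances from both strata, the interaction estimate is less precise than either stratum-specific odds ratio, which is one reason such tests need larger samples than main-effect analyses.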
Understanding the Biology
Most of the breast cancer susceptibility SNPs reside in intergenic or intronic regions of the genome and, counter to early expectations, do not generally impact protein-coding regions. Many follow-up studies have been conducted to identify causal variants, defined by Edwards and colleagues as variants that influence a molecular or cellular process to affect a human phenotype (91). There are many approaches to the evaluation of GWAS findings, including fine mapping by genotyping of additional SNPs in a locus, in silico annotation, and assessment of SNP function, target gene identity, and target gene function.
Fine-mapping studies often require more power than the original GWAS to identify small differences in risk that are associated with different SNPs. Because fine-mapping studies require information on each SNP in the locus of interest, the advent of imputation techniques has reduced the burden of genotyping every SNP directly. Several fine-mapping studies have been performed to identify causal breast cancer SNPs, revealing complex loci that contain more than one independent variant contributing to risk. One example is an investigation of the TERT locus that identified three independent SNP associations (92). ESR1 is another example of a complex risk locus; five independent SNPs have been identified in ESR1 that have different associations with breast cancer according to subtype and grade (93).
One way to identify SNP function and a potential target gene is to evaluate expression quantitative trait loci (eQTLs) to determine whether a SNP is impacting gene expression. After utilizing fine mapping to identify multiple independent signals in the 5p12 locus, Ghoussaini and colleagues performed an eQTL analysis that revealed a SNP associated with increased expression of two candidate genes for breast cancer pathogenesis (94). However, eQTL studies can be limited by the normal tissue that is available. While data for many normal tissues are now available from the GTEx portal, the numbers of samples analyzed limit the statistical power of eQTL analyses based on these data (95). More recently, transcriptome-wide association studies (TWAS) have been used to analyze both SNPs and gene expression to identify novel breast cancer susceptibility genes and loci. By pairing genotyping data with gene expression data, SNPs that can predict gene expression can be selected. Subsequent imputation of gene expression data can be performed using these selected SNPs in another dataset. Finally, the imputed gene expression data can be evaluated for associations with breast cancer risk. Importantly, this method reduces the correction needed for multiple comparisons, which may increase power. In addition, findings from these studies may have clearer functional implications than traditional GWAS, and combining SNPs to predict gene expression may help to enhance signals that were too weak to detect previously. Using TWAS, Hoffman and colleagues identified five genes and 23 unique SNPs that may be associated with breast cancer risk (96).
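The three TWAS steps described above (selecting SNPs that predict expression, imputing expression in a second dataset, and testing imputed expression for association with disease) can be sketched as follows. This is an illustrative simulation under stated assumptions, not the procedure of any cited study: the data are simulated, the SNPs are independent, the prediction weights are simple per-SNP regression slopes rather than a jointly fitted multi-SNP model, and the selection threshold, effect sizes, and all names are hypothetical.

```python
import math
import random
import statistics

random.seed(0)


def slope_and_corr(x, y):
    """Least-squares slope of y on x and their Pearson correlation."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def simulate_genotypes(n, n_snps, maf=0.3):
    """Allele counts (0, 1, 2) for n individuals at n_snps independent SNPs."""
    return [[sum(random.random() < maf for _ in range(2)) for _ in range(n_snps)]
            for _ in range(n)]


# Step 1: reference panel with both genotypes and measured expression.
# Simulated truth (an assumption of this sketch): only SNPs 0 and 1 affect expression.
N_SNPS = 10
ref_geno = simulate_genotypes(300, N_SNPS)
ref_expr = [0.8 * g[0] + 0.5 * g[1] + random.gauss(0, 1) for g in ref_geno]

weights = {}
for j in range(N_SNPS):
    slope, r = slope_and_corr([g[j] for g in ref_geno], ref_expr)
    if abs(r) > 0.2:  # arbitrary, hypothetical selection threshold
        weights[j] = slope

# Step 2: impute expression in a separate case-control dataset with genotypes only.
gwas_geno = simulate_genotypes(2000, N_SNPS)
true_expr = [0.8 * g[0] + 0.5 * g[1] + random.gauss(0, 1) for g in gwas_geno]
# Simulated disease model: higher (unobserved) expression raises risk.
is_case = [random.random() < 1 / (1 + math.exp(-(0.5 * e - 1.0))) for e in true_expr]
imputed = [sum(w * g[j] for j, w in weights.items()) for g in gwas_geno]

# Step 3: compare imputed expression between cases and controls (Welch z statistic).
cases = [x for x, c in zip(imputed, is_case) if c]
controls = [x for x, c in zip(imputed, is_case) if not c]
z = (statistics.mean(cases) - statistics.mean(controls)) / math.sqrt(
    statistics.variance(cases) / len(cases)
    + statistics.variance(controls) / len(controls))
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(f"SNPs selected: {sorted(weights)}; z = {z:.2f}, p = {p:.2g}")
```

In this design a single test is performed per gene rather than per SNP, which illustrates why the multiple-comparison burden is smaller than in a SNP-level GWAS.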
Causal SNPs in regulatory regions may impact breast cancer risk in complex ways. One example is the 11q13 breast cancer risk locus, which has been associated with ER-positive disease. Fine-mapping studies identified three independent signals in 11q13 (97). Subsequent functional assays, including electrophoretic mobility shift assays (EMSA), promoter and enhancer luciferase reporter assays, and chromatin immunoprecipitation studies, implicated these SNPs in distal, long-distance regulation of the CCND1 promoter (97). This distal enhancer on 11q13 has also been implicated in regulation of two estrogen-modulated long noncoding RNAs, suggesting that the underlying risk-driving SNPs in this locus may cause preferential chromatin looping, further impacting promoter activity and transcription (98). Importantly, CRISPR/Cas9–based allele-specific deletion and/or replacement is rapidly becoming the preferred method for evaluating the effects of single SNPs or combinations of SNPs on the biology of risk-associated loci.
Conclusions
To date, 182 breast cancer susceptibility SNPs have been identified, increasing understanding of the heritability of breast cancer. The combined effect of these SNPs in PRS models may pave the way to tailored screening and intervention strategies in the general population as well as in carriers of deleterious mutations in predisposition genes (e.g., BRCA1 and BRCA2). Further functional studies of risk loci will elucidate biological mechanisms of breast cancer etiology, with potential implications for breast cancer prevention, prognosis, and treatment. New methodologies, such as TWAS, are emerging to inform functional investigations of known breast cancer SNPs as well as to identify novel breast cancer susceptibility SNPs and genes. In addition, with the growing size of consortia designed to study the genetic component of breast cancer, we may see even larger GWAS in the future that will have power to detect smaller effect sizes.
Disclosure of Potential Conflicts of Interest
F.J. Couch reports receiving a commercial research grant from GRAIL Inc. and is a consultant/advisory board member for AstraZeneca. No potential conflicts of interest were disclosed by the other authors.
Acknowledgments
K.J. Ruddy was supported by a training grant under the CTSA Grant Program Numbers UL1 TR000135 and KL2TR000136-09 from the National Center for Advancing Translational Sciences of the NIH. J. Lilyquist, C.M. Vachon, and F.J. Couch were supported by an NIH training grant R25CA92049 (Mayo Cancer Genetic Epidemiology Training program). F.J. Couch was supported by the Breast Cancer Research Foundation, NIH grants R01 CA192393, CA116167, CA176785, the Mayo Clinic Specialized Program of Research Excellence (SPORE) in Breast Cancer (CA116201), and the Mayo Clinic Breast Cancer Registry. K.J. Ruddy, C.M. Vachon, and F.J. Couch were also supported by the Mayo Clinic Cancer Center.
The contents of this paper are solely the responsibility of the authors and do not necessarily represent the official view of NIH.