Abstract
Genes regulated by breast cancer risk alleles identified through genome-wide association studies (GWAS) may harbor rare coding risk alleles.
We sequenced the coding regions for 38 genes within 500 kb of 38 lead GWAS SNPs in 13,538 breast cancer cases and 5,518 controls.
Truncating variants in these genes were rare, and were not associated with breast cancer risk. Burden testing of rare missense variants highlighted 5 genes with some suggestion of an association with breast cancer, although none met the multiple testing thresholds: MKL1, FTO, NEK10, MDM4, and COX11. Six common alleles in COX11, MAP3K1 (two), and NEK10 (three) were associated at the P < 0.0001 significance level, but these likely reflect linkage disequilibrium with causal regulatory variants.
There was no evidence that rare coding variants in these genes confer substantial breast cancer risks. However, more modest effect sizes could not be ruled out.
We tested the hypothesis that rare variants in 38 genes near breast cancer GWAS loci may mediate risk. These variants do not appear to play a major role in breast cancer heritability.
Introduction
Genome-wide association studies (GWAS) have identified approximately 180 breast cancer risk loci (1), all of which are tagged by common variants that confer modest disease risks. Fine-mapping and functional analyses suggest that most causal variants modulate risk via regulatory effects, though a few lead SNPs, including those in DCLRE1B and EXO1, are missense substitutions (1). In some diseases, GWAS association signals have been shown to be mediated, at least in part, by rare, high-risk coding variants in nearby genes (2). Moreover, even if GWAS signals are due to regulatory variants, rare coding variants in the target genes are biologically plausible candidates for modulating risk. In this study, we tested this hypothesis by sequencing the coding exons and intron–exon boundaries of 38 genes that are potential targets for GWAS-identified causal variants.
Materials and Methods
The subjects, DNA enrichment, sequencing, and variant calling employed in this study have been described elsewhere (3). Sequencing primers, coverage statistics, quality metrics, and variants are in Supplementary Tables S1–S4.
Results
A total of 3,839 variants were identified, and most were rare, with 3,564 (92.8%) found in <0.1% of all sequenced subjects (noncoding variants) or ExAC European subjects (coding variants; Supplementary Fig. S1; Supplementary Tables S3 and S4). Only 131 truncating variants were identified, and all were uncommon in the population (Supplementary Tables S3 and S4). Burden testing showed that truncating variants were not associated with risk for any gene, even at the nominal significance threshold (P < 0.05; Supplementary Table S5).
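The burden test described above can be illustrated with a minimal sketch. The exact statistical model used in the study is described elsewhere (3) and is not reproduced here; this example assumes a simple collapsing test, in which carriers of any truncating variant in a gene are counted in cases and controls and compared with a two-sided Fisher exact test. The function names and the carrier counts in the usage lines are hypothetical; only the cohort sizes (13,538 cases and 5,518 controls) come from the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def burden_test(case_carriers, n_cases, control_carriers, n_controls):
    """Collapse carriers per gene; return (odds ratio, two-sided P value)."""
    a, b = case_carriers, n_cases - case_carriers
    c, d = control_carriers, n_controls - control_carriers
    p_value = fisher_exact_two_sided(a, b, c, d)
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction so the odds ratio stays finite.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c), p_value


# Hypothetical carrier counts for one gene (illustrative only, not study data).
odds_ratio, p_value = burden_test(15, 13538, 6, 5518)
print(f"OR = {odds_ratio:.2f}, P = {p_value:.3f}")
```

With so few carriers per gene, an exact test of this kind avoids relying on large-sample approximations, which is one reason a collapsing approach is a common choice for rare truncating variants.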
In aggregate, rare missense variants were not associated with breast cancer risk for any gene at P < 0.0001; however, 5 genes were associated at P < 0.05 (Fig. 1; Table 1). Stratification with SIFT, PolyPhen2, and CADD effect predictions showed that only NEK10 variants with a CADD score >20 conferred a significantly higher risk than predicted benign variants (OR = 2.73 and 0.86, respectively; Pdiff = 0.010; Supplementary Table S6). This signal was partially driven by variants within the highly conserved NEK10 protein kinase domain, which were more strongly associated than variants outside of the domain (Table 1; Pdiff = 0.033). In contrast, for MDM4, the association was stronger for variants outside Pfam-defined domains (Table 1).
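The Pdiff values above compare odds ratios between two strata of variants. The study's own method for this comparison is not stated in this section, so the following is only a sketch of one common approach: a two-sided z-test on the difference of the log odds ratios, assuming the two estimates are independent and approximately normal on the log scale. The function name is hypothetical, and the standard errors in the usage lines are placeholders, not values from the study, so the printed result is not expected to reproduce the reported Pdiff.

```python
from math import erfc, log, sqrt


def or_difference_p(or_a, se_log_a, or_b, se_log_b):
    """Two-sided P value for the difference between two independent odds ratios.

    se_log_a and se_log_b are standard errors of the log odds ratios.
    """
    z = (log(or_a) - log(or_b)) / sqrt(se_log_a ** 2 + se_log_b ** 2)
    # Two-sided tail area of the standard normal distribution.
    return erfc(abs(z) / sqrt(2))


# ORs from the text (CADD > 20 versus predicted benign NEK10 variants);
# the standard errors are placeholders chosen only for illustration.
p_diff = or_difference_p(2.73, 0.40, 0.86, 0.20)
print(f"Pdiff = {p_diff:.3f}")
```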
Six common variants were associated with breast cancer risk at P < 0.0001 (Supplementary Table S7), and all were in linkage disequilibrium (LD) with the reported lead GWAS SNP: a 3′-untranslated region variant in COX11 (rs1802212), two synonymous variants in MAP3K1 (p.Gln1028Gln, rs3822625; and p.Thr522Thr, rs2229882), and one missense and two synonymous variants in NEK10 (p.Lys513Ser, rs10510592; p.Thr670Thr, rs11129280; and p.Thr687Thr, rs3213930). In each case, the strength of the associations was compatible with those seen by Michailidou and colleagues (1), but the associations were much weaker than for the corresponding lead SNP (rs2787486, rs62355902, and rs4973769).
Among variants associated at P < 0.05, DCLRE1B p.His49Tyr (rs11552449; OR = 1.10) was also the lead GWAS SNP and conferred a similar risk to that seen in the initial study (OR = 1.07, P = 1.8 × 10⁻⁸; ref. 4).
Discussion
Exon sequencing of genes in GWAS regions did not identify clear novel associations. There was no evidence for association with truncating variants in any gene, and although the variants were too rare to establish reliable risk estimates, they are unlikely to be large contributors to breast cancer risk. There was limited evidence of association for rare missense variants in 5 genes, whereas 1.9 genes (38 genes × 0.05) would have been expected to be associated by chance. Larger targeted studies will be required to establish whether any of these associations can be confirmed; if so, these may indicate novel associations distinct from the common variant associations identified through GWAS.
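The chance expectation quoted above follows from the number of genes tested and the nominal threshold. The sketch below reproduces that arithmetic and, as an additional illustration not taken from the study, computes the binomial probability of seeing at least 5 nominally significant genes under the null. It assumes that the 38 gene-level tests are independent and each has a false-positive rate of exactly 0.05; the function names are hypothetical.

```python
from math import comb


def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing the threshold when no true effect exists."""
    return n_tests * alpha


def binomial_tail(n_tests, alpha, k_observed):
    """P(at least k_observed of n_tests pass the threshold) under the null."""
    return sum(
        comb(n_tests, k) * alpha ** k * (1 - alpha) ** (n_tests - k)
        for k in range(k_observed, n_tests + 1)
    )


n_genes, alpha = 38, 0.05
print(f"Expected by chance: {expected_false_positives(n_genes, alpha):.1f}")
print(f"P(>= 5 genes at P < 0.05): {binomial_tail(n_genes, alpha, 5):.3f}")
```

Because the independence assumption is only approximate, the tail probability is a rough guide to how surprising 5 nominal associations are, not a formal test.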
Six common variants were associated with breast cancer after correcting for multiple testing, but all were in LD with the lead GWAS SNP, and therefore probably do not represent novel risk loci. Moreover, other noncoding SNPs in these regions were more strongly associated, suggesting that these associations are “passenger” associations reflecting LD with causal regulatory variants (5).
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: B. Decker, D.F. Easton
Development of methodology: B. Decker, J. Allen, S. Ahmed, A.M. Dunning, D.F. Easton
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): B. Decker, C. Luccarini, M. Shah, S. Ahmed, R. Luben, P.D.P. Pharoah, A.M. Dunning, D.F. Easton
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): B. Decker, J. Allen, K.A. Pooley, D.F. Easton
Writing, review, and/or revision of the manuscript: B. Decker, J. Allen, M.K. Bolla, R. Luben, P.D.P. Pharoah, D.F. Easton
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): B. Decker, J. Allen, C. Luccarini, M. Shah, M.K. Bolla, Q. Wang, C. Baynes, D.M. Conroy, J. Brown, R. Luben, E.A. Ostrander, D.F. Easton
Study supervision: B. Decker, D.F. Easton
Acknowledgements
B. Decker and E.A. Ostrander were supported by the Intramural Research Program of the National Human Genome Research Institute. SEARCH is funded by a programme grant from Cancer Research UK (C490/A10124) and supported by the UK National Institute for Health Research (NIHR) Biomedical Research Centre at the University of Cambridge (Cambridge, UK). Targeted sequencing in SEARCH was supported by Cancer Research UK grants (C1287/A16563, to D.F. Easton; C8197/A16565, to A.M. Dunning).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.