Abstract
Association studies have been widely used to search for common low-penetrance susceptibility alleles to breast cancer in general. However, breast cancer is a heterogeneous disease and it has been suggested that it may be possible to identify additional susceptibility alleles by restricting analyses to particular subtypes. We used data on 710 single nucleotide polymorphisms (SNP) in 120 candidate genes from a large candidate gene association study of up to 4,470 cases and 4,560 controls to compare the results of analyses of “overall” breast cancer with subgroup analyses based on the major clinicopathologic characteristics of breast cancer (stage, grade, morphology, and hormone receptor status). No SNP was highly significant in overall effects analysis. Subgroup analysis resulted in substantial reordering of ranks of SNPs, as assessed by the magnitude of the test statistics, and some associations that were not significant for an overall effect were detected in subgroups at a nominal 5% level adjusted for multiple testing. The most significant association of CCND1 SNP rs3212879 with estrogen receptor–negative tumor types (P = 0.001) did not reach genome-wide significance levels. These results show that it may be possible to detect associations using subgroup analysis that are missed in overall effects analysis. If the associations we found can be replicated in independent studies, they may provide important insights into disease mechanisms in breast cancer. (Cancer Epidemiol Biomarkers Prev 2009;18(1):255–9)
Introduction
Breast cancer tends to cluster in families, the disease being approximately twice as common in first-degree relatives of cases as in the general population (1). Some of this clustering occurs as part of specific familial breast cancer syndromes where disease results from single alleles conferring a high risk. However, such alleles are rare in the population and the majority of multiple case breast cancer families do not segregate mutations in these genes (2). The model that best describes aggregation of breast cancer in these families is a polygenic model in which susceptibility to breast cancer is conferred by a large number of genetic variants, each of which has a modest effect (3-6).
Despite large research efforts over the past 10 years, the number of common susceptibility alleles identified has been small. Loci implicated thus far include single nucleotide polymorphism (SNP) alleles located in the CASP8, FGFR2, TNRC9, MAP3K1, and LSP1 genes and in regions in the DNA devoid of known genes at chromosome positions 8q24 and 2q35 (7, 8). One of the reasons for this lack of success may be disease heterogeneity. Invasive breast cancers can be divided into several pathologic subtypes with different histologic appearances of the malignant cells and different clinical presentations and outcomes. Novel subtypes have also emerged as a result of gene expression profiling (9). It is plausible that the etiology of the subtypes is different. The association of the “basal” phenotype in breast cancers with rare, deleterious mutations in BRCA1 (10-13) shows the principle that different genetic determinants can underlie different subtypes of the disease. In addition, the association with SNPs at the FGFR2, 2q35, and MAP3K1 loci is largely restricted to estrogen receptor (ER)–positive disease (14, 15).
Few genetic association studies have systematically evaluated association between putative common susceptibility alleles and specific subtypes of disease. However, under some models of disease susceptibility, subgroup analysis may identify associations missed by analysis of overall effects. For example, if the genetic effects in the subgroups are sufficiently heterogeneous, the power of subgroup analysis may exceed that of simple overall effects analysis. This is supported by the finding that, under some models, power to detect epistasis between genes is greater than power to detect overall effects (16). In that report, the subgroup was defined by genotype at a second locus. The largest study to date of common variants in multiple candidate genes for breast cancer susceptibility has been published recently (17). Over 700 SNPs in 120 genes in approximately 4,400 cases and 4,400 controls were tested. None of the SNPs reached genome-wide significance levels after adjusting for population stratification (17). However, when the admixture maximum likelihood experiment-wise test for association was applied, there was evidence for an excess of positive associations over the proportion expected by chance, suggesting that some SNPs in these candidate genes are associated with breast cancer risk (17).
The purpose of this study was to reanalyze this data set to identify highly significant associations with specific subgroups of breast cancer in the absence of overall effects. Subgroups were based on the major clinicopathologic features of the tumors [i.e., morphology type, stage, grade, and ER and progesterone receptor (PR) status].
Materials and Methods
Study Participants
Study participants (up to 4,470 cases and 4,560 controls) were selected, as described by Pharoah et al. (17). In brief, cases were drawn from Studies of Epidemiology and Risk Factors in Cancer Heredity (SEARCH), an ongoing population-based study ascertained through the Eastern Cancer Registration and Information Centre. All patients diagnosed with invasive breast cancer below age 55 y since 1991 and still alive in 1996 (prevalent cases; median age, 48 y), together with all those diagnosed below age 70 y between 1996 and the present (incident cases; median age, 54 y), are eligible to take part. Of 12,767 eligible patients, 2,284 were not contacted because their general practitioner did not respond or thought that it would be inappropriate to contact the patient. Of the 10,583 patients who were contacted, 67% have returned a questionnaire and 64% provided a blood sample for DNA analysis. Eligible patients who did not take part in the study were similar to participants except, as might be expected, the proportion of clinical stage III/IV cases was somewhat higher in nonparticipants (10% versus 5%). Female controls were randomly selected from the Norfolk component of the European Prospective Investigation of Cancer (EPIC), a prospective study of diet and cancer being carried out in nine European countries (18). The EPIC-Norfolk cohort comprises 25,000 individuals resident in Norfolk, East Anglia—the same region from which the cases have been recruited. Controls are not matched to cases but are broadly similar in age (42-81 y). The ethnic background of both cases and controls as reported on the questionnaires is similar, with >98% being white. Staging and phenotyping of breast cancer cases were obtained through the cancer registry from routine pathology and clinical records. The study is approved by the Eastern Region Multi-centre Research Ethics Committee, and all patients gave written informed consent.
The samples were split into two sets to save DNA and reduce genotyping costs: the first set (n = 2,270 cases and 2,280 controls) was genotyped for all SNPs, and the second set (n = 2,200 cases and 2,280 controls) was then tested for those SNPs that showed marginally significant associations in overall effects analysis for set 1 (P heterogeneity or P trend < 0.1). This staged approach substantially reduces genotyping costs without significantly affecting statistical power of the overall effects analysis. Results of subgroup analyses for set 1 data were, however, not used to select SNPs for analysis in set 2.
Candidate Gene, SNP Selection, and Genotyping Methods
Data on 710 SNPs in 120 candidate genes were available for analysis (Supplementary Table S1; ref. 17). Genes that encode proteins in cellular pathways that are likely to be involved in breast carcinogenesis were chosen as candidates. The major pathways studied were steroid hormone metabolism and signaling, double-strand break DNA repair, oxidative damage repair, epigenetic modifiers, and cell cycle control. Genes in the 17q21 region commonly amplified in a variety of animal models of cancer and some carcinogen metabolism genes were also tested. For some pathways, only a small subset of genes was selected for study. Genes evaluated by pathway and number of SNPs assayed for each are described by Pharoah et al. (17). Common variation in most genes was captured using a minimal set of tagging SNPs (17, 19). Genotyping methods are as described (17). Concordance for duplicate samples was 98% for all assays. Failed genotypes were not repeated (the rate for failed genotypes did not exceed 8.3% for any of the SNPs under study). Hardy-Weinberg equilibrium was tested as part of genotyping quality assurance and SNPs with serious deviations were excluded.
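The text does not state which Hardy-Weinberg test or exclusion threshold was applied. Purely as an illustration, one common choice is the one degree of freedom chi-square goodness-of-fit test on genotype counts; the function name `hwe_chi_square` and the idea of applying it to control genotypes are assumptions, not details taken from this study.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Arguments are observed genotype counts (hypothetical inputs).
    Returns (chi2, p_value) with one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele a
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0               # monomorphic SNP: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A SNP would then be flagged when the P value falls below whatever cutoff the quality-assurance protocol defines for a "serious" deviation.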
Statistical Methods
The aim of this study was to test for statistical association between each of 710 individual SNPs and breast cancer subtypes and to compare the results of these with the overall effects analyses. The subtypes were categorized according to clinical stage at diagnosis (I, II, and III/IV), histopathologic grade (1, 2, and 3), ER and PR status (negative or positive), and histopathologic morphology. Only the most common morphologic types, lobular and ductal, were analyzed. Pairwise correlation coefficients were calculated to assess the correlation structure between subgroups (Supplementary Table S2). Sample sizes for most joint phenotypes were too small for subgrouping to be based on phenotypic correlations. Association between disease and genotype for each SNP within each subtype category for ER status, PR status, and morphology was assessed using the one degree of freedom Cochran-Armitage trend test with a single variable for allele dose. The analyses were conducted with each subgroup being compared with all of the available controls. Grade and stage were assessed as ordered categories using ordinal polytomous logistic regression.
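As a minimal sketch of the one degree of freedom Cochran-Armitage trend test described above, the following function takes genotype counts for one subgroup of cases and for the controls, scored by allele dose (0, 1, 2). The function name and the example counts are hypothetical; the study's own software is not specified in the text.

```python
import math

def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """One degree of freedom Cochran-Armitage trend test.

    cases, controls: counts of subjects carrying 0, 1, or 2 copies of the
    test allele. Returns (chi2, p_value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = n - n_cases
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: one subgroup of cases against all controls.
chi2, p = cochran_armitage_trend(cases=(10, 20, 30), controls=(30, 20, 10))
# chi2 is 20.0 for these counts
```

Because each subgroup is compared with the same set of controls, the subgroup tests are not independent of one another.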
Results for all tests were summarized using standard quantile-quantile (Q-Q) plots constructed by ranking the set of values for the test statistic from smallest to largest and plotting them against their expected values. Per-allele odds ratios (OR) and confidence intervals were estimated using logistic regression. To compare the previously reported overall effects analysis with the subgroup analyses reported here, a Bonferroni correction was applied to correct for the number of subgroup analyses (eight). A nominal significance level of P < 0.05 was chosen for overall effects and an equivalent P < 0.00625 (=0.05/8) for subgroup analyses. Note that this is not a correction for the number of SNPs tested as such a correction would be the same for each subgroup analysis and would make no difference to the comparison between overall effects and subgroup analyses.
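The Q-Q construction and the Bonferroni threshold can be sketched as follows. This assumes the test statistics are one degree of freedom chi-square values and uses the plotting position (i − 0.5)/n, which is an assumption because the text does not say how expected values were computed; `qq_points` is a hypothetical helper name.

```python
from statistics import NormalDist

def qq_points(chi2_values):
    """Pair sorted 1-df chi-square statistics with their expected quantiles.

    Returns a list of (expected, observed) tuples, smallest first.
    """
    observed = sorted(chi2_values)
    n = len(observed)
    std_normal = NormalDist()
    points = []
    for i, obs in enumerate(observed, start=1):
        prob = (i - 0.5) / n                      # assumed plotting position
        # A 1-df chi-square variable is the square of a standard normal,
        # so its quantile at prob is the squared normal quantile at (1 + prob) / 2.
        z = std_normal.inv_cdf((1.0 + prob) / 2.0)
        points.append((z * z, obs))
    return points

# Bonferroni threshold for the eight subgroup analyses.
N_SUBGROUP_TESTS = 8
subgroup_alpha = 0.05 / N_SUBGROUP_TESTS   # 0.00625
```

Plotting the observed values against the expected values, with the line of equality as a reference, gives the kind of display described for Figure 1.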
Results
The number of cases by subgroup is shown in Table 1. This is a maximum sample size as not all SNPs were genotyped for both set 1 and set 2. Figure 1A shows the Q-Q plot for the univariate trend test for association between SNPs and breast cancer (overall effect). For χ² values less than three, the observed values lie close to the line expected under the null hypothesis of no association, providing no evidence of inflation of the test statistic that would suggest population stratification or other systematic bias. The deviation of the higher observed values from those expected is suggestive of multiple weak associations. One SNP showed a much higher χ² statistic than the others. This SNP, rs3020314 in the ERα gene (P = 8 × 10⁻⁵), did not reach genome-wide significance but did reach the P < 10⁻⁴ threshold that has been suggested for candidate gene studies (20). Figure 1B to I shows Q-Q plots for the univariate trend test for each subgroup scan. None of the associations reached the level of significance for the most significant association in the overall effects analysis.
Clinical phenotype/subgroup | Set 1 | Set 2 | Total |
---|---|---|---|
Controls* | 2,280 | 2,280 | 4,560 |
Stage* | | | |
I | 1,114 | 1,077 | 2,191 |
II | 987 | 996 | 1,983 |
III/IV | 110 | 84 | 194 |
Missing | 59 | 43 | 102 |
Grade* | | | |
1 | 437 | 437 | 874 |
2 | 788 | 899 | 1,687 |
3 | 505 | 515 | 1,020 |
Missing | 540 | 349 | 889 |
Morphology type | | | |
Lobular | 351 | 308 | 659 |
Ductal | 1,674 | 1,642 | 3,316 |
Other | 222 | 233 | 455 |
Missing | 23 | 17 | 40 |
ER status | | | |
Positive | 1,054 | 983 | 2,037 |
Negative | 278 | 192 | 470 |
Missing | 938 | 1,025 | 1,963 |
PR status | | | |
Positive | 536 | 240 | 776 |
Negative | 233 | 116 | 349 |
Missing | 1,501 | 1,844 | 3,345 |
NOTE: Maximum number of cases and controls are listed.
*Controls are coded as zero for ordered logistic regression. Otherwise, all other analyses were conducted with all of the controls.
In the overall effects analyses, 52 SNPs (7.5%) were significant at the P < 0.05 level. In subgroup analysis, at the equivalent threshold significance of P < 0.00625, 7 SNPs were significantly associated with increasing cancer grade, 16 with increasing cancer stage, 7 with lobular cancer, 7 with ductal cancer, 7 with ER-positive disease, 14 with ER-negative disease, 6 with PR-positive cancer, and 5 with PR-negative cancer (data not shown). Most of the SNPs detected at the P < 0.00625 level in subgroup analysis achieve at least P < 0.05 significance in overall effects analysis, but 18 SNPs found to be significant in subgroup analysis did not. For these SNPs, per-allele OR and the corresponding 95% confidence intervals (95% CI) in subgroup and overall effects analysis are shown in Table 2. Sample sizes of cases and controls for each SNP tested are also indicated. The strongest association observed was for CCND1 rs3212879 in ER-negative disease (P = 0.0001; OR, 1.40; 95% CI, 1.20-1.70).
Subgroup | Gene | dbSNP ref | Cases/controls | Per-allele OR* (95% CI) | Per-allele OR† (95% CI) |
---|---|---|---|---|---|
Lobular | GPX4 | rs4807542 | 340/2,259 | 1.38 (1.14-1.69) | 1.10 (0.97-1.2) |
ER positive | TBXAS1 | rs6971207 | 269/2,278 | 1.85 (1.23-2.70) | 1.30 (0.90-1.9) |
ER negative | CCND1 | rs3212879 | 268/2,271 | 1.40 (1.20-1.70) | 1.04 (0.96-1.13) |
ER negative | CCND1 | rs678653 | 447/4,471 | 1.30 (1.13-1.50) | 1.06 (1.00-1.13) |
ER negative | CCND1 | rs3212891 | 268/2,275 | 1.40 (1.15-1.66) | 1.04 (0.96-1.13) |
ER negative | CCND1 | rs602652 | 267/2,268 | 0.74 (0.61-0.88) | 0.98 (0.93-1.07) |
ER negative | CCND1 | rs603965 | 257/2,194 | 0.74 (0.62-0.90) | 0.94 (0.87-1.03) |
ER negative | DNMT3A | rs11677670 | 269/2,275 | 0.68 (0.52-0.87) | 0.91 (0.82-1.01) |
ER negative | PTGIS | rs693649 | 268/2,272 | 0.69 (0.53-0.88) | 0.92 (0.83-1.02) |
ER negative | GPX1 | rs3448 | 460/4,557 | 1.24 (1.07-1.44) | 1.03 (0.97-1.10) |
ER negative | STE | rs1881668 | 449/4,473 | 1.24 (1.07-1.44) | 1.05 (0.99-1.13) |
PR positive | CDK6 | rs3757823 | 339/4,540 | 1.36 (1.14-1.60) | 1.10 (0.99-1.22) |
PR positive | PTGIS | rs6095541 | 222/2,270 | 1.32 (1.11-1.56) | 1.07 (0.96-1.20) |
PR positive | PTGIS | rs6095543 | 223/2,264 | 1.32 (1.11-1.56) | 1.08 (0.97-1.21) |
PR positive | CCND3 | rs2479717 | 224/2,267 | 0.80 (0.68-0.94) | 0.94 (0.86-1.03) |
PR negative | IGF1R | rs1546713 | 209/2,149 | 0.74 (0.60-0.90) | 0.94 (0.86-1.02) |
PR negative | NBS1 | rs1805787 | 215/2,243 | 1.34 (1.10-1.65) | 1.04 (0.95-1.14) |
PR negative | BRCA2 | rs4942485 | 217/2,251 | 0.60 (0.42-0.87) | 0.95 (0.83-1.08) |
NOTE: SNPs at significance level P < 0.00625 in subgroup analysis, but not P < 0.05 in overall effects analysis, are shown.
*Per-allele OR for subgroup.
†Per-allele OR for overall effects analysis.
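Per-allele ORs such as those in Table 2 come from a logistic regression of case status on allele dose. A minimal, unadjusted sketch using Newton-Raphson on grouped genotype counts is given here; the function name and example counts are hypothetical, the Wald interval is an assumption about how the 95% CI was formed, and any covariates used in the study are not modeled.

```python
import math

def per_allele_or(cases, controls, max_iter=50, tol=1e-10):
    """Fit logit(P(case)) = a + b * dose by Newton-Raphson.

    cases, controls: counts for allele dose 0, 1, 2.
    Returns (odds_ratio, ci_low, ci_high) using a 95% Wald interval.
    """
    doses = (0, 1, 2)
    totals = [r + s for r, s in zip(cases, controls)]
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in doses]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(r - t * p for r, t, p in zip(cases, totals, probs))
        g1 = sum(x * (r - t * p)
                 for x, r, t, p in zip(doses, cases, totals, probs))
        # Entries of the 2 x 2 Fisher information matrix.
        w = [t * p * (1.0 - p) for t, p in zip(totals, probs)]
        i00 = sum(w)
        i01 = sum(x * wi for x, wi in zip(doses, w))
        i11 = sum(x * x * wi for x, wi in zip(doses, w))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score.
        step_a = (i11 * g0 - i01 * g1) / det
        step_b = (i00 * g1 - i01 * g0) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    # Standard error of b from the information matrix of the final step.
    se_b = math.sqrt(i00 / det)
    return math.exp(b), math.exp(b - 1.96 * se_b), math.exp(b + 1.96 * se_b)

# Hypothetical counts in which the odds of disease double with each allele.
odds_ratio, ci_low, ci_high = per_allele_or((100, 200, 400), (100, 100, 100))
```

For subgroup estimates, `cases` would hold only the cases in that subgroup while `controls` holds all controls, mirroring the comparison described in the Statistical Methods.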
Thus, subgroup analysis has not identified any highly significant associations missed by overall effects analysis. Nevertheless, subgroup analysis may still be useful when selecting SNPs of borderline significance for further replication. In general, the number of SNPs selected for replication studies is limited by the cost of genotyping. For example, assume funding is available to attempt to replicate 50 SNPs in further studies to provide definitive evidence of association. One strategy would be to simply select the top 50 “hits,” that is, SNPs that are significant at some predefined level (e.g., P < 0.05), from the overall effects analysis. However, better candidates for replication may be identified from the subgroup analyses. A possible strategy would be to include SNPs for replication that achieve the same P value as overall effects analysis after Bonferroni correction. Here, we applied a correction of eight to compare directly the results of the subgroup and overall effects analysis, although this correction is overly conservative as subgroups (and therefore tests) are correlated (Supplementary Table S2). A ranking of corrected P values representing the top 50 hits (P < 0.05) from both overall effects and subgroup analyses is shown in Table 3. With this strategy, 12 SNPs achieved higher ranks in subgroup analysis than in overall effects analysis. Thus, 50 SNPs chosen for replication would include 38 SNPs from the overall effects analysis and 12 SNPs from the subgroup analyses.
Gene | dbSNP ref | P | P* | Subgroup |
---|---|---|---|---|
ESR1 | rs3020314 | 8 × 10⁻⁵ | 8 × 10⁻⁵ | Overall effect |
CCND1 | rs3212879 | 0.0001 | 0.0010 | ER negative |
MCS5A6 | rs2182317 | 0.0011 | 0.0011 | Overall effect |
CCND1 | rs678653 | 0.0003 | 0.0023 | ER negative |
DNMT3B | rs406193 | 0.0026 | 0.0026 | Overall effect |
CDKN1A | rs3176336 | 0.0030 | 0.0030 | Overall effect |
ESR1 | rs3020317 | 0.0030 | 0.0030 | Overall effect |
ESR1 | rs3020394 | 0.0034 | 0.0034 | Overall effect |
CCNE1 | rs997669 | 0.0034 | 0.0034 | Overall effect |
CCND1 | rs3212891 | 0.0005 | 0.0038 | ER negative |
ESR1 | rs3020396 | 0.0038 | 0.0038 | Overall effect |
ESR1 | rs3020401 | 0.0039 | 0.0039 | Overall effect |
CDK6 | rs3757823 | 0.0005 | 0.0036 | ER positive |
ESR1 | rs3020377 | 0.0043 | 0.0043 | Overall effect |
ESR1 | rs3020390 | 0.0046 | 0.0046 | Overall effect |
ESR1 | rs3020400 | 0.0047 | 0.0047 | Overall effect |
STE | rs3736599 | 0.0053 | 0.0053 | Overall effect |
ESR1 | rs3020407 | 0.0062 | 0.0062 | Overall effect |
RB1 | rs2854344 | 0.0070 | 0.0070 | Overall effect |
ESR1 | rs3020405 | 0.0077 | 0.0077 | Overall effect |
MMP8 | rs1892886 | 0.0097 | 0.0097 | Overall effect |
GPX4 | rs4807542 | 0.0001 | 0.0086 | Lobular |
CCND1 | rs602652 | 0.0001 | 0.0086 | ER negative |
PTGIS | rs6095541 | 0.0001 | 0.0106 | ER positive |
CDKN1B | rs34330 | 0.0120 | 0.0120 | Overall effect |
IGFBP3 | rs2132572 | 0.0122 | 0.0122 | Overall effect |
CDKN2A | rs3731239 | 0.0122 | 0.0122 | Overall effect |
PTGIS | rs6095543 | 0.0016 | 0.0130 | ER negative |
BAT8 | rs535586 | 0.0138 | 0.0138 | Overall effect |
ERBB2 | rs1801201 | 0.0139 | 0.0139 | Overall effect |
ESR1 | rs726282 | 0.0167 | 0.0167 | Overall effect |
RB1 | rs198580 | 0.0174 | 0.0174 | Overall effect |
CHEK2 | rs9608698 | 0.0177 | 0.0177 | Overall effect |
CCND1 | rs603965 | 0.0023 | 0.0180 | ER negative |
PGR | rs1042838 | 0.0180 | 0.0180 | Overall effect |
STE | rs3775775 | 0.0182 | 0.0182 | Overall effect |
STK15 | rs732417 | 0.0182 | 0.0182 | Overall effect |
EHMT1 | rs4634736 | 0.0194 | 0.0194 | Overall effect |
ADH1C | rs698 | 0.0196 | 0.0196 | Overall effect |
RIZ1 | rs2235515 | 0.0204 | 0.0204 | Overall effect |
DNMT3A | rs11677670 | 0.0030 | 0.0240 | ER negative |
PTGIS | rs693649 | 0.0031 | 0.0250 | ER negative |
TBXAS1 | rs6971207 | 0.0032 | 0.0254 | ER positive |
ADH1B | rs1042026 | 0.0253 | 0.0253 | Overall effect |
ESR1 | rs1884051 | 0.0299 | 0.0299 | Overall effect |
XRCC2 | rs3218536 | 0.0302 | 0.0302 | Overall effect |
IGF1R | rs2229765 | 0.0308 | 0.0308 | Overall effect |
IGF1R | rs1546713 | 0.0039 | 0.0310 | ER negative |
ESR1 | rs1884054 | 0.0328 | 0.0328 | Overall effect |
SHBG | rs858524 | 0.0329 | 0.0329 | Overall effect |
*Bonferroni correction factor of eight is used to adjust tests in subgroup analysis.
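The selection strategy behind Table 3 can be sketched as follows: subgroup P values are multiplied by the correction factor of eight, each SNP keeps its smallest corrected P value across the overall and subgroup analyses, and the top hits are taken from the combined ranking. The function name, data layout, and all P values below are hypothetical illustrations, not data from the study.

```python
def rank_for_replication(overall_p, subgroup_p, n_subgroup_tests=8, top_n=50):
    """Rank SNPs by their best corrected P value.

    overall_p: {snp: P} from the overall effects analysis.
    subgroup_p: {snp: {subgroup: P}} from the subgroup analyses.
    Returns a list of (snp, source, raw_p, corrected_p), best first.
    """
    ranked = []
    for snp, p_overall in overall_p.items():
        # The overall effects P value is left uncorrected.
        best = (snp, "Overall effect", p_overall, p_overall)
        for subgroup, p_raw in subgroup_p.get(snp, {}).items():
            p_corrected = min(1.0, p_raw * n_subgroup_tests)
            if p_corrected < best[3]:
                best = (snp, subgroup, p_raw, p_corrected)
        ranked.append(best)
    ranked.sort(key=lambda row: row[3])
    return ranked[:top_n]

# Hypothetical P values for three made-up SNPs.
overall = {"snp_a": 0.00008, "snp_b": 0.40, "snp_c": 0.03}
subgroups = {
    "snp_b": {"ER negative": 0.0001, "ER positive": 0.60},
    "snp_c": {"Lobular": 0.02},
}
for row in rank_for_replication(overall, subgroups, top_n=2):
    print(row)
```

In this toy input, `snp_b` enters the list only through its subgroup result, which is the situation the strategy is meant to capture. Because the subgroups are correlated, a factor of eight is conservative, so a less stringent factor could be passed as `n_subgroup_tests`.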
Discussion
Complex diseases such as breast cancer are phenotypically heterogeneous, and this heterogeneity may obscure genetic associations. If alleles at different genetic loci are responsible for different subtypes of disease, genetic associations may be best detected by subgroup analysis. However, there is a trade-off between the increased specificity that may be obtained by subgroup analysis, and the loss of statistical power from reduction in sample size and increase in number of hypotheses being evaluated. The most powerful way to test for and detect associations is not known because the true underlying biological/genetic models for the data are not known.
Our results support the notion that subgroup analysis may be worthwhile because some associations that were not detected using an overall effects approach were detected using a subgroup approach at a nominal level of P < 0.05 adjusted for multiple subgroup testing. Nevertheless, the findings are only illustrative of the potential for subgroup analysis because none of the associations detected could be regarded as definitive, and we cannot state with certainty that subgroup analyses have identified true associations missed by the overall effects analysis. The most strongly associated SNP was rs3212879 in CCND1, which was associated with risk of ER-negative tumors (P = 0.001, adjusted for multiple subgroup testing). This does not reach the threshold for genome-wide significance (P < 10⁻⁸), which is necessarily stringent due to the low prior probability of any individual SNP being associated with disease. The prior probability for SNPs in candidate gene studies may be somewhat higher but there is still no biological a priori hypothesis for association between any particular SNP with a particular subtype of cancer. Despite good total sample size, the subgroup sizes are modest and power to detect subgroup effects at very stringent significance levels may be small. Statistical power may be further limited by subgroup classification error. Data for subgroup categorization were obtained from clinical records and there is likely to be some degree of misclassification of phenotype. Nevertheless, the effect size for GPX4 rs4807542 in lobular carcinoma and the SNPs in the TBXAS1 family in ER-positive tumors and those in the CCND1 family in ER-negative tumors are of sufficient magnitude to warrant replication in larger studies of patients with these subtypes of cancer.
There are other published examples of associations between genetic variants and breast cancer restricted to specific subtypes. The associations of common variants at FGFR2, MAP3K1, and 2q35 have been reported to be confined to ER-positive cancers (14, 15). These loci were not evaluated as part of our candidate gene study. In these examples, the overall effect analyses between the variants and breast cancer had reached genome-wide significance levels before subgroup analysis.
Our results also show that subgroup analyses may be incorporated into a strategy for selection of SNPs for replication in independent data sets. Staged study designs are commonly used in genome-wide association studies to reduce costs (21). The most appropriate selection of SNPs for the second and subsequent stages is critical to maximize power. To date, most genome-wide association studies have based this selection on the results of overall effects analyses, but SNP selection based on overall effects and subgroup analysis with an appropriate correction for multiple testing may prove more efficient.
The potential advantages of reducing phenotypic heterogeneity by restricting analysis to specific subtypes of disease are clear. Further evaluation of such a strategy is required to provide definitive evidence of its value.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant support: N. Mavaddat was funded by scholarships from Cancer Research UK and the Medical Research Council. P.D. Pharoah is a Senior Clinical Research Fellow and D.F. Easton is a Principal Fellow of Cancer Research UK.
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).
Acknowledgments
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank the SEARCH team, the EPIC collaborators, and the Eastern Cancer Registration and Information Centre (patient recruitment and phenotype data). Genotyping was carried out by many individuals from the Department of Oncology at Strangeways Research Laboratory and funded by Cancer Research UK.