CYP11A catalyzes the rate-limiting step in the biosynthesis of sex-steroid hormones. In this study, we employed a systematic approach that involved gene resequencing and a haplotype-based analysis to investigate the relationship between common variation in CYP11A and breast cancer risk among African-Americans, Latinas, Japanese-Americans, Native Hawaiians, and Whites in the Multiethnic Cohort Study. Resequencing in a multiethnic panel of 95 advanced breast cancer cases revealed no common missense variant (≥5% frequency). Common haplotype patterns were assessed by genotyping 36 densely spaced single nucleotide polymorphisms (SNPs) spanning 67 kb of the CYP11A locus in a multiethnic panel of subjects (n = 349; 1 SNP/1.86 kb on average). We identified one to two regions of strong linkage disequilibrium in these populations. Twelve tagging SNPs were selected to predict the common haplotypes (≥5% frequency) in these regions with high probability (average Rh2 = 0.94) and were examined in a breast cancer case-control study in the Multiethnic Cohort Study (1,615 cases and 1,962 controls). A global test for differences in risk according to common haplotypes over the locus was statistically significant (P = 0.006), as were the global tests within block 1 (P = 0.008; haplotype 1D, effect per haplotype copy: odds ratio, 1.23; 95% confidence interval, 1.03–1.48) and block 2 (P = 0.016; haplotype 2F: odds ratio, 1.52; 95% confidence interval, 1.15–2.01). These haplotypes were most common in Japanese-Americans and Native Hawaiians, followed by Whites then Latinas, and were rare in African-Americans (<5% frequency); the haplotype effects on risk were homogeneous across groups. Based on these findings, CYP11A deserves further consideration as a candidate breast cancer susceptibility gene. (Cancer Res 2006; 66(24): 12019-25)

Estrogens and progesterone are breast cell mitogens, and postmenopausal women with high circulating estrogen levels are at a greater risk of breast cancer (1). It is hypothesized that common variation in the genes involved in the biosynthesis and metabolism of sex-steroid hormones may alter a woman's lifetime exposure to endogenous estrogens and progesterone, and her risk of developing breast cancer (2, 3). One such candidate is CYP11A, a gene that encodes the cholesterol side chain cleavage (P450scc) enzyme, which catalyzes one of the initial steps of steroid hormone biosynthesis: the conversion of cholesterol to pregnenolone, a precursor of androgens, estrogens, and progesterone.

Previous studies investigating common genetic variations in CYP11A have focused on a pentanucleotide repeat (TAAAA)n polymorphism located 528 bp upstream from the translation start site of the gene. This repeat polymorphism has been studied in relation to hormone-related phenotypes including polycystic ovarian syndrome (4–7), and breast and prostate cancer (8, 9). It is unclear, however, whether specific (TAAAA)n repeat alleles are biologically functional or merely mark important variation located elsewhere in the gene. To date, a comprehensive assessment of common genetic variations in CYP11A, in association with the risk of breast cancer or other hormone-related phenotypes, has not been done.

In the present study, we did a systematic evaluation of common genetic variations in CYP11A using a combination of approaches that included resequencing of the coding region to identify common missense variations, and a haplotype-based analysis to characterize and study common patterns of genetic variation across the entire locus, including the regulatory and noncoding regions. Subsequent tests of association were done in a large case-control study of breast cancer in the Multiethnic Cohort Study (MEC).

Subjects. The MEC consists of >215,000 men and women in Hawaii and Los Angeles (with additional African-Americans from elsewhere in California), and has been described in detail elsewhere (10). The cohort comprises mainly five self-described racial-ethnic populations: Native Hawaiians, Japanese-Americans, and Whites from Hawaii, and African-Americans, Japanese-Americans, and Latinos from Los Angeles. Between 1993 and 1996, participants entered the MEC by completing a self-administered mail questionnaire that asked for detailed information about dietary habits, demographic factors, personal behaviors, history of prior medical conditions, family history of common cancers, and, for women, reproductive history and exogenous hormone use. The participants were between the ages of 45 and 75 when they entered the cohort.

Incident cancers in the MEC are identified by record linkage to the Hawaii Tumor Registry, the Cancer Surveillance Program for Los Angeles County, and the California State Cancer Registry. These population-based tumor registries participate in the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program of cancer registration, which is known to have excellent (98%) case ascertainment. From the registries, we also obtained information about stage of disease at diagnosis. Breast cancer cases were classified as “advanced” when they had invasive/nonlocalized disease (SEER stage ≥2) at diagnosis.

Beginning in 1994, blood samples were collected from incident breast cancer cases. At this time, blood collection was also initiated in a random sample of MEC participants to serve as a control pool for genetic analyses. The participation rates for providing a blood sample were ≥65% for cases and controls. Demographic characteristics related to socioeconomic status and acculturation (e.g., age at cohort entry, education, place of birth, and years of living in the U.S.) were similar among those who provided a blood sample and women in the entire cohort. Eligible breast cancer cases in this study consisted of women with incident breast cancer diagnosed after enrollment in the MEC through April 2002. Controls were women without breast cancer prior to entry into the cohort and without a cancer diagnosis up to April 2002, and were frequency-matched to cases by age and ethnicity. Because <6% of cohort members had moved outside of Hawaii and Los Angeles between enrollment (1993–1996) and the cutoff date for diagnosis (April 2002), the likelihood of missing cases that accrued in the cohort over this period of time is low.

The study consists of 1,615 invasive breast cancer cases (345 African-Americans, 425 Japanese-Americans, 335 Latinas, 109 Native Hawaiians, and 401 Whites) and 1,962 controls (426 African-Americans, 290 Native Hawaiians, 420 Japanese-Americans, 386 Latinas, and 440 Whites). The study protocol was approved by the Institutional Review Boards at the University of Hawaii and at the University of Southern California.

Gene resequencing. We resequenced the nine exons and splice-site regions of CYP11A in germ line DNA from 95 advanced breast cancer cases (19 of each racial-ethnic group). We used DNA samples from advanced cases to increase the probability of discovering single nucleotide polymorphisms (SNPs) that are biologically relevant to breast cancer. Resequencing was done using ABI BigDye terminator chemistry on the ABI 3730 DNA Analyzer (Applied Biosystems, Foster City, CA). The PolyPhred program was used to identify polymorphisms with manual review by at least two observers, and all putative coding variants were validated by genotyping in the same panel of advanced cases and in the multiethnic panel (discussed below).

Characterization of linkage disequilibrium and haplotype patterns. A haplotype-based approach was implemented to study common variations in the CYP11A gene in the five ethnic and racial populations in the MEC. The details of this approach have been described previously (11). We defined the common haplotype patterns spanning 67.0 kb of the CYP11A gene (from 26.6 kb upstream of exon 1 to 9.5 kb downstream of the transcribed region) using SNPs selected from the National Center for Biotechnology Information SNP database (footnote 12) and the Celera database (footnote 13). We genotyped common SNPs (minor allele frequency >5%) at a density of 1 SNP every ∼2 kb across the locus, all known missense SNPs in the public database, and all newly identified missense SNPs in our resequencing effort. In total, 63 SNPs were selected and genotyped in a multiethnic panel of 349 women in the MEC without a history of cancer (n = 69–70/racial-ethnic group). This sample size ensured that any haplotype with a frequency of >5% would be represented at least once among the 140 chromosomes with a probability of >99%. Fifteen SNPs were identified as monomorphic and 12 SNPs genotyped poorly (SNPs missing genotype data for ≥25% of samples or out of Hardy-Weinberg equilibrium in >1 of the populations, P ≤ 0.01). This left 36 SNPs with minor allele frequencies ≥5% in at least one racial-ethnic group to be included in the haplotype analysis. We also genotyped the pentanucleotide (TAAAA)n polymorphism in the multiethnic panel to correlate specific repeat alleles with the observed haplotype patterns.
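The >99% figure quoted for the multiethnic panel can be checked with a short calculation. The sketch below is illustrative only: it assumes chromosomes are sampled independently and that the 140 chromosomes correspond to roughly 70 women per group with two chromosomes each; the function name is hypothetical and not taken from the study.

```python
def prob_seen_at_least_once(haplotype_freq, n_chromosomes):
    """Probability that a haplotype with the given population frequency
    appears at least once among n independently sampled chromosomes."""
    return 1.0 - (1.0 - haplotype_freq) ** n_chromosomes


# About 70 women per racial-ethnic group, two chromosomes each.
p_detect = prob_seen_at_least_once(0.05, 140)
print(f"P(seen at least once) = {p_detect:.4f}")
```

Under these assumptions the probability exceeds 0.99 for a haplotype at exactly 5% frequency, and it is higher still for more common haplotypes.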

The |D′| and r2 statistics were used to assess pairwise linkage disequilibrium (LD) between the 36 common SNPs. Within regions of strong LD (12), haplotype frequency estimates were constructed from the genotype data in the multiethnic panel (one ethnicity at a time) using the expectation-maximization (E-M) algorithm of Excoffier and Slatkin (13). The squared correlation (Rh2) between the true haplotypes (h) and their estimates was then calculated as described by Stram et al. (14). Tagging SNPs for the case-control study were then chosen by finding the minimum set of SNPs for each ethnic group that would have Rh2 > 0.7 for all common haplotypes with an estimated frequency of >5%. Pairwise r2 measures were also calculated to assess how well the chosen tag SNPs could predict the SNPs that were not selected as tag SNPs and were therefore not genotyped in the breast cancer cases and controls.
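For readers unfamiliar with the two pairwise LD measures, the following minimal sketch computes |D′| and r2 for two biallelic SNPs from haplotype and allele frequencies, using the standard textbook definitions. It is not the software used in the study; the function name and the example frequencies are hypothetical.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical frequencies: allele A (30%) always travels with allele B (50%).
d_prime, r_squared = pairwise_ld(p_ab=0.30, p_a=0.30, p_b=0.50)
print(d_prime, r_squared)
```

In this made-up example |D′| equals 1 because one of the four possible haplotypes is absent, while r2 is well below 1 because the allele frequencies differ. This is why the two statistics are usually reported together.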

Genotyping. DNA for all subjects was extracted from WBC fractions using the Qiagen blood kit (Chatsworth, CA). SNP genotyping in the multiethnic panel was done using the Sequenom, Inc. (San Diego, CA) and Illumina Inc. (San Diego, CA) platforms. Genotyping of the (TAAAA)n repeat polymorphism in the multiethnic panel was done on the ABI3730 platform (Applied Biosystems) using the protocol of Zheng et al. (9). Tag SNP genotyping in the breast cancer cases and controls was done by the 5′ nuclease TaqMan allelic discrimination assay using the ABI7900. Replicate blinded quality control samples (5%) were included to assess the reproducibility of the genotyping procedure; the concordance was 99.7%.

Statistical analysis. Haplotype frequencies among breast cancer cases and controls were estimated using the tag SNPs selected to distinguish the common haplotypes (>5% frequency) for each ethnic group in the multiethnic panel as described (15). The E-M algorithm was run to estimate haplotype frequencies for the tag SNPs in the combined data set (cases + controls) and individual estimates of haplotype count (expected number of copies of each haplotype carried by each individual) from the E-M were output to an external file and merged with case-control status. These estimates were then used as explanatory variables in logistic regression models.
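The E-M step and the per-individual expected haplotype counts described above can be illustrated with a toy example. The sketch below handles only two biallelic SNPs and assumes random mating; it is a simplified stand-in for the idea, not the tagSNP program used in the study, and all names and genotype data are hypothetical.

```python
from itertools import product

HAPLOTYPES = ["00", "01", "10", "11"]  # allele at SNP 1, allele at SNP 2


def compatible_pairs(genotype):
    """Ordered haplotype pairs whose allele counts match an unphased genotype."""
    return [
        (h1, h2)
        for h1, h2 in product(HAPLOTYPES, repeat=2)
        if all(int(h1[i]) + int(h2[i]) == genotype[i] for i in range(2))
    ]


def pair_posteriors(genotype, freq):
    """Posterior probability of each compatible pair given current frequencies."""
    pairs = compatible_pairs(genotype)
    weights = [freq[a] * freq[b] for a, b in pairs]
    total = sum(weights)
    return [(pair, w / total) for pair, w in zip(pairs, weights)]


def expected_copies(genotype, freq):
    """Expected number of copies of each haplotype carried by one individual."""
    dosage = {h: 0.0 for h in HAPLOTYPES}
    for (a, b), post in pair_posteriors(genotype, freq):
        dosage[a] += post
        dosage[b] += post
    return dosage


def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies from unphased genotypes by E-M."""
    freq = {h: 1.0 / len(HAPLOTYPES) for h in HAPLOTYPES}
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for g in genotypes:  # E-step: accumulate expected haplotype counts
            for h, copies in expected_copies(g, freq).items():
                counts[h] += copies
        # M-step: renormalize over 2N chromosomes
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freq


# Hypothetical data: genotype = (copies of allele 1 at SNP 1, at SNP 2).
genotypes = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 5
freq = em_haplotype_frequencies(genotypes)
dosage = expected_copies((1, 1), freq)
print(freq, dosage)
```

The `dosage` values play the role of the expected haplotype counts that are merged with case-control status and entered as explanatory variables in the logistic regression models.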

Our analytic approach is based on the common disease/common variant hypothesis (16, 17). This theory suggests that alleles with appreciable frequencies in the population contribute to important variation in risk of disease. As shown empirically, the majority of common variation is shared across racial and ethnic populations (18, 19). The biological effects on risk for the majority of common disease–associated alleles have also been shown to be consistent across populations (20). These observations provide a justification for the pooling of genetic data across racial and ethnic populations if no heterogeneity is noted. To assess the consistency of genetic effects across populations, we tested for heterogeneity across racial-ethnic groups prior to pooling genetic data. These tests were done using a likelihood ratio test following the inclusion of an interaction term between each haplotype (or SNP) and ethnicity in the logistic regression model. None of the tests for heterogeneity was statistically significant (P > 0.14). Pooled odds ratios (OR) and 95% confidence intervals (CI) were then estimated for each haplotype and tag SNP using unconditional logistic regression adjusting for age and ethnicity.

We used the methods described by Zaykin et al. to perform global tests of association between haplotypes and cancer risk within each LD block and to estimate haplotype-specific ORs (21). ORs were estimated for each common haplotype using the most common haplotype as the reference group and for each SNP using the more common genotype as the reference group. We also did the haplotype analyses using all other haplotypes as the reference group and obtained similar results (data not shown). Because further adjustment for study area (Hawaii or Los Angeles) and the established breast cancer risk factors (first-degree family history of breast cancer, body mass index, parity, age at first birth, age at menarche, type and age at menopause, use of hormone replacement therapy, and alcohol consumption) did not affect our results, we only present results from the age- and ethnicity-adjusted models.

We assessed and corrected for the potential effects of population stratification on our risk estimates in the following way. Using genotype data for ∼1,000 SNPs across 60 genes that had been successfully genotyped in the same subjects, and following the methods of Price et al. (22), we computed the pairwise correlation matrix of SNPs for each pair of subjects in the study of the same reported ethnicity, extracted the first few principal components of each of these five (ethnic-specific) matrices, and included them as covariates in the logistic regression models along with terms for ethnicity and for ethnicity × each principal component.

Permutation testing was employed to assess nominally statistically significant associations observed with common haplotypes. Permutation testing was done as follows: for each racial-ethnic population, we permuted the case-control status 10,000 times at random and did association tests for the common haplotypes in each block using the permuted data. Next, we generated a distribution of the minimum P values for the haplotype effects from each permutation and then re-evaluated the nominal P values in relation to this distribution.
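The minimum-P permutation procedure can be sketched as follows. This is a simplified illustration, not the authors' FORTRAN program: it replaces the haplotype-specific logistic regression with a two-sample z-test on haplotype dosage (a swapped-in simplification), and all function names and data are hypothetical.

```python
import math
import random


def two_sample_p(values, labels):
    """Two-sided P value comparing mean dosage in cases (1) and controls (0)."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    mean_all = sum(values) / len(values)
    var = sum((v - mean_all) ** 2 for v in values) / (len(values) - 1)
    if var == 0:
        return 1.0
    se = math.sqrt(var * (1 / len(cases) + 1 / len(controls)))
    z = (sum(cases) / len(cases) - sum(controls) / len(controls)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def min_p_permutation(dosages, labels, n_perm=500, seed=0):
    """Return the smallest nominal P value and its permutation-adjusted P value."""
    rng = random.Random(seed)
    observed = min(two_sample_p(d, labels) for d in dosages.values())
    permuted = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)  # break any link between status and haplotypes
        if min(two_sample_p(d, permuted) for d in dosages.values()) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: 20 cases then 20 controls, two haplotypes.
labels = [1] * 20 + [0] * 20
dosages = {
    "hapA": [1] * 15 + [0] * 5 + [0] * 15 + [1] * 5,  # enriched in cases
    "hapB": [0, 1] * 20,                              # unrelated to status
}
nominal_p, adjusted_p = min_p_permutation(dosages, labels)
print(nominal_p, adjusted_p)
```

Because the reference distribution is built from the minimum P value across all haplotypes in each permuted data set, the adjusted P value accounts for having tested several haplotypes at once.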

The haplotype frequencies were estimated using the tagSNP program (footnote 14), and the permutation testing program was written in FORTRAN. All other statistical analyses were done using the SAS statistical package version 9.1 (SAS Institute, Cary, NC).

Characterization of genetic variation at the CYP11A locus. In resequencing the nine exons of CYP11A among women with advanced breast cancer (n = 95), we found only one rare missense SNP in one Japanese-American case (Met367Arg, minor allele frequency, <1.0%). This SNP is located in exon 6 and the minor allele was found on the most common haplotype in block 2 (haplotype 2A). This heterozygote case reported no first-degree family history of breast cancer and was diagnosed at the age of 74. In testing for this variant in the full breast cancer case-control study, no additional carriers were observed.

We characterized LD patterns across the CYP11A locus by genotyping 36 common SNPs (Supplemental Table S1), with an average spacing of one SNP per 1.86 kb. With this high-density SNP map, we identified one to two regions of strong LD depending on the population (Supplemental Fig. S1). Except for block 1 in African-Americans, which was slightly smaller than in other groups, block boundaries were observed to be highly conserved across populations. For this reason, we defined the block boundaries based on all populations combined: block 1 (SNPs 5–21, size = 23 kb) and block 2 (SNPs 22–36, size = 23 kb). The common haplotypes (≥5% frequency) were highly correlated between blocks for all populations (multiallelic D′ score ≥ 0.94; footnote 15). The largest gap between SNPs (SNPs 21 and 22) was only 8.05 kb, which defined the interblock region in intron 1 (Supplemental Table S1).

In each block, seven common haplotypes were observed and these common haplotypes accounted for at least 90% of all chromosomes in each racial-ethnic group (Table 1). We selected 12 markers as tag SNPs that strongly predicted the common haplotypes in each ethnic group in the multiethnic panel. The average Rh2 in predicting the common haplotypes across all ethnic groups in each block was 0.94 (range, 0.73–1.00).

Table 1. Common haplotype frequencies in LD blocks of CYP11A

Values are haplotype frequencies (%) in the multiethnic panel.

| Haplotypes* | AA | NH | JA | LA | WH |
|---|---|---|---|---|---|
| Block 1: SNPs 5–21; tag SNPs 6, 8, 11, 12, 19, and 21 | | | | | |
| 1A CGGGCG | 23.9 | 36.2 | 24.7 | 47.6 | 61.4 |
| 1B CAAGTC | 34.1 | 26.0 | 32.0 | 21.9 | 16.4 |
| 1C TGGGCC | 6.6 | 12.3 | 14.0 | 15.7 | 14.2 |
| 1D CGGACC | 2.9 | 17.4 | 12.1 | 2.1 | 5.7 |
| 1E CGAGTC | 13.1 | 5.2 | 10.2 | 7.3 | <1.0 |
| 1F CGGGCC | 5.9 | <1.0 | 5.5 | 3.1 | <1.0 |
| 1G CGGGTC | 6.7 | 0.0 | 0.0 | <1.0 | 0.0 |
| Total | 93.2 | 97.9 | 98.5 | 98.3 | 99.2 |
| Minimum Rh2§ | 0.85 | 0.78 | 0.84 | 0.89 | 0.94 |
| Block 2: SNPs 22–36; tag SNPs 22, 24, 26, 32, 33, and 36 | | | | | |
| 2A GCTCCC | 25.7 | 35.4 | 19.2 | 29.3 | 59.8 |
| 2B ACCCAT | 41.8 | 38.4 | 47.7 | 38.6 | 30.0 |
| 2C GCCCAC | 10.0 | 5.1 | 8.1 | 7.3 | 2.2 |
| 2D GCCCCC | 4.9 | 5.1 | 8.1 | 21.2 | 3.0 |
| 2E GTTACC | 1.4 | 11.4 | 4.5 | 1.4 | 3.6 |
| 2F GTTCCC | 1.0 | 4.6 | 8.6 | 0.0 | 0.0 |
| 2G ACCCCC | 9.5 | 0.0 | 0.0 | 1.4 | 0.0 |
| Total | 94.3 | 100.0 | 96.2 | 99.2 | 98.6 |
| Minimum Rh2§ | 0.73 | 0.91 | 0.99 | 0.80 | 0.97 |

* Haplotypes observed with ≥0.05 frequency in at least one ethnic group.

Frequencies estimated using the tagging SNPs. Abbreviations: AA, African-Americans; NH, Native Hawaiians; JA, Japanese-Americans; LA, Latinas; WH, Whites.

Total: the percentage of all chromosomes accounted for by the haplotypes.

§ The Rh2 that is given is the minimum Rh2 of the common haplotypes in each ethnic group.

To assess how well the 12 tagging SNPs captured the other 20 SNPs in the LD blocks, we computed their pairwise r2 values. The average maximum r2 across populations for the unmeasured SNPs was 0.85 (range, 0.74–0.96). Thus, we believe that the chosen tag SNPs provide relatively good prediction of all SNPs assayed in the multiethnic panel and that common variation was thoroughly characterized at this locus.

The relationship between the (TAAAA)n repeat polymorphism and the common haplotypes observed in block 1 is shown in Table 2. The (TAAAA)n polymorphism is located in the 5′ region between SNPs 19 and 20 in block 1. Five common alleles with four, six, seven, eight, or nine (TAAAA) repeats were observed in the multiethnic panel, which accounted for >99% of all alleles. The number and frequency of the repeat alleles observed were consistent with previous reports (6, 9). The six-repeat allele was the most common allele among African-Americans (52%), Japanese-Americans (60%), and Native Hawaiians (51%), and was found on several of the common haplotypes (1B, 1D, 1E, 1F, and 1G), with varying frequencies across ethnic groups. The four-repeat allele was the most common allele among Latinas (50%) and Whites (60%), and traveled exclusively on haplotype 1A. Associations between the other repeat alleles and common haplotypes in block 1 are presented in Table 2. The average maximum r2 across populations between the TAAAA repeat allele and the flanking tag SNPs in block 1 was 0.63 (range, 0.34–0.77).

Table 2. The relation of CYP11A (TAAAA)n alleles with the common haplotypes in block 1

Allele frequencies (%) in the multiethnic panel*

| (TAAAA)n | AA (n = 67) | NH (n = 64) | JA (n = 66) | LA (n = 67) | WH (n = 67) |
|---|---|---|---|---|---|
| | 25.4 | 37.5 | 25.8 | 50.0 | 60.4 |
| | 52.2 | 50.8 | 59.8 | 30.6 | 24.6 |
| | 6.0 | | | 3.7 | |
| | 13.4 | 8.6 | 12.9 | 8.2 | 4.5 |
| | 3.0 | | | 6.0 | 10.4 |

Haplotype frequencies (%) in the multiethnic panel*

| Haplotype | (TAAAA)n | AA | NH | JA | LA | WH |
|---|---|---|---|---|---|---|
| 1A | | 23.1 | 37.5 | 25.0 | 48.5 | 59.7 |
| 1B | | 25.4 | 24.1 | 30.9 | 17.6 | 16.4 |
| 1B | | 9.0 | | | | |
| 1C | | 3.7 | 8.6 | 12.9 | 6.7 | 4.5 |
| 1C | | | | | 6.0 | 9.7 |
| 1D | | | 18.7 | 12.8 | | 6.0 |
| 1E | | 12.5 | 5.6 | 10.8 | 7.7 | |
| 1F | | | | 4.5 | | |
| 1G | | 4.5 | | | | |

Abbreviations: AA, African-Americans; NH, Native Hawaiians; JA, Japanese-Americans; LA, Latinas; WH, Whites.

* Haplotypes with frequencies ≥2.5% are shown.

Case-control analysis. Descriptive characteristics of the cases and controls have been previously published (11), and associations with known breast cancer risk factors were as expected (Supplemental Table S2). We observed no significant heterogeneity of effects for haplotypes across racial-ethnic groups (P > 0.14; Supplemental Table S3) and present results from pooled analyses (Table 3). In the analysis of the common haplotypes, the global tests of haplotype effects in each block and over both blocks were statistically significant (block 1, P = 0.008; block 2, P = 0.016; both blocks, P = 0.006; Table 3). In block 1, we observed a nominally significant positive association with haplotype 1D (effect per haplotype copy: OR, 1.23; 95% CI, 1.03–1.48; P = 0.025), an inverse association of borderline significance with haplotype 1E (OR, 0.83; 95% CI, 0.68–1.02; P = 0.071), and a nominally significant inverse association with haplotype 1G, which was common only among African-Americans (OR, 0.62; 95% CI, 0.42–0.91; P = 0.015). In block 2, we observed a positive association with haplotype 2F (OR, 1.52; 95% CI, 1.15–2.01; P = 0.003) and an inverse association with haplotype 2G, which was common only among African-Americans (OR, 0.67; 95% CI, 0.46–0.97; P = 0.033). Haplotypes 1D and 2F were more common among cases than among controls in all racial-ethnic groups (Table 3; Supplemental Table S3). These haplotypes, however, varied considerably in frequency by group, being most common in Native Hawaiians and Japanese-Americans, followed by Whites then Latinas, and were rare in African-Americans (≤1.9% among controls). In evaluating the long-range haplotype patterns across the region, haplotype 2F was found to travel predominantly with haplotype 1D, and this long-range haplotype combination was associated with a modest increase in risk (OR, 1.44; 95% CI, 1.10–1.90; P = 0.009). Similarly, haplotypes 1G and 2G were found to travel together and, combined, were associated with a decrease in risk (OR, 0.52; 95% CI, 0.33–0.83; P = 0.005).
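As a consistency check on the reported estimates, an approximate P value can be recovered from an OR and its 95% CI. The sketch below assumes a Wald-type interval on the log-odds scale, which is an assumption about how the intervals were constructed; the function name is hypothetical, and because the inputs are the rounded published values, the results are approximate.

```python
import math


def wald_from_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Recover the standard error, z statistic, and two-sided P value of a
    log odds ratio from its point estimate and 95% confidence limits."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value


# Per-copy estimates reported in the text for haplotypes 1D and 2F.
for name, estimate in {"1D": (1.23, 1.03, 1.48), "2F": (1.52, 1.15, 2.01)}.items():
    se, z, p_value = wald_from_ci(*estimate)
    print(f"{name}: SE = {se:.3f}, z = {z:.2f}, P = {p_value:.3f}")
```

Under this assumption the recovered values are close to the reported P = 0.025 for haplotype 1D and P = 0.003 for haplotype 2F.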

Table 3. Associations between CYP11A haplotypes and breast cancer risk

Racial-ethnic group columns give % cases/% controls; the remaining columns give OR (95% CI) and P values.

| Haplotype* | AA | NH | JA | LA | WH | One copy | P | Two copies | P | Effect per haplotype copy | P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No. cases/controls | 345/426 | 109/290 | 425/420 | 335/386 | 401/440 | | | | | | |
| Block 1 | | | | | | | | | | | |
| 1A CGGGCG | 26.2/24.1 | 35.2/34.2 | 27.3/26.3 | 50.2/49.5 | 56.9/60.0 | 1.00 | | 1.00 | | 1.00 | |
| 1B CAAGTC | 30.2/28.7 | 28.3/29.7 | 31.4/28.4 | 19.8/21.8 | 15.5/13.7 | 1.12 (0.95–1.31) | 0.172 | 0.96 (0.70–1.30) | 0.769 | 1.04 (0.92–1.18) | 0.535 |
| 1C TGGGCC | 6.4/7.5 | 10.2/12.9 | 14.4/15.9 | 14.3/14.3 | 13.6/14.0 | 0.94 (0.79–1.12) | 0.481 | 0.79 (0.48–1.31) | 0.361 | 0.92 (0.79–1.08) | 0.307 |
| 1D CGGACC | 2.1/1.9 | 19.4/15.5 | 16.5/13.9 | 6.0/4.6 | 8.3/6.6 | 1.29 (1.04–1.58) | 0.018 | 1.24 (0.66–2.30) | 0.504 | 1.23 (1.03–1.48) | 0.025 |
| 1E CGAGTC | 12.9/13.0 | 5.3/5.8 | 6.5/9.9 | 4.6/5.3 | 1.7/2.0 | 0.81 (0.65–1.01) | 0.065 | 0.80 (0.32–1.96) | 0.618 | 0.83 (0.68–1.02) | 0.071 |
| 1F CGGGCC | 15.7/14.6 | 1.1/1.2 | 3.4/4.6 | 4.2/3.8 | 3.3/3.1 | 1.01 (0.80–1.29) | 0.926 | 1.02 (0.40–2.60) | 0.962 | 1.01 (0.81–1.26) | 0.913 |
| 1G CGGGTC | 6.0/9.4 | <1.0/<1.0 | <1.0/<1.0 | <1.0/<1.0 | <1.0/<1.0 | 0.62 (0.42–0.91) | 0.015 | - | | | |
| Global test§ χ2 = 19.15; df, 7; P = 0.008 | | | | | | | | | | | |
| Block 2 | | | | | | | | | | | |
| 2A GCTCCC | 27.4/24.6 | 29.7/32.4 | 17.6/16.5 | 34.1/31.6 | 56.0/59.3 | 1.00 | | 1.00 | | 1.00 | |
| 2B ACCCAT | 38.3/38.0 | 40/40.7 | 46.7/44.8 | 34.7/35.7 | 29.9/27.6 | 1.04 (0.89–1.22) | 0.632 | 1.04 (0.82–1.32) | 0.757 | 1.01 (0.90–1.14) | 0.840 |
| 2C GCCCAC | 16.2/16.2 | 3.9/5.7 | 5.5/7.8 | 5.3/5.4 | 1.6/2.4 | 0.85 (0.68–1.06) | 0.152 | 0.77 (0.35–1.68) | 0.514 | 0.85 (0.69–1.04) | 0.115 |
| 2D GCCCCC | 5.5/5.6 | 4.4/2.7 | 9.6/10.2 | 17.9/20.6 | 3.2/3.4 | 0.96 (0.78–1.19) | 0.714 | 0.72 (0.37–1.43) | 0.353 | 0.92 (0.76–1.12) | 0.408 |
| 2E GTTACC | <1.0/<1.0 | 8.1/7.9 | 7.1/6.8 | 3.8/3.2 | 3.7/2.5 | 1.07 (0.81–1.42) | 0.626 | 1.57 (0.64–3.85) | 0.325 | 1.11 (0.87–1.42) | 0.401 |
| 2F GTTCCC | 1.6/1.2 | 10.0/6.1 | 8.2/6.9 | 1.7/1.2 | 3.6/3.0 | 1.52 (1.15–2.01) | 0.003 | - | | | |
| 2G ACCCCC | 5.3/9.0 | <1.0/<1.0 | 1.6/1.3 | <1.0/<1.0 | <1.0/<1.0 | 0.67 (0.46–0.97) | 0.033 | - | | | |
| Global test§ χ2 = 17.26; df, 7; P = 0.016 | | | | | | | | | | | |
| Long-range | | | | | | | | | | | |
| 1A–2A CGGGCGGCTCCC | 24.9/22.6 | 29.9/31.3 | 17.1/15.7 | 33.1/31.4 | 55.3/58.3 | 1.00 | | 1.00 | | 1.00 | |
| 1B–2B CAAGTCACCCAT | 29.1/28.3 | 28.3/28.5 | 29.5/26.9 | 19.4/21.3 | 15.4/13.2 | 1.09 (0.93–1.27) | 0.308 | 1.02 (0.75–1.39) | 0.902 | 1.03 (0.90–1.18) | 0.635 |
| 1C–2B TGGGCCACCCAT | 3.8/4.5 | 9.5/11.9 | 14.1/15.2 | 14.0/13.6 | 13.7/13.8 | 0.97 (0.81–1.16) | 0.724 | 0.85 (0.50–1.45) | 0.544 | 0.94 (0.80–1.11) | 0.483 |
| 1D–2F CGGACCGTTCCC | 1.3/<1.0 | 9.8/6.6 | 7.7/6.7 | 1.5/1.2 | 3.9/3.0 | 1.44 (1.10–1.90) | 0.009 | - | | | |
| 1D–2E CGGACCGTTACC | <1.0/<1.0 | 8.1/7.9 | 5.8/5.1 | 3.8/3.1 | 3.7/2.4 | 1.14 (0.86–1.51) | 0.367 | 1.88 (0.57–6.15) | 0.300 | 1.17 (0.90–1.51) | 0.248 |
| 1F–2D CGGGCCGCCCCC | 5.2/5.2 | <1.0/1.0 | 1.4/2.5 | 1.6/2.9 | 2.4/2.2 | 0.77 (0.56–1.06) | 0.112 | 1.32 (0.18–9.53) | 0.785 | 0.79 (0.58–1.08) | 0.135 |
| 1E–2C CGAGTCGCCCAC | 7.1/6.9 | 3.6/5.4 | 5.1/7.6 | 3.6/4.7 | 1.2/1.7 | 0.77 (0.60–0.99) | 0.042 | 0.60 (0.15–2.40) | 0.470 | 0.77 (0.60–0.98) | 0.033 |
| 1A–2D CGGGCGGCCCCC | <1.0/<1.0 | 4.6/1.6 | 8.5/7.7 | 16.3/17.5 | <1.0/<1.0 | 1.06 (0.83–1.36) | 0.623 | 0.89 (0.41–1.94) | 0.771 | 1.02 (0.82–1.28) | 0.842 |
| 1F–2C CGGGCCGCCCAC | 6.8/6.3 | <1.0/<1.0 | 0.0/0.0 | 1.5/<1.0 | <1.0/<1.0 | 1.38 (0.91–2.08) | 0.130 | - | | | |
| 1G–2G CGGGTCACCCCC | 4.1/7.2 | <1.0/<1.0 | <1.0/<1.0 | <1.0/<1.0 | <1.0/<1.0 | 0.52 (0.33–0.83) | 0.005 | - | | | |
| Global test§ χ2 = 20.73; df, 10; P = 0.006 | | | | | | | | | | | |

* Haplotypes with frequencies ≥5% in the multiethnic panel are shown.

ORs adjusted for age and ethnicity.

Reference group is the most common haplotype.

§ The most common haplotype was the reference group. All rare haplotypes (<5%) were combined as one group. df = number of haplotypes (common and rare haplotype combined) − 1.
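The within-block global tests can be related to their P values through the upper tail of the chi-square distribution. The sketch below uses the closed-form series for integer degrees of freedom so that only the standard library is needed; the function name is hypothetical, and this is a reader's check rather than the authors' SAS code.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2Q(chi) + 2Z(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1)),
    # where Q is the standard normal upper tail and Z its density.
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


# Block-level global tests from Table 3 (7 degrees of freedom each).
print(f"block 1: P = {chi2_upper_tail(19.15, 7):.3f}")
print(f"block 2: P = {chi2_upper_tail(17.26, 7):.3f}")
```

With seven common haplotypes plus one pooled rare-haplotype group, each block has 8 − 1 = 7 degrees of freedom, matching the df rule given in the table footnote.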

Next, we tested the effect of the eight-repeat (TAAAA)n allele, which Zheng et al. suggested is associated with an increased risk of breast cancer (9). In African-Americans, Native Hawaiians, and Japanese-Americans, haplotype 1C carried the eight-repeat allele exclusively, thus representing a proxy for this specific repeat allele in these racial-ethnic groups. In these groups, we observed an inverse association between haplotype 1C and breast cancer risk, although this finding was not statistically significant (n = 879 cases; n = 1,136 controls: OR, 0.84; 95% CI, 0.67–1.05; P = 0.117).

We also tested the independent effects of each tag SNP as they predict not only the common haplotype patterns across the locus but also the other common SNPs genotyped in the multiethnic panel that were not genotyped and tested directly in the case-control population. We found a nominally significant positive association with a tag SNP that was found to travel exclusively on haplotype 1D (SNP 12, P for trend = 0.012; Supplemental Table S4). We also found a borderline significant positive association with SNP 24 (P for trend = 0.052), which was found to travel on haplotypes 2E and 2F. These effects, however, were no greater than that observed with the individual haplotypes that were found to be nominally associated with an increase in risk.

Further adjustment for population stratification, by including terms for the first four principal components of the covariance matrix (which represented the large majority of total variation of the eigenvalues of this matrix) and interaction terms between ethnicity and these terms, yielded similar results. The global tests of haplotype effects in each block remained significant (block 1, P = 0.011; block 2, P = 0.018), as did the associations with individual haplotypes [1D, RR = 1.20 (95% CI, 1.00–1.45; P = 0.05); 2F, RR = 1.51 (95% CI, 1.14–2.00; P = 0.005)]. The associations with SNP 12 (P for trend = 0.023) and SNP 24 (P for trend = 0.090) were also similar after the inclusion of these terms.

To clarify whether the associations that we observed with individual haplotypes and tag SNPs might be the result of multiple testing, we performed permutation tests (23). The permutation-based P value for the most nominally significant haplotype, 2F, was 0.053. This result indicates that, if a similar study were repeated under the null hypothesis (i.e., no CYP11A haplotype associated with breast cancer), an association at least as strong as that observed with haplotype 2F would occur by chance only 5.3% of the time.
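
The logic of a permutation test of this kind can be illustrated with a minimal sketch. It assumes a simplified setting that is not the procedure used in this study: each subject has a 0/1 carrier indicator for each haplotype, the test statistic is the largest absolute case-control difference in carrier frequency across haplotypes, and case-control labels are shuffled to build the null distribution. The function names and the toy data are hypothetical.

```python
import random

def max_abs_diff(labels, carriers):
    """Largest absolute case-control difference in carrier frequency
    across haplotypes (labels: 1 = case, 0 = control)."""
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    best = 0.0
    for hap in carriers:  # one 0/1 carrier list per haplotype
        case_freq = sum(c for c, y in zip(hap, labels) if y == 1) / n_case
        ctrl_freq = sum(c for c, y in zip(hap, labels) if y == 0) / n_ctrl
        best = max(best, abs(case_freq - ctrl_freq))
    return best

def permutation_p(labels, carriers, n_perm=2000, seed=0):
    """Permutation P value for the most extreme haplotype, which accounts
    for having tested several haplotypes at once."""
    rng = random.Random(seed)
    observed = max_abs_diff(labels, carriers)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between status and genotype
        if max_abs_diff(shuffled, carriers) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): 50 cases followed by 50 controls.
labels = [1] * 50 + [0] * 50
associated = [1] * 40 + [0] * 60  # carried by 40 cases and no controls
unrelated = [1, 0] * 50           # carried equally by cases and controls
p_value = permutation_p(labels, [associated, unrelated], n_perm=200)
```

Because the statistic is the maximum over all haplotypes, the resulting P value reflects how often any haplotype would look as extreme as the best one observed, which is the question a multiple-testing correction is meant to answer.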

Because CYP11A plays a key role in steroid hormone biosynthesis, variation in this gene has been hypothesized to be related to the risk of developing breast cancer. In this study, we performed a systematic analysis of common variation across the CYP11A locus, considering the possibility that CYP11A variants other than the (TAAAA)n polymorphism may contribute to breast cancer susceptibility. We directly surveyed variation in the coding region of the CYP11A gene to search for common missense variants, and performed a detailed haplotype-based analysis to uncover the effects of functional variation in noncoding regions. The sequencing panel provided >99% power to detect missense variants at a combined frequency of ≥5% across racial-ethnic groups. In this survey, we identified only one rare missense variant (Met367Arg) in one Japanese case, suggesting that common missense variants at this locus are scarce and unlikely to be a major contributor to breast cancer risk in these populations. Our resequencing panel was not adequately powered to identify rare variants (<5%), whether shared across populations or unique to specific populations; larger resequencing efforts will be required to address this issue.
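
The power statement for the resequencing panel is consistent with a simple calculation, sketched here under assumptions not stated in the text: detection means observing at least one copy of the variant allele, the 95 resequenced cases contribute 190 independently sampled chromosomes, and sequencing is error free. The function name is hypothetical.

```python
def detection_probability(n_subjects, allele_freq):
    """Probability of observing at least one copy of an allele among
    2 * n_subjects independently sampled chromosomes."""
    n_chromosomes = 2 * n_subjects
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes

# 95 resequenced cases (from the text); a 5% combined allele frequency.
power_common = detection_probability(95, 0.05)  # exceeds 0.99
# The same panel evaluated for a 1% allele, for comparison.
power_rare = detection_probability(95, 0.01)    # lower than for 5%
```

Under these assumptions the detection probability falls as the allele frequency drops, and it would fall further for a variant confined to a single racial-ethnic group, because only that group's share of the 190 chromosomes would be informative.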

We did, however, observe modest, nominally significant positive associations between breast cancer risk and two strongly correlated haplotypes in blocks 1 (1D) and 2 (2F), as well as with SNPs that tag these haplotypes. We also observed modest, nominally significant inverse associations with haplotype 1E and with haplotypes 1G and 2G, which were common only among African-Americans. The two risk haplotypes (1D and 2F) are relatively common in all populations except African-Americans, and their frequencies are higher in cases than in controls in each ethnic group in which they were observed. The consistency of the association across populations suggests that these haplotypes may be marking a common susceptibility allele for breast cancer. As these haplotypes are rare among African-Americans (<2%), larger studies in this population will be needed to determine whether these haplotypes are associated with risk, and to assess whether closely related haplotypes, such as 1F, which is common only in African-Americans, might be marking the same allele that contributes to risk in the other populations.

As numerous statistical tests were conducted in this analysis, we considered the possibility that our results for individual haplotypes were chance findings. We assessed this possibility by performing a global test of the effect of any common haplotype on risk and by subjecting the most significant nominal associations observed for individual haplotypes to permutation testing. Whereas the global tests were significant (P < 0.02), the permutation test result was borderline significant (P ≥ 0.053). One important difference between these two tests is that the permutation test compared the risk associated with each common haplotype to the pooled effect of all other haplotypes, whereas the global test compared all common haplotypes simultaneously. Thus, the permutation test will be more sensitive if only one haplotype is associated with increased risk. On the other hand, because the global test is less specific about which haplotype is most likely to be associated with risk, it may be more sensitive to the presence of multiple haplotypes that either increase or decrease risk, as we observed.

To date, only one study has examined variation at the CYP11A locus in relation to breast cancer risk: a large population-based breast cancer case-control study in Shanghai (1,015 incident cases and 1,082 controls; ref. 9). In that study, Zheng et al. observed that, compared with noncarriers, women carrying a single copy of the eight-repeat allele had a >50% elevated risk of breast cancer, whereas the risk was increased nearly 3-fold in those who carried two copies of this allele (9). We were able to examine the effects of the (TAAAA)n polymorphism indirectly because the eight-repeat allele (the high-risk repeat allele suggested by Zheng et al.) was found to travel exclusively on haplotype 1C in African-Americans, Native Hawaiians, and Japanese. Tests of association between this haplotype and breast cancer in these three populations, however, yielded no significant results.

In this study, we had 80% statistical power in the combined analysis to detect an OR of 1.25 (OR for heterozygotes versus homozygote wild-types), assuming a codominant model for an allele with a frequency of 0.10 and an Rh2 of 0.94 (the average Rh2 across racial-ethnic groups and blocks in our study). However, within each ethnic group, we had only 67% to 75% power to detect an OR of 1.5 for an allele with a frequency of 0.10 (except for Native Hawaiians: OR, 1.8; power, 70%). We acknowledge that the sample size within each ethnic group was not large enough to definitively evaluate ethnic-specific risks, and we cannot exclude the possibility that we may have missed associations with ethnic-specific variants that display modest effects.
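
A rough check of the combined-analysis power can be sketched with a normal approximation. This is not the method used in the study, which is not described in this section. The sketch assumes a two-sided allele-based comparison of case and control allele frequencies at alpha = 0.05, treats 0.10 as the control allele frequency, applies the OR per allele copy, and treats Rh2 as a multiplicative reduction in effective sample size. The function name and all of these modeling choices are assumptions; the sample sizes (1,615 cases and 1,962 controls) are those of the case-control study.

```python
from statistics import NormalDist

def allelic_power(n_cases, n_controls, allele_freq, odds_ratio,
                  rh2=1.0, alpha=0.05):
    """Approximate power of a two-sided allele-frequency comparison.
    allele_freq is the assumed frequency in controls (an assumption)."""
    norm = NormalDist()
    # Allele frequency in cases implied by the per-allele odds ratio.
    odds_cases = odds_ratio * allele_freq / (1.0 - allele_freq)
    freq_cases = odds_cases / (1.0 + odds_cases)
    # Two chromosomes per subject; Rh2 scales the effective sample size.
    chrom_cases = 2 * n_cases * rh2
    chrom_controls = 2 * n_controls * rh2
    se = (freq_cases * (1.0 - freq_cases) / chrom_cases
          + allele_freq * (1.0 - allele_freq) / chrom_controls) ** 0.5
    z_effect = abs(freq_cases - allele_freq) / se
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    # Probability of rejecting in either tail under the alternative.
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)

power = allelic_power(1615, 1962, allele_freq=0.10, odds_ratio=1.25, rh2=0.94)
```

For these inputs the approximation gives a value near 0.8, in line with the combined-analysis figure quoted above. Passing smaller, group-specific sample sizes to the same function shows how quickly power erodes within a single ethnic group.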

In summary, our results suggest that common germ line variations in CYP11A may contribute to breast cancer risk in this multiethnic cohort, although no SNP or haplotype has been singled out as the most likely “causal” variant. Additional follow-up studies of the suggestive findings in larger study populations such as the National Cancer Institute Breast and Prostate Cancer Cohort Consortium are warranted.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

Grant support: National Cancer Institute grants CA63464 and CA54281. National Cancer Institute Career Development Award CA116543 (V.W. Setiawan).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

We are most indebted to the participants of the Multiethnic Cohort Study for their participation and commitment, Stephanie Riley and David Wong for their laboratory assistance, Dr. John Casagandre for bioinformatics support, and Dr. Kristine Monroe and Hank Huang for their support with the data management.

1. Key T, Appleby P, Barnes I, Reeves G. Endogenous sex hormones and breast cancer in postmenopausal women: reanalysis of nine prospective studies. J Natl Cancer Inst 2002;94:606–16.
2. Henderson BE, Feigelson HS. Hormonal carcinogenesis. Carcinogenesis 2000;21:427–33.
3. Hunter DJ, Riboli E, Haiman CA, et al. A candidate gene approach to searching for low-penetrance breast and prostate cancer genes. Nat Rev Cancer 2005;5:977–85.
4. Franks S, Gharani N, Waterworth D, et al. The genetic basis of polycystic ovary syndrome. Hum Reprod 1997;12:2641–8.
5. Gharani N, Waterworth DM, Batty S, et al. Association of the steroid synthesis gene CYP11a with polycystic ovary syndrome and hyperandrogenism. Hum Mol Genet 1997;6:397–402.
6. Gaasenbeek M, Powell BL, Sovio U, et al. Large-scale analysis of the relationship between CYP11A promoter variation, polycystic ovarian syndrome, and serum testosterone. J Clin Endocrinol Metab 2004;89:2408–13.
7. Diamanti-Kandarakis E, Bartzis MI, Bergiele AT, Tsianateli TC, Kouli CR. Microsatellite polymorphism (tttta)(n) at −528 base pairs of gene CYP11α influences hyperandrogenemia in patients with polycystic ovary syndrome. Fertil Steril 2000;73:735–41.
8. Kumazawa T, Tsuchiya N, Wang L, et al. Microsatellite polymorphism of steroid hormone synthesis gene CYP11A1 is associated with advanced prostate cancer. Int J Cancer 2004;110:140–4.
9. Zheng W, Gao YT, Shu XO, et al. Population-based case-control study of CYP11A gene polymorphism and breast cancer risk. Cancer Epidemiol Biomarkers Prev 2004;13:709–14.
10. Kolonel LN, Henderson BE, Hankin JH, et al. A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol 2000;151:346–57.
11. Haiman CA, Stram DO, Pike MC, et al. A comprehensive haplotype analysis of CYP19 and breast cancer risk: the Multiethnic Cohort. Hum Mol Genet 2003;12:2679–92.
12. Gabriel SB, Schaffner SF, Nguyen H, et al. The structure of haplotype blocks in the human genome. Science 2002;296:2225–9.
13. Excoffier L, Slatkin M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol 1995;12:921–7.
14. Stram DO, Haiman CA, Hirschhorn JN, et al. Choosing haplotype-tagging SNPS based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the Multiethnic Cohort Study. Hum Hered 2003;55:27–36.
15. Freedman ML, Penney KL, Stram DO, et al. Common variation in BRCA2 and breast cancer risk: a haplotype-based analysis in the Multiethnic Cohort. Hum Mol Genet 2004;13:2431–41.
16. Collins FS, Guyer MS, Charkravarti A. Variations on a theme: cataloging human DNA sequence variation. Science 1997;278:1580–1.
17. Lander ES. The new genomics: global views of biology. Science 1996;274:536–9.
18. A haplotype map of the human genome. Nature 2005;437:1299–320.
19. Rosenberg NA, Pritchard JK, Weber JL, et al. Genetic structure of human populations. Science 2002;298:2381–5.
20. Ioannidis JP, Ntzani EE, Trikalinos TA. ‘Racial’ differences in genetic effects for complex diseases. Nat Genet 2004;36:1312–8.
21. Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, Ehm MG. Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum Hered 2002;53:79–91.
22. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006;38:904–9.
23. Newton-Cheh C, Hirschhorn JN. Genetic association studies of complex traits: design and analysis issues. Mutat Res 2005;573:54–69.
