Abstract
Increased mammographic density (MD), the proportion of dense tissue visible on a mammogram, is a strong risk factor for breast cancer, is common in the population, and clusters in families. We conducted the first genome-wide linkage scan to identify genes influencing MD. DNA was obtained from 889 relatives (756 women, 133 men) from 89 families. Percent MD was estimated on 618 (82%) female family members using a validated computer-assisted thresholding method. The genome-wide scan included 403 microsatellite DNA markers with an average spacing of 9 cM. Fine mapping of a region of chromosome 5p (5p13.1-5p15.1) was done using 21 additional closely spaced DNA markers. Linkage analyses were conducted to quantify the evidence for a gene responsible for MD across the genome. The maximum log odds for linkage (LOD) score from the genome-wide scan was on chromosome 5p (LOD = 2.9, supporting linkage by a factor of 10^2.9, or 794 to 1) with a 1-LOD interval spanning 28.6 cM. Two suggestive regions for linkage were also identified on chromosome 12 (LOD = 2.6, 1-LOD interval of 14.8 cM; and LOD = 2.5, 1-LOD interval of 17.2 cM). Finer mapping of the region surrounding the maximum LOD on chromosome 5p resulted in stronger and statistically significant evidence for linkage (LOD = 4.2) and a narrowed 1-LOD interval (13.4 cM). The putative locus on chromosome 5p is likely to account for up to 22% of variation in MD. Hence, 1 or more of the 45 candidate genes in this region could explain a large proportion of the variation in MD and, potentially, breast cancer. [Cancer Res 2007;67(17):8412–8]
Introduction
One of the most important risk factors for breast cancer is mammographic density (MD), with risk estimates that are 3- to 5-fold greater for women in the highest quartile of density than women of similar age in the lowest quartile (1). Variation in the radiographic appearance of the breast reflects differences in tissue composition: Darker regions indicate fat tissue, and lighter regions represent dense tissue, primarily fibroglandular tissue consisting of the functional elements or parenchyma and supporting elements or stroma (2). This trait is quantifiable (3) and modifiable (4, 5), and its association with breast cancer is independent of age and other risk factors for breast cancer (6, 7). Two recent studies concluded that the incorporation of this risk factor in the Gail model improves prediction of breast cancer risk among both pre- and postmenopausal women (8, 9).
Epidemiologic risk factors associated with MD, including strong inverse associations with body mass index (BMI) and age, account for only 20% to 30% of variation in the trait (7, 10, 11). Genetic factors and the interaction between genes and environment likely account for the remaining variation (12). Evidence for a genetic influence comes from familial aggregation (13, 14), family-based segregation (13), and twin (12, 15) studies. A large study in two populations of monozygotic and dizygotic twins estimated heritability at 65% to 74% in age-adjusted analyses and 60% to 67% in multivariable-adjusted analyses (12). Gene association studies have also been used to identify genetic factors involved in increased MD (16–23), but few findings have been replicated to date (20, 23), illustrating that the selection of candidate genes based on our limited understanding of the biology of MD has not been particularly instructive.
Genetic linkage analysis represents another approach to identify genes for MD, one that does not require complete understanding of the biology of the trait. Previous work suggesting evidence for a major gene effect (13) and the evidence for a high degree of heritability of MD (12) justify a linkage analysis approach to identify genes influencing the trait. Here, we present the first genome-wide linkage analysis of MD in a large collection of families and provide strong evidence for a gene or genes influencing MD on chromosome 5p.
Materials and Methods
Study population. Four-hundred and twenty-six multigenerational families ascertained through a breast cancer proband diagnosed from 1944 to 1952 at the University of Minnesota (24) formed the sample. Probands were consecutive cases, unselected for family history. First- and second-degree blood relatives of the proband and spouses were interviewed between 1990 and 1996; 93% of those contacted provided a telephone interview that included detailed risk factor information. Almost all (99%) women in the 426 families were Caucasian and from Minnesota. Details of these families are provided elsewhere (24).
Simulation studies were done to identify the families most informative for linkage analyses. A subset of 90 of the 426 families was selected, and 1,146 family members were invited to provide a blood or buccal sample as a source of DNA; 901 (79%) consented. After the exclusion of 12 individuals due to Mendelian (familial) inconsistencies across markers, the final sample included 89 families, with 889 Caucasian individuals (133 men, 756 women). As part of the parent study, women provided the location of the most recent mammogram and permission to obtain and digitize their mammograms. Mammograms were requested from clinics across the United States, and all were recent mammograms done over the 1990 to 2001 period when national standards were in place for mammography. Among the 737 age-eligible women, we retrieved the mammograms of 658 (89%). Both craniocaudal and mediolateral oblique views were available for 618 women (82% of the 756 genotyped women). Five percent of women had a breast cancer diagnosis during the follow-up period (2000–2002); for these women, mammograms before the diagnosis were used.
The protocol was approved by the Mayo Clinic Institutional Review Board.
Percent MD phenotype estimation. Original mammograms were obtained on 658 women and digitized on a Lumiscan 75 scanner with 12-bit grayscale depth. The pixel size was 0.130 × 0.130 mm² for both the 18 × 24- and 24 × 30-cm² films. MD was estimated for each view using a computer-assisted thresholding program (ref. 3; Fig. 1) with proven reliability (25). For this study, MD from the mediolateral oblique and craniocaudal views were averaged and used as the phenotype.
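The idea behind a two-threshold percent density measure, and the averaging of the two views, can be sketched as follows. This is only an illustration under stated assumptions, not the Cumulus program itself: the function names, the nested-list image representation, and the use of simple gray-level cutoffs for the breast edge and for dense tissue are all hypothetical.

```python
def percent_density(image, breast_threshold, dense_threshold):
    """Percent of breast pixels that are classed as dense.

    image: rows of gray levels (hypothetical representation).
    breast_threshold: gray level at or above which a pixel counts as breast.
    dense_threshold: gray level at or above which a breast pixel counts as dense.
    """
    breast_pixels = 0
    dense_pixels = 0
    for row in image:
        for value in row:
            if value >= breast_threshold:
                breast_pixels += 1
                if value >= dense_threshold:
                    dense_pixels += 1
    if breast_pixels == 0:
        raise ValueError("no pixels at or above breast_threshold")
    return 100.0 * dense_pixels / breast_pixels


def mean_md(cc_percent, mlo_percent):
    """Average the craniocaudal and mediolateral oblique estimates."""
    return (cc_percent + mlo_percent) / 2.0


# Toy 2 x 4 "image": five breast pixels, two of them dense.
toy = [[0, 0, 10, 10],
       [0, 10, 50, 50]]
cc = percent_density(toy, breast_threshold=5, dense_threshold=40)
print(cc, mean_md(cc, 20.0))
```

In practice the thresholds are chosen interactively by a trained reader for each film, which is why reader reliability (25) matters for this phenotype.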
Figure 1. Two breasts of contrasting MDs. The original craniocaudal mammogram view (top) and estimation of MD (bottom) using the computer-assisted thresholding method, Cumulus.
Machine settings were not available on the majority of films, and all mammograms were obtained from different centers with few from the same clinic, preventing adjustment for these variables. However, the limited studies on this topic have shown that the influence of machine settings on the MD measure is minimal (26).
Genotyping methods. The genome-wide screen initially consisted of 400 microsatellite DNA markers across the chromosomes from the ABI Prism Linkage Mapping Set version 2.5 (PE Applied Biosystems; ref. 27). Six markers (on chromosomes 1, 4, 13, 14, and 16) were replaced (three of these with two new markers) due to the presence of null alleles (one marker), amplification problems (four markers) or strong allele bias (one marker), bringing the total number of markers to 403. The average information content across all chromosomes was 85% (25th–75th percentiles; 83.2–88.4%), and the average intermarker distance was 8.99 cM (25th–75th percentiles; 6.1–11.1 cM) based on the deCODE linkage map (28); the five largest gaps ranged from 21.0 to 28.0 cM and were located on chromosomes 1, 5q, 6p, 6q, and 8q.
The genome scan was done within the Mayo Clinic Genome Shared Resource by standard methods (27). Genotyping for the fine mapping of the identified candidate region consisted of 21 markers spaced 1.6 cM (±0.27) apart and was done by deCODE Genetics (29). A total of 91 patient samples, 2 duplicates, and 3 controls from the Centre d'Etude du Polymorphisme Humain (CEPH) were run per 96-well plate for both the genome scan and fine mapping to evaluate the quality of the genotyping.
Genotype quality. The genotype data from the initial scan were evaluated for genotyping accuracy using locally written procedures to assess Mendelian consistency within each of the families and the PREST (30) program to assess relationships. After accounting for relationship misclassifications, all markers were again assessed for Mendelian consistency. From the PREST results, we found convincing evidence, in eight families, of reported full siblings who were actually half siblings and corrected the relationships. In addition, we identified one individual, labeled as the mother, who was not blood related to her offspring, and excluded her from all analyses.
Pairwise comparisons were made for all pairs of subjects to examine the percentage of identical genotypes across all markers to ensure that all monozygotic twins were correctly identified, and no plating errors had occurred. No discrepancies were found. The potential for departures from Hardy-Weinberg equilibrium (HWE) was assessed through a resampling approach where a single individual was randomly selected from each family. All markers for both the genome scan and fine mapping were in HWE defined by P > 0.05. For the genome scan and fine mapping, all duplicate samples were concordant across all markers, and CEPH controls matched across all plates. The genome screen resulted in 357,172 (98.9%) genotypes that were Mendelian consistent and useable in analysis. For the fine mapping, 48,566 genotypes (99.3% of those called) were available for analysis.
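The pairwise identity check described above can be sketched in a few lines. This is a minimal illustration, not the locally written procedure used in the study: the data layout (a dictionary of marker name to an allele pair per sample), the function names, and the 0.99 cutoff are assumptions introduced here.

```python
from itertools import combinations


def genotype_concordance(geno_a, geno_b):
    """Fraction of markers typed in both samples whose genotypes match.

    Genotypes are unordered allele pairs, so (1, 2) equals (2, 1).
    Returns None when the two samples share no typed markers.
    """
    shared = [m for m in geno_a
              if m in geno_b and geno_a[m] is not None and geno_b[m] is not None]
    if not shared:
        return None
    same = sum(1 for m in shared if sorted(geno_a[m]) == sorted(geno_b[m]))
    return same / len(shared)


def flag_identical_pairs(samples, cutoff=0.99):
    """List sample pairs whose concordance meets the (assumed) cutoff."""
    flagged = []
    for id_a, id_b in combinations(sorted(samples), 2):
        c = genotype_concordance(samples[id_a], samples[id_b])
        if c is not None and c >= cutoff:
            flagged.append((id_a, id_b, c))
    return flagged


# Hypothetical toy data: A and B carry the same genotypes, C differs.
samples = {
    "A": {"m1": (1, 2), "m2": (3, 3)},
    "B": {"m1": (2, 1), "m2": (3, 3)},
    "C": {"m1": (1, 1), "m2": (3, 4)},
}
print(flag_identical_pairs(samples))
```

Pairs flagged this way would be expected to be known monozygotic twins or intended duplicates; any other flagged pair would point to a plating or labeling error.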
Statistical analyses. Quantitative trait linkage analyses using a variance components approach (31) were done using the EMVC software package, which uses an expectation-maximization algorithm to estimate genetic variance components (32). The multipoint identical-by-descent (IBD) sharing probabilities used in EMVC were estimated in SIMWALK2 (33) for the autosomes. Because IBD estimates for the X chromosome are not available in SIMWALK2, we used the utilities incorporated in the MERLIN (34) suite of linkage analysis tools (MINX) to perform variance component linkage analyses along the X chromosome after breaking the large families into smaller subsets to enable computation. These variance component approaches estimate the variability in the trait explained by genetic sharing specifically attributable to a single locus and by genetic sharing broadly attributable to familial relationships. Variance components were estimated by maximum likelihood while adjusting for nongenetic correlates of MD. Log odds for linkage (LOD) scores were obtained by comparing the likelihoods from models with and without accounting for locus-specific variability (31). LOD scores >3.3 were considered to provide significant evidence in favor of linkage; LOD scores >2.2 were considered suggestive for linkage (35). Support intervals were identified as the continuous genetic region surrounding the maximum LOD score in which LOD scores were no smaller than the maximum LOD score minus one.
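The LOD score, the thresholds, and the 1-LOD support interval defined above can be illustrated with a short sketch. The function names and the grid of positions and LOD values are hypothetical, and the interval is read off the evaluation grid without interpolation, which is a simplification; the likelihoods themselves would come from the linkage software.

```python
import math


def lod_from_loglik(loglik_linked, loglik_null):
    """LOD = log10 of the likelihood ratio, from natural-log likelihoods."""
    return (loglik_linked - loglik_null) / math.log(10)


def odds_for_linkage(lod):
    """Odds in favor of linkage implied by a LOD score (10 ** LOD)."""
    return 10 ** lod


def classify(lod, significant=3.3, suggestive=2.2):
    """Apply the thresholds stated in the text."""
    if lod > significant:
        return "significant"
    if lod > suggestive:
        return "suggestive"
    return "none"


def one_lod_interval(positions_cm, lods, drop=1.0):
    """Contiguous grid region around the peak with LOD >= max LOD - drop."""
    peak = max(range(len(lods)), key=lambda i: lods[i])
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions_cm[left], positions_cm[right]


print(round(odds_for_linkage(2.9)))  # 794, the "794 to 1" quoted in the abstract

# Hypothetical LOD curve on a 2-cM grid (not data from the study).
positions = [40, 42, 44, 46, 48, 50, 52]
lods = [0.5, 1.9, 3.3, 3.9, 4.2, 3.0, 1.0]
print(one_lod_interval(positions, lods), classify(max(lods)))
```

Stopping at the first grid point that falls below the cutoff keeps the interval continuous, matching the definition in the text even when a secondary peak elsewhere on the chromosome also exceeds the cutoff.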
Variance component models were evaluated for the primary phenotype, mean MD (mean of craniocaudal and mediolateral oblique views) adjusted only for age and also adjusted for covariates previously associated with MD in this cohort (10). Covariates were obtained from a self-administered questionnaire completed within a median of 4 months (interquartile range, 0.96–15.7 months) of the mammogram date and included age (1/age), body mass index (1/BMI), menopausal status, hormone therapy (HT), lifetime alcohol consumption, oral contraceptive (OC) use, education, number of live births, age at first live birth and pack-years smoking. Only women with complete information on the primary phenotype and all of the covariates (n = 583) were used in the linkage analyses. However, the genetic information from all those who had been genotyped, including males (n = 133) as well as women with missing covariate (n = 38) or phenotype data (n = 138), was used in the estimation of IBD sharing described above. Secondary analyses were done after removing the 5% of women who developed breast cancer over the course of the study.
Because BMI is inversely correlated with MD (r = −0.50 in our sample), we wanted to ensure we were identifying genes for MD, not for BMI. Thus, we did linkage analyses for mean MD with and without adjusting for BMI. In addition, for those chromosomes found to have regions of suggestive linkage, we did linkage analyses of BMI with and without adjustment for MD (the genome-wide scan of BMI is subject of a separate report). Additionally, we estimated the genetic correlation between these two traits via maximum likelihood methods and did bivariate linkage analyses (36) on both MD and BMI, again at our suggestive loci, to assess whether one genetic locus influenced both MD and BMI. A high genetic correlation would suggest that much of the correlation between two traits (in this case BMI and MD) could be explained by the same genes. In the case where a putative gene is actually linked to both traits, we would expect to see a significant increase in the LOD for the bivariate compared with the univariate linkage analyses. All of these models were run in EMVC while adjusting for the covariates listed above.
After identifying a genomic region where there was evidence for linkage following fine mapping, we did two additional analyses to further inform our locus. First, we estimated the magnitude of the effect corresponding to the putative locus by extracting the variance component corresponding to the locus-specific genetic sharing at the location with the maximum LOD score. The ratio of this variance component to the overall trait variance reflects the proportion of the variability explained by the locus, although this is likely an overestimate, because genetic effect sizes based on an initial linkage scan are known to be upwardly biased (37). Second, we did a bootstrap simulation to estimate the confidence in our region. We resampled families with replacement to form 10,000 bootstrap data sets and computed LOD scores for each bootstrap sample to determine how consistently they were >3.3.
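The family-level bootstrap can be sketched generically. This is an illustration under stated assumptions rather than the procedure actually run: `statistic` is a hypothetical placeholder for whatever recomputes the LOD score from a resampled set of families (in the study, a rerun of the variance components analysis), and the toy statistic below is purely for demonstration.

```python
import random


def bootstrap_exceedance(families, statistic, threshold, n_boot=1000, seed=0):
    """Fraction of bootstrap data sets whose statistic exceeds the threshold.

    Families, not individuals, are resampled with replacement so that
    within-family correlation is preserved in every bootstrap data set.
    """
    rng = random.Random(seed)
    n = len(families)
    hits = 0
    for _ in range(n_boot):
        resample = [families[rng.randrange(n)] for _ in range(n)]
        if statistic(resample) > threshold:
            hits += 1
    return hits / n_boot


# Toy demonstration only: each "family" is a made-up number and the
# statistic is their sum, standing in for a recomputed LOD score.
toy_families = [0.3, -0.1, 0.5, 0.0, 0.2, 0.4, -0.2, 0.6, 0.1, 0.3,
                0.2, 0.5, -0.1, 0.4, 0.3, 0.1, 0.2, 0.6, 0.0, 0.3]
print(bootstrap_exceedance(toy_families, sum, threshold=3.3, n_boot=2000))
```

A high exceedance fraction suggests the signal is spread across many families; a low one suggests it depends on a handful of them.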
Results
The 89 multigenerational families averaged 10 members (SD = 8.5) providing DNA and subsequent genotype information for the genetic linkage analyses. Nine of the 89 families comprised three or more nuclear families, and 80 comprised two or fewer.
Both mediolateral oblique and craniocaudal mammogram views were available on 618 of the 756 genotyped women, and 583 (94%) of these women also had complete covariate information and were used in analyses. Mean MD was 26.5% and ranged from 0% to 73.2%, with a SD of 15.8. The distribution of mean MD was slightly positively skewed and had a slight negative kurtosis (skewness, 0.57; kurtosis, −0.20). After accounting for nongenetic covariates, the MD-adjusted distribution conformed well to the normality assumption; the Shapiro-Wilk test did not show evidence for departures from normality (P = 0.23); the skewness was 0.15, and the kurtosis was 0.01. Overall, the women in the genome screen were primarily postmenopausal (69%), parous (85%), high school educated (84%), moderate drinkers (82% weekly or monthly), and 41% had used postmenopausal hormones (Table 1). MD was positively associated with education, lifetime alcohol consumption, current OC and HT use, years of OC use, nulliparity, and premenopausal status and inversely associated with age and BMI. MD was not associated with pack-years smoking (r = −0.02, P = 0.56) in this set of families. The greatest variability in MD was explained by BMI and age (Table 1).
Table 1. Descriptive characteristics (means ± SD; percent distribution) and association (means ± SD; correlation) of risk factors with MD in women from linkage families
Variable | All women in 89 linkage families (n = 756)* | Women with MD phenotype and complete covariate information (n = 583)† | Mean (SD) MD or correlation with MD‡ | P§ | Variance in MD (%) explained by factor∥ |
---|---|---|---|---|---|
Mean MD (10th, 90th percentile) | 26.5 ± 15.9 (7.4, 48.8) | 26.4 ± 15.8 (7.4, 48.8) | | | |
Correlation | | | | | |
Age at mammogram (y) | 57.9 ± 12.0 | 57.2 ± 11.7 | −0.44 | <0.01 | 20 |
BMI | 26.8 ± 5.7 | 27.0 ± 5.6 | −0.53 | <0.01 | 28 |
Years on OCs | 3.2 ± 4.7 | 3.2 ± 4.6 | 0.17 | <0.01 | 3 |
Breast cancer | | | | | |
No | 94% | 97% | | | |
Yes | 5% | 3% | | | |
Missing | <1% | 0% | | | |
Mean MD (SD) | | | | | |
Menopausal status | | | | <0.01 | 12 |
Pre-menopausal | 31% | 31% | 34.5 (16.3) | | |
Post-menopausal | 69% | 69% | 22.7 (14.2) | | |
Missing | <1% | | | | |
HT use at mammogram | | | | 0.02 | 1 |
Never | 59% | 56% | 26.6 (16.6) | | |
Former use | 16% | 16% | 22.3 (14.5) | | |
Current use | 25% | 28% | 28.1 (14.5) | | |
Number of live births (age at first live birth) | | | | <0.01 | 8 |
Nulliparous | 13% | 12% | 34.6 (16.8) | | |
1–2 (≤20) | 7% | 6% | 25.6 (14.6) | | |
1–2 (>20) | 24% | 24% | 30.1 (17.5) | | |
3+ (≤20) | 23% | 24% | 21.1 (13.3) | | |
3+ (>20) | 31% | 33% | 24.6 (14.3) | | |
Missing | 2% | | | | |
OC use at mammogram | | | | <0.01 | 6 |
Never | 41% | 41% | 22.0 (13.9) | | |
Former use | 55% | 57% | 29.0 (16.3) | | |
Current use | 3% | 2% | 38.6 (13.8) | | |
Missing | <1% | | | | |
Lifetime alcohol consumption | | | | <0.01 | 4 |
Never | 12% | 12% | 19.2 (12.0) | | |
Monthly | 62% | 63% | 26.6 (16.0) | | |
Weekly | 20% | 21% | 28.4 (15.4) | | |
Daily | 4% | 4% | 33.9 (18.6) | | |
Missing | 1% | | | | |
Educational level completed | | | | <0.01 | 3 |
No diploma | 16% | 14% | 20.7 (13.0) | | |
High school | 41% | 41% | 26.3 (15.9) | | |
Some college | 28% | 30% | 27.7 (15.9) | | |
College graduate | 14% | 15% | 29.6 (16.6) | | |
Missing | <1% | | | | |
*A total of 756 women in the 89 linkage families contributed to the calculation of IBDs.
†A total of 583 women had both a craniocaudal (CC) and mediolateral oblique (MLO) mammogram view and complete covariate information, which were required for the primary phenotype (mean MD) for the linkage analyses.
‡Unadjusted mean MD by categorical variables and Pearson correlations with MD.
§P value from the test of correlation or mean.
∥Variance attributed to factor estimated from R² from linear regression models.
Genome scan. The strongest evidence for linkage from the genome scan, although not reaching statistical significance, was on chromosome 5p with a LOD of 2.9 at 50.4 cM flanked by markers D5S419 (46.2 cM) and D5S426 (56.8 cM). Similar to our segregation analyses on these families (13), the evidence for linkage was only apparent after adjustment for relevant covariates (Fig. 2). Chromosome 12 contained two suggestive regions of linkage (LOD = 2.6: 31.2–45.9 cM on 12p and LOD = 2.5: 74.0–91.2 cM on 12q). Results were similar when excluding the 5% of women who developed breast cancer after their mammogram (LODs = 2.8, 2.6, and 2.5 for peaks on chromosomes 5p, 12p, and 12q).
Figure 2. Results from genome-wide scan for mean MD. LOD scores from age-adjusted and full (multivariable-adjusted) models by chromosome in 89 families.
We observed a high level of genetic correlation between MD and BMI (0.71). However, further analyses suggested that the linkage signals observed for MD did not represent loci that were linked to both MD and BMI. None of the candidate regions on chromosome 5 or 12 showed evidence for linkage with BMI, with or without adjustment for MD (Fig. 3). Without adjustment for BMI, the evidence of linkage to MD in the chromosome 5p region weakened (LOD = 1.5), suggesting that the linkage signal is masked by BMI when BMI is left unadjusted, and that conditioning on BMI is important for isolating the genetic signals for MD. Finally, bivariate analyses of BMI and MD together yielded evidence for linkage on chromosome 5p similar to that for mean MD adjusted for BMI (LODs of 3.3 and 2.9, respectively); the same was true for chromosome 12, where the results from the bivariate linkage analyses (LOD scores of 2.4 and 2.6) were virtually identical to those from the linkage of MD alone in these regions (LODs of 2.6 and 2.5; Fig. 3). Mean MD, and not BMI, therefore seems to be responsible for the linkage results on chromosomes 5 and 12.
Figure 3. Chromosomes 5 and 12 linkage analysis results for BMI, mean MD, and the combined BMI and mean MD phenotype, indicating a strong linkage signal for mean MD full (multivariable-adjusted) model on chromosome 5. Age-adjusted and full models are shown.
Fine mapping. Fine mapping of the 1-LOD region (28.6 cM) surrounding the maximum peak on chromosome 5p with 21 additional DNA markers strengthened the results (LOD = 4.2 at 48.4 cM) and narrowed the continuous 1-LOD interval to 13.4 cM (Fig. 4). This increase in the LOD score was not altered when breast cancer cases were excluded (LOD = 4.2 at 48.4 cM).
Figure 4. Chromosome 5 linkage results from original genome-wide scan (dashed line) for mean MD and after the inclusion of 21 additional markers for fine mapping (solid line). Results are from full models.
Based on the magnitude of the major gene variance component at the maximum LOD, the locus on chromosome 5p could explain up to 22% of the total variability in mean MD and up to 41.6% of the variance not attributable to the nongenetic correlates of the trait. Additionally, the bootstrap analyses to estimate confidence in the putative locus showed that 72% of the LOD scores estimated from the 10,000 bootstrap data sets were >3.3, indicating a consistent linkage signal in these 89 families.
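As a rough consistency check on the two percentages above (an inference from the reported figures, not a calculation given in the text), let σ²_q denote the locus-specific variance, σ²_T the total variance in mean MD, and σ²_C the part attributed to the nongenetic covariates. These symbols are introduced here only for illustration.

```latex
\frac{\sigma^2_q}{\sigma^2_T} \approx 0.22,
\qquad
\frac{\sigma^2_q}{\sigma^2_T - \sigma^2_C} \approx 0.416
\;\Longrightarrow\;
\frac{\sigma^2_C}{\sigma^2_T} \approx 1 - \frac{0.22}{0.416} \approx 0.47
```

Taken at face value, the two estimates would imply that the covariates account for roughly 47% of the variance in mean MD in these families.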
Discussion
In this first large genome-wide linkage study of MD, we identify three regions suggestive for genes influencing MD: one on chromosome 5p and two on chromosome 12. Finer mapping of the peak LOD region on 5p resulted in statistically significant evidence for linkage (LOD = 4.2). The locus on chromosome 5p may explain up to 22% of the variance in mean MD and up to 42% of the random effects variance, although we recognize that these estimates may be upwardly biased because effect sizes estimated from an initial linkage scan tend to be inflated. This represents compelling evidence that genes in this region are responsible for a large portion of variation in MD and, potentially, breast cancer.
Unraveling the genetic components of complex diseases is enhanced through focus on the genetics of heritable risk factors for the disease. For example, in the cardiovascular disease arena, studies focus on the risk factors of cholesterol, blood pressure, lipoproteins, and clotting factors (38, 39); the same has been seen in type 2 diabetes, with linkage analyses of BMI and insulin response (40). By analogy, we focused our efforts on a quantitative trait that is strongly associated with breast cancer risk in more than 40 studies to date (1) and has a proven genetic component (12–14). The locus for MD on chromosome 5p has not been previously identified in linkage analyses for breast cancer [e.g., chromosomes 8p (41), 13q (42), 2q (43), and 4p (44)], underscoring the merit of this approach to identify novel genes that may influence both MD and breast cancer risk.
Although this is the first family linkage study of MD, several genetic association studies have been conducted to identify genes influencing MD (16–23). Given the hormonal basis of breast cancer, most studies have explored genes involved in hormone metabolism (17–20, 23) or the insulin-like growth factor (IGF) pathway (16, 21, 22). No clear associations have been established, and these candidate genes have not explained a large portion of variation in MD. None are located within the identified chromosome 5p region (Supplementary Table), but interestingly, IGF-I does lie just outside the 2-LOD region surrounding the maximum LOD on chromosome 12q.
MD is associated with stromal fibrosis and epithelial proliferation with or without atypia (7). Positive correlations of percent MD have also been found with cellularity of breast tissue (45, 46); stained areas of collagen (45–47); the stromal proteins, lumican and decorin (47); and tissue inhibitor of matrix metalloproteinase-3 (TIMP-3; ref. 45). These findings implicate genes encoding proteins involved in proliferative activity, maintenance, and regulation of the breast epithelium, stroma, extracellular matrix, and fat as candidate modifiers of MD (46). Several of the 45 genes in the 1-LOD interval surrounding the chromosome 5p peak LOD are involved in these processes and could influence MD (Supplementary Table). The prolactin receptor, which lies between the 1- and 2-LOD regions surrounding the maximum LOD on 5p, is also a strong candidate because MD is positively associated with prolactin in postmenopausal women (48). However, our linkage signal might be due not to genes but to other important regulators in the noncoding DNA, such as microRNAs. In addition, the importance of intronic regions is only beginning to be understood (49).
Our study had several strengths, including a large sample of families with high participation rates and detailed risk factor information. Adjustment for other risk factors, in particular BMI, proved to be critical to efforts to identify the linkage signal. BMI and MD likely operate on breast cancer through different biological pathways, but their strong negative correlation leads to underestimation of effects of either pathway if not adjusted for each other (50). Using the semiautomated measure of percent MD, which has been shown to provide the strongest association with breast cancer risk across studies (1), is also a strength. Finally, the genes in the identified candidate region on chromosome 5 have not been previously examined with breast cancer risk. As such, a link with MD would provide new insight into mechanisms through which some breast cancers might occur.
This study was limited to Caucasian women in the Midwest, and it is not clear if our findings can be generalized to other populations. Also, although our data are provocative, rigorous examination of genes in the chromosome 5p region in relation to levels of MD will be required to identify the specific gene or genes contributing to the linkage signal. In addition, follow-up of the chromosome 12–suggestive regions could provide new insight to genes for MD.
In conclusion, there may be at least one gene influencing MD on chromosome 5p. The identification of genes for MD could translate to both the identification of novel genes for breast cancer and biological targets for the reduction of density.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).
Acknowledgments
Grant support: NIH (P01 CA82267).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.