Mammographic features are associated with breast cancer risk, but estimates of the strength of the association vary markedly between studies, and it is uncertain whether the association is modified by other risk factors. We conducted a systematic review and meta-analysis of publications on mammographic patterns in relation to breast cancer risk. Random effects models were used to combine study-specific relative risks. Aggregate data for >14,000 cases and 226,000 noncases from 42 studies were included. Associations were consistent in studies conducted in the general population but were highly heterogeneous in symptomatic populations. They were much stronger for percentage density than for Wolfe grade or Breast Imaging Reporting and Data System classification and were 20% to 30% stronger in studies of incident than of prevalent cancer. No differences were observed by age/menopausal status at mammography or by ethnicity. For percentage density measured using prediagnostic mammograms, combined relative risks of incident breast cancer in the general population were 1.79 (95% confidence interval, 1.48-2.16), 2.11 (1.70-2.63), 2.92 (2.49-3.42), and 4.64 (3.64-5.91) for categories 5% to 24%, 25% to 49%, 50% to 74%, and ≥75% relative to <5%. This association remained strong after excluding cancers diagnosed in the first year after mammography. This review explains some of the heterogeneity in associations of breast density with breast cancer risk and shows that, in well-conducted studies, this is one of the strongest risk factors for breast cancer. It also refutes the suggestion that the association is an artifact of masking bias or that it is only present in a restricted age range. (Cancer Epidemiol Biomarkers Prev 2006;15(6):1159–69)
Breast density, a measure of the extent of radiodense fibroglandular tissue in the breast, has the potential to be used as a predictor of breast cancer risk, to monitor risk-lowering interventions and as an intermediate end point in studies of breast cancer etiology. More than 40 studies have assessed associations with Wolfe grade or percentage breast density and the majority reported 2- to 6-fold increased risks for the highest compared with the lowest risk categories; however, there has been no recent comprehensive assessment of the reasons for the wide range in estimates. Furthermore, findings from subgroup analyses (e.g., by age) have not been systematically reviewed.
Factors that may explain between-study variations in relative risk (RR) estimates include features of the study design (e.g., masking bias affects cross-sectional and cohort studies differently), population characteristics (e.g., use of general population or symptomatic controls), and methods of analysis (e.g., adjustment for confounders). There may also be important effect modifiers of this association. Differential associations by age would have wide implications for the use of this variable as an intermediate marker of risk. Some studies have found stronger associations at older (1) or younger (2) ages, whereas others have reported no age interactions (3). It is also not known whether this association differs by time since mammography, ethnicity, or any other breast cancer risk factors.
We conducted a systematic review and meta-analysis of published studies on the association between both quantitative and qualitative measures of mammographic features and breast cancer risk, with the aim of exploring reasons for differences in RR estimates and, where appropriate, estimating pooled effects. The sources of heterogeneity that we explored included (a) use of concurrent or prospective mammograms, (b) use of general population or symptomatic controls/noncases, (c) age, (d) ethnicity, and (e) time since mammography.
Materials and Methods
We conducted a systematic review of all published studies investigating the association between qualitative or quantitative measures of the radiological mammographic pattern and risk of prevalent or incident breast cancer in adult women. Qualitative measures include the Wolfe grade (N1, normal fatty breast; P1 and P2, prominent ducts occupying <25% and 25-75% of the breast, respectively; and Dy, dysplastic breast with sheets of dense parenchyma) and Breast Imaging Reporting and Data System (BIRADS) of the American College of Radiology density classification (fatty, scattered fibroglandular density, heterogeneously dense, and extremely dense). The five-grade Tabar classification was also considered (I, scalloped contours and Cooper's ligaments; II, evenly scattered terminal ductal lobular units; III, oval-shaped lucent areas; IV, extensive nodular and linear densities; and V, homogeneous structureless fibrosis with convex contours). Quantitative measures include percentage breast density (percentage of the mammogram with radiodense fibroglandular tissue), area of dense tissue, fractal dimension, and skewness.
We identified studies that satisfied the above inclusion criteria by running an electronic search on November 18, 2005 of the Medline, EMBASE, and PubMed databases using the following criteria: English language journals, “breast” or “mamm*” in any field and “cancer*,” “malig*,” “carcin*,” or “neopl*” in the title and at least one of the following terms in the title: “breast densit*,” “parenchym*,” “mammo* pattern*,” “radiological pattern*,” “Wolfe*,” “Tabar*,” “BIRAD*,” “mammo* feature*,” “breast pattern*,” “mammo* densit*,” or “tissue densit*.” We also manually searched the references of all relevant articles to identify further studies.
For each study, we extracted information on study design (prospective, case-control, nested case-control, or cross-sectional), study population, matching variables (if appropriate), time since mammography, use of concurrent or prospective films relative to time of diagnosis, blinding of density assessors to cancer status, age at mammography, and density classification. We recorded RRs for breast cancer associated with each breast feature and their 95% confidence intervals (95% CI)/SEs, noting the adjustments made. Where effect estimates or SEs were not reported, we calculated age-adjusted odds ratios and their 95% CIs.
Studies of breast cancer incidence were considered to be those where exposure assessment was made with a negative mammogram; thus, cancers were diagnosed subsequently. In prevalence studies, density was assessed in the contralateral breast of mammograms taken at diagnosis.
RRs of breast cancer were estimated by risk, odds, rate, or hazard ratios depending on the method of analysis. Forest plots were used to visualize study-specific RRs. Heterogeneity across studies was assessed using a χ2 statistic (Q) with inverse variance weights. The extent of inconsistency between study estimates was assessed as a percentage (I2). Metaregression models were used to explore sources of heterogeneity. Publication bias was assessed visually from funnel plots and using the Egger test (4). We used Stata version 8.1 for analyses.
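The heterogeneity statistics described above can be sketched as follows. This is a minimal illustration rather than the Stata routine used for the review: the function name and the three study estimates are hypothetical, and I2 is computed with the commonly used formula max(0, (Q - df)/Q), which is an assumption about the exact definition applied.

```python
import math

def cochran_q_i2(log_rrs, ses):
    """Return (fixed-effect pooled log RR, Q, degrees of freedom, I2 in percent)."""
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    # Q: weighted sum of squared deviations from the fixed-effect estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(log_rrs) - 1
    # I2: share of total variation attributed to between-study inconsistency
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, df, i2

# Hypothetical study-specific RRs and standard errors of log(RR);
# these are NOT data from the review.
example_rrs = [1.5, 2.0, 4.0]
example_ses = [0.2, 0.2, 0.2]
pooled, q, df, i2 = cochran_q_i2([math.log(r) for r in example_rrs], example_ses)
print(f"fixed-effect RR = {math.exp(pooled):.2f}, Q = {q:.2f} on {df} df, I2 = {i2:.0f}%")
```

Q would be referred to a χ2 distribution with (number of studies - 1) degrees of freedom to obtain the heterogeneity P value.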
Pooled RRs were estimated using random-effects models and were only reported for similar breast density classifications and for independent subjects. Tests of heterogeneity were based on fixed-effects models to maximize power. For percentage density, we combined RR estimates using the five most commonly used categories of ∼<5% (reference), 5% to 24%, 25% to 49%, 50% to 74%, and ≥75%. Where categorizations from individual studies differed, we placed each RR into the category where its midpoint fell.
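A random-effects pooled estimate of this kind could be computed as in the sketch below. The text does not say which random-effects estimator was used, so the DerSimonian-Laird method here is an assumption, as are the function names and the study values. Standard errors are recovered from reported 95% CIs on the log scale.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(RR) recovered from a 95% CI given on the RR scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

def dersimonian_laird(log_rrs, ses):
    """Return (pooled RR, (CI lower, CI upper), tau^2) under a random-effects model."""
    w = [1.0 / se ** 2 for se in ses]  # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_rrs)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rrs))
    df = len(log_rrs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Method-of-moments between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_rrs)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    ci = (math.exp(pooled - 1.96 * se_pooled), math.exp(pooled + 1.96 * se_pooled))
    return math.exp(pooled), ci, tau2

# Hypothetical (RR, CI lower, CI upper) for one density category in three studies;
# these are NOT values extracted from the included articles.
studies = [(1.6, 1.1, 2.3), (2.4, 1.5, 3.8), (3.1, 2.2, 4.4)]
log_rrs = [math.log(rr) for rr, _, _ in studies]
ses = [se_from_ci(lo, hi) for _, lo, hi in studies]
rr, (lo, hi), tau2 = dersimonian_laird(log_rrs, ses)
print(f"pooled RR = {rr:.2f} ({lo:.2f}-{hi:.2f}), tau^2 = {tau2:.3f}")
```

When the between-study variance is estimated as zero, the random-effects weights reduce to the fixed-effect inverse-variance weights and the two pooled estimates coincide.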
Included and Excluded Studies
The electronic search yielded 210 articles, 150 of which were not included: 135 were not relevant, and 15 were commentaries/reviews. Of the 60 studies considered, 3 were excluded: one because of unblinded exposure assessment (5), one because it used a classification system that could not be related to any others (6), and one that was a single-twin study (7). A further 19 studies were excluded from the main analysis, as they were not independent of other included studies: 7 studies (refs. 8-14) and 12 studies from Breast Cancer Detection and Demonstration Project centers (15-26) that were likely to have been subsets of two studies that included cases from all or most centers (2, 27), although they were considered in subgroup analyses. Four articles were found manually (28-31), giving a total of 42 included articles for the main analysis.
The 42 articles represent aggregate data for a total of 14,134 cases and 226,871 noncases, arising from 17 incidence studies (6,967 cases) and 17 prevalence studies (4,983 cases) in the general population and 9 studies (2,184 cases) in symptomatic populations (some studies contribute to more than one category). Incidence studies used nested case-control or cohort designs (Table 1; refs. 2, 3, 10, 27, 29, 30, 32-44), whereas prevalence studies were case-control or cross-sectional (Table 2; refs. 1, 28, 45-59). Nine studies were conducted using symptomatic populations or controls (Table 3; refs. 31, 49, 60-65).
Qualitative Classifications of Breast Density
Effect estimates from studies that analyzed Wolfe grade in four categories are shown in Fig. 1, including the study by Thomas et al. (30) that combined grades N1 and P1 as the reference category. Pooled RR estimates for both incidence and prevalence general population studies show an increasing trend with Wolfe grade with statistically significant point estimates for P1, P2, and Dy relative to N1 of 1.76, 3.05, and 3.98 in incidence studies and lower estimates of 1.25, 1.97, and 2.42 in prevalence studies. There was no evidence of between-study heterogeneity (P > 0.05), with the exception of the RRs for P2 relative to N1 in prevalence studies, where the studies of Weich et al. (46) and Nagao et al. (57) were clear outliers (Fig. 1). Combined RRs excluding these two articles were higher but remained lower than that for incidence studies. Hainline et al. (45) could not be pooled, as SEs could not be estimated, but RR point estimates were consistent with overall findings (1.5, 2.7, and 4.2, respectively). Among studies conducted in symptomatic populations, RRs were very variable (heterogeneity, P < 0.01 for P2 and Dy relative to N1), and only one found a monotonic increasing trend (60).
Pooled RRs from studies that dichotomized the Wolfe grade (P2+Dy versus N1+P1) were highly consistent with the above findings (Fig. 2). Pooled RRs indicate that incidence and prevalence were, respectively, 1.86 and 1.44 times higher in the high-risk category. In the four studies using symptomatic controls, all found very large increased risks, but 95% CIs were wide, and there was large heterogeneity between studies.
Four studies used qualitative measures of density other than Wolfe grade. Brisson et al. reported ∼2-fold increased breast cancer rate in glandular/dense compared with fatty breasts (no 95% CI was provided; ref. 2). Three studies used BIRADS classification (3, 41, 55), and, in the two that compared risks relative to the fatty parenchyma, results are consistent, giving combined RRs of 2.04 for scattered, 2.81 for heterogeneous, and 4.08 for extremely dense tissue. However, the combined estimate may be positively biased because the findings from one of these studies is restricted to postmenopausal women, as estimates were not provided for premenopausal women where “no significant associations” were found (55). Using the second category (scattered density) as the reference group, Ziv et al. also reported an increasing trend with BIRADS classification (Fig. 2; ref. 41). The single study that used Tabar classification found a 2.42-fold increased risk for pattern IV compared with pattern I but no increased risk for pattern V (Fig. 2; ref. 54).
Quantitative Measures of Breast Density
Studies that examined percentage density are shown in Fig. 3, including one study that assessed percentage dysplasia (28). A linear increasing trend was observed in both incidence and prevalence general population studies. Pooled RRs in incidence studies were 25% to 30% higher than in prevalence studies for categories 5% to 24% and ≥75% and were similar for categories 25% to 49% and 50% to 74%. RRs were as high as 2.92 (2.49, 3.42) and 4.64 (3.64, 5.91) for percentage density 50% to 74% and ≥75% (compared with <5%), respectively. The study by Byrne et al. contributed to at least 40% of these combined estimates (27). Despite differences in exact boundaries for breast density categories, there was no evidence of significant statistical heterogeneity (see Ps in Fig. 3).
Quantitative measures studied less frequently include the area of dense tissue and fully automated measures, such as skewness of the gray-level histogram, lacunarity, and fractal dimension. Two studies found that fractal dimension was inversely associated with risk, but the strength of the association was weaker than that for percentage density (36, 43). One study reported a negative association with regional skewness (36), but this was not replicated by another (43). Nagao et al. analyzed a fully automated measure according to whether the peak of the gray-level histogram occurred in the fat region (reference category), pectoral muscle region (RR, 2.82; 95% CI, 1.33-5.98), or had no definite peak (1.55, 0.93-2.58; ref. 57).
Sources of Heterogeneity
There was no evidence of publication bias in studies of percentage density and breast cancer incidence (Ps = 0.22, 0.94, 0.65, and 0.12, Egger test). For prevalence studies (1, 56, 58, 59, 66), publication bias was present in the third category (P = 0.002), which was accounted for by the more extreme results with wide 95% CIs for premenopausal and postmenopausal women in the study by Nagata et al. (59). Exclusion of this study did not greatly alter the pooled estimate [from 2.93 (2.27-3.79) to 2.83 (2.16-3.70)].
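The Egger test referred to here regresses each study's standardized effect (log RR divided by its SE) on its precision (1/SE); an intercept far from zero suggests funnel plot asymmetry. The sketch below uses ordinary least squares and returns the t statistic for the intercept, which would be compared with a t distribution on n - 2 degrees of freedom. The function name and the example values are hypothetical, and the unweighted form of the regression is an assumption about how the test was run.

```python
import math

def egger_intercept(log_rrs, ses):
    """Egger regression: OLS of (log RR / SE) on (1 / SE).

    Returns (intercept, slope, t statistic for the intercept, degrees of freedom).
    """
    x = [1.0 / se for se in ses]                   # precision
    y = [lr / se for lr, se in zip(log_rrs, ses)]  # standardized effect
    n = len(x)
    if n < 3:
        raise ValueError("need at least three studies")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must differ between studies")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    # The t statistic is undefined when the fit is perfect (zero residual variance)
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t_stat, n - 2

# Hypothetical log RRs and SEs for one density category; NOT data from the review.
example_log_rrs = [0.95, 1.10, 1.30, 0.80, 1.60]
example_ses = [0.15, 0.25, 0.40, 0.20, 0.55]
intercept, slope, t_stat, df = egger_intercept(example_log_rrs, example_ses)
print(f"Egger intercept = {intercept:.2f}, t = {t_stat:.2f} on {df} df")
```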
To explore further potential sources of heterogeneity, we primarily focused on incidence studies that analyzed percentage density (10, 27, 29, 30, 37, 40, 42-44), as its risk gradient was largest, and some factors, such as time since mammography, could not be examined in prevalence studies. The effect of masking bias was evident, as a lower RR was observed in the highest density category among studies that excluded cancers that arose during the first year (10, 27, 30, 40) after a negative screen compared with those that included them (Fig. 4; refs. 29, 37, 42-44). This difference was largely influenced by the study of Ciatto et al. (44), in which all cases were diagnosed within 2 years of screening (as this study was designed to investigate associations with interval cancers rather than with overall breast cancer risk). This study was removed from further analyses.
We assessed associations with short- (10, 27, 29, 30, 37, 42) and long-term (27, 30, 40, 43) risk by separating studies according to whether the majority of mammograms were taken within or >5 years previously. Two studies contributed to both groups, as they presented results by time since mammography (27, 30). A stronger trend was observed for short-term risk than for long-term risk, especially for the highest density category where the test for heterogeneity was borderline significant (Fig. 4; P = 0.06). This difference remained when we restricted analyses to studies that excluded cancers in the first year of follow-up. These results largely reflect the weakening, with time since mammography, of the association in the study by Thomas et al. (30), but the larger study by Byrne et al. (27) did not observe this trend.
Pooled estimates from four incident studies of percentage density where it was possible to analyze younger (premenopausal or <50 years at the time of mammography) and older women separately are shown in Fig. 4 (10, 27, 30, 37). There are some suggestions of stronger RRs at older ages for category 25% to 49% relative to <5% (P = 0.05), but differences between age groups for other categories were not significant and not consistent. There is no suggestion that this association is restricted to either younger or older women. On examination of the large prevalent studies that presented data by menopausal status or age, Ursin et al. (1) also found a significantly stronger association at an age of >50 years (P = 0.05). However, two other large studies had contrary findings. Brisson et al. (2) found a stronger association in younger women, whereas Vacek et al. (3) found no significant difference in the relationship at premenopausal and postmenopausal ages for the association with BIRADS classification.
Although the majority of studies were conducted in Caucasian populations, four were carried out in Asia, and their findings were consistent with the overall pooled effects (54, 57, 59, 61). Four U.S. studies presented ethnic group-specific associations (1, 29, 31, 42). Two of them commented on associations in African American women, but neither found statistically significant differences: Ursin et al. (1) found a slightly weaker association in African American women (RR, 1.66; 95% CI, 0.64-4.32) compared with that in White women (RR, 2.56; 95% CI, 1.23-5.31 for ≥60% breast density compared with <10%), whereas Wolfe et al. (31), in a study of symptomatic women, found a slightly stronger association in the former group. In two Hawaiian studies, weaker associations were observed in Asian (Japanese) women than in Caucasian women (29, 42). Duffy et al. also reported modifications by ethnic group in a case-control study in Singapore (P = 0.05), as they found that non-Chinese women with dense patterns were at an increased risk but that Chinese women were not (13). On the contrary, Ursin et al. found a larger association in Asian Americans than in White or African Americans (odds ratios, 1.30, 1.15, and 1.10, respectively, for a 10% increase in percentage density; ref. 1).
Within studies, authors have reported age and body size [weight or body mass index (BMI)] to be negative confounders. Among incidence studies of percentage density, age was controlled for in all nine studies. In studies that controlled for body size (10, 27, 37, 42, 43), pooled RRs were 1.78, 2.46, 3.02, and 4.59, respectively, for categories 5% to 24%, 25% to 49%, 50% to 74%, and ≥75% relative to <5%, which were generally larger than the corresponding pooled RRs for studies that did not make this adjustment (1.81, 1.56, 2.67, and 4.40; refs. 29, 40, 43). Two studies (using Tabar classification and percentage density) reported stronger associations in women with higher BMIs, but the test for heterogeneity was not significant in either study (13, 42).
Other potential effect modifiers have been examined only by a small minority of studies, so it was not possible to produce subgroup-specific RRs; instead, we simply quote the findings reported in individual studies. Parity or number of deliveries has been reported as an effect modifier in two studies, with one finding a stronger association of percentage density in nulliparous women, whereas the other found that women with Tabar grades IV and V were at an increased risk only if they had one or two deliveries but not more (13, 40). Herrinton et al. found no evidence of effect modification by alcohol intake (26), and two studies found no differences according to ever use of hormone replacement therapy (13, 55). Boyd et al. found that associations of percentage density with breast cancer risk were stronger in women with a first-degree relative with breast cancer (9).
The combined data presented here confirm that breast density, measured using either Wolfe grade or percentage density, is strongly associated with breast cancer risk, as determined by general population studies of either incident or prevalent cancer risk. Pooled RRs were higher for percentage density than for Wolfe grade. Individual studies that were able to assess the two measures simultaneously have suggested that Wolfe grade is not predictive after adjusting for percentage density (56). However, the breast density-breast cancer association only held in general population studies. The inconsistent findings from studies conducted within symptomatic populations may reflect differing extents of underlying breast disease. Breast density is also a risk factor for benign breast disease, so RRs were expected to be lower.
In general population studies, there was evidence that masking bias affected study results, as estimates were, as expected, consistently lower in prevalence than in incidence studies and lower in incidence studies after excluding cancers diagnosed in the first year after mammography. In prevalence studies, in which breast density measurement and diagnosis were conducted using mammograms taken concurrently, RRs are expected to be underestimated, as there is lower mammographic sensitivity and, hence, more false negatives in dense than in fatty breasts. Conversely, during the follow-up of a screen-negative cohort in a study of breast cancer incidence, excess cancers in dense breasts may actually be prevalent cancers that were initially undetected, leading to an overestimation of RRs.
Strong linear trends were observed for percentage density, despite RRs referring to different (but overlapping) ranges. RR estimates were consistent within each density category, suggesting that this method, although not ideal, was reasonable in this situation. The combined pooled RRs for categories 5% to 24%, 25% to 49%, 50% to 74%, and ≥75% relative to <5% were 1.79 (1.48-2.16), 2.11 (1.70-2.63), 2.92 (2.49-3.42), and 4.64 (3.64-5.91), respectively, for incidence studies. The true association may be even stronger, as nondifferential measurement error of breast density would lead to underestimation of associations. Assuming moderate error (intraclass correlation of 0.90; ref. 27), RR estimates for categories 50% to 74% and ≥75%, when corrected, would increase to 3.29 and 5.50, respectively. New automated methods for density measurement based on estimation of the volume of dense tissue are currently being developed. Demonstration that their RRs are as strong as those reported here will be needed if they are to serve as useful alternatives to current methods of breast density measurement.
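The corrected RRs quoted above are consistent with the standard regression dilution adjustment, in which the log RR is divided by the reliability coefficient. The text does not state the formula used, so the sketch below is an assumption; the function name is hypothetical, and the intraclass correlation of 0.90 and the two pooled RRs are taken from the paragraph above.

```python
import math

def correct_for_measurement_error(rr, icc):
    """De-attenuate an RR, assuming log(RR) is diluted by the factor icc."""
    if not 0.0 < icc <= 1.0:
        raise ValueError("icc must be in (0, 1]")
    return math.exp(math.log(rr) / icc)

# Pooled incidence RRs for 50% to 74% and >=75% density, with ICC = 0.90
for pooled_rr in (2.92, 4.64):
    corrected = correct_for_measurement_error(pooled_rr, 0.90)
    print(f"observed RR {pooled_rr:.2f} -> corrected RR {corrected:.2f}")
```

Under this assumption the calculation returns 3.29 and 5.50, matching the corrected values reported in the text.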
Other classifications and measurements of breast density have been assessed less frequently. Studies of the BIRADS classification suggest that this four-category grade is also a strong predictor of breast cancer risk, although the magnitude may be slightly lower than that for percentage density. However, as this density grade is routinely recorded at the time of mammography in the United States (67), the BIRADS classification could be used in epidemiologic studies within this setting. Other measures, such as skewness and fractal dimension, have only been examined in two studies whose findings were not entirely consistent, and their effect estimates were not as strong as those for percentage density (36, 43). Debate still exists about whether percentage density or area of dense tissue is the strongest predictor of risk (43).
The increased risk associated with breast density is independent of other known risk factors for this disease but is confounded by age and BMI. Failure to take BMI into account in postmenopausal women explained some heterogeneity, as adjustment for this confounder leads to a strengthening of the association. We found little evidence of interactions between other risk factors for breast cancer and breast density. Importantly, the combined data suggest that breast density measured at both premenopausal and postmenopausal ages is a marker of subsequent breast cancer risk and that there is no clear evidence that the strength of this association differs between these ages. However, most premenopausal women in these studies were in their late 30s and 40s, so it is still not known whether mammographic density in early adulthood is predictive of risk in later life. From the available data, there is no evidence to suggest that this association does not hold in all studied ethnic groups (Caucasian, Asian, and African American). Future studies based on or including women of other ethnicities would be informative.
Two of the included studies examined changes in the strength of the association with time since mammography, but only one suggested that the association weakened over time (27, 30). Further investigation of the duration for which an increased RR is applicable is needed. Breast density measured >5 years previously will not capture more recent changes in density, which may influence risk. This problem may be more acute at menopausal ages when density reduces rapidly.
The strength of the association of breast density with breast cancer risk is greater than that for most other established breast cancer risk factors, with the exception of age and some genetic factors. Breast density also accounts for a large number of cases. Assuming that ∼20% and 10% of premenopausal women have densities of 50% to 74% and ≥75%, respectively, the population attributable fraction is 26.7% for densities over 75% and 42.8% for densities over 50%. At postmenopausal ages, the corresponding fractions would be 9.8% and 23.2% (based on 10% and 3% with densities of 50-74% and ≥75%). However, there was no threshold level below which density was not associated with risk. Shifting the entire breast density distribution downwards by a few percentage points (if possible) might reduce overall breast cancer rates. Breast density may be amenable to change. A comprehensive review of correlates of breast density and the importance of this marker has recently been conducted (68). Although it has a strong genetic component (69), it is also influenced by lifestyle factors, such as diet (70, 71), and it can be reduced by drugs, such as tamoxifen (72). Controversy remains, however, over whether women should be informed of their density, as it is not known whether reductions in breast density would necessarily lead to lower breast cancer risk. In terms of screening policy, women with dense breasts may benefit from shorter screening intervals. For research purposes, breast density may be used as an intermediate or additional end point in studies of breast cancer. Candidate susceptibility genes for breast cancer may be initially identified using breast density as an intermediate phenotype.
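The attributable fractions quoted above can be approximated with Levin's formula applied to the high-density categories only, treating all other women as having reference risk. The text does not state the formula used, so this is an assumption, and the function name is hypothetical. The prevalences and the pooled incidence RRs (2.92 and 4.64) are those given in the text.

```python
def attributable_fraction(exposed):
    """Population attributable fraction for (prevalence, RR) exposure categories.

    Everyone outside the listed categories is treated as unexposed (RR = 1),
    which is a simplifying assumption.
    """
    excess = sum(p * (rr - 1.0) for p, rr in exposed)
    return excess / (1.0 + excess)

# (prevalence, pooled incidence RR) for 50% to 74% and >=75% density
premenopausal = [(0.20, 2.92), (0.10, 4.64)]
postmenopausal = [(0.10, 2.92), (0.03, 4.64)]

for label, cats in (("premenopausal", premenopausal), ("postmenopausal", postmenopausal)):
    over_75 = attributable_fraction(cats[1:])  # >=75% category only
    over_50 = attributable_fraction(cats)      # both categories combined
    print(f"{label}: >=75% -> {100 * over_75:.1f}%, >=50% -> {100 * over_50:.1f}%")
```

Under these assumptions the calculation reproduces the premenopausal figures (26.7% and 42.8%) and the postmenopausal figure for ≥75% density (9.8%), and comes within about 0.1 percentage point of the 23.2% quoted for postmenopausal densities over 50%.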
The mechanisms through which breast density affects breast cancer risk are not fully understood (68). Density reflects the proportion of epithelial and stromal tissue in the breast as opposed to nondense fatty tissue. Breast cancers originate in epithelial cells, so greater areas of fibroglandular tissue may reflect a greater number of cells that are at risk of carcinogenesis and/or an increased rate of epithelial proliferation. It is plausible that many of the established breast cancer risk factors influence risk through their effect on density. There is striking similarity between the breast density-age profile and the model for the rate of breast tissue ageing as proposed by Pike et al. to explain the incidence of breast cancer with age (73), and, corresponding to this model, cumulative exposure to breast density would be the pertinent risk factor for which density at a given age is a good proxy.
In summary, some of the heterogeneity in associations of mammographic density with breast cancer risk can be explained by differences in density classification, use of concurrent or prospective mammograms, and inclusion of general or symptomatic populations. RRs were consistent in studies conducted in the general population but were highly heterogeneous in symptomatic populations. They were much stronger for percentage density than for Wolfe grade or BIRADS classification and were 20% to 30% stronger in studies of incident than of prevalent cancer. No differences were observed by age/menopausal status at mammography or by ethnicity. Well-conducted incidence studies suggest that increasing breast density is associated with an increased risk of breast cancer and that the magnitude of this association is 4.64-fold (3.64-5.91) for the most dense (≥75%) compared with the least dense category (<5%). Given the large population attributable risk of mammographic density and that mammography is conducted regularly for screening purposes, the routine measurement of mammographic density should be given more consideration. This marker has great potential to be used for research into the etiology and prevention of breast cancer.
Grant support: Cancer Research UK five-year program grant and Cancer Research UK fellowship (V.A. McCormack).