## Abstract

**Background:** Mammographic density is a strong risk factor for breast cancer. It is unknown whether there are different causes of variation in mammographic density at different ages.

**Methods:** Mammograms and questionnaires were obtained on average 8 years apart from 327 Australian female twin pairs (204 monozygous and 123 dizygous). Mammographic dense area and percentage dense area were measured using a computer-assisted method. The correlational structure of the longitudinal twin data was estimated under a multivariate normal model using FISHER. Inference about causation from examination of familial confounding was made by regressing each twin's recent mammographic density measure against one or both of her and her co-twin's past measures.

**Results:** For square root dense area and percentage dense area (age- and body mass index–adjusted), the correlations over time within twins were 0.86 and 0.82, and the cross-twin correlations were 0.71 and 0.65 for monozygous pairs and 0.25 and 0.20 for dizygous pairs, respectively. As a predictor of a twin's recent dense area, the regression coefficient (SE) for the co-twin's past dense area reduced after adjusting for her own past measure from 0.84 (0.03) to 0.09 (0.03) for monozygous pairs and from 0.63 (0.04) to 0.04 (0.03) for dizygous pairs. Corresponding estimates for percentage dense area were 0.73 (0.04), 0.10 (0.03), 0.42 (0.05), and 0.03 (0.03).

**Conclusion:** Mammographic density measures are highly correlated over time and the familial/genetic components of their variation are established before mid-life.

**Impact:** Mammographic density of young women could provide a means for breast cancer control. *Cancer Epidemiol Biomarkers Prev; 21(7); 1149–55. ©2012 AACR.*

## Introduction

Mammographic density, adjusted for age and body mass index (BMI), is a strong and heritable risk factor for breast cancer. There is large variation in mammographic density across women of all ages, and approximately 60% of this variance is estimated to be due to genetic factors. A large body of research is dedicated to identifying these genetic factors because at least 10% of the genetic causes of the variation in mammographic density also cause breast cancer, and vice versa (1).

There is also evidence that measures of mammographic density within women are highly correlated over time. McCormack and colleagues reported a within-woman rank correlation of 0.8 for percentage dense area measured on mammograms taken 9 years apart, suggesting that measures of mammographic density track well over time, particularly for women ages 50 to 65 years (2). This has practical implications in a clinical setting: women could be identified as being at high or low risk from a baseline mammogram, and the screening interval could be tailored to reflect that risk.

It is not known whether there are different causes of variation in mammographic density at different ages or, in particular, whether different genes are involved at different times of life. Several studies have investigated associations between mammographic density over time and reproductive or environmental factors such as menopause, diet, and tamoxifen use (2–6); however, these factors account for only a small proportion of the variation in mammographic density. Some have speculated that different sets of genetic variants are associated with pre- and postmenopausal mammographic density, although no statistically significant differences have been observed (7). Thus, whether there are different genetic causes of variation at different ages remains unclear.

In this article, we have estimated the correlation in 2 risk-associated mammographic density measures over a period of about 8 years using a prospective study of 327 Australian female twin pairs. We have applied a new regression approach that allows inference about causation based on examination of familial confounding (ICE FALCON) to determine whether correlations across time between twins in the same pair are due to one or both of (i) familial factors that determine risk-associated mammographic density measures at both ages or (ii) familial factors that help determine risk-associated mammographic density measures at the first age, which then predict future risk-associated mammographic density measures.

## Materials and Methods

### Subjects

Between 1995 and 1999, we recruited 599 female twin pairs (353 monozygous and 246 dizygous) aged 40 to 70 years living in Victoria, New South Wales, and Western Australia as part of an international mammographic density study (8, 9). Between 2004 and 2009, these women were recontacted and invited to continue their participation in the study (10). A recent mammogram was obtained for both members of 327 pairs (204 monozygous pairs and 123 dizygous pairs), and a telephone questionnaire interview was conducted. The individual response rate was 70%, but for more than 90 pairs only one twin agreed to continue participating. There were no differences in measured characteristics or mammographic density measures between participating subjects and those without a follow-up mammogram.

Mammograms were randomized into reading sets of approximately 100, ensuring that both twins from the same pair were measured in the same set. Two right-sided, craniocaudal mammograms were measured for each subject: one from the original twin study (time 1) and a recent mammogram from the follow-up study (time 2). If a subject developed breast cancer in the right breast between times 1 and 2, the left breast was measured instead, and the mammogram used at time 2 was one taken before diagnosis. Within a subject, the mammograms were measured consecutively but in a random order unknown to the reader. The reader (J. Stone) was blinded to all identifying information. A 10% random sample of repeat mammograms was included within each set and between every fourth set; these repeats were used to assess intra- (within a reading set) and inter- (between reading sets) reliability by calculating the intraclass correlation coefficient (ICC).
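The ICC used to summarize reading reliability can be illustrated with a short Python sketch. The paper does not state which form of ICC was computed; this sketch assumes a one-way random-effects ICC, often written ICC(1), calculated from repeated readings of the same images. The function name `icc_oneway` and the example readings are hypothetical.

```python
import numpy as np

def icc_oneway(readings: np.ndarray) -> float:
    """One-way random-effects ICC(1) (an assumed form, see lead-in).

    readings: array of shape (n_images, k_readings); each row holds the
    repeated readings of one mammogram.
    """
    readings = np.asarray(readings, dtype=float)
    n, k = readings.shape
    grand_mean = readings.mean()
    row_means = readings.mean(axis=1)
    # Between-image mean square: how much the images differ from each other.
    ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    # Within-image mean square: disagreement between repeat readings.
    ms_within = np.sum((readings - row_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical duplicate readings of dense area (cm^2) for five images.
dense_area = np.array([
    [12.0, 12.5],
    [30.1, 29.4],
    [45.2, 46.0],
    [8.3, 8.1],
    [22.7, 23.5],
])
print(round(icc_oneway(dense_area), 3))
```

Because the repeat readings differ only slightly relative to the spread between images, the ICC for these illustrative values is close to 1.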

### Statistical methods

#### Analysis of correlational structure.

The cross-time correlation is the correlation between the same mammographic density measure at 2 time points for the same twin. The cross-time cross-pair correlation is the correlation between a mammographic density measure at time 1 for one twin and the same mammographic density measure at time 2 for the other twin. As described in the work of Stone and colleagues (9), we estimated the correlational structure of the repeated measures for twin pairs, including the cross-time and cross-time cross-pair correlations, using a multivariate normal model. We also conducted a similar correlational analysis of the difference in measures between time points. Model fitting, statistical inference, and testing of model assumptions were conducted under asymptotic likelihood theory using the statistical package FISHER (11, 12). All quoted *P* values are nominal and 2-sided.
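The correlations defined above can be illustrated with simple empirical estimates. This Python sketch is not the multivariate normal maximum likelihood fit performed in FISHER. Instead, it double-enters each pair (so each twin appears once as "self") and computes Pearson correlations on the pooled records, a moment-based approximation. The function name `twin_correlations` and the simulated data (a single familial factor shared by both twins, with assumed variances and time 1 to time 2 slope) are hypothetical.

```python
import numpy as np

def twin_correlations(t1: np.ndarray, t2: np.ndarray) -> dict:
    """Empirical correlations for one zygosity group.

    t1, t2: arrays of shape (n_pairs, 2) holding the adjusted measure at
    time 1 and time 2 for twin A (column 0) and twin B (column 1).
    """
    # Double entry: each twin is "self" once, so the pair order is irrelevant.
    x_self = np.concatenate([t1[:, 0], t1[:, 1]])
    x_cotwin = np.concatenate([t1[:, 1], t1[:, 0]])
    y_self = np.concatenate([t2[:, 0], t2[:, 1]])
    y_cotwin = np.concatenate([t2[:, 1], t2[:, 0]])

    def r(a, b):
        return float(np.corrcoef(a, b)[0, 1])

    return {
        "cross_time": r(x_self, y_self),               # same twin, time 1 vs time 2
        "cross_twin_time1": r(x_self, x_cotwin),       # twin vs co-twin at time 1
        "cross_twin_time2": r(y_self, y_cotwin),       # twin vs co-twin at time 2
        "cross_time_cross_pair": r(x_cotwin, y_self),  # co-twin time 1 vs self time 2
        "difference": r(y_self - x_self, y_cotwin - x_cotwin),  # change scores
    }

# Hypothetical MZ-like data: a familial factor identical within each pair.
rng = np.random.default_rng(0)
n_pairs = 2000
shared = rng.normal(size=(n_pairs, 1))
t1 = shared + 0.6 * rng.normal(size=(n_pairs, 2))
t2 = 0.9 * t1 + 0.4 * rng.normal(size=(n_pairs, 2))

corrs = twin_correlations(t1, t2)
for name, value in corrs.items():
    print(f"{name}: {value:.2f}")
```

Separate calls for the monozygous and dizygous groups would give the zygosity-specific correlations; the likelihood approach additionally yields standard errors and formal tests.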

#### Regression analysis to allow ICE FALCON.

We previously introduced a regression approach that allows for inference about causation based on examination of familial confounding (13), an approach that we have named ICE FALCON. Note that we use the expression “familial” to mean factors shared by relatives, and these could include environmental as well as genetic factors. For the purpose of the argument below, it is not necessary to decompose the causes of variation into genetic and environmental components. The method takes advantage of the existence of familial factors, without which there would be nothing to “examine,” and, in practice, their effects need to be substantial (and/or the sample size large) for clear inferences to be made. It is important, however, to allow for different covariance structures for monozygous and dizygous pairs.

In the current context, our approach allows for cross-time cross-pair correlation arising through (scenario A: familial factors) the existence of familial factors acting on both measures (mammographic density at time 1 for one twin and at time 2 for the co-twin), and/or through (scenario B: causal relationship) a direct within-twin causal effect of a measure at time 1 on the same measure at a later time 2 (13). By “causal,” we mean that if the covariate were varied experimentally, the expected value of the outcome measure would change. As with all modeling, best fit does not constitute proof, but it can be used to determine the extent to which the data are consistent with different scenarios. In brief, if a measure at time 2 for one twin (henceforth, “self”) is associated with the same measure for the co-twin at an earlier time 1, then: (i) if this association is unchanged after adjusting for the self's earlier measure, the data are consistent with the existence of familial factors acting on both measures (scenario A), whereas (ii) if this association is eliminated by adjusting for the self's earlier measure, the data are consistent with the causal scenario B.

Formally, let *Y* be the age- and BMI-adjusted mammographic density measure at a given time and *X* the same age- and BMI-adjusted measure at an earlier time. Let “self” and “co-twin” refer to the twins within a pair, the assignment being arbitrary. Within a pair, we allow for *Y*_{self} and *Y*_{co-twin} to be correlated because of familial factors relevant to *Y* that are shared by twins in the same pair. Similarly, we allow for *X*_{self} and *X*_{co-twin} to be correlated because of familial factors relevant to *X* that are shared by twins in the same pair. We also allow for *Y*_{self} and *X*_{co-twin} to be correlated because of familial factors relevant to both *Y* and *X*.

As described in the work of Dite and colleagues (13), let

$$E(Y_{\text{self}} \mid X_{\text{self}}, X_{\text{co-twin}}) = \alpha + \beta_{\text{self}} X_{\text{self}} + \beta_{\text{co-twin}} X_{\text{co-twin}},$$

so that the expected *Y* value for a twin is regressed on her own observed *X* value (represented by the coefficient *β*_{self}, the within-twin coefficient) as well as on her co-twin's observed *X* value (represented by the coefficient *β*_{co-twin}, the co-twin coefficient); see equations (5) and (6) in the work by Gurrin and colleagues (14).

We fitted 5 models (I, IIa, IIb, IIIa, and IIIb). For each model, the response variable is the most recent mammographic density measure (time 2) for one twin (self). In model I, the independent variable is the mammographic density measure from time 1 for herself, that is, the coefficient *β*_{self} is estimated with *β*_{co-twin} = 0, whereas in model II, the independent variable is the mammographic density measure from time 1 for the other twin (co-twin), that is, the coefficient *β*_{co-twin} is estimated with *β*_{self} = 0. In model IIa, the *β*_{co-twin} coefficient is presumed to be independent of zygosity, whereas in model IIb, it is estimated separately for monozygous and dizygous pairs. In models IIIa and IIIb, *β*_{self} and *β*_{co-twin} are estimated together. In model IIIa, *β*_{co-twin} is presumed to be independent of zygosity, whereas in model IIIb, *β*_{co-twin} is allowed to depend on zygosity.
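The model fits described above can be sketched in Python with ordinary least squares on double-entered pairs. This is a simplification of what the paper did: least squares gives point estimates of *β*_{self} and *β*_{co-twin}, but unlike the multivariate normal model fitted in FISHER it ignores the correlation between the two records from the same pair, so naive least-squares standard errors would be too small. The sketch also uses a single intercept for both zygosity groups. The function name `fit_ice_falcon` and the simulated data (generated under scenario B with assumed variances) are hypothetical.

```python
import numpy as np

def fit_ice_falcon(t1, t2, mz):
    """Least-squares point estimates for models I, IIa, IIb, IIIa and IIIb.

    t1, t2: arrays of shape (n_pairs, 2) with the adjusted measure at
            time 1 (X) and time 2 (Y) for the two twins of each pair.
    mz:     boolean array of shape (n_pairs,), True for monozygous pairs.
    """
    # Double entry: each twin appears once as "self" and once as "co-twin".
    x_self = np.concatenate([t1[:, 0], t1[:, 1]])
    x_co = np.concatenate([t1[:, 1], t1[:, 0]])
    y_self = np.concatenate([t2[:, 0], t2[:, 1]])
    is_mz = np.concatenate([mz, mz]).astype(float)
    # Zygosity-specific copies of the co-twin covariate (models IIb and IIIb).
    x_co_mz = x_co * is_mz
    x_co_dz = x_co * (1.0 - is_mz)
    ones = np.ones_like(y_self)

    def slopes(*covariates):
        # Regress Y_self on an intercept plus the given covariates;
        # return the slopes only.
        design = np.column_stack([ones, *covariates])
        beta, *_ = np.linalg.lstsq(design, y_self, rcond=None)
        return beta[1:]

    return {
        "I": dict(zip(["self"], slopes(x_self))),
        "IIa": dict(zip(["co"], slopes(x_co))),
        "IIb": dict(zip(["co_mz", "co_dz"], slopes(x_co_mz, x_co_dz))),
        "IIIa": dict(zip(["self", "co"], slopes(x_self, x_co))),
        "IIIb": dict(zip(["self", "co_mz", "co_dz"],
                         slopes(x_self, x_co_mz, x_co_dz))),
    }

# Hypothetical data simulated under scenario B (time 1 causes time 2).
rng = np.random.default_rng(1)
n_mz, n_dz = 5000, 5000
# Familial factor: identical within MZ pairs, correlation 0.5 within DZ pairs.
f_mz = np.repeat(rng.normal(size=(n_mz, 1)), 2, axis=1)
f_dz = np.sqrt(0.5) * (rng.normal(size=(n_dz, 1)) + rng.normal(size=(n_dz, 2)))
f = np.vstack([f_mz, f_dz])
t1 = f + 0.6 * rng.normal(size=f.shape)
t2 = 0.9 * t1 + 0.4 * rng.normal(size=f.shape)
mz = np.r_[np.ones(n_mz, dtype=bool), np.zeros(n_dz, dtype=bool)]

fits = fit_ice_falcon(t1, t2, mz)
for model, coefs in fits.items():
    print(model, {k: round(float(v), 2) for k, v in coefs.items()})
```

With data generated this way, the co-twin coefficient should be larger for MZ than for DZ pairs in model IIb and close to zero in model IIIb, while the within-twin coefficient stays near its simulated value of 0.9.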

Whether or not the *β*_{co-twin} estimates change between the fits of models II and III allows inferences to be made about the possible reasons why the mammographic density measure at time 1, *X*, is associated with the mammographic density measure at time 2, *Y*. If *β*_{co-twin} is unchanged, this is consistent with scenario A, the existence of familial factors that act on both measures. If *β*_{co-twin} attenuates in magnitude toward zero, this is consistent with scenario B, the mammographic density measure at time 1, *X*, having, at least in part, a causal effect on the future mammographic density measure, *Y*.
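Why the change in *β*_{co-twin} between models II and III is informative can be illustrated with a small simulation. This Python sketch generates hypothetical MZ pairs under two assumed data-generating processes and fits models II and III by least squares; all variances and effect sizes are arbitrary assumptions, and the function name `co_twin_coefficients` is hypothetical. In this particular setup the scenario A coefficient is only partly reduced (population values of about 0.74 in model II and 0.42 in model III, because *X*_{self} and *X*_{co-twin} are equally good proxies for the shared factor), whereas under scenario B it falls to about zero. It is this contrast, rather than exact invariance, that the sketch illustrates.

```python
import numpy as np

def co_twin_coefficients(scenario, n_pairs=5000, seed=2):
    """Co-twin coefficient from models II and III for simulated MZ pairs."""
    rng = np.random.default_rng(seed)
    f = rng.normal(size=(n_pairs, 1))             # familial factor shared by MZ co-twins
    x = f + 0.6 * rng.normal(size=(n_pairs, 2))   # time 1 measure: columns (self, co-twin)
    if scenario == "A":
        # Familial factors act on both times; X_self has no effect on Y_self.
        y_self = f[:, 0] + 0.6 * rng.normal(size=n_pairs)
    elif scenario == "B":
        # Y_self depends on familial factors only through X_self.
        y_self = 0.9 * x[:, 0] + 0.4 * rng.normal(size=n_pairs)
    else:
        raise ValueError("scenario must be 'A' or 'B'")
    ones = np.ones(n_pairs)
    design_ii = np.column_stack([ones, x[:, 1]])            # model II: co-twin only
    design_iii = np.column_stack([ones, x[:, 0], x[:, 1]])  # model III: self + co-twin
    b_ii = np.linalg.lstsq(design_ii, y_self, rcond=None)[0][1]
    b_iii = np.linalg.lstsq(design_iii, y_self, rcond=None)[0][2]
    return b_ii, b_iii

for scenario in ("A", "B"):
    b_ii, b_iii = co_twin_coefficients(scenario)
    print(f"scenario {scenario}: model II {b_ii:.2f}, model III {b_iii:.2f}")
```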

Under the assumptions of the classic twin model, a test of whether these familial factors have a genetic component can be made by fitting *β*_{co-twin} separately for monozygous and dizygous pairs and testing whether the coefficients differ (models IIb and IIIb).
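The zygosity comparison can be approximated from reported coefficients and standard errors with a Wald statistic. This Python sketch is not the likelihood-based test used in the paper: it assumes the monozygous and dizygous estimates are independent and uses the rounded dense area estimates quoted in the Abstract (model IIb before adjustment, model IIIb after). The function name `wald_difference` is hypothetical.

```python
import math

def wald_difference(b1, se1, b2, se2):
    """Two-sided Wald test that two independent coefficients are equal."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return z, p

# Dense area co-twin coefficients (SE), monozygous vs dizygous, from the Abstract.
z_ii, p_ii = wald_difference(0.84, 0.03, 0.63, 0.04)    # model IIb
z_iii, p_iii = wald_difference(0.09, 0.03, 0.04, 0.03)  # model IIIb
print(f"model IIb:  z = {z_ii:.2f}, P = {p_ii:.1e}")
print(f"model IIIb: z = {z_iii:.2f}, P = {p_iii:.2f}")
```

With these rounded inputs, z is about 4.2 for model IIb and about 1.2 for model IIIb, in line with the zygosity comparisons reported for dense area in the Results.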

## Results

The reliabilities of the measurements within reading sets (over all reading sets) were high with intraclass correlation estimates of 0.97 and 0.97 for dense area and percentage dense area, respectively. The intraclass correlation estimates between reading sets were 0.93 and 0.98 for dense area and percentage dense area, respectively.

Table 1 shows that the mean time between mammograms was about 8 years, and for 50% of the sample observations, the interval was between 7 and 9 years. Over this period, mean BMI increased by 0.86 kg/m^{2} (3.5% of the mean at time 1), and dense area and percentage dense area decreased by 4.5 cm^{2} (15% of the mean at time 1) and 7.2 percentage points (24% of the mean at time 1), respectively, independent of zygosity.

Figures 1A and B illustrate, for each subject, the unadjusted square root dense area (A) and square root percentage dense area (B) measures at the 2 ages of measurement. These plots show that the average of the dense area measures is considerably less dependent on age than the average of the percentage dense area measures. They also show that, for the vast majority of women, these mammographic density measures are highly stable with time over the period of study.

Figures 2A and B show the age- and BMI-adjusted residuals for the recent mammogram (*y*-axis) plotted against those for the earlier mammogram (*x*-axis), for square root dense area (A) and percentage dense area (B), respectively. Table 2 shows that the correlations over time for the dense area and percentage dense area residuals were 0.86 and 0.82 (both SE = 0.01), respectively, whereas the cross-twin correlations were 0.71 and 0.65 (both SE = 0.03) for monozygous pairs and 0.25 and 0.20 (SEs = 0.07 and 0.08) for dizygous pairs, respectively. The age- and BMI-adjusted dense area and percentage dense area measures were also highly correlated with each other, with correlations of 0.86 at the first time point and 0.87 at the second.

The regression coefficients in Table 3 show that, within twins, the age- and BMI-adjusted mammographic density measures at time 2 were associated with their previous measures at time 1 (model I; both *P* < 0.001). Both mammographic density measures were associated with the co-twin's measure at the previous time (model IIa; both *P* < 0.001), and these associations were stronger for monozygous pairs than for dizygous pairs (model IIb; both *P* < 0.001). Model IIIa shows that when the within-twin and co-twin regression coefficients were estimated simultaneously, the within-twin coefficient remained virtually unchanged, whereas the co-twin coefficient was at most 10% of that for model IIa. Model IIIb shows that the co-twin coefficients did not differ by zygosity (*P* > 0.1 for both dense area and percentage dense area).

After fitting the main effects of model IIIb, the within-pair correlation between the residuals of age- and BMI-adjusted square root dense area at time 2 was 0.28 (*P* < 0.001) for monozygous pairs and 0.07 (*P* = 0.5) for dizygous pairs. For square root percentage dense area, the respective correlations were 0.34 (*P* < 0.001) and 0.10 (*P* = 0.3).

The correlations in the difference in the age- and BMI-adjusted mammographic density measures between the 2 time points were, for square root dense area and percentage dense area respectively, 0.28 and 0.35 (both SE = 0.07) for monozygous pairs (both *P* < 0.001) and 0.04 and 0.07 (both SE = 0.08) for dizygous pairs (both *P* > 0.4). The correlation within monozygous pairs was greater than that within dizygous pairs (*P* = 0.02 and 0.005, respectively).

## Discussion

We found, as have others (2, 6), that age- and BMI-adjusted dense area and percentage dense area were highly correlated over time for women in mid-life, that is, they tracked strongly, with correlations in excess of 0.8 over 8 years. We also found that, for an individual twin, her future risk-predicting mammographic density measure was correlated with the past risk-predicting mammographic density measure of her co-twin, more so for monozygous pairs than for dizygous pairs.

Using the standard bivariate classic twin model, one would have concluded that this strong tracking was due to familial factors that cause variation in these risk-associated mammographic density measures at both times, and that those familial factors have a strong genetic basis, although the latter conclusion would have depended heavily on the “equal shared environments” assumption of the classic twin model. Using our novel regression approach, we found that the data are not consistent with this version of the classic model, because the association between a twin's recent measure and her co-twin's past measure was substantially smaller, and virtually null, after adjusting for her own past measure (see the model III fits).

The so-called “causal model” in chapter 13 in the work of Neale and Cardon (15) is simply a submodel of the bivariate twin model and, as such, does no more than study marginal associations (i.e., unconditional correlations). It does not, as we have done, consider conditional correlations. This is an essential difference between the 2 approaches. Also, that approach relies on assumed knowledge about the genetic and environmental causes of variation in the traits, something our model does not require. Furthermore, we have shown the results of the usual bivariate analysis and indicated that this analysis is not consistent with our data.

That is, our model is consistent with the genes operating at or before time 1. The standard analysis would instead suggest there is modest evidence of shared genetic influence across time, which would imply the genes are operating at both time 1 and time 2. When we looked at the *difference* in mammographic density measures between time 1 and time 2 (see last paragraph of Results) and compared monozygous and dizygous pairs, we found no evidence for (new) genes acting between these times of life.

The patterns of associations we have observed from the regression analysis are more consistent with the existence of a causal relationship between each woman's prior measure and her recent measure. We propose that there are familial factors that determine variation in the mammographic density measures before the time of first measurement and that thereafter these measures are subject, at least for dizygous pairs, to entirely nonfamilial “causes” specific to the individual that determine the future mammographic density measure. For dizygous pairs, the correlations of the measures at time 2 are completely explained by the measures being familial at time 1 and the existence of the causal relationship. For monozygous pairs, there is evidence for some additional familial factors influencing the measures at both time points and further familial factors influencing the measures at time 2.

We have called our regression approach “Inference about Causation from Examination of FAmiliaL CONfounding” (ICE FALCON). This can be thought of as another guideline to consider when assessing evidence consistent with causation, along with the guidelines proposed by Hill (16), which include the need to eliminate confounding in general by more traditional approaches to the design and analysis of data on independent individuals.

In describing the counterfactual model of causality, McGue and colleagues (17) note that the major issues in assessing evidence for causation from observational data are reverse causation and confounding. The former is dealt with by the fact that we have longitudinal data, so the second measurement cannot “cause” the first. With respect to confounding, adjustment can never be made for confounders that are not measured. Therefore, McGue and colleagues (17) propose that the same-sex within-pair twin approach is possibly the best observational design, given its natural matching for age, sex, and unmeasured familial factors (including 100% of genetic effects for monozygous pairs). For a given measured exposure, within-pair analyses of both outcome and exposures can be made using standard techniques (18).

Here, we use the fact that the covariate is familial to make a simple test of the null hypothesis that there is no causality, that is, that the cross-time cross-pair regression coefficient is the same before and after conditioning on the self's time 1 measure. In the current example, the coefficient was not only different after conditioning, it was virtually obliterated. We do not claim that this “proves” causation, but we do make the point that it is highly consistent with the predictions of a causal model.

Limitations of the study include the possibility that the nonsignificant dizygous cross-time cross-pair regression coefficients in models IIIa and IIIb, and the nonsignificant dizygous correlation in the difference in measures between time 1 and time 2, are due to lack of statistical power. Differential dropout could also have influenced the results and interpretations, although we have not been able to detect any source of it other than zygosity. As is common in twin studies, monozygous pairs were a little more likely than dizygous pairs to participate at follow-up.

To our knowledge, nothing has been published on the potential for genetic factors to influence changes in mammographic density measures over time. We found that mammographic density changes were correlated within monozygous, but not dizygous, twin pairs, and that the correlation within monozygous pairs was greater than that within dizygous pairs. Under the assumptions of the classic twin model, these correlations could be interpreted as consistent with the existence of genetic factors that influence variation in change over this time period; however, they could also be interpreted as consistent with nongenetic factors shared by monozygous twin pairs, or some combination of the two. The lack of correlation within dizygous pairs argues against a genetic basis for these twin correlations, which appear to be specific to monozygous pairs alone.

Figures 1A and B show that the variation in the unadjusted mammographic density measures at age 40 years was quite large and that a substantial proportion of women had little or no mammographic density at this age and beyond. Until recently, it was unclear whether breasts, when they form around puberty, are made up entirely of mammographically dense tissue, with the amount/proportion declining with age at different rates, or whether the amount/proportion of dense tissue varies from woman to woman from the time of breast formation. Using MRI as a surrogate measure of mammographic density for young women, Boyd and colleagues concluded that the most likely situation is one of large variation in mammographic density from a young age, with the amount/proportion declining with age at different rates. They also concluded that mammographic density in middle age might partly be the result of genetic factors that affect growth and development in early life (19).

In summary, we have found evidence that the mammographic density measures that predict breast cancer are highly correlated over time and the familial/genetic components of their variation are established before mid-life. Mammographic density of young women could provide a means for breast cancer control, for example, as a biomarker for intervention studies or as a means for predicting women at increased risk at a young age, especially for those with a family history.

## Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

## Authors' Contributions

**Conception and design:** J.L. Hopper, G.S. Dite, G.G. Giles, D.R. English

**Development of methodology:** J.L. Hopper, G.S. Dite

**Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.):** J. Stone, G.S. Dite, J.L. Hopper

**Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis):** J. Stone, J.L. Hopper, G.S. Dite, D.R. English

**Writing, review, and/or revision of the manuscript:** J. Stone, G.S. Dite, G.G. Giles, J. Cawson, D.R. English, J.L. Hopper

**Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases):** G.S. Dite, J. Cawson, J.L. Hopper

## Acknowledgments

The authors thank the Australian Mammographic Density Research Facility (AMDRF) for collecting and digitizing all of the mammograms; the Australian Twin Registry; and all of the twins who participated in this research.

## Grant Support

This work was supported by the Australian National Breast Cancer Foundation, Cancer Australia and the Australian National Health and Medical Research Council (NHMRC; grant number 300048).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.