## Abstract

Understanding which factors influence mammographically dense and nondense areas is important because percent mammographic density adjusted for age is a strong, continuously distributed risk factor for breast cancer, especially when adjusted for weight or body mass index. Using computer-assisted methods, we measured mammographically dense areas for 571 monozygotic and 380 dizygotic Australian and North American twin pairs ages 40 to 70 years. We used a novel regression modeling approach in which each twin's measure of dense and nondense area was regressed against one or both of the twin's and co-twin's covariates. The nature of changes to regression estimates with the inclusion of the twin and/or co-twin's covariates can be evaluated for consistency with causal and/or other models. By causal, we mean that if it were possible to vary a covariate experimentally then the expected value of the outcome measure would change. After adjusting for the individual's weight, the co-twin associations with weight were attenuated, consistent with a causal effect of weight on mammographic measures, which in absolute log cm^{2}/kg was about three times as strong for nondense as for dense area. After adjusting for weight, later age at menarche and greater height were associated with greater dense and lesser nondense areas in a manner inconsistent with causality. The associations of dense and nondense areas with parity are consistent with a causal effect and/or within-person confounding. The associations between mammographic density measures and height are consistent with shared early life environmental factors that predispose to both height and percent mammographic density and possibly breast cancer risk. (Cancer Epidemiol Biomarkers Prev 2008;17(12):3474–81)

## Introduction

Mammographic density refers to the radiologically dense breast tissue that seems light on a mammogram. Stroma and epithelium attenuate X-rays more than fat and seem light (i.e., are mammographically dense), whereas fat seems dark (i.e., is mammographically nondense; ref. 1). The extent of mammographic density can be quantified using a computer-assisted method, which measures total breast area and dense area, and hence nondense area. Percent mammographic density (PMD), dense area as a percentage of total breast area, decreases with increasing age, body weight, and body mass index.

Nested case-control studies, which match for age and adjust for weight or body mass index in the analysis, as well as for other known and measured risk factors, have consistently shown PMD to be a strong, continuously distributed risk factor for breast cancer (2). It has not been well-recognized that the risk factor is PMD *adjusted* for age. The effect of PMD is stronger when also adjusted for weight or body mass index, and is little changed by adjustment for other measured breast cancer risk factors. PMD is also an independent risk factor for women who carry disease-predisposing mutations in the breast cancer susceptibility genes *BRCA1* and *BRCA2* (3). It seems that the amount of dense tissue area, adjusted for age, is associated with risk of breast cancer. An ∼3-fold gradient of risk across the extremes of 6 equally spaced categories of dense tissue area for age has been reported (4). The association of absolute nondense tissue area with breast cancer risk, independent of dense tissue area, is unknown.

It is of interest, therefore, to determine the factors that might influence both components of PMD: dense area and nondense area. In this regard, we conducted a large twin study in Australia and North America and, under the assumptions of the classic twin model, found reproducibly that, after adjusting for age and weight, the majority of population variance in PMD, dense area, and nondense area seems to be explained by genetic factors (5, 6). This conclusion was based on our findings that the twin pair correlations for these three measures were ∼0.6 for identical (monozygous; MZ) pairs and 0.2 to 0.3 for nonidentical (dizygous; DZ) pairs. The ∼2:1 ratio is consistent with, although not exclusive to, additive genetic factors explaining the twin pair correlations.
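As a brief worked illustration of why a ∼2:1 ratio points toward additive genetic factors, the standard decomposition under the classic twin model can be written out. The symbols a^{2} (additive genetic share of variance) and c^{2} (shared environment share) are introduced here for illustration only and are not notation from this paper; the arithmetic uses the approximate correlations quoted above and is not the authors' fitted model.

```latex
\begin{aligned}
r_{MZ} &= a^2 + c^2, \qquad r_{DZ} = \tfrac{1}{2}a^2 + c^2 \\
\Rightarrow\quad a^2 &= 2\,(r_{MZ} - r_{DZ}), \qquad c^2 = 2\,r_{DZ} - r_{MZ} \\
r_{MZ} = 0.6,\ r_{DZ} = 0.3 \quad&\Rightarrow\quad a^2 = 0.6,\ c^2 = 0
\end{aligned}
```

The same correlations could also arise from other mechanisms, for example if MZ pairs shared relevant nongenetic factors more closely than DZ pairs, which is why the ratio is consistent with, but not proof of, additive genetic effects.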

Here, we have used data from this twin study to assess the possible influence of measured determinants (covariates) on the mammographic density measures. For both dense area and nondense area, we have regressed each twin's value against her covariates and against her co-twin's covariates. We have previously found substantial within-pair correlations for both dense and nondense area, and percent dense area (data on both members of a pair are incorporated by using a bivariate normal model and estimating the MZ and DZ pair correlations in mammographic measures; see refs. 5, 6). We have estimated the association of the co-twin's covariate, both before and after adjusting for the individual's covariate, and tested whether this depends on zygosity.

This novel approach to analyzing twin data might permit insights into whether the data are consistent with a measured covariate having a direct or indirect influence on a mammographic density measure, and/or whether its association is being generated by factors shared by twin pairs that influence both the covariate and the mammographic density measures. Should the latter exist, it is possible to test the null hypothesis that this association does not have a genetic basis.

## Materials and Methods

### Subjects

Two concurrent twin studies of mammographic density were conducted in Australia and North America, as previously described (5, 6). Eligible twins were ages 40 to 70 y at interview, understood written and spoken English, and both members of a pair had a mammogram within 3 y of each other and within 3 y of the interview. Pairs were excluded if either of the twins had a history of breast cancer, breast augmentation, or breast reduction before the date of the mammogram. A total of 571 (353 Australian and 218 North American) MZ and 380 (246 Australian and 134 North American) DZ pairs participated. All twins provided written, informed consent before participation in the study.

### Measurement

For each twin, an original film of the craniocaudal view from one breast was digitized for the study. The measurement of breast tissue composition was done by one observer using an interactive thresholding technique that measured the total breast area and the area of dense tissue from which PMD and area of nondense tissue measures were calculated. Films were assessed in sets of ∼120 randomly ordered images with each set containing both MZ and DZ pairs and both members of each pair. The total breast area and dense area were measured from which nondense area was calculated by subtraction (all mammographic measures were in cm^{2}). The distributions of dense area and nondense area both had long tails and were therefore log transformed for these analyses. The few instances where dense area was <1 cm^{2} were transformed to 0 on the log scale. Because the mammographic measures were log transformed, the regression coefficients indicate the relative difference in dense (and nondense) area, not the absolute difference, for a unit change in the specified covariate.
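The interpretation of coefficients on the log scale can be made concrete with a short calculation. The sketch below assumes natural logarithms (consistent with the medians in Table 1, where ln 34.3 ≈ 3.5) and uses a hypothetical helper name, `percent_change`; the two inputs are the Model I within-individual weight coefficients reported in Table 2, which are tabulated after multiplication by 100.

```python
import math


def percent_change(coef_times_100: float) -> float:
    """Percent change in area per unit of covariate for a log-scale coefficient.

    The argument is the coefficient multiplied by 100, the scaling used in
    the tables of estimates. Natural logarithms are an assumption here.
    """
    b = coef_times_100 / 100.0
    return (math.exp(b) - 1.0) * 100.0


# Within-individual weight coefficients from Model I (already x100).
for label, est in [("dense area", -1.00), ("nondense area", 2.84)]:
    print(f"{label}: {percent_change(est):+.2f}% per kg")
```

For coefficients this small, the percent change is close to the tabulated value itself (roughly −1% and +2.9% per kg), which is why estimates multiplied by 100 can be read approximately as percent differences.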

All twins completed a questionnaire that asked about zygosity, demographic details, weight, height, smoking, alcohol consumption, age at menarche, number of live births, cessation of menstruation, oral contraceptive use, hormone replacement therapy use, breast examination, physical activity, and family history of cancer. In Australia, the interviews were conducted by telephone, whereas in North America, questionnaires were mailed with telephone follow-up to clarify any incomplete or ambiguous responses.

In North America, zygosity was determined using standard questions, and for pairs where the twins disagreed or were unsure, and for a random sample of pairs, zygosity was confirmed using molecular analysis. In Australia, zygosity was determined using a single question, and for pairs where the twins disagreed or were unsure, the twins were telephoned and asked the same questions used in the North American study. We and others have shown that the concordance between the molecular testing and questions is about 0.97 (data not shown).

### Statistical Methods

Figure 1 shows a path model in which the squares represent measured variables, and the circles represent unmeasured (latent) causes. A single-headed arrow between a cause and a measured variable indicates the presumed direction of causation, whereas a double-headed arrow indicates that the causes are correlated.

For a given outcome measure *Y*, such as mammographically dense area or mammographically nondense area, consider first a single covariate *X*, such as weight or height (extension to more than one covariate is straightforward). Let “self” and “co-twin” refer to the individual twins within a pair, with the choice being arbitrary. As explained in detail below, we allow *Y* and *X* to be correlated due to both being associated with an individual-specific cause (individual-specific confounders I), and/or due to factors shared by the twins (S_{X} and S_{Y}) that are correlated (ρ; i.e., twin-specific confounders), and/or due to a direct causal effect of *Y* on *X*, and/or of *X* on *Y*. By causal, we mean that if it were possible to vary a covariate experimentally then the expected value of the outcome measure would change.

Within a pair, *Y*_{self} and *Y*_{co-twin} may be correlated due to genetic and/or nongenetic factors relevant to *Y* that are shared by twins in the same pair (S_{Y}). Similarly, *X*_{self} and *X*_{co-twin} may be correlated due to genetic and/or nongenetic factors relevant to *X* that are shared by twins in the same pair (S_{X}). It is well-established that height, weight, and some of the other covariates have substantial within pair correlations. Let ρ be the correlation between S_{X} and S_{Y} (S_{X} and S_{Y} can be decomposed into genetic and environmental components, in which case there would be separate correlations, ρ_{G} and ρ_{E}). Let

$$E(Y_{\text{self}}) = \alpha + \beta_{\text{self}}\,X_{\text{self}} + \beta_{\text{co-twin}}\,X_{\text{co-twin}},$$

so that the expected *Y* value for a twin is regressed on their own *X* value (represented by the coefficient *β*_{self}, the within-individual coefficient) as well as on their co-twin's *X* value (represented by the coefficient *β*_{co-twin}, the co-twin coefficient), with α denoting the intercept; see equations 5 and 6 of Gurrin et al. (7).

As an aside, Gurrin et al. (7) have shown that this is equivalent to the model

$$E(Y_{\text{self}}) = \alpha + \beta_{W}\,\frac{X_{\text{self}} - X_{\text{co-twin}}}{2} + \beta_{B}\,\frac{X_{\text{self}} + X_{\text{co-twin}}}{2},$$

where *β*_{W}, the within-pair coefficient, represents the effect of a unit change in a twin's deviation from the pair mean of the exposure [i.e., a difference in (*X*_{self} − *X*_{co-twin})/2] with the mean (*X*_{self} + *X*_{co-twin})/2 held constant, and *β*_{B}, the between pair mean effect, represents the effect of a unit change in the pair mean of the exposure [i.e., a difference in (*X*_{self} + *X*_{co-twin})/2] with the within-pair difference (*X*_{self} − *X*_{co-twin}) held constant. Note that *β*_{self} = (*β*_{B} + *β*_{W})/2 and *β*_{co-twin} = (*β*_{B} − *β*_{W})/2, so that *β*_{B} = *β*_{W} implies *β*_{co-twin} = 0, and vice versa. See Carlin et al. (8) for a review of regression models for twin data with a detailed discussion of the distinction between the interpretations of the between-pair mean and within-pair coefficients.
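The stated relationship between the two parameterizations can be checked numerically. The following is a minimal sketch on simulated data, using ordinary least squares on one twin per pair; the generating values (0.8, 0.3, and the sample size) are arbitrary assumptions chosen only for illustration, and least squares is used here in place of the bivariate normal model fitted in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 500

# Hypothetical exposure values for the two members of each pair.
x_self = rng.normal(size=n_pairs)
x_co = 0.5 * x_self + rng.normal(size=n_pairs)
y_self = 1.0 + 0.8 * x_self + 0.3 * x_co + rng.normal(size=n_pairs)

ones = np.ones(n_pairs)

# Parameterization 1: own exposure and co-twin exposure.
design_1 = np.column_stack([ones, x_self, x_co])
_, b_self, b_co = np.linalg.lstsq(design_1, y_self, rcond=None)[0]

# Parameterization 2: deviation from the pair mean (within) and pair mean (between).
x_mean = (x_self + x_co) / 2.0
design_2 = np.column_stack([ones, x_self - x_mean, x_mean])
_, b_w, b_b = np.linalg.lstsq(design_2, y_self, rcond=None)[0]

same_self = np.isclose(b_self, (b_b + b_w) / 2.0)
same_co = np.isclose(b_co, (b_b - b_w) / 2.0)
print(same_self, same_co)
```

Because the columns of the second design matrix are linear combinations of those in the first, the two fits are identical and the identities hold up to floating-point error regardless of the simulated values.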

We fit four models. The first two consider the twin and co-twin covariates separately (i.e., fixing the other coefficient at zero). In Model I, the coefficient *β*_{self} is estimated with *β*_{co-twin} = 0, whereas in Model II, the coefficient *β*_{co-twin} is estimated with *β*_{self} = 0. In Models III and IV, *β*_{self} and *β*_{co-twin} are estimated together (i.e., allowing both coefficients to be unconstrained). In Model IV, *β*_{co-twin} is additionally allowed to depend on zygosity.
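The four models differ only in which covariate columns enter the mean of *Y*. The sketch below builds the corresponding design matrices; the function name `design_matrix`, the 0/1 zygosity indicator, and the coding of Model IV as two separate co-twin columns are assumptions made for illustration, and the data values are hypothetical.

```python
import numpy as np


def design_matrix(model, x_self, x_co, is_mz):
    """Design matrix (with intercept) for Models I to IV.

    is_mz is a 0/1 array marking monozygotic pairs. It is used only in
    Model IV, where the co-twin coefficient is estimated separately for
    MZ and DZ pairs.
    """
    ones = np.ones_like(x_self)
    if model == "I":      # self only (co-twin coefficient fixed at 0)
        cols = [ones, x_self]
    elif model == "II":   # co-twin only (self coefficient fixed at 0)
        cols = [ones, x_co]
    elif model == "III":  # both coefficients free
        cols = [ones, x_self, x_co]
    elif model == "IV":   # co-twin coefficient split by zygosity
        cols = [ones, x_self, x_co * is_mz, x_co * (1 - is_mz)]
    else:
        raise ValueError(f"unknown model: {model}")
    return np.column_stack(cols)


# Hypothetical weights (kg) for six individuals and their co-twins.
x_self = np.array([60.0, 72.0, 55.0, 80.0, 66.0, 70.0])
x_co = np.array([62.0, 70.0, 58.0, 75.0, 69.0, 64.0])
is_mz = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

for model in ["I", "II", "III", "IV"]:
    print(model, design_matrix(model, x_self, x_co, is_mz).shape)
```

In the analyses described here, each design would also carry the pair-level adjustments (study location and age at mammogram), which are omitted from this sketch for brevity.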

As discussed below, whether or not the estimates change between model fits allows inferences to be made about the possible reasons why the covariate might be associated with the outcome measure. Of particular importance is whether *β*_{co-twin} = 0 when estimated together with *β*_{self}, or if *β*_{co-twin} depends on the zygosity of the pair.

The philosophy behind our regression approach is the same as that behind the usual application of the classic twin model, in which patterns of correlations are defined under various presumed models and selection between competing models is based on how well the data fit those predictions. Note that this process cannot prove that any one model is a true representation of nature but can be used to test hypotheses and identify the most plausible models, taking into consideration the assumptions underlying those models (and the equal environments assumption of the classic twin model is a major issue in that scenario).

If *X* and *Y* are correlated solely due to a nonzero correlation between S_{X} and S_{Y}, then the estimates of *β*_{self} and *β*_{co-twin} will be independent of one another, and therefore, no different whether they are estimated alone or together. Note also that the individual-specific effects I_{1} and I_{2} are by definition independent of one another, and do not contribute to a correlation between *Y*_{self} and *X*_{co-twin}.

Suppose that *β*_{co-twin} ≠ 0 when estimated on its own (Model II) but *β*_{co-twin} = 0 when estimated together with *β*_{self} (Model III). The former observation is consistent with ρ ≠ 0, or with there being a direct causal effect of *X* on *Y* and/or of *Y* on *X*. The latter observation is consistent with ρ = 0. Therefore, the two observations together are consistent with there being a direct causal effect of *X* on *Y*.
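This reasoning can be illustrated with a small simulation. The sketch below is not the authors' analysis: it assumes unit variances throughout, uses ordinary least squares on one twin per pair, and introduces hypothetical names (`simulate_pairs`, `gamma` for the direct effect of *X* on *Y*). It compares the co-twin coefficient estimated alone (as in Model II) and together with the self coefficient (as in Model III) under two scenarios.

```python
import numpy as np


def simulate_pairs(n_pairs, gamma, rho, seed):
    """Simulate X and Y for both twins under a simple illustrative model.

    gamma: direct effect of X on Y within an individual.
    rho:   correlation between the shared factors S_X and S_Y.
    All variances are set to 1 purely for illustration.
    """
    rng = np.random.default_rng(seed)
    s_x = rng.normal(size=n_pairs)
    s_y = rho * s_x + np.sqrt(1.0 - rho**2) * rng.normal(size=n_pairs)
    x = s_x[:, None] + rng.normal(size=(n_pairs, 2))
    y = gamma * x + s_y[:, None] + rng.normal(size=(n_pairs, 2))
    return x, y


def ols_slopes(y, *columns):
    """Least-squares slopes (intercept dropped) of y on the given columns."""
    design = np.column_stack([np.ones_like(y), *columns])
    return np.linalg.lstsq(design, y, rcond=None)[0][1:]


def co_twin_coefficients(x, y):
    """Co-twin coefficient alone (Model II) and adjusted for self (Model III)."""
    y_self, x_self, x_co = y[:, 0], x[:, 0], x[:, 1]
    alone = ols_slopes(y_self, x_co)[0]
    adjusted = ols_slopes(y_self, x_self, x_co)[1]
    return alone, adjusted


causal = co_twin_coefficients(*simulate_pairs(20000, gamma=0.5, rho=0.0, seed=1))
shared = co_twin_coefficients(*simulate_pairs(20000, gamma=0.0, rho=0.6, seed=2))
print("direct effect, rho = 0: alone %.3f, adjusted %.3f" % causal)
print("shared factors only:    alone %.3f, adjusted %.3f" % shared)
```

With these settings, the co-twin coefficient under a direct effect with ρ = 0 is clearly nonzero when estimated alone and close to zero after adjusting for the individual's own covariate. Under shared factors alone it remains clearly nonzero after adjustment, although in this particular setup, where *X* is correlated within pairs, it is somewhat smaller than the unadjusted value.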

The variable *β*_{co-twin} can be allowed to depend on zygosity (as in Model IV) and, in theory, on any other measured factors that might for example represent the effects of cohabitation. Under assumptions, such as those of the classic twin model, inference could be made about possible reasons why the co-twin's *X* value is independently associated with the outcome. For example, if *β*_{co-twin} is greater when estimated from MZ twin pairs than when estimated from DZ twin pairs, it might reflect the greater genetic similarity between MZ pairs provided one assumes that MZ and DZ pairs are equal for the degree to which they share nongenetic factors relevant to the two measures. Nevertheless, the observation that *β*_{co-twin} is no different between MZ and DZ pairs would argue against a correlation between *Y* and *X* across twin pairs being due to a correlation between the genetic factors that determine variation in each measure.

### Model Fitting

Each *Y* is assumed to be normally distributed with mean *E*(*Y*) given by the regression equation above, and variance independent of zygosity. The correlation between *Y*_{self} and *Y*_{co-twin}, Corr(*Y*_{self}, *Y*_{co-twin}), is potentially nonzero and can be allowed to depend on zygosity (5, 6). The twin pairs were assumed to be independent of one another.

In all analyses, adjustments were made for study location (Australia or North America) and age at mammogram as pair-specific factors (there was little within pair variation in the latter, and none in the former). We chose to evaluate covariates that were significant in our previous regression analyses of the within-person predictors of mammographic density measures; these were weight, height, age at menarche, and number of live births, all considered as continuous variables (see Supplementary Material of ref. 5). We first fitted each covariate alone. Because weight was such a strong predictor of both dense and especially nondense areas, we then refitted each covariate after adjusting also for weight. That is, whenever a within-individual (self) coefficient for a covariate was estimated, a self-coefficient for weight was also estimated.

All analyses were done under a bivariate normal model using the statistical package FISHER (9), allowing for each outcome measure to have different within-pair correlations for MZ and DZ pairs (5). All *P* values are nominal and two sided and, by convention, associations were considered to be statistically significant if *P* was <0.05.
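To make the likelihood concrete, the following is a minimal sketch of a bivariate normal log-likelihood with a common variance and zygosity-specific within-pair correlations, as described above. It is not the FISHER implementation; the function names and the example residuals are hypothetical, and the regression mean is assumed to have been subtracted already.

```python
import numpy as np


def pair_loglik(resid, sigma, rho):
    """Log-likelihood of pair residuals under a bivariate normal model.

    resid: (n_pairs, 2) array of Y minus its regression mean for each twin.
    sigma: common standard deviation; rho: within-pair correlation.
    """
    r1, r2 = resid[:, 0] / sigma, resid[:, 1] / sigma
    quad = (r1**2 - 2.0 * rho * r1 * r2 + r2**2) / (1.0 - rho**2)
    log_det = 2.0 * np.log(sigma**2) + np.log(1.0 - rho**2)  # log|Sigma|
    return np.sum(-np.log(2.0 * np.pi) - 0.5 * log_det - 0.5 * quad)


def total_loglik(resid, is_mz, sigma, rho_mz, rho_dz):
    """Sum over independent pairs, with zygosity-specific correlations."""
    return (pair_loglik(resid[is_mz], sigma, rho_mz)
            + pair_loglik(resid[~is_mz], sigma, rho_dz))


# Hypothetical residuals for two MZ pairs and two DZ pairs.
resid = np.array([[0.2, 0.1], [-0.5, -0.3], [0.4, -0.1], [0.0, 0.3]])
is_mz = np.array([True, True, False, False])
print(total_loglik(resid, is_mz, sigma=0.7, rho_mz=0.6, rho_dz=0.3))
```

In a full fit, this quantity would be maximized jointly over the regression coefficients, the variance, and the two correlations, and nested models would be compared through their maximized likelihoods.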

## Results

Table 1 shows summary statistics for the transformed and untransformed outcome measures and for each of the exposure variables.

**Table 1.**

| | Mean | SD | 25th percentile | Median | 75th percentile |
|---|---|---|---|---|---|
| Mammographic measures | | | | | |
| Dense area (cm^{2}) | 39.76 | 26.13 | 21.35 | 34.3 | 53.02 |
| Nondense area (cm^{2}) | 80.86 | 54.72 | 39.84 | 65.94 | 107.18 |
| Log dense area (log cm^{2}) | 3.43 | 0.81 | 3.06 | 3.55 | 3.97 |
| Log nondense area (log cm^{2}) | 4.17 | 0.70 | 3.68 | 4.19 | 4.67 |
| PMD (%) | 37.02 | 21.20 | 19.10 | 36.65 | 53.60 |
| Covariates | | | | | |
| Age at mammogram (y) | 51.43 | 8.37 | 44.00 | 50.00 | 58.00 |
| Height (cm) | 162.42 | 6.80 | 157.00 | 163.00 | 168.00 |
| Weight (kg) | 66.83 | 13.45 | 58.00 | 64.02 | 72.99 |
| Age at menarche (y) | 12.99 | 1.56 | 12.00 | 13.00 | 14.00 |
| No. of live births | 2.35 | 1.49 | 2.00 | 2.00 | 3.00 |

### Weight

From Table 2, Model I shows that both dense and nondense areas were associated with weight within an individual (both *P* < 0.001). The direction of association was negative for dense area and positive for nondense area. The absolute size of the within-individual regression coefficient for nondense area, in terms of log cm^{2}/kg, was about thrice that for dense area (for dense area, the negative association seems to be driven by women over 80 kg in weight; data not shown).

**Table 2.**

| | | Model I | Model II | Model III | Model IV |
|---|---|---|---|---|---|
| | | Est. (SE)* | Est. (SE)* | Est. (SE)* | Est. (SE)* |
| Weight (kg) | | | | | |
| Dense area | Self | −1.00 (0.14) | | −0.96 (0.14) | −0.95 (0.14) |
| | Co-twin | | −0.38 (0.14) | −0.22 (0.14) | MZ, −0.18 (0.15); DZ, −0.25 (0.14) |
| Nondense area | Self | 2.84 (0.10) | | 2.81 (0.10) | 2.81 (0.10) |
| | Co-twin | | 0.71 (0.17) | 0.24 (0.10) | MZ, 0.24 (0.10); DZ, 0.23 (0.10) |
| Height (cm) | | | | | |
| Dense area | Self | 0.17 (0.25) | | 0.10 (0.25) | 0.12 (0.25) |
| | Co-twin | | 0.44 (0.25) | 0.43 (0.25) | MZ, 0.47 (0.26); DZ, 0.42 (0.25) |
| Nondense area | Self | −0.23 (0.21) | | −0.21 (0.21) | −0.21 (0.23) |
| | Co-twin | | −0.46 (0.21) | −0.45 (0.21) | MZ, −0.47 (0.21); DZ, −0.45 (0.21) |
| Age at menarche (y) | | | | | |
| Dense area | Self | 3.30 (1.17) | | 3.47 (1.16) | 3.44 (1.16) |
| | Co-twin | | 2.59 (1.17) | 2.80 (1.16) | MZ, 2.90 (1.16); DZ, 2.55 (1.19) |
| Nondense area | Self | −4.46 (0.93) | | −4.75 (0.94) | −4.74 (0.94) |
| | Co-twin | | −1.46 (0.94) | −1.46 (0.94) | MZ, −1.52 (0.95); DZ, −1.13 (0.97) |
| Live births (number) | | | | | |
| Dense area | Self | −4.84 (1.18) | | −5.18 (1.22) | −5.18 (1.22) |
| | Co-twin | | 0.10 (1.18) | −1.29 (1.21) | MZ, −1.28 (1.33); DZ, −1.30 (1.53) |
| Nondense area | Self | 5.88 (0.91) | | 6.24 (0.99) | 6.21 (1.00) |
| | Co-twin | | −1.67 (0.93) | 0.88 (1.00) | MZ, 0.79 (1.08); DZ, 1.06 (1.28) |

NOTE: Associations nominally significant at the 0.05 level are shown in bold.

\*Estimates and SEs are multiplied by 100.

Model II shows that there was also an association between an individual's mammographic measure and their co-twin's weight (all *P* <0.001), although the magnitude of the co-twin coefficient was smaller than the within-individual coefficient from Model I.

Model III shows that, when both the within-individual and co-twin regression coefficients were estimated simultaneously, the within-individual coefficient remained virtually unchanged, whereas the co-twin coefficient became small, about an order of magnitude less than the within-individual coefficient, and was at best marginally statistically significant (*P* = 0.1 and 0.02, respectively). Model IV shows that the co-twin coefficients did not depend on zygosity (both *P* > 0.7).
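The nominal *P* values quoted for individual coefficients can be approximated from the tabulated estimates and SEs. The sketch below assumes a two-sided Wald test with a normal reference distribution; the study's *P* values came from the fitted bivariate normal model, so this is an approximation rather than the authors' procedure, and `wald_p` is a hypothetical helper name.

```python
import math


def wald_p(estimate: float, se: float) -> float:
    """Two-sided P value for H0: coefficient = 0, using a normal approximation."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2.0))


# Model III co-twin coefficients for weight (estimate, SE), from Table 2.
for label, est, se in [("dense area", -0.22, 0.14), ("nondense area", 0.24, 0.10)]:
    print(f"{label}: P = {wald_p(est, se):.2f}")
```

Because the estimate and SE are scaled by the same factor of 100, their ratio is unaffected by the scaling, and the resulting values are close to the reported *P* = 0.1 and 0.02.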

The observation of a co-twin association with weight that is vastly attenuated by adjusting for the individual's weight is consistent with the causes of variation in mammographic measures shared by twins (S_{Y}) being uncorrelated with the causes of variation in weight shared by twins (S_{X}); i.e., ρ = 0. Were this to be the case, the only way there could be an association between the mammographic measures and weight in the co-twin is if the within-individual association with weight was due to a causal effect of weight on the mammographic measure. If it were due to confounding with individual-specific factors (I) alone, there would not be an association between the mammographic measure and the co-twin's weight. This leaves the two causal pathways. If the pathway went from mammographic measure to weight, that also would not generate an association between the measure and the co-twin's weight.

### Height

Table 2 also shows that neither dense nor nondense area was associated with height within an individual (Model I; both *P* > 0.3). Furthermore, the estimated coefficients changed little when both were fitted simultaneously (Model II and Model III), and there was no evidence that the co-twin coefficient differed by zygosity (Model IV; both *P* > 0.9).

Table 3 shows that, after adjusting for the individual's weight, both dense and nondense areas were associated with height within an individual (Model I; *P* = 0.01 and *P* < 0.001, respectively). Model II shows, as before, that there was an association between an individual's mammographic measure and their co-twin's height, once adjustment was made for the co-twin's weight (*P* = 0.01 and *P* < 0.001, respectively). Model III then shows that, as before, when both the within-individual and co-twin coefficients were estimated simultaneously, the coefficients for height decreased by a small proportion. After adjusting for weight, the individual and co-twin height coefficients were marginally statistically significant for dense area (*P* = 0.04 and 0.03, respectively) but highly significant for nondense area (both *P* < 0.001). Model IV shows that the co-twin coefficients did not depend on zygosity (both *P* > 0.9).

**Table 3.**

| | | Model I | Model II | Model III | Model IV |
|---|---|---|---|---|---|
| | | Est. (SE)* | Est. (SE)* | Est. (SE)* | Est. (SE)* |
| Height (cm) | | | | | |
| Dense area | Self | 0.62 (0.25) | | 0.54 (0.26) | 0.54 (0.26) |
| | Co-twin | | 0.61 (0.25) | 0.53 (0.25) | MZ, 0.56 (0.25); DZ, 0.53 (0.14) |
| Nondense area | Self | −1.54 (0.17) | | −1.42 (0.17) | 2.81 (0.17) |
| | Co-twin | | −0.95 (0.18) | −0.73 (0.17) | MZ, −0.73 (0.17); DZ, −0.73 (0.18) |
| Age at menarche (y) | | | | | |
| Dense area | Self | 2.48 (1.16) | | 2.61 (1.16) | 2.60 (1.15) |
| | Co-twin | | 2.68 (1.17) | 2.80 (1.15) | MZ, 2.87 (1.15); DZ, 2.64 (1.17) |
| Nondense area | Self | −2.20 (0.78) | | −2.34 (0.78) | −4.74 (0.78) |
| | Co-twin | | −1.14 (0.78) | −1.37 (0.78) | MZ, −1.32 (0.78); DZ, −1.48 (0.80) |
| Live births (number) | | | | | |
| Dense area | Self | −4.16 (1.17) | | −4.41 (1.21) | −4.44 (1.21) |
| | Co-twin | | 0.17 (0.12) | −0.10 (0.12) | MZ, −0.11 (0.13); DZ, −0.07 (0.15) |
| Nondense area | Self | 4.01 (0.91) | | 3.98 (0.99) | 4.08 (0.83) |
| | Co-twin | | −1.35 (0.93) | −0.09 (1.00) | MZ, 0.24 (0.89); DZ, −0.67 (1.04) |

NOTE: Associations nominally significant at the 0.05 level are shown in bold.

\*Estimates and SEs are multiplied by 100.

Thus, an association with height was only evident after adjusting for weight, and was in the opposite direction to that with weight. After adjusting for weight, the association with the co-twin's height was independent of the individual's height. This is consistent with the unmeasured determinants of the mammographic measures that are shared by twins (S_{Y}) being correlated with the unmeasured determinants of height that are shared by twins (S_{X}); i.e., ρ ≠ 0. For each mammographic measure, the determinants of both the measure and height that are shared by twins are unlikely to have a substantial genetic etiology because the co-twin coefficient did not depend on zygosity.

### Age at Menarche

Table 2 shows that dense and nondense areas were associated with age at menarche within an individual (Model I; both *P* < 0.001). The direction of association was positive for dense area and negative for nondense area.

For dense area, Model II shows that there was also a positive association with the co-twin's age at menarche (*P* < 0.001). Model III shows that, when both the within-individual and co-twin coefficients were estimated simultaneously, both remained virtually unchanged and stayed highly significant (both *P* < 0.001). Model IV shows that the co-twin coefficient did not depend on zygosity (*P* = 0.8).

Tables 2 and 3 show that, within an individual, both dense and nondense areas were associated with age at menarche before and after adjusting for weight (for Model I after adjustment, *P* = 0.03 and 0.005, respectively; Table 3). For nondense area, the co-twin coefficients were not nominally significant in any model, before or after adjusting for weight.

As discussed above for height adjusted for weight, for dense area, the causes shared by twins (S_{Y}) are likely to be correlated with the causes of age at menarche shared by twins (S_{X}); i.e., ρ ≠ 0. These correlated causes, however, are unlikely to have a genetic etiology because the co-twin coefficient did not depend on zygosity. Their effect was also not changed by adjusting for weight. Weight, though, seemed to explain a proportion of the within-individual association between dense area and age at menarche because the within-individual (self) coefficients under Models I, III, and IV went from 3.30, 3.47, and 3.44, respectively (Table 2), to 2.48, 2.61, and 2.60, respectively (Table 3), after adjusting for weight. Note that the corresponding co-twin coefficients did not change appreciably after adjusting for weight. For nondense area, there was no evidence that ρ ≠ 0, although a small correlation cannot be excluded. Therefore, the association between nondense area and age at menarche seems to be most consistent with factors that are specific to individuals, not shared by or common to twins within the same pair.

### Number of Live Births

Table 2 shows that, within an individual, both mammographic measures were associated with number of live births (both *P* < 0.001). The direction of association was negative for dense area and positive for nondense area, and the absolute magnitudes, in terms of log cm^{2}/birth, were about the same. For both dense and nondense area, adjusting for weight decreased the absolute value of the self-coefficients, and the co-twin coefficients remained nonsignificant (Table 3). There was no association between an individual's mammographic measure and the number of live births of their co-twin (both *P* > 0.1).

## Discussion

The fundamental problem in making a causal interpretation of associations from observational data is the possibility that such associations are due to confounding, especially by unmeasured factors. Some putative predictors of an outcome may be substantially correlated between relatives, and it is therefore possible that unmeasured familial factors are responsible for at least part of an observed association between such predictors and the outcome.

This is the first application of a novel approach to analyzing such familial data that can identify not only whether a covariate is associated with the outcome of interest, but also, by asking if there is an independent contribution from the relative's covariate, give insights into the mechanisms underlying that association. The example we have presented involves a continuously distributed outcome, and consequently uses ordinary linear regression. The method could be extended to other outcome measures such as binary or categorical traits, although the statistical theory is not as straightforward. In this study, we used data from twin pairs (both MZ and DZ), although the method can equally be applied to data from pairs (or even multiples) of relatives other than twins, such as siblings or parent-offspring pairs. The key is that both the association between the predictor and the outcome and the association between the relatives' predictors are sufficiently strong, and that the regression coefficients are estimated precisely enough, for clear statements to be made about whether or not they change as different factors are included in the models.

We do not claim that this analysis of cross-sectional data alone can be used to make definitive causal inferences, as there are many other issues that need to be taken into consideration. Rather, we suggest that this approach could complement Austin Bradford Hill's guidelines for determining causality (10). As explained by Sir Richard Doll, Hill did not discuss whether in principle conclusions about causality could be reached by epidemiologic associations alone, but considered each situation as it presented itself to “see where the evidence led”; the guidelines were “aids to thought” (11). In this context, the approach we have presented considers the evidence for confounding due to (genetic and/or nongenetic) factors shared by, or common to, twins against that for a direct causative effect. As a guideline for causation, it might be called “elimination of familial confounding.”

We found that different scenarios can occur, so it is not as if the approach has an inherent predisposition to support only one paradigm. In particular, the traditional twin modeling approach, which assumes there is no direct causation and that all cross-twin cross-trait associations are due to familial factors that influence both the covariate and the outcome, does not seem to apply universally. This suggests a rethinking of classic twin modeling of multivariate traits.

First, it has been known for a long time that weight is negatively associated with PMD. We found that this largely derives from a strong positive association with nondense area and, to a lesser extent, from a negative association with dense area. For both dense and nondense area, twins are correlated. There is also an association, albeit weaker, between the mammographic measures for one twin and the weight of her co-twin. This co-twin association, however, disappeared almost completely when the mammographic measures were also regressed against the twin's own weight. This is consistent with there being a direct causal association between weight and the mammographic density measures, in particular nondense area (the negative association for dense area might be an artifact of reduced ability to measure dense areas in women over 80 kg in weight). That is, there might not be a substantial role for genetic factors that cause variation in both mammographic density measures and weight. In this regard, it is of interest to note that the first genome-wide scan family study of mammographic density measures failed to find evidence for markers in any chromosomal region that displayed linkage to both PMD and body mass index (12).

If we had applied the usual bivariate twin model to these data, we would have concluded, perhaps incorrectly, that part of the reason why these two measures were associated was that one or more of the factors that cause mammographic measures to be correlated within twin pairs (S_{Y}), and those that cause weight to be correlated within twin pairs (S_{X}), are correlated. But had this been so, we would have found that the association of a given twin's mammographic measure with her co-twin's weight would have been for the main part unaltered by inclusion or exclusion of the given twin's weight in the regression equation, contrary to what was observed.
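The contrast between these two explanations can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not the analysis code used in the study: the function names (`fit_slopes`, `simulate_pairs`, `coefficient_pattern`), the single shared factor `s`, and all parameter values (`rho`, `beta`, `gamma`) are hypothetical choices made for illustration. Each twin appears once as "self", and only point estimates are compared, so the within-pair correlation that a full analysis would need for valid standard errors is ignored here.

```python
import numpy as np


def fit_slopes(y, *covariates):
    """Ordinary least squares; the intercept is fitted, then only slopes are returned."""
    design = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]


def simulate_pairs(n_pairs, rho, beta, gamma, rng):
    """Simulate a covariate x and an outcome y for both members of each pair.

    rho   : within-pair correlation of x (assumed value)
    beta  : direct effect of a twin's own x on her own y
    gamma : effect on y of a factor s that is shared by the pair and also drives x
    """
    s = rng.normal(size=n_pairs)  # hypothetical shared familial factor
    x1 = np.sqrt(rho) * s + np.sqrt(1 - rho) * rng.normal(size=n_pairs)
    x2 = np.sqrt(rho) * s + np.sqrt(1 - rho) * rng.normal(size=n_pairs)
    y1 = beta * x1 + gamma * s + rng.normal(size=n_pairs)
    y2 = beta * x2 + gamma * s + rng.normal(size=n_pairs)
    return x1, x2, y1, y2


def coefficient_pattern(x1, x2, y1, y2):
    """Fit the self-only, co-twin-only, and joint models; return the slopes."""
    y = np.concatenate([y1, y2])
    x_self = np.concatenate([x1, x2])
    x_cotwin = np.concatenate([x2, x1])
    self_adj, cotwin_adj = fit_slopes(y, x_self, x_cotwin)
    return {
        "self_only": fit_slopes(y, x_self)[0],
        "cotwin_only": fit_slopes(y, x_cotwin)[0],
        "self_adjusted": self_adj,
        "cotwin_adjusted": cotwin_adj,
    }


rng = np.random.default_rng(2008)
scenarios = {
    "direct effect of X on Y": (0.5, 0.0),      # (beta, gamma)
    "shared familial factor only": (0.0, 0.5),
}
for label, (beta, gamma) in scenarios.items():
    pattern = coefficient_pattern(*simulate_pairs(50_000, 0.6, beta, gamma, rng))
    print(label, {k: round(float(v), 2) for k, v in pattern.items()})
```

Under these assumptions, the direct-effect scenario gives a co-twin coefficient that is nonzero on its own but falls to about zero once the twin's own covariate is included. In the shared-factor scenario, the self and co-twin coefficients are equal in expectation and the co-twin coefficient stays well away from zero after adjustment, although with a within-pair correlation as high as the assumed 0.6 it does shrink somewhat.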

Second, the results of the model fitting both for height (after adjusting for weight), and for age at menarche (irrespective of adjusting for weight), followed a different pattern from that described above for weight. The individual and co-twin coefficients were similar whether or not the other coefficient was included in the model. This is consistent with there being factors shared by twins that have an effect on both the mammographic measure and the covariate under consideration (height or age at menarche). This pattern is not consistent with a causal association. Moreover, the co-twin coefficients seem to be the same for MZ and DZ pairs, so it is difficult to argue that there are genes that influence both a mammographic measure (adjusted for weight) and height or age at menarche. This is despite the fact that there is considerable evidence dating back to Fisher's seminal 1918 paper (13) that, within a fixed environment, genetic factors might explain the vast majority of variance. This does not, however, preclude a role for environmental factors shared by relatives as there are many reasons why so-called common environment effects can be missed using traditional variance components modeling (14), especially given that the women in our study experienced menarche between the early 1940s and late 1960s, a period during which there were rapid changes in the availability of food. On the other hand, one must be cautious about concluding that there are no such genetic effects, as our null finding could have been due to a lack of power, and recent work shows that MZ and DZ within-pair regression coefficients can be similar even in the presence of a genetic effect that contributes to the regression relationship (7).

Third, there were no associations with the co-twin's number of live births irrespective of whether or not adjustment was made for the individual's own number of live births. Thus, there is no evidence for factors shared by twins influencing both number of births and the mammographic measures. We cannot untangle whether the association between number of live births and mammographic measures in the same individual is due to confounding with an individual specific factor, or causative.

Thus, we have found three different scenarios for individual and co-twin coefficients with regard to measures of mammographic density. The association with weight is consistent with a causal model, in the sense that a change in a woman's weight will have a direct influence on her age-adjusted mammographic density measure, especially nondense area. It does not seem to be due to confounding with, nor suggest evidence for, the existence of some genes or environmental factors being involved with both mammographic density and weight.

On the other hand, our analyses suggest that there are factors shared by twins that influence height and/or age at menarche, and also influence the mammographic measures adjusted for weight (and age), especially nondense area. These factors seem more likely to be environmental, rather than genetic, because the co-twin coefficients were independent of zygosity. Given that virtually maximum height is achieved within a few years of menarche, these environmental factors would seem to be related to early life. This raises the intriguing possibility that insights into growth during adolescence and breast cancer risk might come from studying mammographic density measures.

Later age at menarche and greater height were associated with greater dense area and lesser nondense area, adjusting for age and weight, so are associated with greater PMD (as reported previously by Vachon and colleagues; ref. 15 and others, e.g., ref. 16). That is, our and others' data suggest that the association of a later age at menarche with a reduced breast cancer risk is not mediated through PMD. Height, on the other hand, *might* have an effect on breast cancer risk through its association with age- and weight-adjusted mammographic density measures. Furthermore, this association does not seem to be directly causal but might be due at least in part to the environmental factors shared by twins in early life referred to in the paragraph above.

We found a negative association between number of live births and dense area, a positive association with nondense area, and hence a negative association with PMD (as have others; refs. 15-22). A small proportion of these associations seemed to be explained by weight. Therefore, the protective effect of increased parity on breast cancer risk could be in part mediated through PMD, through being associated with one or both of decreasing dense area and increasing nondense area.

Although the composite measure, PMD, has been used in almost all previous reports on mammographic densities, we have focused on the component measures, dense area, and nondense area. We have done so to try to understand how the different covariates might be influencing or are associated with PMD. As we have shown above, inference about PMD can be readily drawn from our findings due to the statistically significant associations being consistently in opposite directions for dense and nondense areas.

If a covariate, *X*, has a causative effect on an outcome, *Y*, this regression approach can give insights provided that (*a*) *X* is correlated to a reasonable, if not large, extent within twin pairs, and (*b*) the effect of *X* on *Y* is sufficient that, when combined with (*a*), it produces a reasonably strong cross-trait cross-twin correlation. If this is the case, the patterns of associations with weight from the different model fits that we observed will occur. The important aspect of this regression approach is that both univariable and multivariable models are fitted, and the focus of attention is on the way coefficient estimates change as other factors are included in the model, rather than on the “best fitting model” itself.
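Why both conditions are needed can be seen in a simple worked case. The following is a sketch under an assumed linear model; the symbols $\alpha$, $\beta$, $\rho$, and $\varepsilon_1$ are introduced here for illustration and are not taken from the fitted models. Suppose twin 1's outcome depends only on her own covariate, the covariate has the same variance in both twins, and its within-pair correlation is $\rho$:

$$
Y_1 = \alpha + \beta X_1 + \varepsilon_1, \qquad \operatorname{Var}(X_1) = \operatorname{Var}(X_2), \qquad \operatorname{corr}(X_1, X_2) = \rho,
$$

with $\varepsilon_1$ independent of $X_1$ and $X_2$. Regressing $Y_1$ on the co-twin's covariate alone then gives the slope

$$
\frac{\operatorname{Cov}(Y_1, X_2)}{\operatorname{Var}(X_2)} = \frac{\beta \operatorname{Cov}(X_1, X_2)}{\operatorname{Var}(X_2)} = \beta\rho ,
$$

whereas regressing $Y_1$ on both $X_1$ and $X_2$ gives slopes $\beta$ and $0$. The co-twin coefficient $\beta\rho$ is therefore detectable only when neither $\rho$, condition (*a*), nor $\beta$, condition (*b*), is close to zero, and under this assumed model it vanishes once the twin's own covariate is included.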

This is the first time that we have applied this regression approach to analyzing associations using twin data. Statistical modeling is an attempt to identify the plausible and implausible explanations of data; i.e., determine the model(s) that are consistent with the data. It is intriguing that the data for different covariates were consistent with different models. The finding that weight might have a direct causal association, especially with nondense area, is consistent with the latter measure's correlation with body fat. The finding of a role for nongenetic (i.e., environmental) factors shared by twins having an effect on both height and dense area could be revealing how growth around the time of puberty influences breast cancer risk. It is possible that one cannot translate from studying differences between and similarities within twin pairs to the same issues for non-twin pairs, so we are in the process of collecting similar data for (non-twin) sister pairs and plan to conduct similar analysis to try to replicate these findings.

## Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

**Grant support:** The studies were supported in Australia by grants from the Kathleen Cuningham Foundation (now the National Breast Cancer Foundation), the National Health and Medical Research Council, the Victorian Breast Cancer Research Consortium and the Merck Sharp and Dohme Research Foundation, and in North America by a grant from the Canadian Breast Cancer Research Initiative.

**Note:** Supplementary data for this article are available at Cancer Epidemiology Biomarkers and Prevention Online (http://cebp.aacrjournals.org/).

G.S. Dite, M.R.E. McCredie, D.R. English, G.G. Giles, J. Cawson, and J.L. Hopper participated in the design and conduct of the Australian study. A. Gunasekara, R.A. Hegele, A.M. Chiarelli, M.J. Yaffe, and N.F. Boyd participated in the design and conduct of the North American study, and measurement of mammograms for both studies. J.L. Hopper, G.S. Dite, L.C. Gurrin, G.B. Byrnes, and J. Stone devised and performed the statistical analysis, and G.S. Dite, L.C. Gurrin, and J.L. Hopper drafted the manuscript. All authors read and approved the final manuscript.

## Acknowledgments

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

We thank Maggie Angelakos for data management and digitization of mammograms, and Penny Allen and Helen O'Connor for assistance with interviewing, and the twins who participated in the studies, the Australian Twin Registry, BreastScreen (Australia), and the Ontario Breast Screening Program. J.L. Hopper is an Australia Fellow of the National Health and Medical Research Council and a Group Leader of the Victorian Breast Cancer Research Consortium.