Abstract
Background: Breast cancer has a complex etiology that includes genetic, biologic, behavioral, environmental, and social factors. Etiologic factors are frequently studied in isolation with adjustment for confounding, mediating, and moderating effects of other factors. A complex systems model approach may present a more comprehensive picture of the multifactorial etiology of breast cancer.
Methods: We took a transdisciplinary approach with experts from relevant fields to develop a conceptual model of the etiology of postmenopausal breast cancer. The model incorporated evidence of both the strength of association and the quality of the evidence. We operationalized this conceptual model through a mathematical simulation model with a subset of variables, namely, age, race/ethnicity, age at menarche, age at first birth, age at menopause, obesity, alcohol consumption, income, tobacco use, use of hormone therapy (HT), and BRCA1/2 genotype.
Results: In simulating incidence for California in 2000, the separate impact of most individual variables was modest; reductions in HT use, an increase in the age at menarche, and, to a lesser extent, a reduction in excess BMI (>30 kg/m²) had more substantial effects.
Conclusions: Complex systems models can yield new insights on the etiologic factors involved in postmenopausal breast cancer. Modification of factors at a population level may only modestly affect risk estimates, while still having an important impact on the absolute number of women affected.
Impact: This novel effort highlighted the complexity of breast cancer etiology, revealed areas of challenge in the methodology of developing complex systems models, and suggested additional areas for further study. Cancer Epidemiol Biomarkers Prev; 23(10); 2078–92. ©2014 AACR.
Introduction
Most diseases are influenced by factors at multiple levels of organization, ranging from the biologic to the societal. Breast cancer is an example of a disease with such a complex etiology, and various social, physical, individual, and biologic factors have been identified as being associated with breast cancer incidence (1). But as in many fields of scientific inquiry, these factors have generally been treated as distinct and unrelated to each other. The common tendency to focus on single causal agents obscures the reality of a complex web of causation (2). Individuals in the general public still talk about “the cause” of breast cancer, but the reality is not so simple.
The interaction of causal factors across biologic, social, and environmental determinants is poorly understood, but this understanding is necessary for developing prevention strategies (3). The lack of integration among risk factors at different levels, as well as the failure to consider feedbacks and dependencies, undermines a full understanding of the causes of breast cancer and prospects for its prevention. We report here on the development of a multilevel model of the causes of postmenopausal breast cancer incidence.
We had four primary objectives in mind. The first was to create a conceptual model that included relevant and important known and potential causes within four major domains: sociocultural, behavioral and lifestyle, physical/chemical, and biologic. A second objective was to develop a mathematical simulation model that could provide a quantitative framework for exploration of “what if?” scenarios of how interventions could influence long-term breast cancer incidence rates. The use of a model is critical because there are currently no datasets that offer all of the variables included in the complex system we envision.
Third, we wanted this model to be adaptable, so that new scientific evidence can be integrated into it as it accumulates. The fourth objective was to create a model that could be used by multiple audiences, including scientists and the general public, to foster understanding of the range of factors that contribute to causing breast cancer.
Materials and Methods
The model was the result of a systematic process and based on input from a range of experts across multiple disciplines. The strategy for development was considered in four parts: (i) the selection of the experts, (ii) a series of meetings with thoughtful, unrushed, and respectful interaction among the experts to develop the components of the model, (iii) refinement of the model based on input from a larger range of potential model users, and (iv) the development and application of a mathematical model that used a selected subset of factors from the full model.
Process for creating the conceptual model
We first convened experts in the fields of genetics (A. Balmain), cell biology (Z. Werb), clinical oncology (M. Moasser), nutrition (L.H. Kushi), environmental health (G.C. Windham), epidemiology (D. Braithwaite, A.V. Diez-Roux, L.H. Kushi, R.A. Hiatt, D.H. Rehkopf, and G.C. Windham), breast cancer advocacy (J. Barlow), complex systems (T.C. Porco and A.V. Diez-Roux), and mathematical modeling (A.V. Diez-Roux and T.C. Porco). Over the course of one year, the group met for three extended meetings to develop the model, with a core group working to implement the group consensus. The first meeting was focused on listening to and reflecting on each individual perspective on the causes of breast cancer incidence, which differed substantially between scientific disciplines. In the second meeting, participants worked to integrate their perspectives into a single model. In the third meeting, participants worked to refine the final draft version of the model that had been constructed in the interim. Finally, the model was circulated to these experts and a wider range of knowledgeable individuals for feedback on both content and format and then a final review by all coauthors. The two end products were an annotated conceptual model and a smaller mathematical model that drew largely modifiable factors from the parent conceptual model.
Deciding on the factors to be included in the conceptual model
As a first step, we created a broad working model by collecting proposed causal risk factors from each panel member. The second step was for each panel member to select the best evidence from published peer-reviewed literature (i.e., 2–3 papers) that supported these causal relationships. The third step was for each panel member to interpret the strength of association and quality of evidence for each factor based on standard criteria (see below). The fourth step was identifying, from the broader literature, how the factors were related to one another. The fifth step was interpreting the strength of association and strength of evidence from the literature for connections between factors in the models.
Inclusion and exclusion of factors
To create a practical and useful model, we adopted a number of constraints. We decided that the model would be specific to invasive breast cancer incidence, rather than mortality. This reflected our focus on causes of breast cancer relevant to prevention; we did not want to include the diagnostic and treatment factors that influence mortality. We also chose to focus our model on postmenopausal breast cancer incidence because the causes of breast cancer differ by menopausal status. Roughly 78% of breast cancer in California, our chosen geographic area of study, occurs in women over the age of 50 years (4). Also, more genetic determinants have been found for premenopausal than for postmenopausal breast cancer, suggesting a relatively greater importance of environmental causes of postmenopausal breast cancer.
We did not include screening as a factor in this model for two reasons. First, our intent was to highlight factors important in the etiology, not early detection of breast cancer. Second, the impact of screening on the incidence of invasive cancer (the outcome of this model) was probably small by 2000 when the rate of screening had reached a steady or slightly decreasing state in California (5). Thus, although it is clear that breast cancer screening will influence incidence rates, our model is meant to include only truly etiologic factors that could be modified through primary prevention.
The model is not specific to particular subtypes of breast cancer [e.g., estrogen receptor (ER) status, basal cell type] because there was not yet sufficient evidence on differential causation by subtype. We also decided that to create a more comprehensive framework, we should include factors even if the types of evidence for causes for these factors differed. For example, for some factors there are neither population-based studies (e.g., inflammation and immune function) nor quantitative data (e.g., ancestry) available. Finally, we could not include all possible factors and decided to limit the number of factors to what the expert committee judged to be major factors in each domain.
Strength of association
To illustrate the strength of association between factors in the model based on the best evidence available, we derived our categories of association based on the range of relative risks of studies of breast cancer. Our three categories were: category 1, RR > 3.0 (strong); category 2, RR 1.8–3.0 (modest); category 3, RR 1.1–1.8 (weak). These specific categories are based on frequently used levels similar to those used by the Harvard Cancer Risk Index (6) and allow for a reasonably good differentiation of the strength of association for breast cancer-related risk factors.
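As an illustration, the three categories can be written as a small lookup function. This is a minimal sketch in Python rather than the panel's actual procedure: the function name `strength_category` is hypothetical, a relative risk of exactly 1.8 or 3.0 is assumed to fall in the lower-strength category, and values below 1.1 return `None` because the text defines no category for them.

```python
from typing import Optional


def strength_category(rr: float) -> Optional[int]:
    """Map a relative risk (RR) to the strength-of-association category.

    Categories follow the text: 1 = strong (RR > 3.0),
    2 = modest (RR 1.8-3.0), 3 = weak (RR 1.1-1.8).
    Boundary handling is an assumption made for this sketch.
    """
    if rr > 3.0:
        return 1  # strong
    if rr > 1.8:
        return 2  # modest
    if rr >= 1.1:
        return 3  # weak
    return None  # below 1.1: no category is defined in the text


# Example: a relative risk of 2.14 falls in the "modest" range.
print(strength_category(2.14))
```

How protective associations (RR below 1.0) would be ranked is not stated in the text, so the sketch leaves them uncategorized.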
Quality of the data
The expert panel felt that it was important to express the overall quality of the evidence, even though the type of evidence differed across factors included in our model. This approach allowed factors to be included in the model that were representative of etiologic factors of interest (e.g., environmental factors) but for which little human evidence currently exists.
The U.S. Preventive Services Task Force (USPSTF) grades the quality of the overall evidence on a 3-point scale (good, fair, poor): where good means “evidence includes consistent results from well-designed, well-conducted studies in representative populations that directly assess effects on health outcomes,” fair means “evidence is sufficient to determine effects on health outcomes, but the strength of the evidence is limited by the number, quality, or consistency of the individual studies, generalizability to routine practice, or indirect nature of the evidence on health outcomes,” and poor is appropriate when “evidence is insufficient to assess the effects on health outcomes because of limited number or power of studies, important flaws in their design or conduct, gaps in the chain of evidence, or lack of information on important health outcomes” (7). It is difficult to assess the level of evidence that relates a concept like country of birth to other factors in the model, or some of the immunologic evidence that has not been examined in human studies, but generally our three categories (1, strong; 2, moderate; and 3, weak) are analogous to the three USPSTF levels of evidence and should be considered relative measures within the range of factors that we examined.
Mathematical model
We used the structure of the conceptual model as the basis for representing the processes that lead to population-level postmenopausal breast cancer incidence rates. For tractability, the simulation model presented here used a subset of 11 factors that are included in the overall conceptual model. These factors were age, self-identified race/ethnicity (classified as “Black,” “Hispanic or Latina,” “White,” or “Other”), age at menarche, age at first birth, age at menopause, obesity, alcohol consumption, income, tobacco use, use of hormone therapy (HT), and BRCA1/2 genotype. These factors were chosen because they include those with the strongest empirical support for a causal role in breast cancer incidence and because they include factors that are generally modifiable (i.e., income, alcohol, tobacco, and HT use). For each predictive factor, we first derived a marginal distribution for each ethnic group using census data, National Health and Nutrition Examination Survey (NHANES) data, or other sources. We then derived the ethnic group-specific correlation matrix for these predictive factors from NHANES data. We generated a simulated population matching the racial/ethnic composition and age distribution of California women in 2000, using this correlation structure. We then applied a function to predict the cancer incidence based on these risk factors. All simulations were conducted using R v3.0 for Macintosh (www.r-project.org).
This approach allows for the incorporation of results from various types of studies about the relations between variables. Data from published studies can be used to inform various rules (or parameters) encoded in the model. In addition, model output (e.g., predictions) can be contrasted with various types of population data. The distributions of variables can also be based on existing population data. The covariance and distributions of the 11 factors included in our model were calculated from the female population aged 50 years and older in NHANES III, which is a population-based, nationally representative source that was surveyed between 1988 and 1994 (8).
California Department of Finance data for the 2000 census were used to first establish the fraction of the population in each of four race/ethnicity classifications: Black, Hispanic, White, and Other (the latter category includes mixed race, Asian and Pacific Islander, Native American and Native Hawaiian; ref. 9). We then used California Department of Finance demographic data to yield the age structure for women aged 55 years and older for each ethnic classification based on the year 2000 census. The distribution of income by ethnic group was derived from analysis of the 2000 census (10).
For each of the four ethnic classifications, the marginal distribution of the following variables was derived from NHANES data: HT (HT-percent of the population), body mass index (BMI), alcohol use (g/day), age at menopause, age at menarche, age at first birth more than 30 years, income, and tobacco use (percentage of the population). We approximated these marginal distributions by using a piecewise linear cumulative distribution function whose deciles match the ethnic group-specific marginal distribution of each risk factor based on NHANES.
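A piecewise linear cumulative distribution function of this kind can be sampled by inverse transform: a uniform random number is mapped linearly between the two deciles that bracket it. The following Python sketch illustrates the idea under stated assumptions. The helper name `make_inverse_cdf` is hypothetical, the minimum and maximum are assumed to serve as the 0th and 100th percentiles, and the BMI values are placeholders chosen for illustration, not NHANES estimates.

```python
import random


def make_inverse_cdf(quantiles):
    """Return the inverse of a piecewise linear CDF.

    `quantiles` holds the values at equally spaced cumulative
    probabilities from 0 to 1 (11 values for deciles), in
    nondecreasing order.
    """
    n_intervals = len(quantiles) - 1

    def inverse_cdf(u):
        if not 0.0 <= u <= 1.0:
            raise ValueError("u must lie in [0, 1]")
        position = u * n_intervals
        i = min(int(position), n_intervals - 1)  # interval containing u
        frac = position - i                      # position within the interval
        return quantiles[i] + frac * (quantiles[i + 1] - quantiles[i])

    return inverse_cdf


# Placeholder decile values for BMI (kg/m^2); NOT taken from NHANES.
ILLUSTRATIVE_BMI_DECILES = [17.0, 21.0, 23.0, 24.5, 26.0, 27.5,
                            29.0, 31.0, 33.5, 37.0, 50.0]

bmi_inverse_cdf = make_inverse_cdf(ILLUSTRATIVE_BMI_DECILES)

# Inverse-transform sampling: uniform draws mapped onto the marginal.
rng = random.Random(1)
sample = [bmi_inverse_cdf(rng.random()) for _ in range(5)]
print(sample)
```

By construction, the deciles of a large sample drawn this way approach the input deciles, which is the matching property described above.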
We used the normal-to-anything transformation to generate samples from the prescribed marginal distributions and the given correlation structure (11). Calibration was conducted to ensure convergence with tolerance ≤0.01 for each correlation. Cancer incidence was then simulated from the joint distribution of the predictive factors by multiplying the relative hazards for each risk factor, given the other risk factors and the age of each individual.
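The normal-to-anything transformation draws correlated standard normal variates, converts each to a uniform through the normal cumulative distribution function, and then applies the inverse cumulative distribution function of the target marginal. Because the nonlinear mapping changes the correlation, the latent normal correlation has to be tuned until the output correlation is close enough to its target. The Python sketch below illustrates this for two variables only. It is not the authors' R code: the function names, the bisection search, the sample size, and the two example marginals are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def norta_pair(inv_cdf_x, inv_cdf_y, rho_z, n, seed=0):
    """Draw n pairs with the given marginals from a latent bivariate normal."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho_z * rho_z)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho_z * z1 + scale * rng.gauss(0.0, 1.0)  # corr(z1, z2) = rho_z
        # Normal CDF -> uniform; inverse CDF -> target marginal.
        pairs.append((inv_cdf_x(_STD_NORMAL.cdf(z1)),
                      inv_cdf_y(_STD_NORMAL.cdf(z2))))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def calibrate_latent_rho(inv_cdf_x, inv_cdf_y, target, n=2000, tol=0.01, seed=0):
    """Bisect on the latent correlation until the output correlation
    is within `tol` of `target` (same seed reused at every step)."""
    lo, hi = -0.999, 0.999
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        achieved = pearson(norta_pair(inv_cdf_x, inv_cdf_y, mid, n, seed))
        if abs(achieved - target) <= tol:
            break
        if achieved < target:
            lo = mid
        else:
            hi = mid
    return mid, achieved


# Two hypothetical marginals on [0, 1], given by their inverse CDFs.
def uniform_inv(u):
    return u


def skewed_inv(u):
    return u * u


rho_z, achieved = calibrate_latent_rho(uniform_inv, skewed_inv, target=0.3)
print(rho_z, achieved)
```

With 11 factors, the same idea applies to each pairwise correlation, with the latent normals drawn jointly (for example, through a Cholesky factor of the latent correlation matrix) instead of pair by pair.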
Results
Conceptual model
The factors shown in Fig. 1, supported by the literature shown in Table 1, were the final results of our model building process. We divided selected factors into the four domains of interest: sociocultural, behavioral, physical-chemical, and biologic. The table provides a list of each of the relationships (arrows) that are in the model, indicating the starting node, the ending node, the rank of strength of association, and the rank of the quality of evidence supporting this arrow along with the supporting citations. The direction of the influence, increasing or decreasing incidence, is not specified by the model itself, but provided by the supporting literature. The references provided are not exhaustive, but meant to provide a source for the evidence that can be explored or updated further by the interested reader.
Arrow start | Arrow end | Strength | Quality | Reference |
---|---|---|---|---|
Socio/cultural | ||||
Country of birth | Breast feeding | 2 | 2 | (12, 13) |
Country of birth | Education | 1 | 1 | (14, 15) |
Country of birth | Latitude | 1 | 1 | (16) |
Country of birth | Phytoestrogens | 2 | 2 | (17, 18) |
Country of birth | Race/ethnicity | 1 | 1 | (19, 20) |
Education | Age at first birth, parity | 2 | 2 | (21) |
Education | Income | 1 | 1 | (22, 23) |
Education | Occupation | 1 | 1 | (24, 25) |
Education | Physical activity | 2 | 2 | (26) |
Education | Tobacco use | 2 | 2 | (27, 28) |
Income | Age at menarche | 3 | 2 | (29) |
Income | Alcohol | 3 | 3 | (30, 31) |
Income | Environmental tobacco | 1 | 3 | (32, 33) |
Income | HT | 2 | 2 | (34) |
Income | Obesity | 2 | 3 | (35, 36) |
Income | Physical activity | 2 | 3 | (26, 37) |
Income | Tobacco use | 2 | 3 | (38) |
Income | Height | 3 | 2 | (39, 40) |
Occupation | Alcohol | 3 | 2 | (41) |
Occupation | Environmental tobacco | 3 | 2 | (42) |
Occupation | HT | 2 | 2 | (34) |
Occupation | Sleep disturbance | 3 | 3 | (43) |
Race/ethnicity | Age at menarche | 2 | 2 | (29) |
Race/ethnicity | Breast feeding | 3 | 3 | (13, 44) |
Race/ethnicity | Education | 2 | 2 | (45, 46) |
Race/ethnicity | Endocrine disrupting chemicals | 2 | 2 | (47, 48) |
Race/ethnicity | Endogenous hormones | 2 | 3 | (49–51) |
Race/ethnicity | Environmental tobacco | 2 | 2 | (52, 53) |
Race/ethnicity | Income | 2 | 2 | (46) |
Race/ethnicity | Obesity | 3 | 2 | (35) |
Race/ethnicity | Occupation | 2 | 2 | (46, 54) |
Race/ethnicity | Tobacco use | 2 | 2 | (27) |
Physical | ||||
Endocrine disrupting chemicals | Incidence | 2 | 2 | (55, 56) |
Environmental tobacco | Incidence | 3 | 3 | (42, 57) |
Latitude | Vitamin D | 3 | 3 | (58, 59) |
Radiation | Incidence | 2 | 2 | (60, 61) |
Sleep disturbance | Incidence | 3 | 2 | (62, 63) |
Behavioral | ||||
Age at first birth, parity | Breast density | 3 | 3 | (64) |
Age at first birth, parity | Endogenous hormones | 2 | 2 | (65) |
Age at first birth, parity | Incidence | 2 | 2 | (66, 67) |
Alcohol | Incidence | 3 | 2 | (68–71) |
Alcohol | Income | 3 | 3 | (72) |
Breast feeding | Incidence | 3 | 2 | (73) |
HT | Incidence | 2 | 1 | (74, 75) |
Obesity | Age at menarche | 2 | 2 | (76) |
Obesity | Immune function | 3 | 3 | (77, 78) |
Obesity | Incidence | 2 | 2 | (68, 69, 79, 80) |
Obesity | Income | 3 | 3 | (36, 81) |
Obesity | Insulin resistance | 2 | 2 | (82) |
Obesity | Physical activity | 2 | 2 | (83) |
Physical activity | Incidence | 2 | 2 | (68, 69) |
Physical activity | Obesity | 2 | 1 | (84, 85) |
Phytoestrogens | Incidence | 3 | 2 | (86–88) |
Tobacco use | Endogenous hormones | 3 | 2 | (89) |
Tobacco use | Incidence | 3 | 2 | (90, 91) |
Biologic | ||||
Age | Incidence | 1 | 1 | (66) |
Age at menarche | Endogenous hormones | 3 | 2 | (65, 66) |
Age at menarche | Incidence | 3 | 2 | (66, 92) |
Age at menopause | Breast density | 2 | 2 | (93, 94) |
Age at menopause | Endogenous hormones | 3 | 2 | (65) |
Age at menopause | Incidence | 2 | 2 | (66) |
Ancestry | High-penetrance genes | 1 | 1 | (95–98) |
Ancestry | Low-penetrance genes | 1 | 1 | (99–104) |
Ancestry | Race/ethnicity | 1 | 1 | (19, 105) |
Breast density | Incidence | 2 | 2 | (106–109) |
Endogenous hormones | Age at menarche | 1 | 3 | (110–112) |
Endogenous hormones | Age at menopause | 1 | 1 | (113) |
Endogenous hormones | Obesity | 2 | 1 | (113) |
Endogenous hormones | Incidence | 3 | 3 | (65) |
Genotoxins | Incidence | 3 | 3 | (56, 114) |
Height | Incidence | 3 | 3 | (115, 116) |
High-penetrance genes | Incidence | 3 | 1 | (100, 117, 118) |
Immune function | Incidence | 3 | 3 | (119) |
Immune function | Insulin resistance | 2 | 2 | (120, 121) |
Insulin resistance | Immune function | 2 | 2 | (120, 121) |
Insulin resistance | Incidence | 3 | 3 | (122, 123) |
Insulin resistance | Obesity | 3 | 2 | (124, 125) |
Low-penetrance genes | Breast density | 3 | 2 | (126, 127) |
Low-penetrance genes | Height | 3 | 1 | (128, 129) |
Low-penetrance genes | Incidence | 2 | 1 | (99, 100, 102, 103, 130, 131) |
Low-penetrance genes | Insulin resistance | 2 | 1 | (132, 133) |
Low-penetrance genes | Obesity | 3 | 1 | (134–136) |
Vitamin D | Incidence | 2 | 2 | (137, 138) |
NOTE: The starting point for an arrow is the independent variable and the ending point is the dependent variable. Strength of the relationship is categorized as (1) strong (RR > 3.0), (2) modest (RR >1.8–3.0), or (3) weak (RR 1.1–1.8). Quality of the evidence is categorized as (1) strong, (2) moderate, or (3) weak.
The complexity of the model (Fig. 1) not only illustrates the many factors thought to have a causal role in breast cancer etiology, but may also lead readers to consider missing or interesting areas for further clarification or research. Although many of the well-documented risk factors for postmenopausal breast cancer incidence show arrows going directly to this outcome in our model, there are also a number of other factors that mediate or moderate their relation to breast cancer. For example, many of the factors in the social environment domain are mediated by behavioral factors that affect postmenopausal breast cancer incidence. There are also a few instances where arrows go in both directions between factors in the model, indicating that there may be bidirectional influences. Although we sought to be as inclusive as possible in documenting interacting connections between factors, the arrows from two factors were left out of the picture for clarity of presentation: age and country of birth. Both of these factors influence the majority of other factors in our model, but the addition of this many arrows would obscure the other connections presented.
The nature of the four domains is described in Table 2 and they should best be considered to be heuristic categories, rather than absolute. Indeed, one of our primary motivations for the development of this model was that factors are interdependent and may not neatly fit into one category.
Domain | Description |
---|---|
Physical-chemical environment | Factors that fit most with the physical world, both the natural world and the physical world as created by humans. Factors in this category range from the latitude of residence, to physical and chemical environmental hazards that are located near work and housing. In our conceptual model, most of these factors are influenced at least in part by the social environment. |
Sociocultural environment | Social and cultural factors that are dependent on geography, social networks, and social class that affect access to resources and networks. Factors in this category range from country of birth to occupation. |
Individual behavioral | Individual choices that are made on diet, physical activity, smoking, HT, and child bearing. In our conceptual model, most of these factors are influenced by the social environment and in turn have effects on biologic pathways. |
Biologic | Biologic states of individuals that range from morphologic (e.g., breast density) to genetic (e.g., BRCA1). In our conceptual model, many of these factors are impacted by individual behaviors and the social and physical environment. |
Mathematical model
Figure 2 shows the relationships between the subset of 11 factors that are included in the mathematical model. We applied relative hazard estimates for each of these factors, derived from systematic reviews and high-quality studies (66, 67, 70, 71, 74, 75, 79, 80, 91, 92, 100, 117, 118, 139), together with race/ethnic group-specific estimates of the joint distribution of these factors derived from NHANES III, to the population of California excluding sparsely populated areas (Table 3). We then applied a base rate of incidence of breast cancer to the lowest risk group and simulated the progression of cancer in each ethnic group over time. Our results produce incidence rates that closely approximate the observed rates in California in the time period under consideration (i.e., ∼2000). We also reproduce the differences observed for self-reported race and ethnicity. Established “nonmodifiable” risk factors affect risks in the direction and with the magnitude expected.
Name | Marginal distribution (source) | Category | Meta-analysis estimated relative hazard and CI | Meta-analysis reference (if available and applicable) |
---|---|---|---|---|
Race-age (y) | California censusᵃ | | See Supplementary Table S1 for matrix of risk estimates. | N/A |
Age at first birth (years, all among parous women) | NHANESᵃ | <20 y | 1.00 | (140) |
 | | 20–24 | 1.04 (0.91–1.18) | |
 | | 25–29 | 1.17 (1.02–1.34) | |
 | | 30–34 | 1.30 (1.11–1.51) | |
 | | 35+ | 1.40 (1.15–1.70) | |
Age at menarche (y) | NHANESᵃ | <11 | 1.19 (1.13–1.25) | (139) |
 | | 11 | 1.09 (1.06–1.12) | |
 | | 12 | 1.07 (1.05–1.09) | |
 | | 13 | 1.00 (0.98–1.02) | |
 | | 14 | 0.98 (0.96–1.00) | |
 | | 15 | 0.92 (0.89–0.95) | |
 | | ≥16 | 0.82 (0.79–0.85) | |
Age at menopause (y) | NHANESᵃ | <40 | 0.67 (0.62–0.73) | (139) |
 | | 40–44 | 0.73 (0.70–0.77) | |
 | | 45–49 | 0.86 (0.84–0.89) | |
 | | 50–54 | 1.00 (0.98–1.02) | |
 | | ≥55 | 1.12 (1.07–1.17) | |
Alcohol use (g/day) | NHANESᵃ | 0 | RR 1.00 (SE 0.015) | (141) |
 | | <5 | RR 1.01 (SE 0.020) | |
 | | 5–14 | RR 1.01 (SE 0.023) | |
 | | 15–24 | RR 1.19 (SE 0.048) | |
 | | 25–34 | RR 1.22 (SE 0.056) | |
 | | 35–44 | RR 1.18 (SE 0.093) | |
 | | ≥45 | RR 1.49 (SE 0.110) | |
 | | Or: increase in relative risk per 10 g/day | 7.1% (1.3%) | |
BRCAᵇ | 0.1% | | 5.0 (4–6) estimated | |
Race/ethnicity | California census | | N/A | |
HT | NHANESᵃ | Current combined vs. never use | 2.14 (2.04–2.24) | (142) |
 | | Current estrogen-only vs. never use | 1.32 (1.24–1.39) | |
Obesity | NHANESᵃ | Risk ratio per 5 U of BMI for postmenopausal breast cancer | 1.12 (1.08–1.16) | (143) |
Tobacco use | NHANESᵃ | Ever vs. never smokers, natural menopause at age <45 y | RR 1.11 (SE 0.15) | (141) |
 | | Ever vs. never smokers, natural menopause at age 45–49 y | RR 0.98 (SE 0.08) | |
 | | Ever vs. never smokers, natural menopause at age ≥50 y | RR 1.12 (SE 0.06) | |
aRace/ethnic group–specific marginal distributions were derived from NHANES data; see Table 2 for details.
bBRCA was assumed to have a population frequency of 0.1% in all race/ethnic groups independent of all other risk factors.
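Table 3 reports two effects per unit of exposure: a risk ratio of 1.12 per 5 units of BMI and a 7.1% increase in relative risk per 10 g/day of alcohol. The sketch below shows one way such per-step estimates could be scaled to other exposure differences. It assumes a log-linear dose-response, which is an illustrative assumption and not necessarily the functional form used in the simulation model; the function name `scale_relative_risk` is hypothetical.

```python
def scale_relative_risk(rr_per_step: float, step: float, delta: float) -> float:
    """Scale a per-step relative risk to an exposure difference `delta`.

    Assumes a log-linear dose-response (illustrative assumption only):
    RR(delta) = rr_per_step ** (delta / step).
    """
    return rr_per_step ** (delta / step)


# Per-step estimates taken from Table 3.
RR_PER_5_BMI_UNITS = 1.12      # risk ratio per 5 units of BMI
RR_PER_10_G_ALCOHOL = 1.071    # 7.1% increase in relative risk per 10 g/day

# Hypothetical exposure differences chosen only for illustration.
rr_bmi_10_units = scale_relative_risk(RR_PER_5_BMI_UNITS, step=5, delta=10)
rr_alcohol_20_g = scale_relative_risk(RR_PER_10_G_ALCOHOL, step=10, delta=20)

print(f"BMI +10 units:     RR ~ {rr_bmi_10_units:.3f}")
print(f"Alcohol +20 g/day: RR ~ {rr_alcohol_20_g:.3f}")
```

Under this assumption, doubling the exposure difference squares the relative risk, which is why the categorical estimates in Table 3 remain the more direct source for any specific exposure level.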
Table 4 presents the estimated impact of population-level changes in five of these selected factors, considered one at a time within categories of age and race/ethnicity while adjusting for the effects of the other variables. Overall, the impact of interventions or changes in modifiable risk factors is modest. However, the model suggests that the greatest impact in 2000 would come from reductions in HT use and an increase in the age at menarche, followed by reductions in excess BMI. Reductions in alcohol or tobacco use have only small effects, as would be expected from the levels of attributable risk estimated from prospective cohort studies (70, 90). Interestingly, if the decline in the age at menarche over the last century (144, 145) were to be reversed (by a year or 18 months), breast cancer incidence would be reduced by 5.5% overall.
| Predictive factor | Degree change | Total (White, Black, Latino) | SD | 55–64 y | SD | 65–74 y | SD | 75+ y | SD | White | SD | Black | SD | Latina | SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total observed | | 379 | | 314 | | 451 | | 423 | | 430 | | 379 | | 254 | |
| Total simulated | | 393.0 | 0.8 | 306.9 | 1.3 | 452.7 | 1.8 | 450.9 | 1.6 | 431.7 | 1.1 | 364.1 | 2.5 | 245.4 | 1.4 |
| Excess BMI | 50% decrease | 384.4 | 0.8 | 300.4 | 1.2 | 442.6 | 1.7 | 440.9 | 1.5 | 423.3 | 1.0 | 349.4 | 2.4 | 238.3 | 1.4 |
| | 100% decrease | 375.8 | 0.8 | 293.8 | 1.2 | 432.5 | 1.6 | 430.9 | 1.5 | 414.9 | 1.0 | 334.7 | 2.2 | 231.2 | 1.3 |
| Alcohol consumption | 25% decrease | 391.9 | 0.8 | 305.9 | 1.3 | 451.4 | 1.7 | 449.7 | 1.6 | 430.5 | 1.0 | 363.5 | 2.5 | 244.3 | 1.4 |
| | 50% decrease | 389.5 | 0.8 | 303.9 | 1.3 | 448.7 | 1.7 | 447.1 | 1.6 | 427.5 | 1.0 | 362.5 | 2.5 | 243.6 | 1.4 |
| Tobacco use: % of population | 25% decrease | 392.0 | 0.8 | 305.8 | 1.3 | 451.5 | 1.8 | 450.1 | 1.6 | 430.5 | 1.1 | 362.9 | 2.5 | 244.8 | 1.4 |
| | 50% decrease | 390.9 | 0.8 | 304.6 | 1.3 | 450.3 | 1.7 | 449.4 | 1.6 | 429.3 | 1.1 | 361.8 | 2.5 | 244.3 | 1.4 |
| Age at menarche | 1 y increase | 377.4 | 0.8 | 294.3 | 1.2 | 434.5 | 1.7 | 433.5 | 1.5 | 415.3 | 1.0 | 346.9 | 2.4 | 233.5 | 1.4 |
| | 1.5 y increase | 371.7 | 0.8 | 289.8 | 1.2 | 428.0 | 1.7 | 427.0 | 1.5 | 409.1 | 1.0 | 341.4 | 2.4 | 229.8 | 1.3 |
| HT: % of population | 50% decrease | 288.3 | 0.7 | 225.2 | 1.1 | 332.1 | 1.6 | 330.7 | 1.4 | 316.7 | 1.0 | 267.1 | 2.3 | 180.0 | 1.3 |
| | 100% decrease | 183.7 | 0.4 | 143.4 | 0.6 | 211.5 | 0.8 | 210.7 | 0.7 | 201.7 | 0.5 | 170.1 | 1.2 | 114.7 | 0.7 |
NOTE: Rates were simulated from 100,000 persons with 800 iterations and were age adjusted to the 2000 U.S. Standard Population (19 age groups - Census P25-1130: http://www.census.gov/prod/1/pop/p25-1130/p251130.pdf). The simulated incidence rates were from one parameter set using the average value in Table 3.
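The relative size of each scenario's effect can be read off the "Total" column of Table 4 by comparing each simulated rate with the no-intervention rate. The following is a minimal sketch of that arithmetic; the rates are copied from Table 4, while the names `BASELINE`, `scenarios`, and `percent_reduction` are illustrative and not part of the original model.

```python
BASELINE = 393.0  # simulated total rate per 100,000 with no intervention (Table 4)

# Simulated total rates per 100,000 under each scenario (Table 4, "Total" column).
scenarios = {
    "Excess BMI, 50% decrease": 384.4,
    "Excess BMI, 100% decrease": 375.8,
    "Alcohol consumption, 25% decrease": 391.9,
    "Alcohol consumption, 50% decrease": 389.5,
    "Tobacco use, 25% decrease": 392.0,
    "Tobacco use, 50% decrease": 390.9,
    "Age at menarche, 1 y increase": 377.4,
    "Age at menarche, 1.5 y increase": 371.7,
    "HT, 50% decrease": 288.3,
    "HT, 100% decrease": 183.7,
}


def percent_reduction(baseline: float, rate: float) -> float:
    """Relative reduction in the simulated rate, as a percentage of baseline."""
    return 100.0 * (baseline - rate) / baseline


reductions = {name: percent_reduction(BASELINE, rate) for name, rate in scenarios.items()}

# List scenarios from largest to smallest relative reduction.
for name, pct in sorted(reductions.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:36s} {pct:5.1f}%")
```

Ranking the scenarios this way reproduces the ordering described in the text: HT first, then age at menarche, then excess BMI, with alcohol and tobacco well behind.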
Incidence rates by the three major race/ethnic categories were, as expected, highest in Whites, lowest in Latinas, and intermediate in Blacks. The influence of changes in any of the risk factors did not differ substantially by race/ethnic category.
We assessed the absolute reductions in breast cancer incidence associated with modifications in risk factors in this model. A 50% reduction in the proportion of the population with a BMI ≥30 kg/m2 would decrease the simulated rate of invasive breast cancer from 393/100,000 (no intervention) to 384.4/100,000, for an estimated total of 386 fewer cases of breast cancer per year in the state of California in 2010 with a census population of 4,486,843 women over 55 years of age. A reduction in the use of HT by 50%, which in 2000 was still very common, resulted in a rate of 288.3/100,000, or 4,697 fewer breast cancer cases.
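The case counts above follow from applying the difference in simulated rates to the stated population of women. A minimal sketch of that calculation is given below, using only figures quoted in the text; the function name `cases_averted` is illustrative. Applying an age-adjusted rate directly to a population count is a simplification that mirrors the calculation as described here.

```python
def cases_averted(baseline_rate: float, intervention_rate: float, population: int) -> float:
    """Annual cases averted, given rates per 100,000 and a population size."""
    return (baseline_rate - intervention_rate) / 100_000 * population


WOMEN_OVER_55 = 4_486_843   # California census population quoted in the text
BASELINE_RATE = 393.0       # simulated rate per 100,000, no intervention

bmi_cases = cases_averted(BASELINE_RATE, 384.4, WOMEN_OVER_55)  # 50% fewer with BMI >= 30
ht_cases = cases_averted(BASELINE_RATE, 288.3, WOMEN_OVER_55)   # 50% less HT use

print(f"Excess BMI, 50% decrease: {bmi_cases:.1f} fewer cases per year")
print(f"HT, 50% decrease:         {ht_cases:.1f} fewer cases per year")
```

The BMI figure rounds to the 386 cases reported in the text, and the HT figure corresponds to the reported 4,697 when truncated to a whole number.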
Discussion
Our conceptual model (Fig. 1) is a graphical depiction of the relationships between factors that play a role in the etiology of postmenopausal breast cancer. This model indicates a wide range of factors at different levels of organization that affect each other and together influence breast cancer incidence rates. The findings from the mathematical model begin to illustrate how accounting for multiple factors can be used to integrate various types of evidence and to estimate the impact of various types of manipulations or interventions. Importantly, the exercise of specifying these models highlights gaps in our understanding and suggests new areas for data collection (e.g., endocrine disrupting chemicals, medical radiation exposure).
Existing models for breast cancer causation have been very successful in some ways, but not in others. For the past three decades, one of the most recognized and widely used conceptual frameworks of cancer causation overall has been the 1981 analysis of Doll and Peto, frequently referred to as a model, which apportions cancer causation among domains of causes (e.g., environment, diet; ref. 146). This approach has more recently been updated for the United Kingdom (147) and for the United States (148). These perspectives have been helpful in showing that there are multiple causes and in identifying which factors might be further investigated. However, these causal factors have been viewed as relatively separate and mutually exclusive. For example, the impact of the environment has been regarded as separate from the impact of individual-level behavior, although these domains frequently interact in cancer causation.
There are also a number of specific prediction models of cancer; the best recognized and most widely used for breast cancer is the Gail model (149). This model has proved useful in clinical applications for predicting an individual's future risk of breast cancer, based on a number of biologic and behavioral factors, including the number of biopsies taken.
Mathematical models have also been used to explore the role of risk factors in breast cancer. For example, the Pike model has been used to explain the age–incidence curve in breast cancer (150). Modifications of this model based on parameters from cohort data have refined it and improved its accuracy, with a focus on individual-level reproductive risk factors. Rosner and Colditz extended the Pike model using a log-incidence model to show the effects of reproductive risk factors on breast cancer incidence using data from the Nurses' Health Study (151). Our model builds on this prior work by considering potentially causal variables in multiple domains and at multiple levels that are potentially amenable to intervention.
More recently, interest in complex systems modeling has grown in health-related work. A notable example is the Foresight model, a conceptual model of the causes of obesity that was developed by an expert commission in the United Kingdom (152). The development of this model was not constrained by parsimony, and the model includes more than 100 causal factors. Another model, developed by Galea and colleagues, depicts how a complex systems approach can shed light on how measures of social class and neighborhood influence health and disease across time (153). Auchincloss and colleagues have developed agent-based models to examine neighborhood influences on diet and walking (154).
Our model, which focuses only on postmenopausal breast cancer, has a number of strengths. First, it allows for the consideration of multiple levels of causes across multiple domains, including factors that are not typically a part of particular research domains (e.g., country of origin, sleep disturbance). Second, it is designed to focus consideration on the prevention of breast cancer, as compared with prior models that were more focused on treatment. Third, it is adaptable as new evidence comes to light. Finally, it can help highlight gaps where, because data are lacking or evidence is weak, further exploration and research are needed.
There are also a number of limitations and self-imposed constraints in our model. First, many of the relationships (i.e., arrows) in our conceptual model relied on evidence that may not reflect true causal effects but may instead be the result of confounding or bias. Another limitation is that, although more risk factors could have been included (e.g., metals, mother's age at menarche), it was the view of the expert committee that inclusion of more risk factors with a lower level of supporting evidence would diminish the interpretability of the model. We also understand that risk factors may differ for pre- and postmenopausal breast cancer, by ER status, and by breast cancer subtype (e.g., basal cell type). Future modifications of the model could expand to include subtypes as more information on risk factors emerges. Finally, breast cancer is a neoplasm with risk factors that come into play all along the life course (155). Although we included breastfeeding, age at menarche, height, age at first birth, and age at menopause, we intend to explore better ways to display life course etiologic factors in future versions of the conceptual model.
There were also a number of limitations to the implementation of the mathematical model. First, at this stage of development, we did not include feedback in this model (e.g., the effect of alcohol consumption on income), although feedback is a characteristic of complexity models and we plan to consider it for subsequent versions. We also assumed linear relationships between factors in the models, which is almost certainly not correct. Unfortunately, very few data exist on more detailed specifications of relationships between variables, and we did not include interactions between factors in this model. Finally, there are interdependencies between individuals, such as have been demonstrated for obesity and tobacco use in studies of social networks (156, 157), which will require more development in the future. Feedbacks and dependencies are two hallmarks of complex systems and are undoubtedly present in the system that results in breast cancer.
Our conceptual model and the complex system it depicts can be used in a number of ways by researchers, policymakers, and the public (158). Researchers may be aided in seeing the “big picture” of breast cancer causation and where new transdisciplinary research is needed. This model should help make the connections across domains and areas of disciplinary expertise. It also highlights gaps in knowledge for further research, such as a better understanding of the role of environmental toxicants and their interactions with social factors, the biologic pathways for environmental and behavioral factors, and the need to strengthen evidence where the quality of data is weak. For policymakers, the model can help identify the multiple avenues for primary prevention and the need for resources and funding in specific areas. It may make clearer the need for considering multiple domains for interventions within their jurisdictions and areas of influence. The public, breast cancer advocates, and other lay stakeholders can use the model to understand the complexities in breast cancer causation (i.e., there is not “a cause” of breast cancer), and that these causes extend beyond genetic susceptibility, traditional reproductive and lifestyle risk factors, and potential environmental carcinogens.
In conclusion, we hope the growing understanding of the etiology of breast cancer will allow us to modify the relationships and pathways in this newly developed model and make it a continuously updated resource for scientists and the public. The model can and should be modified with the results of new knowledge and research. Equally important, although our model is focused on the causes of postmenopausal breast cancer, this approach can serve as a framework to be applied to other diseases and outcomes. By doing so, it can lead to better appreciation of the complex interplay of causal factors at multiple organizational levels, while simultaneously providing greater clarity on avenues for prevention. Complex systems approaches are increasingly called for in population health research yet specific applications to research questions beyond infectious disease remain rare. We have developed a conceptual model of breast cancer and operationalized it through a mathematical model. The advantages of this approach are illustrated in terms of (i) formulating dynamic conceptual models of breast cancer etiology that are explicit and can be understood and debated by various stakeholders, (ii) integrating various types of data, (iii) identifying the possible impact of various interventions and policies, and (iv) identifying gaps in knowledge where new data are needed.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: R.A. Hiatt, T.C. Porco, J. Barlow, A.V. Diez-Roux, L.H. Kushi, M.M. Moasser, D.H. Rehkopf
Development of methodology: R.A. Hiatt, T.C. Porco, J. Barlow, L.H. Kushi, Z. Werb, G.C. Windham, D.H. Rehkopf
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): R.A. Hiatt, L.H. Kushi, Z. Werb, G.C. Windham, D.H. Rehkopf
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): R.A. Hiatt, T.C. Porco, F. Liu, D. Braithwaite, L.H. Kushi, M.M. Moasser, Z. Werb, D.H. Rehkopf
Writing, review, and/or revision of the manuscript: R.A. Hiatt, A. Balmain, J. Barlow, D. Braithwaite, A.V. Diez-Roux, L.H. Kushi, M.M. Moasser, Z. Werb, G.C. Windham, D.H. Rehkopf, K. Balke
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): R.A. Hiatt, K. Balke, D.H. Rehkopf
Study supervision: R.A. Hiatt, D.H. Rehkopf
Acknowledgments
The authors thank Elad Ziv, Peggy Reynolds, Laura van 't Veer, and Jeanne Rizzo for their comments on an early draft of the article.
Grant Support
This work was supported by the California Breast Cancer Research Program (15QB-8301; to R.A. Hiatt, T.C. Porco, K. Balke, A. Balmain, J. Barlow, A.V. Diez-Roux, L.H. Kushi, M.M. Moasser, Z. Werb, G.C. Windham, and D.H. Rehkopf).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.