Abstract
The etiology of breast cancer is a complex system of interacting factors from multiple domains. New knowledge about breast cancer etiology continues to be produced by the research community, and the communication of this knowledge to other researchers, practitioners, decision makers, and the public is a challenge.
We updated the previously published Paradigm model (PMID: 25017248) to create a framework that describes breast cancer etiology in four overlapping domains of biologic, behavioral, environmental, and social determinants. This new Paradigm II conceptual model was part of a larger modeling effort that included input from multiple experts in fields from genetics to sociology, taking a team and transdisciplinary approach to the common problem of describing breast cancer etiology for the population of California women in 2010. Recent literature was reviewed with an emphasis on systematic reviews when available and larger epidemiologic studies when they were not. Environmental chemicals with strong animal data on etiology were also included.
The resulting model illustrates factors with their strength of association and the quality of the available data. The published evidence supporting each relationship is made available herein, and also in an online dynamic model that allows for manipulation of individual factors leading to breast cancer (https://cbcrp.org/causes/).
The Paradigm II model illustrates known etiologic factors in breast cancer, as well as gaps in knowledge and areas where better quality data are needed.
The Paradigm II model can be a stimulus for further research and for better understanding of breast cancer etiology.
Introduction
There has been substantial research to identify breast cancer risk factors and effective methods of breast cancer prevention, but the incidence of invasive breast cancer has not decreased substantially in recent years. In 2019, breast cancer remained the most common cancer in women, with an estimated 268,600 new cases and 41,760 deaths in the United States (1, 2). Traditional prevention efforts have largely focused on risk factors proximal to the clinical diagnosis. More may be learned about etiology and prevention if we broaden our attention to the full complexity of the causal pathways involved. It is clear that there is no single cause of breast cancer. Rather, the causes of breast cancer are complex and work through many potentially interacting pathways. Furthermore, these risk factors and pathways may operate differently at different times during a woman's life and can have different implications for breast cancer risk depending on the period and duration of exposure (3, 4).
Epidemiologic research over the past several decades has identified many reproductive and behavioral risk factors important in breast cancer etiology, ranging from postmenopausal hormone therapy (HT) and alcohol use to mammographic density. In spite of these advances, there remain many potential risk factors for which the biological plausibility of a causal role is high but substantive epidemiologic data are lacking, owing to challenges in accurate exposure measurement or inadequate control for confounding. For example, the association of higher socioeconomic status (SES) with higher breast cancer incidence is complex, and the relative importance of SES factors is incompletely understood (5). Similarly, the mechanisms of action of endocrine-disrupting chemicals (EDCs) are well documented biologically (6), and many EDCs are listed as known human carcinogens by the International Agency for Research on Cancer (7, 8). However, the short half-lives and low exposure levels of most of these chemicals make it difficult to establish causal relationships with epidemiologic approaches (9). Identifying proximate risk factors for disease is important because they are often more easily defined and measured and may be more amenable to public health intervention. However, an understanding of the upstream determinants of these proximate risk factors may be useful to inform interventions that seek to prevent breast cancer at a societal or population level.
Many individual risk factors operate on multiple causal pathways to breast cancer or act as proxies or noncausal risk factors for upstream determinants of the disease. Some factors operate directly on breast cancer (e.g., endogenous hormones), while others operate indirectly through intermediaries (e.g., income via alcohol use). Together, these factors form complex causal pathways that are often not captured by the traditional risk-factor epidemiology approach to cancer prevention. Ignoring the fundamental “causes of causes” can lead to interventions targeted at noncausal risk factors that may be ineffective in preventing disease (10). A more systems-based understanding of causal pathways to breast cancer incidence may help to inform population- and individual-level prevention strategies and to reveal gaps in the existing literature where more research on mechanisms or pathways to carcinogenesis is needed.
Many models exist to predict the occurrence of breast cancer in individuals (11–13). These models often use information on commonly collected risk factors and their association with breast cancer to inform risk prediction scores, representing the absolute risk of breast cancer in the next 5 or 10 years or over the lifetime. Because these models do not attempt to structure the temporal order and directionality between risk factors, they are limited in guiding prevention efforts at the population level. To address this limitation of prior models, Hiatt and colleagues (14) developed the Paradigm Model, which may be the first conceptual framework of postmenopausal breast cancer incidence taking a complex systems, population-level approach. This framework complemented clinical risk prediction models through an enhanced conceptual understanding of how physical, behavioral, and social factors “get under the skin” to cause biological processes that result in breast cancer. In this article we report on an extension of the initial Paradigm conceptual framework (14). Additional work is being conducted to create a mathematical agent-based model of breast cancer incidence.
Materials and Methods
We sought to extend the original Paradigm conceptual framework in three ways. First, we included a more in-depth consideration of genetic and biological pathways. Second, we extended the prior model to include both premenopausal and postmenopausal breast cancer, as risk factors vary over the life course (15). Third, we considered evidence from animal models to inform aspects of the model where human evidence was more limited.
We assembled a multidisciplinary panel with expertise in epidemiology, biostatistics, mathematical and agent-based modeling, breast cancer biology, toxicology, genetics, population health, and breast cancer advocacy. Members of the panel convened three times a year for 2 years to discuss and develop the conceptual framework presented here. Individual meetings and conversations were arranged in the intervals between full panel meetings to address specific questions.
Selection of variables
The multidisciplinary expert panel began by reviewing the variables included in the conceptual framework published by Hiatt and colleagues (14). We identified new literature for each risk factor and pathway where it was available and revised the strength and quality of each association on the basis of new evidence. For proximate risk factors directly affecting breast cancer incidence (e.g., physical activity to breast cancer), we identified the most recent meta-analysis or systematic review. Where no systematic review, meta-analysis, or pooled analysis was available, we identified the two largest, most recent cohort studies. For relationships between distal (i.e., indirect) and proximate (i.e., direct) factors (e.g., education to physical activity), we sought input from the expert panel to identify meta-analyses where possible and recent, high-quality literature where meta-analyses or systematic reviews were unavailable. For upstream factors of general relevance (e.g., income), we used effect estimates for women if results stratified by gender were available; if data were not stratified by gender (e.g., for factors not leading directly to incident breast cancer), we used overall estimates. For interdependencies between social and economic factors, we used U.S. Department of Labor statistics or nationally representative surveys (i.e., the National Health and Nutrition Examination Survey and the National Health Interview Survey; refs. 16, 17) where possible to minimize selection bias and ensure adequate sample sizes.
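The evidence-selection rule for proximate risk factors can be sketched as a small function. This is a minimal illustration, not the panel's actual workflow: the `Study` record, its fields, the `select_evidence` name, and the tie-breaking order (cohort size first, then publication year) are hypothetical assumptions, and pooled analyses are treated alongside meta-analyses and systematic reviews as evidence syntheses.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record for one published study; field names are illustrative
# assumptions, not part of the Paradigm II methods.
@dataclass
class Study:
    citation: str
    design: str  # e.g. "meta-analysis", "systematic review", "pooled analysis", "cohort"
    year: int
    n_participants: int


# Designs treated as evidence syntheses (assumption: pooled analyses count here).
SYNTHESIS_DESIGNS = {"meta-analysis", "systematic review", "pooled analysis"}


def select_evidence(studies: List[Study]) -> List[Study]:
    """Pick evidence for one proximate risk factor -> incidence relationship.

    Prefer the single most recent evidence synthesis; otherwise fall back
    to the two largest, most recent cohort studies.
    """
    syntheses = [s for s in studies if s.design in SYNTHESIS_DESIGNS]
    if syntheses:
        return [max(syntheses, key=lambda s: s.year)]
    cohorts = [s for s in studies if s.design == "cohort"]
    # One reading of "largest, most recent": rank by size, then by recency.
    cohorts.sort(key=lambda s: (s.n_participants, s.year), reverse=True)
    return cohorts[:2]
```

Ranking cohorts by size before recency is a judgment call; a panel could just as reasonably restrict to a recent publication window first and then take the two largest studies.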
We added several variables that were not included in the original framework (14), including polyaromatic hydrocarbons (PAH), polychlorinated biphenyls (PCB), bisphenol A (BPA), phthalates, and benzene. These specific chemical classes replaced the general term “EDCs,” because the human evidence for them varies across chemical classes. Among other factors, we also added genetic ancestry, genetic polymorphisms, inflammation, light at night, and workplace. In addition to adding new risk factors, we added pathways where the most recent literature provided new evidence of an association and removed pathways where the evidence was not conclusive enough to suggest a true association with breast cancer. Despite this effort, we were not able to include all possible etiologic factors or provide risks for subcategories of breast cancer in this framework. The new variables, and the strength of association and quality of evidence assigned to them, were derived from the expertise of the multidisciplinary panel and our review of the most recent literature.
Each variable was classified into one of four domains: biological, behavioral, social, and physical. Although these domains are often overlapping (e.g., obesity has components of biology, behavior, social, and physical environment), we used the domains to emphasize the multifactorial nature of chronic disease etiology. Furthermore, we recognized that all factors must eventually operate through biologic pathways, although their origin itself may not be biological (e.g., education). However, for many of these relationships, the biological mechanisms are unknown and could not be included.
Inclusion and exclusion
We sought to include risk factors for both premenopausal and postmenopausal breast cancer incidence. Where a risk factor's association differed between premenopausal and postmenopausal breast cancer (e.g., obesity), this was reflected in the relationships the factor had with other variables in the model. The risk factors and pathways in the framework represent risks for invasive female breast cancer, not ductal carcinoma in situ or male breast cancer. Because data on the differential effects of risk factors by breast cancer subtype (e.g., Luminal A, Luminal B, HER2+, and triple-negative) are currently limited, we could not stratify effects by tumor subtype in this version of the model. Although more data exist on factors related to estrogen receptor–positive (ER+) versus ER− cancers (18), we also elected not to create separate submodels for these two outcomes. Most factors in the model relate to ER+ breast cancer (18).
Strength of association
We assigned each relationship a strength score reflecting the strength of the association between the variables in the available literature. Strength scores were based on RRs or standardized regression coefficients and their confidence limits, and ranged from 1 to 3, with 1 representing the strongest association. The three categories were as follows: category 1, RR >3.0 or regression coefficient 0.6–1 (strong); category 2, RR >1.8–3.0 or coefficient 0.3–0.6 (modest); and category 3, RR 1.1–1.8 or coefficient 0–0.3 (weak). These categories are based on the previous conceptual model (14) and were originally adapted from the levels used in the Harvard Cancer Risk Index (19), which allowed reasonably good differentiation of the strength of association for breast cancer–related risk factors (18). When no or only weak human epidemiologic studies were available for selected variables, we used strong (B1) or modest (B2) designations based on animal or mechanistic studies. These relationships are difficult to examine in epidemiologic studies, and a clear understanding of the biological mechanisms in humans is lacking, because of methodologic issues related to exposure assessment, such as the short half-lives of the toxicants, measures of duration, or the timing of critical exposures over the life course (7).
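The strength thresholds above can be expressed as a short function. This sketch makes assumptions the text does not state: protective associations (RR < 1) are scored on the reciprocal scale, the shared boundaries of the coefficient ranges (0.3 and 0.6) go to the stronger category, and confidence limits are ignored. The function names are hypothetical, and the B1/B2 designations come from animal or mechanistic evidence rather than from an RR, so they are outside this function.

```python
from typing import Optional


def strength_from_rr(rr: float) -> Optional[int]:
    """Map a relative risk to a strength category (1 = strong, 3 = weak).

    Assumption: RR < 1 is scored on the reciprocal scale (RR 0.5 ~ RR 2.0).
    Returns None when the effect is below the weakest category (< 1.1).
    """
    if rr <= 0:
        raise ValueError("RR must be positive")
    r = rr if rr >= 1 else 1 / rr
    if r > 3.0:
        return 1  # strong: RR > 3.0
    if r > 1.8:
        return 2  # modest: RR > 1.8 to 3.0
    if r >= 1.1:
        return 3  # weak: RR 1.1 to 1.8
    return None


def strength_from_coefficient(beta: float) -> int:
    """Map a standardized regression coefficient to a strength category.

    Assumption: the sign is ignored and boundary values (0.3, 0.6) are
    assigned to the stronger category.
    """
    b = abs(beta)
    if b >= 0.6:
        return 1  # strong: 0.6 to 1
    if b >= 0.3:
        return 2  # modest: 0.3 to 0.6
    return 3  # weak: 0 to 0.3
```

For example, `strength_from_rr(2.0)` and `strength_from_rr(0.5)` both fall in category 2 under the reciprocal-scale assumption.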
Quality of the data
Quality scores were included for each relationship represented in the conceptual model to reflect the strength of study design and execution of the research. These scores also demonstrate where higher quality data are needed and allow for inclusion of factors with putative biological pathways but little epidemiologic evidence (e.g., chemical exposures). We used the U.S. Preventive Services Task Force (USPSTF; ref. 20) guidelines on the quality of evidence, which assigns grades on a 3-point scale (good, fair, and poor). Poor quality evidence, score 3, suggests that “evidence is insufficient to assess the effects on health outcomes because of limited number or power of studies, important flaws in their design or conduct, gaps in the chain of evidence, or lack of information on important health outcomes.” Fair evidence, score 2, suggests that “evidence is sufficient to determine effects on health outcomes, but the strength of the evidence is limited by the number, quality, or consistency of the individual studies, generalizability to routine practice, or indirect nature of the evidence on health outcomes.” Good evidence, score 1, suggests that “evidence includes consistent results from well-designed, well-conducted studies in representative populations that directly assess effects on health outcomes.” (20). To assess quality, the USPSTF uses a hierarchy of study designs with “properly powered and conducted randomized controlled trial (RCT); well-conducted systematic review or meta-analysis of homogeneous RCTs” at the top.
Variables that received scores of 3 for strength of association and 3 for quality of the literature were not included in this conceptual framework, as the poor quality of the data and the low RRs highlight true uncertainty about the causal nature of these risk factors.
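The quality scale and the exclusion rule can be encoded as follows. This is a hypothetical encoding: `Quality` and `passes_inclusion_rule` are illustrative names, and treating B1/B2 strengths as never meeting the "strength 3" exclusion condition is an assumption.

```python
from enum import IntEnum
from typing import Union


class Quality(IntEnum):
    """USPSTF-style quality grades as described in the text (1 = best)."""
    GOOD = 1
    FAIR = 2
    POOR = 3


# Strength is 1, 2, or 3 from human data, or "B1"/"B2" from animal or
# mechanistic evidence (assumed never to count as "strength 3").
Strength = Union[int, str]


def passes_inclusion_rule(strength: Strength, quality: int) -> bool:
    """Return False only when an item is both weak (3) and poor quality (3).

    Quality(quality) raises ValueError for grades outside 1-3.
    """
    return not (strength == 3 and Quality(quality) is Quality.POOR)
```

Keeping quality as an `IntEnum` lets plain integers from a spreadsheet (1, 2, 3) and named grades be used interchangeably while still rejecting out-of-range values.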
Results
The breast cancer risk factors and pathways, grouped by domain, are presented in Table 1 (21–152). The relationships among risk factors, and between risk factors and breast cancer incidence, including the strength and direction of each association and the quality of the literature, are illustrated in Fig. 1. Table 1 notes the direction of each relationship as “increases,” “decreases,” or “affects” when there is no directionality per se (e.g., race/ethnicity affects age at menarche).
Table 1. Breast cancer risk factors and pathways, grouped by domain

Arrow start | Direction | Arrow end | Strength | Quality | References
---|---|---|---|---|---
Social | | | | | |
Country of birth | Affects | Age at first birth | 2 | 1 | (21, 22) |
Country of birth | Affects | Breastfeeding | 1 | 1 | (21, 23–25) |
Country of birth | Affects | Education | 1 | 2 | (26, 27) |
Country of birth | Affects | Latitude | 1 | 1 | (28) |
Country of birth | Affects | Diet | 1 | 3 | (29) |
Country of birth | Affects | Race/ethnicity | 1 | 1 | (30, 31) |
Education | Increases | Age at first birth | 1 | 1 | (32–34) |
Education | Affects | Alcohol use | 2 | 2 | (35) |
Education | Increases | Breastfeeding | 3 | 2 | (36) |
Education | Decreases | Second-hand smoke | 2 | 2 | (37) |
Education | Increases | Income | 1 | 1 | (38, 39) |
Education | Affects | Occupation | 1 | 1 | (40) |
Education | Decreases | Parity | 1 | 1 | (32–34) |
Education | Increases | Physical activity | 2 | 1 | (41) |
Education | Decreases | Tobacco use | 2 | 2 | (42, 43) |
Income | Increases | Age at menarche | 3 | 2 | (44, 45) |
Income | Affects | Alcohol use | 2 | 2 | (46–48) |
Income | Increases | Height | 3 | 1 | (49–51) |
Income | Decreases | HT (E + P) | 3 | 2 | (52–54) |
Income | Decreases | Tobacco use | 2 | 1 | (55–57) |
Income | Decreases | Obesity | 2 | 1 | (58, 59) |
Income | Increases | Physical activity | 2 | 2 | (60) |
Occupation | Affects | HT (E + P) | 2 | 3 | (54) |
Light at night | Increases | Incidence | 3 | 1 | (61) |
Occupation | Affects | Alcohol use | 1 | 1 | (62, 63) |
Workplace | Affects | Second-hand smoke | 1 | 2 | (64, 65) |
Race/ethnicity | Affects | Age at first birth | 2 | 2 | (66, 67) |
Race/ethnicity | Affects | Parity | 2 | 2 | (66, 67) |
Race/ethnicity | Affects | Age at menarche | 1 | 1 | (68) |
Race/ethnicity | Affects | Breastfeeding | 2 | 2 | (25) |
Race/ethnicity | Affects | Education | 1 | 2 | (27) |
Race/ethnicity | Affects | Second-hand smoke | 2 | 3 | (69, 70) |
Race/ethnicity | Affects | Income | 2 | 2 | (71) |
Race/ethnicity | Affects | Obesity | 2 | 1 | (59) |
Race/ethnicity | Affects | Occupation | 1 | 2 | (72) |
Race/ethnicity | Affects | Tobacco use | 1 | 2 | (55, 57) |
Race/ethnicity | Affects | Endogenous sex steroid levels | 3 | 1 | (73) |
Race/ethnicity | Affects | Breast density | 3 | 1 | (74) |
Race/ethnicity | Affects age at | Menopause | 3 | 1 | (75, 76) |
Biological | | | | | |
Age | Increases | Incidence | 1 | 1 | (77) |
Age at menarche | Decreases | Endogenous sex steroid levels | 3 | 2 | (78) |
Age at menarche | Decreases | Incidence | 3 | 1 | (79) |
Menopause | Increases | Incidence | 3 | 1 | (79) |
Breast density | Increases | Incidence | 2 | 1 | (80) |
Endogenous sex steroid levels | Increase age at | Menopause | 1 | 2 | (81) |
Endogenous sex steroid levels | Increase | Incidence | 3 | 1 | (82, 83) |
Genetic ancestry | Affects | Race/ethnicity | 1 | 1 | (30) |
Genetic ancestry | Affects | Genetic polymorphisms | B1 | 1 | (84) |
Genetic ancestry | Affects | High-penetrance genes | 1 | 1 | (85) |
Genetic polymorphisms | Affect | Breast density | 1 | 1 | (86) |
Genetic polymorphisms | Affect | Height | 1 | 1 | (87, 88) |
Genetic polymorphisms | Affect | Incidence | 3 | 2 | (89) |
Genetic polymorphisms | Affect | Insulin resistance | 1 | 1 | (90, 91) |
Genetic polymorphisms | Affect | Obesity | 1 | 1 | (92, 93) |
Genetic polymorphisms | Affect | Age at menarche | 2 | 1 | (94) |
Genetic polymorphisms | Affect age at | Menopause | 2 | 2 | (95) |
Genetic polymorphisms | Affect | Alcohol use | 2 | 2 | (96) |
Genetic polymorphisms | Affect | Tobacco use | 1 | 1 | (97, 98) |
Height | Increases | Incidence | 3 | 1 | (99) |
High-penetrance genes | Increase | Incidence | 1 | 1 | (100) |
Inflammation | Increases | Insulin resistance | B1 | 1 | (101–103) |
Inflammation | Increases | Incidence | 3 | 1 | (104) |
Insulin resistance | Increases | Inflammation | B1 | 1 | (105) |
Insulin resistance | Increases | Obesity | B1 | 1 | (106, 107) |
Insulin resistance | Increases | Incidence | 3 | 1 | (108) |
Menopause | Decreases | Breast density | 3 | 1 | (109) |
Obesity | Decreases | Age at menarche | 1 | 1 | (68) |
Obesity | Decreases | Breast density | 1 | 1 | (110) |
Obesity | Increases | Endogenous sex steroid levels | B1 | 1 | (111) |
Obesity | Increases | Inflammation | B1 | 1 | (112, 113) |
Obesity | Affects | Incidence | 3 | 1 | (114) |
Obesity | Decreases | Physical activity | 1 | 3 | (115) |
Obesity | Increases | Insulin resistance | B1 | 1 | (116) |
Vitamin D | Decreases | Incidence | 3 | 1 | (117, 118) |
Behavioral | | | | | |
Age at first birth | Increases | Breast density | 3 | 1 | (119, 120) |
Age at first birth | Increases | Incidence | 3 | 1 | (121) |
Alcohol use | Increases | Incidence | 3 | 1 | (122) |
Breastfeeding | Decreases | Incidence | 3 | 1 | (123) |
HT (E + P) | Increases | Incidence | 3 | 1 | (124) |
HT (E + P) | Increases | Breast density | 1 | 1 | (119, 125) |
Parity | Decreases | Breast density | 3 | 1 | (119, 120) |
Parity | Decreases | Incidence | 3 | 1 | (121) |
Physical activity | Decreases | Incidence | 3 | 1 | (126) |
Physical activity | Decreases | Obesity | 2 | 1 | (127) |
Diet | Decreases | Incidence | 3 | 1 | (128, 129) |
Tobacco use | Increases | Endogenous sex steroid levels | 3 | 2 | (130, 131) |
Tobacco use | Increases | Incidence | 3 | 1 | (132) |
Physical | | | | | |
PAHs | Increase | Incidence | 2 | 3 | (133–135) |
PCBs | Increase | Incidence | 2 | 3 | (136–140) |
BPAs | Increase | Incidence | B2 | 3 | (141, 142) |
Phthalates | Decrease | Age at menarche | 3 | 3 | (143–145) |
Benzene | Increases | Incidence | 2 | 3 | (146, 147) |
Second-hand smoke | Increases | Incidence | 3 | 2 | (148) |
Radiation | Increases | Incidence | 1 | 1 | (149) |
Latitude | Decreases | Vitamin D | 3 | 3 | (150) |
Latitude | Affects | Incidence | 1 | 2 | (151, 152) |
Note: The starting point for an arrow is the independent variable, and the ending point is the dependent variable. The direction of the effect could be to increase, decrease, or simply affect the dependent variable. Strength of the relationship is categorized as: 1, strong (RR >3.0); 2, modest (RR >1.8–3.0); or 3, weak (RR 1.1–1.8). B1 and B2 indicate strong or modest relationships based on animal or mechanistic studies. Quality of the evidence is categorized as: 1, good; 2, fair; or 3, poor.
Abbreviation: E + P, estrogen + progestins.
The strongest risk factors with a direct effect on breast cancer incidence were age, high-penetrance genes (e.g., BRCA1 and BRCA2), breast density, and radiation, all of which were classified as 1 or 2 for strength of association. Income and race/ethnicity were strongly associated with behavioral factors, including tobacco, alcohol, and postmenopausal HT (i.e., estrogen + progestins) use, and with reproductive factors, including age at first birth, parity, and breastfeeding, which have distinct but smaller effects on breast cancer incidence.
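Because Table 1 is effectively a list of directed edges, upstream-to-proximate pathways like these can be explored by treating it as a graph. The sketch below uses a hand-picked subset of Table 1 rows (education, income, and several behavioral and reproductive mediators) and enumerates every path from education to incidence that does not revisit a node. The names `EDGES`, `build_graph`, `simple_paths`, and `describe` are hypothetical; the code only traces structure and does not combine strength or quality scores along a path, which is not a method described in this paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A hand-picked subset of Table 1 rows: (arrow start, direction, arrow end).
EDGES: List[Tuple[str, str, str]] = [
    ("Education", "Increases", "Age at first birth"),
    ("Education", "Affects", "Alcohol use"),
    ("Education", "Increases", "Income"),
    ("Education", "Decreases", "Parity"),
    ("Education", "Increases", "Physical activity"),
    ("Education", "Decreases", "Tobacco use"),
    ("Income", "Affects", "Alcohol use"),
    ("Income", "Decreases", "Tobacco use"),
    ("Income", "Decreases", "Obesity"),
    ("Income", "Increases", "Physical activity"),
    ("Age at first birth", "Increases", "Incidence"),
    ("Parity", "Decreases", "Incidence"),
    ("Alcohol use", "Increases", "Incidence"),
    ("Tobacco use", "Increases", "Incidence"),
    ("Physical activity", "Decreases", "Incidence"),
    ("Physical activity", "Decreases", "Obesity"),
    ("Obesity", "Decreases", "Physical activity"),
    ("Obesity", "Affects", "Incidence"),
]

# Direction label for each (start, end) pair, used when printing paths.
DIRECTION = {(s, e): d for s, d, e in EDGES}


def build_graph(edges: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Adjacency list keyed by arrow start."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for start, _direction, end in edges:
        graph[start].append(end)
    return dict(graph)


def simple_paths(graph: Dict[str, List[str]], source: str, target: str) -> List[List[str]]:
    """Iterative depth-first search for every path that never revisits a node.

    Checking membership in the current path (not a global visited set) lets
    feedback loops such as physical activity <-> obesity appear in different
    paths without causing infinite recursion.
    """
    paths: List[List[str]] = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths


def describe(path: List[str]) -> str:
    """Render a path with the Table 1 direction label on each arrow."""
    parts = [path[0]]
    for s, e in zip(path, path[1:]):
        parts.append(f"-[{DIRECTION[(s, e)]}]-> {e}")
    return " ".join(parts)


for p in simple_paths(build_graph(EDGES), "Education", "Incidence"):
    print(describe(p))
```

Even this small subset yields a dozen distinct routes from education to incidence, which illustrates why the full set of 96 relationships is easier to reason about as a graph than as a flat list.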
Obesity was modestly associated with postmenopausal breast cancer risk, but evidence indicates an inverse or no association with premenopausal breast cancer risk, so the model shows obesity as affecting incidence without a single direction. Evidence suggested that income and race/ethnicity are strong determinants of obesity and that obesity operates through several known pathways to breast cancer incidence, including increased insulin resistance, decreased immune function, and increased circulating hormones in postmenopausal women.
Dietary influences other than alcohol (“diet” in Fig. 1) refer specifically to the weak or modest relationships of phytoestrogens, mainly from soy, and of carotenoids from yellow- and red-pigmented fruits and vegetables; dietary fat was not included, given the weakness of that relationship.
The EDC classes PAHs, PCBs, and benzene were associated with breast cancer risk in epidemiologic studies, while BPA showed clear carcinogenicity in animal studies despite a lack of epidemiologic studies in humans. Certain classes of phthalates were associated with earlier puberty in girls with higher exposure; however, epidemiologic literature relating phthalates directly to breast cancer incidence was lacking, so a direct pathway from phthalates to incidence was not included in the conceptual framework.
Discussion
The conceptual framework for breast cancer incidence includes 96 relationships linking the biological, behavioral, social, and physical domains, demonstrating the complexity of breast cancer causation. This compares with 83 relationships in the Paradigm I model. We included information on specific EDCs as risk factors and added new pathways between variables where the evidence has strengthened substantially since the prior conceptual framework (14). However, there are still additional factors that could have been included but were not, for the sake of relative simplicity. For example, hormonal contraceptives increase risk during current use but not 10 or more years after stopping (153), so their population impact later in life, when breast cancer becomes more common, is minimal. Also, we have perhaps oversimplified the role of endogenous estrogens by including only a term for postmenopausal endogenous sex steroid levels, which are strongly associated with breast cancer (83); we have not included insulin-like growth factor (IGF) or anti-Müllerian hormone.
Evidence that premenopausal and postmenopausal breast cancers have different risk factors (15) and may be etiologically distinct has contributed to the stratification of premenopausal and postmenopausal breast cancer in the literature. However, for risk factors with substantial evidence stratified by menopausal status, including breast density, age at first birth, parity, and alcohol use, we found broadly similar directions and magnitudes of association with breast cancer. The current literature suggests that excess adiposity or obesity is a strong risk factor for postmenopausal breast cancer, and although the evidence for premenopausal breast cancer suggests a likely protective effect overall (154), there may be null (114, 155) or positive (156, 157) effects for certain tumor types and in different ethnic groups. Overall, however, the model does not treat pre- and postmenopausal breast cancer as separate diseases with different risk factors. Rather, we identified the individual factors that are related to the continuously changing estrogen levels as women age.
Age itself is the strongest overall risk factor for breast cancer and can be related to many of the factors in the model, either because it is embedded in the variable itself (e.g., age at menarche) or because of its influence on other pathways (e.g., breast density, endogenous hormones, obesity, alcohol consumption, hormone replacement, tobacco use, and income). For ease of interpretation, however, we included only the direct relationship of age to breast cancer in this conceptual model.
The body of evidence supporting proximate risk factors for breast cancer was generally strong, with each risk factor supported by at least one meta-analysis, pooled analysis, or systematic review. However, the literature underpinning distal pathways connecting upstream and downstream risk factors through mediating variables was often more limited. In addition to the quality of the studies included, the strength of the associations between factors varied widely. Most proximate risk factors for breast cancer, with notable exceptions such as BRCA mutations and breast density, were weakly associated with incidence (strength score 3), whereas strength scores between upstream and downstream factors ranged from 1 to 3. For example, education had a strong effect on reproductive variables, such as age at first birth and parity, but was more weakly associated with behaviors such as physical activity and substance use. These strength scores are notable because they allow patterns of breast cancer incidence in the population to be traced beyond the distributions of proximate risk factors to how, and in whom, these behaviors or exposures arise.
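The distribution of strength scores among direct effects can be checked by tallying the Table 1 rows whose arrow ends at incidence. The list below transcribes those 28 rows (arrow start and strength only); the variable names are illustrative, and "B2" is kept as a separate label rather than merged with category 2.

```python
from collections import Counter

# Table 1 rows whose arrow ends at breast cancer incidence:
# (arrow start, strength). "B2" marks a modest rating from animal data.
DIRECT_TO_INCIDENCE = [
    ("Light at night", "3"),
    ("Age", "1"),
    ("Age at menarche", "3"),
    ("Menopause", "3"),
    ("Breast density", "2"),
    ("Endogenous sex steroid levels", "3"),
    ("Genetic polymorphisms", "3"),
    ("Height", "3"),
    ("High-penetrance genes", "1"),
    ("Inflammation", "3"),
    ("Insulin resistance", "3"),
    ("Obesity", "3"),
    ("Vitamin D", "3"),
    ("Age at first birth", "3"),
    ("Alcohol use", "3"),
    ("Breastfeeding", "3"),
    ("HT (E + P)", "3"),
    ("Parity", "3"),
    ("Physical activity", "3"),
    ("Diet", "3"),
    ("Tobacco use", "3"),
    ("PAHs", "2"),
    ("PCBs", "2"),
    ("BPAs", "B2"),
    ("Benzene", "2"),
    ("Second-hand smoke", "3"),
    ("Radiation", "1"),
    ("Latitude", "1"),
]

# Count rows per strength label.
counts = Counter(strength for _factor, strength in DIRECT_TO_INCIDENCE)

# Factors scored strong (1) or modest (2) from human data.
strong_or_modest = sorted(f for f, s in DIRECT_TO_INCIDENCE if s in {"1", "2"})

print(counts)
print(strong_or_modest)
```

Run against these transcribed rows, the tally gives 19 weak (3), 4 modest (2), 4 strong (1), and 1 animal-based modest (B2) score.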
Our conceptual framework focused on risk factors for breast cancer as currently understood, but these relationships may change over time. For example, the predominant users of postmenopausal HT in the 1990s and early 2000s were highly educated white women who tended to have high SES (158). However, once the results of the Women's Health Initiative trial were released in 2002, showing that HT did not reduce cardiovascular disease and may in fact cause breast cancer, these educated, higher SES women were the first to discontinue use (159). As a result, the current association of income and education with HT is the opposite of what was seen in past decades. This reversal has implications for breast cancer risk: over the past decade, the proportion of ER+ breast cancers has increased in African American women compared with white women, potentially because of prolonged HT use (159). It is therefore important to understand how relationships in upstream pathways can change over time and affect breast cancer incidence differently across groups.
A major goal of this conceptual framework was to highlight areas where the literature on risk factors or pathways is unclear and where increased research investment could provide critical information. The framework can guide such investment in at least three ways: first, where the model shows only a direct relationship, new research may define a more detailed pathway through mediators; second, where the quality of the data is poor or the evidence is weak on the basis of current knowledge, further research is needed (e.g., BPA, PCBs, and phthalates); and finally, where relationships or variables of interest are missing, new studies are needed.
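Once relationships are stored together with their quality scores, the three kinds of gaps listed above can be flagged mechanically. The Python sketch below is a hypothetical illustration: the `Edge` class, the `research_gaps` function, the example edges, and their quality scores are all assumptions. It also assumes, as an unverified convention, that a higher quality score means weaker evidence (so a score of 3 marks an evidence base that is not yet high quality); if the scale runs the other way, the threshold comparison would need to be inverted.

```python
from dataclasses import dataclass

OUTCOME = "breast cancer"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    quality: int  # assumed convention: higher score = weaker evidence base


def research_gaps(edges, candidate_pairs, low_quality_threshold=3):
    """Flag the three kinds of gaps described in the text (hypothetical heuristics)."""
    targets_of = {}
    for edge in edges:
        targets_of.setdefault(edge.source, set()).add(edge.target)

    # 1. Direct only: the factor's only arrow points at the outcome, so no mediator is described.
    direct_only = sorted(src for src, tgts in targets_of.items() if tgts == {OUTCOME})

    # 2. Weak evidence: the arrow is drawn, but its quality score is at or past the threshold.
    low_quality = sorted(
        (e.source, e.target) for e in edges if e.quality >= low_quality_threshold
    )

    # 3. Missing: hypothesized relationships that the model does not yet contain.
    present = {(e.source, e.target) for e in edges}
    missing = sorted(pair for pair in candidate_pairs if pair not in present)

    return {"direct_only": direct_only, "low_quality": low_quality, "missing": missing}


# Hypothetical edges, quality scores, and candidate relationships for illustration only.
EDGES = [
    Edge("BPA", OUTCOME, 3),
    Edge("alcohol", OUTCOME, 1),
    Edge("income", "alcohol", 2),
    Edge("education", "parity", 1),
    Edge("parity", OUTCOME, 1),
]
CANDIDATES = [("income", "second-hand smoke"), ("income", "alcohol")]

if __name__ == "__main__":
    for kind, items in research_gaps(EDGES, CANDIDATES).items():
        print(kind, items)
```

The direct-only check is deliberately simple: it flags any factor whose only arrow points at breast cancer, including proximate factors such as alcohol or parity for which mediating pathways might still be described.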
Another area worthy of increased research investment is identifying the critical windows of susceptibility in the life course, during which exposure to particular factors poses a greater risk than exposure at other periods of life (3). A growing body of literature has evaluated the effects of some risk factors during and around puberty, when rapidly developing cells in the breast may be at heightened risk of environmental insults (9). However, most of these studies have followed girls and adolescents and have not yet continued long enough to capture cancer outcomes in adulthood. Nevertheless, it seems clear that even in utero and perinatal exposures can be risk factors for breast cancer. Systematic reviews of retrospective studies of birthweight have documented elevated ORs of about 1.2 for birthweight of 4,000 g or more (160, 161). Additional research investment is needed to identify other windows of susceptibility so that intervention strategies can be targeted to the right women at the right time. Finally, our model draws attention to the complexity of breast cancer incidence by showing the many factors and pathways in action. Future research might build on this framework to characterize interdependencies between factors, whereby the effect of one factor depends on the levels of others (i.e., interactions). As research moves toward greater precision in prevention, interactions between factors can help identify the subgroups at highest risk and tailor prevention and screening strategies accordingly.
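Interactions can be assessed on more than one scale, and the choice of scale matters when identifying high-risk subgroups. The Python sketch below computes two standard measures from the relative risks of women exposed to factor A only, factor B only, or both, each relative to women exposed to neither: the relative excess risk due to interaction (RERI, a departure from additivity) and the ratio of the joint RR to the product of the single-factor RRs (a departure from multiplicativity). The function name and the RR values are hypothetical and do not describe any specific pair of factors in the model.

```python
def interaction_measures(rr_a_only, rr_b_only, rr_both):
    """Additive and multiplicative interaction measures from relative risks.

    All RRs share one reference group: women unexposed to both factors (RR = 1).
    """
    reri = rr_both - rr_a_only - rr_b_only + 1  # relative excess risk due to interaction
    multiplicative = rr_both / (rr_a_only * rr_b_only)  # 1.0 means no multiplicative interaction
    return reri, multiplicative


# Hypothetical relative risks, chosen only to contrast the two scales.
reri, ratio = interaction_measures(rr_a_only=1.5, rr_b_only=2.0, rr_both=3.0)
print(f"RERI = {reri:.2f}, multiplicative ratio = {ratio:.2f}")  # RERI = 0.50, multiplicative ratio = 1.00
```

With these inputs the joint RR is exactly what a multiplicative model predicts (ratio of 1.00), yet it exceeds the sum of the separate excess risks (RERI of 0.50). Interaction on the additive scale is often considered the more relevant one for public health, because it indicates where removing one exposure would avert the most cases.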
There are few other efforts in the literature to describe the complexity of etiologic factors in breast cancer. However, a recent study that pooled six cohort studies from Australia reported on the strength of modifiable behavioral risk factors, with a focus on prevention at the population level (162). Using population attributable fractions, the authors identified alcohol consumption as the leading modifiable risk factor for premenopausal women and obesity and alcohol consumption as the leading modifiable factors for postmenopausal women. Within the Paradigm II study, a manuscript reporting the results of an agent-based model that assesses the relative contribution of multiple modifiable risk factors is in progress.
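Population attributable fractions of the kind used in the pooled Australian study can be illustrated with Levin's formula, PAF = p(RR − 1) / [1 + p(RR − 1)], which gives the proportion of cases that would not occur if a single binary exposure with prevalence p were removed, assuming the RR is causal and unconfounded. The Python sketch below is not the method of that study, which may have used more elaborate estimators; the function names and inputs are hypothetical. The combining step additionally assumes that the exposures act independently, with no interaction or confounding between them.

```python
def levin_paf(prevalence, rr):
    """Levin's population attributable fraction for a single binary exposure."""
    excess = prevalence * (rr - 1)
    return excess / (1 + excess)


def combined_paf(pafs):
    """Combine several PAFs, assuming independent exposures (no interaction or confounding)."""
    remaining = 1.0  # fraction of cases NOT attributable to any listed exposure
    for paf in pafs:
        remaining *= 1 - paf
    return 1 - remaining


# Hypothetical inputs: 30% of women exposed, RR = 1.5.
single = levin_paf(prevalence=0.30, rr=1.5)
print(f"single-factor PAF = {single:.3f}")  # single-factor PAF = 0.130
print(f"combined PAF = {combined_paf([0.10, 0.20]):.3f}")  # combined PAF = 0.280
```

Combining two factors with PAFs of 10% and 20% gives 28% rather than 30%, because attributable fractions overlap and generally cannot simply be added.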
Our study has several strengths, including summary strength and quality scores from the literature for 96 relationships representing proximate and distal pathways to breast cancer causation. We reviewed pathways to examine root causes of breast cancer that may explain patterns of incidence in California and, more broadly, the United States, and that can be understood by a wide audience of readers, including scientists, advocates, policymakers, and an educated lay public. Our framework benefits from a transdisciplinary approach to understanding the interactions of multiple risk factors at multiple levels and to identifying the highest quality literature supporting these risk factors and pathways.
Our study also had several important limitations. We limited ourselves to using RRs or correlation coefficients in this work, but we recognize from the work of others the additional value of expressing these relationships in terms of absolute risk (163, 164) and see this as a challenge for a future version of this model. The data supporting some pathways that connect upstream or distal factors (e.g., income or education) with proximate or behavioral factors (e.g., second-hand smoke exposure) were often not of high quality or were missing entirely. We included relationships in the framework where some data supported the existence of an association but the quality was not yet high (e.g., quality score of 3), as a way to highlight areas where future research is needed. When we began this work, very few risk factors had literature supporting associations with different molecular subtypes of breast cancer, although new findings in this area are emerging rapidly. Thus, in general, we report risk factors for breast cancer incidence overall. Our estimates of strength of association are based on the literature for each factor; however, when studies used different categorizations or measurements of a risk factor, we chose the strength of association from the strongest study. Finally, we describe the direction of the association (Table 1) to give readers an understanding of whether a factor confers risk or protection, although in many situations this relationship is not linear. For example, there is good evidence that women with low SES are more likely than higher SES women both to abstain from alcohol and to binge drink (46, 47). These nonlinear associations are important but are not fully described in our conceptual framework. Also, in some cases (e.g., race/ethnicity), it is difficult to specify the direction of the effect, and we have used the term “affects.”
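The value of absolute risk can be shown with simple arithmetic: the same RR implies different numbers of excess cases depending on the baseline risk. The Python sketch below uses a hypothetical RR of 1.2 and two hypothetical baseline risks over a common time horizon, and assumes the RR applies multiplicatively to the baseline risk; neither the function nor the numbers come from this model.

```python
def absolute_impact(baseline_risk, rr, per=10_000):
    """Translate a relative risk into absolute terms for a given baseline risk.

    Returns the risk among exposed women and the excess cases per `per` exposed women,
    assuming the RR multiplies the baseline risk over the same time horizon.
    """
    exposed_risk = baseline_risk * rr
    excess_cases = (exposed_risk - baseline_risk) * per
    return exposed_risk, excess_cases


# Hypothetical baseline risks; the same hypothetical RR of 1.2 is applied to each.
for baseline in (0.02, 0.05):
    risk, extra = absolute_impact(baseline, rr=1.2)
    print(f"baseline {baseline:.0%}: exposed risk {risk:.1%}, ~{extra:.0f} extra cases per 10,000")
```

Under these assumptions, an RR of 1.2 corresponds to about 40 excess cases per 10,000 exposed women at a 2% baseline risk, but about 100 at a 5% baseline risk.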
In summary, this conceptual framework aims to capture the relevant risk factors and pathways that cause breast cancer in U.S. women. We target a wide audience, including scientists, policymakers, breast cancer advocates, and the lay public. To this end, we created a dynamic digital version of the framework that is accessible online (https://cbcrp.org/causes/) and conveys both the individual pathways at play and the overall complexity of breast cancer etiology. It also highlights the evidence behind each relationship. Taken together, the article and the online dynamic framework aim to educate, to facilitate debate and discussion around risk factors and prevention strategies, and to highlight areas where further research may substantially enhance our understanding of breast cancer prevention and control.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: R.A. Hiatt, K. Balke, D.H. Rehkopf
Development of methodology: R.A. Hiatt, N.J. Engmann, D.H. Rehkopf
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): R.A. Hiatt, N.J. Engmann, K. Balke, D.H. Rehkopf
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): R.A. Hiatt, N.J. Engmann
Writing, review, and/or revision of the manuscript: R.A. Hiatt, N.J. Engmann, K. Balke, D.H. Rehkopf
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): N.J. Engmann, K. Balke
Study supervision: R.A. Hiatt
Acknowledgments
This work was supported by the California Breast Cancer Research Program (20ZB-8303, to R.A. Hiatt, N.J. Engmann, K. Balke, and D.H. Rehkopf). The authors thank the Paradigm II working group for their input leading up to the construction of this conceptual model: Janice Barlow, Suzanne Fenton, Sarah Gehlert, Ross Hammond, George Kaplan, John Kornak, Krisida Nishioka, Thomas McKone, Martyn Smith, Leonardo Trasande, Melissa Troester, and John Witte. The authors also thank the following for their comments on an early draft of the conceptual model: Iona Cheng, Peggy Reynolds, and John Witte.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.