Abstract
Breast cancer is a leading cancer diagnosis among premenopausal women around the world. Unlike rates in postmenopausal women, incidence rates of advanced breast cancer have increased in recent decades for premenopausal women. Progress in identifying contributors to breast cancer risk among premenopausal women has been constrained by the limited numbers of premenopausal breast cancer cases in individual studies and resulting low statistical power to subcategorize exposures or to study specific subtypes. The Premenopausal Breast Cancer Collaborative Group was established to facilitate cohort-based analyses of risk factors for premenopausal breast cancer by pooling individual-level data from studies participating in the United States National Cancer Institute Cohort Consortium. This article describes the Group, including the rationale for its initial aims related to pregnancy, obesity, and physical activity. We also describe the 20 cohort studies with data submitted to the Group by June 2016. The infrastructure developed for this work can be leveraged to support additional investigations. Cancer Epidemiol Biomarkers Prev; 26(9); 1360–9. ©2017 AACR.
This article is featured in Highlights of This Issue, p. 1357
Introduction
Breast cancer is the most common cancer diagnosed among women worldwide, with an estimated 1.67 million cases diagnosed in 2012, accounting for a quarter of all new cancers in women. Breast cancer is also the most common cancer diagnosed among women ages 15 to 39 years worldwide (1). Furthermore, breast cancer among premenopausal women often presents at more advanced stages (2) and, at the youngest ages, has less favorable prognosis (3) than among postmenopausal women.
Identifying contributors to breast cancer risk in younger women is critical to prevention. In the United States, incidence rates of advanced breast cancer have increased among premenopausal women in recent decades, whereas they have consistently decreased among women ages 60 and older during the same period (4). Accumulating evidence supports etiologic heterogeneity between pre- and postmenopausal breast cancer. Several lifestyle factors, including childbirth (5), obesity (6), and cigarette smoking (7), have been reported to have differential associations with breast cancer risk before and after menopause. Breast cancer subtypes, whether defined by gene expression (8) or by clinical markers such as estrogen receptor (ER), progesterone receptor (PR), or HER2/neu oncogene expression, have emerged as critical considerations for risk factor associations and are differentially distributed by menopausal status (9). Investigations of breast cancer etiologic heterogeneity require large sample sizes to have sufficient statistical power to account jointly for menopausal status and tumor subtype.
The Premenopausal Breast Cancer Collaborative Group (referred to hereafter as the Collaborative Group) was established to facilitate cohort-based analyses of risk factors for premenopausal breast cancer, both overall and according to tumor characteristics. This article describes the formation of the Collaborative Group and the methods used for ongoing efforts, and provides the rationale for initial analyses related to pregnancy, obesity, and physical activity. The infrastructure developed to address these questions can support future investigations of additional potential risk factors.
Collaborative Group Studies
The National Cancer Institute (NCI) Cohort Consortium was formed to address the need for large-scale collaborations to pool data in cohort studies of cancer and hence to quicken the pace of research (http://epi.grants.cancer.gov/Consortia/cohort.html). The Collaborative Group was initiated within the Cohort Consortium in 2013 by investigators at The Institute of Cancer Research (ICR) in London and the National Institute of Environmental Health Sciences (NIEHS). The ICR and the NIEHS serve as the Data Coordinating Centers.
Eligibility
Prospective cohorts in the Cohort Consortium with at least 100 female breast cancers diagnosed during follow-up before age 55 and data collection at two or more time points (baseline and at least one follow-up, to allow for exposure information and menopausal status to be updated) are eligible to participate.
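The eligibility rule can be written as a simple predicate. The sketch below is illustrative only: the `CohortSummary` record and its field names are hypothetical and are not part of the Collaborative Group's documented procedures; only the two numeric thresholds come from the criteria above. Note that the round count here refers to rounds conducted, not only rounds already submitted.

```python
from dataclasses import dataclass

# Thresholds taken from the eligibility criteria described above.
MIN_CASES_BEFORE_55 = 100   # female breast cancers diagnosed during follow-up before age 55
MIN_COLLECTION_ROUNDS = 2   # baseline plus at least one follow-up round


@dataclass
class CohortSummary:
    """Hypothetical summary record for a candidate cohort (illustration only)."""
    name: str
    prospective: bool
    in_cohort_consortium: bool
    female_cases_before_55: int
    data_collection_rounds: int  # rounds conducted, not only rounds already submitted


def is_eligible(cohort: CohortSummary) -> bool:
    """Return True only if every stated eligibility criterion is met."""
    return (
        cohort.prospective
        and cohort.in_cohort_consortium
        and cohort.female_cases_before_55 >= MIN_CASES_BEFORE_55
        and cohort.data_collection_rounds >= MIN_COLLECTION_ROUNDS
    )


candidate = CohortSummary("Example cohort", True, True, 131, 6)
print(is_eligible(candidate))  # True
```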
Participating cohorts
This report describes the 20 cohort studies (counting the European Prospective Investigation into Cancer and Nutrition [EPIC], which has many cohorts within it, as a single cohort; refs. 6, 10–28) with data submitted to the Collaborative Group as of June 2016. Participating cohorts are shown in Table 1 and span North America, Europe, Asia, and Australia. Excluding the multicohort EPIC dataset (150,291 women) and the case-cohort Canadian Study of Diet, Lifestyle, and Health (subcohort of 1,589 women), the number of female participants ages <55 at enrollment ranges from 5,671 (Campaign against Cancer and Heart Disease) to 117,730 (Nurses' Health Study cohort). The cohorts were initiated as early as 1950 (the Radiation Effects Research Foundation Life Span Study) or as recently as 2003 (Generations and Sister Study cohorts). All cohorts have conducted more than one round of data collection; however, follow-up data are not yet fully available for three cohorts. The number of data collection rounds (including baseline) for which data had been submitted as of June 1, 2016, ranged from 1 to 16 across cohorts.
Cohort | Location | Ages at enrollment, mean (SD), range | Calendar years of enrollment | Baseline data collection methods | N of data collection roundsᵃ | Breast cancer cases (N) | Breast cancer ascertainment sources | Cohort size (women <55 years) | Years of follow-up, mean (SD), range (<55 years) |
---|---|---|---|---|---|---|---|---|---|
Black Women's Health Study (10) | United States | 37.1 (8.6), 20–54 | 1995 | Mailed questionnaire | 9 | 1,299 | Self-report and state registry | 52,543 | 12.6 (5.6), 0–18.6 |
California Teachers Cohort (28) | United States | 40.4 (7.4), 22–54 | 1995–1998 | Mailed questionnaire | 4 | 1,185 | State registry | 47,516 | 11.6 (5.0), 0.0–17.2 |
Campaign against Cancer and Heart Disease (CLUE II; ref. 13) | United States | 39.6 (9.6), 18–54 | 1989 | Administered questionnaire | 6 | 131 | State registry | 5,671 | 10.8 (5.4), 0.3–26.0 |
Canadian Study of Diet, Lifestyle, and Health (12)ᵇ | Canada | 44.1 (6.9), 23–54 | 1991–1999 | Mailed questionnaire | 1 | 377 | Provincial and national registry | 1,589 | 8.1 (4.7), 0–18.6 |
European Prospective Investigation into Cancer and Nutrition (14)ᶜ | Europe | 44.2 (8.1), 19–54 | 1991–2000 | Self-reported/administered questionnaires | 1 | 2,122 | Self-report and national/regional registries | 150,291 | 7.5 (4.2), 0–16.6 |
Etude Epidémiologique auprès de femmes de la Mutuelle Générale de l'Education Nationale (E3N; ref. 15) | France | 46.5 (4.2), 38–54 | 1989–1991 | Mailed questionnaire | 8 | 1,908 | Self-report | 72,748 | 8.1 (4.2), 0–16.4 |
Generations Study (11) | United Kingdom | 39.8 (9.5), 16–54 | 2003–2012 | Mailed questionnaire | 2 | 719 | Self-report and national registry | 72,058 | 5.4 (1.7), 0–9.7 |
Helseundersøkelsen i Nord-Trøndelag (HUNT2; ref. 26) | Norway | 38.9 (9.7), 20–54 | 1995–1997 | Administered questionnaire | 1 | 209 | National cancer registry | 20,974 | 10.2 (4.1), 0.3–14.0 |
Melbourne Collaborative Cohort Study (16) | Australia | 47.5 (4.4), 31–54 | 1990–1994 | Administered questionnaire | 3 | 227 | State registry | 12,029 | 7.3 (4.4), 0–20.1 |
New York University Women's Health Study (19, 20) | United States | 45.2 (5.5), 31–54 | 1984–1991 | Self-administered questionnaire | 6 | 371 | Self-report and state registry | 8,757 | 9.5 (5.5), 0–23.5 |
Norwegian Women and Cancer Study (105) | Norway | 45.7 (6.0), 31–54 | 1991–2007 | Mailed questionnaire | 3 | 2,124 | National registry | 117,633 | 9.0 (5.8), 0.3–20.5 |
Nurses' Health Study (17) | United States | 42.6 (7.1), 29–54 | 1976–1978 | Mailed questionnaire | 16 | 2,743 | Self-report | 117,730 | 12.2 (7.0), 0.1–25.5 |
Nurses' Health Study II (18) | United States | 34.8 (4.7), 24–44 | 1989–1990 | Mailed questionnaire | 12 | 3,765 | Self-report | 116,415 | 18.7 (3.7), 0.1–23.7 |
Radiation Effects Research Foundation Life Span Study (21) | Japan | 41.3 (8.5), 18–54 | 1963–1993 | Administered or mailed questionnaire | 6 | 130 | City registry | 18,420 | 13.5 (8.5), 0.1–36.7 |
Singapore Chinese Health Study (22) | Singapore | 49.6 (3.0), 43–54 | 1993–1998 | Administered questionnaire | 2 | 134 | National cancer registry | 16,056 | 5.3 (3.0), 0.3–11.5 |
Sister Study (6) | United States | 47.9 (4.9), 35–54 | 2003–2009 | Telephone and written questionnaire | 3 | 679 | Self-report | 24,044 | 4.7 (2.5), 0.1–10.6 |
Southern Community Cohort Study (23) | United States | 47.3 (4.2), 40–54 | 2002–2009 | Administered questionnaire | 2 | 233 | State registry | 30,289 | 5.1 (2.4), 0.1–13.3 |
Sweden Women's Lifestyle and Health Study (27) | Sweden | 39.7 (5.8), 29–49 | 1991–1992 | Mailed questionnaire | 2 | 1,192 | National registry | 49,010 | 14.4 (5.3), 0.1–21.1 |
Swedish Mammography Cohort (24) | Sweden | 46.6 (4.3), 38–54 | 1987–1990 | Mailed questionnaire | 2 | 649 | National registry | 34,126 | 8.3 (4.3), 0–16.6 |
United States Radiologic Technologist Cohort (25) | United States | 36.8 (7.3), 22–54 | 1983–1998 | Mailed questionnaire | 3 | 1,570 | Self-report | 62,862 | 14.5 (5.6), 0–22.8 |
ᵃ Contributed as of June 2016; includes baseline and each follow-up.
ᵇ The Canadian Study of Diet, Lifestyle, and Health is the only case-cohort study. The cohort size (N = 1,589) represents the subcohort only.
ᶜ The European Prospective Investigation into Cancer and Nutrition (EPIC) dataset does not include the French or Norwegian EPIC sites, which contributed data directly through the Etude Epidémiologique auprès de femmes de la Mutuelle Générale de l'Education Nationale and the Norwegian Women and Cancer Study, respectively.
Breast cancer ascertainment
To date, data have been received for 1,030,761 women and include 21,766 incident invasive or in situ breast cancers diagnosed after study enrollment and before age 55 years (Table 2). Across studies, cancer diagnoses are identified by linkage with city/state/provincial/regional (10, 12, 13, 23, 28–31) or national (11, 12, 14, 24, 26, 32, 33) population-based cancer registries, and/or through self-report followed by medical record review (6, 10, 11, 14, 15, 25, 34, 35). All participating studies established case ascertainment procedures and published findings related to incident breast cancer risk prior to joining the Collaborative Group.
Characteristic | Combined (N) | Total N studies with data availableᵃ |
---|---|---|
Total breast cancers diagnosed | 21,766 | 20 (all) |
Age at diagnosis (y) | | 20 (all) |
<30 | 32 | |
30–39 | 1,245 | |
40–44 | 3,340 | |
45–49 | 7,053 | |
50–54 | 10,096 | |
Extent of disease | | 20 |
In situ | 3,651 | |
Invasive | 17,357 | |
Missing | 758 | |
Estrogen receptor status | | 16 |
Positive | 9,583 | |
Negative | 3,182 | |
Borderline | 52 | |
Missing | 8,949 | |
Progesterone receptor status | | 16 |
Positive | 7,919 | |
Negative | 3,939 | |
Borderline | 95 | |
Missing | 9,813 | |
HER2/neu overexpression | | 11 |
Positive | 1,093 | |
Negative | 4,808 | |
Borderline | 29 | |
Missing | 15,836 | |
ᵃ Contributed as of June 2016.
Data exchange and harmonization
After approval by the NCI Cohort Consortium executive committee, the aims of the proposed collaboration were circulated to all Consortium members in 2013. Key exposure, covariate, and outcome information necessary to address the initial analyses and potential confounding or effect modification was identified by the Coordinating Centers. Complete capture of all information across exposures is not required for participation.
After confirming eligibility, a data request template is sent to cohorts that wish to participate. Requested exposure data include: age/year of cohort entry, length of follow-up, demographic characteristics (age, race/ethnicity, education, socioeconomic status), lifestyle factors (physical activity, anthropometric characteristics, alcohol intake, smoking information, mammography use), reproductive history (menarche, menstrual cycle characteristics, gravidity, parity, pregnancy complications, infertility, breastfeeding, hormonal medications, menopausal status), benign breast disease, and family history of breast cancer (Supplementary Table S1). Most characteristics are requested at enrollment and each follow-up, as available. Breast cancer information includes age at diagnosis, stage, grade, histology, and expression of ER, PR, HER2, CK5/6, or EGFR. Participating studies are asked where possible to recode their own data to fit the data request template to minimize the potential for error in the recoding or understanding of variables in their original form. However, if this is not possible due to programming support constraints or other reasons, data are sent to the Coordinating Centers in their original form with a study-specific contact person identified to address questions from Coordinating Center programmers who reformat the information to fit the standard definitions in the data request template.
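Recoding a study's own codes onto the data request template amounts to applying a documented codebook mapping. The sketch below is a minimal illustration under stated assumptions: the variable name `smoking_status`, the template codes, and the study codebook are all hypothetical (the actual template is given in Supplementary Table S1). Undocumented codes raise an error so they can be queried with the study's contact person rather than silently dropped.

```python
from typing import Optional

# Template codes for one requested variable. The name and codes are hypothetical;
# the actual data request template is given in Supplementary Table S1.
TEMPLATE_CODES = {"smoking_status": {"never": 0, "former": 1, "current": 2}}

# A hypothetical study-specific codebook: the study's own codes -> template labels.
# None marks the study's missing-value code.
STUDY_CODEBOOK = {"smoking_status": {1: "never", 2: "current", 3: "former", 9: None}}


def recode(variable: str, study_value: int) -> Optional[int]:
    """Translate one study-specific value into the template coding (None = missing).

    Unknown codes raise an error so they can be queried with the study's contact
    person instead of being silently dropped.
    """
    codebook = STUDY_CODEBOOK[variable]
    if study_value not in codebook:
        raise ValueError(f"{variable}: undocumented code {study_value!r}")
    label = codebook[study_value]
    return None if label is None else TEMPLATE_CODES[variable][label]


raw_rows = [{"id": "A01", "smoking_status": 2}, {"id": "A02", "smoking_status": 9}]
recoded = [{**row, "smoking_status": recode("smoking_status", row["smoking_status"])}
           for row in raw_rows]
print(recoded)
# [{'id': 'A01', 'smoking_status': 2}, {'id': 'A02', 'smoking_status': None}]
```

Keeping the codebook as data, separate from the recoding function, makes it straightforward for either the study or a Coordinating Center programmer to maintain the mapping.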
After data transfer agreements are signed between each individual study and the Coordinating Centers, completed datasets are transferred to the Coordinating Centers using secure file transfer protocols. Each cohort submits its data to one of the two Coordinating Centers, which is responsible for data transfer and harmonization procedures. Having two Coordinating Centers, one in the United States and the other in the United Kingdom, reduces time zone differences to facilitate rapid communication and accommodates studies that can send data only to certain locations because of country-specific information governance requirements.
Data harmonization procedures are standardized across Coordinating Centers. Quality control checks are run on each dataset to identify (i) potential data inconsistencies within each questionnaire round (e.g., nulliparous women reporting more than zero births), (ii) inconsistencies between questionnaire rounds (e.g., number of births at follow-up being lower than at the baseline questionnaire), and (iii) implausible values. Data checking procedures are automated with a shared program that is run at each Coordinating Center and produces standardized output. Each cohort is contacted regarding any issues identified, and clarifications or updates are incorporated into the study-specific dataset. Where issues cannot be resolved, pre-established recoding rules are applied to the data. When study-specific variables cannot be recoded to meet the data template formats (e.g., age at exposure was collected in categories but a continuous variable was requested), differences are documented and original data are retained for potential future use. Once the datasets are recoded to the standardized formats, data are merged to create a pooled dataset containing values from all cohorts.
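The three classes of quality control checks can be sketched for a single participant as follows. This is not the Collaborative Group's shared program; the field names (`nulliparous`, `n_births`, `age`, `bmi`) and the plausibility bounds are hypothetical placeholders chosen only to illustrate checks (i) to (iii).

```python
from typing import Any, Dict, List

# Hypothetical plausibility bounds, not the Collaborative Group's actual rules.
PLAUSIBLE_RANGES = {"age": (15, 110), "bmi": (12.0, 70.0), "n_births": (0, 20)}


def qc_flags(pid: str, rounds: List[Dict[str, Any]]) -> List[str]:
    """Return QC flags for one participant.

    rounds is ordered from baseline to the latest follow-up; each round may contain
    'nulliparous' (bool), 'n_births' (cumulative count), 'age', and 'bmi'.
    """
    flags = []
    previous_births = None
    for i, r in enumerate(rounds):
        births = r.get("n_births")
        # (i) Within-round inconsistency: nulliparous but reporting births.
        if r.get("nulliparous") and births is not None and births > 0:
            flags.append(f"{pid} round {i}: nulliparous with {births} births")
        # (ii) Between-round inconsistency: cumulative births should never decrease.
        if births is not None and previous_births is not None and births < previous_births:
            flags.append(f"{pid} round {i}: births fell from {previous_births} to {births}")
        if births is not None:
            previous_births = births
        # (iii) Implausible values outside the bounds above.
        for var, (low, high) in PLAUSIBLE_RANGES.items():
            value = r.get(var)
            if value is not None and not low <= value <= high:
                flags.append(f"{pid} round {i}: implausible {var}={value}")
    return flags


rounds = [
    {"nulliparous": True, "n_births": 1, "age": 34, "bmi": 23.5},   # baseline
    {"nulliparous": False, "n_births": 0, "age": 38, "bmi": 7.0},   # follow-up
]
for flag in qc_flags("P001", rounds):
    print(flag)
# P001 round 0: nulliparous with 1 births
# P001 round 1: births fell from 1 to 0
# P001 round 1: implausible bmi=7.0
```

Flags are returned as text rather than auto-corrected, consistent with the described workflow of querying each cohort before applying any pre-established recoding rule.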
Defining menopausal status
A primary issue for the Collaborative Group analyses is the definition of menopausal status during follow-up and at diagnosis. Menopausal status is ascertained by cohorts at each follow-up round for which it is available. In addition, we request at least one follow-up round after age 55 or after breast cancer diagnosis (if available) so that menopausal status can be defined retrospectively. Because menopause can be a gradual transition, in analyses by menopausal status we will explore different lag periods to distinguish patterns for “premenopausal” and “perimenopausal” breast cancer.
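One way to picture the retrospective definition with a lag period is a small classification function. The rule below is hypothetical: the text states only that different lag periods will be explored, not how statuses are assigned. It assumes an age at menopause that may be reported at a round completed after diagnosis, and otherwise falls back on the oldest age at which the woman reported still being premenopausal.

```python
from typing import Optional


def status_at_diagnosis(age_at_diagnosis: float,
                        age_at_menopause: Optional[float],
                        last_premenopausal_age: Optional[float],
                        lag_years: float = 0.0) -> str:
    """Classify menopausal status at diagnosis under a hypothetical lag rule.

    age_at_menopause: age at menopause, which may come from a round completed
        after diagnosis (retrospective definition); None if never reported.
    last_premenopausal_age: oldest age at which the woman reported still being
        premenopausal; None if unavailable.
    lag_years: diagnoses within this many years before menopause are labelled
        "perimenopausal" instead of "premenopausal".
    """
    if age_at_menopause is not None:
        if age_at_diagnosis >= age_at_menopause:
            return "postmenopausal"
        if age_at_menopause - age_at_diagnosis <= lag_years:
            return "perimenopausal"
        return "premenopausal"
    # Menopause not reported: premenopausal only if a report of still being
    # premenopausal covers the diagnosis plus the lag window.
    if last_premenopausal_age is not None and last_premenopausal_age >= age_at_diagnosis + lag_years:
        return "premenopausal"
    return "unknown"


for lag in (0, 1, 2):
    print(lag, status_at_diagnosis(48.0, 49.5, None, lag_years=lag))
# 0 premenopausal
# 1 premenopausal
# 2 perimenopausal
```

Running the same data under several lags, as in the loop above, shows how a diagnosis close to menopause can move between categories; this sensitivity is the reason for exploring multiple lag periods.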
Statistical approach
Two statistical approaches are being used to analyze the data. We first examine study-specific estimates and a pooled estimate across studies using a random-effects model that weights estimates by the inverse of the study-specific variance (36–38). An advantage of this approach is that each study-specific estimate can be derived based on its own available covariates. Cochran's Q statistic is used to examine statistical heterogeneity between studies; it sums the weighted squared differences between individual study estimates and the pooled estimate (39, 40). We calculate the I² statistic to estimate the proportion of total variation across study estimates that is due to between-study heterogeneity rather than chance (41). Potential sources of heterogeneity are investigated.
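The random-effects pooling and heterogeneity statistics described above can be sketched as follows. This assumes the DerSimonian-Laird estimator of between-study variance, a common choice that the text does not name; in that model each study's weight is the inverse of its within-study variance plus the estimated between-study variance τ². Inputs are study-specific log HRs and their standard errors; the numbers in the usage line are made up.

```python
import math
from typing import Sequence, Tuple

Z_975 = 1.959963984540054  # standard normal 97.5th percentile


def dersimonian_laird(log_hrs: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Pool study-specific log HRs with a DerSimonian-Laird random-effects model.

    Returns (pooled HR, lower 95% CI, upper 95% CI, Cochran's Q, I^2 in %).
    """
    k = len(log_hrs)
    w = [1.0 / se ** 2 for se in ses]                          # within-study inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sum_w    # fixed-effect pooled log HR
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0            # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]               # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_hrs)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0       # I^2 (Higgins and Thompson)
    return (math.exp(pooled), math.exp(pooled - Z_975 * se_pooled),
            math.exp(pooled + Z_975 * se_pooled), q, i2)


# Hypothetical inputs: log HRs and standard errors from three studies.
hr, lo, hi, q, i2 = dersimonian_laird([0.10, 0.25, -0.05], [0.08, 0.12, 0.15])
print(f"HR {hr:.2f} (95% CI {lo:.2f}-{hi:.2f}); Q = {q:.2f}; I2 = {i2:.0f}%")
```

When Q does not exceed its degrees of freedom, τ² is truncated at zero and the random-effects result coincides with the fixed-effect inverse-variance estimate.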
Maximum flexibility for confounder adjustment and assessment of effect modification can be achieved by pooling individual-level data across cohorts. If homogeneity assumptions are not violated, we pool data into a single dataset to conduct aggregate analyses stratified by study and adjusted for potential confounders that are available in all included studies.
In both approaches, Cox regression models are used to calculate HRs and 95% confidence intervals (CI) for breast cancer (42). Regression models are constructed with age as the time scale, such that person-time is accrued from age at cohort entry until breast cancer diagnosis, age at last follow-up, or other exit age, whichever occurs first. Follow-up time is split according to time-updated exposures obtained from follow-up questionnaires, as appropriate. We test the proportional hazards assumption for exposures of interest; where associations vary over time (for example, through an interaction between attained age and the risk factor of interest), we investigate adding time-varying covariates to the model. In pooled analyses, potential variation in the association between exposures and breast cancer risk according to tumor subtype is assessed using Cox proportional hazards regression that treats alternative tumor subtypes as competing risks (43, 44).
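To make the age-time-scale setup concrete, the sketch below converts one woman's follow-up into counting-process (start, stop] intervals split at each questionnaire-updated exposure, then duplicates each interval once per tumor subtype following the Lunn-McNeil data-augmentation approach, a common way to fit Cox models with competing event types. We assume, without confirmation from the text, that this approach is comparable to the one cited in refs. 43 and 44. The function and field names are hypothetical, and the model fit itself (stratified by subtype copy, and by study in pooled analyses, with exposure-by-subtype interaction terms) would be done in a survival-analysis package and is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Interval:
    pid: str
    start_age: float          # age at start of interval (age is the time scale)
    stop_age: float           # age at end of interval
    exposure: float           # exposure value in force during (start_age, stop_age]
    event: bool               # True if breast cancer was diagnosed at stop_age
    subtype: Optional[str]    # tumor subtype of that diagnosis, else None


def split_follow_up(pid: str, entry_age: float, exit_age: float, diagnosed: bool,
                    subtype: Optional[str],
                    updates: Sequence[Tuple[float, float]]) -> List[Interval]:
    """Split follow-up into intervals at each questionnaire-updated exposure.

    exit_age is the earliest of diagnosis, last follow-up, or another exit age
    (e.g., age 55). updates holds (age_reported, value) pairs sorted by age, with
    the first pair reported at or before entry_age.
    """
    intervals = []
    for i, (age, value) in enumerate(updates):
        start = max(age, entry_age)
        next_age = updates[i + 1][0] if i + 1 < len(updates) else exit_age
        stop = min(next_age, exit_age)
        if stop <= start:
            continue  # update falls after exit, or is immediately superseded
        has_event = diagnosed and stop == exit_age
        intervals.append(Interval(pid, start, stop, value, has_event,
                                  subtype if has_event else None))
    return intervals


def augment_by_subtype(intervals: List[Interval],
                       subtypes: Sequence[str]) -> List[Dict[str, object]]:
    """Duplicate each interval once per subtype (Lunn-McNeil data augmentation).

    In the copy for subtype s the event indicator is 1 only for a diagnosis of
    subtype s, so diagnoses of any other subtype end follow-up without an event
    for that copy (competing risks).
    """
    rows = []
    for iv in intervals:
        for s in subtypes:
            rows.append({"pid": iv.pid, "start": iv.start_age, "stop": iv.stop_age,
                         "exposure": iv.exposure, "subtype_stratum": s,
                         "event": int(iv.event and iv.subtype == s)})
    return rows


ivs = split_follow_up("P001", entry_age=40.0, exit_age=47.5, diagnosed=True, subtype="ER-",
                      updates=[(40.0, 22.0), (42.0, 24.5), (46.0, 26.0)])
rows = augment_by_subtype(ivs, ["ER+", "ER-"])
print(len(ivs), len(rows), sum(r["event"] for r in rows))  # 3 6 1
```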
Rationale for Initial Aims
Pregnancy
A “dual effect” of pregnancy on breast cancer risk has been used to describe the short-term increase in breast cancer risk observed after childbirth followed by a long-term protective effect of parity. This pattern has been reported in epidemiological studies nested within European population registries (45–49) and in other case–control (50–55) and cohort (56) studies. Observational studies have reported 1.25- to 3-fold increases in breast cancer risk for up to 10 years after the last birth (5, 57). The magnitude of the pregnancy-related increase in breast cancer risk varies across studies, and may be influenced by maternal, pregnancy, or post-partum characteristics. Although a period of increased breast cancer risk after childbirth has been reported across several studies, it remains unclear whether this observation is different for, or limited to, specific groups defined by age (5, 50, 51), parity (45, 52, 53), oral contraceptive use (58), breastfeeding practices, family history of breast cancer (48, 59), or varies by breast cancer subtype (55, 56, 60) or other tumor characteristics (61, 62).
Women who have a first birth at an older age may have the greatest initial increase in breast cancer risk, and the longest interval until a protective effect appears (5, 49, 54, 63). Over the last 50 years, more women have postponed childbirth to older ages (5); this trend may have contributed to the increasing advanced-stage breast cancer rates among reproductive-age women. Pregnancy may also have opposite effects on risks of different breast cancer subtypes. For example, without considering menopausal status or subtype, parity reduces overall breast cancer risk by approximately 30% (64). However, parous women have a 50% to 90% increased risk of basal-like or ER−/PR− breast cancer overall (56, 65, 66). Associations for pre- and postmenopausal breast cancer combined often reflect patterns among the majority postmenopausal breast cancer cases. Our study will be well positioned to examine potential variation in the association between recent pregnancy and breast cancer subtype among premenopausal women. Others have proposed that pregnancy-related increases in breast cancer risk may also be affected by the relatively greater influence of genetic predisposition at younger versus older ages at diagnosis (48). In support of this hypothesis, at least two studies have shown stronger associations between recent birth and breast cancer risk among women with a mother or sister who was diagnosed with breast cancer (48, 59).
Theories to explain the transient increased risk of breast cancer after childbirth vary. High levels of estrogen and progesterone and the rapid expansion of breast cells during pregnancy could promote latent initiated tumor cells. However, breast tumors diagnosed postpartum are more often at an advanced stage and are associated with lower survival compared with those diagnosed during pregnancy (67–69). This evidence has led to increased focus on the role of post-partum exposures, including lactational involution (the process that returns the mammary gland to a non-milk producing state), as contributors to a pro-tumorigenic microenvironment that may be favorable for cancer cell migration and metastasis (70). Potential adverse effects of lactational involution on the breast microenvironment must also be reconciled with demonstrated lower risks of specific tumor subtypes among parous women who breastfeed, including ER-negative or basal-like tumors that confer a worse prognosis (56, 65). A better understanding of the factors that contribute to short-term increases in breast cancer risk after pregnancy, including potential variation by age, parity, oral contraceptive use, breastfeeding, family history, or tumor subtype could provide necessary information for refining hypotheses about carcinogenesis in reproductive-age women (71). Individual studies have had insufficient statistical power or have lacked key information to evaluate these characteristics jointly, making the Collaborative Group an ideal setting in which to advance understanding of pregnancy's role in premenopausal breast cancer development.
BMI and other anthropometrics
There is epidemiological evidence that higher BMI at premenopausal ages is inversely associated with breast cancer risk (72–75). Higher adiposity in childhood and adolescence appears to be associated with a lower risk of breast cancer at both premenopausal (73, 76–78) and postmenopausal (77–79) ages. Whether further weight gain contributes to additional reductions in premenopausal breast cancer risk is not entirely clear (80, 81). The protective effect of adiposity at premenopausal ages contrasts with its effect at postmenopausal ages, when greater BMI is associated with higher breast cancer risk, probably through production of estrogens by aromatase in adipose tissue (82).
The reason for the protective effect of adiposity at premenopausal ages is unclear, although several hypotheses have been put forward. Fewer ovulatory cycles in heavier women, and consequently lower sex hormone levels, have been suggested as a potential explanation (83). Similarly, a role for polycystic ovary syndrome (PCOS) has been proposed, although Nurses' Health Study II data did not support this (73). To identify the reasons for the inverse associations with premenopausal adiposity, large study populations are needed to produce stable estimates and to stratify by potentially explanatory factors.
Few published studies have had sufficiently large numbers of premenopausal cases to produce age-specific estimates over a range of ages, or to explore whether risks differ by other explanatory factors or by breast cancer subtype. The few that stratified by established breast cancer risk factors such as parity have so far reported similar risk estimates across these factors (72, 78). The association between adiposity and premenopausal breast cancer has been reported to vary by ethnicity, with strong associations in Caucasian, but not in Asian (84) or African-American (85), women, and associations are possibly stronger for ER+ than ER− premenopausal breast cancer (73). It is not clear what level of BMI confers the highest breast cancer risk; one study reported a non-linear association between BMI and risk, with the highest risk around 24 kg/m² (72).
The Collaborative Group, with its large number of cases in the pooled dataset and data on a wide range of risk factors, will be able to clarify the contribution of premenopausal adiposity to breast cancer risk, by examining which subtypes of breast cancer are affected, analyzing associations by exposures such as menstrual factors, and by assessing the effect of changes in adiposity over time.
Physical activity
Physical activity is of particular interest because it is a potentially modifiable risk factor for breast and other cancers. For premenopausal women, the effect of physical activity on reducing breast cancer risk appears to be smaller and less certain than for postmenopausal women (86). However, very few studies (35, 87, 88) have published prospective data on premenopausal breast cancer risk in relation to physical activity; others have published results by age at breast cancer diagnosis (89–91) or by menopausal status at study entry (92–95), or have included premenopausal women but did not publish effect estimates for these women separately (96, 97).
The biological mechanisms through which physical activity could exert an effect in premenopausal women are less clear than in postmenopausal women, but might involve effects on menarche, menstrual dysfunction, cycle length, endogenous hormone levels, or estrogen metabolism (98–100). A smaller effect of physical activity in premenopausal than postmenopausal women is plausible: in postmenopausal women, the protective effect of physical activity on breast cancer risk operates partly through reduced adiposity, whereas in premenopausal women adiposity itself has a protective effect on breast cancer risk. In addition, the impact of physical activity on hormone levels might be less influential among premenopausal women given their high levels of circulating hormones.
To aid prevention, information is needed on the type, frequency, and intensity of exercise required to influence breast cancer risk, as well as the ages and characteristics of women for whom it is most effective. There might be periods of life during which physical activity has a greater impact than at others, such as the period between menarche and first birth (101). There is also emerging evidence of differential effects of activity by ethnicity, weight, parity, and family history of breast cancer, but mostly based on data from postmenopausal women (35, 91, 102). A limitation, however, is that physical activity information is collected in many different ways and is difficult to harmonize (103).
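A common strategy for harmonizing differently collected activity data is to convert each cohort's responses into a shared unit such as MET-hours per week. The sketch below illustrates this under explicit assumptions: the MET values and the category midpoints are illustrative placeholders, not values used by the Collaborative Group or taken from a specific published compendium, and the two questionnaire styles (sessions and minutes versus duration categories) are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Illustrative intensity values (METs) only; placeholders, not values taken from the
# Collaborative Group or from any specific published compendium.
ASSUMED_METS = {"walking": 3.5, "cycling": 6.0, "running": 8.0}

# Hypothetical midpoints (hours/week) for a cohort that asked about duration in categories.
CATEGORY_MIDPOINT_HOURS = {"none": 0.0, "<1 h": 0.5, "1-2 h": 1.5, "3-4 h": 3.5}


def met_h_from_sessions(activities: Iterable[Tuple[str, int, float]]) -> float:
    """Cohort style A: (activity, sessions per week, minutes per session) triples."""
    return sum(ASSUMED_METS[a] * sessions * minutes / 60.0
               for a, sessions, minutes in activities)


def met_h_from_categories(activity_categories: Dict[str, str]) -> float:
    """Cohort style B: activity -> weekly duration category (midpoint imputed)."""
    return sum(ASSUMED_METS[a] * CATEGORY_MIDPOINT_HOURS[c]
               for a, c in activity_categories.items())


print(met_h_from_sessions([("walking", 3, 40), ("running", 1, 30)]))    # 11.0
print(met_h_from_categories({"walking": "1-2 h", "running": "<1 h"}))  # 9.25
```

Midpoint imputation for categorical responses discards within-category variation, which is one reason harmonized activity measures carry more measurement error than either cohort's original data.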
The Collaborative Group aims to address premenopausal breast cancer risk by frequency, intensity, type, and ages of exercise, within strata defined by factors such as BMI, family history of breast cancer and age at diagnosis, and to explore specific breast cancer subtypes and stages, on a much larger scale than previously. The information gained can be used to advise young women about the extent and type of exercise that can influence their breast cancer risk.
Opportunities and challenges
The Collaborative Group is an international collaboration formed to address etiological factors for breast cancer that may be particular to, or differ in, premenopausal or perimenopausal women. By harmonizing a wide range of exposure variables across 20 studies and developing quality assurance and analysis programs, our collaboration is in a position to conduct initial analyses of pregnancy, obesity and physical activity, and to leverage the research infrastructure and established collaboration model for investigations of other risk factors. Our initial aims do not require the use of biospecimens. However, biospecimens have been collected in many of the participating studies, as described in the Cancer Epidemiology Descriptive Cohort Database (available at https://cedcd.nci.nih.gov/biospecimen.aspx) and could potentially be incorporated to address future hypotheses.
Some limitations and challenges have emerged. As in many consortia, information from the participating studies in the Collaborative Group was not collected with future pooling efforts in mind, and follow-up data are not collected at standardized intervals. Therefore, harmonization efforts must identify common data elements that are collected with minimal measurement error. Identifying these elements can be further complicated when questionnaires and codebooks must first be translated into a common language.
Another aspect of pooling cohorts that requires planning and forethought is the potential for participants to overlap between studies, for example, in Scandinavian countries with multiple cohorts that have wide geographic catchment areas. Although national identifiers make it theoretically possible to identify women who may contribute information to more than one study in a country, the logistics of obtaining approval and merging datasets can be prohibitive. Therefore, we have worked with study investigators to identify the individual cohorts within a country that have the most relevant information for specific Collaborative Group aims, and to develop strategies for excluding specific geographic regions from one cohort, but not another, where overlap of cohort catchment areas is known to exist.
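The region-based exclusion strategy described above can be illustrated with a minimal sketch. It assumes that each pooled participant record carries a cohort identifier and a geographic region code; the field names, cohort labels, and region codes below are hypothetical and are not the Collaborative Group's actual data structures or programs. The point is that exclusions are keyed on (cohort, region) pairs, so a shared region can be dropped from one cohort while being retained in another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    # Hypothetical pooled-record fields, for illustration only
    participant_id: str
    cohort: str
    region: str


def exclude_overlapping_regions(participants, exclusions):
    """Drop participants whose (cohort, region) pair appears in `exclusions`.

    `exclusions` is a set of (cohort, region) tuples naming regions to remove
    from one specific cohort; the same region is kept in any other cohort.
    """
    return [p for p in participants if (p.cohort, p.region) not in exclusions]


# Illustrative example: hypothetical cohorts "A" and "B" share catchment
# region "R1"; R1 is kept in cohort A and excluded from cohort B.
pooled = [
    Participant("a1", "A", "R1"),
    Participant("a2", "A", "R2"),
    Participant("b1", "B", "R1"),
    Participant("b2", "B", "R3"),
]
kept = exclude_overlapping_regions(pooled, {("B", "R1")})
print([p.participant_id for p in kept])  # ['a1', 'a2', 'b2']
```

In this sketch, the choice of which cohort keeps a shared region is made outside the code, for example according to which cohort has the most relevant information for a given aim, and is simply encoded as a lookup set.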
The value of cancer consortia in addressing scientific questions efficiently and creating new opportunities has become increasingly recognized (104). Conducting analyses across multiple studies requires ongoing communication and transparency. Our Collaborative Group holds in-person working group meetings in conjunction with the NCI Cohort Consortium annual meeting, as well as regular telephone conferences. These meetings provide a forum to discuss additional hypotheses that could be addressed in the future to maximize the value of the infrastructure created. The Cohort Consortium provides valuable coordination and communication services, as well as dedicated time and space through the annual meeting; however, securing other research support for data preparation, ongoing infrastructure development, and dedicated time for collaboration remains a challenge faced across many large-scale projects. Our Collaborative Group and others continue to work to identify and streamline data-sharing models to maximize productivity and collaborative opportunity.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
We wish to acknowledge all study participants, staff, and participating cancer registries as well as Hoda Anton-Culver, Jianwen Cai, Jessica Clague, Christina Clarke, Dennis Deapen, Niclas Håkansson, Allison Iwan, Diane Kampa, James Lacey, Eunjung Lee, Siew-Hong Low, David Nelson, Susan Neuhausen, Katie O'Brien, Hannah Park, Jerry Reid, Peggy Reynolds, Sophia Wang, Renwei Wang, Mark Weaver, Jiawei Xu, Jeffrey Yu, and Argyrios Ziogas.
Grant Support
Support for this research comes, in part, from the Avon Foundation (02-2014-080); Breast Cancer Now; The Institute of Cancer Research, London; the United States National Institutes of Health National Institute of Environmental Health Sciences (Z01 ES044005, P30 ES000260) and National Cancer Institute (UM1 CA176726, UM1 CA186107, UM1 CA182876, UM1 CA182934, UM1 CA164974, R01 CA058420, R01 CA092447, CA077398, CA144034); the United States National Center for Advancing Translational Sciences (KL2-TR001109), the National Program of Cancer Registries of the Centers for Disease Control and Prevention, and the Department of Energy; the Swedish Research Council and Swedish Cancer Foundation; the Japanese Ministry of Health, Labor and Welfare; the Hellenic Health Foundation; Karolinska Institutet Distinguished Professor Award Dnr: 2368/10-221; Cancer Council Victoria and the Australia National Health and Medical Research Council (209057, 396414, 504711); the State of Maryland, the Maryland Cigarette Restitution Fund; and the United Kingdom National Health Service funding to the Royal Marsden/ICR NIHR Biomedical Research Centre. The coordination of the European Prospective Investigation into Cancer and Nutrition (EPIC) is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer.
The national cohorts are supported by Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l'Education Nationale, Institut National de la Santé et de la Recherche Médicale (INSERM; France); German Cancer Aid, German Cancer Research Center (DKFZ), Federal Ministry of Education and Research (BMBF), Deutsche Krebshilfe, Deutsches Krebsforschungszentrum and Federal Ministry of Education and Research (Germany); the Hellenic Health Foundation (Greece); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); ERC-2009-AdG 232997 and Nordforsk, Nordic Centre of Excellence program on Food, Nutrition and Health (Norway); Health Research Fund (FIS), PI13/00061 to Granada, PI13/01162 to EPIC-Murcia, Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, ISCIII RETIC (RD06/0020; Spain); Swedish Cancer Society, Swedish Research Council and County Councils of Skåne and Västerbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C570/A16491 and C8221/A19170 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk, MR/M012190/1 to EPIC-Oxford; United Kingdom).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.