Background: We report on the establishment of a web-based Cancer Epidemiology Descriptive Cohort Database (CEDCD). The CEDCD's goals are to enhance awareness of resources, facilitate interdisciplinary research collaborations, and support existing cohorts for the study of cancer-related outcomes.

Methods: Comprehensive descriptive data were collected from large cohorts established to study cancer as the primary outcome using a newly developed questionnaire. These included an inventory of baseline and follow-up data, biospecimens, genomics, policies, and protocols. Additional descriptive data extracted from publicly available sources were also collected. This information was entered in a searchable and publicly accessible database. We summarized the descriptive data across cohorts and reported the characteristics of this resource.

Results: As of December 2015, the CEDCD includes data from 46 cohorts representing more than 6.5 million individuals (29% ethnic/racial minorities). Overall, 78% of the cohorts have collected blood at least once, 57% at multiple time points, and 46% collected tissue samples. Genotyping has been performed by 67% of the cohorts, while 46% have performed whole-genome or exome sequencing in subsets of enrolled individuals. Information on medical conditions other than cancer has been collected in more than 50% of the cohorts. More than 600,000 incident cancer cases and more than 40,000 prevalent cases are reported, with 24 cancer sites represented.

Conclusions: The CEDCD assembles detailed descriptive information on a large number of cancer cohorts in a searchable database.

Impact: Information from the CEDCD may assist the interdisciplinary research community by facilitating identification of well-established population resources and large-scale collaborative and integrative research. Cancer Epidemiol Biomarkers Prev; 25(10); 1392–401. ©2016 AACR.

This article is featured in Highlights of This Issue, p. 1367

Understanding the determinants of cancer and other chronic diseases requires an in-depth knowledge of the complex interactions of genomic, biological, clinical, lifestyle, and societal factors (1). Cohort studies have helped researchers to better understand the complex etiology of cancer, study cancer outcomes, develop risk prediction analyses and models, and improve guidelines for cancer prevention and control policies (2). Combining risk factors and molecular data across cohorts has supported genomic, epigenomic, proteomic, and metabolomics research of unprecedented scope (3–7). It has been proposed that a “synthetic cohort,” achieved through a collaborative approach among the major cohorts funded by the NIH (Bethesda, MD), could expedite an integrative approach by assembling multilevel data collected through the lifespan, in health and disease status, and by providing a framework for interdisciplinary research (8–10). The value of existing and future cohorts has been greatly enhanced by expanding collaborations and studying multiple health outcomes through data sharing and pooling (10–12). Recently, it has been suggested that registration of observational studies may help improve the transparency and quality of epidemiologic research and reduce the extent of publication biases (10, 13–15). In contrast to randomized trials where single protocols are registered, for observational studies, it may be best to register unique cohorts' infrastructures, so that they can be tapped to produce collaborative studies involving multidisciplinary teams (14, 16). A dynamic, comprehensive, and publicly accessible inventory of cohorts is fundamental to facilitate collaborative scientific efforts and cost-effective assembly and utilization of resources and will assist the research community and funding agencies in the planning of new studies and in maximizing the returns on investments.

The Epidemiology and Genomics Research Program (EGRP), of the NCI's Division of Cancer Control and Population Sciences, fosters cohort-based research and the establishment of cohort infrastructures through numerous initiatives and workshops (17–19). The NCI Cohort Consortium (19) was formed in 2000 to foster collaborations across cancer epidemiology cohorts, initially focusing on the need to assemble large populations for genome-wide association studies to investigate genetic risk factors in cancer (20, 21). The Cohort Consortium has since expanded its reach to address critical research questions that could not be addressed by single studies (22–24). Similarly, other consortia in the United States and worldwide have combined cohort infrastructures to address the etiology and genomics of complex diseases (25–29).

The NCI Cohort Consortium has had success in stimulating more than 50 ongoing and completed collaborative projects; however, many challenges still exist. These include barriers to systematic data harmonization, especially when considering the integration of multilevel datasets; the lack of a data-coordinating center (30) to maximize data management and deployment; and the absence of consistent, streamlined, and comprehensive data sharing policies and processes (31). Paramount is the overall need for tools facilitating transparency, expediting scientific collaborations, enhancing the quality of reported research, and supporting interdisciplinary approaches within the framework of large epidemiologic studies (10). We report on the establishment and characteristics of the Cancer Epidemiology Descriptive Cohort Database (CEDCD), developed to address some of these challenges and facilitate interdisciplinary research collaborations. We review the descriptive data currently available in the CEDCD and discuss the scope and potential of the combined cancer cohorts' infrastructure.

Development of the descriptive data collection instruments

To populate the descriptive data included in the online database, a Data Collection Form and the Biospecimen and Cancer Count Information Spreadsheet (Supplementary Material S1 and S2) were developed. Both tools were assessed for the time and effort required for completion and received Office of Management and Budget (OMB) clearance. These instruments were designed with consideration for study variation in an effort to promote standardized metrics across cohorts. To maintain participants' privacy given the publicly accessible nature of the CEDCD, only descriptive data (i.e., no individual-level data) were collected. The principal investigators from each cohort provided consent to publicly share the cohorts' descriptive information through the CEDCD. The Data Collection Form requested descriptive data in the following categories: basic profile information, such as investigators, websites, and a brief description of the cohort; study design and eligibility criteria; enrollment counts by race/ethnicity/gender; major content domain data collected at baseline and follow-up; types of cancer and other disease outcomes; mortality data; incorporation of mobile health technologies in protocols; and types and counts of biospecimens collected. If accurate information in any of these domains was available from public or NCI sources, the form was prefilled and the cohorts were asked to review the data. The Biospecimen and Cancer Count Information Spreadsheet was included to provide incident and prevalent cancer numbers by gender, with cancers defined by ICD-9 and ICD-10/O coding, and biospecimen counts provided by cancer type. Supplementary information requested included cohorts' policies on data sharing and access, publication policies, and questionnaires. Cohort investigators and affiliated staff reviewed and edited these tools and conducted a pilot test to assess the time, effort, and ease of completing the forms.

Study selection and data collection

The 57 NCI Cohort Consortium members (as of January 2015) were invited to participate in the CEDCD. These cohorts met the following criteria: (i) focusing on the study of cancer as their primary outcome, and (ii) a minimum of 10,000 study participants currently enrolled, or (iii) cancer patient/survivor cohorts of at least 5,000 participants across multiple cancer sites or 2,000 participants diagnosed with the same or narrowly related cancer sites. The latter criterion was established to support research addressing determinants of cancer progression, recurrence, mortality, incidence, and other cancer/health–related outcomes (32). The database will be supported by NCI's EGRP, and cohorts will be contacted annually to update existing information. Eligible cohorts can begin the process for inclusion at any time by completing an online request (https://cedcd.nci.nih.gov/contact.aspx).
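The eligibility criteria above can be read as a simple decision rule. The following is a minimal sketch of one possible reading, in which criterion (i) applies to every cohort and either size threshold (ii) or (iii) must also be met. This interpretation, the `Cohort` class, and all field names are assumptions made for illustration and are not part of the CEDCD itself.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    # All field names are hypothetical; they mirror the criteria in the text.
    cancer_primary_outcome: bool   # criterion (i)
    enrolled: int                  # participants currently enrolled
    survivor_cohort: bool = False  # cancer patient/survivor cohort
    single_site: bool = False      # same or narrowly related cancer sites


def is_eligible(cohort: Cohort) -> bool:
    """Return True if the cohort meets the assumed reading of the criteria."""
    # Criterion (i): cancer must be the primary outcome.
    if not cohort.cancer_primary_outcome:
        return False
    # Criterion (ii): at least 10,000 participants currently enrolled.
    if cohort.enrolled >= 10_000:
        return True
    # Criterion (iii): smaller thresholds for patient/survivor cohorts.
    if cohort.survivor_cohort:
        threshold = 2_000 if cohort.single_site else 5_000
        return cohort.enrolled >= threshold
    return False
```

Under this reading, a survivor cohort of 2,500 participants with a single cancer site would qualify, whereas a risk cohort of 6,000 participants would not.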

Completed questionnaires were reviewed for discrepancies and missing data to ensure completeness and accuracy before posting to the web-based database. The website was designed to maximize users' abilities to search for cohorts by name, find detailed information about a single cohort, and compare information across cohorts. On the basis of feedback received from beta-testing, the website design was optimized for ease of navigation, and user-friendly help tips were added throughout the site to highlight different features.

Characteristics of participating cohorts

The CEDCD website was launched in March 2015. As of December 2015, descriptive data from 46 of the 57 invited cohorts (81%) are available on the CEDCD website. Of the 11 not listed, five cohorts did not respond, four requested additional time, and two submitted partially completed forms. Participating cohorts were classified according to three categories: risk, survivor, and hybrid cohorts. Risk epidemiology cohorts, those enrolling healthy participants to be followed over time to detect cancer incidence, represented the majority (83%). Hybrid cohorts, which enrolled families including healthy individuals and cancer survivors for prospective follow-up, accounted for 15%, and one cohort (2%) enrolled only cancer survivors and followed them over time to study cancer-related outcomes and survivorship issues (Table 1). Review of the basic cohort characteristics, including the year enrollment began and the most recent year in which data were collected, showed that, overall, this is a group of long-established cohorts that began enrollment decades ago, with the average (and median) year for the start of enrollment being 1993 (Table 1). Six cohorts (13%) are still actively enrolling participants, while the most recent year of active follow-up ranged from 1974 to 2015. Given that the primary outcome of interest was cancer, most cohorts specifically enrolled adults, with the average minimum and maximum ages at enrollment being 34 and 79.5 years old, respectively. However, four cohorts enrolled participants under the age of 18 (Table 1).

Demographics

More than 6.5 million individuals have been enrolled by the 46 cohorts, with three times as many females enrolled compared with males (Table 2). More than half of the cohorts, 24 in total (52%), have enrolled both genders, while 15 (33%) enroll only females, and 7 (15%) enroll only males (Table 1). Overall, females outnumber males in 34 cohorts. Most cohorts (44) provided data on the race/ethnicities of their participants, with stratified enrollment numbers listed in Table 2. The majority of enrolled participants classify themselves as non-Hispanic whites (65%), followed by Asian (10%). More than 385,000 participants classified themselves as black or African American and 290,000 as Hispanic. In addition, more than 30,000 individuals reported belonging to more than one race.

Participants have been enrolled across three continents, from 17 different countries. The highest recruitment was from North America, with 31 cohorts having catchment areas in the United States and 8 in Canada. There are 13 cohorts with catchment areas in Europe, followed by Asia (6) and Australia (Supplementary Table S1; ref. 3).

Data collection

The database is capable of displaying the types of data collected by each cohort as well as summarizing descriptive data across the cohorts selected for comparison. The tables found under Table 3 show the categories of data collected at baseline, including risk factors, cancer outcomes, comorbidities, and cancer treatment received. The vast majority of cohorts reported collecting data on key risk factors including smoking/cigarette use (91%), alcohol use (84%), physical activity (78%), and dietary intake (78%) (Table 3A). Data collection on other major diseases, which is useful to investigators trying to utilize the cohort data across different complex disease phenotypes, includes diabetes and heart disease in 36 cohorts (78%; Table 3B), representing almost 4 million participants. More than 50% of the cohorts (n = 30 and n = 27, respectively) collected data on digestive and lung disease as well (Table 3B). The cohorts were not asked to specify whether the outcomes were collected through self-report or determined through medical records. Cancer treatment data were collected by 27 cohorts (59%; Table 3C). The most common method to obtain treatment information was from patient-reported questionnaires (55%), followed by medical chart abstraction (44%), abstraction from electronic medical records (18%), and from administrative claims (14%; data not shown).

The majority of cohorts (76%) have utilized multiple data collection approaches, with most still using mail-in questionnaires (89%), followed by in-person interviews (50%). Interestingly, 41% of cohorts administered questionnaires electronically via the Internet, surpassing the number of studies that contact participants by telephone (24%). Two cohorts have adopted cloud-based approaches for the collection, management, or distribution of their study data on the Internet, and another seven (15%) are considering using this technology within their cohort. With the evolution of mobile technologies, cohorts are moving toward using such novel approaches for data collection. Three cohorts (6.5%) have adopted data collection through mobile devices, and another 11 cohorts (24%) are considering the use of mobile devices. Two of the cohorts using mobile technologies reported using tablets, with one specifying its use to obtain consent and collect questionnaires, while the other collected detailed specimen data during in-person blood collections.

Half (n = 23) of the cohorts link their data to other existing databases. The majority of the linkages are to state registries, SEER, Medicare, and other tumor and cancer registries. Almost all of the cohorts confirmed collecting mortality data (44 cohorts, 96%). The majority of cohorts confirm the death of participants by using linkage to the National Death Index (27 cohorts, 59%) or to state death certificates (28 cohorts, 61%) and to other registration systems, such as SEER, cancer registries, obituaries, and the Social Security Death Index.

A total of 634,801 incident cancer cases were reported by the risk epidemiology cohorts and 40,676 prevalent cancer cases by the hybrid and survivor cohorts. A breakdown of the cancer sites reported is listed in Table 4. The top five incident cancers are breast, prostate, lung, colon, and melanoma, while the most common prevalent cancers included breast, colon, rectum and anus, prostate, and cervical cancers. There were considerable numbers for the less common cancers as well. For example, there were almost 8,000 incident cases of brain cancer and 22,944 of bladder cancer.

Biobanking, genomics, and other -omics

Blood samples were collected at least once on a subset of cohort participants by 36 cohorts (78%), while 24 cohorts (52%) did so at multiple time points (Table 5A). Types of biospecimens collected included blood, buccal, sputum, feces, lymphocytes, tumor tissue, and urine samples. Tumor tissues were collected by 46% of the cohorts, and normal tissue was available for 30% of the cohorts. For the studies that do not collect tumor tissue, 39% had knowledge of where the tumor tissue was stored for future studies (Table 5B).

The rise in molecular and genomic epidemiology studies is reflected in the biospecimen and molecular data collection numbers (Table 6), with 67% of the cohorts performing genotyping (SNPs) for genome-wide association studies, 46% implementing whole-exome or whole-genome sequencing on either blood or tissues, and 50% collecting epigenetic/metabolic marker data. Although not all cohorts perform a systematic molecular and genomic characterization of cohort participants through high-throughput assays, these data reflect the feasibility of comprehensive “omic” characterization within large cohorts, which is currently limited by lack of resources and rapidly changing technologies.

Cohort-based research

A total of 75% of the cohorts (n = 36) reported having participated in cross-cohort data harmonization activities, owing at least in part to their participation in the NCI Cohort Consortium and their involvement in many additional consortia and pooling projects.

CEDCD-based analyses in combination with data linkage to existing databases providing information related to cancer research have also been explored. A list of NIH-funded grants supporting the cohorts' research was obtained by searching the NIH's Query, View and Report database (conducted in May–June 2015). A text search, using each cohort's name and acronym, was used to compile the list of grants for each of the cohorts. It should be noted that there are limitations to the text search function when searching grant applications submitted prior to 2008, as a number of applications in the NIH grants database are scanned image files. Those grants that were funded by the NCI were further examined using the NCI's Portfolio Management Application (PMA) 16.1 database and the cancer activity (CA) code to determine the type of grant awarded and the research area addressed.
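The name-and-acronym text search described above can be illustrated with a minimal sketch. It does not reproduce the actual NIH database query interface; the record layout, the function names, the sample grant records, and the acronym "NHS" are hypothetical and used only for illustration. A record with no searchable text, such as a scanned image file, cannot match, which mirrors the limitation noted for older applications.

```python
import re


def matches_cohort(text: str, name: str, acronym: str) -> bool:
    """Return True if the grant text mentions the cohort name or acronym."""
    # Case-insensitive substring match on the full cohort name.
    name_hit = name.lower() in text.lower()
    # Case-sensitive, whole-word match on the acronym to limit false hits.
    acronym_hit = re.search(rf"\b{re.escape(acronym)}\b", text) is not None
    return name_hit or acronym_hit


def grants_for_cohort(grants: list[dict], name: str, acronym: str) -> list[str]:
    """Collect the IDs of grants whose text matches the cohort."""
    return [g["id"] for g in grants if matches_cohort(g["text"], name, acronym)]


# Hypothetical records for illustration only.
sample_grants = [
    {"id": "G1", "text": "Dietary patterns in the Nurses' Health Study"},
    {"id": "G2", "text": "Biomarker validation using NHS blood samples"},
    {"id": "G3", "text": ""},  # scanned image: no searchable text
    {"id": "G4", "text": "Enhancing lung screening outcomes"},
]

found = grants_for_cohort(sample_grants, "Nurses' Health Study", "NHS")
```

The whole-word acronym match is a design choice to avoid matching short acronyms inside unrelated words; it would still need manual review, because an acronym can stand for something other than the cohort.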

Thirty-seven cohorts were found to be supported by NIH-funded grants since 1997. Of the nine cohorts without funding, all but one were international. Of the 37 funded cohorts, 29 were based in the United States, and eight were international. Considering that the primary outcome of interest of the cohorts is cancer, it was not surprising that the majority (66%) of the NIH grants were funded by the NCI (n = 407). Other major funding institutes included the National Heart, Lung, and Blood Institute (50 grants), the National Institute of Diabetes and Digestive and Kidney Diseases (36 grants), the National Institute of Environmental Health Sciences (30 grants), and the National Institute on Aging (22 grants).

The PMA 16.1 database was used to categorize grants awarded by the NCI by type (Supplementary Fig. S1) and by cancer activity (CA) code (Supplementary Fig. S2). The most common activity codes accounted for more than 60% of all NCI-funded grants and included Genomic Epidemiology (73 grants), Modifiable Risk Factors (Epidemiology; 70 grants), General Epidemiology (51 grants), Training (29 grants), Nutrition (24 grants), and Early Detection/Biomarkers (21 grants).

The CEDCD facilitates collaborative research using data from cancer epidemiology cohort studies. Participation in the CEDCD is open to all eligible cohorts meeting the criteria described previously. Investigators interested in having their cohort included in the database should make an inquiry through the contact page (https://cedcd.nci.nih.gov/contact.aspx). When accessing the database, users can select cohorts by searching by cohort name, selecting from the alphabetized list, or choosing advanced criteria of interest. The “Help” icon opens a step-by-step tutorial on how to explore all of the database's functions.

Analysis of the CEDCD descriptive data for the major cohorts studying cancer and related outcomes shows a powerful population infrastructure and identifies areas for possible enhancement (33). Pooling projects that combine individual-level data across these cohorts to achieve large sample sizes are feasible, and collaborative studies that no single cohort alone can perform could be undertaken. These include studies involving population subgroups, rare cancers, rare genotypes, exposures, and phenotypes, and evaluating interactions between common genetic and nongenetic risk factors. Research on rare cancers typically relies on case–control studies because the sample sizes required exceed most prospective studies. However, the CEDCD can help investigators identify studies with the rare exposures and endpoints of interest to facilitate pooling projects. Furthermore, with the varied age range of the participants, large-scale research on early-onset cancers among certain ethnic and racial groups is possible. In addition, the assessment for other disease endpoints in the CEDCD cohorts could greatly facilitate the study of comorbidities, especially in elderly populations. Collaboration among cancer research teams and experts in other common disease domains (e.g., CVD and aging) is essential for this transdisciplinary work.

The study of cancer health disparities is a current priority for the NCI and NIH. Population-based research on health disparities has been hampered in the past by the lack of sufficient numbers of study participants reflecting diverse ethnic and racial composition (34, 35). The combined data from the CEDCD show that collaborative efforts could greatly help facilitate research addressing underserved ethnic and racial groups, given the collective number of cohorts' participants from the relevant populations, including Hispanics (one of the fastest growing ethnic groups in the United States), African Americans, and Asian populations.

The lifestyle data collected by the majority of the CEDCD cohorts reflect the major risk factors for cancer, as well as for cardiovascular disease and diabetes, which are among the 10 most prominent causes of mortality in high-income countries. The systematic addition of clinical risk factor and treatment data, particularly electronic medical records, is an important feature that could enhance and enrich the usefulness of the cohorts' data. Although the CEDCD data indicate that the process of data integration is under way and present a tremendous resource for pooled analyses, it is also clear that a comprehensive standardized effort for multilevel data harmonization, management, and distribution is necessary (36).

In this initial survey of the use or willingness to use m-health technologies, we did not request further details on the specific technologies and approaches that the cohorts intended to use to implement m-health. The use of mobile applications for data collection (e.g., physical activity, diet, and medication use), storage, and management will likely increase during the coming years, due in part to methodologic advances and efforts of the Precision Medicine Initiative in the United States and others abroad. Communication across cohorts, a forum to define the most successful m-health protocols for acquiring accurate information from each subpopulation, and the adoption of cross-cohort standardization for these rapidly evolving tools and protocols are key to supporting more efficient and cost-effective next-generation collaborative studies. These trends could be monitored in future editions of the CEDCD.

Extensive genomic and other -omics characterization has already been performed on a large subset of the cohorts' biospecimens. A collaborative approach to systematic integration of the existing genomics data has enabled the development of studies large enough to address the complex nature of the genetic factors underlying common diseases. This approach has been adopted by the international OncoArray Network (37), an NCI, Genome Canada, and Cancer Research UK initiative, which includes as members many of the CEDCD participating cohorts, enabling the discovery of additional common and rarer susceptibility variants (37). Similarly, the integration of -omics, lifestyle, functional, and clinical data derived from cohort infrastructures has been initiated by large NCI-sponsored initiatives, such as the Genetic Associations and Mechanisms in Oncology, and by multiple NCI-sponsored consortia, including the NCI Cohort Consortium (17, 19), showing the feasibility of combining epidemiologic, genomics, functional, and clinical data to support a new generation of integrative epidemiology analyses.

The CEDCD cohorts ascertain extremely high numbers of incident common cancers. More than half of the cohorts have collected blood at multiple time points, and most have collected or have the capability to collect tissues, empowering research on disentangling molecular heterogeneity within cancer sites, validating predictive biomarkers, and examining the relationship between somatic and germline genomes. However, the relative lack of systematic tissue and DNA collection across cohorts handicaps the ability to pool molecular data derived from these samples for larger studies.

Given the overall large numbers of incident cancers, the extensive follow-up data, the enrollment of diverse populations, and the rich biospecimen collections, these cohorts could provide the infrastructure to enable built-in clinical trials seeking individuals with unique characteristics within a cohort population who may benefit from precision medicine approaches (38, 39). Moreover, they can be used to embed clinical trials of interventions on asymptomatic individuals (40). These cohorts could also support studies of late effects of cancer therapy and related comorbidities, which are not easily explored in the time frame of a clinical trial. To function, this translational pipeline would require the seamless integration of multidisciplinary teams, including epidemiologists, molecular scientists, behavioral scientists, and clinicians, a transition already in process in many of the NCI-funded cancer centers.

In summary, we have created a unique and detailed inventory of existing cohorts with cancer as a primary outcome. Examination of the initial CEDCD data shows that the combined cancer cohorts could provide access to the broadest range of genotypes, phenotypes, and exposures, thereby accelerating efforts to detect and analyze subtle and important signals with greater accuracy and inform precision medicine. Increasing the number and variety of cohorts enrolled would provide valuable data for investigators from multiple disciplinary domains and facilitate collaborative study of cancer-related endpoints. Continuing this initial effort by maintaining a current and comprehensive descriptive database of large prospective cohorts will facilitate important epidemiologic research by allowing the identification of areas of strength and needs for the current overall cohorts' infrastructure, and by informing the use of resources through cost-effective planning by both investigators and funding agencies.

No potential conflicts of interest were disclosed.

Conception and design: A.E. Kennedy, M.J. Khoury, J.P.A. Ioannidis, C. Lane, G.Y. Lai, C. Harvey, D. Seminara

Development of methodology: A.E. Kennedy, M.J. Khoury, J.P.A. Ioannidis, A. Miller, C. Lane, G.Y. Lai, D. Seminara

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A.E. Kennedy, M. Brotzman, A. Miller, C. Lane, G.Y. Lai, C. Harvey, D. Seminara

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A.E. Kennedy, J.P.A. Ioannidis, C. Lane, G.Y. Lai, C. Harvey, D. Seminara

Writing, review, and/or revision of the manuscript: A.E. Kennedy, M.J. Khoury, J.P.A. Ioannidis, M. Brotzman, A. Miller, C. Lane, G.Y. Lai, S.D. Rogers, C. Harvey, J.W. Elena, D. Seminara

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): A.E. Kennedy, M. Brotzman, C. Lane, G.Y. Lai, C. Harvey, D. Seminara

Study supervision: D. Seminara

The authors wish to acknowledge all of the participating cohorts. A special thanks to the program managers and study staff for taking the time to complete the data collection forms and to the principal investigators of each of the cohort studies:

Seventh-day Adventist Cohort Study-2: Gary E. Fraser, Synnove Knutsen, and Karen Jaceldo-Siegl (Loma Linda University School of Public Health), and Michael Orlich (Loma Linda University School of Medicine)

Alpha-Tocopherol Beta-Carotene Cancer Prevention Study: Demetrius Albanes and Stephanie Weinstein (NCI)

British Columbia Generations Project: John J. Spinelli, Angela R. Brooks-Wilson, Tim K. Lee, and Nhu D. Le (BC Cancer Agency)

Breast Cancer Family Registry Cohort: Esther M. John (Cancer Prevention Institute of California), Mary Beth Terry (Columbia University), Irene A. Andrulis (Mount Sinai Hospital), Mary Daly (Fox Chase Cancer Center), Saundra S. Buys (University of Utah Health Sciences Center), and John L. Hopper (The University of Melbourne)

Breast Cancer Surveillance Consortium Research Resource: Ellen O'Meara (Group Health Research Institute)

Breakthrough Generations Study: Anthony Swerdlow (Institute of Cancer Research)

A Follow-up Study for Causes of Cancer in Black Women: Black Women's Health Study: Lynn Rosenberg and Julie Palmer (Boston University), and Lucile Adams-Campbell (Georgetown University)

Carotene and Retinol Efficacy Trial: Gary Goodman, Mark Thornquist, and Marian Neuhouser (Fred Hutchinson Cancer Research Center)

CARTaGENE: Philip Awadalla (CARTaGENE, St-Justine Hospital)

Clue Cohort Study I and II: Kala Visvanathan, Josef Coresh, Corrine Joshu, and Elizabeth Platz (Johns Hopkins University)

Colon Cancer Family Registry Cohort: Robert W. Haile (Stanford University), Mark Jenkins (University of Melbourne), Noralane Lindor (Mayo School of Medicine, Scottsdale), Steven Gallinger (Mount Sinai Hospital), Loic Le Marchand (University of Hawaii), Polly A. Newcomb (Fred Hutchinson Cancer Research Center), Dennis Ahnen (University of Colorado, Denver), Kristen Anton (Geisel School of Medicine at Dartmouth), Graham Casey (University of Southern California), Iona Cheng (Cancer Prevention Institute of California), James Church (Cleveland Clinic Digestive Disease Institute), and Timothy Church (University of Minnesota)

Cancer Prevention Study II Nutrition Cohort: Susan Gapstur (American Cancer Society)

Canadian Study of Diet, Lifestyle and Health: Thomas Rohan (Albert Einstein College of Medicine) and Vicki Kirsh (Cancer Care Ontario)

California Teachers Study: Leslie Bernstein and James Lacey (City of Hope)

European Prospective Investigation in Cancer and Nutrition: Elio Riboli (Imperial College, London)

Golestan Cohort Study: Christian Abnet and Sanford Dawsey (NCI), Paolo Boffetta (Mount Sinai), Paul Brennan (IARC), Farin Kamangar (Morgan State University), and Reza Malekzadeh (Digestive Diseases Research Institute)

Health Professionals Follow-up Study: Walter Willett, Eric Rimm, Donna Spiegelman, and Meir Stampfer (Harvard School of Public Health)

Iowa Women's Health Study: Kim Robien (George Washington University), DeAnn Lazovich (University of Minnesota)

Janus Serum Bank: Giske Ursin and Hilde Langseth (Cancer Registry of Norway)

Mexican American (Mano a Mano) Cohort: Xifeng Wu, Hua Zhao, and Wong-Ho Chow (MD Anderson Cancer Center)

The Melbourne Collaborative Cohort Study: Graham Giles and Roger Milne (Cancer Council Victoria), Dallas English and John Hopper (University of Melbourne)

Multiethnic Cohort: Loic Le Marchand and Lynne Wilkens (University of Hawaii), Christopher Haiman (University of Southern California)

Mayo Mammography Health Study: Celine M. Vachon (Mayo Clinic)

Nurses' Health Study: Meir Stampfer (Harvard School of Public Health), Wendy Chen, Vincent Carey, and Diane Feskanich (Harvard Medical School)

Nurses' Health Study II: Walter C. Willett and Donna Spiegelman (Harvard School of Public Health), Heather Eliassen and Rulla Tamimi (Brigham and Women's Hospital)

The National Institutes of Health AARP Diet and Health Study: Rashmi Sinha, Charles E. Matthews, Louise Brinton, and Linda Liao (NCI)

NYU Women's Health Study: Anne Zeleniuch-Jacquotte (NYU School of Medicine)

Ontario Health Study: Mark Purdue (Ontario Health Study)

Prostate Cancer Prevention Trial: Catherine Tangen and Michael LeBlanc (Fred Hutchinson Cancer Research Center), Ian Thompson (University of Texas Health Science Center)

Physicians' Health Study I and II: J. Michael Gaziano and Howard D. Sesso (Brigham & Women's Hospital/Harvard Medical School)

Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial: Paul Pinsky and Neal Freedman (NCI)

RERF Life Span Study: Kotaro Ozasa and Eric Grant (Radiation Effects Research Foundation)

Southern Community Cohort Study: William Blot and Wei Zheng (Vanderbilt University Medical Center), Margaret Hargreaves (Meharry Medical College)

Singapore Chinese Health Study: Jian-Min Yuan and Lesley Butler (University of Pittsburgh), Woon-Puay Koh (National University of Singapore)

Shanghai Cohort Study: Jian-Min Yuan and Lesley Butler (University of Pittsburgh), Yu-Tang Gao (Shanghai Cancer Institute)

Selenium and Vitamin E Cancer Prevention Trial: Catherine Tangen and Michael LeBlanc (Fred Hutchinson Cancer Research Center), Ian Thompson (University of Texas Health Science Center)

Sister Study: Dale P. Sandler, Jack Taylor, and Clarice R. Weinberg (National Institute of Environmental Health Sciences)

Shanghai Men's Health Study: Xiao O. Shu, Wei Zheng, and Gong Yang (Vanderbilt University Medical Center), Yong-Bing Xiang (Shanghai Cancer Institute)

Shanghai Women's Health Study: Xiao O. Shu, Wei Zheng, and Gong Yang (Vanderbilt University Medical Center), Yu-Tang Gao (Shanghai Cancer Institute)

VITamins And Lifestyle: Emily White and Ulrike Peters (Fred Hutchinson Cancer Research Center)

Women's Health Initiative: Garnet Anderson (Fred Hutchinson Cancer Research Center)

Women's Health Initiative Cancer Survivor Cohort: Garnet Anderson (Fred Hutchinson Cancer Research Center), Electra Paskett (Ohio State University), Bette Caan (Kaiser Permanente Northern California), and Rowan Chlebowski (University of California, Los Angeles)

Women's Health Study: Julie E. Buring, I-Min Lee, Nancy Cook, and Dan Chasman (Brigham and Women's Hospital)

Women's Lifestyle and Health: Elisabete Weiderpass Vainio (Karolinska Institutet, Stockholm, Sweden)

The authors would also like to thank Dr. Kathy Helzlsouer for her comments and edits to the manuscript.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Lauer MS. Time for a creative transformation of epidemiology in the United States. JAMA 2012;308:1804–5.
2. Potter JD. Epidemiology informing clinical practice: from bills of mortality to population laboratories. Nat Clin Pract Oncol 2005;2:625–34.
3. Thomas G, Jacobs KB, Yeager M, Kraft P, Wacholder S, Orr N, et al. Multiple loci identified in a genome-wide association study of prostate cancer. Nat Genet 2008;40:310–5.
4. Cox DG, Blanche H, Pearce CL, Calle EE, Colditz GA, Pike MC, et al. A comprehensive analysis of the androgen receptor gene and risk of breast cancer: results from the National Cancer Institute Breast and Prostate Cancer Cohort Consortium (BPC3). Breast Cancer Res 2006;8:R54.
5. Lynch SM, Vrieling A, Lubin JH, Kraft P, Mendelsohn JB, Hartge P, et al. Cigarette smoking and pancreatic cancer: a pooled analysis from the pancreatic cancer cohort consortium. Am J Epidemiol 2009;170:403–13.
6. Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet 2007;39:645–9.
7. Amundadottir L, Kraft P, Stolzenberg-Solomon RZ, Fuchs CS, Petersen GM, Arslan AA, et al. Genome-wide association study identifies variants in the ABO locus associated with susceptibility to pancreatic cancer. Nat Genet 2009;41:986–90.
8. Boerwinkle E. Translational genomics is not a spectator sport: a call to action. Genet Epidemiol 2012;36:85–7.
9. Willett WC, Blot WJ, Colditz GA, Folsom AR, Henderson BE, Stampfer MJ. Merging and emerging cohorts: not worth the wait. Nature 2007;445:257–8.
10. Khoury MJ, Lam TK, Ioannidis JP, Hartge P, Spitz MR, Buring JE, et al. Transforming epidemiology for 21st century medicine and public health. Cancer Epidemiol Biomarkers Prev 2013;22:508–16.
11. Khoury MJ, Wei G. The future of epidemiology in the age of precision medicine: cancer, cardiovascular disease, and beyond. Atlanta, GA: CDC Genomics and Health Impact Blog; 2015. Available from: http://blogs.cdc.gov/genomics/2015/08/11/the-future-of-epidemiology/.
12. Roger VL, Boerwinkle E, Crapo JD, Douglas PS, Epstein JA, Granger CB, et al. Strategic transformation of population studies: recommendations of the working group on epidemiology and population sciences from the National Heart, Lung, and Blood Advisory Council and Board of External Experts. Am J Epidemiol 2015;181:363–8.
13. Bracken MB. Preregistration of epidemiology protocols: a commentary in support. Epidemiology 2011;22:135–7.
14. Dal-Re R, Ioannidis JP, Bracken MB, Buffler PA, Chan AW, Franco EL, et al. Making prospective registration of observational research a reality. Sci Transl Med 2014;6:224cm1.
15. Samet JM. To register or not to register. Epidemiology 2010;21:610–1.
16. Rifai N, Bossuyt PM, Ioannidis JP, Bray KR, McShane LM, Golub RM, et al. Registering diagnostic and prognostic trials of tests: is it the right thing to do? Clin Chem 2014;60:1146–52.
17. National Cancer Institute. Genetic Associations and Mechanisms in Oncology (GAME-ON): a network of consortia for Post-Genome Wide Association (Post-GWA) research. Available from: http://epi.grants.cancer.gov/gameon/.
18. National Cancer Institute. Cancer Epidemiology Consortia. Available from: http://epi.grants.cancer.gov/Consortia.
19. National Cancer Institute. NCI Cohort Consortium. Available from: http://epi.grants.cancer.gov/Consortia/cohort.html.
20. Hunter DJ, Riboli E, Haiman CA, Albanes D, Altshuler D, Chanock SJ, et al. A candidate gene approach to searching for low-penetrance breast and prostate cancer genes. Nat Rev Cancer 2005;5:977–85.
21. Kraft P, Pharoah P, Chanock SJ, Albanes D, Kolonel LN, Hayes RB, et al. Genetic variation in the HSD17B1 gene and risk of prostate cancer. PLoS Genet 2005;1:e68.
22. Gallicchio L, Helzlsouer KJ, Chow WH, Freedman DM, Hankinson SE, Hartge P, et al. Circulating 25-hydroxyvitamin D and the risk of rarer cancers: design and methods of the Cohort Consortium Vitamin D Pooling Project of Rarer Cancers. Am J Epidemiol 2010;172:10–20.
23. Birmann BM, Neuhouser ML, Rosner B, Albanes D, Buring JE, Giles GG, et al. Prediagnosis biomarkers of insulin-like growth factor-1, insulin, and interleukin-6 dysregulation and multiple myeloma risk in the Multiple Myeloma Cohort Consortium. Blood 2012;120:4929–37.
24. Trabert B, Ness RB, Lo-Ciganic WH, Murphy MA, Goode EL, Poole EM, et al. Aspirin, nonaspirin nonsteroidal anti-inflammatory drug, and acetaminophen use and risk of invasive epithelial ovarian cancer: a pooled analysis in the Ovarian Cancer Association Consortium. J Natl Cancer Inst 2014;106:djt431.
26. The Asia Cohort Consortium. Available from: https://www.asiacohort.org/.
27. European Network of Genomic and Genetic Epidemiology (ENGAGE). Available from: http://www.euengage.org/.
28. Type 1 Diabetes Genetics Consortium (T1DGC). Available from: https://www.niddkrepository.org/studies/t1dgc/.
29. Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO). Available from: https://www.fredhutch.org/en/labs/phs/projects/cancer-prevention/projects/gecco.html.
30. National Cancer Institute. Novel approaches and challenges to data harmonization: maximizing the use of multi-level data in collaborative studies. Available from: http://epi.grants.cancer.gov/events/data-harmonization/.
31. National Cancer Institute. NCI Workshop on broadening epidemiologic data sharing. Available from: http://epi.grants.cancer.gov/events/datasharing2014/.
32. National Institutes of Health. Core Infrastructure and Methodological Research for Cancer Epidemiology Cohorts (U01). Available from: http://grants.nih.gov/grants/guide/pa-files/PAR-15-104.html.
33. National Institutes of Health Precision Medicine Initiative Working Group. The Precision Medicine Initiative Cohort Program – Building a research foundation for 21st century medicine. Available from: https://www.nih.gov/sites/default/files/research-training/initiatives/pmi/pmi-working-group-report-20150917-2.pdf.
34. Khoury MJ, Iademarco MF, Riley WT. Precision public health for the era of precision medicine. Am J Prev Med 2016;50:398–401.
35. Khoury MJ, Evans JP. A public health perspective on a national precision medicine cohort: balancing long-term knowledge generation with early health benefit. JAMA 2015;313:2117–8.
36. Rolland B, Reid S, Stelling D, Warnick G, Thornquist M, Feng Z, et al. Toward rigorous data harmonization in cancer epidemiology research: one approach. Am J Epidemiol 2015;182:1033–8.
37. National Cancer Institute. OncoArray Network. Available from: http://epi.grants.cancer.gov/oncoarray/.
38. Lawler M, Sullivan R. Personalised and precision medicine in cancer clinical trials: panacea for progress or Pandora's box? Public Health Genomics 2015;18:329–37.
39. Schwaederle M, Zhao M, Lee JJ, Eggermont AM, Schilsky RL, Mendelsohn J, et al. Impact of precision medicine in diverse cancers: a meta-analysis of phase II clinical trials. J Clin Oncol 2015;33:3817–25.
40. Ioannidis JP, Adami HO. Nested randomized trials in large cohorts and biobanks: studying the health effects of lifestyle factors. Epidemiology 2008;19:75–82.