Cohort studies have been central to the establishment of the known causes of cancer. To dissect cancer etiology in more detail—for instance, for personalized risk prediction and prevention, assessment of risks of subtypes of cancer, and assessment of small elevations in risk—there is a need for analyses of far larger cohort datasets than available in individual existing studies. To address these challenges, the NCI Cohort Consortium was founded in 2001. It brings together 58 cancer epidemiology cohorts from 20 countries to undertake large-scale pooling research. The cohorts in aggregate include over nine million study participants, with biospecimens available for about two million of these. Research in the Consortium is undertaken by >40 working groups focused on specific cancer sites, exposures, or other research areas. More than 180 publications have resulted from the Consortium, mainly on genetic and other cancer epidemiology, with high citation rates. This article describes the foundation of the Consortium; its structure, governance, and methods of working; the participating cohorts; publications; and opportunities. The Consortium welcomes new members with cancer-oriented cohorts of 10,000 or more participants and an interest in collaborative research. Cancer Epidemiol Biomarkers Prev; 27(11); 1307–19. ©2018 AACR.

For over 60 years, since Doll and Hill (1) and Hammond and Horn (2) demonstrated the ill effects of smoking, and Case and colleagues (3) demonstrated the hazards of dyestuff manufacture to the bladder, cohort studies have been the principal method to provide definitive evidence of the carcinogenicity to humans of noxious agents, behaviors, and other exposures. Almost every known cause of cancer was established by this means (4). Increasingly, however, the need to dissect cancer etiology in more detail, and to pursue risk factors with smaller effects, has meant that even the largest cohorts are proving to be too small, for instance, to assess risks from uncommon exposures, or of uncommon types or subtypes of cancer, or to examine interactions or risks in population subgroups. The advent of molecular genetics, often examining very small elevations or decreases in risk, and of personalized risk prediction, examining risks subdivided across multiple susceptibility strata, has exacerbated the problem.

Existing cohorts, however, have tended to be limited in size by financial and practical constraints. They have frequently been recruited from a restricted subset of the population, for instance teachers (5) or nurses (6, 7), and, with notable exceptions such as the Multiethnic Cohort Study (8), have often been limited in ethnic diversity. Similarly, cancer-focused cohorts, with some exceptions (e.g., refs. 5–7, 9, 10), have tended to recruit study participants who were at least 35 years of age at baseline, to increase the numbers of incident cancers early in follow-up. Furthermore, study investigators have focused most of their energy on their cohort's areas of research strength. Consequently, questionnaire data and biospecimens from these cohorts have often remained unused (or if used, greatly underpowered) for analyses that require larger numbers.

One potential solution to this problem has been to assemble ever-larger new cohorts (11, 12), but there are practical and financial limitations to the size of these, and new cohorts take many years before they accrue sufficient follow-up and outcome events. They have also tended to be of limited diversity by age, sometimes by sex, and usually by ethnicity and country. Pooling data from existing cohorts internationally can relatively cheaply and quickly provide cohort data on a very large scale, with diversity of populations and exposures, capitalizing on the investments already made in such cohorts and their follow-up.

To address these challenges, and to exploit new opportunities such as advances in methods for molecular genetics, the NCI (Rockville, MD), led by Drs. Robert Hoover and Robert Hiatt, convened a meeting in 2001 with the principal investigators of several cancer epidemiology cohorts. The meeting was embedded in a wider strategy that NCI developed in relation to genetic risk factors, cancer risks, and strategies for prevention.

These investigators agreed on the value of pooling projects to enable analyses that no one cohort, nor even a few cohorts in alliance, could do alone, while still allowing the individual cohorts to meet their own scientific objectives. Another contributor to the Consortium's ethos was the Pooling Project of Prospective Studies of Diet and Cancer (13), which started in 1991, includes many of the studies within the Consortium, and now forms a Working Group within the Consortium. The NCI therefore created the Cohort Consortium, initially fostering within it the Breast and Prostate Cancer Cohort Consortium (BPC3; ref. 14), which examined risks of breast and prostate cancers, using a nested case–control design within each of nine cohorts. BPC3 initially assessed risks in relation to germline variants in more than 60 candidate genes related to steroid hormone metabolism and the IGF pathway (15–17) with greater precision than previously available and then, taking advantage of the rapidly declining costs of genomic assays, conducted a series of genome-wide association studies (GWAS) of these cancers (18, 19). The infrastructure and collaborative trust built up over time were central to the success of these GWAS.

The Consortium has since grown greatly in size and scope. It now comprises over 200 scientists from multiple institutions internationally, who have agreed to participate in collaborative research efforts and to pool data from their cohorts to address scientific questions that cannot otherwise be addressed through single institutions and cohorts.

Governance and leadership

The Consortium's bylaws (https://epi.grants.cancer.gov/Consortia/bylaws.html) describe its current overall governance structure and the roles and responsibilities of the steering committee, Consortium members, and NCI staff, and are designed to facilitate dynamic, collaborative research, for instance by frequent rotations of the steering committee membership, and by each cohort having one vote on matters related to the Consortium.

The Consortium activities are overseen by a steering committee elected by the members. The steering committee is responsible for policy development, management, and setting the scientific direction. It includes 6 to 9 principal investigators who reflect the institutional, geographic, and gender diversity of the member cohorts, and three NCI ex officio voting members representing the NCI Divisions of (i) Cancer Control and Population Sciences and (ii) Cancer Epidemiology and Genetics. The chair and chair-elect are elected by the steering committee. Steering committee members are appointed for up to three 3-year terms, and the chair for a 1-year term. The overall day-to-day operations and technical support are managed by four NCI executive staff.

The steering committee holds monthly teleconferences to monitor the progress of ongoing Consortium studies and projects, address problems that arise in those projects, assess proposals for new scientific research projects and working groups, and organize the annual meeting.

Membership

There are currently 58 epidemiologic cohorts in the Consortium, representing populations from 20 countries across four continents (North America, Europe, Asia, and Australia). Membership of the Consortium is open, on application, to any cohort study with a minimum of 10,000 participants, in which cancer incidence is accurately assessed and some risk factor data are available. Membership also requires a general commitment to scientific collaboration through contribution of data for pooling research, but for each specific pooling project, individual cohorts decide freely whether or not they wish to participate. Membership is granted after a review of the application and a vote of the steering committee.

Annual meetings

The Consortium members meet annually in person to review progress, gain updates on ongoing and new projects, discuss new research ideas, share study results, and address methodologic challenges. The content of the annual meetings is decided by the steering committee and the practical organization is by the NCI Leadership Staff. The meetings last 2 to 3 days and include: talks by some individual working groups, reporting on their progress; talks on scientific methodology or on areas, for example, metabolomics, that have potential for use in cohort epidemiology; and interactive sessions about the working of the Consortium. In addition, the annual meetings provide a forum for individual Consortium working groups to meet to progress their pooling research. Such working group meetings may be open to all attending the Consortium meeting, or limited to cohorts participating in the working group, for instance, because confidential new results are to be shown, including cohort-specific results from cohorts that have not yet published their own data separately.

Working groups and new research projects

The collaborative research in the Consortium is conducted by investigator-led working groups, of which there are currently more than 40. The steering committee assesses proposals monthly for new projects and hence for new working groups. Proposals from nonmembers are evaluated on the same basis as those from members of the Consortium. The proposals are made on a standard form that captures the rationale, design, and proposed funding sources for the intended research, and practical details such as the minimum number of incident cancers that will be required for a cohort to join the particular pooling study. Each proposal is evaluated by the Consortium steering committee, based on the need for prospectively collected data/samples pooled from multiple cohorts, whether there is overlap or duplication of efforts by existing working groups, and the project's potential to make a novel contribution to scientific research and public health. Initial appraisal frequently leads to a request to the proposer for clarification or further information, with the intention to improve the clarity of the proposal to individual cohort members of the consortium and hence improve the chance that these cohorts will take part. The great majority of proposals are either then accepted by the steering committee to be disseminated to members as new working groups, or directed toward joining an existing working group as a subproject. Although approval by the steering committee is required for a new working group to be formed, decisions on subsequent research or spin-off projects within an existing working group are the responsibility of the group itself.

Once a working group is formed, participation of cohorts in the group is solicited by an email from the steering committee to members, and then direct communication by the lead investigator. The great majority of working groups go on to conduct and publish research successfully, but occasionally one does not, because the research proves to be infeasible (e.g., there are fewer cases than had been anticipated), or too few investigators choose to join, or the researchers have not been able to obtain funding.

Communications

A growing, large-scale international consortium of this kind requires effective and efficient bidirectional communication to maximize collaborations and productivity. A newsletter, including information on new projects, is sent to Consortium members monthly. Webinars are used to host virtual meetings of working groups, as well as in conjunction with the in-person annual meetings to include members who cannot attend. Working groups provide regular progress updates on their projects to the steering committee in conference calls, to discuss accomplishments, challenges, lessons learned and suggestions to improve the working of the consortium. They also share their study results through oral and poster presentations at the annual meeting.

A secure portal, with access limited to Consortium members, has been created to foster collaboration and information sharing. The portal serves as a repository and archive for all Consortium-related activities, including concept proposals, operational guidelines, historical documents, and best practice documents, and is used by the scientific working groups for research activities. Individual working groups and projects are able to set up private work spaces within the portal with access limited to members of that working group. Several, but not all, working groups have used the portal in varying capacities for sharing updates, study policies and protocols, data, and manuscript versions among their members.

Overview

Most Consortium working groups have focused on specific cancer outcomes (e.g., pancreatic cancer or ovarian cancer), whereas others have focused on particular exposures (e.g., diabetes or alcohol), or a combination of the two (e.g., circulating carotenoids and breast cancer risk, or vitamin D and risk of rare cancers). Others have related to particular ethnic groups, notably the African American BMI and mortality pooling project, which includes over 200,000 African Americans from seven cohorts. Several working groups (including five at present) have conducted GWAS. A small number have addressed rare cancers, for instance, male breast cancer and renal cell cancers, and a few have investigated causes of cancer mortality or general mortality.

Working groups are encouraged to remain open to new members, provided that the new cohorts meet the minimum analysis-specific criteria; currently, 29 of the working groups are open to new members. Working groups may at some point elect to close to new members for several reasons: they were formed for a particular project and are now completing existing analyses before winding up; their funding (especially for laboratory assays) will not cover further cohorts joining; or adding further cohorts, with the data transfer agreements and data harmonization entailed, would seriously delay analyses already well underway.

Operations and organization

The organization of individual projects and working groups varies. Most are led by the proposing investigator(s). Sometimes a steering committee is formed to manage activities and help with decision-making. The structure of working groups has varied widely. Some, for instance, have extensive written ground rules, publication guidelines, etc., whereas others are informal with no written rules. Some have continued for 10 years or more, accruing new analyses and purposes over time, whereas others have been formed for and conducted a specific investigation (e.g., a particular assay), published it, and disbanded. It is a strength of the Consortium that the collaborative arrangements for working groups are flexible and in the hands of the members of each working group; additionally, because most cohorts are members of many working groups, there is a great deal of expertise and experience from previous working groups to be drawn upon when creating new ones.

The Consortium supports the development of the next generation of cancer epidemiology researchers by encouraging junior investigators to assume leadership and other active roles in managing Consortium studies, for instance, as leaders of spin-off projects. Funding for projects varies, as described below.

Details of the working groups and their accomplishments can be found on the Consortium's website (https://epi.grants.cancer.gov/Consortia/members/#members).

The Consortium steering committee and NCI provide overall support to the working groups by fostering communication, providing networking opportunities, and providing limited administrative resources. To facilitate data sharing and data harmonization, the NCI has funded the Cancer Epidemiology Descriptive Cohort Database (CEDCD; https://cedcd.nci.nih.gov/), and two data repositories, the Cohort Metadata Repository (CMR; https://cmr.nci.nih.gov/) and the Cancer Epidemiology Data Repository (CEDR; https://epi.grants.cancer.gov/CEDR/), described in Table 1. The CEDCD allows investigators to search for the types of data and biospecimens that were collected by each cohort study, the numbers of cohort participants, and the numbers of incident cancers, so that investigators planning a Consortium project can determine which cohorts have data on the specific variables of interest and the potential numbers of subjects and cancers that might be available.
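The kind of feasibility query that a descriptive cohort database supports can be sketched in a few lines of Python. The sketch is illustrative only: the cohort entries, variable names, case counts, and the `eligible_cohorts` helper are hypothetical and do not reflect the CEDCD's actual interface or contents.

```python
from dataclasses import dataclass, field


@dataclass
class CohortSummary:
    """Descriptive (not individual-level) information about one cohort."""
    name: str
    participants: int
    variables: set = field(default_factory=set)         # data types collected
    incident_cases: dict = field(default_factory=dict)  # cancer site -> count


def eligible_cohorts(cohorts, required_vars, site, min_cases):
    """Select cohorts that collected every required variable and have
    at least min_cases incident cancers at the given site."""
    return [
        c for c in cohorts
        if set(required_vars) <= c.variables
        and c.incident_cases.get(site, 0) >= min_cases
    ]


# Entirely made-up entries, for illustration only.
COHORTS = [
    CohortSummary("Cohort A", 50_000, {"bmi", "alcohol", "serum"}, {"pancreas": 210}),
    CohortSummary("Cohort B", 120_000, {"bmi", "alcohol"}, {"pancreas": 480}),
    CohortSummary("Cohort C", 15_000, {"bmi", "alcohol", "serum"}, {"pancreas": 35}),
]

if __name__ == "__main__":
    chosen = eligible_cohorts(COHORTS, {"bmi", "serum"}, "pancreas", min_cases=100)
    print([c.name for c in chosen])
    print(sum(c.incident_cases["pancreas"] for c in chosen))
```

In this toy example only the first cohort has both the required variables and enough incident cases, which is the type of answer an investigator would want before approaching cohorts about a pooling project.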

Table 1.

NCI Cohort Consortium: cancer cohort databases and data repositories

| Database/repository | Acronym | Access | Description |
|---|---|---|---|
| Cancer Epidemiology Descriptive Cohort Database | CEDCD | Public use | Searchable database that contains general descriptive information about cohort studies studying cancer incidence and mortality, e.g., type of data collected at baseline, numbers of incident cancers by site, and biospecimen information. |
| Cohort Metadata Repository | CMR | Restricted use^a | Searchable database that documents data harmonization efforts across cohorts. Researchers can view metadata (variable names, formats, codes, descriptions) across cohorts, and view harmonized variables from specific projects and the specifications used to create them. |
| Cancer Epidemiology Data Repository | CEDR | Controlled access^a | Investigators will be able to deposit individual-level deidentified nongenomic data from observational cancer epidemiology datasets. Datasets can be updated with follow-up information on cancer incidence and mortality and could be individually linked to genomic data in NIH's database of Genotypes and Phenotypes (dbGaP). |

^a Because the Consortium is a voluntary pooling by cohorts from many countries that need to meet their local legal, ethics, and funder-mandated policies on subject privacy and data sharing, as well as meeting the signed consent and participant information constraints of their cohort-specific recruitment, these consortial databases are restricted or have controlled access.

The CMR is a tool that documents data harmonization processes, decisions, and harmonized variables across cohorts that are participating in Consortium studies. It does not include individual-level data. Researchers interested in conducting pooled analyses in the consortium can view these metadata, including harmonized variables from specific projects and the specifications used to create them, to determine whether they could use already-harmonized datasets for their analyses instead of undertaking a separate time-consuming harmonization effort.
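As a rough illustration of what reusing a documented harmonization specification might involve, the Python sketch below applies a per-cohort recoding map to bring a smoking-status variable onto a common coding. The cohort names, variable names, codes, and the `SPECS` structure are all hypothetical and are not taken from the CMR; because the CMR holds only metadata, the records here stand in for data held by each cohort.

```python
# Hypothetical harmonization specifications, of the kind a metadata
# repository might document: for each cohort, the source variable name
# and the mapping from cohort-specific codes to a common coding.
SPECS = {
    "cohort_a": {
        "source": "smk",
        "codes": {1: "never", 2: "former", 3: "current"},
    },
    "cohort_b": {
        "source": "smoke_status",
        "codes": {"N": "never", "X": "former", "C": "current"},
    },
}


def harmonize(cohort, record, target="smoking_status"):
    """Return a copy of record with the harmonized variable added.

    Codes not covered by the specification become None rather than
    being guessed, so gaps stay visible to the analyst.
    """
    spec = SPECS[cohort]
    raw = record.get(spec["source"])
    out = dict(record)  # leave the caller's record untouched
    out[target] = spec["codes"].get(raw)
    return out


if __name__ == "__main__":
    print(harmonize("cohort_a", {"id": 1, "smk": 2}))
    print(harmonize("cohort_b", {"id": 7, "smoke_status": "C"}))
    print(harmonize("cohort_b", {"id": 8, "smoke_status": "?"}))
```

Keeping the specification as data rather than as code is one plausible design choice here: it mirrors the idea that a later project can inspect and reuse the documented mapping instead of repeating the harmonization effort.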

The CEDR is a controlled-access database developed to enable sharing of actual research data while protecting the privacy of research participants. Researchers can deposit, access, and analyze a variety of individual-level deidentified data, ensuring that their use aligns with specific data use agreements and informed consent for each study.

The cohorts taking part in the Consortium are shown in Table 2. They are contributing data from more than nine million study participants, and biospecimens, including germline DNA collected at baseline, from approximately two million participants.

Table 2.

Cohorts taking part in the NCI Cohort Consortium

| Cohort name | Design | Location | Population type | Size | Age range at recruitment | Sex | Years of recruitment | Follow-up type |
|---|---|---|---|---|---|---|---|---|
| Adventist Health Study-2 | Cohort | USA and Canada | Members of the Adventist church | 96,000 | 30–100+ | M + F | 2002–2007 | Repeated questionnaires and record linkage |
| Agricultural Health Study | Cohort | IA and NC, USA | Licensed pesticide applicators and their spouses | 89,656 | 20–77 | M + F | 1993–1997 | Questionnaires (mail or telephone) and record linkage |
| Alberta's Tomorrow Project^a | Cohort | Alberta, Canada | General population | 55,000 | 35–69 | M + F | 2001–2015 | Questionnaire and record linkage |
| Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study | Follow-up of an RCT | Southwest Finland | Male smokers | 29,133 | 50–69 | | 1985–1988 | Record linkage |
| Atherosclerosis Risk in Communities Study - Cancer | Cohort | Four locations, USA | General population | 15,792 | 45–64 | M + F | 1987–1989 | Repeat reexaminations plus by telephone |
| Atlantic PATH^a | Cohort | New Brunswick, Newfoundland and Labrador, Nova Scotia and Prince Edward Island, Canada | General population | 35,471 | 18–78 | M + F | 2009–2015 | Repeated questionnaires and record linkage |
| Black Women's Health Study | Cohort | USA | Black women | 59,000 | 21–69 | | 1995 | Repeated mailed questionnaires |
| Breakthrough Generations Study | Cohort | UK | General population | 113,000 | 16–102 | | 2003–2015 | Repeated mailed and online questionnaires plus record linkage |
| Breast Cancer Detection Demonstration Project Follow-up Study | Follow-up of breast screening program participants | 27 cities, USA | Selected participants from breast cancer screening program | 64,182 | 40–79 | | 1980 | Repeated telephone and mailed questionnaires |
| Breast Cancer Family Registry Cohort | Cohort | San Francisco area, New York City, Philadelphia, and Salt Lake City, USA; Ontario, Canada; Melbourne and Sydney, Australia | Multigenerational families | 37,776 | 18+ | M + F | 1996–2012 | Repeated questionnaires (mail and telephone) and record linkage |
| Breast Cancer Surveillance Consortium | Follow-up of breast imaging participants | Several locations, USA | Breast cancer imaging participants | >2.9 million | 18+ | | 1994–2014 | Record linkage |
| British Columbia Generations Project^a | Cohort | British Columbia | General population | 30,000 | 35–69 | M + F | 2009–2015 | Repeated questionnaires and record linkage |
| California Teachers Study | Cohort | CA, USA | California public school teachers, administrators, and other school professionals | 133,479 | 22–104 | | 1995–1996 | Repeated mailed questionnaires |
| Canadian Study of Diet, Lifestyle, and Health | Cohort | Canada | Alumni from Universities of Alberta, Toronto, and Western Ontario | 73,909 | 21+ | M + F | 1992–1998 | Record linkage |
| Canadian Partnership for Tomorrow Project | Cohort | Canada | Aggregation of five regional cohorts (BC Generations, Alberta's Tomorrow, Ontario Health Study, Quebec's CARTaGENE, and Atlantic PATH) already separately members of the Consortium | 300,000 | 30–74 | M + F | 2001–2017 | Repeated questionnaires and record linkage |
| Cancer Prevention Study-II (CPS II) | Cohort | All U.S. states, plus Puerto Rico | General population | 1,185,106 | 30–111 | M + F | 1982–1983 | Record linkage for cause of death |
| CPS II Nutrition Cohort^b | Cohort | 21 U.S. states | General population (subset of CPS II) | 184,194 | 40–92 | M + F | 1992–1993 | Repeated questionnaires and record linkage |
| Carotene and Retinol Efficacy Trial | Follow-up of an RCT | Several locations, USA | Clinical trial of beta carotene and retinyl palmitate; persons at high risk of lung cancer | 18,314 | 45–74 | M + F | 1985–1994 | Mailed questionnaires and telephone calls to 2005. Record linkage thereafter |
| CARTaGENE Cohort and Biobank^a | Cohort | Quebec, Canada | General population | 43,068 | 40–69 | M + F | 2009–2015 | Repeated questionnaires and record linkage |
| Campaign against Cancer and Stroke Study (CLUE I) | Cohort | Washington County, MD, USA | General population | 25,802 | 12–97 | M + F | 1974 | Record linkage and for about a third, repeated mailed questionnaires |
| Campaign against Cancer and Heart Disease Study (CLUE II) | Cohort | Washington County, MD, USA | General population | 32,898 | 2–102 | M + F | 1989 | Repeated mailed questionnaires and record linkage |
| Cohort of Swedish Men | Cohort | Central Sweden | General population | 48,850 | 45–79 | | 1997 | Questionnaires and record linkage |
| Colon Cancer Family Registry Cohort | Follow-up of case-family cohort | All or part of U.S. states: AZ, CA, CO, HI, MN, NC, NH, OH, WA, plus Australia, New Zealand, and Ontario, Canada | Colorectal cancer cases (ascertainment through cancer registries and high-risk clinics) and their family members | 38,000 | 18–75 | M + F | 1997–2012 | Repeated questionnaires and record linkage |
| European Prospective Investigation into Cancer and Nutrition | Collaboration of cohorts | 22 centers in Denmark, France, Germany, Greece, Italy, Netherlands, Norway, Spain, Sweden, and the UK | Mainly general population but other sources at 4 of the 22 centers | 519,978 | 25–70 | M + F | 1992–1998 | Record linkage in most centers plus repeat questionnaires |
| General Cohort of Adults in Norway | Collaboration of 10 cohorts (including Tromsø, HUNT, HUSK, and HUBRO studies) | Several locations in Norway | Mainly general population | 185,000 | 20–103 | M + F | 1994–2008 | Record linkage, plus repeat visits by some participants |
| Generation Scotland: Scottish Family Health Study | Cohort | Scotland | General population, with at least one first-degree relative recruit per proband | 23,960 | 18–98 | M + F | 2006–2011 | Record linkage and limited recontact |
| Golestan Cohort Study | Cohort | Golestan, Iran | General population | 50,045 | 40–75 | M + F | 2004–2008 | Yearly by phone or in person |
| Health Professionals Follow-up Study | Cohort | USA | Health professionals, mainly dentists and veterinarians | 51,529 | 40–75 | | 1986 | Repeated questionnaires |
| Iowa Women's Health Study | Cohort | IA, USA | General population | 41,836 | 55–69 | | 1986 | Mailed questionnaires, and record linkage |
| Janus Serum Bank | Cohort | Norway | Mainly general population; ∼10% blood donors | 318,628 | 18–68 | M + F | 1972–2004 | Record linkage |
| Mayo Mammography Health Study | Cohort | Rochester, MN, USA | Breast cancer screening practice | 19,923 | 35–90 | | 2003–2006 | Mailed questionnaires and record linkage |
| Melbourne Collaborative Cohort Study | Cohort | Melbourne, Australia | General population, with Greek and Italian ethnicities over-represented | 41,500 | 40–69 | M + F | 1990–1994 | Record linkage and repeated questionnaires |
| Mexican American (Mano a Mano) Cohort | Cohort | TX, USA | General population: Mexican Americans | 24,460 | 35–75 | M + F | 2001–2014 | Telephone, self-reports, and record linkage |
| Mexican Teachers' Cohort | Cohort | 12 states in Mexico | Mexican teachers | 115,315 + 2,160 | 25–82 | Largely F (2% M) | 2006–2013 | Repeated questionnaires and record linkage |
| Millennium Cohort Study | Cohort | USA | U.S. military service personnel and spouses | 201,620 | 18–68 | M + F | 2001–2013 | Repeated questionnaires and record linkage |
| Multiethnic Cohort Study (MEC) | Cohort | HI and CA, USA | General population, stratified by ethnicity | 215,251 | 45–75 | M + F | 1993–1996 | Repeated questionnaires and record linkage |
| NIH-AARP Study | Cohort | Six states and two metropolitan areas, USA | Members of American Association of Retired Persons | 566,401 | 50–71 | M + F | 1995–1996 | Repeated questionnaires and record linkage |
| Netherlands Cohort Study | Cohort | Netherlands | General population | 120,852 | 55–69 | M + F | 1986 | Record linkage |
| Northern Sweden Health and Disease Study | Cohort | Västerbotten and Norrbotten county, northern Sweden | General population | 140,000 | 25–70 | M + F | 1985 → | Repeated questionnaires and record linkage |
| Nurses Health Study I | Cohort | 11 U.S. states | U.S. nurses | 121,700 | 30–55 | | 1976 | Repeated questionnaires |
| Nurses Health Study II | Cohort | 14 U.S. states | U.S. nurses | 116,430 | 25–42 | | 1989 | Repeated questionnaires |
| Nutrition Intervention Trials - Linxian | Follow-up of randomized trial | Linxian, China | Participants in a nutrition intervention trial in high-risk area | 32,887 | 25–81 | M + F | 1985–86 | Record linkage and subsample questionnaire |
| NYU Womens Health Study | Cohort | New York City, USA | Breast cancer screening clinic | 14,274 | 34–65 | | 1985–1991 | Repeated questionnaires and record linkage |
| Ontario Health Study^a | Cohort | Ontario, Canada | General population | 228,611 | 18 and older | M + F | 2010–17 | Repeated questionnaires and record linkage |
| Physicians' Health Study I | Follow-up of RCT | USA | Physicians | 22,071 | 40–84 | | 1982–1983 | Repeated questionnaires and record linkage |
| Physicians' Health Study II | Follow-up of RCT | USA | Physicians | 14,641 | 50+ | | 1997–2000 | Repeated questionnaires and record linkage |
| PONS Polish Cohort Study | Cohort | Kielce and surrounding area, Southeast Poland | General population | 13,172 | 45–64 | M + F | 2010–2011 | Repeated questionnaires, record linkage, and clinical assessments |
| Prostate Cancer Prevention Trial | Follow-up of RCT | USA and Canada | General population | 18,882 | 55+ | | 1993–1997 | Clinic visits and questionnaires until March 2009; subsequently record linkage |
| PLCO | Follow-up of randomized screening trial | 10 sites in USA | Men and women with no previous diagnosis of prostate, lung, colorectal, ovarian cancers enrolled at screening centers | 154,935 | 55–74 | M + F | 1993–2001 | Questionnaire, and record linkage |
| RERF Life Span Study | Cohort | Hiroshima and Nagasaki, Japan | Distally exposed A-bomb survivors and unexposed individuals with questionnaire data | 53,000^c | All ages | M + F | 1950, 1958 | Record linkage, clinical assessments, and repeated questionnaires |
| Selenium and Vitamin E Cancer Prevention Trial | Follow-up of RCT | USA, Puerto Rico, Canada | General population | 34,887 | 50–100+ | | 2001–2004 | Clinic visits and questionnaires until September 2012; subsequently record linkage |
| Shanghai Cohort Study | Cohort | Defined geographic areas of Shanghai, China | General population | 18,244 | 45–64 | | 1986–1989 | Annual visits to known surviving cohort members |
| Shanghai Men's Health Study | Cohort | District of Shanghai, China | General population | 61,480 | 40–74 | | 2002–2006 | Record linkage, biennial in-person surveys |
| Shanghai Women's Health Study | Cohort | Defined geographic areas of Shanghai, China | General population | 74,942 | 40–70 | | 1997–2000 | In-person surveys and record linkage |
| Singapore Chinese Health Study | Cohort | Residents of public housing estates, Singapore | General population, restricted to the two major dialect groups, Hokkien and Cantonese | 63,257 | 45–74 | M + F | 1993–1998 | Telephone interviews and record linkage |
| Sister Study | Cohort | USA and Puerto Rico | Sisters of women with breast cancer | 50,884 | 35–74 | | 2003–2009 | Repeated questionnaires and medical records |
| Southern Community Cohort Study | Cohort | 12 southeastern states of USA | General population and enrollees at community health centers | 85,000 | 40–79 | M + F | 2002–2009 | Repeated questionnaires, telephone follow-up, and record linkage |
| Swedish Mammography Cohort | Cohort | Västmanland and Uppsala counties, central Sweden | Breast screening program | 59,036 | 40–76 | | 1987–1990 | Repeated questionnaires and record linkage |
| Swedish National March Cohort | Cohort | 3,600 cities and villages across Sweden | Attended national (fund raising) march | 43,880 | 8–94 | M + F | 1997 | Record linkage |
| US Radiologic Technologists Cohort | Cohort | USA | Radiologic technologists (current and former) | 146,022 | 22–90 | M + F | 1982–2014 | Repeated questionnaires |
| VITamins And Lifestyle | Cohort | 13 counties in western WA, USA | General population | 77,738 | 50–76 | M + F | 2000–2002 | Record linkage |
| Women's Health Initiative (WHI) | Follow-up of RCT and observational cohort | 40 clinical centers in USA | Postmenopausal women from the general population | 161,808 | 50–79 | | 1994–1998 | Repeated questionnaires and record linkage |
| WHI Cancer Survivor Cohort^b | Follow-up of RCT and observational cohort | Clinical centers in USA | General population (subset of WHI) | 14,000 | 50–100+ | | 2013–2017 | Record linkage |
| Women's Health Study | Follow-up of RCT | USA | Health professionals | 39,876 | 45–100+ | | 1993–1996 | Repeated questionnaires and record linkage |
| Women's Lifestyle and Health^d | Cohort | Uppsala, Sweden and Norway | General population | 96,541 | 30–49 | | 1991–1992 | Questionnaire and record linkage |

Abbreviations: F, female; M, male; RCT, randomized controlled trial.

(a) Not included separately when counting numbers of cohorts in the text, because it is included within Canadian Partnership for Tomorrow Project in this table.

(b) Not included separately when counting numbers of cohorts in the text, because it is a subset of another cohort in this table.

(c) Overall RERF cohort is 197,000, but 53,000 of these are suitable and potentially available for consortial pooling.

(d) Includes data from the Swedish Women's Lifestyle and Health Study and the Norwegian Women and Cancer Study.

The majority (60%) of the cohorts are entirely or predominantly U.S. based; two are from Canada, 13 from Europe, six from South-East Asia, and one each from Mexico, Australia, and Iran. Most (66%) studies have fewer than 100,000 participants (40% fewer than 50,000 and 14% fewer than 20,000), 16 have 100,000 to 300,000, and four have over 500,000 participants. The vast majority of cohorts restricted recruitment to adult ages, with 30 limited to people over age 35 years and 10 of those restricted to ages 50 and older. Three studies, all limited to women, restricted recruitment to younger participants (within the age range 25–55 years). Seventeen studies recruited only women and nine only men. Most studies were predominantly of whites. The vast majority of studies did not select on ethnic origin; exceptions were the Black Women's Health Study and Southern Community Cohort Study (exclusively and predominantly African Americans, respectively), the Mexican American (Mano a Mano) Cohort, the Multiethnic Cohort Study (oversampled on several minority groups), and the Singapore Chinese Health Study. One study, the Radiation Effects Research Foundation Life Span Study, began in the 1950s, three in the 1970s, 18 in the 1980s, 21 in the 1990s, and the remainder since 2000.

Primary objectives

Twenty-eight cohorts were established to investigate multiple causes of cancer, five of these for specific cancers only; the remainder also aimed to investigate other diseases or causes of death. Eight cohorts had as their primary aim to assess the influence of diet on cancer and/or other diseases, and 10 others focused on vitamins, minerals, or other medications as potential preventative agents for cancer. Other cohorts were established to investigate specific risk factors such as radiation, exogenous sex hormones, and genetics.

Base populations

The cohorts were recruited using a variety of sampling strategies and target populations (Table 2). The largest group of cohorts were sampled from geographic regions or countries, several were from occupational groups, and 10 were leveraged from established cancer or cardiovascular randomized clinical or screening trials, by extending follow-up and collecting new exposures and outcomes. To enrich for cancer risk, four cohorts sampled high-risk families or siblings of cancer cases, two enrolled people with precursor conditions, and four sampled from people with known risk factors for cancer. Four cohorts were established from breast screening services.

Outcome ascertainment

All cohorts in the Consortium follow participants for cancer incidence, with the majority linking to cancer registries and/or using regular follow-up questionnaires or telephone calls. Self-reported cancers are generally validated through medical or pathology record review. To follow for overall mortality or cause of death, some cohorts link to existing data sources such as national, state, or county death registries, or medical records; some also use active follow-up for these purposes. Several of the U.S. cohorts, including the VITamins and Lifestyle Study (Washington), the California Teachers Study, and the Multiethnic Cohort Study (Hawaii and California), purposely sampled regions covered by the NCI Surveillance, Epidemiology, and End Results Program and other U.S. population-based cancer registries, in order to obtain cancer incidence and survival data. Noncancer and nondeath outcomes are ascertained via linkages to administrative databases (e.g., for hospital discharges and outpatients, and military records), as well as through direct contact with study participants: for instance, the three Shanghai cohorts schedule ongoing in-person visits with cohort participants to obtain repeat measurements and health status over time. Half of the cohorts have gained further data directly from subjects by repeat questionnaires or in-person visits, although the frequency varies greatly. Most cohorts collected blood samples from at least a subset of participants, either at recruitment or, less often, later. Several cohorts have collected tumor samples, for one or more cancer sites, for cancers incident in their cohort.

Further details and contact information about the cohorts in the Consortium can be found on the website (https://epi.grants.cancer.gov/Consortia/members/#members) and in the Cancer Epidemiology Descriptive Cohort Database (https://cedcd.nci.nih.gov/).

Assembling a new pooling project is a complex and time-consuming endeavor. Consortium investigators have gained valuable experience in organizing such pooling successfully.

Data acquisition and harmonization

For most analyses conducted within the Consortium, gathering the necessary data from the participating cohorts, harmonizing and analyzing them, has been done at one or two centers. Although the NCI has on occasion conducted harmonization of the database and/or the statistical analyses for specific projects, these tasks have mostly been carried out by members of the working groups themselves, with the procedures then re-used for any further pooled projects within the working group. The Consortium has not in the past had a central data repository; members have generally preferred to provide data sets from their cohorts to the team conducting the analysis, rather than to a central entity. However, NCI has recently instituted a controlled-access repository, the CEDR, described above, where investigators can deposit individual-level deidentified data, to make their data more readily accessible by others and avoid having to respond to repeated data requests.

Data harmonization is frequently highlighted by Consortium investigators as a major bottleneck, and one for which the workload tends to be underestimated. Questions and response categories for a particular exposure often differ between cohorts, and when harmonizing the data, the exposure categories may have to be limited to fewer categories in common.
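
The collapsing of cohort-specific response categories into a smaller shared set can be illustrated with a minimal sketch. The cohort names, response codes, and the smoking example below are hypothetical and are not taken from any Consortium codebook; the point is only that each cohort needs its own mapping onto the categories that all cohorts can support.

```python
# Hypothetical codebooks (illustrative only): each maps one cohort's own
# response codes onto the categories that every cohort can support.
CODEBOOKS = {
    "cohort_a": {  # this cohort recorded four levels of smoking
        "never": "never",
        "former": "ever",
        "current_lt20": "ever",
        "current_ge20": "ever",
    },
    "cohort_b": {  # this cohort recorded only yes/no
        "no": "never",
        "yes": "ever",
    },
}


def harmonize(cohort, response):
    """Return the shared category, or None when the response has no mapping."""
    return CODEBOOKS[cohort].get(response)


records = [
    ("cohort_a", "former"),
    ("cohort_a", "never"),
    ("cohort_b", "yes"),
    ("cohort_b", "unknown"),
]
pooled = [harmonize(cohort, response) for cohort, response in records]
print(pooled)
```

In this sketch the finer detail recorded by the first cohort (cigarettes per day among current smokers) is lost in the pooled variable, which is the trade-off the paragraph above describes.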

The ease or difficulty of harmonization relates closely to the complexity of, and the degree of variation between studies in, the questions about and the recording of the risk factors. Relatively simple risk factors, such as height and weight, tend to be relatively easy to harmonize, but even these can be problematic: for example, weights may have been recorded at different ages, or grouped into different, potentially incompatible, categories. More complex variables such as diet and exercise have been more difficult, but have nevertheless been harmonized successfully, for instance, for exercise by converting the questionnaire responses in each study to metabolic equivalents of task (MET). Socioeconomic variables can be difficult because they can be based on very different systems (e.g., salary, education level, place of residence) in different countries. Serial exposures from baseline plus follow-up questionnaires can also be very difficult to harmonize because they may reveal inconsistencies between the serial responses. Age at menopause is exceptionally problematic: unambiguous data are hard to obtain even within a single study, and more so when harmonizing across studies.
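
As a rough illustration of the MET conversion mentioned above, the sketch below turns two differently worded questionnaire formats into a common MET-hours-per-week scale. The MET values, activity names, and questionnaire formats are assumptions chosen for illustration only; a real harmonization would take its values from an agreed reference and from each cohort's actual questionnaire.

```python
# Illustrative MET intensities per activity (assumed values, not from the source).
ILLUSTRATIVE_MET = {"walking": 3.5, "cycling": 6.0, "running": 8.0}


def hours_from_sessions(sessions_per_week, minutes_per_session):
    """Convert a 'sessions and minutes' style response into hours per week."""
    return sessions_per_week * minutes_per_session / 60.0


def met_hours_per_week(hours_by_activity):
    """Sum MET intensity x weekly hours over all reported activities."""
    return sum(ILLUSTRATIVE_MET[activity] * hours
               for activity, hours in hours_by_activity.items())


# Hypothetical cohort 1 asked directly for hours per week.
cohort1_response = {"walking": 2.0, "running": 1.0}

# Hypothetical cohort 2 asked for sessions per week and minutes per session.
cohort2_response = {"walking": hours_from_sessions(3, 40)}

print(met_hours_per_week(cohort1_response))
print(met_hours_per_week(cohort2_response))
```

Once both cohorts are expressed in MET-hours per week, the exposure can be analyzed on one scale even though the original questions differed.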

New working groups have often harmonized data ab initio, in part because a different set of cohorts may be included than in previous working groups, and new analyses may require new algorithms and variables. To try to avoid each new Consortium working group having to reharmonize the same cohorts' data for each new project, the NCI developed the Cancer Metadata Repository (CMR) described above. In 2013, the NCI supported a comprehensive harmonization of a large number of commonly used study variables for cohorts that chose to participate in the Diabetes and Cancer Initiative (currently n = 28), and the code book from this harmonization is available in the CMR for other investigators. Greater use of such previously developed study dictionaries and harmonization codes has the potential to speed up substantially the assembly of future pooled databases. The use of analytic platforms specifically adapted to harmonize and analyze epidemiologic data from multiple sources, such as the Maelstrom Research open-source software (20), may further facilitate this process.

Legal issues

Over time, the legal constraints on data and material transfers have become more stringent. This has resulted in more comprehensive and complex legal agreements (data and material transfer agreements) that stipulate the rules under which transfers of data or materials from individual cohorts are done. These agreements can be arduous and time-consuming to establish even for simple bilateral collaborations, and they can lead to considerable delays for consortium projects involving dozens of institutions, each with different data sharing policies and often different national regulations, and with limitations imposed by different funders and employers. Although the Consortium steering committee has established a pro forma template for data transfers, it remains to be determined whether this will help facilitate future Consortium projects. Initiating the establishment of the necessary data and material sharing agreements as early as possible in a new consortium study is key to avoiding downstream delays.

Governance and coordination of individual consortium projects

There are no rules as to how individual pooling consortia assembled under the Consortium umbrella are governed. As noted above, some have been organized by an individual research group; others have a steering committee or multiple research groups leading different tasks or analyses. A common feature is strong involvement in directing the research by all investigators who wish to take part, such that in practice governance is by the participating cohorts, even though one or two groups may be central.

The collaborative nature of Consortium studies is also reflected in how scientific contributions are recognized in the authorship of resulting publications, typically with one to three coauthors from each participating cohort, and sometimes with a writing group of a few consortium members who have undertaken initial data interpretation and drafting of a manuscript.

Financial aspects

Obtaining sustained funding to maintain existing cohorts can itself be a major challenge, but without this, cohorts have limited capacity to contribute to a single consortial project, let alone several.

Setting up a new Consortium project also involves costs, both for the coordinating group and to a lesser extent for each participating cohort. Although “data only” studies have sometimes been realized on a relatively slim or no budget by leveraging existing resources, studies involving assays of biological samples are inevitably costly. Retrieving biospecimens from cohort biobanks, preparation of aliquots, shipping, and performing the relevant assays is expensive, and biomarker-based consortium studies have typically required substantial grants.

When grants are obtained, these frequently include a small amount of support to each participating cohort, for instance, for preparation of the study database. NCI has developed and funded certain targeted initiatives within the Cohort Consortium to address high-priority NCI programmatic and scientific research needs, but usually funding for projects has been obtained from investigator-initiated grants, from NCI or other sources. In the 171 published Consortium articles that stated funding sources, all but two cited the NCI as a source (but this can include support of individual cohorts, not just the overall pooling) and 30 cited other NIH Institutes (Bethesda, MD), especially the National Institute on Aging (22) and the National Institute of Environmental Health Sciences (14), whereas the American Cancer Society contributed to funding of 43. Three quarters of the papers cited funding solely from the United States, and a tenth by three or more countries.

The Consortium working groups have published 188 articles since the Consortium began in 2001 (https://epi.grants.cancer.gov/Consortia/publications.html). These articles are well cited, with an average of nine citations each per year. Consortium articles have, on average, been cited over three times as often per year as the average NIH-funded article in their field. They have particularly contributed to the role of common genetic variants in risk of various cancers (14–19, 21), a greater understanding of multifactorial contributors to common cancers (22, 23), understanding of the etiology of several less common cancers or cancer subtypes (24–27), analyses with sample sizes sufficient to investigate risk factors for cancer in African Americans (28), and improved understanding of the shape of dose response for risk factors for all-cause and cancer-specific mortality (23, 27–29).

The vast majority of the articles [159 (85%)] were on cancer epidemiology (Table 3), mainly focusing on associations between genetic (51%) or lifestyle/anthropometric (27%) risk factors and cancer incidence (155) or less often for cancer survival (4) or cancer mortality (2). These were mainly of nested case–control or cohort design, but in some instances also included non-cohort-based case–control studies to maximize numbers, for instance for genetic analyses (Table 4). The remainder of the articles covered epidemiology of other outcomes including mortality (most commonly all-cause mortality; 4%), biomarkers (2%), mechanisms including DNA methylation and mosaicism (3%), and methodology for pooling projects (7%).

Table 3.

Types of cancer and risk factor under study in the NCI Cohort Consortium cancer publications

Risk factors examined
Lifestyle factors
Cancer site; No. of publications; Average no. of studies; Average no. of events; Genetic; Anthropometry (a); Biomarker (b); Diet; Tobacco; Alcohol; Diabetes; Physical activity; Reproductive; Multiple risk factors (c); Lifestyle total
Any 159 10.1 7,026 94 11 24 14 41 (d)
Breast 42 (e, f) 7.8 7,921 31 11
Prostate 35 (f) 7.0 7,822 34
Pancreas 29 16.0 3,013 16 
Endometrium 14 12.5 4,830 
Liver 7.7 743 
Colorectum 10.0 4,198 
Multiple myeloma 10.5 750 
Esophagus 3.5 5,331 
Glioma, brain 10.5 1,203 
Ovary 12.7 2,691 
Many types (>5) 16.5 74,399 
Thyroid 9.2 1,493 
Renal cell 12.0 2,625 
Head and neck 10.5 2,199 
Anogenital 1.0 400 
Chronic lymphocytic leukemia 22.0 2,343 
Kidney 8.0 775 
Lung 1.0 899 
NHL 10.0 1,353 
Upper gastrointestinal 8.0 1,065 

Abbreviation: NHL, non-Hodgkin lymphoma.

(a) Body mass index, height, and waist circumference.

(b) Includes circulating IGF, sex hormones, vitamin D, and carotenoid levels.

(c) In risk model validation analyses.

(d) Includes also one publication on nonsteroidal anti-inflammatory drugs and risk of liver cancer.

(e) Includes 4 males and 38 females.

(f) Includes five publications that included a breast cancer and prostate cancer investigation.

Table 4.

Study analysis design by cancer site and risk factor in the NCI Cohort Consortium publications

Type of pooled analysis
Cohort; Nested case–control; Mixed, nested, and non–cohort-based case–control; Total
Cancer site 
 Breast 35 42 
 Prostate 31 35 
 Pancreas 16 29 
 Endometrium 14 
 Liver 
 Colorectum 
 Multiple myeloma 
 Esophagus 
 Glioma, brain 
 Ovary 
 Many types (>5) 
 Thyroid 
 Renal cell 
 Head and neck 
 Anogenital 
 Chronic lymphocytic leukemia 
 Kidney 
 Lung 
 NHL 
 Upper gastrointestinal 
Risk factor 
 Genetics 79 94 
 Anthropometry 11 
 Lifestyle 23 10 41 
 Biomarkers 24 24 
Total 36 103 20 159 

Abbreviation: NHL, non-Hodgkin lymphoma.

The 94 articles on cancer genetics have included 29 on GWAS, nine of which were meta-analyses of GWAS. Twenty of the GWAS found novel risk loci and eleven confirmed previously reported risk loci. Forty-eight publications investigated candidate genes, individually or in pathways: 25 found significant associations with risk, whereas 23 had null findings. The remaining 17 studied pleiotropy (5), gene–environment interactions (8), or addition of genetics in risk models (4). Studies of anthropometry, smoking, and alcohol often found moderate associations with cancer risk, whereas studies of diet often gave less marked or null findings. A large number of studies examined the associations between prediagnostic biomarkers and cancer, most of which found no association with risk.

Although the majority of cohort participants overall are white, there are sufficient non-whites that six articles were published solely on Chinese and two solely on African Americans, as well as 12 with analyses stratified by ethnic group.

Among the 159 cancer-related articles, the sites most frequently studied were breast (26%), prostate (22%), and pancreas (18%), but many articles also included less common sites (Table 3). In total, 67 of the articles were on cancers defined by the NCI as “rare” (<15 cases per 100,000 per annum), and many of those on common cancers included subdivisions that require large numbers, for example, 17 articles on breast cancer subdivided by hormone receptor status, and articles on in situ breast cancer and type II endometrial cancer. The exposures analyzed were in general relatively common (Table 3), but again large numbers enabled investigation of uncommon subsets within these (e.g., there were two articles on risks in metabolically healthy obese subjects), and of subdivisions by more than one variable, for example, lung cancer risks in relation to vegetable intake and smoking, and liver cancer risks in relation to oral contraceptive (OC) use and oophorectomy. For genetics, large numbers were needed to detect the relatively small elevations in relative risk that are typically present for individual SNPs, thus necessitating pooling even for analyses of common cancers.

Pooling successfully enabled analyses of large numbers of events, allowing greater power even for rare events than would have been possible in a single study. The average number of studies included in an article was 10 and the average number of events was 7,026. There was a tendency for the publications on rarer cancers to include a larger number of studies and a smaller average sample size.

The average number of authors listed on the articles was 44, reflecting the team effort required for projects that combine studies; fewer than 15% of the articles had 15 or fewer authors. The Cohort Consortium offered opportunities to many researchers, including many junior investigators. A total of 118 unique investigators were first authors on the 188 articles.

The Cohort Consortium has come a long way from a group of investigators from nine like-minded cohort studies, all but two from the United States, in 2001, to a collaboration now of 58 cohorts from 20 countries. We have learned that cohort investigators are willing and eager to engage in large-scale pooling projects, easily self-organize, and work productively with each other on studies that have yielded important findings with implications for understanding of etiology, clinical guidelines, and public health measures. Over 180 publications have resulted, with productivity across a very wide range of aspects of cancer epidemiology research. The Consortium has also provided a forum for discussing and improving methods and best practice for cohort studies and pooling analyses.

Critical to the success of the Consortium have been the provision of a central administrative and coordinating infrastructure by NCI, and the trust between cohort investigators, expectation of reciprocity, and willingness to pool, that have built up over the years. This is particularly critical because cohort studies are extraordinarily long term, and individual investigators have often invested decades of work into them, so that the decision whether to join pooling efforts, which may invalidate future solo publications (e.g., on a rare tumor), is not a simple one. Trust has been greatly increased by the facilitative, nonprescriptive, bottom-up ethos of the Consortium, which is driven by the research ideas of its members, with leadership by different research groups on different projects.

The Consortium has provided an efficient means to bring together multiple diverse cohorts to address scientific questions that the cohort investigators could not address on their own. The underlying cohorts in aggregate represent a huge investment, and have taken decades to collect follow-up data. The Consortium has, over a relatively short period, produced important added value based on very large numbers.

As a voluntary, investigator-led, collaborative framework, the future directions of the Consortium will depend on the wishes of its members. However, some general comments can be made. The opportunities that the Consortium has exploited, and will exploit in future, depend on funding constraints and incentives, the intrinsic strengths and limitations of the component cohorts, scientific opportunities and priorities, and the research interests of the members. Thus, for instance, there are many more gene–environment interactions, uncommon exposures, and uncommon cancers that the Consortium could address in future, although research on uncommon cancers is hampered by a lack of funding opportunities (e.g., site-specific charities for common cancers such as breast tend to be much better resourced than those for rarer tumors). Furthermore, for very rare cancers or subtypes of cancers, even a consortium of this size may contain too few cases to generate precise risk estimates. The exposures that can be investigated with large numbers are limited to those for which there are sufficient cohorts (and especially large ones) that collected data on these variables and to the level of detail that can be harmonized. Another limitation of the consortial approach is the workload of involving large numbers of groups and investigators (although this is also a strength, because of the large number, and variety, of experienced investigators contributing to the quality of each publication).

The strengths of consortial research, in particular compared with starting new cohorts, also include the long collective length of follow-up, and therefore the large numbers of incident cancers already accrued after recruitment and available for immediate analysis, and the very low cost of combined analysis compared with the cost of initiating new cohorts. Such research would be helped if funding agencies offered initiatives aimed specifically at (a) maintenance and continuing follow-up of existing high-quality cohorts and (b) large-scale pooled cohort analyses, especially for uncommon cancers, which collectively account for a large burden of mortality and morbidity but are individually difficult to fund.

The Consortium needs to continue to expand, both to increase numbers for analyses of interactions, subgroup analyses, and rare tumors and exposures, and to include more non-white populations, to enable larger analyses for these groups, for which numbers within the Consortium are currently much smaller than for whites. The Consortium also needs to consider how to encourage and enable research on questions that currently are not addressed because no investigator has initiated investigation of the topic. Much of the 2017 annual meeting was spent discussing which research directions (e.g., rare tumors, rare exposures, gene–environment interactions) should be given priority in the next 5 years, and the steering committee is now formulating plans to address the priority areas.

There is great potential for the Consortium to contribute to future research, with opportunities provided by the wealth and growing volume of data and biospecimens available from the underlying cohorts, the changing landscape of cancer risk factors within and across populations (e.g., the growing worldwide obesity epidemic), and the extraordinary technological advances occurring in the assessment of genetic, metabolomic, and other molecular characteristics. The large sample sizes of the Consortium are advantageous both for discovering novel associations and for validating findings from individual studies. The move toward precision medicine and prevention (30, 31) will need reliable, stable risk estimates from cohort studies in ever-finer subdivisions of the population. As noted above, large cohort-based databases will also be needed to investigate rare cancers, rare exposures, and gene–environment interactions, to stratify risks by tumor subtypes, and to refine population-level and individual-level risk prediction. There are also great opportunities in survivorship research, and in biomarker research. The Consortium welcomes new members with cancer-oriented cohorts of 10,000 or more participants and an interest in collaborative research.

No potential conflicts of interest were disclosed.

The authors acknowledge the great contribution of those who founded the Consortium and of all the cohorts who have joined it since. They thank the members of the steering committee and the principal investigators of the cohorts who made comments on the manuscript and gave information about their cohorts.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Doll R, Hill AB. The mortality of doctors in relation to their smoking habits; a preliminary report. Br Med J 1954;1:1451–5.
2. Hammond EC, Horn D. The relationship between human smoking habits and death rates: a follow-up study of 187,766 men. J Am Med Assoc 1954;155:1316–28.
3. Case RAM, Hosker ME, McDonald DB, Pearson JT. Tumours of the urinary bladder in workmen engaged in the manufacture and use of certain dyestuff intermediates in the British chemical industry. Br J Ind Med 1954;11:75–104.
4. Breslow NE, Day NE. Statistical methods in cancer research. Volume II-The design and analysis of cohort studies. IARC Sci Publ 1987;82:1–406.
5. Bernstein L, Allen M, Anton-Culver H, Deapen D, Horn-Ross PL, Peel D, et al. High breast cancer incidence rates among California teachers: results from the California Teachers Study (United States). Cancer Causes Control 2002;13:625–35.
6. Hennekens CH, Speizer FE, Rosner B, Bain CJ, Belanger C, Peto R. Use of permanent hair dyes and cancer among registered nurses. Lancet 1979;1:1390–3.
7. Colditz GA, Hankinson SE. The Nurses' Health Study: lifestyle and health among women. Nat Rev Cancer 2005;5:388–96.
8. Kolonel LN, Henderson BE, Hankin JH, Nomura AM, Wilkens LR, Pike MC, et al. A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. Am J Epidemiol 2000;151:346–57.
9. Rosenberg L, Adams-Campbell L, Palmer JR. The Black Women's Health Study: a follow-up study for causes and preventions of illness. J Am Med Womens Assoc (1972) 1995;50:56–8.
10. Swerdlow AJ, Jones ME, Schoemaker MJ, Hemming J, Thomas D, Williamson J, et al. The Breakthrough Generations Study: design of a long-term UK cohort study to investigate breast cancer aetiology. Br J Cancer 2011;105:911–7.
11. The Million Women Study Collaborative Group. The Million Women Study: design and characteristics of the study population. Breast Cancer Res 1999;1:73–80.
12. Collins R. What makes UK biobank special? Lancet 2012;379:1173–4.
13. Smith-Warner SA, Spiegelman D, Ritz J, Albanes D, Beeson WL, Bernstein L, et al. Methods for pooling results of epidemiologic studies: the pooling project of prospective studies of diet and cancer. Am J Epidemiol 2006;163:1053–64.
14. Hunter DJ, Riboli E, Haiman CA, Albanes D, Altshuler D, Chanock SJ, et al. A candidate gene approach to searching for low-penetrance breast and prostate cancer genes. Nat Rev Cancer 2005;5:977–85.
15. Beckmann L, Hüsing A, Setiawan VW, Amiano P, Clavel-Chapelon F, Chanock SJ, et al. Comprehensive analysis of hormone and genetic variation in 36 genes related to steroid hormone metabolism in pre- and postmenopausal women from the breast and prostate cancer cohort consortium (BPC3). J Clin Endocrinol Metab 2011;96:E360–7.
16. Gu F, Schumacher FR, Canzian F, Allen NE, Albanes D, Berg CD, et al. Eighteen insulin-like growth factor pathway genes, circulating levels of IGF-I and its binding protein, and risk of prostate and breast cancer. Cancer Epidemiol Biomarkers Prev 2010;19:2877–87.
17. Canzian F, Cox DG, Setiawan VW, Stram DO, Ziegler RG, Dossus L, et al. Comprehensive analysis of common genetic variation in 61 genes related to steroid hormone and insulin-like growth factor-I metabolism and breast cancer risk in the NCI breast and prostate cancer cohort consortium. Hum Mol Genet 2010;19:3873–84.
18. Schumacher FR, Berndt SI, Siddiq A, Jacobs KB, Wang Z, Lindstrom S, et al. Genome-wide association study identifies new prostate cancer susceptibility loci. Hum Mol Genet 2011;20:3867–75.
19. Husing A, Canzian F, Beckmann L, Garcia-Closas M, Diver WR, Thun MJ, et al. Prediction of breast cancer risk by genetic risk factors, overall and by hormone receptor status. J Med Genet 2012;49:601–8.
20. Doiron D, Marcon Y, Fortier I, Burton P, Ferretti V. Software Application Profile: Opal and Mica: open-source software solutions for epidemiological data management, harmonization and dissemination. Int J Epidemiol 2017;46:1372–8.
21. Yeager M, Orr N, Hayes RB, Jacobs KB, Kraft P, Wacholder S, et al. Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nat Genet 2007;39:645–9.
22. Barrdahl M, Canzian F, Joshi AD, Travis RC, Chang-Claude J, Auer PL, et al. Post-GWAS gene-environment interplay in breast cancer: results from the Breast and Prostate Cancer Cohort Consortium and a meta-analysis on 79,000 women. Hum Mol Genet 2014;23:5260–70.
23. Maas P, Barrdahl M, Joshi AD, Auer PL, Gaudet MM, Milne RL, et al. Breast cancer risk from modifiable and nonmodifiable risk factors among white women in the United States. JAMA Oncol 2016;2:1295–302.
24. Helzlsouer KJ; VDPP Steering Committee. Overview of the cohort consortium vitamin D pooling project of rarer cancers. Am J Epidemiol 2010;172:4–9.
25. Arem H, Brinton LA, Moore SC, Gapstur SM, Habel LA, Johnson K, et al. Physical activity and risk of male breast cancer. Cancer Epidemiol Biomarkers Prev 2015;24:1898–901.
26. Wentzensen N, Poole EM, Trabert B, White E, Arslan AA, Patel AV, et al. Ovarian cancer risk factors by histologic subtype: an analysis from the ovarian cancer cohort consortium. J Clin Oncol 2016;34:2888–98.
27. Campbell PT, Newton CC, Freedman ND, Koshiol J, Alavanja MC, Beane Freeman LE, et al. Body mass index, waist circumference, diabetes, and risk of liver cancer for U.S. Adults. Cancer Res 2016;76:6076–83.
28. Sonderman JS, Bethea TN, Kitahara CM, Patel AV, Harvey C, Knutsen SF, et al. Multiple myeloma mortality in relation to obesity among African Americans. J Natl Cancer Inst 2016;108. pii: djw120.
29. Berrington de Gonzalez A, Hartge P, Cerhan JR, Flint AJ, Hannan L, MacInnis RJ, et al. Body-mass index and mortality among 1.46 million white adults. N Engl J Med 2010;363:2211–9.
30. Collins FS, Varmus H. A new initiative on precision medicine. N Engl J Med 2015;372:793–5.
31. Rebbeck TR. Precision prevention of cancer. Cancer Epidemiol Biomarkers Prev 2014;23:2713–5.