During the last two decades, epidemiology has undergone a rapid evolution toward collaborative research. The proliferation of multi-institutional, interdisciplinary consortia has acquired particular prominence in cancer research. Herein, we describe the characteristics of a network of 49 established cancer epidemiology consortia (CEC) currently supported by the Epidemiology and Genomics Research Program (EGRP) at the National Cancer Institute (NCI). This collection represents the largest disease-based research network for collaborative cancer research established in population sciences. We describe the funding trends, geographic distribution, and areas of research focus. The CEC have been partially supported by 201 grants and yielded 3,876 publications between 1995 and 2011. We describe this output in terms of interdisciplinary collaboration and translational evolution. We discuss challenges and future opportunities in the establishment and conduct of large-scale team science within the framework of CEC, review future prospects for this approach to large-scale, interdisciplinary cancer research, and describe a model for the evolution of an integrated Network of Cancer Consortia optimally suited to address and support 21st-century epidemiology. Cancer Epidemiol Biomarkers Prev; 22(12); 2148–60. ©2013 AACR.
The transition toward large-scale collaborations has been a hallmark in many fields of research in the last few decades (1). In epidemiology, and especially in the genomic epidemiology of complex diseases, this trend has been supported by a convergence of factors, including the rapid development of increasingly sophisticated genomic technologies (2), the progressive building of large population resources such as cohorts and biobanks (3, 4), and the requirement for larger sample sizes to address small effects. As a consequence, epidemiologic research on the genetic and environmental determinants of complex diseases has experienced a paradigm shift toward “Big and Bigger Science,” embodied by the increase of consortia as “hubs” of collaborative and interdisciplinary research within the framework of population sciences (5). Cancer epidemiology has been markedly affected by the advent of such collaborative infrastructures. Here, we examine the impact of research originating from interdisciplinary cancer consortia from 1995 to 2011, based on our experience with a network of 49 cancer-related consortia supported by the Epidemiology and Genomics Research Program (EGRP) of the National Cancer Institute (NCI). To our knowledge, this is the largest disease-based research network for collaborative, population-based research currently in existence. The reported analysis offers insights about the growth, impact, and future prospects of CECs as well as their role in supporting high-impact interdisciplinary research.
We define a consortium as “a group of scientists from multiple institutions who have agreed to cooperative research efforts involving, but not limited to, pooling of information from more than one study for the purpose of combined analyses and collaborative projects. Such consortia are geared to address scientific questions that cannot otherwise be addressed through the effort of a team of investigators at a single institution due to scope, resources, population size, or the need for an interdisciplinary approach” (6). The Cancer Epidemiology Consortia (CEC) network is a group of eligible cancer consortia that have received different forms of support by the National Cancer Institute (NCI) since 1995.
Consortia included in this network have either contacted the NCI/EGRP to be listed or have been specifically solicited through a targeted initiative. To be eligible for listing on the EGRP website, each consortium is required to provide the following: a description of cancer-related research questions that can be uniquely addressed by that consortium because of its characteristics (e.g., size and characteristics of the enrolled population, biobank infrastructure, or involvement of an interdisciplinary team of scientists from multiple institutions); an existing or proposed organizational structure and leadership plan; and a statement of commitment to data sharing within and outside the consortium. As emerging consortia can experience a substantial lag time before beginning to publish in a substantive manner, we have limited the reported analyses to established CEC that were launched before 2010. A complete list of the 49 established EGRP consortia included in the analyses, along with descriptive information on each, appears in Table 1.
Characteristics of the CEC
A common characteristic of the 49 established CEC is that their teams were not assembled solely to execute one project; rather, their collaborations extend through time and across research projects that vary in design and complexity. Only 3 of these CEC assembled in response to solicitations by the NCI or other NIH institutes (7–9), with the remainder coalescing spontaneously to address diverse research agendas. Funding for consortia-related research is provided by EGRP/NCI through a variety of investigator-initiated mechanisms or through support for communication and research networking activities such as meetings or teleconferences. Most consortia (n = 41, 84%) focus on a single cancer type. The remaining 8 (16%) CEC study multiple cancers, address specific translational topics (e.g., radiogenomics), or focus on diverse ethnic populations (Hispanic, African, Asian, and Caribbean) in the United States and abroad (Table 1). Most CEC are international in nature, with the largest distribution of collaborating groups in high- and mid-level income countries (Supplementary Fig. S1). All 49 CEC have some type of associated biorepository.
Websites for CEC and Consortium Policies
Public websites are an essential tool for communication, global sharing of study results, and dissemination of research tools, and they provide a conduit to data sharing and research opportunities for large collaborative groups. Forty (82%) CEC have developed publicly available websites (reviewed March 4–14, 2013); of these, 34 (85%) included information on CEC leadership and 20 (50%) detailed the CEC organizational structure. Eighteen (45%) of the CEC websites included information on consortium membership requirements, 9 (23%) included submission guidelines for new project proposals, and 7 (18%) had eligibility requirements and contact information for participant enrollment. Twenty-one (53%) of the CEC websites included a restricted access area (portal) reserved for communication among consortium members and internal data sharing.
CEC websites (accessed May 13–16, 2013), associated grant applications, and descriptive manuscripts were reviewed to determine whether CEC had established data sharing policies, as was intended. In cases where no policy was found, the CEC liaison or lead investigator was contacted and asked whether the CEC had a data sharing policy in place. Overall, 29 (59%) had data sharing policies, 3 (6.1%) were in the process of developing them, 10 (20.4%) did not have policies in place, and for 8 (16.3%) CEC, we were unable to confirm whether they had data sharing policies. Consortia supported entirely or in part through NCI-awarded grants and cooperative agreements are mandated to comply with the NIH data and resource sharing policies with respect to the specific aims listed on the funded grants or cooperative agreements (10).
Grants Funding for CEC
To evaluate the investment in terms of funding support and the scientific productivity of CEC, we reviewed all the EGRP grants that were related to the 49 established CEC. Overall, 201 grants, funded by EGRP between fiscal year (FY) 1995 and 2011, were identified as consortium related by searching the NCI Portfolio Management Application (PMA) database (v14.0.3). Grant coding is conducted by EGRP program staff, and consortium codes were confirmed through a manual review of the EGRP grant portfolio. A grant was defined as consortium-related if it directly supported the main research activities and/or infrastructure of the consortium or if it explicitly relied on the consortium's resources to conduct the proposed research project. An analysis of CEC-related grants shows a linear increase of investment in consortial research by EGRP from FY 1995 to 2011 (Supplementary Fig. S2A and S2B). The average yearly increase in the total number and total direct cost of EGRP CEC-associated grants was 5.3 grants and $4.2 million, respectively; in contrast, the total number of grants funded by EGRP has been flat since 1997 and the total direct cost of the whole portfolio has increased at a much slower rate (Supplementary Fig. S3A and S3B). Starting in 2002, funding for projects that are collaborations between CEC increased over time. This reflects the increasing number of CEC as well as increases in the size and complexity of the collaborative network. While the total costs of CEC-associated grants have been increasing, the average cost per grant has been relatively flat since 1998 and the fraction of small grants (<$250 K in direct costs per year) increased substantially from 2002 to 2007. This trend toward smaller grants is not seen in the overall EGRP portfolio (Supplementary Fig. S4A and S4B).
The success rate of CEC-related grants (percentage of reviewed applications that receive funding) was compared with the success rate of EGRP-funded and NIH-funded grants (all mechanisms). Data were extracted from NIH RePORTER (11) and the NCI Portfolio Management Application (PMA) database (v14.0.3, accessed April 30, 2013). The success rate of CEC-related grants (48%) has been consistently higher than the success rates for EGRP grants (28%) and NIH grants (25%) since FY 2000 (Table 2). This may reflect many factors, including the ready availability of resources and infrastructure in established consortia, increased communication across participating scientists and groups (12), and a more intense presubmission review of grant applications by the multiple participating investigators.
Productivity and Impact
To measure CEC scientific productivity, 3,876 CEC-related manuscripts were identified using 3 different methods. First, the CEC websites (if available) were searched for listings of manuscripts (websites initially accessed the week of February 20, 2012 and checked for updates on January 21, 2013). Second, the CEC names and abbreviations were used as search terms for PubMed (13) queries (search conducted on April 30, 2012); the titles and abstracts of these manuscripts were reviewed, and results returned because of ambiguity in the search terms (such as a different organization having the same abbreviation as the CEC) were excluded. Finally, the NCI code and serial numbers of the 201 consortium-related EGRP grants (as identified above) were used as search terms in PubMed (search conducted on May 13, 2012) to identify grant-related manuscripts. Results of these searches were combined, and duplicate manuscripts were removed, as were manuscripts published before the initiation year of the oldest associated consortium or after 2011. The ascertainment and censoring of EGRP CEC-related publications is summarized in Supplementary Fig. S5.
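The combination step described above (merge the three result sets, remove duplicates, and censor by publication year) can be illustrated with a minimal sketch. The record layout, the use of the PubMed identifier as the duplicate key, and all example values are hypothetical and are not taken from the study data.

```python
def combine_and_censor(sources, start_year, end_year=2011):
    """Merge search results, collapse duplicates, and censor by publication year.

    sources:    iterable of lists of (pmid, year, consortium) tuples,
                one list per search method (hypothetical layout).
    start_year: dict mapping consortium name -> initiation year.
    """
    merged = {}  # pmid -> (publication year, set of associated consortia)
    for records in sources:
        for pmid, year, consortium in records:
            entry = merged.setdefault(pmid, (year, set()))
            entry[1].add(consortium)

    kept = []
    for pmid, (year, consortia) in sorted(merged.items()):
        # Censoring window starts at the initiation year of the oldest
        # associated consortium and ends at end_year.
        earliest = min(start_year[c] for c in consortia)
        if earliest <= year <= end_year:
            kept.append(pmid)
    return kept


# Hypothetical example records for two consortia, "A" and "B".
website = [("100", 1999, "A"), ("101", 1994, "A")]
name_search = [("100", 1999, "A"), ("102", 2012, "B")]
grant_search = [("103", 2005, "B"), ("101", 1994, "A")]
start_year = {"A": 1995, "B": 2001}

kept = combine_and_censor([website, name_search, grant_search], start_year)
print(kept)  # ['100', '103']
```

In this toy example, record 100 is found by two methods and counted once, record 101 predates its consortium, and record 102 falls after 2011.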
The number of CEC-associated articles published each year has increased linearly since 1998 (Fig. 1A). Furthermore, a PubMed search for the terms “consortia OR consortium” NOT “bacteria OR microbe OR microbial” (to exclude articles on microbial consortia; searched on January 24, 2013) reveals an exponential increase in the number of articles containing these terms in their titles, abstracts, or authorship from 1985 to 2012 (Fig. 1B). Further refining those search results by searching for the terms “cancer OR tumor” also reveals an exponential increase in cancer consortia articles since 1985. The rise of genome-wide association studies (GWAS) and the consequent need for extremely large sample sizes to reach adequate power has been cited as a major impetus for the formation of CEC (14, 15); however, the trend toward team science in cancer research began well before publication of the first GWAS. Twenty-one (43%) of the CEC in the EGRP network were initiated before the first GWAS in the NHGRI GWAS Catalogue (16, 17).
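As an illustration of how the yearly counts behind such a trend could be requested programmatically, the sketch below builds one query URL per publication year for the NCBI E-utilities ESearch service. The text does not state how the search was run, so the use of E-utilities, the helper names, and the choice of publication date as the date field are assumptions. The code only constructs URLs and makes no network request.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed here; the article only says "PubMed search").
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def consortium_query(cancer_only=False):
    """Return the boolean search term described in the text."""
    term = "(consortia OR consortium) NOT (bacteria OR microbe OR microbial)"
    if cancer_only:
        # Refinement to cancer-related consortium articles.
        term = f"({term}) AND (cancer OR tumor)"
    return term


def count_url(year, cancer_only=False):
    """Build an ESearch URL asking only for the hit count in one publication year."""
    params = {
        "db": "pubmed",
        "term": consortium_query(cancer_only),
        "datetype": "pdat",   # restrict by publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",   # return the count instead of an ID list
    }
    return f"{ESEARCH}?{urlencode(params)}"


# One URL per year of the 1985-2012 window discussed above.
urls = {year: count_url(year, cancer_only=True) for year in range(1985, 2013)}
```

Fetching each URL and plotting the returned counts against year would give a curve comparable in spirit to Fig. 1B, although counts retrieved today will differ from those obtained on the original search date.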
CEC and Interdisciplinary Research
It has been proposed that team science, and therefore CEC-related science, should be ideally geared to support interdisciplinary research (18, 19). We define interdisciplinary research as “a mode of research by teams or individuals that integrates information, data, techniques, tools, perspectives, concepts, and/or theories from two or more disciplines or bodies of specialized knowledge to advance fundamental understanding or to solve problems whose solutions are beyond the scope of a single discipline or area of research practice” (20). To ascertain whether the publications produced by the 49 CEC examined reflect the capability of CEC to support interdisciplinary research within an epidemiologic framework, the titles and abstracts of each of the 3,876 articles identified through the literature search described above were reviewed and each article was assigned a primary scientific area. Articles were randomly assigned to 1 of 3 reviewers, with 10% of the articles assigned to all 3, and reviews were conducted over 14 rounds. Between rounds, reviewers convened to discuss difficult-to-categorize articles, resolve discrepancies, and refine definitions. The scientific areas considered and their definitions are presented in Supplementary Table S1. A total of 3,729 articles were assigned a primary scientific area (see Supplementary Methods for exclusion criteria). Fifty-six percent of the scored articles fell into the environmental, lifestyle, and genomic epidemiology categories, 11.9% involved development of new methods and technologies, 9.4% focused on clinical and translational research, and 11% were classified as biology, which encompasses basic laboratory research, including studies in cell lines and animal models (Supplementary Fig. S6).
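The reviewer assignment scheme described above (random assignment to 1 of 3 reviewers, with 10% of articles reviewed by all 3) can be sketched as follows. The function name, reviewer labels, and fixed random seed are hypothetical, and the split into 14 review rounds is not modeled.

```python
import random


def assign_reviewers(article_ids, reviewers, overlap=0.10, seed=0):
    """Assign each article to one reviewer, except an overlap subset sent to all."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    ids = list(article_ids)
    rng.shuffle(ids)

    n_shared = round(len(ids) * overlap)
    shared, single = ids[:n_shared], ids[n_shared:]

    # Overlap articles go to every reviewer, to allow agreement checks.
    assignment = {aid: list(reviewers) for aid in shared}
    # Remaining articles go to one randomly chosen reviewer each.
    for aid in single:
        assignment[aid] = [rng.choice(reviewers)]
    return assignment


assignment = assign_reviewers(range(3876), ["R1", "R2", "R3"])
n_shared = sum(1 for revs in assignment.values() if len(revs) == 3)
```

The shared subset is what makes it possible to detect and discuss discrepancies between reviewers before definitions are refined.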
Considering that the CEC were primarily designed to address questions within the framework of population sciences, the diversity of the associated literature is striking and shows the flexibility and interdisciplinary nature of the CEC.
The types of research projects being undertaken by CEC have extended their interdisciplinary scope and evolved with the recent “genomic revolution,” as shown by the trends in the scientific areas of CEC publications over time (Fig. 2). Classic epidemiology studies evaluating environmental and lifestyle exposures represent a large proportion of this literature and increased over time (1995–2011). However, the number of CEC articles in genomics areas, including candidate genes (CG), gene characterization (GC), genome-wide association (GWA), linkage (LK), loss of heterozygosity (LOH), or next-generation sequencing (NGS), has grown significantly since 2000, and since 2010, genomics articles have represented the largest category of EGRP CEC-associated publications. The growth in genomics publications is largely driven by the exponential growth of GWAS, which (as defined here) include initial scans as well as replication studies (Supplementary Fig. S7). This trend mirrors the decline of linkage publications as the search for genetic determinants of cancer risk shifted from searching for rare, highly penetrant variations in family studies to the search for common variants with small effect sizes, primarily through association studies (Supplementary Fig. S8). Publications pertaining to methods and technologies, biology, molecular epidemiology, and clinical and translational areas show a more modest but steady increase.
Team Science and Translation
To evaluate the contribution of EGRP CEC to translational research, 3,363 articles from the literature search were reviewed and coded by translational research phase (see Supplementary Methods for exclusion criteria). The phases of the translational research continuum have been previously described (21–24) and are summarized with example articles from our literature search in Table 3 (25–34). Coding for translational research phase was conducted concurrently with coding for primary scientific area, following the same procedures. Overall, 2,645 (79%) articles were coded as T0, 582 (17%) as T1, 112 (3.3%) as T2, 18 (0.5%) as T3, and only 6 (0.2%) as T4.
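The phase distribution can be recomputed directly from the counts reported above. The short sketch below uses only those counts; the variable names are illustrative.

```python
# Article counts by translational research phase, as reported in the text.
counts = {"T0": 2645, "T1": 582, "T2": 112, "T3": 18, "T4": 6}

total = sum(counts.values())  # number of articles coded by phase
shares = {phase: 100 * n / total for phase, n in counts.items()}

# Articles at phase T2 or above.
t2_plus = counts["T2"] + counts["T3"] + counts["T4"]

for phase, pct in shares.items():
    print(f"{phase}: {counts[phase]:>5} ({pct:.1f}%)")
```

The counts sum to the 3,363 coded articles, and the T2, T3, and T4 counts together give 136 articles.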
Previous analyses have observed that T2 and above studies account for a small fraction of NIH-funded cancer genetics research (24). It has been estimated that only 0.64% of cancer genetics articles published in 2007 would be scored T2 or above (23). In our database of CEC publications, 136 (3.5%) were scored T2 or above, and when we limited CEC publications to cancer genetics articles published in 2007, we found 3.4% (6 of 171) were scored T2 or above. The increased proportion of T2 or above articles among CEC-associated cancer genetics articles was statistically significant (P = 2.4 × 10⁻⁶; see Supplementary Methods for details of this analysis). This enrichment of articles further down the translational continuum suggests that the collaborative and interdisciplinary infrastructure of CEC may specifically facilitate translational research.
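One simple way to examine an enrichment of this kind is a one-sided exact binomial test of the observed count against the published background rate. The sketch below applies it to the 2007 figures quoted above (6 of 171 articles against a reference proportion of 0.64%). The choice of test is an assumption made for illustration. The authors' analysis is described in their Supplementary Methods and may differ, so this sketch is not expected to reproduce the reported P value.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Figures quoted in the text: 6 of 171 CEC cancer genetics articles from 2007
# scored T2 or above, against an estimated background proportion of 0.64%.
observed, n_articles, background = 6, 171, 0.0064

# Probability of seeing at least `observed` T2+ articles if the background rate applied.
p_value = binomial_upper_tail(observed, n_articles, background)
print(f"one-sided exact binomial P = {p_value:.2e}")
```

A small tail probability indicates that the observed count would be unlikely if CEC-associated articles followed the background rate.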
Challenges and Future Prospects
The extended CEC network examined does not include all consortia focusing on cancer research but is highly representative of that segment of cancer research focusing on human populations to understand the causes of cancer and related outcomes. The CEC network is international in scope, allowing for the study of populations with diverse genetic backgrounds and lifestyles, and encompasses studies with a variety of designs, from familial to case–control to prospective cohorts. The geographic distribution of the participating teams still shows underrepresentation of low-income regions, reflecting the need for infrastructure building to enrich the network to include populations with diverse genetic backgrounds, lifestyles, and cultures. This may also reflect the fact that NIH primarily funds US-based investigators, and a more in-depth analysis of other sources of funding for international groups may reveal a more comprehensive panorama of international consortia. Tools of virtual communication (websites, portals) are widely used, but public dissemination of internal policies and processes and of membership/participation criteria is still somewhat limited in this network. In this regard, complete transparency could greatly facilitate scientific exchange and rapid replication of initial results. Increasing the public posting of clear data sharing policies would facilitate not only collaborative projects among the investigators within the consortium but also ready access to the consortia resources by new investigators interested in initiating collaborations, thereby extending the consortia network. Consortia investigators are usually funded through a mosaic of mechanisms awarded by different funding agencies, and only rarely through special initiatives.
For grants funded through NCI, adherence of the principal investigators to the NIH and NCI data sharing policies is monitored throughout the lifetime of the grant, and the application of current NIH/NCI policies on data sharing is mandatory and a condition for funding. For example, for some EGRP-funded consortia, applications for collaborative projects are tracked through a process of review and approval and followed for productivity until completion (35, 36). It should be noted that the data sharing policies implemented by other national and international agencies vary considerably in both content and implementation (37). The NCI is in the process of establishing a database of consortia that will include public posting of the internal and agency-mandated data sharing policies, to encourage maximum transparency. Increased transparency could greatly facilitate scientific exchange and rapid replication of initial results. A funding trend toward smaller CEC-associated grants may indicate that while CEC require an initial substantial expenditure to establish the infrastructure necessary to conduct large-scale studies, their subsequently established resources and collaborative culture can be leveraged to support cost-efficient research effectively. Increased collaboration and synergism across consortia are also shown by the involvement of multiple CEC in individual grants. The increased success rate for consortia-related grant applications, as compared with the success rate for applications submitted to EGRP and NIH, may be symptomatic of the leverage provided by the extended consortia infrastructure and the extensive pre-application scrutiny and constructive pre-review usually provided by the consortia teams.
Our ascertainment of CEC-related publications has some limitations. Publications listed on a consortium's website are likely to be truly consortia-associated. However, 9 of the consortia did not have public websites, and of the remaining 40 websites, only 31 (62%) displayed a publication list. The completeness and update frequencies of the website publication lists were also variable. Publications captured by querying PubMed for the CEC name or abbreviation (followed by confirmation through independent review) are, almost by definition, CEC-related. However, CEC publication policies vary considerably, and it is not uncommon for contributing CEC to be mentioned only in the methods, acknowledgments, or supplementary materials of an article, that is, sections that are not queried in PubMed searches, leading to underascertainment. Examination of the manuscripts associated with a sample of 4 established large-scale consortia funded by NIH (2 of which were not included in the analyses presented in this manuscript as they did not meet the selection criteria used) shows that acknowledgment of the consortium in consortia-related manuscripts (those cited by the consortium websites or listed in the associated grants) varied considerably across consortia, from 52% to 100% (see Supplementary Table S2). This may be due to journal policies on acknowledgment format, to the lack of standard acknowledgment language distributed to the collaborators, or to the absence of appropriate consortium acknowledgment requirements. CEC-associated grants often encompass multiple specific aims, which may or may not require CEC resources. Therefore, when searching by acknowledged CEC-associated grant numbers, we may be overascertaining CEC articles; this is in contrast to the website review and name/abbreviation searches, which usually underascertain CEC-related publications.
Another possible source of measurement error is that CEC are typically supported through a mosaic of funding mechanisms, including grants from funding sources other than EGRP. Our grant number searches were limited to manuscripts citing EGRP-funded grants. While each individual search has limitations, we combined several strategies, with different strengths and weaknesses, to obtain an overall picture of the scientific productivity of this network of CEC. Results show not only a consistent increase in scientific output, after an initial lag period for the establishment of the needed infrastructures, but also the capability of consortia to support interdisciplinary science beyond the domains of classic epidemiology. The enrichment in the production of publications further down the translational continuum (T2 and higher) may also indicate the potential for unique contributions of CEC to translational research as a result of their interdisciplinary, team science approach. Interdisciplinary science is the first step in the path to translation, and large consortia networks may in the future provide an accelerated avenue to the development of preventive interventions and new therapeutic strategies.
We have described an extensive collection of cancer consortia that shows the initial characteristics of an emerging interactive network, as also indicated by extensive co-authorship and co-membership across consortia (abstract under review). Public posting of internal policies and processes, especially with regard to data sharing and publications, and public availability of descriptive data on existing CEC resources (population characteristics, protocols, questionnaires, publications, etc.) could considerably expedite collaborations across the consortia and with the scientific community at large. The combined expertise and infrastructure represented within this established network could also be of use in developing training approaches for young investigators across the spectrum of cancer epidemiology and related disciplines. This emerging Network of Cancer Consortia (NOCC) has shown the capability to incorporate novel genomics technologies (genome-wide genotyping and next-generation sequencing technologies) and has the potential to be a fertile ground for the high-throughput application of different ‘omics’ approaches (38). Publication output shows that multilevel data sets are being assembled, integrated, and analyzed to address hypotheses of increasing complexity.
It has been proposed that 21st-century epidemiology will be shaped by 4 overlapping drivers in the production of new knowledge and its translation: acceleration of trends toward multiple group interactive networks; rapid incorporation of emerging technologies into large-scale population studies; the building of infrastructure through which to assess factors and interventions at multiple levels; and the capability of effectively integrating multilevel data sets for increasingly complex analyses (39). The extensive population resource, the reliance on interdisciplinary teams, and the facilitation of the translational pipeline by this emerging network of consortia may offer a supportive infrastructure to begin implementing these transformative goals.
Disclosure of Potential Conflicts of Interest
The findings and conclusions in this report are those of the authors and do not necessarily reflect the views of the Department of Health and Human Services. No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: M.R. Burgio, J.P.A. Ioannidis, M.J. Khoury, D. Seminara
Development of methodology: M.R. Burgio, J.P.A. Ioannidis, M.J. Khoury, D. Seminara
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): M.R. Burgio, B.M. Kaminski, E. DeRycke, D. Seminara
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M.R. Burgio, J.P.A. Ioannidis, B.M. Kaminski, E. DeRycke, M.J. Khoury, D. Seminara
Writing, review, and/or revision of the manuscript: M.R. Burgio, J.P.A. Ioannidis, B.M. Kaminski, E. DeRycke, M.J. Khoury, D. Seminara
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M.R. Burgio, B.M. Kaminski, S. Rogers, M.J. Khoury
Study supervision: M.J. Khoury, D. Seminara