Abstract
Research advances build upon the validity and reproducibility of previously published data and findings. Yet irreproducibility in basic biologic and preclinical research is pervasive in both academic and commercial settings. Lack of reproducibility has led to invalidated research breakthroughs, retracted articles, and aborted clinical trials. Concerns and requirements for transparent, reproducible, and translatable research are accelerated by the rapid growth of “post-publication peer review,” open access publishing, and data sharing that facilitate the identification of irreproducible data/studies; they are magnified by the explosion of high-throughput technologies, genomics, and other data-intensive disciplines. Collectively, these changes and challenges are decreasing the effectiveness of traditional research quality mechanisms and are contributing to unacceptable—and unsustainable—levels of irreproducibility. The global oncology and basic biologic research communities can no longer tolerate or afford widespread irreproducible research. This article (i) discusses how irreproducibility in preclinical research can ultimately be traced to the absence of a unifying life science standards framework, and (ii) makes an urgent case for the expanded development and use of consensus-based standards to both enhance reproducibility and drive innovation in cancer research. Cancer Res; 74(15); 4024–9. ©2014 AACR.
Introduction
By now, most cancer biologists should be familiar with the growing body of literature (1–4), including increasingly mainstream media coverage (5), documenting that many published studies cannot be reproduced. Reproducibility—also referred to as replication, validation, verification, or reanalysis (6, 7)—is a fundamental pillar of scientific research. Yet irreproducibility is a pervasive, systemic, and expensive problem in both academia and industry, and it has led to invalidated research breakthroughs, retracted articles, discontinued clinical trials, and reduced trust in the research enterprise (8). Moreover, valuable time and resources are wasted, and opportunities to enhance public health, in cancer and in all human diseases, are delayed or lost. This article discusses the crisis of irreproducibility in cancer research—especially basic/preclinical research—and in the related life sciences. It builds on the case made in The Case for Standards in Life Science Research: Seizing Opportunities at a Time of Critical Need (8) for the expanded development and use of standards to improve the credibility, reproducibility, and translatability of life science research.
Irreproducibility in basic biologic and preclinical research
Virtually all scientific research depends upon the validity and reproducibility of findings published in the literature and presented at conferences. Despite increases in “precompetitive collaborations” (i.e., nontraditional research collaborations that feature the sharing of information, resources, and capabilities) in oncology (9), the pharmaceutical and biotechnology industries continue to rely on published results from academia, especially about new targets and their underlying biologic mechanisms (1), to form the basis of new cancer therapeutic or biomarker research programs (3). Yet the historical rate of successfully translating cancer research findings into safe and effective diagnostics and therapies remains shockingly low (1, 10). Beyond the inherently complex nature of carcinogenesis, many factors contribute to the ongoing high failure rate of cancer clinical trials, and a large share can be traced to limitations of preclinical studies that fall into four categories: reference materials, study design, laboratory protocols, and data collection and analysis. As Begley and Ellis (1) note, “Unquestionably, a significant contributor to failure in oncology trials is the quality of published preclinical data…The results of preclinical studies must, therefore, be very robust to withstand the rigours and challenges of clinical trials, stemming from the heterogeneity of both tumours and patients.” More broadly, Landis and colleagues (11) emphasized the importance of transparent reporting and reproducibility of preclinical studies using animal models, so that the scientific community, disease advocacy organizations, and research funders can independently evaluate the reliability of previously published findings.
To further complicate matters, concerns about transparency, reproducibility, and the academic scientific community's response to an increased focus on translational research (12) are arising amidst a largely uncoordinated maelstrom of data collection, analysis, and sharing efforts associated with the explosive growth of high-throughput technologies, genomics, and other data-intensive disciplines (13, 14). Another ongoing change in the life science research and publication landscape involves the appropriate role of journals as gatekeepers of information and the effectiveness of the peer-review process itself (11, 15, 16)—both to maintain the quality of published research in the short term and to “self-correct” erroneous hypotheses and results through additional studies (and sometimes retractions; refs. 17, 18) over the longer term. Further complicating, yet accelerating, this evolution is the rapid growth of “post-publication peer review” (e.g., PubMed Commons, PubPeer), open access publishing, and data sharing, all of which facilitate the identification of irreproducible data/studies (19, 20).
Collectively, these changes are decreasing the effectiveness of traditional research quality mechanisms in an imbalanced climate of flat research funding and a still-growing community of biomedical researchers (16), and they are contributing to unacceptable levels of irreproducibility in published research. Cancer research is no exception (1, 8). Despite high failure rates, human clinical trials appear to be less at risk because governing organizations and regulations require rigorous study design, randomization, blinding, and independent and overlapping oversight by Institutional Review Boards (15). As discussed below, we believe that irreproducibility can ultimately be traced to the absence of a unifying life science standards framework. Indeed, few broadly implemented standards exist in basic biologic and preclinical research.
Extent and causes of irreproducibility
Discussion about the irreproducibility of many studies is not new, and not without controversy (21). Two highly publicized studies highlighted a significant lack of reproducibility of academic research findings in the industrial (pharmaceutical) setting (see Fig. 1). In 2011, Prinz and colleagues (2) reported that across 67 projects over 4 years, published data were reproducible only 21% to 32% of the time, depending on the definition of reproducibility (all data vs. main dataset) used. A subsequent report by Begley and Ellis (1) was even more alarming: of 53 “landmark” articles whose findings were subjected to replication attempts over a 10-year period, only 6 (11%) could be reproduced. Industry stakeholders interviewed for the Global Biological Standards Institute (GBSI) report (8) also described extensive experiences with a lack of reproducibility. A 2013 University of Texas MD Anderson Cancer Center survey (3) found that more than 65% of senior academic faculty had been unable to reproduce a finding from a published article. Although most tried to contact the original authors, more than 60% of the time they received an indifferent or negative response, or no response at all.
Awareness of widespread irreproducibility and efforts to address it continue unabated, and were the subject of two sessions of the President's Council of Advisors on Science and Technology (PCAST) in early 2014 (http://www.whitehouse.gov/sites/default/files/microsites/ostp/PCAST/pcast_public_agenda_jan_2014_updated.pdf). The NIH has recently shared ongoing and future activities to enhance reproducibility, including the planned development of a training module on increasing the reproducibility and transparency of research findings, with an emphasis on good experimental design (11, 15). Also in early 2014, the NIH solicited input on existing and needed training activities relevant to data reproducibility, in support of a planned extramural grant program. Founded in 2012, GBSI (http://gbsi.org/) is the only organization specifically dedicated to enhancing the quality of biomedical research by advocating best practices and standards to accelerate the translation of research breakthroughs into life-saving therapies. More recently, the Meta-Research Innovation Center at Stanford (METRICS; http://med.stanford.edu/metrics/) was formed; it focuses on the rigorous evaluation of research practices and on identifying ways to optimize the reproducibility and efficiency of scientific investigations.
As shown in Fig. 1, irreproducibility is the result of multiple, interrelated factors that arise from the changing life science research/publication landscape and from the absence of a unifying, consensus-based standards framework (8). For example, discrepant results can occur when one laboratory performs an experiment under more optimal conditions than another, or even when both laboratories perform the experiment optimally but with inherent differences in methods or reagents (e.g., use of authenticated vs. unauthenticated cancer cell lines). This situation is made far worse by the absence of detailed written protocols and by poor documentation of research practices, both of which are common in biomedical and public health research (22). Multiple systemic causes contribute to variability in both quality assurance practices and the performance of the specific experiment in question, including the absence of formal research quality systems, variable education/training of staff, and differences in journal review and reporting policies, such as adoption of and adherence to reporting guidelines (23).
Lack of standards in basic biologic research
The concept of standards is neither new nor specific to the life sciences. Standards have been essential to the successful development of technologies and products in countless fields outside of biology, and they have also been used effectively in the life sciences, particularly in the clinic, to reduce variability and improve quality and outcomes in areas such as blood banking (24) and clinical laboratory diagnostics. Yet “standard” can be an inflammatory word for many researchers, one that conjures up unwelcome images of bureaucracy, regulation, and obscure references and alphanumeric designations (8). In reality, biologic standards include highly characterized reagents (material standards); documents that outline community consensus around certain practices [written consensus (or paper) standards]; and myriad ad hoc or formalized systems, processes, and procedures (best practices) developed and instituted by individual laboratories. Although material standards are already used in some research communities, particularly in areas like vaccine and gene therapy development, few established and broadly implemented standards exist in basic biologic/preclinical research. Both established and newly emerging areas in biology urgently need expanded development and implementation of consensus-based standards if we are to avoid a deepening irreproducibility crisis. Areas particularly impactful to cancer biologists include next-generation sequencing, stem cell and synthetic biology, biomarker development, mass spectrometry, and flow cytometry, to name just a few.
The remainder of this article briefly highlights, as examples, the need for consensus-based standards in the following three fairly well-established areas: cancer cell lines, research antibodies, and high-throughput screening (HTS).
Cell line misidentification
Cell lines (e.g., breast cancer lines) are a critical component of cancer research. However, even for an expert, it can be difficult or impossible to determine with certainty from which type of tumor a particular cell line originates (25). Because cells are repeatedly grown, frozen, passaged, and stored by laboratories, cross-contamination and errors can occur, resulting in experiments being unintentionally performed on an incorrect cell type (26). Although many misidentified lines are HeLa contaminants, several common esophageal cell lines that have served as the basis for more than 100 publications and several clinical trials have been shown to originate from other tissues (e.g., lung and colon; ref. 27). Misidentification of cell lines can be prevented by cell line authentication. Authentication is based on determining the genetic signature of a particular cell line and comparing it with established databases to ensure that the cell line used by a laboratory matches the expected signature. In 2011 and 2012, an international group of scientists from academia, major cell repositories, government agencies, regulatory agencies, and industry collaborated to develop an accredited standard that describes optimal cell line authentication practices, ANSI/ATCC ASN-0002-2011: Authentication of human cell lines: standardization of STR profiling (28). Multiple journals, including Nature (29), several American Association for Cancer Research (AACR) journals, and the International Journal of Cancer, now require or strongly recommend cell line authentication. Other issues particularly relevant to cancer lines (e.g., aneuploidy and mutation accumulation over multiple passages) will also have to be addressed as future authentication-related standards are considered, such as by GBSI's recently launched Cancer Cell Authentication and Standards Development Task Force.
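In computational terms, STR-based authentication amounts to comparing allele calls at a panel of loci against a reference profile and computing a percent match. The sketch below illustrates this in Python using the Tanabe (Sørensen–Dice) match formula; the locus names and allele calls are hypothetical, and the 80% threshold reflects commonly cited guidance rather than a definitive decision rule.

```python
"""Minimal sketch of STR profile matching for cell line authentication.
Illustrative only: allele calls are hypothetical, and the 80% cutoff
follows commonly cited guidance (e.g., around ANSI/ATCC ASN-0002)."""

# Hypothetical STR profiles: locus name -> set of allele calls
query = {
    "D5S818": {11, 12}, "D13S317": {8, 11}, "D7S820": {9, 10},
    "TH01": {6, 9.3}, "vWA": {16, 18}, "TPOX": {8, 11},
}
reference = {
    "D5S818": {11, 12}, "D13S317": {8, 12}, "D7S820": {9, 10},
    "TH01": {6, 9.3}, "vWA": {16, 18}, "TPOX": {8},
}

def tanabe_match(profile_a: dict, profile_b: dict) -> float:
    """Percent match by the Tanabe (Sorensen-Dice) formula:
    2 * shared alleles / (alleles in A + alleles in B) * 100."""
    shared = total = 0
    for locus in profile_a.keys() & profile_b.keys():
        shared += len(profile_a[locus] & profile_b[locus])
        total += len(profile_a[locus]) + len(profile_b[locus])
    return 200.0 * shared / total if total else 0.0

match = tanabe_match(query, reference)
print(f"Tanabe match: {match:.1f}%")
print("Consistent with reference line" if match >= 80
      else "Possible misidentification; investigate further")
```

A real authentication workflow would compare the query profile against curated repository databases and account for loci affected by genetic drift or loss of heterozygosity, which the above sketch deliberately omits.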
Research antibodies
The application of antibodies across cancer biology—from basic research to diagnostics to therapeutics—is pervasive (13, 30). Standards have been applied, formally or informally, to the generation and use of diagnostic and therapeutic antibodies; basic research antibody reagents, however, are generally devoid of any consistency or standardization. Although blame is often directed at manufacturers, Helsby and colleagues (30) contend that scientists and publishers are not doing everything they can to improve the situation and, thus, reproducibility; for example, authors fail to report research antibody use thoroughly or to demonstrate that antibodies were validated. Moreover, publications routinely omit the host species, code number, and even the antibody supplier. Such omissions make it difficult or impossible for reviewers to judge the likely reliability of the results, and for researchers to reproduce the experiments. The scale of this problem, combined with the high-profile concerns about experimental reproducibility discussed throughout this article, has led the Nature Publishing Group to include a section on antibody information in its recent Reporting Checklist for Life Science Articles (29). Moving forward, science policymakers and funding agencies should consider establishing an independent certification program through which commercial vendors of research antibodies could seek approval.
Propagation of HTS artifacts
A third area, often overlooked in the irreproducibility debate, is the application, or misapplication, of high-throughput technologies (i.e., HTS). Such technologies are experimental power multipliers, giving scientists the capability, in principle, to catalyze discoveries and expedite problem solving and hypothesis testing. Chemical library HTS, developed to enhance the rate of lead generation by pharmaceutical and biotechnology companies, is now widely adopted in academia to enable both drug discovery and the rapidly expanding field of chemical biology. Although notable HTS standards and practices have emerged, such as Z-factor analysis as a statistical measure of signal and variation in HTS assays and guidelines for minimal reporting information (see ref. 31), there is growing recognition that considerable time and resources are consumed, and promising compounds underestimated, in the pursuit of red herrings. In most cases, compounds with reproducible but deceptive primary HTS assay activity are eventually abandoned, but rarely with cautionary reports describing the basis of, and mitigating solutions to, the confounding behavior (32). This often leads to repeated encounters with the same problem (ironically, an issue of erroneous reproducibility rather than irreproducibility).
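To make the Z-factor concept concrete, the Z'-factor of Zhang and colleagues summarizes assay quality from the means and standard deviations of positive and negative control wells. The sketch below computes it in Python from hypothetical plate-control readings; the signal values are invented, and the "greater than 0.5 indicates an excellent assay" rule of thumb is the conventional interpretation rather than a requirement from any specific standard.

```python
"""Minimal sketch of Z'-factor calculation for an HTS assay plate.
Z' = 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|  (Zhang et al., 1999)
Control signal values below are hypothetical."""
from statistics import mean, stdev

# Hypothetical luminescence readings from control wells on one plate
positive_controls = [9800, 10250, 9900, 10100, 9750, 10050]  # full signal
negative_controls = [520, 480, 610, 550, 470, 590]           # background

def z_prime(pos: list, neg: list) -> float:
    """Z'-factor: 1 is an ideal assay; values > 0.5 are conventionally
    regarded as excellent, values <= 0 as unusable for screening."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))

print(f"Z' = {z_prime(positive_controls, negative_controls):.2f}")
```

With these illustrative values the Z'-factor is approximately 0.92, consistent with a well-separated, low-variance assay window.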
For example, in vitro phenotypic and animal model irreproducibility founded on questionable interpretations of highly reproducible HTS results can have costly and prolonged consequences (see refs. 33, 34, and http://www.clinicaltrials.gov/ct2/results?term=ataluren&Search=search). An illustrative example is PTC124 (ataluren), developed as a general suppressor of nonsense mutations that would be efficacious in rare inherited disorders caused by nonsense codon-mediated disruption of important proteins (e.g., CFTR or dystrophin; ref. 35). The discovery and optimization of ataluren, however, originated with an HTS reporter-gene assay in which a nonsense mutation was engineered into a luciferase reporter sequence as a surrogate for a disease gene. Additional studies using this assay design demonstrated that ataluren activity depends on the specific reporter used in its discovery; this dependence was subsequently explained by an alternative mechanism of action that is inconsistent with nonsense codon suppression (36). Importantly, essential lead optimization information was lacking in the Methods section of the key published article (35). Several journals are now responding to insufficient descriptions of methodology by vastly increasing the space allotted for Materials and Methods, and repositories such as PubChem provide a means to easily link chemical structure-activity data with the publication. In cases in which the HTS artifact is carefully examined, a fundamental understanding of the underlying mechanism can lead to general solutions to a common problem (37).
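The general lesson, that reporter-specific activity should be checked against an orthogonal readout before a biological mechanism is inferred, can be captured in a simple triage step. The following sketch is purely illustrative: the compound identifiers, activity values, and 50% flagging threshold are invented, and it is not a description of how any published screen, including the ataluren campaign, was actually triaged.

```python
"""Hypothetical counterscreen triage: flag primary reporter-gene hits whose
activity may be an artifact of the reporter itself (e.g., direct interaction
with the luciferase enzyme) rather than the intended biology. All compound
IDs, values, and the 50% threshold are illustrative assumptions."""

# Percent activity in the primary reporter-gene assay and in a counterscreen
# against the purified reporter enzyme, keyed by compound ID
primary_hits = {"CMPD-001": 92.0, "CMPD-002": 78.0, "CMPD-003": 85.0}
enzyme_counterscreen = {"CMPD-001": 5.0, "CMPD-002": 88.0, "CMPD-003": 12.0}

ARTIFACT_THRESHOLD = 50.0  # % activity against the reporter enzyme alone

for cmpd, activity in primary_hits.items():
    counter = enzyme_counterscreen.get(cmpd, 0.0)
    if counter >= ARTIFACT_THRESHOLD:
        status = "likely reporter artifact -> deprioritize and examine mechanism"
    else:
        status = "activity not explained by reporter interference -> advance"
    print(f"{cmpd}: primary {activity:.0f}%, counterscreen {counter:.0f}%: {status}")
```

The point of such a filter is not the code itself but the practice it encodes: routinely running, and reporting, orthogonal counterscreens so that reproducible artifacts are recognized as artifacts rather than propagated into downstream models and trials.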
Building the case for standards
Because irreproducibility is highly prevalent and its effects are profound and lasting, all stakeholder groups (see Fig. 2) interviewed for The Case for Standards (8) agreed that there is a need for additional standards in biologic research. Cancer biology is certainly no exception. Solutions to this pervasive problem can focus on (i) increasing recognition of intentional differences between experiments conducted by different laboratories (e.g., improving the systematic documentation of research and data reporting practices), and (ii) reducing unintentional differences between laboratories. Because standards can effectively reduce differences in practices by aligning the community around consensus-based methods, they can serve as an effective solution to irreproducibility. Of course, achieving these goals will require systematic quality control and accountability across the entire life science community, including researchers, reference material providers, standards developers, and publishers. Regarding the latter, some researchers have suggested that the incentive structure of scientific publishing itself must change for the above reforms to succeed: under the current system, the quality of individual scientists is judged on the basis of the number of their publications and citations, particularly in so-called high-impact journals (16). Tracking replication as a means of post-publication evaluation, as proposed by Hartshorne and Schachner (19), can both help researchers identify reliable findings and explicitly recognize and incentivize the publication of reproducible data and results.
Multiple standards can be envisioned: for example, standards to ensure the quality of reagents, assays, and laboratory practices, as well as of data analysis, reporting, and sharing. The development, implementation, and harmonization of standards require community consensus and alignment around both the necessity for standards and their content, including:
Educational initiatives and training for both students and experienced researchers as well as certification programs for institutions and laboratories to raise stakeholder awareness of the importance of data/study reproducibility and the purpose and benefits of adopting biologic standards;
Opportunities and forums for stakeholders to identify areas in the life sciences in which accelerated standards adoption could provide maximum benefit;
Engagement of stakeholders with standards development organizations or material reference providers in the development of specific standards; and
Development of effective policies and practices and sufficient and sustainable funding within the life science research community to ensure the proactive development and periodic updating of biologic standards.
Next steps and moving forward
The need for cancer research standards increases with the advances that drive innovations and new discoveries. These advances have fueled the explosive growth of bioinformatics, genomics, and other emerging “omics” disciplines that are transforming translational research and healthcare and driving the quest for personalized medicine. Despite this emphasis on translational research, basic biologic research remains essential to support the prevention, diagnosis, and treatment of cancer and other diseases. The biologic research community can no longer tolerate irreproducible results. GBSI, METRICS, and other organizations; journals such as Cancer Research; funders such as the NIH and the Prostate Cancer Foundation; and professional organizations such as the AACR have recognized the urgent need to address irreproducibility and are stepping up to improve research quality. But these early, largely disconnected efforts must now be augmented by a much larger collaborative and unifying effort that engages and mobilizes all stakeholders across the life sciences to provide coordinated input on the need for standards and on how those needs can be effectively addressed and ultimately implemented into practice.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: L.P. Freedman
Development of methodology: L.P. Freedman
Writing, review, and/or revision of the manuscript: L.P. Freedman, J. Inglese
Study supervision: L.P. Freedman
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.