Research using laboratory mice has led to fundamental insights into the molecular genetic processes that govern cancer initiation, progression, and treatment response. Although thousands of scientific articles have been published about mouse models of human cancer, collating information and data for a specific model is hampered by the fact that many authors do not adhere to existing annotation standards when describing models. The interpretation of experimental results in mouse models can also be confounded when researchers do not factor in the effect of genetic background on tumor biology. The Mouse Tumor Biology (MTB) database is an expertly curated, comprehensive compendium of mouse models of human cancer. Through the enforcement of nomenclature and related annotation standards, MTB supports aggregation of data about a cancer model from diverse sources and assessment of how genetic background of a mouse strain influences the biological properties of a specific tumor type and model utility. Cancer Res; 77(21); e67–70. ©2017 AACR.
Laboratory mice have been used as models for investigating the genetic basis of human cancers for over a century (1). The physiologic and genetic similarity of the laboratory mouse to humans, combined with the amenability of its genome to precise manipulation (e.g., transgenes, targeted mutations, recombineering, and CRISPR/Cas9), has made the mouse the premier animal model for investigating human biology and disease (2, 3). Although the limits of animal models in recapitulating all aspects of human disease must be appreciated (4, 5), the laboratory mouse remains a powerful genetic model for analyzing a myriad of clinically relevant aspects of cancer biology (6, 7). Genetic mapping studies in mice have been instrumental in identifying genes associated with cancer susceptibility and resistance (8). Key insights into the molecular genetics of cancer initiation and progression have been gained using genetically engineered mouse models (GEMM; ref. 9).
The Mouse Tumor Biology (MTB) database (http://tumor.informatics.jax.org) is a comprehensive information resource of mouse models of human cancer. The central mission of the resource is to facilitate the use of the laboratory mouse as a model system for investigating the genetic and genomic factors that underlie human cancers. Evaluating the influence of genetic background on phenotypic variation has long been acknowledged to be critical for the generation and proper utilization of valid mouse models of human cancer (10). To achieve its mission, MTB provides the community with expertly curated, semantically consistent data needed to select appropriate strains and mutant mice for experimentation. Expert curation to enforce semantic consistency is central to MTB's operating principles. The rigorous application of gene and strain nomenclature standards is essential for aggregation of data about the same model from diverse sources and for comparing models one to another.
MTB was first released online in 1998 (1) and continues to be a freely available community informatics resource. The database currently contains information on over 46,000 mouse models (representing >82,000 individual tumor frequency records) from nearly 7,000 different cohorts of mice. It also includes >6,500 images derived from both histopathologic studies and cytogenetic assays. The MTB reference collection holds data on nearly 15,000 scientific articles about mouse models of cancer, each indexed to its associated tumor types. Historically, the scope of MTB covered GEMMs and conventional inbred and hybrid model systems. Renewed interest in xenograft models using immunodeficient and "humanized" mice has led to an expansion of scope to include over 400 patient-derived xenograft (PDX) models. These models present a unique opportunity to study human cancers in vivo and to address clinically relevant questions about therapy response, cancer heterogeneity, and acquired and de novo resistance to therapy (4).
Data Acquisition and Curation
The primary source of data in MTB is the peer-reviewed scientific literature. Data are also obtained from direct researcher submissions and from a variety of cancer research–related bioinformatics resources such as PathBase (http://www.pathbase.net) and the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/). All data in MTB are reviewed to ensure conformance to official nomenclature for genes, alleles, and mouse strain names (11). Internally developed controlled vocabularies for anatomy, cell type, and tumor type are used to annotate records so that database searches are complete and accurate. Members of the MTB curation team participate in ongoing community efforts to develop and improve ontologies for these biological concepts. Comments, suggestions, and data contributions from the cancer research community are encouraged. Community feedback can be provided through the user support feedback form, which is accessible from every web page in MTB.
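Why controlled vocabularies make searches complete can be illustrated with a small sketch. It assumes a hypothetical tumor-type vocabulary in which each canonical term lists free-text synonyms that authors might use; both reported names and queries are normalized to the canonical term before matching, so a query for one wording also retrieves records published under another. The vocabulary, record IDs, and function names are illustrative only and are not MTB's actual vocabulary or code.

```python
# Hypothetical controlled vocabulary (illustrative, not MTB's actual terms):
# each canonical tumor-type term maps to synonyms seen in publications.
TUMOR_VOCABULARY = {
    "lung adenocarcinoma": {"pulmonary adenocarcinoma"},
    "mammary gland adenocarcinoma": {"breast adenocarcinoma", "mammary adenocarcinoma"},
}


def build_synonym_index(vocabulary):
    """Map every lower-cased canonical term and synonym to its canonical term."""
    index = {}
    for canonical, synonyms in vocabulary.items():
        index[canonical.lower()] = canonical
        for synonym in synonyms:
            index[synonym.lower()] = canonical
    return index


SYNONYM_INDEX = build_synonym_index(TUMOR_VOCABULARY)


def normalize_term(raw_term):
    """Return the canonical term for a reported name, or None if it is unmapped."""
    return SYNONYM_INDEX.get(raw_term.strip().lower())


def search(records, query):
    """Return IDs of records whose normalized tumor term equals the normalized query."""
    target = normalize_term(query)
    if target is None:
        return []  # unknown query terms match nothing rather than all unmapped records
    return [r["id"] for r in records if normalize_term(r["tumor"]) == target]


# Hypothetical records annotated with the authors' original wording.
records = [
    {"id": 1, "tumor": "Pulmonary adenocarcinoma"},
    {"id": 2, "tumor": "lung adenocarcinoma"},
    {"id": 3, "tumor": "Breast adenocarcinoma"},
]

print(search(records, "lung adenocarcinoma"))  # [1, 2]
```

Without the normalization step, a literal string match on "lung adenocarcinoma" would silently miss record 1, which is the kind of incomplete result that curation against controlled vocabularies is meant to prevent.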
Display and Search Features
MTB supports several methods for searching and browsing the database content, including faceted search interfaces, interactive tables and graphics, and other web-based search forms (see Supplementary Video S1).
Model summary table
The model summary table on the MTB home page lists counts of mouse models in the database for the most common types of human cancer (Fig. 1). The table displays mouse models in which the tumor type is reported at a frequency of ≥80% in a colony or study of ≥20 mice. This allows users to focus their search on high-potential models rather than on models in which a specific tumor type appears only sporadically. For example, a lung cancer researcher might begin by choosing the link corresponding to potential lung cancer models in mutant mice. The data summary returned from this link can then be narrowed or broadened using the appropriate facet selections. The same researcher could, for example, narrow the search to only those models with Kras mutations. They might subsequently broaden the search by deselecting the facet for "human lung tissue," thereby displaying all potential cancer models with a Kras mutation.
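The thresholds and facet behavior described above can be expressed as a short filtering sketch. It assumes hypothetical tumor frequency records carrying an organ, a set of mutated genes, a tumor frequency, and a cohort size; the ≥80% and ≥20-mouse cutoffs come from the text, while the record fields, model IDs, and values are invented for illustration and do not reflect how MTB implements its faceted search.

```python
from dataclasses import dataclass

# Cutoffs stated for the model summary table.
MIN_FREQUENCY = 0.80  # tumor reported in at least 80% of mice
MIN_COHORT = 20       # colony or study of at least 20 mice


@dataclass(frozen=True)
class TumorFrequencyRecord:
    """Hypothetical record shape; field names are illustrative assumptions."""
    model_id: str
    organ: str
    genes: frozenset   # mutated genes carried by the model
    frequency: float   # fraction of mice with the tumor (0.0 to 1.0)
    cohort_size: int   # number of mice in the colony or study


def high_potential(records):
    """Keep only records meeting both the frequency and cohort-size cutoffs."""
    return [r for r in records
            if r.frequency >= MIN_FREQUENCY and r.cohort_size >= MIN_COHORT]


def apply_facets(records, organ=None, gene=None):
    """Filter by optional facets; passing None means the facet is deselected."""
    return [r for r in records
            if (organ is None or r.organ == organ)
            and (gene is None or gene in r.genes)]


# Illustrative records only, not MTB data.
records = [
    TumorFrequencyRecord("M1", "lung", frozenset({"Kras"}), 0.95, 30),
    TumorFrequencyRecord("M2", "lung", frozenset({"Trp53"}), 0.85, 25),
    TumorFrequencyRecord("M3", "lung", frozenset({"Kras"}), 0.40, 50),  # too infrequent
    TumorFrequencyRecord("M4", "pancreas", frozenset({"Kras", "Trp53"}), 0.90, 20),
    TumorFrequencyRecord("M5", "lung", frozenset({"Kras"}), 0.90, 10),  # cohort too small
]

candidates = high_potential(records)
lung_kras = apply_facets(candidates, organ="lung", gene="Kras")  # narrowed
any_kras = apply_facets(candidates, gene="Kras")                 # organ facet deselected
print([r.model_id for r in lung_kras])  # ['M1']
print([r.model_id for r in any_kras])   # ['M1', 'M4']
```

Treating each facet as an optional predicate keeps narrowing and broadening symmetric: adding a facet value removes records, and clearing it (setting it back to None) restores them.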
In addition to inbred strain models and GEMMs, MTB contains data on PDX models distributed by The Jackson Laboratory. Data for these models can also be accessed from the model summary table on the home page. Each PDX model is displayed with its key characteristics, including diagnosis, cancer stage and grade, cancer type, summarized genomic data, and therapy response data.
Interactive graphical data summaries
High-level visual summaries help users grasp trends in large, diverse datasets and draw attention to areas of interest or patterns within those data. The tumor frequency grid presents a high-level summary of all the data in MTB on spontaneous tumors in a subset of inbred mouse strains. Patterns of tumor susceptibility can be recognized in the "heatmap" pattern of the grid, and the underlying data can be accessed by clicking on a grid cell. A dynamic version of the grid presents the data in a user-configurable format, so that the strains and organs displayed can be refined to reflect the specific interests of the user. MTB's Cancer QTL Viewer presents a graphical summary of published cancer-related QTL studies in the mouse, integrated with the rich biological annotations of mouse genes available from the Mouse Genome Informatics database (12). Users can upload their own unpublished annotations and display them in the context of published QTL regions.
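The idea behind a strain-by-organ frequency grid, and its user-configurable variant, can be sketched as an aggregation followed by a binned rendering. This sketch assumes that each cell shows the highest reported frequency for that strain and organ and that frequencies are binned into coarse heat levels; both the aggregation rule and the bin boundaries are assumptions for illustration, not MTB's documented method, and the strain names and values are placeholders rather than real data.

```python
from collections import defaultdict


def build_grid(records):
    """Aggregate (strain, organ, frequency) tuples into a nested strain -> organ map.

    Assumption: each cell keeps the highest frequency reported for that pair.
    """
    grid = defaultdict(dict)
    for strain, organ, freq in records:
        grid[strain][organ] = max(freq, grid[strain].get(organ, 0.0))
    return grid


def heat_level(freq):
    """Map a frequency onto a coarse heat level (hypothetical bin boundaries)."""
    if freq >= 0.8:
        return "high"
    if freq >= 0.2:
        return "medium"
    if freq > 0.0:
        return "low"
    return "none"


def render(grid, strains, organs):
    """Configurable view: only the selected strains and organs are shown."""
    return [(s, [heat_level(grid.get(s, {}).get(o, 0.0)) for o in organs])
            for s in strains]


# Placeholder strains and frequencies, not MTB data.
records = [
    ("StrainA", "lung", 0.9),
    ("StrainA", "lung", 0.5),
    ("StrainA", "liver", 0.1),
    ("StrainB", "liver", 0.3),
]

grid = build_grid(records)
for strain, cells in render(grid, ["StrainA", "StrainB"], ["lung", "liver"]):
    print(strain, cells)
# StrainA ['high', 'low']
# StrainB ['none', 'medium']
```

Keeping the aggregated grid separate from the rendering step is what allows the same data to back both the fixed summary and a view in which users pick their own strains and organs.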
Other web-based search forms
Conventional web-based search forms in MTB allow users to craft searches with very specific criteria for tumor characteristics, mouse strain names and types, genes and allele types, and literature references. Researchers can also search the database using human gene symbols, such as a list of genes commonly mutated in a human cancer. The input list is mapped to the corresponding mouse orthologs, and a summary table with links to relevant records in MTB is returned. MTB also maintains search tools that help researchers find relevant genomic datasets for mouse models of human cancer in public repositories such as GEO. To make external genomic datasets about mouse models "findable," MTB curators annotate the datasets with semantically consistent metadata for strain name and tumor type.
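The human-gene search described above amounts to an ortholog lookup followed by a per-gene summary. The sketch below assumes a small, hypothetical one-to-one ortholog table (the human and mouse symbol pairs shown are standard, but a real service would draw on a curated ortholog resource and handle one-to-many cases) and a hypothetical index of model IDs per mouse gene; none of the names are MTB's actual API.

```python
# Hypothetical one-to-one human -> mouse ortholog table (real mappings can be
# one-to-many and would come from a curated ortholog resource).
HUMAN_TO_MOUSE = {"KRAS": "Kras", "TP53": "Trp53", "PTEN": "Pten", "APC": "Apc"}

# Hypothetical index of model identifiers per mouse gene (illustrative IDs).
MODELS_BY_MOUSE_GENE = {
    "Kras": ["MODEL-001", "MODEL-002"],
    "Trp53": ["MODEL-003"],
    "Apc": [],
}


def models_for_human_genes(symbols):
    """Map human symbols to mouse orthologs and collect linked model IDs.

    Returns (summary, unmapped) where summary maps each normalized human
    symbol to (mouse_symbol, model_ids) and unmapped lists inputs with no ortholog.
    """
    summary, unmapped = {}, []
    for symbol in symbols:
        human = symbol.strip().upper()  # human gene symbols are upper case
        mouse = HUMAN_TO_MOUSE.get(human)
        if mouse is None:
            unmapped.append(symbol)
            continue
        summary[human] = (mouse, MODELS_BY_MOUSE_GENE.get(mouse, []))
    return summary, unmapped


summary, unmapped = models_for_human_genes(["kras", "TP53", "XYZ1"])
for human, (mouse, models) in summary.items():
    print(human, "->", mouse, models)
print("unmapped:", unmapped)
# KRAS -> Kras ['MODEL-001', 'MODEL-002']
# TP53 -> Trp53 ['MODEL-003']
# unmapped: ['XYZ1']
```

Reporting unmapped symbols explicitly, rather than dropping them, lets a user see which genes from the input list have no mouse counterpart in the lookup.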
The MTB software is designed to be stable, extensible, and cost-effective. The system has four major software components:
(2) The curatorial interface for data entry is implemented as a Java Swing desktop application.
(3) The backend is a highly normalized relational database running on Postgres.
(4) The MTB application programming interface (API) is implemented as SOAP-based web services. The API provides access to MTB data in a platform- and language-independent manner that is flexible enough to serve the diverse needs of bioinformaticians.
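What a "highly normalized" relational backend means for aggregating data on the same model from different publications can be shown with a toy schema. This sketch uses Python's built-in sqlite3 as a stand-in for Postgres, and every table, column, strain name, and count is a hypothetical illustration rather than the actual MTB schema: strains, tumor types, and references are stored once, and each tumor frequency record points to them by foreign key, so observations from separate articles can be pooled with a single join.

```python
import sqlite3

# Hypothetical normalized schema (not the actual MTB schema); SQLite stands in for Postgres.
SCHEMA = """
CREATE TABLE strain (
    strain_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE            -- official strain nomenclature
);
CREATE TABLE tumor_type (
    tumor_type_id INTEGER PRIMARY KEY,
    organ TEXT NOT NULL,                 -- controlled anatomy term
    term TEXT NOT NULL,                  -- controlled tumor-type term
    UNIQUE (organ, term)
);
CREATE TABLE reference (
    reference_id INTEGER PRIMARY KEY,
    citation TEXT NOT NULL
);
CREATE TABLE tumor_frequency (
    frequency_id INTEGER PRIMARY KEY,
    strain_id INTEGER NOT NULL REFERENCES strain(strain_id),
    tumor_type_id INTEGER NOT NULL REFERENCES tumor_type(tumor_type_id),
    reference_id INTEGER NOT NULL REFERENCES reference(reference_id),
    affected INTEGER NOT NULL,
    cohort_size INTEGER NOT NULL CHECK (cohort_size > 0)
);
"""

POOLED_QUERY = """
SELECT s.name, t.organ, t.term,
       COUNT(DISTINCT f.reference_id), SUM(f.affected), SUM(f.cohort_size)
FROM tumor_frequency AS f
JOIN strain AS s ON s.strain_id = f.strain_id
JOIN tumor_type AS t ON t.tumor_type_id = f.tumor_type_id
GROUP BY s.name, t.organ, t.term
"""


def create_db():
    """Open an in-memory database with foreign-key enforcement and the schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_example(conn):
    """Insert one hypothetical strain/tumor pair reported by two placeholder articles."""
    strain_id = conn.execute("INSERT INTO strain (name) VALUES (?)", ("StrainA",)).lastrowid
    tumor_id = conn.execute(
        "INSERT INTO tumor_type (organ, term) VALUES (?, ?)", ("lung", "adenocarcinoma")
    ).lastrowid
    for citation, affected, cohort in [("Ref 1", 18, 20), ("Ref 2", 27, 30)]:
        ref_id = conn.execute(
            "INSERT INTO reference (citation) VALUES (?)", (citation,)
        ).lastrowid
        conn.execute(
            "INSERT INTO tumor_frequency (strain_id, tumor_type_id, reference_id, affected, cohort_size)"
            " VALUES (?, ?, ?, ?, ?)",
            (strain_id, tumor_id, ref_id, affected, cohort),
        )
    conn.commit()


def pooled_frequencies(conn):
    """Pool affected and total mice per strain and tumor type across references."""
    return conn.execute(POOLED_QUERY).fetchall()


conn = create_db()
load_example(conn)
print(pooled_frequencies(conn))  # [('StrainA', 'lung', 'adenocarcinoma', 2, 45, 50)]
```

The pooling only works because both articles were curated to the same strain name and tumor-type term; if each had used its own free-text wording, the join would produce two unrelated rows.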
Discussion and Future Directions
In an era of rapidly evolving technologies that are generating a great abundance of cancer models and associated data, MTB has a unique role in ensuring that the investment in developing and characterizing cancer models pays off, through robust data integration practices and tools for comparing models. MTB will continue to provide high-quality, value-added information on different kinds of cancer models in a freely accessible community database.
A major future focus for MTB will be to expand from supporting researchers focused primarily on the genetics of cancer to supporting translational and preclinical research communities, by representing the data and information needed for cancer model validation and credentialing (13). Examples of new data sources relevant to genetic models of cancer that will be integrated into MTB include cancer phenotypes identified through the international Knockout Mouse Project (14) and complex trait mapping studies using state-of-the-art genetic resources such as Diversity Outbred and Collaborative Cross mice (15). To enhance the relevance of MTB for translational cancer research, we have partnered with other community informatics resources and with cancer researchers to draft minimal information data standards for describing PDX models. These standards, in turn, will be used in a collaborative effort to develop a web portal that supports searches of PDX models across multiple, distributed repositories.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: D.M. Krupke, D.A. Begley, C.J. Bult
Development of methodology: D.M. Krupke, D.A. Begley, C.J. Bult
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): D.M. Krupke, D.A. Begley, J.P. Sundberg, S.B. Neuhauser, C.J. Bult
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D.M. Krupke, D.A. Begley, J.P. Sundberg, S.B. Neuhauser, C.J. Bult
Writing, review, and/or revision of the manuscript: D.M. Krupke, D.A. Begley, J.P. Sundberg, S.B. Neuhauser, C.J. Bult
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): D.M. Krupke, D.A. Begley, J.P. Sundberg, J.E. Richardson, S.B. Neuhauser, C.J. Bult
Other (software engineer): S.B. Neuhauser
This work was supported by NCI (CA089713 to C. J. Bult).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.