Abstract
The NCI-60 cancer cell line panel provides a premier model for data integration and systems pharmacology, being the largest publicly available database of anticancer drug activity, genomic, molecular, and phenotypic data. It comprises gene expression (25,722 transcripts), microRNAs (360 miRNAs), whole-genome DNA copy number (23,413 genes), whole-exome sequencing (variants for 16,568 genes), protein levels (94 genes), and cytotoxic activity (20,861 compounds). Included are 158 FDA-approved drugs and 79 that are in clinical trials. To improve data accessibility for bioinformaticists and non-bioinformaticists alike, we have developed the CellMiner web-based tools. Here, we describe the newest CellMiner version, including integration of novel databases and tools associated with whole-exome sequencing and protein expression, and review the tools. Included are (i) “Cell line signature” for DNA, RNA, protein, and drugs; (ii) “Cross correlations” for up to 150 input genes, microRNAs, and compounds in a single query; (iii) “Pattern comparison” to identify connections among drugs, gene expression, genomic variants, microRNA, and protein expressions; (iv) “Genetic variation versus drug visualization” to identify potential new drug:gene DNA variant relationships; and (v) “Genetic variant summation” designed to provide a synopsis of mutational burden on any pathway or gene group for up to 150 genes. Together, these tools allow users to flexibly query the NCI-60 data for potential relationships between genomic, molecular, and pharmacologic parameters in a manner specific to the user's area of expertise. Examples for both gain- (RAS) and loss-of-function (PTEN) alterations are provided. Clin Cancer Res; 21(17); 3841–52. ©2015 AACR.
Introduction
This review provides a synopsis of both the use and novel features of the CellMiner web application. CellMiner is designed specifically to facilitate integration of pharmacologic with molecular data from the NCI-60 cell lines. Its provision of “cell line signatures” for both drug activity and multiple forms of molecular data, in which many of the preprocessing steps have already been done, allows a broad segment of the scientific community to make rapid and meaningful explorations into pharmacologic–molecular relationships. The cell line models will continue to form the basis for studies of this type because of their advantages in providing testable models. Observations and hypotheses with translational importance made with these models will increasingly form the intellectual basis for a more specific and logical application of treatments, based on the specific molecular characteristics of a patient's disease.
Data and Tools Available through CellMiner
CellMiner is a web-based application that provides both the data for the NCI-60 cancer cell lines, and the tools to mine those data (1, 2). It is appropriate for both the novice and the expert in the field. The application is accessed using http://discover.nci.nih.gov/cellminer/ and, in the current version, is organized into the 7 tabs shown in Fig. 1A (3).
The “Home” tab (top left, Fig. 1A) provides (i) a general description of the site, (ii) references, (iii) recent press releases, (iv) a description of each of the other 6 tabs, and (v) links to the Discover and Developmental Therapeutics Program websites (4, 5).
The “NCI-60 Analysis Tools” tab provides visualizations and patterns for quality-controlled molecular and pharmacologic data, as well as tools for data integration. Both the datasets and integration tools available in this tab are focused on in this publication.
The “Query Genomic Data” tab provides access to all molecular data, queryable by gene identifier, chromosomal, or genomic location, or platform-specific identifier for 17 platforms. It does not provide a synopsis result by gene or have internal requirements for consistency or range (within a single gene) as within the “NCI-60 Analysis Tools.” Data are received in both Excel (.xls) and text (.txt) format. New datasets and organization improvements continue to be added to this tab.
The “Query Drug Data” tab provides access to the data for growth inhibition 50% (GI50) compound activities measured by the Developmental Therapeutics Program (5, 6). The curated version of these data with synopsis results by drug and internal requirements for consistency and range (within a single drug) are in the “NCI-60 Analysis Tools” (see below). Compounds are queryable using 1 of 6 options (Fig. 1B, step 1). Data are received in both Excel and text formats. Data updates and organization improvements have been and will continue to be incorporated in this tab.
The “Download Data Sets” tab allows one to download entire datasets in either raw or normalized formats (dependent on dataset). Also available in this section is our cell line fingerprinting data, a useful resource for identification of each of the 60 cell lines (7). Introduced here, in the lower “Download Normalized Data Set” section of the page, is our “RNA: 5 Platform Gene Transcript” compilation of quality-controlled transcript data from microarrays. This is the all-gene version of the information provided by “NCI-60 Analysis Tools”\“Cell line signature”\“Gene transcript z scores.” The datasets within this tab will primarily be of use to the bioinformatician. New datasets and organization improvements continue to be incorporated into this tab.
The “Cell Line Metadata” tab provides background information on the cell lines, including tissue of origin, age and sex of patient, prior treatment (when known), histology, ploidy, TP53 status, multidrug resistance function, and doubling time. A link to each cell line's fingerprinting data is included.
The “Data Set Metadata” tab provides background information on the types of data, platform information, the principal collaborators, and some description of the data. Currently, there are 22 datasets described here, with more being added.
The “Query Genomic Data” and “Query Drug Data” Tools
These 2 tools give access to the specific data in the absence of the additional quality control assumptions applied within the “NCI-60 Analysis Tools” section. Data in this form may be preferable, depending on the question being asked, and allow users the flexibility to apply their own judgment and assumptions.
“Query Genomic Data” functions as the unfiltered data query tool for the molecular datasets (Fig. 1A). In step 1, users select the type of queries that best fit their needs. The query options include (i) gene name, (ii) RefSeq (mRNA or protein), (iii) Entrez identifier, (iv) chromosome number, (v) chromosome location, (vi) cytoband, or (vii) 4 types of platform specific identifier. In step 2, users input these identifiers either as a list or as an uploaded text (.txt) or Excel (.xls) file. In step 3, users select from among 17 datasets that provide various types of information at the DNA, RNA, or protein levels described previously (2, 3, 8–13). In step 4, users provide their e-mail address and click “Get data.” There are currently 25,722 transcripts, including genes, pseudogenes, and open reading frames with data in this format.
“Query Drug Data” functions as the unfiltered data query tool for the compound activity dataset (Fig. 1B). In step 1, users may select the type of query: (i) NSC, (ii) compound name, (iii) molecular formula-exact match, (iv) molecular formula-element match, (v) molecular weight range, or (vi) mechanism of action (introduced here). The “Molecular formula-element match” allows the user to search based on specific elements in the compound, such as Zn or Se. In step 2, the user inputs these identifiers either as a list or as an uploaded text (.txt) or Excel (.xls) file. In step 3, users provide their e-mail address and click “Get data.” There are currently 52,269 compounds with activity data in this format.
NCI-60 Analysis Tools
These tools (Fig. 2A) provide synopsis information for 5 forms of molecular as well as compound activity data, in addition to offering several forms of data integration. Data in this form will be used most frequently when one wishes to make comparisons across platforms in a more systems biologic fashion without having to engage in time-consuming and detailed analysis for each dataset. A potential drawback of using data of this type is that quality control requirements are applied, such as data reproducibility and minimal probe and experimental range requirements, and these might eliminate meaningful data in specialized cases. In those cases where data are unavailable, the user will receive a “USER_ERROR_MESSAGE.” This file details why data were not received: the input gene or compound failed quality control, no data exist for that input, or there was a problem with the input.
The tool of interest is selected in step 1 using the checkboxes. The current choices are (i) “Cell line signature,” (ii) “Cross-correlations of transcripts, microRNAs, and drugs,” (iii) “Pattern comparison,” (iv) “Exome sequencing (DNA) graphical synopsis by gene,” and (v) “Genetic variant versus drug visualization.” Two footnotes are included: the first “1Available identifiers and drug mechanisms of action definitions” provides all identifiers for (i) drugs (and compounds), (ii) gene transcripts, (iii) microRNAs, (iv) proteins, (v) amino acid changing genetic variants, (vi) protein function affecting genetic variants, (vii) DNA copy number, and (viii) the definitions for the drug mechanism of action categories and abbreviations. The second footnote “2Pattern comparison input template” provides the input file to which numerical values may be added for uploading into “Pattern comparison.” A maximum of 10 patterns may be entered in this fashion in a single query.
In step 2, you select your input by checking either “Input list” or “Upload file.” Using “Input list” allows you to type up to 150 identifiers (as provided in footnote 1) into the “Input the identifier(s)” box. Using “Upload file” allows you to upload these same identifiers as either a text (.txt) or Excel (.xls) file, or to upload your “Pattern comparison input template.” In step 3, enter your e-mail address and click “Get data.”
“Cell Line Signature” Data Compilation and Outputs
“Cell line signature” (Fig. 2B and C) has been developed to (i) integrate the multiple probes or experiments that exist for a single gene or drug, (ii) provide a “best” synopsis of several forms of molecular and pharmacological data, facilitating their cross-comparison, and (iii) avoid the necessity for users to have expertise in the integration of multiple platform types for systems biologic and pharmacologic studies. This tool provides several data types in a standardized format. The 6 current profile types are shown in Fig. 2C and include (i) “Gene transcript z scores” for 25,722 genes, pseudogenes, and open reading frames (from 5 microarray platforms, recorded in the National Center for Biotechnology Information, Gene Expression Omnibus (GEO) with accession numbers GSE5949, GSE5720, GSE32474, GSE29682, and GSE29288); (ii) “microRNA mean values” for 360 microRNAs (GEO accession number GSE22821); (iii) “Drug activity z scores” for 20,861 compounds; (iv) “Gene DNA copy number” for 23,413 genes (from 4 microarray platforms, with GEO accession numbers GPL11068, GPL13786, GPL3812, and GPL6983); (v) “Genetic variant summation” for 12,705 amino acid changing and 9,142 protein function affecting genes; and (vi) “Protein mean values” for 94 genes and 162 antibodies (GEO accession number GSE5501). Introduced here are help pages designed to give the new user a quick synopsis of what the tool does for each of the signatures. These are accessed by clicking on the text for the signature-of-interest (Fig. 2B).
To use the tool, in step 1, check the “Cell line signature” box and then select the radio button for the data type of interest. Input the appropriate identifiers in “step 2” (provided in the step 1 footnote 1). Five of these signatures have been introduced previously (2, 8, 14). Examples of the bar graph outputs for each of the 6 signature types are presented in Fig. 2C.
The “Protein mean values” signature is introduced in this article with the release of the CellMiner version 1.6. Data were generated using reverse-phase protein lysate arrays (15). The dataset is high quality, having met stringent specificity requirements. Details of the antibodies used in its generation are described in our AbMiner web–based tool (16, 17). The tool output includes relative protein levels as tabular data, in both normalized and normalized mean-centered forms. The mean-centered version is visualized as a bar graph (example in Fig. 2C for p53), with the bars for each cell line color-coded by tissue of origin (2). Range, minimum, maximum, average, and SD for the normalized data are included. The user also receives the distribution of the normalized data as a histogram.
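The mean-centering and summary statistics described for this output can be illustrated with a minimal Python sketch. The function name, the dictionary input, and the cell line labels are hypothetical, the numbers are made up, and the use of the sample SD is an assumption; this is not the CellMiner implementation.

```python
from statistics import mean, stdev

def summarize_signature(values):
    """Mean-center normalized levels and report summary statistics.

    `values` maps a cell line label to a normalized level
    (a hypothetical input layout chosen for illustration).
    """
    levels = list(values.values())
    avg = mean(levels)
    # Subtract the across-cell-line average from each value.
    centered = {line: level - avg for line, level in values.items()}
    stats = {
        "minimum": min(levels),
        "maximum": max(levels),
        "range": max(levels) - min(levels),
        "average": avg,
        "sd": stdev(levels),  # sample SD; the tool's convention is an assumption
    }
    return centered, stats

# Made-up values for three placeholder cell lines.
example = {"line_A": 2.0, "line_B": 4.0, "line_C": 6.0}
centered, stats = summarize_signature(example)
```

The centered values sum to zero by construction, which is what lets a bar graph show each cell line as above or below the panel average.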
The “Cross-Correlations” Tool: Combining and Comparing Different Data
To enable cross-comparison of several forms of data, we have developed “Cross correlations of transcripts, microRNAs and drugs” (Supplementary Fig. S1A). Upgraded from its introduction as an option for either gene transcripts or drug activities (2) under the old “z score determinations” tool, it is now a stand-alone tool (introduced in this article), allowing direct comparison of any combination of from 2 to 150 gene transcript levels, microRNA expression levels, and compound activities in the same query. Identifiers available for input are listed in the “Identifiers and drug mechanism of action” footnote 1 (Fig. 2A, step 1). The output provides all cross-correlations for the selected identifiers.
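As an illustration of what an all-against-all comparison of this kind involves, the following Python sketch computes pairwise correlations among a small set of patterns. The choice of Pearson's correlation is an assumption (the text does not name the statistic), and the identifiers and values are invented placeholders.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (a constant sequence would
    make the denominator zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cross_correlations(patterns):
    """Return the correlation for every unordered pair of identifiers."""
    return {
        (a, b): pearson(patterns[a], patterns[b])
        for a, b in combinations(patterns, 2)
    }

# Placeholder identifiers with made-up values over four cell lines.
patterns = {
    "gene_X": [1.0, 2.0, 3.0, 4.0],
    "drug_Y": [2.0, 4.0, 6.0, 8.0],
    "mir_Z": [4.0, 3.0, 2.0, 1.0],
}
table = cross_correlations(patterns)
```

With n identifiers there are n(n − 1)/2 pairs, so a 150-identifier query corresponds to 11,175 pairwise correlations.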
In the example given in Supplementary Fig. S1B, the “Cell line signature” pattern of SLFN11 transcript expression, a gene recently identified in the DNA damage response pathway (18, 19) was used as input for a “Pattern comparison” analysis, which identified significantly correlated drugs. The cross-correlations output between the transcript expression level of SLFN11 and the activities of these drugs, organized by mechanism of action type, is shown (mechanism of action headers have been added to clarify the figure).
The “Cross correlations” output identifies their interrelationships, showing significant positive correlations between SLFN11 and DNA damaging drugs (alkylating agents targeting guanine N7, A7; DNA synthesis inhibitors, Ds; topoisomerase I inhibitors, T1; topoisomerase II inhibitors, T2). Among the FDA-approved or clinical trial drugs, tubulin inhibitors and serine-threonine kinase inhibitors (STK), including RO-5126766, had significant negative correlations. The importance of SLFN11 expression for the DNA targeting classes of drugs has been documented (18, 19). An additional point made obvious by viewing the data is that the A7, Ds, T1, and T2 drugs are significantly positively correlated with one another, presumably due to their common DNA damaging effects and the common features determining cellular response to these drugs.
These relationships between SLFN11 and drugs from these mechanism-of-action categories, originally detected bioinformatically, have been shown to be causal and illustrate the discovery potential of these databases (18–20).
Pattern Comparison
To compare any pattern of interest for the NCI-60 to the complete lists of multiple forms of data, we have developed “Pattern comparison” (Supplementary Fig. S1C). The “Pattern comparison” output has previously provided correlations to the input pattern for: (i) the activities of compounds (20,861 currently), (ii) the transcript expression of genes (26,065 currently), and (iii) 365 microRNAs. Included within the compounds are 158 FDA-approved and 79 clinical trial drugs. Upgraded and reintroduced in this article, “Pattern comparison” now also provides correlations to: (i) amino acid changing genetic variants (from exome sequencing) for 12,705 genes, (ii) putative protein function affecting variants absent in either the 1000 Genomes or ESP5400 (non-cancerous genomes from 5400 patients) for 9,143 genes, (iii) protein levels for 94 genes (as measured by 162 antibodies), and (iv) 24 phenotypic parameters, currently focused on genomic instability, epithelial versus mesenchymal status, and pharmacologic response (21, 22). The 2 forms of genetic variants are as obtained using the “Genetic variant summation” cell line signature (Fig. 2B). Format changes include a split of the results into 2 worksheets, with the statistically significant correlations in the “Significant” worksheet, and the complete (significant and insignificant) results within the “All” worksheet. For comparisons derived from “60 element pattern” inputs, the outputs also include a “Pattern input” worksheet that records the input pattern used (as an organizational help). Patterns may be entered using NA values to exclude cell lines from the analysis when there is scientific reason to do so, such as when one wishes to consider only those cell lines with wild-type TP53.
Direct identification of input parameters is available for: (i) gene transcript, (ii) microRNA transcript, (iii) drug activity, and (iv) protein expression levels (introduced here). All these identifiers are available within the footnote 1 identifiers download (Fig. 2A). Any other pattern, such as for a phenotype, characteristic, combination of molecular events, or tissue-of-origin may be entered using the “60 element pattern” option. An input template for this option is available using the “Pattern comparison input template” download from footnote 2 (Supplementary Fig. S1C). This template has also been updated to allow up to 10 patterns to be submitted simultaneously.
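The idea of comparing one input pattern against many stored patterns, with NA entries excluding cell lines, can be sketched in Python as follows. Representing NA as None, using Pearson's correlation, and sorting from most positive to most negative are all assumptions made for illustration; the names and values are invented, and no significance testing is shown.

```python
from math import sqrt

def pearson_skip_na(pattern, profile):
    """Pearson correlation over positions where both values are present.

    None stands in for an NA entry. Returns None when the correlation
    is undefined (too few points or a constant sequence).
    """
    pairs = [(p, q) for p, q in zip(pattern, profile)
             if p is not None and q is not None]
    n = len(pairs)
    if n < 3:
        return None
    mean_p = sum(p for p, _ in pairs) / n
    mean_q = sum(q for _, q in pairs) / n
    spq = sum((p - mean_p) * (q - mean_q) for p, q in pairs)
    spp = sum((p - mean_p) ** 2 for p, _ in pairs)
    sqq = sum((q - mean_q) ** 2 for _, q in pairs)
    if spp == 0 or sqq == 0:
        return None
    return spq / sqrt(spp * sqq)

def pattern_comparison(pattern, database):
    """Correlate one input pattern with every stored profile and rank."""
    results = []
    for name, profile in database.items():
        r = pearson_skip_na(pattern, profile)
        if r is not None:
            results.append((name, r))
    return sorted(results, key=lambda item: item[1], reverse=True)

# Five placeholder cell lines; the third is excluded with an NA (None).
pattern = [1, 0, None, 1, 0]
database = {
    "gene_A": [5.0, 1.0, 9.0, 5.0, 1.0],
    "gene_B": [1.0, 5.0, 0.0, 1.0, 5.0],
    "gene_C": [2.0, 2.0, 2.0, 2.0, 2.0],  # constant, so no correlation
}
ranked = pattern_comparison(pattern, database)
```

In this toy case the value of 9.0 for gene_A in the excluded cell line has no effect on the result, which is the purpose of the NA entry.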
In the example given in Supplementary Fig. S1C and S1D, a melanoma tissue–specific “60 element pattern” was uploaded (Supplementary Fig. S1D, “Input”). The top 2 correlated gene transcript patterns plus CTLA4 are shown under “Outputs.” Calpain-3 (CAPN3) and dopachrome tautomerase (DCT) have prior association with invasiveness and assessment of melanomas, respectively (23, 24). CTLA4 is a target for ipilimumab in patients with nonresectable and malignant melanoma; however, its method of action is thought to occur through effects on the patient's endogenous immune system, so its unexpected expression within 5 of 10 melanomas is of interest (25, 26). Of the next 5 most highly correlated genes from this list, 4 have previously established connections to melanoma (SOX10, TYR, S100A1, and MLANA). The top 2 significant microRNA correlations, hsa-miR-146a and 211, have prior association with melanoma initiation and progression and with invasiveness, respectively (27, 28). Among the FDA-approved or clinical trial drugs, the top 2 correlated drugs, vemurafenib and selumetinib, have prior reports of efficacy in patients with melanoma (29, 30). The pattern matches between the melanoma input pattern and these already established molecular and pharmacologic patterns illustrate the quality and informative nature of these databases. The drug target CTLA4, microRNA 514, and drug hypothemycin are all novel correlations and illustrate the potential of “Pattern comparison” as a discovery tool.
Exome Sequencing (DNA) Graphical Synopsis by Gene
To provide the user a visual summary of the genetic variants that occur within a gene as identified in our exome sequencing study, we developed “Exome sequencing (DNA) graphical synopsis by gene” (10). This tool is selected by checking its box as shown in Fig. 3A. Identifier input and data retrieval are as described for Fig. 2A. The output includes both a synopsis of all variants within the NCI-60, as seen in Fig. 3B, and individual visualizations for all cell lines (not shown). The types of variants are identified as defined in Fig. 3B. The version of the gene used for visualization is as defined by the denoted NCBI accession number (in the html output). The 2 examples shown include a tumor suppressor (TP53) and an oncogene (KRAS). Note that in general, inactivating mutations tend to be more widespread for tumor suppressor genes as seen for TP53, whereas activating mutations tend to congregate for oncogenes, as seen for KRAS codon 12, with mutations in 7 cell lines. This may be expected to be the case for the important, although uncommon, neomorphic (gain of novel gene function) mutations as well.
Genetic Variant versus Drug Visualization
Designed to explore potential gene–drug relationships, “Genetic variant versus drug visualization” provides a visualization that compares variants for a given gene to shifts in activity for a given compound (14). This tool is selected as shown in Fig. 3C.
Using this tool, previously recognized (proof-of-principle) relationships may be observed, as for the protein kinase BRAF and the MEK (MAP2K1, 3, and 6)-ERK (MAPK1, 3, 4, and 15) inhibitor hypothemycin (10). Novel plausible relationships may also be discovered, such as between the DNA repair gene MUS81 and the DNA synthesis inhibitor clofarabine. Another example of this type is the proapoptotic STAT2 and the alkylating agent uracil mustard. STAT2 has had prior reported association with DNA synthesis inhibitors (31); however, genes that affect apoptosis might be expected to influence the response to multiple drug types. In addition, one can identify unstudied compounds that may target potentially pharmacologically useful genes, such as the epithelial–mesenchymal associated tight junction protein 3 (TJP3).
Introduced here, this tool also allows the user to query a single gene versus all compounds (example input, *:BRAF) or a single drug versus all genes (example input, 123127:*). These inputs will identify all correlated drugs for the *:BRAF example, or genes for the 123127:* (doxorubicin) example. Queried in this fashion, BRAF identifies 38 correlated drugs and doxorubicin identifies 4 genes. The criterion for inclusion using this function is a Matthews correlation coefficient (MCC) of ≥0.596 (P ≤ 0.0002 for n = 35); the included pairings are the gene–compound pairings in Table 3 and Supplementary Table S4A and S4B of our prior article (14). Outputs of this type with more than 150 matches will be received as a summary sheet.
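For readers unfamiliar with the MCC, the following Python sketch computes it from two binary vectors using the standard definition and applies the 0.596 cutoff quoted above. How variant status and drug response are converted to binary calls in the tool is not described here, so the two input vectors are hypothetical and the values are made up.

```python
from math import sqrt

def matthews_cc(variant, responder):
    """Matthews correlation coefficient for two binary sequences.

    `variant` flags cell lines carrying a variant and `responder`
    flags cell lines with a shift in drug activity (both hypothetical).
    """
    pairs = list(zip(variant, responder))
    tp = sum(1 for v, r in pairs if v and r)
    tn = sum(1 for v, r in pairs if not v and not r)
    fp = sum(1 for v, r in pairs if v and not r)
    fn = sum(1 for v, r in pairs if not v and r)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denominator

MCC_CUTOFF = 0.596  # threshold quoted in the text

# Made-up calls for eight placeholder cell lines.
variant = [1, 1, 1, 0, 0, 0, 0, 0]
responder = [1, 1, 0, 0, 0, 0, 0, 0]
mcc = matthews_cc(variant, responder)
passes = mcc >= MCC_CUTOFF
```

The MCC ranges from −1 to 1 and, unlike simple accuracy, stays informative when variants are present in only a few of the cell lines.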
Consideration of Interparameter Relationships: RAS Activation
In addition to the consideration of individual molecular events involved in either cancer progression or pharmacologic response, CellMiner provides an opportunity to examine the contribution of multiple molecular events and forms of data simultaneously. A simple example that also illustrates the complexity of considering overlapping molecular events is provided by consideration of whether RAS is activated across these cancer cell lines. The 3 forms of RAS, H, K, and N, are well-studied, important oncogenes (32).
Three forms of data provided by the “Cell line signature” tool (Fig. 4A) are informative for this purpose: DNA copy number, gene transcript levels, and activating mutations (from the “Genetic variant summation” tool). Reviewing shifts in copy number from 2N from the “Gene DNA copy number” option, in its “Graphical Output” worksheet, one finds amplifications for all 3 forms of RAS. Examples are given for each gene in Fig. 4B. Reviewing those data systematically, 1, 6, and 5 of the cell lines appear to have DNA amplifications for HRAS, KRAS, and NRAS, respectively (Fig. 4C). Next, by observing the transcript levels using the “Gene transcript z scores” option, it is apparent that each of these RAS-amplified cell lines also has upregulated mRNA expression. In addition, 1 and 4 additional cell lines (without DNA amplifications) are found to have upregulated transcript levels in HRAS and NRAS, respectively (Fig. 4C). Finally, by determining the presence of activating mutations (those with amino acid changes at 12, 13, and 61) using the “Genetic variant summation” option, 1, 10, and 2 of the cell lines appear to be activated genetically in HRAS, KRAS, and NRAS, respectively (Fig. 4C). Viewing these data in aggregate (Fig. 4D), 3, 14, and 10 of the cell lines appear activated for HRAS, KRAS, and NRAS, respectively. Overall, 26 of 60 of the cell lines have an indication of RAS activation, including 4 of 5 breast, 4 of 6 leukemia, and 5 of 9 lung cell lines.
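The aggregation step, in which a cell line counts as activated if any of the three forms of evidence flags it, amounts to a set union. The Python sketch below shows this with invented gene and cell line labels; it does not reproduce the actual NCI-60 calls.

```python
def activated_lines(evidence):
    """Union the cell lines flagged by any form of evidence for one gene."""
    flagged = set()
    for lines in evidence.values():
        flagged |= set(lines)
    return flagged

# Placeholder genes and cell lines; not the real NCI-60 calls.
evidence_by_gene = {
    "GENE_1": {
        "amplified": {"line_01"},
        "upregulated": {"line_01", "line_02"},
        "mutated": {"line_03"},
    },
    "GENE_2": {
        "amplified": set(),
        "upregulated": {"line_03"},
        "mutated": {"line_03", "line_04"},
    },
}

per_gene = {gene: activated_lines(ev) for gene, ev in evidence_by_gene.items()}
overall = set().union(*per_gene.values())
```

In this toy case the per-gene counts are 3 and 2 but the overall count is 4, because line_03 is flagged for both genes. An overlap of this kind is one way an overall total can be smaller than the sum of the per-gene counts.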
Consideration of Interparameter Relationships and Their Relationship to Pharmacology: PTEN Knockdown
An extension of the consideration of the contributions of multiple molecular events simultaneously is how it might influence pharmacology. An example of this is provided by the consideration of PTEN knockdown. PTEN is an important tumor suppressor that antagonizes the PIK3 (PIK3CB, C3, and R5)–AKT (AKT1, 2, and 3) pathway and is commonly deleted in cancer (33, 34).
The same 3 forms of data used in Fig. 4 are again provided by the “Cell line signature” tool, querying the database for PTEN DNA copy number, gene transcript levels, and deactivating mutations, as shown in Fig. 5A. Reviewing shifts in DNA copy number from 2N from the “Gene DNA copy number” option, in the “Graphical Output” worksheet, one finds PTEN deletions for 4 cell lines. Three of these are shown in Fig. 5B. By observing the transcript levels using the “Gene transcript z scores” option, it is apparent that each of these PTEN-deleted cell lines also has downregulated expression and that these cell lines are the 4 lowest PTEN expressers in the NCI-60 (Fig. 5C). By determining the presence of predicted function-affecting mutations, as detailed in the “Genetic variant summation” tool, 4 different cell lines have indication of being genetic hypomorphs with loss-of-function (Fig. 5C). Of these, both BT-549 and SF-295 have nonsense (premature stops), MOLT-4, a frameshift, and SK-MEL28, the T167A mutation, which has been shown previously to result in a 50% reduction in function (35). Taking these 3 molecular parameters as a composite, a pattern of 8 cell lines with apparent PTEN inactivation can be derived (Fig. 5C, right).
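Turning a composite set of flagged cell lines into an ordered numeric pattern can be sketched in Python as below. The 1/0 coding, the use of None for NA, and the cell line ordering are assumptions made for illustration; the actual layout of the input template is not described here. The four named cell lines are those listed above as carrying PTEN mutations, and the remaining labels are placeholders.

```python
def build_pattern(cell_lines, flagged, excluded=()):
    """Build an ordered pattern with one value per cell line.

    1 marks a flagged cell line, 0 an unflagged one, and None an
    excluded (NA) one. This coding is an illustrative assumption.
    """
    pattern = []
    for line in cell_lines:
        if line in excluded:
            pattern.append(None)
        else:
            pattern.append(1 if line in flagged else 0)
    return pattern

# Shortened, hypothetical ordering; a real pattern would have 60 entries.
cell_lines = ["BT-549", "SF-295", "MOLT-4", "SK-MEL28",
              "other_line_1", "other_line_2"]
flagged = {"BT-549", "SF-295", "MOLT-4", "SK-MEL28"}
pattern = build_pattern(cell_lines, flagged, excluded={"other_line_2"})
```

Keeping the cell line order fixed matters, because a pattern is only comparable to stored data when each position refers to the same cell line.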
Using this PTEN composite pattern as input for the “Pattern comparison” tool, using the “60 element pattern” option (Fig. 5A), 4 drugs with either FDA approval or clinical trial status are found to have significant correlation (P < 0.05, Fig. 5D). Of these, fenretinide (NSC 374551) has been shown to reactivate PTEN previously, affirming the relationship to the gene (36). PX-316 (NSC 710297) also has pathway connection, being an AKT inhibitor. Thus, 2 of 4 (50%) of the drugs with significant correlation to the input PTEN knockdown pattern have obvious connection to PTEN, whereas only 4.8% of the total known mechanism of action drugs present in CellMiner do. This enrichment was found to be significant (P < 0.01) by binomial distribution, providing evidence for the saliency of the molecular data when compared with the pharmacological.
Discussion
In addition to providing a review of the preexisting CellMiner tools, the current manuscript also introduces new tools and upgrades to current ones, as well as providing examples of how the tools and databases may be used for systems biology and systems pharmacology. Examples are given that provide both proof-of-principle and novel findings (Supplementary Fig. S1A–S1D; Figs. 3C and D, 4A–D, and 5A–D).
New and upgraded tools include (i) the introduction of the “Protein mean values” cell line signature (Fig. 2B and C), (ii) the “Cross-correlations of transcripts, microRNAs, and drugs” as an upgraded stand-alone tool (Supplementary Fig. S1A and S1B), (iii) 4 additional data types for the “Pattern comparison” output-“Amino acid changing” and “Protein function affecting” genetic variants, “Protein levels,” and “Miscellaneous phenotypic parameters,” and (iv) introduction of the star feature (*) for the “Genetic variant versus drug visualization” tool, allowing the user to rapidly identify all compounds with significant correlation to a single gene, or vice versa.
CellMiner previously has enabled us to (i) identify promoter–proximal transcriptional pausing in human genes (37, 38), (ii) discover the helicase SLFN11 as a causal determinant of response to DNA damaging agents (18), (iii) recognize the regulation of MYC expression by miR-375 (39), (iv) recognize the importance of MYC as a driver of mitochondrial genes (40), (v) reveal genetic inactivation or endogenous activation of CHEK2 across the NCI-60 (40), (vi) link USP7 and Daxx to taxane resistance (41), (vii) link TP53 wild-type status, Mdm2 transcript level, and miR-34a transcript level with nutlin activity (10), (viii) reveal the interrelationship between RAS (H, K, and NRAS)–BRAF–PTEN mutational status, EGFR expression, and ERBB2 expression with erlotinib activity (10), (ix) demonstrate the strong correlation between ABCB1 expression and doxorubicin activity (2), (x) recognize both known and novel gene expression levels, microRNA expression levels, and drug activities with a colon-specific pattern input to “Pattern comparison” (2), (xi) identify predominant co-regulation among cell migration genes (42), (xii) identify co-regulation among kinetochore genes, their prospective regulatory elements, and their association with genomic instability (43), (xiii) show the connection between accumulation of mass homozygotes in the cancer cell lines as compared with non-cancerous HapMap trios (44), (xiv) identify the drug Ro5-3335 as a candidate treatment for core binding factor leukemias (45), (xv) associate CDKN2A DNA copy number and expression to mitoxantrone activity (8), (xvi) define an epithelial gene expression signature (46), and (xvii) recognize the composite relationship between the mutational status of multiple genes from the EGFR–ERBB2 pathway and drug response, including the directionality of that influence as a function of molecular pathway considerations (14).
The diversity among these observations gives an indication of the broad scope and range of the types of discoveries that can be made using the NCI-60 database and CellMiner set of tools.
In addition to being a resource for generating or providing tests for hypotheses, such as those described above, CellMiner also provides a template for making “omic” data of this type accessible and usable for a broad portion of the scientific public. Access of this type remains a serious shortcoming for the field currently, and improved access to and connectivity between the multiple cell line and clinical databases that are either complete or in progress should be a goal. A synopsis of the types of data available across the 3 major cell line screens, the NCI-60, the Cancer Cell Line Encyclopedia (CCLE), and the Cancer Genome Project (CGP), is presented in Supplementary Table S1. However, currently, the barriers to data integration and interrogation remain daunting, restricting access primarily to bioinformaticians, statisticians, and those with computer expertise. Because of the multifaceted nature of the disease information that needs to be considered in the cancer context, there is certainly need for the input of molecular biologists, clinicians, and all others with pertinent domain expertise as well.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.