Abstract
Major advances have been made in the field of precision medicine for treating cancer. However, many open questions remain that need to be answered to realize the goal of matching every patient with cancer to the most efficacious therapy. To facilitate these efforts, we have developed CellMinerCDB: National Center for Advancing Translational Sciences (NCATS; https://discover.nci.nih.gov/rsconnect/cellminercdb_ncats/), which makes available activity information for 2,675 drugs and compounds, including multiple nononcology drugs and 1,866 drugs and compounds unique to the NCATS. CellMinerCDB: NCATS comprises 183 cancer cell lines, with 72 unique to NCATS, including some from previously understudied tissues of origin. Multiple forms of data from different institutes are integrated, including single and combination drug activity, DNA copy number, methylation and mutation, transcriptome, protein levels, histone acetylation and methylation, metabolites, CRISPR, and miscellaneous signatures. Curation of cell lines and drug names enables cross-database (CDB) analyses. Comparison of the datasets is made possible by the overlap between cell lines and drugs across databases. Multiple univariate and multivariate analysis tools are built-in, including linear regression and LASSO. Examples have been presented here for the clinical topoisomerase I (TOP1) inhibitors topotecan and irinotecan/SN-38. This web application provides both substantial new data and significant pharmacogenomic integration, allowing exploration of interrelationships.
CellMinerCDB: NCATS provides activity information for 2,675 drugs in 183 cancer cell lines and analysis tools to facilitate pharmacogenomic research and to identify determinants of response.
Introduction
The approach to molecular biology and pharmacology, commonly referred to as precision medicine, has been significantly changed over approximately the last 25 years by the introduction of omics data and the conceptual shift to the use of computer analyses of large datasets with a combination of statistics, machine learning, omics visualizations, and integration of multiple disparate forms of data. Starting with the pioneering work of the Developmental Therapeutics Program (DTP) at the NCI (1), many projects have been and are contributing sizable blocks of data, prominently including (but not limited to) the large (∼1,000 cell line) panels of the Cancer Cell Line Encyclopedia (CCLE) from the Broad/Novartis, the Genomics of Drug Sensitivity in Cancer (GDSC) from Sanger and Massachusetts General Hospital and the Cancer Therapeutics Response Portal (CTRP) from the Broad Institute.
The Genomics and Pharmacology Facility (GPF) has pioneered omics data acquisition and integration since the mid 1990s (1–9). Its efforts have led to the CellMiner and CellMinerCDB web applications (2–7, 9, 10), which allow pharmacogenomic database access and integrative analyses across all public cancer cell line genomics and drug response databases (2).
NCATS has operated an automated compound screening platform for large compound libraries, using a quantitative high-throughput screening (qHTS) format across multiple disease models, since 2008 (11–13). For cancer cell line viability screening, NCATS created the Mechanism Interrogation PlatEs (MIPE) compound library comprising approved and investigational chemotherapeutic agents, as well as common medications for noncancer indications. An additional design feature of the MIPE library is compound mechanistic redundancy, allowing analyses across multiple compounds reported to hit the same target. Compound screening data using the MIPE library have demonstrated value for multiple cancer types, such as diffuse intrinsic pontine glioma (DIPG), Hodgkin lymphoma, Ewing sarcoma, small-cell lung cancer (SCLC), glioblastoma, and others (9, 14–17). Published and unpublished MIPE library compound screening data have been aggregated into a unified dataset called the NCATS–NCI Cytotoxicity Dataset, shared internally with the NCI through the Palantir Foundry platform. A subset of this unified dataset is now being made public through CellMinerCDB.
Here we introduce the public databases and web portal of CellMinerCDB: NCATS (https://discover.nci.nih.gov/rsconnect/cellminercdb_ncats/). CellMinerCDB: NCATS enables individual users to access and explore the large NCATS drug response database, with an emphasis on pharmacology and its relationships to molecular genomics. CellMinerCDB: NCATS is integrated with 33 datasets from multiple projects from DTP, GPF, CCLE, GDSC, CTRP, the NCI DTP SCLC, NCI60-DTP Almanac, MD Anderson, and the Project Achilles from the Cancer Dependency Map Portal (DepMap; see Supplementary Materials and Methods for a full listing; refs. 4, 5, 7, 18–28). The omics analyses include single and two-drug activities, DNA copy number, methylation, and sequencing, whole genome transcriptome, mRNA and selected protein expression, metabolite levels, and clustered regularly interspaced short palindromic repeats (CRISPR) knockouts, allowing explorations of the relationships between those data and pharmacologic responses. Functionalities of the new CellMinerCDB: NCATS web application are introduced and discussed here with multiple examples validating the database. Details about general functionalities of the CellMinerCDB (https://discover.nci.nih.gov/rsconnect/cellminercdb/) platforms have been reviewed recently (2) and a 10-minute tutorial is on YouTube (Fig. 1A).
CellMinerCDB: NCATS is a public web application hosted in the Genomics and Pharmacology Facility of the Developmental Therapeutics Branch of the NCI Center for Cancer Research, and of the NCATS of the NIH.
Materials and Methods
The NCATS screening data contained within the CellMinerCDB: NCATS web application utilize RSTUDIO-2022.12.0–353 and were generated as previously described (29). Cells were treated with compounds for 48 hours in 1,536 well plates and assessed for viability using CellTiter Glo (Promega). Data were normalized to plate controls of DMSO-treated cells as 100% viability and no cells at 0% viability. A four-parameter curve fit was used to generate an IC50 and AUC. Z-score AUC (across cell lines) was calculated by subtracting the mean AUC and dividing by the SD of each drug across all cell lines screened.
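The Z-score step described above can be sketched in a few lines of Python. This is a minimal illustration and not the NCATS pipeline: the function name `z_score_auc`, the dictionary input, and the use of the sample SD (n - 1 denominator) are assumptions, because the text does not state which SD estimator was used.

```python
from statistics import mean, stdev


def z_score_auc(auc_by_cell_line):
    """Return {cell_line: z} for one drug.

    auc_by_cell_line maps a cell line name to its AUC, or to None when
    that cell line was not screened (missing values are skipped).
    """
    observed = {c: a for c, a in auc_by_cell_line.items() if a is not None}
    if len(observed) < 2:
        raise ValueError("need at least two measured cell lines")
    mu = mean(observed.values())
    sd = stdev(observed.values())  # assumption: sample SD, not population SD
    if sd == 0:
        raise ValueError("AUC is constant across cell lines")
    # Subtract the drug's mean AUC and divide by its SD across cell lines.
    return {c: (a - mu) / sd for c, a in observed.items()}
```

Because the mean and SD are computed per drug, the resulting values describe how a cell line responds relative to the other cell lines screened with that drug, and are not comparable as absolute potencies across drugs.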
All compounds were matched using SMILES and InChIKey to external databases to pull clinical status. NCATS Inxight, DrugBank, and ChEMBL were used as references for compound structure matching and global clinical status (30–32). Structure matching was done within the Palantir Foundry platform (Palantir Technologies) utilizing RDKit: Open-source cheminformatics (2021-09-4; Q3 2021 Release); and NCATSFind Resolver. NCATS cell lines were annotated internally using Cellosaurus for disease and tissue type and matched to the other cell line sets (33). The NCATS web application is an R/shiny app hosted on an NCI server.
Information sources for the cell lines and drugs include the NCI Thesaurus, PubChem, and the scientific literature. The large amount of data coming from the included omics efforts and the platforms used to develop them have been previously described. Compound and cell line name variations across the different institutions' cell line sets were resolved internally. An example is a single compound with the names 122958 (NCI-60), ATRA (GDSC), tretinoin (CTRP), and isotretinoin (NCATS). Another example is a single cell line with the names CO:COLO 205 (NCI-60), COLO 205 (CCLE), COLO-205 (GDSC), and COLO205 (MD Anderson). All datasets have instances of missing data for specific cell lines, drugs, or genes.
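The formatting differences in the cell line example can be illustrated with a simple matching key. The sketch below is hypothetical and is not the curation procedure used here, which was carried out internally; the function name and the assumed NCI-60 prefix pattern (two or three letters followed by a colon) are illustrative only.

```python
import re


def normalize_cell_line_name(name):
    """Collapse formatting variants of a cell line name into one matching key."""
    # Drop a tissue prefix such as "CO:" (assumed pattern: 2-3 letters, then a colon).
    name = re.sub(r"^[A-Za-z]{2,3}:", "", name.strip())
    # Remove everything that is not a letter or digit, then upper-case.
    return re.sub(r"[^A-Za-z0-9]", "", name).upper()
```

Applied to CO:COLO 205, COLO 205, COLO-205, and COLO205, the function returns the same key, COLO205. A key of this kind cannot resolve true synonyms, such as the compound example above, and can merge distinct lines whose names differ only in punctuation, so curated lookup tables remain necessary.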
Univariate and multivariate analyses shown throughout were done using CellMinerCDB: NCATS functionalities or using data downloaded directly from CellMinerCDB: NCATS. The scatterplots, tables, and heatmaps produced by the web application were generated using the selections described in the input boxes and figure legends. Drug versus drug activity comparisons not generated by the web application were done by Pearson correlation using R version 3.6.3. Bar charts were generated using GraphPad PRISM version 7.0. Violin plots were generated using ggplot2 version 3.3.5.
Bimodal drug activity density distributions were identified using a combination of a Gaussian mixture model-based method (nor1mix package; version 1.3), a kurtosis test, and visual inspection. Both these calculations and the density plots were done in R (The R Project for Statistical Computing).
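As a rough illustration of the kurtosis component only, the sketch below computes the excess kurtosis of one drug's activity values. It is an assumption-laden stand-in: the specific kurtosis test used in the study is not described in the text, the function name is hypothetical, and the mixture model and visual inspection steps are not reproduced.

```python
from statistics import mean


def excess_kurtosis(values):
    """Moment-based excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four values")
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mu) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("values are constant")
    return m4 / m2 ** 2 - 3.0
```

Two well-separated, balanced groups of cell lines give a strongly negative value, whereas a single peak with a few outliers gives a positive one. A negative value is only suggestive, since a flat (uniform) distribution has an excess kurtosis of -1.2 and an unbalanced mixture can be positive, which is consistent with combining this statistic with a mixture model and visual inspection.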
NCATS IC50 activity was predicted from CCLE microarray transcript expression by both univariate and multivariate analysis. The univariate analysis used Pearson correlation between drug response and gene expression of the target. The multivariate models used stepwise forward regression. Each model was initiated with a target for a given drug; multiple targets generated multiple models. Possible regression features included genes from Onco500 (34). A maximum of 10 features were added to each model and then pruned. For each iteration step, the feature with the lowest partial correlation P value after removing the effects of already included features was added using rcellminer 2.9.1 (35). A 10-fold cross-validated predicted response was calculated at each step using rcellminerElasticNet 0.1.1. Models were pruned by examining the statistical difference in the correlation of predicted versus observed response with each added feature using cocor 1.1–3. CCLE microarray expression data from CellMinerCDB were used (2).
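The feature-addition step of the stepwise procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the rcellminer implementation: all names are hypothetical, the data are assumed complete (no missing values), and candidates are ranked by absolute partial correlation, which gives the same ordering as the partial correlation P value when every candidate has the same number of observations. Cross-validation and pruning are not shown.

```python
from math import sqrt


def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _residualize(v, basis):
    """Remove from v its projection on basis (both already centered)."""
    bb = _dot(basis, basis)
    if bb == 0:
        return v
    k = _dot(v, basis) / bb
    return [x - k * y for x, y in zip(v, basis)]


def forward_select(response, features, seed, max_features=10):
    """Greedy forward selection ranked by absolute partial correlation.

    response: drug activity per cell line; features: {gene: expression per
    cell line}; seed: the gene used to start the model (the drug target).
    """
    y = _center(response)
    resid = {name: _center(vals) for name, vals in features.items()}
    selected = []
    pick = seed
    while pick is not None and len(selected) < max_features:
        selected.append(pick)
        basis = resid.pop(pick)
        # Remove the effect of the newly included feature from the response
        # and from every remaining candidate.
        y = _residualize(y, basis)
        resid = {n: _residualize(v, basis) for n, v in resid.items()}
        # The correlation between residuals is the partial correlation.
        best, best_r = None, 0.0
        yy = _dot(y, y)
        for n, v in resid.items():
            vv = _dot(v, v)
            if yy <= 1e-12 or vv <= 1e-12:
                continue  # nothing left to explain, or a redundant candidate
            r = abs(_dot(y, v)) / sqrt(yy * vv)
            if r > best_r:
                best, best_r = n, r
        pick = best
    return selected
```

Because each selected feature is residualized against the features chosen before it, successive projections are orthogonal, and the residuals at every step equal those from a regression on all features included so far.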
Data availability
The data analyzed in this study were obtained from multiple sources. Within the application, the source of each data set is accessible within the Metadata tab, both within the “select here to learn more about…” link and from the “download footnotes” tab. A description of all data sources used in CellMinerCDB: NCATS is provided in the Supplementary Materials and Methods.
Results
The CellMinerCDB: NCATS web application
The CellMinerCDB: NCATS publicly accessible web application was created to both access the NCATS drug response data and enrich and expand its usefulness by integrating multiple other forms and sources of genomics, proteomics, and metabolomics data from the other public cancer cell line datasets using the CellMinerCDB platform (2).
A screenshot of the site, banner, and tabs for the CellMinerCDB: NCATS web application is presented in Fig. 1A. CellMinerCDB: NCATS allows drug comparisons and emphasizes cross-database (CDB) analyses with the other public cancer cell line databases. The univariate analyses tab allows generation of on-the-fly bivariate scatter plots and correlation analyses from a single input to compare all profiles within selected data sets. The multivariate analyses tab allows the exploration of multivariate models predictive of an observed profile. Analyzing selected tissues of origin is an option for both univariate and multivariate analyses. The metadata tab allows the download of datasets of interest for further processing and archiving. The search IDs tab provides the identifiers within each cell line set by data type. The help tab provides explanations and descriptions of the various functionalities within the web application. In addition, the video tutorial tab provides a description and explanation of the CellMinerCDB functionalities. Thus, CellMinerCDB: NCATS provides new data, multiple functionalities, and data integration, allowing users to mine the NCATS data independently without having to seek support from bioinformatics teams.
The NCATS input data
CellMinerCDB: NCATS comprises 2,675 drugs and compounds tested in 183 cell lines; 2,667 of the drugs and compounds have mechanism of action designations. The dataset was created as described in Materials and Methods and Fig. 1B. The output is fully compatible and integrated with CellMinerCDB (2). An asset of CellMinerCDB: NCATS is the unique compounds and cancer cell lines included (Fig. 1C and D).
NCATS contains two drug sensitivity metrics, Z-AUC and IC50 values. Both are derived from a wide range of screening concentrations, routinely 11 concentrations between 0.79 nanomolar and 47 micromolar, which is an asset of NCATS drug testing (12). The drugs include 952 (36%) clinically approved, 790 (30%) that have entered clinical trials, and 908 drugs (34%) that are preclinical (Fig. 1C, left). Notably, 1,877 (70%) drugs and compounds are unique to NCATS (Fig. 1C, right). They have been annotated with their commonly accepted mechanisms of action. A feature of the NCATS dataset is the inclusion of 518 approved nononcology drugs not found in the other public databases (Supplementary Table S1). Those include 103 antiinfectives (antibacterial, mycobacterial, viral, or fungal) for systemic use, 86 cardiovascular or nervous system drugs, and 72 alimentary tract and metabolism compounds.
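The span of the concentration range can be made concrete with a short calculation. The sketch below assumes that the 11 concentrations are evenly spaced on a logarithmic scale, which the text does not state; the function names are hypothetical and concentrations are expressed in nanomolar.

```python
def dilution_factor(lowest, highest, n_points):
    """Fold change between adjacent concentrations, assuming equal log spacing."""
    return (highest / lowest) ** (1.0 / (n_points - 1))


def concentration_series(lowest, highest, n_points):
    """Concentrations from lowest to highest under the same assumption."""
    f = dilution_factor(lowest, highest, n_points)
    return [lowest * f ** i for i in range(n_points)]
```

With a lowest concentration of 0.79 nanomolar, a highest of 47,000 nanomolar, and 11 points, the factor is close to 3, which is consistent with a 3-fold serial dilution covering almost five orders of magnitude.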
The distribution of the 183 NCATS cell lines by tissue of origin is detailed in Supplementary Table S2. They include 72 (38%) unique cancer cell lines absent in other public cancer cell line databases (Fig. 1D; Supplementary Table S3). Figure 1D shows several of the rare disease subtypes including DIPG, renal Birt-Hogg-Dubé syndrome, hereditary leiomyomatosis, and TFE3 fusion cancer cell lines. Thus, CellMinerCDB: NCATS provides the user with substantial new drug and cell line data.
Cell line and drug overlaps of NCATS with other cancer cell line datasets
The cell line overlaps for CellMinerCDB: NCATS as well as all other cell line sets are listed in Fig. 2A. As in our other CellMinerCDB websites (https://discover.nci.nih.gov/), cell lines are matched with common tissue of origin terms based on the OncoTree ontology levels developed by the Memorial Sloan Kettering Cancer Center (New York, NY) and Dana-Farber Cancer Institute (Boston, MA), primarily version 1.1 as described previously (2). Additional information, such as the gender or age of the patient from whom the cell line originated, is also included. Comparison between drug responses in cell lines is made possible by the overlap of cell lines across databases (Fig. 2A).
The drug and compound activity overlap between the multiple cell line sets is presented in Fig. 2B. Information on the activity measurements of each cell line set is accessible in the “data type” input box, the Metadata “units” description or footnotes, or the provided URLs. An asset for the user is that CellMinerCDB: NCATS automatically matches cell line and drug data across any cell line sets queried, which allows their comparison for drugs that are identical, related by mechanism of action, or disparate.
Omics data available for cross-comparisons in CellMinerCDB: NCATS
Figure 2C summarizes by cell line set and measurement type the profiles available in CellMinerCDB: NCATS, including 31,617 drug (and compound) activities, 261,848 molecular measurements and 18,119 miscellaneous signatures. All 28 included datasets are available for download from the Metadata tab (Fig. 1A). Our curation and standardization of these datasets minimize the task of name matching.
The data types available for exploration based on the databases with overlapping cell lines include single-drug activities, two-drug combination activities, gene copy number, methylation and mutation levels, transcript expression, protein expression, metabolite levels, the DepMap Achilles (Achilles) CRISPR genetic dependencies, and miscellaneous molecular signatures. Those miscellaneous phenotypic signatures include the antigen presenting machinery (APM), epithelial–mesenchymal transition (EMT) status, replication stress (RepStress), genomic instability (HRD_LOH, HRD-SUM, NtAI, LST) and neuroendocrine status (NE). The metadata phenotypic signatures are accessible under univariate analyses/data type/mda: miscellaneous phenotypic data. The number of data explorations one might pursue, depending on one's interest, easily jumps into the billions. The NCATS drug data can be compared to genomics data for the same cell lines in other datasets, allowing one to relate the drug responses to omics features using CellMinerCDB: NCATS. The following examples illustrate the basic use of CellMinerCDB: NCATS.
Drug comparisons
The overlaps between cell lines and drugs across the “cell line sets” facilitate multiple forms of drug comparisons. Figure 3A shows a univariate analyses/plot data output for two structurally related TOP1 inhibitors commonly used in clinical oncology (36), topotecan (x-axis) versus SN-38 (y-axis), the active metabolite of irinotecan. Both are measured by NCATS and displayed using CellMinerCDB: NCATS. The highly significant correlation between the two drugs (P = 9.1×10^−52) demonstrates internal assay consistency.
Similarly, Fig. 3B shows a univariate analyses/compare patterns comparing the NCATS anaplastic lymphoma kinase (ALK) inhibitor TAE-684 to other NCATS ALK inhibitors (by entering ALK inhibitor in the output MOA column). Of the 12 ALK inhibitors in the NCATS database, 10 show significant correlations demonstrating assay and mechanism of action reproducibility across cell lines within the NCATS drug response database.
Comparison of NCATS with GDSC and CTRP drug activities in Fig. 3C and D, respectively, shows the top 15 correlated compounds for each. Four protein kinase inhibitors are common between these two (shown as red bars): linifanib, sorafenib, AZD-7762, and tivozanib. Figure 3E is a univariate analyses/plot data analysis of one of these comparisons, AZD-7762 as measured by both NCATS (x-axis) and CTRP (y-axis), yielding a P value of 1.1×10^−10. These observations demonstrate ways of comparing drug activities across databases to determine consistency across common cell line sets.
Compared globally, the average Pearson correlation for NCATS versus either GDSC or CTRP across all compounds using Z-AUC or IC50 is 0.4. Violin plots (Fig. 3F) visualize these correlations, which are significant for 102 of 265 compounds (38.4%) shared with CTRP and 71 of 212 compounds (33.5%) shared with GDSC. The NCATS versus PRISM drug data are not included in this analysis, as no compound had the minimum of 16 overlapping cell lines. The Fig. 3 examples are only a small sampling of the types of informative comparisons one might do.
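A cross-database comparison of this kind reduces to a Pearson correlation over the cell lines shared by two datasets, subject to a minimum overlap. The following sketch is illustrative only: the function name and dictionary inputs are hypothetical, cell line names are assumed to be harmonized already, and the P value calculation is omitted.

```python
from math import sqrt

MIN_OVERLAP = 16  # minimum number of shared cell lines, as stated in the text


def pearson_on_overlap(activity_a, activity_b, min_overlap=MIN_OVERLAP):
    """Return (r, n) for one compound measured in two datasets.

    Each input maps a harmonized cell line name to an activity value, or to
    None when the value is missing. r is None when fewer than min_overlap
    cell lines are shared or when either profile has no variance.
    """
    shared = sorted(
        c for c in activity_a
        if c in activity_b
        and activity_a[c] is not None
        and activity_b[c] is not None
    )
    n = len(shared)
    if n < min_overlap:
        return None, n
    xs = [activity_a[c] for c in shared]
    ys = [activity_b[c] for c in shared]
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return None, n
    return sxy / sqrt(sxx * syy), n
```

Returning the overlap size together with the coefficient makes it straightforward to report which compounds were excluded for insufficient overlap.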
Exploration of NCATS drug responses with omics or CRISPR data
The integration of the NCATS drug responses with a wide range of molecular, phenotypic, and signature data from the other omics databases (CCLE, GDSC, and NCI) allows correlation queries for overlapping cell lines. We next present a small group of these as illustrations with outputs and screenshots from CellMinerCDB: NCATS.
Figure 4A validates SN-38 activity (in NCATS) versus SLFN11 gene transcript expression (in GDSC) using CellMinerCDB: NCATS univariate analyses/plot data. The scatter plot confirms the expected significant correlation between these causally linked parameters (36). Figure 4B presents additional examples between NCATS and GDSC; all show significant correlation between a drug's activity and the transcript levels of that drug's target.
A second form of omics data comparison is given in Fig. 4C, comparing activity of the mTOR inhibitor VS-5584 from NCATS and MTOR DNA copy number from CCLE demonstrating significant correlation. CellMinerCDB: NCATS also shows that mTOR DNA copy number is significantly correlated to its transcript level (r = 0.49, P = 1.6E–61), providing the logical link between the drug activity and DNA copy number. Figure 4D provides additional examples of significant correlations between drug activities and DNA copy numbers; all linked through having the same gene both as drug target and molecular measurement. All have significant correlations between gene DNA copy number and transcript levels.
Figure 4E and F exemplify the possibility of testing NCATS drug activity versus genetic inactivation of the drug target. Figure 4E compares the growth inhibitory activity of vemurafenib (a BRAF inhibitor) to cell survival with BRAF CRISPR knockdown (as measured by Project Achilles). The resultant scatter plot demonstrates a significant correlation between the two. Figure 4F lists other examples showing significant correlations between drug activities and CRISPR knockdown; in each case linked through having the same gene both as the drug and CRISPR target. As with the drug comparisons in Fig. 3, Fig. 4 provides only a small sampling of the types of informative comparisons one might do.
To compare the predictive value of different genomics parameters, the NCATS approved and clinical trial drugs IC50 activities were each compared with the different genomics evaluations of their gene targets (transcript expression, gene copy number, methylation, mutations, and CRISPR) across the other nine platforms in CellMinerCDB: NCATS, resulting in 1,100 drug versus gene pairings (Supplementary Table S4). The percent significant correlations by platform were: (i) 5.3% for the CCLE DNA copy number, (ii) 8.8% for the GDSC methylation, (iii) 6.5% for the CCLE mutation, (iv) 5.1% for GDSC mutation, (v) 11.8% for the CCLE transcript microarray, (vi) 10.8% for the GDSC microarray, (vii) 12.8% for the CCLE RNA sequencing (RNA-seq), (viii) 9.2% for the CCLE protein, and (ix) 6.2% for the Achilles CRISPR. These results demonstrate the value of RNA-seq and proteomic analyses for predicting drug activity.
Although determination of protein levels remains limited in clinical samples, we found that both protein expression and gene-expression of the proapoptotic factor BAX in CCLE are significantly correlated with the IC50 activity of SN-38 in NCATS (Supplementary Fig. S1; P = 0.0013 and 0.0026 for 88 and 95 common cell lines, respectively). Thus, on the basis of the analysis of drugs tested in NCATS, we conclude that RNA-seq is currently the most practical predictor of drug response.
Multivariate and miscellaneous phenotypic signature (mda) analyses using CellMinerCDB: NCATS
Presuming that multiple factors are involved in drug response (36), we present two approaches for clinical TOP1 inhibitors (topotecan and SN-38, the active metabolite of irinotecan) using CellMinerCDB: NCATS.
The first utilizes the prior knowledge that the cytotoxicity of TOP1 inhibitors is dependent on SLFN11, apoptosis and transcription (36). Combining transcript expression of SLFN11 (Fig. 5A), BPTF (Fig. 5B) and high mobility group nucleosome-binding domain-containing protein 1 (HMGN1; Fig. 5C) shows how the predictive value of SLFN11 can be strengthened by using the multivariate analysis tool of NCATS:CDB (Fig. 5D and 5E).
The second multivariate analysis available in NCATS:CDB (and the other CellMinerCDB websites) uses previously described multigene expression signatures, which can be retrieved using the “mda” tab in the “data type” pull-down menu at the left of the website (Fig. 6). Together, these examples demonstrate the increased power of aggregating multiple genomic parameters to predict drug activity.
Drug activity distributions and additional multivariate analysis
Figure 7 presents another form of exploration generated from the NCATS drug database: drug activity distributions with consideration of tissues of origin. Bimodal drug distributions were identified, demonstrating both sensitive and resistant cancer cell line responses. Enrichment for specific tissues of origin in the activity peaks suggests novel prospective therapeutic indications. Multivariate analyses using CCLE transcriptomics visualize multivariate molecular predictors. The first example given is for filanesib, with its bimodal activity distribution visualized in Fig. 7A and the significant prediction of that activity by KIF11, MYBBP1A, and TNFRSF10D (P = 1.2×10^−7) in Fig. 7B. The second example given is for epothilone, with its bimodal activity distribution visualized in Fig. 7C and the significant prediction of that activity by TUBB6, ABCG1, GSK3G, and MLH1 (P = 1.2×10^−7) in Fig. 7D. Drugs with diverse mechanisms of action show enhanced activities in bladder, blood (leukemia), bone (sarcoma), bowel, brain, and lymphatic cancer cells (Fig. 7E).
Supplementary Table S5 presents an example of a more systematic pharmacologic prediction approach of NCATS IC50 drug activity distributions using CCLE microarray transcript levels. Included are 63 significant gene–drug combinations in which the genes are known targets for those drugs. In the case of ABT-737 (a BH3 mimetic and BCL2 gene family inhibitor), the generated multivariate model includes two known targets: BCL2L2 and BCL2 (as given by NCATS annotation).
Discussion
Making the NCATS drug activities publicly available is a significant addition to the omics arena. CellMinerCDB: NCATS gathers the NCATS drug response database and integrates it with nine other genomic and proteomic projects (see Fig. 1). The NCATS set of 2,675 drugs and compounds is second only to the large NCI/DTP activity screening in number (Figs. 1 and 2; ref. 2). Its high proportion of novel drugs, large number of nononcology drugs, and inclusion of many novel cell lines, including those from rare tumors, add significantly to the omics cancer cell line field.
Our curation of both the cell line and drug names enables integration with our previous CellMiner databases (2, 3, 9). It also resolves differences, making data retrieval and comparisons available with an intuitive web application. This combined with the molecular, metabolic, phenotypic, and signature data from NCI, CCLE, GDSC, and other databases adds a myriad of informative molecular parameters for the purposes of exploration, discovery, prediction, and verification of either previously known or novel relationships.
We find that the activity of drugs with similar mechanisms of action is in general internally consistent within NCATS and across the other drug databases (CCLE, CTRP, GDSC, and NCI) as shown in Fig. 3. Activity variability for overlapping drugs between institutes is recognized and presumably comes from a combination of the type of robotics and biological techniques employed (37). NCATS uses 1,536 well plates, with compounds added immediately after cell plating and 48-hour drug exposure. CTRP and GDSC use 384 well plates, with compounds added 24 hours after cell plating and 72-hour drug incubation. All three projects use CellTiter-Glo. It is unsurprising that drug activity assays done under different conditions might give different results. However, our analyses show that multiple drugs and compounds perform similarly regardless of differences in assay parameters. Thus, our recommendation for pharmacogenomics exploration with CellMinerCDB: NCATS is to first perform interdatabase analyses with drugs present in at least two platforms and prioritize drugs with consistent cytotoxicity response across databases.
CellMinerCDB: NCATS comprises two main analysis tools: “univariate analyses” and “multivariate analyses” (Fig. 1A). The pharmacogenomics analyses shown in Figs. 3–7, all generated within the CellMinerCDB: NCATS web application, provide examples of the many types of analysis possible. With 14.7 billion drug activity versus gene-molecular or phenotypic (CRISPR) measurements, practically, one is limited only by the number of questions and knowledge one has. This number does not include the many intergene molecular and interdrug activity comparisons one might do.
Figures 3A, 4A, 5A–E, and Supplementary Fig. S1 provide pharmacogenomic and proteomic explorations for SN-38, as prior work has causally related SLFN11 expression to the activity of TOP1 inhibitors (6, 38–40). The additional transcript examples in Fig. 4B, and DNA copy-number examples in Fig. 4C and D link various NCATS drugs to their molecular targets. The ability to perform gene knockdown (CRISPR) comparisons reflects how a gene knockdown measured in Project Achilles relates to response to drugs measured in NCATS. None of the 33 drug-target examples listed are FDA-approved biomarkers for their respective drugs; so each of them provides possible incentive for their development and use. One might easily expand this type of analysis to nontarget, but biologically relevant genes based on domain knowledge.
When using “univariate analyses,” we find the transcript data are stronger predictors of pharmacologic response than the other genomic data (gene mutations, copy-number variation, or methylation) available in the cancer cell lines (Supplementary Table S4). Currently DNA mutation is a predominant biomarker used for drug prediction. Although we see the expected predictive value of BRAF mutations with the activity of vemurafenib and dabrafenib (Supplementary Fig. S2), mutations only predict the activity of a relatively small subset of drugs routinely used in oncology. In addition to having reliable gene coverage and being implemented clinically, RNA-seq data are advantageous for the construction of multigene signatures. The superiority of transcript data observed in cell lines for the prediction of pharmacologic response is likely to translate clinically over time, leading to its gaining dominance for that purpose.
Because pharmacologic response is a product of multiple molecular factors, drug activity prediction or exploration is expected to be improved and tested using the “multivariate analyses” tools of CellMinerCDB: NCATS. Figure 5 provides examples of how building multigene analyses can be explored. This approach requires an understanding of the pathways and targets that determine drug response. Taking the example of SN-38 (the active metabolite of irinotecan) and topotecan (36), Fig. 5 shows how “multivariate analyses” can be generated. CellMinerCDB also provides preexisting gene signatures. Figure 6 uses a precomputed multigene signature, the 18-transcript RepStress signature (29). An increased level of this stress parameter is significantly correlated with topotecan and SN-38 response, providing proof-of-principle and a testable preclinical model for RepStress as predictive for patient response to TOP1 inhibitors. Having precomputed signatures avoids looking up the reference, finding the genes involved, and determining and then applying the algorithm for the cell line set of interest.
Downloading the data of CellMinerCDB: NCATS reveals drug activity distribution enrichments for some tissues of origin within the cancer cell line panels. All the cancer types enriched indicate prospective novel applications for those drugs, presumably with responsive subsets. Nononcology drugs might also be studied. An example from Fig. 7E is disulfiram, a drug used to discourage alcohol intake. Response to this drug is bimodal across the NCATS cancer cell lines, with improved activity in bone (sarcoma) cell lines. This result expands our prior work on the discovery of acetalax, another noncancer drug, with activity in triple-negative breast cancer cell lines (3).
In summary, the wealth of information in the CellMinerCDB: NCATS web application, albeit with its own limitations, allows basic and clinical researchers to explore pharmacogenomic relationships in either univariate or multivariate fashion. One may consider drug response in the context of multiple forms or combinations of outputs that easily run into the billions. The web application facilitates the user's ability to explore those relationships and identify potential pharmacogenomic parameters applicable to clinical studies.
Limitations of the data come in multiple forms requiring multiple solutions. Missing data might be addressed by simply carrying out the salient form of analysis to fill those gaps.
More complete analysis of variability between platforms might be done by adding overlapping cell lines, drugs, or assays of interest. Algorithmic approaches that better consider the limitations and proper interpretation of datasets can improve results at that level, including the expansion of multivariate analysis functionality and approach selection. Recognition of signatures predictive of pharmacologic response should yield improved success in that area. It should be noted that the relationships found do not constitute proof of causality. The continued exploration and definition of how best to integrate cancer cell line omics data with that from patients and to integrate clinical data into the omics format remain fields in their infancy.
Authors' Disclosures
K.R. Bradwell reports other support from Palantir Technologies during the conduct of the study and other support from Palantir Technologies outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
W.C. Reinhold: Conceptualization, formal analysis, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing. K. Wilson: Resources, data curation, software, formal analysis, investigation, visualization, project administration, writing–review and editing. F. Elloumi: Resources, data curation, software, formal analysis, visualization, writing–review and editing. K.R. Bradwell: Resources, data curation, software, writing–review and editing. M. Ceribelli: Resources, data curation, and software. S. Varma: Resources, data curation, formal analysis, and investigation. Y. Wang: Resources, data curation, and software. D. Duveau: Resources, data curation, and software. N. Menon: Resources and data curation. J. Trepel: Conceptualization, resources, data curation, software, investigation, writing–review and editing. X. Zhang: Conceptualization, resources, data curation, software, investigation, writing–review and editing. C. Klumpp-Thomas: Resources. S. Michael: Resources. P. Shinn: Resources, data curation, and software. A. Luna: Data curation, software, formal analysis, writing–review and editing. C. Thomas: Conceptualization, resources, data curation, software, writing–review and editing. Y. Pommier: Conceptualization, resources, supervision, investigation, visualization, writing–review and editing.
Acknowledgments
Our studies are supported by the Center for Cancer Research, the Intramural Program of the NCI, NIH, Bethesda, MD (Z01 BC 006150).
The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).