Abstract
Identifying robust biomarkers of drug response constitutes a key challenge in precision medicine. Patient-derived tumor xenografts (PDX) have emerged as reliable preclinical models that more accurately recapitulate tumor response to chemo- and targeted therapies. However, the lack of computational tools makes it difficult to analyze high-throughput molecular and pharmacologic profiles of PDX. We have developed Xenograft Visualization & Analysis (Xeva), an open-source software package for in vivo pharmacogenomic datasets that allows for quantification of variability in gene expression and pathway activity across PDX passages. We found that only a few genes and pathways exhibited passage-specific alterations and were therefore not suitable for biomarker discovery. Using the largest PDX pharmacogenomic dataset to date, we identified 87 pathways that are significantly associated with response to 51 drugs (FDR < 0.05). We found novel biomarkers based on gene expressions, copy number aberrations, and mutations predictive of drug response (concordance index > 0.60; FDR < 0.05). Our study demonstrates that Xeva provides a flexible platform for integrative analysis of preclinical in vivo pharmacogenomics data to identify biomarkers predictive of drug response, representing a major step forward in precision oncology.
A computational platform for PDX data analysis reveals consistent gene and pathway activity across passages and confirms drug response prediction biomarkers in PDX.
See related commentary by Meehan, p. 4324
Introduction
Preclinical models are vital for investigating disease biology and therapeutics, constituting essential tools for translational research and drug development. In cancer research, immortalized cell lines are the most used preclinical models because of their low cost, flexibility, and the existence of assays enabling genetic and chemical screen in a high-throughput manner. However, the cell line models suffer from multiple limitations. Although they are derived from patient tumors, they have evolved to survive in artificial culture conditions resulting in major alterations at the genomic level (1–6). These in vitro models also lack the tumor heterogeneity and three-dimensional structure of the origin patient tumor (4, 7–9).
To create cancer models that better recapitulate the tumor molecular features and drug response, the pharmaceutical industry and academia massively invested in the development of patient-derived xenografts (PDX; ref. 10), which enable engraftment of human tumors in animal models (11–17). PDXs are created by subcutaneous or orthotopic engraftment of the cancerous tissues or cells from patients' tumors into immunodeficient mice. Once established, these tumors can be passed from mouse to mouse, leading to consecutive “passages” of the initial tumor cells.
PDX models are being generated and distributed by several academic groups, research institutes, and commercial organizations. This makes it challenging to find PDX models with specific characteristics such as a model with a specific mutation. Therefore, catalogs of PDX models [e.g., PDXFinder (18), EurOPDX (19), and PRoXe (20)] are being developed, which contain relevant information and provide links to model acquisition. Furthermore, as PDXs are becoming the gold-standard model for preclinical studies, better data standardization and analysis platforms are required to ensure consistency and reproducibility in PDX-based analysis. Recently, a robust standard called PDX models minimal information (PDX-MI) has been proposed (21) for reporting and quality assurance for PDX models. However, management, analysis, and visualization of the PDX-based drug screening and genomic data still constitute major challenges.
Here we present Xenograft Visualization and Analysis (Xeva), a computational package enabling storage, access, and analysis of in vivo pharmacogenomics data. The Xeva toolbox facilitates biomarker discovery in PDX-based pharmacogenomic data. It implements class structure to manage and connect PDX-based drug screening data to the genomic features of the corresponding tumor. It provides functions for PDX data analysis, including multiple metrics to summarize drug response for tumor growth curves. Furthermore, Xeva provides functions to compute the association between genomic features and response to a drug in PDXs (gene–drug association). Different response metrics for the PDX growth curve can be used as outcome to identify novel gene–drug associations that can act as drug companion tests. Using the Xeva platform, we analyzed gene expression of PDXs across passages and demonstrated that activity patterns of the majority of the genes and pathways are stable across different PDX passages. We identified multiple pathways significantly associated with anticancer drug response, including known and new biomarkers based on gene expression, copy number aberrations (CNA), and mutations. Our results support the value of large-scale PDX-based drug screening for biomarker discovery using the integrative pharmacogenomic analysis pipelines implemented in the Xeva computational platform.
Materials and Methods
Processing of pharmacogenomic data
Gene expression data, PDX passages, and tissue information were obtained from the Gene Expression Omnibus and publications (Fig. 1). Molecular profiles including mutation, CNA, gene expression, and pharmacologic profiles were obtained from the publication Gao and colleagues (17) and Gene Expression Omnibus series GSE78806. Gene expression profiles were normalized with the RMA algorithm in the Affy package (version 1.58.0) in Bioconductor (22). PDX passages and tissue information were curated manually. Data were processed using R statistical software (http://www.r-project.org/). For mutation, CNA and RNAseq-based gene expression data, processed values were directly download from Gao and colleagues' publication (17). The final dataset contains 3,470 unique PDX models tested across 57 treatments and derived from 191 patients spanning across 5 different cancer types.
Xeva facilitates analysis of the PDX-based pharmacogenomic data. A, Major publicly available PDX-based pharmacogenomic datasets. For a detailed list and references, see Supplementary Table S1. B, Distribution of genomic, drug screening, passage-related data, and cancer types in PDXE dataset. For 350 patients belonging to 6 different cancer types, availability of genomic data (CNA, mutation, and RNAseq) is shown in inner tracks. Number of drugs also includes untreated/control PDXs. Availability of passage-specific gene expression data is shown in outer track. C, Computation and visualization of response for PDX-based drug screening in PDXE breast cancer data using Xeva mRECIST function. For patient IDs highlighted in gray color, the control and treatment (paclitaxel) growth curve of PDXs is shown in D. The patient IDs are X-2344, X-1004, X-3078, and X-5975, respectively. Visualization was done using plotPDX function in Xeva.
Xeva facilitates analysis of the PDX-based pharmacogenomic data. A, Major publicly available PDX-based pharmacogenomic datasets. For a detailed list and references, see Supplementary Table S1. B, Distribution of genomic, drug screening, passage-related data, and cancer types in PDXE dataset. For 350 patients belonging to 6 different cancer types, availability of genomic data (CNA, mutation, and RNAseq) is shown in inner tracks. Number of drugs also includes untreated/control PDXs. Availability of passage-specific gene expression data is shown in outer track. C, Computation and visualization of response for PDX-based drug screening in PDXE breast cancer data using Xeva mRECIST function. For patient IDs highlighted in gray color, the control and treatment (paclitaxel) growth curve of PDXs is shown in D. The patient IDs are X-2344, X-1004, X-3078, and X-5975, respectively. Visualization was done using plotPDX function in Xeva.
Implementation
PDX pharmacogenomic experiments aim to investigate how tumor volume changes in in vivo models with respect to time, with and without drug treatment. The corresponding metadata, such as drug dose, number of days from tumor implantation to treatment start, or reason for stopping the experiment (e.g., whether mice died because of complications or were sacrificed due to maximal allowed tumor volume reached), are crucial factors for downstream analysis. In the Xeva platform, we have developed the XevaSet class to effectively store pharmacologic response (time vs. tumor volume) of PDXs along with metadata related to the experiments and molecular data (Supplementary Fig. S1). Furthermore, to store individual PDX (mouse) model data, we have implemented the pdxModel class, which provides slots for PDX-MI variables, along with time versus tumor volume data. Detailed schematics of the XevaSet and pdxModel classes are shown in Supplementary Figs. S2 and S3, respectively. XevaSet object can contain multi-omics data, which are linked to individual xenograft models and their pharmacologic profiles. Data can be subsetted by metadata and experimental factors, such as tumor types or drug names.
PDX experiment design
In vivo evaluation of drug sensitivity in cancer xenografts has traditionally used experimental approaches incorporating multiple animals (5–10) replicates for each control and treatment arm. This experimental design allows the assessment of the variability in the PDX drug response across individual animals (23). However, scaling this strategy for high-throughput drug screening is costly and requires the use of a large number of experimental animals. To increase the number of compounds being tested, reduce cost, and permit the use of fewer animals to provide essential data (24), a simpler “1 × 1 × 1″ experimental design for PDX clinical trial (PCT) has been proposed (17, 25). In this design, each compound is tested in only 1 PDX model from each patient (Supplementary Fig. S4). Although the 1 × 1 × 1 experiment design allows high-throughput screening at reduced cost, it is more prone to experimental errors due to lack of replications. Factors such as difference in the handling of mice or measurement errors in the tumor volume and biological variability in tumor growth rates between animals could significantly impact the results. We foresee that both experimental design strategies will coexist, where the 1 × 1 × 1 experiment design will be used for population-level and high-throughput screening, whereas the replicate-based experiment design will be used for more focused studies where discrimination of small differences in drug response is a desired outcome. We therefore implemented functions to accommodate both experimental designs for data visualization and statistical analyses.
PDX response metrics
Given the lack of consensus regarding the best summary metrics to estimate in vivo drug response (17, 26–29), we implemented the state-of-the-art response metrics used for PDX-based drug response experiments. These include the slope of curves, angle between the mean control and treatment curves, tumor growth inhibition (TGI), area between the curves, linear mixed model (28), best average response (BAR), best response (BR), and mRECIST (17).
For each PDX model, least squares fits were obtained by regressing tumor volume at each time point as:
where V denotes tumor volume, T denotes time. The intercept and slope are denoted by α and β, respectively. Subsequently, the angle was computed using inverse tangent of regression line slope as:
The TGI is defined as:
where VC and VT are the median of control and treated growth curve, respectively, at the end of the study. VC0 and VT0 indicate the initial tumor volume for control and treated growth curve, respectively.
The BAR metric for each PDX model is defined as follows:
At each time point t, the normalized change in tumor volume is computed as
Next, for each time t, the running average of |\Delta {V_t}$| from t = 0 to t was calculated. BAR is defined as the minimum of this running average (at |t \ge 10\ \,{\rm{days}}$|). The minimum value of |\Delta V$|(at |t \ge 10\ \,{\rm{days}}$|) defines as BR. Using these, the mRESIST metric for PDX is computed as:
complete response (CR): BR < −95% and BAR < −40%
partial response (PR): BR < −50% and BAR < −20%
stable disease (SD): BR < 35% and BAR < 30%
progressive disease (PD): not otherwise categorized
Visualization of in vivo drug-screening data
Several Xeva functions enable multifaceted visualization and exploration of the PDX data. The plotPDX function displays tumor growth curves, plotting time versus tumor volume data for a patient–drug pair or matched control and treatment PDX models (called batch). The waterfall function visualizes population-level response for a given set of PDXs. Similarly, plotmRECIST displays mRECIST-based drug response as a heatmap, with drugs along heatmap rows and PDXs along columns. Heatmap cells are colored according to the mRECIST value for the corresponding PDX–drug pair.
Gene expression consistency analysis
We calculated the Pearson correlation coefficient between pairs of samples belonging to the same lineage using all available gene expression. For comparison, we also computed the Pearson correlation coefficient between all possible pairs of samples that do not belong to the same lineage. Similar analysis was performed using the L1000 landmark gene set (30). For genes, we computed the intraclass correlation coefficient (ICC) using the psych package (version 1.8.4). For a particular gene, in each sample we computed its rank by sorting the expression values. The rank of genes along with passage information is used for ICC calculation. ICC values of the genes were used to perform gene-set enrichment analysis on the MSigDB hallmark gene sets and Reactome pathway database (31), which contains gene sets associated with specific pathways.
Pathways and gene–drug association analysis
We computed the association between a molecular feature and response to a drug across PDXs (commonly referred to as gene–drug association or drug response predicting biomarker). The gene–drug association was assessed separately for all three available molecular features, that is, gene expression, CNA, and gene mutation. The response of PDXs to a drug treatment is defined using BAR.
The association between genomic feature and PDX response was computed using nonparametric measure of association, concordance index (CI). The CI represents the probability that two variables will rank a random pair of samples the same order. For each drug, we compiled a list of potential biomarkers using OncoKB (32), DrugBank, and literature curation. We adjusted the P value using FDR for drug combinations and drugs with multiple potential biomarkers. Univariate gene–drug associations were calculated for three different molecular profiling modalities that are gene expression, copy number variation, and mutation. For drug–pathway association analysis, the Reactome pathway database (31) was used. We created a subset of the pathway database by selecting only the gene sets containing at least 1 potential drug target and containing less than 300 genes. We also grouped the drugs in different classes according to their genomic target. In total, we selected 94 pathways related to 57 drugs and drugs were classified into 11 classes. Pathway analysis was performed using the R Piano package (version 1.20.1; ref. 33). We used the R package ComplexHeatmap (version 1.20.0) to visualize the gene–drug associations and R package circlize (version 0.4.6) to visualize the drug–pathway association (34).
Research reproducibility
The open-source Xeva package is available from GitHub (https://github.com/bhklab/Xeva) at under GPLv3 license and Bioconductor repository (http://bioconductor.org/packages/Xeva/). We also provide a complete software environment through Code Ocean containing all necessary data and code to reproduce the analysis and figures described in this manuscript under the doi https://doi.org/10.24433/CO.bfb2dd77-53a3-4083-bf3a-16504fc8c786.
Results
Xeva follows PDX-MI standards to store pharmacogenomic data
We designed XevaSet, a new object class enabling integration of molecular and pharmacologic profiles of PDXs (Supplementary Figs. S1–S3) following the recent minimal information for patient-derived tumor xenograft models (PDX-MI) standard for reporting on the generation, quality assurance, and use of PDX models (21). The PDX-MI standard ensures that all necessary clinical attributes of the tumor along with PDX-related essential experimental information, such as host mouse strain and passage information, is reported. Given that this information is crucial for downstream analysis and research reproducibility, we have implemented the pdxModel class (Supplementary Figs. S2–S4) that provides slots for PDX-MI variables. In this study, we curated the recent Novartis PDX Encyclopedia (PDXE; ref. 17) and created the PDXE XevaSet object (Fig. 1A and B; Supplementary Table S1) to investigate the consistency of gene expression patterns across passages and mine the pharmacogenomic data for known and new biomarkers predictive of drug response in vivo.
Xeva provides useful functions for analysis and visualization of PDX-based pharmacogenomic data. For every PDX model in the PDXE breast cancer dataset, mRECIST-based response metrics was computed using the Xeva function response (Fig. 1C). Heatmaps representing the mRECIST data cutaneous melanoma (CM), colorectal cancer, gastric cancer, non–small cell lung carcinoma (NSCLC) and pancreatic ductal adenocarcinoma can be found in Supplementary Figs. S5 to S9, respectively. Visualization of PDX growth curves is an essential part of data quality control and analysis. Tumor growth curves for individual PDX models and for matched control-treatment models can be plotted using the plotPDX function (Supplementary Table S2). Examples of PDX tumor growth curves in control (untreated) and treatment (paclitaxel) conditions are shown in Fig. 1D. All 4,483 tumor growth curves with matched control can be found in Supplementary Data File S1. PDX-related genomic data and linked drug screening response data can be extracted using the function summarizeMolecularProfiles (Supplementary Table S2). Similarly, the drugSensitivitySig function (Supplementary Table S2) allows users to quantify the strength of each gene–drug association using a linear regression model.
Gene expression is consistent across passages
PDX models are known to better represent the molecular characters of human tumor compared with simpler in vitro models (13, 35, 36). PDXs show high similarity to patient samples for mutation rate (17, 37) and CNA (13) and methylation (38) patterns, histological and molecular subtypes (13, 36, 39). Before conducting drug screening, PDXs are passaged multiple times with the assumption that genomic characteristics of PDXs are stable across passages. Several studies have shown that mutation and copy number patterns are largely stable across passages (14, 36, 40–44). However, a meta-analysis using inferred copy number data asserted that the genomic landscape of PDXs changes rapidly during passaging (45). Given these contradictory results, it is vital to perform systematic analysis of noninferred genomic data (i.e., gene expression profile) across passage (45).
To address these concerns, we sought to systematically analyze the changes in gene expression pattern across passages. We curated gene expression data for 661 PDX samples [all samples are overlapping with Ben-David and colleagues (45) study], derived from 371 patients and spanning from passage 0 (P0) to passage 5 (P5; Fig. 2A; Supplementary Table S3). To visualize the consistency of gene expression patterns across passages, we performed a t-distributed stochastic neighbor embedding (t-SNE) analysis and projected the high-dimensional PDX gene expression data into a two-dimensional plot (Fig. 2B). PDXs derived from a patient but belonging to different passages (defined as belonging to same lineage) were linked together in the t-SNE visualization. We observed that in the visualization PDXs from the same lineage are projected nearby even though coming from different passages. Next, we computed the Pearson correlation coefficient for all pairs of PDX samples belonging to same lineage using all available genes (Fig. 2C) and L1000 gene set (Supplementary Fig. S10). We found that the median Pearson correlation for related pairs is high (0.98) in comparison with nonrelated tissue-specific PDX pairs (0.87) and the difference was statistically significant (Wilcoxon rank sum test, P < 0.0001). Collectively, these results strongly support that the gene expression profile of PDXs is consistent across different passages. For specific genes, however, the expression behavior can vary across PDX passages, which may affect drug response depending on which passage is used for drug testing. We therefore assessed stability of each gene across passages by computing the intraclass correlation coefficient (ICC) for genes in samples from same PDX lineage for all tissue types and stratified by tissue type (Fig. 3A). We found that the ICC values of genes in pairs of related PDXs are higher when compared with nonrelated pairs (Wilcoxon rank sum test, P <1E−15). Furthermore, the ICC values of genes are skewed towards high values, indicating that the majority of genes have a stable expression pattern across PDX passages. Our results show that expression patterns of known biomarker genes such as EGFR, ERBB2, and MAP2K1 are stable across passage (Table 1; Supplementary Data File S2). Consistent with Rubio-Viqueira and colleagues (46), we found that expression of genes such as VEGFA, MDM2, and CDK4 is stable in pancreatic cancer PDXs.
Gene expression landscape of PDXs is consistent across passages. A, Distribution of samples in different passages. B, t-SNE analysis of gene expression data (for 17,304 genes) from different passages of the PDXs. Samples belonging to same lineage but belonging to different passages are linked together by line (light gray color). Overlapping samples or samples very near to each other are connected using curved lines (light gray color). C, Pearson correlation for related sample pairs (belonging to same lineage) and randomly selected samples pairs. The correlation coefficient of related pairs is significantly higher than randomly selected pairs (P < 0.001). All available (17,304) gene expressions were used for the computation of the correlation coefficient.
Gene expression landscape of PDXs is consistent across passages. A, Distribution of samples in different passages. B, t-SNE analysis of gene expression data (for 17,304 genes) from different passages of the PDXs. Samples belonging to same lineage but belonging to different passages are linked together by line (light gray color). Overlapping samples or samples very near to each other are connected using curved lines (light gray color). C, Pearson correlation for related sample pairs (belonging to same lineage) and randomly selected samples pairs. The correlation coefficient of related pairs is significantly higher than randomly selected pairs (P < 0.001). All available (17,304) gene expressions were used for the computation of the correlation coefficient.
PDX maintains expression pattern of the genes across passages. A, Violin plot shows ICC for genes across PDX passages for all samples and is stratified by tissue type. Black violin plot represents ICC values for genes calculated using nonpassage-related (randomly selected) samples. B, Pathways are stable across passages in PDXs. Barplot shows top 10 pathways with negative enrichment score in gene-set enrichment analysis. Only one pathway has a statistically significant (FDR < 0.05) negative enrichment score across PDX passages.
PDX maintains expression pattern of the genes across passages. A, Violin plot shows ICC for genes across PDX passages for all samples and is stratified by tissue type. Black violin plot represents ICC values for genes calculated using nonpassage-related (randomly selected) samples. B, Pathways are stable across passages in PDXs. Barplot shows top 10 pathways with negative enrichment score in gene-set enrichment analysis. Only one pathway has a statistically significant (FDR < 0.05) negative enrichment score across PDX passages.
To identify pathways that are enriched with unstable genes, we performed a gene set enrichment analysis with the genome-wide ranking of genes based on their stability across passages (ICC values; Fig. 3B; Supplementary Data Files S3 and S4). Pathways enriched with unstable genes will be sensitive to passage and targeting these pathways might result in inconsistent drug response across passages. Our analysis shows that the pathways in the Hallmark gene set (47) have positive enrichment scores, indicating high stability of these pathways across passages (Supplementary Data File S3). For the Reactome gene set (31), we found that posttranscriptional mRNA processing pathways such as mRNA 3′ end processing and mRNA splicing are enriched with unstable genes (Fig. 3B; Supplementary Data File S4). It is well established that during proliferation and differentiation, cells adjust the mRNA and protein level by controlling the posttranscriptional mRNA processing pathways (48–50). Therefore, the instability of these posttranscriptional mRNA-processing pathways might be attributed to the tumor growth in the PDX. Targeting these pathways using a drug in PDXs might result in inconsistent response at different passages.
In vivo biomarker discovery
One of the main goals of pharmacogenomic studies is to find genomic biomarkers for drug response prediction. We evaluated the association between a molecular feature and response to a given drug (gene–drug association) in PDXE (17) data. The PDXE data consists of a 1 × 1 × 1 experimental design where 60 compounds were tested across 277 PDXs. The Xeva function plotPDX provides an interface for the visualization of time versus tumor volume data of PDXs. To model the tumor growth curve and to quantify the response of PDXs, we implemented several drug response metric including slope, area between curves, linear mixed effects model (28), BAR, and mRECIST (17).
We used Xeva to identify biomarkers for drug response prediction by computing gene–drug associations for drugs in the PDXE data and their corresponding known biomarker genes defined using OncoKB resource (32). For the analysis, a PDX's response to a drug is defined using BAR and association was computed using concordance index. Analysis was done for each tissue type with gene expression, CNA, and mutation data. In vitro drug testing has shown that the drug encorafenib can produce synergistic effects with binimetinib in cutaneous melanoma (17, 51). This drug combination also shows synergistic effects in PDXs as 50% of tested PDXs show tumor shrinkage, whereas binimetinib and encorafenib monotherapy show tumor shrinkage in 40% and 25% of PDXs, respectively (Fig. 4A and B). For the drug combination of encorafenib and binimetinib, we found that NRAS mutation status is significantly associated with response (Wilcoxon tests, FDR = 0.01; Fig. 4C). Among PDXs with NRAS wild-type status, 63.6% show a negative BAR to the binimetinib and encorafenib drug combination, whereas in the mutated category only 20% show a response.
Xeva facilitates biomarker discovery and visualization. Waterfall plots show response of CM PDXs for drugs binimetinib (A), encorafenib (B), and binimetinib + encorafenib (C). Each bar represents one PDX derived from a patient and color represents NRAS mutation status (red mutated and blue wild-type). Response of the PDX is defined as BAR. Association between mutation and drug response was calculated using Xeva function drugSensitivitySig and FDR correction was applied to P values. For visualization, waterfall function in Xeva is used.
Xeva facilitates biomarker discovery and visualization. Waterfall plots show response of CM PDXs for drugs binimetinib (A), encorafenib (B), and binimetinib + encorafenib (C). Each bar represents one PDX derived from a patient and color represents NRAS mutation status (red mutated and blue wild-type). Response of the PDX is defined as BAR. Association between mutation and drug response was calculated using Xeva function drugSensitivitySig and FDR correction was applied to P values. For visualization, waterfall function in Xeva is used.
The drug trastuzumab (Herceptin) is an monoclonal antibody that targets the extracellular domain of the HER2 protein and inhibits the proliferation of tumor cells. In the breast cancer PDXE data, we found that expression of the HER2 encoding gene (ERBB2) is significantly associated with trastuzumab response (CI = 0.682; FDR = 0.02; Fig. 5). ERBB3, another member of the EFGR (EGFR/ERBB) family, was also found to be associated with trastuzumab response (CI = 0.68; FDR = 0.026). ERBB3 is known to be implicated in growth, proliferation, metastasis, and drug resistance in tumors through interacting with ERBB2 (52, 53).
PDXs faithfully recapitulate known gene–drug associations. For the breast cancer PDXE dataset, the left side of the figure shows mutation (top), CNA (middle), and expression (bottom) pattern of known biomarker genes. Here, each column represents genomic features from an individual PDX model. The right panel shows association (concordance index) between genomic features and response of the drug. Columns represent drugs for which names are annotated on the top. In this panel, the association between gene feature and drug is represented using circles. For drugs, the corresponding biomarker is highlighted using dark gray color if the association is statistically significant (FDR < 0.05). Association between genomic feature and drug response was computed using the Xeva function drugSensitivitySig and visualization was done using ComplexHeatmap.
PDXs faithfully recapitulate known gene–drug associations. For the breast cancer PDXE dataset, the left side of the figure shows mutation (top), CNA (middle), and expression (bottom) pattern of known biomarker genes. Here, each column represents genomic features from an individual PDX model. The right panel shows association (concordance index) between genomic features and response of the drug. Columns represent drugs for which names are annotated on the top. In this panel, the association between gene feature and drug is represented using circles. For drugs, the corresponding biomarker is highlighted using dark gray color if the association is statistically significant (FDR < 0.05). Association between genomic feature and drug response was computed using the Xeva function drugSensitivitySig and visualization was done using ComplexHeatmap.
The drug binimetinib is a targeted and potent MAPK kinase (MAP2K or MEK) inhibitor (54). In breast cancer PDXs, expression of MAP2K2 (CI = 0.67; FDR = 0.02; Fig. 5) was significantly associated with binimetinib response. This gene belongs to the MAP2K kinase family and produces a protein that activates the MAPK/ERK pathway. Associations between potential biomarkers and PDX drug response for all tissue types can be found in Supplementary Data File S5.
Pathways targeted by drugs are significantly enriched in PDXs. In the circos plot, drug classes (left) and targeted pathways (right) are linked when the association is significant (gene set enrichment analysis, FDR < 0.05). Activity of EGFR class drugs has significant association with EGFR signaling pathway. Similarly, MAPK class drugs show significant association with MAPK activation-related pathways.
Pathways targeted by drugs are significantly enriched in PDXs. In the circos plot, drug classes (left) and targeted pathways (right) are linked when the association is significant (gene set enrichment analysis, FDR < 0.05). Activity of EGFR class drugs has significant association with EGFR signaling pathway. Similarly, MAPK class drugs show significant association with MAPK activation-related pathways.
To gain understanding of the association between drug response and target pathway, we performed gene set enrichment analysis (Fig. 6). Drugs were classified into 11 classes according to their known targets and the Reactome pathway database was used for gene set enrichment analysis. We found that EGFR signaling in cancer pathway is significantly enriched (FDR < 0.05) in the EGFR class of drugs. Similarly, for MAPK class drugs, relevant pathways such as MAPK activation in Toll-like receptor (TLR) cascade and NF-κB and MAPK activation by TLR are significantly enriched (FDR < 0.05). Associations between each drug and reactome pathways can be found in Supplementary Data File S6.
Discussion
PDXs are valuable models for cancer modeling and pharmacogenomic analysis. However, the translational potential of PDX preclinical models is highly dependent on the tools and techniques to processing and analyzing the data. Computational tools that enable standardized processing are thus an integral part of this line of research, and harmonized approaches can provide the community with accessible means for the analysis. The Xeva platform allows researchers to visualize and analyze the complex pharmacogenomic data generated during in vivo drug-screening studies. The key strengths of the Xeva platform is its ability to store all metadata from a PDX experiment, link genomic data to corresponding PDX models, and provide user friendly functions for analysis.
In a recent study, Ben-David and colleagues (45) analyzed changes in CNAs during PDX passaging using experimental and computational inferred copy number alterations (CNA) from gene expression profiles. They concluded that the CNA landscape of PDXs changes rapidly with passage as within four passages 12.3% (median) of the genome was affected by model-acquired CNAs. Although such an analysis is an important component of credentialing PDX as a preclinical platform, their analysis depended largely on inference rather than direct measurement of CNAs: for 84% (933 of 1,110) of PDX samples, the CNAs were inferred from microarray-based gene expression data and lack matched normal tissue samples. As the authors stated, virtual karyotyping does not fully recapitulate the CNA observed from SNP microarrays or whole-exome sequencing. For the samples where DNA-based CNA profiles are available, the authors have reported a concordance of 0.82 between experimental (DNA-based) and expression-based inferred CNA profile. However, directly assessing the gene expression pattern can provide better insight about changes in genomic landscape of PDXs across passages than the CNA profile. In this study, using Xeva, we have assessed the gene expression landscape of PDXs. Our results indicate that the gene expression landscape of PDXs is similar across different passages as the correlation between the related PDXs is very high. At the level of individual genes, we observe high consistency in expression patterns for the majority of genes. However, caution is required when analyzing genes with low stability in PDX models. Lack of stability in gene expression may lead to inconsistent results when targeting proteins or pathways related to genes with low expression stability. We have provided a list of genes and their stability score that will help researchers to assess the consistency in expression patterns of genes of interest and thereby deciding if PDXs are suitable models for their (targeted) drug of interest or to study behavior of a particular gene.
In our study, we found that the multiple known biomarkers predictive of drug response can be identified in the large Novartis PDXE dataset. Notably, the dataset used in this study has the 1 × 1 × 1 PDX experiment design (Supplementary Fig. S4). Thus, our results demonstrate that the 1 × 1 × 1 PDX experiment design is an adequate way to discover drug response predicting biomarkers at the population level. Although the 1 × 1 × 1 experiment design is a cost- and animal-resource efficient way to find biomarkers, using Xeva, we have found cases where control (untreated) mice show a partial response (PR) or tumor shrinkage. Possible causes of such early tumor shrinkage might include experimental error, handling glitches, or genetic properties of the tumor. In a 1 × 1 × 1 experiment design, lack of replicates makes it impossible to decipher the exact cause. Visualization and analysis of PDX response data using Xeva provides an efficient way to recognize such cases.
For biomarker discovery analysis, treatment response in PDXs is defined by the BAR (17). This metric of treatment response provides a continuous value; however, it does not take into account the control arm of the PDX experiment. As PDX-based pharmacogenomics gains popularity, standardized metrics to define response are required, thereby taking into account the control arm of the PDX experiments. Such methods will also improve the biomarker discovery process. Xeva provides a standard tool for comparison of different PDX response metrics.
The Xeva package enables easy and efficient analysis of the PDX-based pharmacogenomic data. Xeva includes functions to link molecular features to drug response, therefore providing a unified framework for analysis and development of biomarkers of drug response.
We developed the Xeva package to facilitate visualization, analysis, and biomarker discovery in PDX pharmacogenomic data. We showed that PDX gene expression is consistent across passages. Our platform allowed us to confirm the existence of several drug response prediction biomarkers in a large PDX pharmacogenomic dataset. The reproducibility of known biomarkers and consistency in gene expression shows that PDX experiments are suitable for in vivo biomarker discovery or validation. Xeva is an open-source, flexible, and timely tool in an era of increasing efforts to use PDXs as the main model system for cancer research. We envision Xeva will play a crucial role in PDX-based pharmacogenomic analysis, biomarker discovery, and validation.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: A.S. Mer, M.-S. Tsao, D.W. Cescon, B. Haibe-Kains
Development of methodology: A.S. Mer, D.W. Cescon, B. Haibe-Kains
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): A.S. Mer, M.-S. Tsao
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): A.S. Mer, B. Brew, J. Ortmann, D.W. Cescon, A. Goldenberg, B. Haibe-Kains
Writing, review, and/or revision of the manuscript: A.S. Mer, W. Ba-Alawi, Y.X. Wang, M.-S. Tsao, D.W. Cescon, B. Haibe-Kains
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): A.S. Mer, P. Smirnov, M.-S. Tsao, B. Haibe-Kains
Study supervision: D.W. Cescon, A. Goldenberg, B. Haibe-Kains
Other (partial funding of study): M.-S. Tsao
Acknowledgments
A.S. Mer was supported by the SU2C Canada—Canadian Cancer Society Breast Cancer Dream Team Research Funding (SU2C-AACR-DT-18-15), with supplemental support from the Ontario Institute for Cancer Research, through funding provided by the Government of Ontario. Stand Up To Cancer Canada is a Canadian Registered Charity (Reg. #80550 6730 RR0001). Research Funding is administered by the American Association for Cancer Research International—Canada, the Scientific Partner of SU2C Canada. B. Haibe-Kains was supported by the Gattuso Slaight Personalized Cancer Medicine Fund at Princess Margaret Cancer Centre, the Canadian Institute of Health Research, Natural Sciences and Engineering Research Council, and The Terry Fox Research Institute fund. A.S. Mer would like to extend a sincere thank you to Dr. Carmen Ludwig for her help with editing the manuscript. We would like to thank the investigators of the Novartis Patient-Derived Xenograft Encyclopedia who have made their invaluable data available to the scientific community.










