Abstract
Cancer precision medicine aims to predict the drug likely to yield the best response for a patient. Genomic sequencing of tumors is currently being used to better inform treatment options; however, this approach has had a limited clinical impact due to the paucity of actionable mutations. An alternative to mutation status is the use of gene expression signatures to predict response. Using data from two large-scale studies, The Genomics of Drug Sensitivity of Cancer (GDSC) and The Cancer Therapeutics Response Portal (CTRP), we investigated the relationship between the sensitivity of hundreds of cell lines to hundreds of drugs, and the relative expression levels of the targets these drugs are directed against. For approximately one third of the drugs considered (73/222 in GDSC and 131/360 in CTRP), sensitivity was significantly correlated with the expression of at least one of the known targets. Surprisingly, for 8% of the annotated targets, there was a significant anticorrelation between target expression and sensitivity. For several cases, this corresponded to drugs targeting multiple genes in the same family, with the expression of one target significantly correlated with sensitivity and another significantly anticorrelated, suggesting a possible role in resistance. Furthermore, we identified nontarget genes that are significantly correlated or anticorrelated with drug sensitivity, and found literature linking several to sensitization and resistance. Our analyses provide novel and important insights into both potential mechanisms of resistance and relative efficacy of drugs against the same target.
Introduction
Traditionally, treatment options for cancer were based on the organ of origin and simple histomorphologic features. Patients with tumors from the same tissue of origin were typically treated with the same therapeutic agent (most commonly chemotherapy or radiotherapy). More recently, molecular subtyping of cancers has revealed that specific subtypes originating from the same organ often respond differently. This suggests that only a subset of patients will benefit from the approved treatment. For example, expression of ER, EGFR, and HER2 is associated with different subtypes of breast cancer, each of which has a different clinical outcome and treatment regimen (1). HER2-positive breast cancers proliferate aggressively and have poor survival rates but are often treatable with the anti-HER2 mAb trastuzumab (Herceptin; ref. 2). In contrast, patients with triple-negative breast cancer (TNBC) have a poorer prognosis, do not benefit from hormonal or trastuzumab-based therapy, and are typically treated with surgery and chemotherapy (3).
Precision oncology takes this further and recognizes that each cancer is unique. It aims to use genomic information to identify drugs that will specifically target an individual patient's tumor and thus provide the greatest benefit for each patient (4, 5). For example, in lung cancer, patients with mutations in EGFR may be treated with osimertinib (6), erlotinib (7), or gefitinib (8), while those with the gene translocation resulting in the EML4-ALK oncogene (9) are treated with crizotinib and those with BRAF V600E mutations (10) are treated with dabrafenib plus trametinib. To date, the major focus of precision medicine has been the attempt to identify “actionable mutations” in a patient's genome. This has yielded some encouraging results, in particular through the use of basket trials (11–16); however, it remains to be seen how broadly this approach will translate to improvements in the clinic (14, 15).
For targeted therapy to work, it is critical that there is a strong evidence-based association between each “actionable mutation” and response to the drug and, importantly, that the drug target is expressed. Here, with the simple logic that expression of the target is required for a targeted therapy to work, we systematically examined the relationship between drug target expression and sensitivity to a panel of drugs. To investigate this relationship, we analyzed basal gene expression of drug targets and sensitivity profiles of hundreds of cell lines treated with large panels of anticancer drugs in two large-scale studies, the Genomics of Drug Sensitivity of Cancer (GDSC) and the Cancer Therapeutics Response Portal (CTRP; refs. 17–19). We show that for a substantial proportion of targeted therapeutics, target gene expression levels are indeed correlated with drug sensitivity. Unexpectedly, we also found that for some targeted therapeutics, target gene expression levels were anticorrelated with drug sensitivity, calling into question their utility against these targets.
Materials and Methods
Gene expression and drug sensitivity datasets used in the study
Data were extracted from two large-scale studies, the GDSC and the CTRP, in which many anticancer drugs were screened against hundreds of cell lines to determine their sensitivity. For the GDSC dataset, robust multiarray average (RMA)-normalized basal (untreated) expression profiles for 985 cell lines were obtained from the GDSC website (ref. 17; https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources//Pathway_Activity_Scores.html). Area under the dose–response curve (AUC) values for all screened cell line/drug combinations were obtained from The Wellcome Sanger Institute website (ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/; file v17_fitted_dose_response.xlsx). The AUC values in GDSC range from 0 to 1. We define sensitivity as 1−AUC. We investigated the correlation between drug target expression and drug sensitivity, covering 222 drugs and 182 drug targets across 985 cell lines.
For the CTRP dataset, robust multiarray average (RMA)-normalized basal (untreated) mRNA gene expression data were obtained from the Cancer Cell Line Encyclopedia (20). The AUC values were obtained from refs. 18 and 19 (https://ocg.cancer.gov/programs/ctd2/data-portal). The AUC values in CTRP range from 0 to 30. We define sensitivity as [1−(AUC/30)]. We investigated the correlation between drug target expression and drug sensitivity, covering 360 drugs and 314 drug targets across 823 cell lines.
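The two sensitivity definitions above amount to one rescaling step. The following is a minimal Python sketch of that step, not the code used in the study; the function name `auc_to_sensitivity` and the example AUC values are illustrative assumptions.

```python
def auc_to_sensitivity(auc, auc_max):
    """Convert an AUC on a 0..auc_max scale to sensitivity = 1 - AUC/auc_max."""
    if not 0 <= auc <= auc_max:
        raise ValueError(f"AUC {auc} outside expected range 0..{auc_max}")
    return 1.0 - auc / auc_max


# Upper bounds of the AUC scales as stated in the text.
GDSC_AUC_MAX = 1.0
CTRP_AUC_MAX = 30.0

# Hypothetical AUC values, for illustration only (not taken from either dataset).
gdsc_sens = auc_to_sensitivity(0.25, GDSC_AUC_MAX)
ctrp_sens = auc_to_sensitivity(7.5, CTRP_AUC_MAX)
```

Both hypothetical inputs sit a quarter of the way up their scale, so they map to the same sensitivity, which is the point of the rescaling.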
Protein-level expression dataset used in the study
Protein-level data for 284 cell lines present in GDSC and 266 cell lines from CTRP were extracted from the atlas of protein-level measurement of drug targets in cancer cell lines (ref. 21; https://tcpaportal.org/mclp/#/download). We examined the correlations between mRNA and protein levels and compared correlations with drug sensitivity for protein- and mRNA-level expression data.
Molecular targets of anticancer drugs
We used several sources to determine the molecular targets of anticancer drugs. Supplementary Table S1F of ref. 17 and Supplementary Table S1 of ref. 19 were downloaded. To map the anticancer drugs to gene targets, target information from these Supplementary tables and from DrugBank (22) was used. The drugs and their targets from both datasets are shown in Supplementary Table S3a.
Correlation of gene expression with sensitivity
For each cancer cell line, the expression of the drug target in the untreated cells and the corresponding sensitivity value of that cell line to that drug were extracted. In each study, AUCs are provided, measuring the relationship between drug concentration and cell viability. Drugs effective in killing a particular cell line at a lower concentration have low AUCs while drugs that need to be used at much higher concentrations have higher AUCs. Here we define sensitivity as 1−(0–1 scaled AUC) for GDSC and [1−(AUC/30)] for CTRP. Thus, highly sensitive cell lines have AUCs approaching 0, and sensitivity approaching 1. Spearman correlation coefficients were then calculated between the basal expression and the sensitivity value across all 985 cancer cell lines for the GDSC and 823 cancer cell lines for the CTRP for which both expression and response data were available. The expression of the drug targets was then randomized and the analysis rerun 1,000 times. For each drug–target pair, we counted the number of permutations in which the correlation coefficient from the randomized data was better than the actual correlation coefficient. Any drug–target pair for which no permuted correlation coefficient was better than the actual value was considered significant.
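The correlation and permutation procedure described above can be sketched in plain Python as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the function names, the fixed random seed, the reading of "better" as "at least as extreme in the same direction as the observed coefficient", and the example values are all assumptions. In practice a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written rank correlation.

```python
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_test(expression, sensitivity, n_perm=1000, seed=0):
    """Return (observed rho, number of permutations with a 'better' rho)."""
    rng = random.Random(seed)
    observed = spearman(expression, sensitivity)
    sens_ranks = average_ranks(sensitivity)
    shuffled = average_ranks(expression)  # shuffling ranks == shuffling values
    n_better = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        rho = pearson(shuffled, sens_ranks)
        if (observed >= 0 and rho >= observed) or (observed < 0 and rho <= observed):
            n_better += 1
    return observed, n_better


# Hypothetical expression and sensitivity values for 12 cell lines.
expr = [2.1, 3.4, 3.9, 5.0, 5.2, 6.8, 7.1, 8.3, 9.0, 9.7, 10.4, 11.2]
sens = [0.05, 0.08, 0.10, 0.15, 0.22, 0.30, 0.31, 0.45, 0.50, 0.62, 0.70, 0.81]
rho, n_better = permutation_test(expr, sens)
significant = (n_better == 0)  # no permutation beat the observed correlation
```

With 1,000 permutations, requiring that none of them beats the observed coefficient corresponds to the threshold the text describes; the same function handles anticorrelation because the comparison direction follows the sign of the observed coefficient.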
Cancer subtype analysis
Cell lines were classified into tumor subtypes based on either the GDSC Tissue Descriptor 2 (Supplementary Table S1E, GDSC Paper) or the CCLE primary histology types (CTRP dataset). Only cancer subtypes where there were at least 50 cell lines were considered. In the GDSC analysis, we considered cell lines labeled as breast, glioma, melanoma, lung small-cell carcinoma (SCLC), and lung NSCLC adenocarcinoma (corresponding to 50, 52, 52, 60, 62 cell lines, respectively). In the CTRP analysis, we considered cell lines labeled as carcinoma, lymphoid neoplasm, and malignant melanoma (corresponding to 512, 113, and 51 cell lines, respectively). As in the main analysis, correlations between expression levels of each drug target and sensitivity of the cancer cell lines in each subtype were calculated. We then considered correlations better than 1,000 random permutations (99.9% confidence interval) as significant.
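The subtype filter described above (keep only subtypes with at least 50 cell lines) can be sketched as below. This is a hedged illustration: `group_by_subtype`, the cell line names, and the subtype labels in the example are hypothetical, and only the threshold of 50 comes from the text.

```python
from collections import defaultdict

MIN_CELL_LINES = 50  # threshold stated in the text


def group_by_subtype(subtype_of, min_size=MIN_CELL_LINES):
    """Map subtype -> sorted cell line names, keeping subtypes with >= min_size lines."""
    groups = defaultdict(list)
    for cell_line, subtype in subtype_of.items():
        groups[subtype].append(cell_line)
    return {s: sorted(lines) for s, lines in groups.items() if len(lines) >= min_size}


# Hypothetical annotation: 52 lines of one subtype and 10 of another.
subtype_of = {f"MEL_{i}": "melanoma" for i in range(52)}
subtype_of.update({f"OTH_{i}": "other" for i in range(10)})
kept = group_by_subtype(subtype_of)
```

The correlation and permutation steps would then be run separately on the cell lines in each retained group.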
Transcript-level analysis
The GDSC Affymetrix Human Genome U219 Array raw gene expression profiles (Affymetrix CEL format) were retrieved from the European Bioinformatics Institute website (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-3610/files/raw/?ref=E-MTAB-3610). The SDRF file, which is used to get the mapping between CEL file and cell line name, was downloaded from The European Bioinformatics Institute website (https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-3610/E-MTAB-3610.sdrf.txt). The CDF file, which provides a mapping between the locations of sets of individual probes on an array and the probeset, transcript, or gene they detect, was downloaded for this array with a mapping to GENCODE transcripts from The European Bioinformatics Institute website (http://mbni.org/customcdf/22.0.0/genecodet.download/HGU219_Hs_GENECODET_22.0.0.zip). The matching between the cell line name and the COSMIC ID was obtained from The Wellcome Sanger Institute website (ftp://ftp.sanger.ac.uk/pub/project/cancerrxgene/releases/release-6.0/; file Cell_Lines_Details.xlsx).
The CTRP Affymetrix GeneChip Human Genome U133 Plus 2.0 Array raw gene expression profiles (Affymetrix CEL format) were retrieved from The Cancer Cell Line Encyclopedia (CCLE_Expression.Arrays_2013-03-18.tar.gz; www.broadinstitute.org/ccle). The array sample information file was downloaded from The Cancer Cell Line Encyclopedia (CCLE_Expression.Arrays.sif_2012-10-18.txt; www.broadinstitute.org/ccle). The CDF for this array with a mapping to GENCODE transcripts was downloaded from The European Bioinformatics Institute website (http://mbni.org/customcdf/22.0.0/genecodet.download/HGU133Plus2_Hs_GENECODET_22.0.0.zip). The matching between the cell line name and the master ccl id was obtained from the NCI Office of Cancer Genomics website (https://ocg.cancer.gov/programs/ctd2/data-portal).
The raw intensity signals from the gene expression arrays were normalized with RMAExpress (23–25) using default parameters (background adjust: yes; normalization: quantile; summarization method: median polish). The RMA-normalized expression value of each transcript/isoform in both datasets was thus obtained. The column names were converted into the corresponding cell line names using information obtained from the SDRF file. For the GDSC and CTRP datasets we then used a CDF file with a mapping to GENCODE transcripts, which provides a mapping between the locations of sets of individual probes on an array and the probeset, transcript, or gene they detect. Using the CDF information, we were thus able to identify isoforms detected in both datasets. Correlation analysis was performed as described above for the gene-level analysis, except that it was carried out at the transcript level rather than the gene level.
Results
Correlations between gene expression and sensitivity to targeted therapeutics
To examine the general relationship between gene expression and drug sensitivity, we extracted the basal expression of approximately 18,000 genes profiled by microarray across 985 and 823 cell lines in the GDSC and CTRP resources, and the sensitivity of these cell lines to 250 and 481 anticancer drugs, respectively. As GDSC and CTRP report AUCs for drug response using different scales, we first scaled both sets of AUC values to a range of 0–1. We then defined cell sensitivity by subtracting these scaled AUC values from 1. Next, we calculated the Spearman correlation between the expression profile of each gene across the cell lines and the sensitivity profile of each drug across the same cell lines (Supplementary Table S1; Supplementary Table S2). These correlations were then used to examine whether the expression profiles of known molecular targets of the drugs tended to be better correlated with sensitivity to the corresponding therapeutic than other genes.
For 222 of the drugs in GDSC and 360 of the drugs in CTRP, known molecular targets were identified in GDSC, CTRP, or DrugBank (refs. 17–19, 22; see Supplementary Table S3; Table 1). As expected, we identified examples where the expression profile of an annotated target was more highly correlated with the sensitivity to the corresponding drug than the majority of other genes (e.g., EGFR expression and sensitivity to cetuximab and afatinib; Fig. 1A and B). Despite this, there was a broad range of ranks observed for annotated targets and the expression levels of many targets were poorly correlated with sensitivity (Fig. 1C–F).
Summary of the drug targets considered in this study
Number of targets | Number of drugs in GDSC | Number of drugs in CTRP |
---|---|---|
1 | 109 | 213 |
2 | 67 | 66 |
3 | 18 | 34 |
4 | 8 | 20 |
5 | 11 | 12 |
≥6 | 9 | 15 |
Total | 222 | 360 |
Distribution of correlation coefficients between gene expression and drug sensitivity for 6 drugs profiled by the GDSC: cetuximab (A), afatinib (B), dasatinib (C), GSK690693 (D), pelitinib (E), and CUDC-101 (F). The x-axis represents the correlation coefficient between sensitivity and expression and the y-axis represents 17,392 genes ranked by correlation. The annotated drug targets for each drug are highlighted in the plot. G, Histogram of number of genes with a better correlation than the best correlated annotated target for 222 drugs in the GDSC dataset. The data are grouped into bins of 250. H, Histogram of number of genes with a better correlation than the best correlated annotated target for 360 drugs in the CTRP dataset.
In Fig. 1G and H, we summarize the ranking of the annotated target by examining the number of nontarget genes with a better correlation than the annotated target with the best correlation for each drug. Among the 222 targeted drugs tested in GDSC, 22 had an annotated target in the top 1% of best correlations and 95 had an annotated target in the top 15%. Similarly, for the 360 targeted drugs tested in CTRP, 21 had an annotated target in the top 1% of correlations, and 139 drugs had an annotated target in the top 15% (see Fig. 1G and H). Note that in only 5 cases did an annotated target have the best correlation among all tested genes (nutlin-3 in the CTRP analysis and BMS-536924, BMS-754807, CP724714, and gefitinib in the GDSC analysis). Unexpectedly, in some cases, drug sensitivity was strongly anticorrelated with gene expression of an annotated target (Fig. 1C, D, and F). For four drugs profiled by the GDSC, an annotated target was in the bottom 1% of ranked gene correlations (44 drugs in the bottom 15%). Similarly, for 16 of the drugs profiled by CTRP, an annotated target was in the bottom 1% of ranked gene correlations (79 drugs in the bottom 15%).
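The ranking summarized above (how many nontarget genes correlate better with sensitivity than the best-correlated annotated target) can be sketched as follows. This is a hedged illustration rather than the study's code: `rank_best_target`, the gene names, the correlation values, and the exact definition of "top 1%" as a fraction of all genes are assumptions made for the example.

```python
def rank_best_target(gene_corr, targets):
    """Rank the best-correlated annotated target of one drug among all genes.

    gene_corr: gene -> correlation between expression and sensitivity to the drug.
    targets:   set of annotated target genes for the drug.
    Returns (best target, number of nontarget genes with a better correlation,
    that number as a fraction of all genes).
    """
    measured = [g for g in targets if g in gene_corr]
    if not measured:
        raise ValueError("no annotated target has a measured correlation")
    best = max(measured, key=gene_corr.get)
    n_better = sum(
        1 for gene, r in gene_corr.items()
        if gene not in targets and r > gene_corr[best]
    )
    return best, n_better, n_better / len(gene_corr)


# Hypothetical correlations for 200 nontarget genes and two annotated targets.
gene_corr = {f"GENE_{i}": i / 1000 for i in range(200)}
gene_corr.update({"TARGET_A": 0.1985, "TARGET_B": -0.2})

best, n_better, fraction = rank_best_target(gene_corr, {"TARGET_A", "TARGET_B"})
in_top_1pct = fraction < 0.01  # assumed definition of "top 1%"
```

Sorting in the opposite direction (the worst-correlated target and the number of genes with a lower correlation) would give the corresponding "bottom 1%" summary.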
Identifying significant correlations and anticorrelations between target expression and drug sensitivity
To determine the fraction of drugs with a significant correlation between drug target expression and drug sensitivity, we carried out a random permutation of the gene expression tables and recalculated the correlations. We considered correlations better than 1,000 random permutations (99.9% confidence interval) as significant (see Methods). From a total of 445 drug–target pairs evaluated in GDSC, 104 showed a significant correlation between expression level and sensitivity and 37 were significantly anticorrelated (Supplementary Table S3). Similarly for CTRP, from a total of 686 drug–target pairs evaluated, 178 showed a significant correlation between expression level and sensitivity and 57 were significantly anticorrelated (Supplementary Table S3).
As almost half of the drugs in our analysis have multiple annotated targets (Table 1), we also asked whether the expression level of at least one target was significantly correlated with sensitivity. For 33% (73/222) of the drugs tested in GDSC and 36% (131/360) of the drugs tested in CTRP, the expression of at least one of the targets was significantly correlated with sensitivity. Thus, analysis of both the GDSC and CTRP datasets suggests that for approximately 35% of all drugs tested in the studies, drug sensitivity is significantly correlated with the basal expression profile of at least one annotated target and that, if we consider all targets, approximately a quarter show a significant correlation.
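The drug-level summary above can be reproduced from per-pair significance calls. The sketch below is illustrative only: the function name, the call labels (`"correlated"`, `"anticorrelated"`, `"ns"`), and the three example drugs are hypothetical, while the counts in the final two lines are the ones reported in the text.

```python
def fraction_with_any_correlated(calls):
    """calls: (drug, target) -> 'correlated' | 'anticorrelated' | 'ns'.

    Returns the fraction of drugs with at least one significantly
    correlated annotated target.
    """
    drugs = {drug for drug, _target in calls}
    hits = {drug for (drug, _target), call in calls.items() if call == "correlated"}
    return len(hits) / len(drugs)


# Hypothetical calls for three drugs, for illustration only.
calls = {
    ("drug_1", "T1"): "correlated",
    ("drug_1", "T2"): "anticorrelated",
    ("drug_2", "T3"): "ns",
    ("drug_3", "T1"): "ns",
    ("drug_3", "T4"): "correlated",
}
example_fraction = fraction_with_any_correlated(calls)  # 2 of 3 drugs

# Pooled proportions from the counts reported in the text.
drug_level = (73 + 131) / (222 + 360)    # drugs with >= 1 correlated target
pair_level = (104 + 178) / (445 + 686)   # correlated drug-target pairs
```

Pooling the two datasets gives roughly 35% at the drug level and roughly 25% at the pair level, consistent with the figures quoted above. Note that simple pooling counts drugs screened in both studies twice, so it is a rough consistency check rather than a separate estimate.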
Comparison of multiple drugs against the same drug target
As many genes in GDSC (263) and CTRP (372) were targeted by more than one drug, we examined the relative correlation between the expression of each drug target and the sensitivity to the set of drugs that target it. For example, 8 agents that target EGFR were tested in the GDSC dataset (Fig. 2D). EGFR expression levels were most highly correlated with sensitivity to cetuximab (0.51), followed by afatinib, gefitinib, lapatinib, and erlotinib. In our analysis, sensitivity to five agents that are reported to target EGFR (HG-5-88-01, pelitinib, neratinib, WZ4002, and WZ8040) was not significantly correlated with EGFR levels, and sensitivity to one, CUDC-101, was significantly anticorrelated (−0.28) with EGFR expression levels. Indeed, clinically, pelitinib, a second-generation EGFR tyrosine kinase inhibitor (TKI), showed no significant tumor response in a phase I clinical trial and produced stable disease in only one patient with NSCLC, while in another patient with cutaneous squamous cell carcinoma, diarrhea was a dose-limiting toxic effect (26). In contrast, WZ4002 and WZ8040 are third-generation EGFR TKIs that selectively target mutant EGFR T790M (27). This may explain why no significant correlation between sensitivity and EGFR expression levels was observed. Similarly, examining ALK and MET inhibitors revealed that XMD14-99 sensitivity was best correlated with ALK expression and that none significantly correlated with MET expression (Supplementary Fig. S1). Finally, in the case of HDAC inhibitors, HDAC2 expression was best correlated with sensitivity to belinostat and vorinostat while HDAC6 expression was best correlated with sensitivity to apicidin, brd-a94377914, and isox (Supplementary Fig. S2). This suggests that relative correlations between sensitivity and target expression may be a useful way to evaluate the relative specificity of drugs against the same target, as shown in Supplementary Tables S1 and S2.
Scatterplots comparing EGFR expression against sensitivity to cetuximab (A), pelitinib (B), and CUDC-101 (C). The plots have been represented in log scale. D, Comparison of correlations for EGFR, ERBB2, and HDAC1. Blue cells, significant correlation; red cells, significant anticorrelation. Gray cells, not significant after permutation testing; white cells, those that have not been tested or are not an annotated target.
Anticorrelated targets suggest a competitive mechanism of resistance
The observation that expression of approximately 8% of annotated drug targets was significantly anticorrelated with sensitivity (37 of 445 in GDSC and 57 of 686 in CTRP) was unexpected and suggested that some annotated targets may play a role in resistance. Examining this in more detail, we observed that 21 of the 32 corresponding drugs in GDSC and 25 of the 50 corresponding drugs in CTRP had more than one target. Furthermore, sensitivity against 8 of the drugs in GDSC and 15 of the drugs in CTRP was correlated with at least one target and anticorrelated with another (Supplementary Table S4). For example, sensitivity to CUDC-101 (Fig. 2D) is correlated with one target, HDAC1, and anticorrelated with the other two targets, EGFR and ERBB2.
We note that many of these also correspond to drugs that target multiple members of the same gene family, which suggests that some family members may compete in drug binding and mediate resistance in cells where more than one family member is expressed. As examples, we show the relationship between the expression of correlated targets and anticorrelated targets from four gene families targeted by four drugs, GSK690693, alisertib, ABT-737, and crizotinib (Fig. 3). In each, the expression of one target (x-axis) is correlated with sensitivity while the expression of another from the same gene family (y-axis) is anticorrelated. In the case of alisertib, cell lines with higher expression of AURKB tend to be more sensitive while those with higher expression of AURKA tend to be more resistant (Fig. 3B). Note that, using single-cell expression profile data for four cell lines profiled in Ramsköld and colleagues (28) and three cell lines profiled in Li and colleagues (29), we can confirm that the competing targets shown in Fig. 3 can indeed be expressed in the same individual cells and thus it is feasible that they do compete directly (Supplementary Table S5).
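Identifying drugs whose sensitivity is correlated with one annotated target and anticorrelated with another, as described in this section, is a simple grouping step once each drug–target pair has a significance call. The sketch below is an assumption-laden illustration: the function name and call labels are hypothetical, and the example uses only the CUDC-101 and cetuximab relationships stated in the text for GDSC.

```python
from collections import defaultdict


def mixed_direction_drugs(calls):
    """Return drugs with >= 1 correlated and >= 1 anticorrelated target, sorted."""
    seen = defaultdict(set)
    for (drug, _target), call in calls.items():
        seen[drug].add(call)
    return sorted(
        drug for drug, kinds in seen.items()
        if {"correlated", "anticorrelated"} <= kinds
    )


# Calls taken from the relationships described in the text (GDSC, Fig. 2D).
calls = {
    ("CUDC-101", "HDAC1"): "correlated",
    ("CUDC-101", "EGFR"): "anticorrelated",
    ("CUDC-101", "ERBB2"): "anticorrelated",
    ("cetuximab", "EGFR"): "correlated",
}
mixed = mixed_direction_drugs(calls)

# Pooled share of anticorrelated drug-target pairs from the counts in the text.
anti_share = (37 + 57) / (445 + 686)
```

The pooled share works out to a little over 8%, matching the approximate figure quoted at the start of the section.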
Examples of drugs, where a significant correlation between drug target expression and sensitivity with one target and a significant anticorrelation with another was observed. GSK690693 (A), alisertib (B), ABT-737 (C), and crizotinib (D). The x-axis represents the expression of the correlated target. The y-axis represents the expression of the anticorrelated target. The data points are colored on the basis of ranked sensitivity values.
Concordance between the two datasets
To confirm the associations shown above, we next considered the 86 drugs (directed at 215 targets) that were present in both resources. Overall, the correlation analyses agreed for 71% of the drug–target pairs; 39 drug–target pairs were correlated in both studies, 5 anticorrelated in both and 109 were not significantly correlated in either dataset (Fig. 4A; Supplementary Table S6A). As the correlation analyses disagreed for 29% of the drug–target pairs (for example lapatinib was only significantly correlated with EGFR expression in the GDSC study, Fig. 2D), we explored possible differences between the two studies that might explain this.
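The concordance figure above is the share of shared drug–target pairs that receive the same call in both datasets. A minimal sketch follows; the function name, call labels, and the four example pairs are hypothetical, and only the final arithmetic uses counts reported in the text.

```python
def concordance(calls_a, calls_b):
    """Fraction of drug-target pairs present in both dicts with identical calls."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no drug-target pairs in common")
    agree = sum(1 for pair in shared if calls_a[pair] == calls_b[pair])
    return agree / len(shared)


# Hypothetical calls for four shared pairs, for illustration only.
gdsc_calls = {
    ("drug_1", "T1"): "correlated",
    ("drug_1", "T2"): "ns",
    ("drug_2", "T3"): "correlated",
    ("drug_3", "T4"): "anticorrelated",
}
ctrp_calls = {
    ("drug_1", "T1"): "correlated",
    ("drug_1", "T2"): "ns",
    ("drug_2", "T3"): "ns",
    ("drug_3", "T4"): "anticorrelated",
}
example = concordance(gdsc_calls, ctrp_calls)  # 3 of 4 pairs agree

# Counts reported in the text: agreeing pairs out of 215 shared pairs.
reported = (39 + 5 + 109) / 215
```

The reported counts give 153 agreeing pairs out of 215, or about 71%, which leaves about 29% discordant.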
Assessing the concordance between the GDSC and CTRP datasets. A, Summary of the degree of overlap between the GDSC and CTRP analyses. The correlation coefficient was calculated for 86 common drugs (targeting 215 gene products). The entire 985 cell lines in GDSC and 823 cell lines in CTRP where gene expression and AUC values could be obtained were considered. B, AUC concordance analysis. X-axis represents the AUC correlation threshold and the y-axis the concordance % as a function of threshold of AUC correlation. C, Gene expression concordance analysis. X-axis represents the gene expression correlation threshold and the y-axis the concordance % as a function of threshold of gene expression correlation.
We first calculated pairwise correlations between basal gene expression in GDSC and CTRP for the 558 cell lines common to both studies (Supplementary Table S6B) and correlations between AUC measurements for the 86 drugs common to both (Supplementary Table S6C). Notably, the correlations between basal gene expression measurements for the same cell lines were typically high (median Spearman correlation of 0.87); however, for 17 of the cell lines a better correlation was observed with an unmatched cell line suggesting the possibility of sample swapping (see Supplementary Table S6D). Unlike the gene expression correlations, the AUC correlations were substantially lower (median Spearman correlation of 0.29).
Given this, we next evaluated the impact on the degree of concordance by applying increasingly strict thresholds based on either the AUC correlations or the basal gene expression correlations (Fig. 4B and C). Excluding the cell lines with the lower 50% of gene expression correlations had little impact on concordance (Fig. 4C); however, when excluding drugs with the lower 50% of AUC correlations, the concordance between the two studies jumped to 80%. Of note, recalculation of gene expression measurements to account for differences in microarray versions had little impact on concordance (see Supplementary Fig. S3; Supplementary Table S7).
In summary, as others have reported (30), discrepancies in the AUC measurements between the two studies are the major factor contributing to discordant observations. Despite this, we conclude that the majority of significant correlations that we report here are valid.
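Two checks from this section lend themselves to a short sketch: flagging cell lines whose expression profile matches an unmatched line better than its own counterpart, and recomputing concordance after excluding drugs with low cross-study AUC correlation. The code below is a hypothetical illustration, not the study's implementation; all names, data structures, and values are assumptions.

```python
def flag_possible_swaps(corr):
    """corr[a][b]: correlation between line a in one study and line b in the other.

    Returns (line, better-matching line) for lines whose best match is not itself.
    """
    flagged = []
    for line, row in corr.items():
        if line not in row:
            continue  # no matched counterpart to compare against
        best = max(row, key=row.get)
        if best != line and row[best] > row[line]:
            flagged.append((line, best))
    return sorted(flagged)


def concordance_above(pairs, drug_auc_corr, threshold):
    """pairs: (drug, call in study 1, call in study 2).

    Keeps only drugs whose cross-study AUC correlation is >= threshold and
    returns the fraction of remaining pairs with identical calls (None if empty).
    """
    kept = [(a, b) for drug, a, b in pairs if drug_auc_corr[drug] >= threshold]
    if not kept:
        return None
    return sum(1 for a, b in kept if a == b) / len(kept)


# Hypothetical cross-study expression correlations for two cell lines.
corr = {"A": {"A": 0.90, "B": 0.70}, "B": {"A": 0.88, "B": 0.80}}
swaps = flag_possible_swaps(corr)

# Hypothetical calls and per-drug AUC correlations.
pairs = [
    ("d1", "correlated", "correlated"),
    ("d1", "ns", "ns"),
    ("d2", "correlated", "ns"),
    ("d3", "ns", "ns"),
]
drug_auc_corr = {"d1": 0.6, "d2": 0.1, "d3": 0.3}
all_pairs = concordance_above(pairs, drug_auc_corr, 0.0)
filtered = concordance_above(pairs, drug_auc_corr, 0.25)
```

In the toy data, dropping the drug with the weakest AUC agreement removes the only discordant pair, which mirrors the qualitative pattern reported above.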
Gene expression predictors of drug sensitivity and resistance
Focusing beyond annotated drug targets, we next used Supplementary Tables S1 and S2 to identify genes whose expression was most highly correlated with sensitivity to any drug (correlation above 0.5). For 9 of the 250 drugs in GDSC and 35 of the 481 drugs in CTRP, the expression of at least one gene was highly correlated with sensitivity.
We examined this in more detail; the top 20 highest drug sensitivity–gene expression correlations for the GDSC and CTRP datasets are summarized in Supplementary Table S8. For the GDSC analysis, we observed gene expression profiles that were highly correlated with sensitivity to trametinib, cetuximab, SN-38, tanespimycin, afatinib, refametinib, camptothecin, and PD0325901. Often multiple genes were highly correlated with the same drug sensitivity profile. Among these, only one corresponded to a known drug–target relationship (cetuximab sensitivity and EGFR expression). Two of the most highly correlated relationships identified genes where experimental perturbation has confirmed they sensitize cells to the drug in question. For example, NQO1 is not an annotated target of tanespimycin but sensitizes melanoma, pancreatic cancer, and colorectal cancer cells to the drug (31, 32). Similarly, SLFN11 is not a target of topoisomerase inhibitors but its expression sensitizes cells to the topoisomerase inhibitors SN-38 and camptothecin (33, 34).
The second class of genes strongly correlated with drug sensitivity corresponded to genes downstream of the target. For example, expression of both FOSL1 and PHLDA1 is highly correlated with sensitivity to trametinib, and both are downstream of MEK and downregulated upon MEK inhibition with trametinib (35, 36). These actually have a better correlation with trametinib sensitivity than MEK expression itself (trametinib vs. PHLDA1, r = 0.5; trametinib vs. FOSL1, r = 0.5; trametinib vs. MAP2K2, r = 0.03; trametinib vs. MAP2K1, r = 0.02). Thus, expression of the downstream targets may be a better indication of functionally active MEK than MEK expression itself.
In another example, expression of myoferlin (MYOF) is highly correlated (r = 0.53) with sensitivity to cetuximab and other EGFR-targeting drugs (afatinib r = 0.43; erlotinib r = 0.31; and lapatinib r = 0.47). Myoferlin is a cargo protein used in endocytosis and a key regulator of EGFR degradation after it is internalized upon activation (37). Of note, in the absence of myoferlin, activated EGFR cannot be degraded in breast cancer cells (37). So, we suggest that MYOF expression might be used as a key predictor for response to EGFR inhibitor therapy.
Similarly, from the CTRP dataset, we observed gene expression profiles that were highly correlated with sensitivity to austocystin D, GSK-J4, BRD-K30748066, teniposide, and belinostat, none of which corresponded to known targets of these drugs. One of the top correlations was between sensitivity to GSK-J4 and expression of the histone methyltransferase EZH2. This is of particular interest as the expression of the known targets of GSK-J4, the histone demethylases KDM6B and KDM6A, is not correlated with GSK-J4 sensitivity. This suggests that GSK-J4, developed to inhibit H3K27me3 demethylases (38), may, in fact, inhibit the histone methyltransferase EZH2, or alternatively, that sensitivity to demethylation inhibition is only relevant in cells where methyltransferase activity is intact.
Finally, we examined the genes for which expression was most strongly anticorrelated with drug sensitivity (Supplementary Table S8). In the GDSC analysis, genes with expression profiles strongly anticorrelated to cetuximab, tanespimycin, vorinostat, PD0325901, and lapatinib sensitivity were identified. Similarly, from the CTRP dataset, we observed gene expression profiles that were highly anticorrelated to GSK-J4, BRD-K30748066, and belinostat sensitivity. The majority of these associations are novel findings; however, very recently, the top anticorrelation seen in the CTRP data (between sensitivity to GSK-J4 and MGLL expression) was demonstrated to be biologically relevant. It was shown that MGLL expression conferred resistance to GSK-J4 and, moreover, the addition of an MGLL inhibitor increased sensitivity to GSK-J4 (39).
Correlations between protein drug target expression and sensitivity to targeted therapeutics
The correlations above were calculated between mRNA expression levels and drug sensitivity; a possible concern, however, is how well the mRNA measurements reflect the levels of the proteins targeted by these drugs. Here, we used the largest atlas to date of protein-level measurements of drug targets in cancer cell lines (21) to examine the correlations between mRNA and protein levels, and to compare the correlations of drug sensitivity with protein levels and with mRNA levels. We focused on 284 cell lines from GDSC and 266 cell lines from CTRP for which both protein and mRNA measurements were available (this corresponds to 50 drug targets in GDSC and 43 drug targets in CTRP with matching mRNA and protein measurements). For the majority of these drug targets, mRNA and protein measurements were positively correlated. The median Spearman correlation between the mRNA and protein measurements in GDSC was 0.41 (and 40 of the 50 drug targets had positive correlations above 0.2). Similarly, the median Spearman correlation between the mRNA and protein measurements in CTRP was 0.47 (and 36 of the 43 drug targets had positive correlations above 0.2).
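The summary statistics in this paragraph (a per-target Spearman correlation, the median across targets, and the number of targets above 0.2) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the dictionary layout, and the toy values are assumptions made for illustration, and the Spearman correlation is computed here as a Pearson correlation on average ranks.

```python
from statistics import median

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation (Pearson on ranks); assumes non-constant inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def summarize_concordance(mrna, protein, threshold=0.2):
    """mrna, protein: dict of target -> values over the same ordered cell lines."""
    rhos = {t: spearman(mrna[t], protein[t]) for t in mrna if t in protein}
    n_above = sum(1 for r in rhos.values() if r > threshold)
    return rhos, median(rhos.values()), n_above

# Hypothetical values for three targets across five cell lines (not study data).
toy_mrna = {"EGFR": [1, 2, 3, 4, 5], "MET": [1, 2, 3, 4, 5], "AKT3": [1, 2, 3, 4, 5]}
toy_protein = {"EGFR": [2, 4, 6, 8, 10], "MET": [5, 4, 3, 2, 1], "AKT3": [1, 3, 2, 5, 4]}
rhos, median_rho, n_above = summarize_concordance(toy_mrna, toy_protein)
```

In practice, the two dictionaries would hold measurements restricted to the cell lines for which both data types are available, in the same cell line order.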
We next plotted mRNA–drug sensitivity correlation versus protein–drug sensitivity correlation for GDSC and CTRP and highlighted cases where mRNA (green) or protein (red) was significantly better correlated than the other (Fig. 5). In the GDSC analysis, we identified 6 drugs where mRNA expression levels were better correlated with sensitivity and 6 where protein levels were better correlated. Similarly, for the CTRP, we identified 4 drugs where mRNA expression levels were better correlated with sensitivity and 3 where protein levels were better correlated. From these analyses, we conclude that mRNA measurements of these drug targets performed similarly to protein-level measurements. We note, however, that mRNA measurements currently have a major advantage over protein-level measurements, as the microarrays used in the GDSC and CTRP studies cover all known protein-coding genes, whereas the proteomic study measured the levels of only 341 proteins.
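One way to ask whether mRNA or protein is the better correlate of sensitivity for a given drug–target pair is to bootstrap the difference between the two Spearman correlations over cell lines. The sketch below takes that approach as an assumption; the test actually used to call a difference significant is not restated in this section, and the function names, the sensitivity scale (higher means more sensitive), and the toy data are hypothetical.

```python
import random

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation; returns None when either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5

def bootstrap_rho_difference(sens, mrna, protein, n_boot=2000, seed=0):
    """Percentile 95% interval for rho(mRNA, sens) - rho(protein, sens)."""
    rng = random.Random(seed)
    n = len(sens)
    diffs = []
    for _ in range(n_boot):
        # Resample cell lines with replacement, keeping the three values paired.
        idx = [rng.randrange(n) for _ in range(n)]
        s = [sens[i] for i in idx]
        rho_m = spearman([mrna[i] for i in idx], s)
        rho_p = spearman([protein[i] for i in idx], s)
        if rho_m is not None and rho_p is not None:
            diffs.append(rho_m - rho_p)
    diffs.sort()
    return diffs[int(0.025 * len(diffs))], diffs[int(0.975 * len(diffs)) - 1]

# Hypothetical data for 20 cell lines: mRNA tracks sensitivity, protein only weakly.
toy_sens = list(range(20))
toy_mrna = [2 * v + 1 for v in toy_sens]
toy_protein = [(7 * v) % 20 for v in toy_sens]
lo, hi = bootstrap_rho_difference(toy_sens, toy_mrna, toy_protein)
```

An interval that lies entirely above zero would favor mRNA for that pair, and one entirely below zero would favor protein. Resampling whole cell lines keeps the three measurements paired, which matters because both correlations share the same sensitivity values.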
Scatterplot comparing mRNA–drug sensitivity correlation versus protein–drug sensitivity correlation for GDSC (A) and CTRP (B). Cases where either the protein levels or the mRNA levels were better correlated with drug sensitivity are highlighted.
Discussion
Here we examined the relationship between sensitivity to targeted cancer therapeutics and gene expression using data in the GDSC and CTRP resources. For approximately one third of the drugs considered (33% in GDSC and 36% in CTRP), sensitivity was significantly correlated with expression of at least one of the known targets. Notably, for drugs with multiple annotated targets, sensitivity was often correlated with only a subset of targets and in some cases was even anticorrelated, suggesting these targets may play a role in resistance. For example, supporting a role for AKT3 in resistance to GSK690693, AKT3 expression was anticorrelated with sensitivity to GSK690693 while AKT2 expression was correlated. Indeed, experimental depletion of AKT3 has been shown to increase the sensitivity of breast cancer cells to GSK690693 (40), and more generally AKT3 expression has been implicated in acquired resistance to AKT inhibitors (40, 41). Following this rationale, in three novel examples, we would predict that AURKA, BCL2L2, and MET expression is involved in resistance to alisertib, ABT-737, and crizotinib, respectively (Fig. 3).
Our analysis also provides important insights into the relative efficacy of drugs against the same target. For example, we observed significant correlations between EGFR expression and sensitivity to cetuximab, afatinib, erlotinib, gefitinib, lapatinib, vandetanib, and PF153035, but no significant correlations with sensitivity to HG-5-88-01, pelitinib, canertinib, neratinib, WZ4002, or WZ8040 (Fig. 2D). This suggests the latter drugs may actually be ineffective against EGFR. Thus, a key criterion in the preclinical assessment of targeted therapies should be to confirm that sensitivity correlates with target expression in a large panel of cell lines expressing various levels of the target.
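The proposed preclinical check, confirming that sensitivity tracks target expression across a cell line panel, could be sketched as a permutation test on the Spearman correlation. This is an illustrative assumption rather than the significance procedure used in this study: the function names and toy values are hypothetical, sensitivity is taken to be scored so that higher means more sensitive, and no multiple-testing correction is applied.

```python
import random

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation (Pearson on ranks); assumes non-constant inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def permutation_test(expr, sens, n_perm=5000, seed=0):
    """Return (rho, two-sided permutation P value) for expression vs. sensitivity."""
    rng = random.Random(seed)
    observed = spearman(expr, sens)
    shuffled = list(sens)
    hits = 0
    for _ in range(n_perm):
        # Shuffling sensitivity breaks any link to expression (the null).
        rng.shuffle(shuffled)
        if abs(spearman(expr, shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated P value above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical panel of 12 cell lines with increasing target expression.
target_expr = list(range(12))
sens_drug_a = [3 * v + 2 for v in target_expr]   # tracks target expression
sens_drug_b = [(5 * v) % 12 for v in target_expr]  # largely unrelated to it
rho_a, p_a = permutation_test(target_expr, sens_drug_a)
rho_b, p_b = permutation_test(target_expr, sens_drug_b)
```

Under this sketch, the hypothetical drug A would pass the check and drug B would not. With many drugs tested against the same target, the resulting P values would also need adjusting for multiple comparisons.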
Our results suggest expression profiling should play a more prominent role in precision oncology clinical trials, either as an alternative method of predicting drug sensitivity or as a complementary tool to the actionable mutation-based approach. Large-scale basket trials based on genomic (or targeted genomic) sequencing to identify “actionable mutations” currently ignore both the expression levels of the predicted target and the expression levels of any other genes that may modulate uptake and metabolism of, or resistance to, the drug [e.g., the NCI-MATCH (42, 43) clinical trial is the largest precision oncology trial in the world, but it does not incorporate expression profiling]. Given our observations, we would argue that a key criterion in such a clinical trial is the expression of the target. In addition, we have identified other genes whose expression levels are predictive of response or resistance, for example, the higher sensitivity to tanespimycin observed when NQO1 is more highly expressed, and the resistance to GSK-J4 of cells that express high levels of MGLL.
We acknowledge that there are several limitations in our analysis. First, we have focused on clonal cell lines, whereas in a tumor, subclones may have distinct responses to any given therapeutic. In addition, drug–target interactions require the protein to be expressed and the target site to be intact. In both of these situations, examining the expression of downstream targets (PHLDA1 and FOSL1 in the case of the MEK inhibitor trametinib above) may help indicate whether the target is functionally active. Our analyses are also limited to the cell lines and drug concentrations tested in the GDSC and CTRP. Using different combinations of cell lines, or panels focused on a particular cancer type, may identify stronger relationships. Supporting this, we observed several drug–drug target pairs that were not significantly correlated in our pan-cancer analysis but were significantly correlated when the analysis was restricted to a cancer subtype (Supplementary Table S9). Similarly, using different drug concentrations or different formulations may provide better correlations. Despite these caveats, our work highlights the power of drug target expression analysis to identify a subset of drugs where there appears to be a significant correlation between target expression and sensitivity.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: A.R.R. Forrest
Development of methodology: R. Roy, A.R.R. Forrest
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): R. Roy, L.N. Winteringham, T. Lassmann, A.R.R. Forrest
Writing, review, and/or revision of the manuscript: R. Roy, L.N. Winteringham, A.R.R. Forrest
Study supervision: L.N. Winteringham, T. Lassmann, A.R.R. Forrest
Acknowledgments
R. Roy was supported by scholarships from the University of Western Australia. A.R.R. Forrest is supported by an Australian National Health and Medical Research Council Fellowship APP1154524. T. Lassmann is supported by a Fellowship from the Feilman Foundation. This work was carried out with the support of a collaborative cancer research grant provided by the Cancer Research Trust “enabling advanced single-cell cancer genomics in Western Australia” and an enabling grant from the Cancer Council of Western Australia. This work was also supported by funds raised by the MACA Ride to Conquer Cancer and a Senior Cancer Research Fellowship from the Cancer Research Trust (to A.R.R. Forrest). This project was supported by the Platform for Applied and Translational Bioinformatics at the Telethon Kids Institute, Western Australia.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.