Combination therapies are commonly used to treat patients with complex diseases that respond poorly to single-agent therapies. In vitro high-throughput drug screening is a standard method for preclinical prioritization of synergistic drug combinations, but it can be impractical for large drug sets. Computational methods are thus being actively explored; however, most published methods were built on a limited size of cancer cell lines or drugs, and it remains a challenge to predict synergism at a large scale where the diversity within the data escalates the difficulty of prediction. Here, we present a state-of-the-field synergy prediction algorithm, which ranked first in all subchallenges in the AstraZeneca-Sanger Drug Combination Prediction DREAM Challenge. The model was built and evaluated using the largest drug combination screening dataset at the time of the competition, consisting of approximately 11,500 experimentally tested synergy scores of 118 drugs in 85 cancer cell lines. We developed a novel feature extraction strategy by integrating the cross-cell and cross-drug information with a novel network propagation method and then assembled the information in monotherapy and simulated molecular data to predict drug synergy. This represents a significant conceptual advancement of synergy prediction, using extracted features in the form of simulated posttreatment molecular profiles when only the pretreatment molecular profile is available. Our cross-tissue synergism prediction algorithm achieves promising accuracy comparable with the correlation between experimental replicates and can be applied to other cancer cell lines and drugs to guide therapeutic choices.

Significance: This study presents a novel network propagation–based method that predicts anticancer drug synergy to the accuracy of experimental replicates, which establishes a state-of-the-field method as benchmarked by the pharmacogenomics research community involving models generated by 160 teams. Cancer Res; 78(18); 5446–57. ©2018 AACR.

Numerous meta-analysis studies have shown that multi-agent therapies achieve better efficacy compared with single-agent therapies when treating complex diseases such as cancers and cardiovascular diseases (1–4). The “triple drug cocktail” for AIDS, introduced in 1995, is a prime example (5); compared with monotherapy treatments, it showed dramatically improved efficacy in patients and became one of the earliest successful combination therapies in history (6, 7). Multi-agent therapies involving targeted and nontargeted drugs have now become a standard practice in cancer treatments (8). The targeted drugs are usually designed to interact with driver oncogenes in cancers. For example, Trastuzumab binds to the HER2 and interferes the proliferation of HER2-positive breast cancer cells (9). However, human cancers involve numerous genomic/epigenomic aberrations and complex interactions between the compartments. Single-agent regimens could fail when downstream factors or parallel pathways in the cancer cells are activated to compensate for the drug effect (10, 11). Drug combinations can achieve better outcomes than single-agent therapy through many mechanisms. First, targeting multiple oncogenic pathways or mutations simultaneously can reduce the chance of developing drug resistance (12–14). Second, lower toxicity and better efficacy can be achieved at the same time through synergism. There are three types of drug combinatorial effects: additive effect, synergistic effect, and antagonistic effect. Additive effect is estimated by assuming no interaction exists between the drugs. Synergism means that the combined effect is larger than the additive effect (15). Antagonism is the opposite of synergism.

The properties of FDA-approved drugs are not understood well enough to infer the complex pharmacokinetic interactions between drugs. Therefore, efforts have been made to find synergistic drug combinations via both experimental and computational methods. The standard way is through in vitro or in vivo drug screening experiments. However, technical and economic barriers make this process expensive and slow. Drug screening experiments are easy to operate on a small scale but become highly labor intensive as the number of drugs grows. Apart from the technical limitations, the economic issue between drug companies and sectors is also a big impediment to the development of drug combinations (16).

In view of these concerns, in silico methods have been proposed to find candidate drug combinations for further experimental tests (17–21). Most of the existing methods aim at predicting the synergistic effects of two drugs, as the triple drug combinatorial effect is technically harder to predict and lacks experimental data for model evaluation. Many datasets have been utilized for feature extraction, such as drug chemical structure, drug target information, side effects, gene expression data, and monotherapy data. Despite the highly diverse computational approaches, the main bottleneck in drug combination prediction remains the lack of experimental data for drug combinations (22). The Cancer Cell Line Encyclopedia (CCLE; ref. 23) and Genomics of Drug Sensitivity in Cancer (GDSC; ref. 24) are two of the most significant databases for anticancer drug discovery (25). They both provide explicit baseline information, such as cell line genetic profiles. Those large-scale baseline datasets can be used to build models, but the prediction performance is often evaluated on a small dataset. Those “gold standard” drug combinations usually come from the FDA Orange Book, published literatures, or clinical trials. The limited drug screening dataset could potentially affect the robustness of the models. There also exists a difficulty in the performance comparison among the established methods, due to the inconsistency in the scoring metrics and evaluation datasets.

The mechanisms of synergistic effects are not universal among different drugs or cancers. It is unknown whether synergistic effects of drugs can be predicted accurately for cancer cell lines from different tissues and drugs that have distinct targets. In addition, the relationship between different features and the contributions of different data to the prediction accuracy remains unclear. Here, we present a biological network propagation–based algorithm for large-scale drug synergy prediction. This model integrates the indirect drug targeting effects by interrogating the gene–gene network structure, and is optimized using a large drug combination screening dataset, provided by the AstraZeneca-Sanger Drug Combination Prediction DREAM Challenge (22, 26). The DREAM Challenge data represent the largest drug screening dataset at the time of the DREAM challenge, consisting of approximately 11,500 experimentally assessed continuous synergy scores across 85 cancer cell lines and 118 targeted drugs. Furthermore, the DREAM Challenge provides a platform for evaluating all submissions in a uniform and consistent fashion in order to crowdsource the best solutions for synergy prediction. Among the submissions from 160 competing teams, our method achieved the best performance in all subchallenges. Our model also highlights the predictive power of monotherapy data and baseline molecular data, and demonstrates that using a combined feature dataset will improve the prediction performance compared with single predictors. Moreover, our model is able to transfer information among cell lines and drugs, and achieves high performance comparable with the theoretical limit, i.e., the reproducibility of experimental replicates.

### Drug screening data acquisition

In this research, we integrated the monotherapy data, drug target information, baseline molecular features, and the gene–gene interaction network to predict drug synergy (Fig. 1A–C). A total of approximately 11,500 synergy scores were experimentally assessed and provided by the DREAM consortium. The synergistic effects were measured through a combinatorial in vitro drug screening of 85 cancer cell lines and 118 anonymous chemical compounds. The cancer cell lines come from breast (n = 34), lung (n = 22), urinary tract (n = 14), gastrointestinal tract (n = 12), male genital system (n = 2), and lymphoma (n = 1). All these cancer cell lines were authenticated using the short-tandem repeat assay at AstraZeneca cell banking. Each cell bank was confirmed to be mycoplasma-free, and these cells were cultured for up to 20 passages. In each drug screening experiment, 5 nontrivial doses of each drug were tested. The observed drug responses were summarized in a 6 × 6 matrix. The efficacy of the drugs was measured by the percentage of the remaining cells after drug treatment compared with the untreated cells. The synergy scores were calculated with Combenefit v1.31 (27) using the monotherapy and drug combination response data. Additive effects were calculated using the Loewe model (28, 29). Positive synergy scores indicate synergism, and negative synergy scores indicate antagonism. The absolute values represent the strength of the synergistic/antagonistic effect. The synergy scores can be summarized in a sparse 3D tensor (Drug A × Drug B × Cell line; Fig. 2A; Supplementary Fig. S1), where approximately 2% of the elements are filled with known synergy scores. The monotherapy data of all drugs in the training and testing sets are available. The drug combination screening data are partitioned into the “Training set,” “Leaderboard set,” and “Final evaluation set” (Table 1). In this research report, we used both the training set and the leaderboard set for model optimization. The final evaluation set was kept confidential during the model development stage and was used for evaluating and comparing the performance of participants.

Figure 1.

The overall algorithm design and network propagation strategy. A, Four feature sets were used to predict drug synergy, including monotherapy data, molecular data, gene interaction network, and drug target information. B, The aims of SC1 and SC2. In SC1, participants were asked to predict drug synergy for drug pairs with combinational training data (blue cells), whereas in SC2, the combinational training data for target drug pairs were unavailable C, The pipeline of simulating posttreatment molecular features. The original molecular features were first filtered to exclude nontarget genes. Then, a gene–gene interaction network was used to simulate the values of both drug target genes and other genes connected to them.

Figure 1.

The overall algorithm design and network propagation strategy. A, Four feature sets were used to predict drug synergy, including monotherapy data, molecular data, gene interaction network, and drug target information. B, The aims of SC1 and SC2. In SC1, participants were asked to predict drug synergy for drug pairs with combinational training data (blue cells), whereas in SC2, the combinational training data for target drug pairs were unavailable C, The pipeline of simulating posttreatment molecular features. The original molecular features were first filtered to exclude nontarget genes. Then, a gene–gene interaction network was used to simulate the values of both drug target genes and other genes connected to them.

Close modal
Figure 2.

A visualization of training samples and the pipeline of the GSM and LSM. A, The known synergy scores are visualized as a 3D matrix, where the three axes are drug A, drug B, and cell line. Each blue dot represents an experimental measurement of the combination effect of drugs A and B in a cancer cell line. This matrix is not symmetric as it should be because the order of drug A and B is kept as it was in the raw dataset. B, In the global model, all the training data are fed into the random forest classifier, and the predictions are made at once. In the LSM, the training data are first split into mutually nonexclusive subsets. In the example described in the figure, the interested unknown score is the synergy score for drug Blue and drug Red in cell line Gray. The subset of training data should contain all the experiments that involve either drug Blue, drug Red, or both. The model is then trained on the subset and gives predictions for the synergy scores of drug Blue + drug Red + cell line Gray. In the last step, the predictions are collected together.

Figure 2.

A visualization of training samples and the pipeline of the GSM and LSM. A, The known synergy scores are visualized as a 3D matrix, where the three axes are drug A, drug B, and cell line. Each blue dot represents an experimental measurement of the combination effect of drugs A and B in a cancer cell line. This matrix is not symmetric as it should be because the order of drug A and B is kept as it was in the raw dataset. B, In the global model, all the training data are fed into the random forest classifier, and the predictions are made at once. In the LSM, the training data are first split into mutually nonexclusive subsets. In the example described in the figure, the interested unknown score is the synergy score for drug Blue and drug Red in cell line Gray. The subset of training data should contain all the experiments that involve either drug Blue, drug Red, or both. The model is then trained on the subset and gives predictions for the synergy scores of drug Blue + drug Red + cell line Gray. In the last step, the predictions are collected together.

Close modal
Table 1.

The splitting of drug combination screening data in subchallenges 1 and 2

Number of drug combinationsNumber of synergy scoresNumber of drug combinationsNumber of synergy scores
SC1 539 6,598 167 1,089
SC2   370 3,826
Number of drug combinationsNumber of synergy scoresNumber of drug combinationsNumber of synergy scores
SC1 539 6,598 167 1,089
SC2   370 3,826

NOTE: “Drug combinations” refers to unique drug–drug pairs. Synergy scores are calculated for each drug screen experiment that involves an interested drug–drug pair and an interested cancer cell line.

The participants were asked to predict synergy scores in two scenarios, which correspond to two subchallenges (Fig. 1B):

1. Subchallenge 1 (SC1) asks to predict the synergy scores of drug–drug combinations that have been previously tested in other cell lines.

2. Subchallenge 2 (SC2) asks to predict the synergy score of drug–drug combinations without prior experimental test results. The two subchallenges shared the same training samples and feature datasets, but evaluated the algorithms on different testing samples (Table 1).

### Drug effects and synergy scores

The drug effect (cell viability) is calculated by fitting the following equation to an observed dose–response curve:

where $E( a )$ is the observed cell viability at concentration a. The ${\rm IC}_{50}$⁠, $H$⁠, and ${E_\infty }$ represent the concentration where half of the cells are killed, the slope of the curve, and the maximum kill, respectively. Similarly, the combination effect $E( {a,b} )$is the cell viability in the 2D space of a drug pair. This effect can be decomposed using the following equation:

where $A( {a,b} )$ is the additive effect, and $S( {a,b} )$ is the synergistic effect, based on the definition using the Loewe model (28, 29). These effect values are obtained by computing the integral of the 2D curve in logarithmic concentration space.

### Baseline molecular data, pharmacogenetic data, and gene network acquisition

Apart from the drug screening data, the DREAM consortium also provided the drug target information and physicochemical properties of all 118 drugs, as well as 4 sets of baseline molecular profiles of all 85 cancer cell lines: (i) GDSC gene expression data from Affymetrix Human Genome U219 array plates (E-MTAB-3610), (ii) GDSC gene mutation data from whole-exome sequencing with Illumina HiSeq 2000 Agilent SureSelect (EGAS00001000978), (iii) GDSC copy-number variation (CNV) data from Affymetrix SNP6.0 microarrays (EGAS00001000978), (iv) DNA methylation data from the Bellvitge Biomedical Research Institute (raw methylation data from GEO: GSE68379). We also used two external datasets: (i) CCLE gene expression data and (ii) CCLE CNV data, which are both publically available at https://portals.broadinstitute.org/ccle. We utilized a published mouse gene network (30), which illustrates the probabilistic functional linkages among 20,581 protein-coding genes. This network is available at http://fntm.princeton.edu. The full dataset from DREAM can be downloaded at https://openinnovation.astrazeneca.com/data-library.html. The detailed description of the generation method is available at https://www.synapse.org/DrugCombinationChallenge. In the following content, we will refer to the synergy scores as gold standards or target values of the model, and the other datasets as feature datasets or predictors.

### The rationale of algorithm design

We integrated the various datasets by building individual models and then ensembling the prediction of each model to obtain a final prediction. The synergy scores predicted by different models were combined by a weighted average:

where ${p_i}$ is the prediction made based on the ${i^{th}}$ model, ${w_i}$is the weight of that model, and k is the total number of models. The left side of the equation is the final prediction. Here, ${p_i}$ and ${p_{\rm Final}}$ are both vectors representing a list of predicted synergy scores, whereas ${w_i}$ is a positive scalar.

We developed three models for making predictions: the global synergy model (GSM), the local synergy model (LSM), and the single drug model (SDM; Fig. 2B). GSM uses a single training set and makes predictions for all testing samples at once. LSM constructs the training set for each unknown synergy score in the testing dataset and makes predictions separately. The training dataset of LSM is a subset of the drug pairs in GSM, including only the pairs when either drug in the testing drug pair appears. SDM is similar to LSM, except that the training dataset is generated for each drug rather than drug combination and the predicted synergy scores are represented by the average of the two predicted values (Supplementary Fig. S2).

Each of the models was developed using the random forest algorithm implemented by MATLAB TreeBagger (NumTrees = 300, keeping the other parameters to default values). The analysis was implemented using Perl v5.10.1, Python 3.5.4, and R 3.4.1. The scripts submitted to DREAM are available at https://www.synapse.org/#!Synapse:syn5665972. In the original model, we used 3 sets of features for SC1 and 4 sets of features for SC2. In this article, we only implement and discuss the two most significant feature sets (see Discussion).

### Cross-validation setup

During the development stage, we used 5-fold cross-validation to test the prediction performance of our model. First, the Training + Leaderboard data were randomly partitioned into five equally sized sets. Then we sequentially used one of the five sets as the testing set and the rest four sets as the training dataset. This process was repeated 5 times. The prediction performance was evaluated using the scoring metric. The final performance of a model was represented by the average of the 5 sets × 5 repeats = 25 scores.

After the model was finalized, all the training data (Training + Leaderboard) were fed into the model, and the final evaluation dataset was used to evaluate the prediction performance of the model. The models we used for SC1 and SC2 were different and optimized separately (Supplementary Table S1).

### The scoring metrics for continuous and binary predictions

For each drug combination, SC1 requires continuous predictions of the synergy scores in multiple cell lines, whereas SC2 requires binary predictions indicating whether two drugs are synergistic or not. In SC1, we used the original predicted values. In SC2, we centered all the predicted synergy scores to zero and labeled the positive scores with “1” (predicted synergy) and the negative scores with “0” (predicted nonsynergy).

As the sample size for each drug combinations varies (i.e., some drug combinations were tested in more cell lines than others), applying Pearson correlation directly on all synergy scores will potentially give more weights to drug combinations that have more experiments. Therefore, to give equal weights to all drug combinations, a weighted average of Pearson correlation was used as the scoring metric in SC1, which is calculated using the following formula.

where N is the total number of tested drug combinations, ${n_i}$ is the number of cell lines a specific drug combination was tested on, and ${r_i}$ is the conventional Pearson correlation between the predicted and observed synergy scores for a drug combination across multiple cell lines. This is the primary metric used in SC1 in the challenge. Similarly, the tie-breaking metric is also the weighted average of Pearson correlation, but evaluated on the subset of drug pairs with the best synergy score larger than or equal 20.

We also computed the area under the receiver operating characteristic curve (ROC-AUC) to evaluate the performance of the binary predictions. We labeled the observed synergy scores $\ge$20 as “1” (observed synergy) and scores $\lt$20 as “0” (observed nonsynergy). This threshold is also used by the DREAM consortium. For the binary predictions in SC2, the DREAM consortium chose a sequential three-way ANOVA as the primary scoring metric to measure the significance of the binary separation.

Here, $a$ is the sign of the effect size. And $p$ is the P value of the ANOVA test. In addition to the weighted Pearson correlation (WPC) and three-way ANOVA, ROC-AUC was also considered to assess the prediction performance.

### Predicting drug synergy with monotherapy drug–response data

Two types of data were provided for each monotherapy experiment: (i) The dose–response data, which include the drug responses of 5 nonzero doses. (ii) The data quantification and quality check information. We applied data augmentation to generate two sets of training features and three sets of testing features. Eventually, three lists of predictions were obtained. The final prediction was the weighted average of predictions using three models (see details in Supplementary File).

### Simulating posttreatment molecular data with network propagation for feature extraction

As a form of feature extraction, we simulated the posttreatment molecular features (3D feature matrix) by modifying the baseline molecular features (2D feature matrix) according to the drug target information (2D feature matrix) and a previously published gene network (30). These simulated features act as a set of extracted informative features that are unique to each (Drug A × Drug B × Cell line) triplet. Genes that are not the targets of any drugs in the training dataset were removed to speed up computation. Then, the drug target genes were mapped to the gene network and assigned a probability of interaction with other genes. After this step, we modified different molecular features of genes using the following strategy (Fig. 1C). The drug effects on nontarget genes were scaled according to the interaction probability between the nontarget and target genes. For gene mutation features: ${v_t} = 1$⁠; ${v_{nt}} = {v_0} \times {\rm max}\{ {{p_1},{p_2}...{p_k}} \}$⁠. For CNV, gene expression and gene methylation features: ${v_t} = 0$⁠; ${v_{nt}} = {v_0} \times ( {1 - {\rm max}\{ {{p_1},{p_2}...{p_k}} \}} )$⁠. Here, ${v_0}$ is the original value of the molecular feature,${v_t}$ is the simulated value for the target genes, ${v_{nt}}$ is the simulated value for the nontarget genes, and ${p_k}$ is the interaction probability between two genes.

### Monotherapy has intrinsic correlation with synergism

When we examined the monotherapy features, we found that they were intrinsically correlated with drug synergistic effects. The raw monotherapy data included the drug responses (5 dosages/drug) and quality control information of the experiments. To construct a more complete feature space, we performed feature extraction and derived 135 predictors based on the raw monotherapy data (see Supplementary File). Figure 3A shows the Pearson correlations between each derived monotherapy feature and the target values (i.e., the synergy scores). Approximately 55% (74/135) of the features are significantly correlated with the synergy scores (FDR < 0.05; see Supplementary Table S2 for the correlations). Most of the monotherapy features, including all top 10 features (i.e., features with 10 highest absolute correlation coefficients), are negatively correlated with the synergy scores. Among the top ten features, nine are exclusively derived from the dose–response curve, and six are in the form of ${\rm max}\{ {{E_{a,\ k}},\ {E_{b,\ k}}} \}$⁠, where ${E_{a,\ k}}$ is the efficacy of drug $a$at dosage $k$⁠. The correlation analysis shows that monotherapy data have a small but significant correlation with the synergy scores. The information in different types of monotherapy data has varied correlations with the synergy scores, which implies varied predictive powers.

Figure 3.

Prediction performance of models using different features. A, The Pearson correlation between the constructed monotherapy features and the gold standard (observed synergy scores). The features with P value < 0.5 are shown in red. B, The WPC between predictions and observations using simulated posttreatment molecular features through network propagation (Propagated) or not (Original). The Propagated model achieved higher correlations in terms of both the primary and tie-breaking metrics used in the challenge. C, The cross-validation results of SC1 and SC2. Our final combined model is the ensemble of these three models and has the best performance. D, The ternary plot of using different weights for different models and the corresponding three-way ANOVA results in SC2. The directions of the three axes are indicated by the short lines aligned along the border of the ternary heatmap.

Figure 3.

Prediction performance of models using different features. A, The Pearson correlation between the constructed monotherapy features and the gold standard (observed synergy scores). The features with P value < 0.5 are shown in red. B, The WPC between predictions and observations using simulated posttreatment molecular features through network propagation (Propagated) or not (Original). The Propagated model achieved higher correlations in terms of both the primary and tie-breaking metrics used in the challenge. C, The cross-validation results of SC1 and SC2. Our final combined model is the ensemble of these three models and has the best performance. D, The ternary plot of using different weights for different models and the corresponding three-way ANOVA results in SC2. The directions of the three axes are indicated by the short lines aligned along the border of the ternary heatmap.

Close modal

### Novel network propagation strategy integrates the key information of the gene–gene interaction network

Because all the original molecular features (gene expression, CNV, methylation, and mutation) are cell line–specific, these features are identical for samples from the same cell line but treated with different drugs or drug pairs. Clearly, different drugs interact with different proteins or DNAs, interfering with the corresponding functional pathways. Therefore, the posttreatment genomic information should vary dramatically, depending on both the drug effects and the cell line–specific baseline molecular features. To leverage the prior knowledge of the gene–gene network and drug target genes, we developed a novel feature extraction strategy that simulates the drug-specific molecular features for all the cell lines. Compared with the Original model without simulation, the Propagated model with simulated data achieved higher prediction correlations in terms of both the primary and tie-breaking evaluation metrics used in the challenge (Fig. 3B; see Materials and Methods about the scoring metrics).

For CNV, expression, and methylation, we assume the drug target genes are silenced so that the values (⁠${v_t}$⁠) become zero. The nontarget genes are also affected, proportional to the probabilities (⁠$p$⁠) of their connections to the target genes. For each nontarget gene, we assume that it is affected mostly by the strongest connected drug target gene. Therefore, the simulated molecular feature of a nontarget gene (⁠${v_{nt}}$⁠) becomes ${v_0} \times ( {1 - {\rm max}\{ {{p_1} , {p_{2 }},\hellip, {p_k}} \}} )$⁠, where ${v_0}$ is the original value of the molecular feature, ${p_1} , {p_{2 }},\hellip, {p_k}$ are the connection strengths to target gene 1, gene 2, …, gene k. For genetic mutation, the values are binary: mutation ${v_0} = 1$ and nonmutation ${v_0} = 0$⁠. We assume that a mutated gene is not affected by the drug, and its value remains 1. For a nontarget gene, the value is adjusted, proportional to the connection probabilities of drug target genes ${v_0} \times max\{ {{p_1} , {p_{2 }},\hellip , {p_k}} \}$⁠.

### Molecular and monotherapy features complement each other in predicting drug synergy

We extracted the information in the molecular data with network propagation and found that it complemented the monotherapy data when predicting drug synergy. We simulated the posttreatment gene expression, methylation, mutation, and CNV profiles by leveraging the interactions between the direct and indirect drug targets. In order to reduce model complexity and remove irrelevant information, we performed an aggressive prefiltering. First, we used a local model and utilized only a subset of training samples each time. Second, we kept only the target genes of the drugs in the training data subset as predictors (Fig. 1C). Finally, the six molecular datasets were filtered down to approximately 900 molecular features (see Materials and Methods). The posttreatment molecular profiles were then generated by incorporating the gene–gene interaction and the drug target information into the baseline molecular features. This approach is innovative because it fuses three 2D feature spaces into a 3D feature space and avoids feature redundancy, which is especially beneficial for random forest (see Discussion). In SC1, the performance of LSM using molecular features is approximately 0.32, which is 0.04 lower than GSM using monotherapy features (∼0.36), but still significantly better than random (i.e., ∼0). To further improve the prediction accuracy, we assembled the results of LSM and GSM through weighted averaging. A grid search strategy was performed to optimize the weights assigned to each model. For SC1, as the weight of the molecular feature changes from 0 to 1 (Supplementary Fig. S3), the prediction performance first increases then decreases and reaches its maximum at ${p_{\rm Combined}} = 0.3 \times {p_{\rm LSM}}\, +\, 0.7 \times\, {p_{\rm GSM}}$⁠. As the predicted values are on the same scale for different feature sets, this result indicates that monotherapy data are more informative than the posttreatment molecular features. Meanwhile, the combined model that integrates both features performed better than any individual feature set (Fig. 3C).

Molecular and monotherapy features are both informative and nonredundant for drugs without prior experiments (SC2). We kept both GSM and LSM, and tested SDM using molecular features (Fig. 3D). The ternary plot shows that, in consistence with SC1, monotherapy data are still a key predictor, as the performance drops as the weight of monotherapy features decreases. The superior prediction power of monotherapy data can also be shown by comparing the individual models in parallel (Fig. 4A and B). The performance of SDM (8.18) is marginally better than the GSM (6.81). And the performance of GSM (11.34) is the best among the three models. Similarly, we assembled these three models via weighted averaging (Fig. 3D). The highest performance (three-way ANOVA = 16.87) was achieved at ${p_{\rm Combined}} = 0.4 \times {p_{\rm SDM}}\, +\, 0 \times {p_{\rm LSM}}\, +\, 0.6\, \times\, {p_{\rm GSM}}$ (Supplementary Table S1). The second best performance (three-way ANOVA = 16.78) involves contributions from both SDM (weight = 0.25) and LSM (weight = 0.10). Again, we find that monotherapy features are more informative than molecular features.

Figure 4.

The comprehensive evaluation of the performance using different features and metrics. A, The cross-validation results of three individual models and the combined model in SC2, evaluated via three-way ANOVA. B, The performance decreases when the sample size is reduced. Here, 100% represents that all the available training samples are used, whereas 75% (50%) means that we randomly subsampled 75% (50%) of all the samples in the training dataset. C, The performance comparison of the models in SC1 and SC2 using ROC-AUC. Our final combined model is the ensemble of these three models and has the best performance. D, The overall distribution of the predicted and the observed synergy scores in SC1.

Figure 4.

The comprehensive evaluation of the performance using different features and metrics. A, The cross-validation results of three individual models and the combined model in SC2, evaluated via three-way ANOVA. B, The performance decreases when the sample size is reduced. Here, 100% represents that all the available training samples are used, whereas 75% (50%) means that we randomly subsampled 75% (50%) of all the samples in the training dataset. C, The performance comparison of the models in SC1 and SC2 using ROC-AUC. Our final combined model is the ensemble of these three models and has the best performance. D, The overall distribution of the predicted and the observed synergy scores in SC1.

Close modal

Monotherapy provides the essential information about the direct treatment effect on a cancer cell line, which is the basis of drug synergism. For example, if a drug does not work on a specific cell line at all, it is unlikely to synergize with any other drugs. Therefore, the monotherapy-based GSM contains the basic probability of a drug to synergize and has a larger contribution. In addition, the single drug response is also predictable from the observed and simulated genomic features. So even if the monotherapy results are absent for some drug–cell line pairs, our genomics-based model can still make predictions. In fact, the genomics-based model further considers the gene–gene interactions, which are reflected by the observed and simulated genomic features through network propagation. For example, when two drugs synergize by targeting a specific functional pathway, genes within this pathway are simultaneously affected by both drugs. Through our network propagation algorithm, multiple genomic features under the treatment of both drugs are simulated. In this way, the prior knowledge of gene–gene interactions is integrated to improve the prediction of synergism.

### Drug synergy information is transferable among cell lines and drugs

By integrating the cross-cell and cross-drug information, we find that the synergy information is transferable among cell lines and drugs, and can be strengthened by within-cell and within-drug information. We used two uniform scoring metrics (see Materials and Methods), which are the WPC and ROC-AUC, to compare the prediction performance of our model in SC1 and SC2. The results of WPC showed that the models in SC1 all performed better than the corresponding models in SC2 (Fig. 3C). This is expected because in SC1 the training data contain prior experimental results of the drug combinations in the testing dataset, whereas SC2 does not. The WPC assesses the accuracy of continuous predictions based on linear regression. We tested the ability of the model on making binary (synergy/nonsynergy) predictions using ROC-AUC (Fig. 4C). In both SC1 and SC2, the combined model performed better than any one of the single models. The rank of the performance of GSM and LSM is reversed in SC1 compared with WPC results. This is because our model is optimized under the WPC metric. Even though the prediction performance of the combined model in SC2 (WPC = 0.38, AUC = 0.68) is less significant than SC1 (WPC = 0.41, AUC = 0.75), it still proves that the drug synergy information is transferable between different cell lines and drugs, and that the drug synergy can be predicted without any prior experiments of the interested drug combination. Meanwhile, the drug tests of the interested combinations can improve the prediction accuracy.

Under the benefit of this DREAM challenge, we gained access to a large-scale drug combination screening dataset. To test the generalizability of our model and to investigate the influence of the size and diversity of the dataset on the prediction performance, we randomly subsampled 50% and 75% of the training dataset and assessed their prediction accuracy used 5-fold cross-validation (Fig. 4B). The prediction performance is improved by approximately 0.02 and 0.01 as dataset goes from 50% to 75% and 75% to 100%, respectively. This result shows that the performance of our model improves as the size of the training data increases. It also indicates that this model is robust and has the potential to be generalized to include more cell lines and drugs, while maintaining or improving the prediction performance.

### High generalization performance of our model in the large-scale unseen dataset

Our model retains high performance in predicting drug synergy in the held-out final testing dataset provided by the DREAM competition (Table 1; Fig. 4D). As cross-validation only utilizes 80% of the whole training dataset for training, the performance of the model on the final evaluation set, which uses 100% of the training data, reaches even higher scores than the cross-validation results. For SC1, the final testing dataset consists of 1,089 unknown synergy scores. The WPC between the predicted and observed synergy scores is 0.47 (Fig. 4D), which is 0.07 higher than the cross-validation performance. This score is comparable with the correlation between experimental replicates (ρ = 0.56) as recognized as the upper bound in the DREAM challenge (22). The ROC-AUC is 0.76, which is also higher than the cross-validation results (0.75; Fig. 4C). This result indicates that our model is robust and is not overfitted to the training dataset. The area under of precision–recall curves (AUPRC) were also calculated to demonstrate the good performance of our model on the unbalanced data (Supplementary Fig. S4). The final evaluation dataset of SC2 consists of 3,826 unknown synergy scores involving 370 unique combinations of 104 drugs. None of the 370 drug combinations overlaps with the combinations in the training dataset, but all the drugs are from the training dataset. Figure 5A and B shows the prediction performance of our model on the final testing dataset. The predicted values are generally well correlated with the observed values, except for the extreme values on two tails of the observed dataset. The three-way ANOVA score is 54.16, which is significantly improved compared with the cross-validation result (score = 16.87). This again demonstrates the robustness of our model.

Figure 5.

The diversity of prediction performances. A, The overall distribution of the predicted and the observed synergy scores in SC2. B, The ROC curves and AUCs of models using the monotherapy, molecular, or combined features. The performance was evaluated on the final testing set in SC1. C, The distribution of the prediction performance for individual drugs, unique drug combinations, and individual cell lines. The x-axis represents the value of the WPC. The y-axis represents the frequency of the corresponding Pearson correlation.

Figure 5.

The diversity of prediction performances. A, The overall distribution of the predicted and the observed synergy scores in SC2. B, The ROC curves and AUCs of models using the monotherapy, molecular, or combined features. The performance was evaluated on the final testing set in SC1. C, The distribution of the prediction performance for individual drugs, unique drug combinations, and individual cell lines. The x-axis represents the value of the WPC. The y-axis represents the frequency of the corresponding Pearson correlation.

Close modal

### The prediction performance varies dramatically among cell lines and drugs

The prediction performances are highly dependent on the characteristics of the cell line and drug(s). The drug screening data include 85 cancer cell lines and 118 drugs, which cover a wide spectrum of synergistic/antagonistic effects caused by distinct mechanisms. The diversity of the underlying biological mechanism could result in various predictableness of different drugs and cell lines. We investigated the prediction performances for each drug combination, individual drug, and individual cell line (Fig. 5C). The performance is represented by the Pearson correlation between all the observed and the predicted synergy scores of the interested subject (drug, drug combination, or cell line). Figure 5C shows that the performance varied greatly (see the performance for all cell lines, drug combinations, and drugs in Supplementary Tables S3, S4, and S5, respectively). This result is consistent with the other teams in the challenge (22), where a subset of approximately 20% drug combinations were poorly predicted across all methods. It on the one hand shows the possibility of predicting synergy with extraordinary accuracy for a subset of drug and cell lines, and on the other hand indicates that a subset of drugs might not be promising candidates for drug combination in view of the difficulty in predicting their synergistic effects.

We further calculated the prediction correlation across test drug pairs for each cell line, and the tissue-specific performances of SC1 and SC2 are shown in Fig. 6A and Supplementary Fig. S5, respectively (Supplementary Tables S3 and S6). Our model achieved relatively high correlation in digestive system (median, 0.689) and lung (median, 0.583). And for each drug, the weighted average correlation was calculated, and the top 20 predictable drugs are shown in Fig. 6B. Specifically, drugs including ALK_2, MTOR_5, and PTK2 are relatively easy to predict, with correlation values above 0.75 (Supplementary Tables S7, S8, and S9). The drug identity is anonymized and masked by the target gene, due to confidentiality agreements. Moreover, we investigated the performance of predicting the best drug pairs on the held-out test set using our model. The predicted ranks of the true best pairs were calculated. We found that for 30% of the cell lines, we correctly predicted the best synergistic drug pairs (rank 1 in Fig. 6C), and for more than 73% of the cell lines, the predicted ranks of the best pairs were among top 5 (ranks 1 to 5 in Fig. 6C).

Figure 6.

The different aspects of model validation on the external test dataset. A, The tissue-specific distribution of the prediction correlations for all drug pairs. Note that blood and skin are excluded from the figure because they only have 1 and 2 cell lines, respectively. B, The top 20 easiest drugs to predict synergy. For most drugs, their names are masked by their primary target genes, which were used in the AZ dataset. We first calculated the WPC of every drug pair across test cell lines. Then for each drug, the average correlation of all drug pairs containing it was calculated as its correlation. C, The prediction rankings of the best synergized drug pairs in all cell lines. For each cell line, the prediction synergy scores were ranked and compared with the drug pairs with the highest observed scores. The prediction rankings were summarized and are shown in the pie chart. For more than 73% test cell lines, our model generated useful information indicating the potential best drug pair. And for 30% cell lines, the best synergized pairs were correctly predicted.

Figure 6.

The different aspects of model validation on the external test dataset. A, The tissue-specific distribution of the prediction correlations for all drug pairs. Note that blood and skin are excluded from the figure because they only have 1 and 2 cell lines, respectively. B, The top 20 easiest drugs to predict synergy. For most drugs, their names are masked by their primary target genes, which were used in the AZ dataset. We first calculated the WPC of every drug pair across test cell lines. Then for each drug, the average correlation of all drug pairs containing it was calculated as its correlation. C, The prediction rankings of the best synergized drug pairs in all cell lines. For each cell line, the prediction synergy scores were ranked and compared with the drug pairs with the highest observed scores. The prediction rankings were summarized and are shown in the pie chart. For more than 73% test cell lines, our model generated useful information indicating the potential best drug pair. And for 30% cell lines, the best synergized pairs were correctly predicted.

Close modal

We summarized the best synergized drug pairs for the 85 test cell lines (Supplementary Table S10). Then, we categorized these 85 pairs based on the tissues where the test cell line originated (Supplementary Fig. S6). We find that drugs are more likely to synergize in lung, achieving a median synergy score of 91.81 among 22 test cell lines. For the urogenital system, the score is relatively low, with a median value of 70.22 among 14 cell lines. Furthermore, the relationship between the prediction performance and the number of drug targets was investigated. For each drug, we calculated the WPC from our predictions and the number of genes it directly targets (Supplementary Fig. S7A). The selected drug categories are shown in different colors. We did not find any significant correlation between the prediction performance and the number of target genes (Pearson product–moment correlation = –0.058; P value = 0.553). Moreover, we categorized the drugs based on their target—whether it is a protein or DNA (including carboplatin, CarboTaxol, cisplatin, FOLFIRI, FOLFOX, gemcitabine, oxaliplatin, and topotecan). The overall distribution of prediction correlations for these two categories is shown in Supplementary Fig. S7B. We did not find any significant differences between these two types of drugs.

In silico synergistic drug combination prediction has drawn extensive attention in the past decades, yet so far there are no computational tools that have received wide acceptance or come into routine drug development process. One reason is that the available gold-standard dataset, which is essential for model training and validation, is limited in size and format (binary labels of synergy or nonsynergy). In this research, we had access to a large drug combination screening dataset that included approximately 11,500 continuous synergy scores and built a novel synergism prediction algorithm. The predictors are derived from the monotherapy data, baseline molecular data (including gene expression, methylation, CNV, and mutation), the gene–gene interaction network, and the drug target information. We simulated the posttreatment features by integrating the prior pharmacokinetic knowledge of drugs and biological knowledge of gene–gene interactions into the molecular profiles in order to generate an informative set of features. By comparing the performances of single models, we find that monotherapy data are more predictive of drug synergy than the simulated molecular profile, and the combined model performs better than any individual models. In addition, we find that the predictability varies greatly among cell lines and drugs. This explains why studies using small training sets are also capable of making accurate predictions. The result provides a list of candidate drugs that is promising for drug selection due to their relatively high predictableness and another list of drugs that requires further exploration and is currently not suitable for in silico prediction.

We chose random forest as the primary classifier because it is a nonlinear model that can account for the high-level interactions between features. These characteristics make it a good choice for solving intrinsically nonlinear biological problems. The network propagation–based feature simulation is a key contributor to the novelty of the model. If a naïve approach is used instead, where different features are aligned alongside with each other, the gene–gene interaction and drug target information will be identical for samples that are treated with the same drugs. The same is true for samples tested on the same cell line. As a result, the features that have identical values across multiple samples will then become redundant and dilute the informative features, reducing the learning efficiency. In contrast, our simulation method avoids the above issues and retains the useful information.

The robustness and accuracy of our model have been validated on a large external test set covering more than a total of 4,916 combinations of diverse drug pairs across 85 cancer cell lines. The overall performance is close to the theoretical upper limit. Specifically, we successfully predicted the best synergistic drug pairs in 30% test cell lines and provided useful ranking information (rank ≤ 5) of the best pairs in more than 73% test cell lines (Fig. 6C). With monotherapy results and genomic data, our model can be used in predicting drug synergism for a broad range of cell lines and drug types in the future. The embedded ideas of gene–gene network propagation and transfer learning across cell lines and drugs establish a new way to address the computational prediction of drug synergism. It is generalizable for other cell lines or new drugs.

We examined the predictive power of multiple features in this research, whereas more features can be added to the model. In our submission to the DREAM challenge, in addition to monotherapy features and simulated molecular features, we added one additional feature in SC1 and two additional features in SC2 to further improve the performance (see Supplementary File). The difference between the performance of the submitted full model and the simplified model is marginal (∼0.02; Supplementary Fig. S8). Thus, the two additional features are capable of improving the overall prediction performance but are weaker predictors compared with the two key feature sets. The size of the drug screening dataset grows quickly in recent years. A bigger drug combination screening dataset (∼22,000 samples) was published after the DREAM challenge (31). New methods built on this dataset have shown promising future for this research field (32). Our model also has the potential to be trained on even larger datasets and reach higher performance. In sum, our model shows the possibility of making cross-cancer large-scale predictions of drug synergy and can potentially facilitate the experimental design of drug combinations screening.

No potential conflicts of interest were disclosed.

Conception and design: H. Li, Y. Guan

Development of methodology: H. Li, Y. Guan

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): T. Li, Y. Guan

Writing, review, and/or revision of the manuscript: H. Li, T. Li, D. Quang, Y. Guan

Study supervision: Y. Guan

This work has been supported by NSF 1452656, Michigan Institute for Data Science (MIDAS), and George M. O'Brien Kidney Research Core Centers. Y. Guan is supported by these grants. We thank Hongjiu Zhang for technical and editorial assistance. We thank Shuai Hu for discussion about therapy-resistant and drug treatment in cancers.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Łopuch
S
,
Kawalec
P
,
Wiśniewska
N
.
Effectiveness of targeted therapy as monotherapy or combined therapy in patients with relapsed or refractory multiple myeloma: a systematic review and meta-analysis
.
Hematology
2014
;
20
:
1
10
.
2.
Gu
L
,
Liu
H
,
Fan
L
,
Lv
Y
,
Cui
Z
,
Luo
Y
, et al
Treatment outcomes of transcatheter arterial chemoembolization combined with local ablative therapy versus monotherapy in hepatocellular carcinoma: a meta-analysis
.
J Cancer Res Clin Oncol
2013
;
140
:
199
210
.
3.
Zhang
X
,
Zhang
X-J
,
Zhang
T-Y
,
Yu
F-F
,
Wei
X
,
Li
Y-S
, et al
Effect and safety of dual anti-human epidermal growth factor receptor 2 therapy compared to monotherapy in patients with human epidermal growth factor receptor 2-positive breast cancer: a systematic review
.
BMC Cancer
2014
;
14
:
625
.
4.
Escobar
C
,
Barrios
V
.
Combined therapy in the treatment of hypertension
.
Fundam Clin Pharmacol
2010
;
24
:
3
8
.
5.
.
Attacking AIDS with a “cocktail” therapy: drug combo sends deaths plummeting news
.
July 1, 1999 [cited 2018 Jan 24]. Available from
: https://aidsinfo.nih.gov/news/493/attacking-aids-with-a-cocktail-therapy-drug-combo-sends-deaths-plummeting.
6.
Hammer
SM
,
Saag
MS
,
Schechter
M
,
Montaner
JSG
,
Schooley
RT
,
Jacobsen
DM
, et al
Treatment for adult HIV infection: 2006 recommendations of the International AIDS Society–USA panel
.
Top HIV Med
2006
;
14
:
827
43
.
7.
Henkel
J
.
Attacking AIDS with a “cocktail” therapy?
FDA Consum
1999
;
33
:
12
7
.
8.
Tallarida
RJ
.
Quantitative methods for assessing drug synergism
.
Genes Cancer
2011
;
2
:
1003
8
.
9.
Schramm
A
,
De Gregorio
N
,
Widschwendter
P
,
Fink
V
,
Huober
J
.
Targeted therapies in HER2-positive breast cancer - a systematic review
.
Breast Care
2015
;
10
:
173
8
.
10.
Rosell
R
,
Karachaliou
N
,
Morales-Espinosa
D
,
Costa
C
,
Molina
MA
,
Sansano
I
, et al
Adaptive resistance to targeted therapies in cancer
.
Transl Lung Cancer Res
2013
;
2
:
152
9
.
11.
Neel
DS
,
Bivona
TG
.
Resistance is futile: overcoming resistance to targeted therapies in lung adenocarcinoma
.
NPJ Precis Oncol
2017
;
1
.
12.
Komarova
NL
,
Richard Boland
C
.
Cancer: calculated treatment
.
Nature
2013
;
499
:
291
2
.
13.
Flaherty
KT
,
Infante
JR
,
Daud
A
,
Gonzalez
R
,
Kefford
RF
,
Sosman
J
, et al
Combined BRAF and MEK inhibition in melanoma with BRAF V600 mutations
.
N Engl J Med
2012
;
367
:
1694
703
.
14.
Hu
C-MJ
,
Zhang
L
.
Nanoparticle-based combination therapy toward overcoming drug resistance in cancer
.
Biochem Pharmacol
2012
;
83
:
1104
11
.
15.
Sucher
NJ
.
Searching for synergy in silico, in vitro and in vivo
.
Synergy
2014
;
1
:
30
43
.
16.
Humphrey
RW
,
Brockway-Lunardi
LM
,
Bonk
DT
,
Dohoney
KM
,
Doroshow
JH
, et al
Opportunities and challenges in the development of experimental drug combinations for cancer
.
JNCI J Nat Cancer Inst
2011
;
103
:
1222
6
.
17.
Gayvert
KM
,
Aly
O
,
Platt
J
,
Bosenberg
MW
,
Stern
DF
,
Elemento
O
.
A computational approach for identifying synergistic drug combinations
.
PLoS Comput Biol
2017
;
13
:
e1005308
.
18.
Zhao
X-M
,
Iskar
M
,
Zeller
G
,
Kuhn
M
,
van Noort
V
,
Bork
P
.
Prediction of drug combinations by integrating molecular and pharmacological data
.
PLoS Comput Biol
2011
;
7
:
e1002323
.
19.
Flobak
Å
,
Baudot
A
,
Remy
E
,
Thommesen
L
,
Thieffry
D
,
Kuiper
M
, et al
Discovery of drug synergies in gastric cancer cells predicted by logical modeling
.
PLoS Comput Biol
2015
;
11
:
e1004426
.
20.
Huang
L
,
Li
F
,
Sheng
J
,
Xia
X
,
Ma
J
,
Zhan
M
, et al
DrugComboRanker: drug combination discovery based on target network analysis
.
Bioinformatics
2014
;
30
:
i228
36
.
21.
Iwata
H
,
R
,
Mizutani
S
,
Kotera
M
,
Yamanishi
Y
.
Large-scale prediction of beneficial drug combinations using drug efficacy and target profiles
.
J Chem Inf Model
2015
;
55
:
2705
16
.
22.
Menden
MP
,
Wang
D
,
Guan
Y
,
Mason
M
,
Szalai
B
,
Bulusu
KC
, et al
A cancer pharmacogenomic screen powering crowd-sourced advancement of drug combination prediction
.
BioRxiv February 13, 2018
.
Available from:
23.
Barretina
J
,
Caponigro
G
,
Stransky
N
,
Venkatesan
K
,
Margolin
AA
,
Kim
S
, et al
The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity
.
Nature
2012
;
483
:
603
7
.
24.
Yang
W
,
Soares
J
,
Greninger
P
,
Edelman
EJ
,
Lightfoot
H
,
Forbes
S
, et al
Genomics of drug sensitivity in cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells
.
Nucleic Acids Res
2013
;
41
:
D955
61
.
25.
Azuaje
F
.
Computational models for predicting drug responses in cancer research
.
Brief Bioinform
2017
;
18
:
820
9
.
26.
Wang
D
,
Menden
M
,
Yu
T
.
Data description of AstraZeneca-Sanger drug combination prediction DREAM challenge [cited 2017 Oct 15]. Available from:
https://www.synapse.org/#!Synapse:syn4231880/wiki/235651
27.
Di Veroli
GY
,
Fornari
C
,
Wang
D
,
Mollard
S
,
Bramhall
JL
,
Richards
FM
, et al
Combenefit: an interactive platform for the analysis and visualization of drug combinations
.
Bioinformatics
2016
;
32
:
2866
8
.
28.
Fitzgerald
JB
,
Schoeberl
B
,
Nielsen
UB
,
Sorger
PK
.
Systems biology and combination therapy in the quest for clinical efficacy
.
Nat Chem Biol
2006
;
2
:
458
66
.
29.
Geary
N
.
Understanding synergy
.
Am J Physiol Endocrinol Metab
2013
;
304
:
E237
53
.
30.
Guan
Y
,
Myers
CL
,
Lu
R
,
Lemischka
IR
,
Bult
CJ
,
Troyanskaya
OG
.
A genomewide functional network for the laboratory mouse
.
PLoS Comput Biol
2008
;
4
:
e1000165
.
31.
O'Neil
J
,
Benita
Y
,
Feldman
I
,
Chenard
M
,
Roberts
B
,
Liu
Y
, et al
An unbiased oncology compound screen to identify novel combination strategies
.
Mol Cancer Ther
2016
;
15
:
1155
62
.
32.
Preuer
K
,
Lewis
RPI
,
Hochreiter
S
,
Bender
A
,
Bulusu
KC
,
Klambauer
G
.
DeepSynergy: predicting anti-cancer drug synergy with Deep Learning
.
Bioinformatics
2017
;
34
:
1538
46
.