Existing cancer driver prediction methods are based on very different assumptions and each of them can detect only a particular subset of driver genes. Here we perform a comprehensive assessment of 18 driver prediction methods on more than 3,400 tumor samples from 15 cancer types, all to determine their suitability in guiding precision medicine efforts. We categorized these methods into five groups: functional impact on proteins in general (FI) or specific to cancer (FIC), cohort-based analysis for recurrent mutations (CBA), mutations with expression correlation (MEC), and methods that use gene interaction network-based analysis (INA). The performance of driver prediction methods varied considerably, with concordance with a gold standard varying from 9% to 68%. FI methods showed relatively poor performance (concordance <22%), while CBA methods provided conservative results but required large sample sizes for high sensitivity. INA methods, through the integration of genomic and transcriptomic data, and FIC methods, by training cancer-specific models, provided the best trade-off between sensitivity and specificity. As the methods were found to predict different subsets of driver genes, we propose a novel consensus-based approach, ConsensusDriver, which significantly improves the quality of predictions (20% increase in sensitivity) in patient subgroups or even individual patients. Consensus-based methods like ConsensusDriver promise to harness the strengths of different driver prediction paradigms.

Significance: These findings assess state-of-the-art cancer driver prediction methods and develop a new and improved consensus-based approach for use in precision oncology. Cancer Res; 78(1); 290–301. ©2017 AACR.

Cancers result from the accumulation of various types of DNA mutations including point mutations, indels, large-scale copy number aberrations (CNA), and structural variations (1). During tumor development, in addition to mutations that confer functional advantages to tumor cells (i.e., driver mutations; ref. 2), a large number of passenger mutations with no or little functional impact may arise, confounding our ability to identify the key events in oncogenesis for understanding and treating cancers (3).

Recent large-scale cancer genome sequencing efforts such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) have harnessed technologic advances in DNA/RNA sequencing to provide comprehensive mutation catalogs and associated omics profiles in tumors. These compendiums provide a rich resource for the development of integrative cancer driver prediction methods (genes and mutations; refs. 4–6). In addition, they further highlight the challenges that still remain in driver prediction. In particular, due to the heterogeneity of cancer types, often few frequently mutated (and likely driver) genes were identified in these studies with many more genes being rarely mutated and thus indistinguishable from noise due to passenger mutations (7, 8). Despite this, the ability to identify cancer drivers (genes and mutations) may be key for improved targeted therapy (9, 10). For example, breast cancer patients with ERBB2 driver mutations can respond successfully to the ERBB2 inhibitor trastuzumab (11), but similar therapy may also benefit patients with other cancers where ERBB2 mutations are rare (12). After the initial wave of large-scale cancer studies, different cohorts of patients continue to be sequenced with more distinct phenotypes (e.g., previously unprofiled disease sites or disease states such as tumors characterized by primary or acquired drug resistance). In the growing paradigm of precision oncology, individual patients are also sequenced either broadly (whole exome) or with targeted sequencing panels of genes selected by the above studies and existing knowledge about the patient's disease and available treatments, to gain insights into biology and to match the right patient to the right drug at the right time. There is thus a deep biological and clinical need to identify the mutation that drives the tumor of a single patient.

Because of its biological and practical importance, a range of different approaches have been proposed for inferring the impact of mutations on genes and their likely role in cancer. These methods differ widely in the information they require as input (e.g., point mutations, indels, CNAs, expression data etc.), in the models/assumptions that they use, and what they can predict (driver gene or mutation; refs. 13, 14). For example, many methods are based on using information about protein structure and evolution to detect point mutations that may have a functional impact in general (FI; 15–18), or specifically in the context of cancer (FIC; refs. 19–21). These methods predict functional/driver mutations in each sample independently and their relative strengths have been studied in previous work (22, 23). With the availability of large and heterogeneous cancer genomic datasets, newer methods have focused on cohort-based analysis to search for biases in mutation frequency indicative of positive selection in driver genes (CBA; refs. 24–29; compared in ref. 30), or mined for mutation–expression correlations to highlight driver CNAs (MEC; refs. 31–33; jointly evaluated in ref. 34). Finally, a few methods have sought to incorporate information about gene interaction networks in their analysis with the aim of providing more sensitive predictions (35, 36), or to enable driver prediction based on the integrative analysis of genomic and transcriptomic data (interaction network-based analysis, INA; refs. 4–6).

Despite the diversity of driver prediction methods, a comprehensive evaluation of the strengths and weaknesses of different classes of methods on a diverse range of cancer types has not been conducted. We sought to address this by evaluating the performance of a panel of 18 different computational methods, covering a wide variety of models and input data types, on >3,400 tumor datasets from 15 TCGA cancer types. Methods were evaluated systematically for their concordance with gold standard lists of driver and passenger genes as well as mutations, for their robustness to noise in the input, for their utility for working with data from small patient cohorts, and for their ability to provide accurate and actionable patient-specific predictions for precision medicine applications. The overall predictive power for driver genes was found to be moderate, highlighting the need for novel approaches and improved methods. In addition, predictions from different classes of methods were found to be orthogonal to each other, motivating the development of a consensus-based approach (ConsensusDriver) to increase sensitivity and specificity of driver predictions across cancer types. Consensus-based approaches such as ConsensusDriver provide a systematic way to combine the strengths of different driver prediction algorithms in building an analytic toolbox for precision oncology.

Data source and preprocessing

CNA and exome point mutation data for all cancer types was obtained from GDAC via Firehose (https://gdac.broadinstitute.org). All point mutations excluding synonymous mutations (i.e., indels, missense, nonsense, and splice site variants) and CNAs with a value of 2 (focal amplification) or −2 (focal deletion) were used for downstream analysis. Expression data for tumor and normal samples for all cancer types was downloaded from the TCGA website (level 3; https://tcga-data.nci.nih.gov). For a detailed description of expression data analysis, see Supplementary Methods. Protein expression data was downloaded from the TCPA portal (level 4; http://www.tcpaportal.org/tcpa).

Assessment of driver prediction methods

In total, we evaluated 18 methods that could be used for driver prediction (Fig. 1A), classifying these methods into (i) methods that belong to the FI category (primarily designed to identify function altering mutations but have been used for predicting driver mutations; refs. 22, 23) such as SIFT (15), PolyPhen2 (PP2; ref. 16), MutationTaster (MT; ref. 17), and MutationAssessor (MA; ref. 18), (ii) methods that tailor this idea to cancer by learning specific models (Functional Impact in Cancer, FIC) such as CHASM (19), transFIC (TF; ref. 20), and fathmm (FH; ref. 21), (iii) methods that use cohort based analysis to detect genes with signals of positive selection (CBA) such as ActiveDriver (AD; ref. 29), MutSigCV (MCV; ref. 24), MuSiC (MUS; ref. 25), OncodriveCLUST (OCL; ref. 26), and OncodriveFM (OF; ref. 27; all point mutation based), (iv) methods that integrate mutation data with transcriptomic data by looking for mutation–expression correlations (MEC) such as Conexic (CON; ref. 31), OncodriveCIS (OCI; ref. 32), and S2N (33), and finally (v) methods that use information from gene/protein interaction networks to analyze the effect of mutations, such as NetBox (NB; ref. 35), HotNet2 (HN2; ref. 36), DriverNet (DN; ref. 4), DawnRank (DR; ref. 5), and OncoIMPACT (OI; ref. 6).

Figure 1.

Diversity of driver prediction methods and datasets. A, Two-way classification of driver prediction methods based on input data-types and modeling assumptions/approaches. The italicized methods were not included due to practical constraints on running them. B, Violin plots showing that cancer types vary widely in terms of their point mutation and CNA burden (the number of patients is indicated under the name of each cancer type).

A few methods were excluded from this benchmark for the following reasons: (i) they could not be run without further data processing, complex prefiltering steps or inclusion of additional data [Genome MuSiC (25), Conexic (31)] or (ii) provided incompatible predictions [Gistic2 (28) with region-level predictions].

For each method, we used default parameters or the set of recommended parameters provided in the method's manual or corresponding publication. In cases where methods required a threshold for candidate driver selection (e.g., on the P value or score for candidates), we used the value indicated in the method's publication or manual (see Supplementary Methods for a detailed description of the parameters and threshold used).

For analysis of patient-specific predictions, for most methods, mutated genes in each patient (with mutation types matching the expected input for the method) were ordered according to their rank on the full dataset. For FI and FIC methods, and for OncoIMPACT, mutation/patient specific scores were used to order genes (best score in the case of multiple mutations; ties broken by average gene score).

Performance evaluation

Comparison with cancer gene gold standard.

We assessed the performance of all methods against a gold standard list of cancer driver genes [union of Cancer Gene Census (37), a manually curated list of CNA driver genes (38), oncogenes from UniProt (http://www.uniprot.org/), gene list from the Vogelstein 20/20 rule (7), and a gene list from literature mining (39)] based on three different measures (on the top N predictions), precision (P), recall (R), and the F1 score (that combines both precision and recall):

$$P = \frac{T}{N}, \qquad R = \frac{T}{G}, \qquad F_1 = \frac{2 \cdot P \cdot R}{P + R}$$

where T is the number of top N predictions in the gold standard and G is the total number of predictions.
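The three measures can be illustrated with a minimal Python sketch. It assumes that recall is taken relative to the size of the gold standard (as described in the Results) and that precision is computed over the predictions actually available when a method reports fewer than N genes; the function name `top_n_metrics` and the gene labels are hypothetical and only serve the illustration.

```python
def top_n_metrics(ranked_genes, gold_standard, n):
    """Precision, recall and F1 for the top-n predictions of one method.

    ranked_genes: list of gene names, best prediction first.
    gold_standard: iterable of known driver gene names.
    Assumption: recall is measured against the gold standard size.
    """
    top = ranked_genes[:n]
    gold = set(gold_standard)
    hits = sum(1 for gene in top if gene in gold)  # T in the text
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with made-up gene labels.
ranked = ["GENE_A", "GENE_B", "GENE_X", "GENE_C"]
gold = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
precision, recall, f1 = top_n_metrics(ranked, gold, n=3)
```

Guarding the zero-denominator cases keeps the sketch usable for methods that return no predictions on a given cancer type.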

Robustness to subsampling.

Subsampling analysis was performed for each of the 7 cancer types with more than 200 tumor datasets. Two different measures were used to evaluate the robustness of results from a method on a subsample (S) when compared with its results on the full dataset (F): stability as a measure of precision when comparing the top N predictions of S ($S_N$) to truth as defined by F,

$$\text{stability} = \frac{|S_N \cap F|}{|S_N|}$$

and recovery as a measure of sensitivity when comparing predictions in S to the top N predictions in F ($F_N$),

$$\text{recovery} = \frac{|S \cap F_N|}{|F_N|}$$

To make the comparison between S and F reasonable, we excluded from F genes that were not mutated in the subsampled dataset. To avoid penalizing sensitive or conservative methods, we chose N to be 20 as a majority of the methods provided >20 predictions.
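As an illustration of the two robustness measures, the following Python sketch computes stability and recovery for one subsample. It is a sketch under assumptions, not the authors' implementation: the function name and arguments are hypothetical, and the filtering of F to genes mutated in the subsample is applied before the top N of F is taken.

```python
def stability_and_recovery(sub_ranked, full_ranked, mutated_in_sub, n=20):
    """Compare predictions on a subsample (S) with those on the full data (F).

    sub_ranked: ranked predictions on the subsample, best first.
    full_ranked: ranked predictions on the full dataset, best first.
    mutated_in_sub: genes mutated in at least one subsampled tumor.
    """
    mutated = set(mutated_in_sub)
    # Exclude from F the genes that are not mutated in the subsample.
    full = [gene for gene in full_ranked if gene in mutated]
    s_n, f_n = sub_ranked[:n], full[:n]
    full_set, sub_set = set(full), set(sub_ranked)
    # Stability: fraction of the top N of S that is also predicted on F.
    stability = sum(g in full_set for g in s_n) / len(s_n) if s_n else 0.0
    # Recovery: fraction of the top N of F that is recovered in S.
    recovery = sum(g in sub_set for g in f_n) / len(f_n) if f_n else 0.0
    return stability, recovery


# Toy example with made-up gene labels and N = 2.
stab, rec = stability_and_recovery(
    ["A", "B", "C"], ["A", "D", "B", "E"], {"A", "B", "C", "D"}, n=2
)
```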

Generation of decoy missense mutations.

For each patient, we introduced n false positive/decoy mutations, where n = 2%, 5%, or 10% of the number of mutations in a tumor. Point mutations were randomly placed in coding regions of unmutated genes with probability proportional to the coding length and missense mutations were selected using annovar (to avoid bias against methods that cannot analyze nonsense or splice-site mutations). For consistency, this analysis was restricted to the 12 cancer types annotated using the hg19 genome (i.e., COAD, OV, and READ samples, annotated using hg18, were excluded).
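The length-weighted placement of decoys can be sketched as follows. This is a minimal Python illustration that only covers the choice of target genes; the selection of missense changes with annovar is not reproduced. The function name, the `coding_lengths` mapping, the rounding of n, and the use of sampling with replacement are all assumptions made for the example.

```python
import random


def pick_decoy_genes(coding_lengths, mutated_genes, n_mutations, fraction, rng=None):
    """Choose genes that receive decoy mutations for one patient.

    coding_lengths: dict mapping gene name -> coding length in bp.
    mutated_genes: genes already mutated in this tumor (excluded).
    n_mutations: number of mutations observed in the tumor.
    fraction: 0.02, 0.05 or 0.10 in the analysis described above.
    Assumption: genes are drawn with replacement, so a long gene
    may receive more than one decoy.
    """
    rng = rng or random.Random()
    n_decoys = round(fraction * n_mutations)
    candidates = [g for g in coding_lengths if g not in mutated_genes]
    if n_decoys <= 0 or not candidates:
        return []
    weights = [coding_lengths[g] for g in candidates]
    return rng.choices(candidates, weights=weights, k=n_decoys)


# Toy example with made-up gene labels and coding lengths.
lengths = {"GENE_A": 1200, "GENE_B": 300, "GENE_C": 9000, "GENE_D": 4500}
decoys = pick_decoy_genes(lengths, {"GENE_A"}, n_mutations=30,
                          fraction=0.10, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes replicate runs reproducible, which is convenient when results are averaged over several replicates.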

Construction of an actionable gene list.

We downloaded gene lists from IntOGen (https://www.intogen.org/) and OncoKB (http://oncokb.org/), and took the union of the actionable genes reported in them. We excluded drugs associated to a nonmutated gene from OncoKB, off-target genes from the IntOGen list, drugs targeting fusion genes, gene therapy targets, and genes associated to drug resistance. Each drug/gene association was classified into three levels in the following order of preference: approved drug (Level 1 and 2A from OncoKB and “FDA approved drug” from IntOGen), investigational target (Level 3A of OncoKB and “Drug in Clinical Trials” from IntOGen), and research target (all other genes).
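The three-level classification can be expressed as a small lookup, sketched here in Python. The level labels are those named in the text; the function names, the tuple layout of an association, and the rule of keeping the most preferred level when a gene has several associations are assumptions made for illustration.

```python
# Order of preference described in the text: earlier entries are preferred.
LEVELS = ["approved drug", "investigational target", "research target"]

APPROVED = {("OncoKB", "1"), ("OncoKB", "2A"), ("IntOGen", "FDA approved drug")}
INVESTIGATIONAL = {("OncoKB", "3A"), ("IntOGen", "Drug in Clinical Trials")}


def classify_association(source, label):
    """Map one drug/gene association to one of the three levels."""
    if (source, label) in APPROVED:
        return "approved drug"
    if (source, label) in INVESTIGATIONAL:
        return "investigational target"
    return "research target"


def best_level_per_gene(associations):
    """associations: iterable of (gene, source, label) tuples.

    Assumption: a gene with several associations keeps its most
    preferred level.
    """
    best = {}
    for gene, source, label in associations:
        level = classify_association(source, label)
        if gene not in best or LEVELS.index(level) < LEVELS.index(best[gene]):
            best[gene] = level
    return best


# Toy example with made-up gene labels.
levels = best_level_per_gene([
    ("GENE_A", "IntOGen", "Drug in Clinical Trials"),
    ("GENE_A", "OncoKB", "2A"),
    ("GENE_B", "OncoKB", "4"),
])
```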

Analysis of driver mutations.

For this analysis, we only studied missense mutations, as many benchmarked methods only predict drivers for this mutation type. We obtained a list of >2,100 known missense driver mutations and 227 likely nononcogenic mutations from the OncoKB database (http://oncokb.org/), and merged these with a list of >110,000 missense mutations (population allele frequency ≥ 1%, with no known clinical association) from the dbSNP database (https://www.ncbi.nlm.nih.gov/projects/SNP/, build 138) that are unlikely to be cancer drivers.

All methods were then evaluated using the measures Recall (R) and Accuracy (A):

$$R = \frac{T_D}{D}, \qquad A = \frac{T_D + T_P}{D + P}$$

where D represents the number of known driver mutations (1,435), P represents the number of known passenger mutations (1,101 in all genes and 78 in the cancer genes gold standard) in the whole cohort, and $T_D$ and $T_P$ represent the number of correctly predicted driver and passenger mutations, respectively, in the whole cohort based on the top 5 patient-specific predictions.

ConsensusDriver method

ConsensusDriver is based on the Borda approach, where each gene was given a score equal to the sum over all methods of either its rank, if the gene was ranked, or the maximum number of predictions in that data set (M), if the gene was unranked.

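$$\text{score}(g) = \sum_{m} \begin{cases} r_m(g) & \text{if gene } g \text{ is ranked by method } m \\ M & \text{otherwise} \end{cases}$$

Here $r_m(g)$ denotes the rank of gene g in the predictions of method m.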

Genes were then reranked according to this score. To select the best set of methods for a particular cancer type, we used the following procedure (equivalent to a leave-one-out): (i) exhaustively compute the Borda consensus score on the 262,125 possible method combinations; (ii) select the method combination that obtained the best average combined score for the other 14 cancer types. For sample/mutation specific predictions, we integrated the patient/mutation–specific predictions of the six methods identified (fathmm, CHASM, OncodriveFM, MutSigCV, DriverNet, and OncoIMPACT; Supplementary Fig. S1). BORDAall used all the methods in constructing a Borda-based ranking.
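The Borda scoring step can be illustrated with a short Python sketch. It assumes 1-based ranks, takes M as the largest number of predictions made by any method on the dataset, orders genes by increasing score, and breaks ties alphabetically; these choices and the function name are assumptions rather than details confirmed by the text. The last lines check one reading of the combination count: all subsets of at least two of the 18 methods, which gives 262,125.

```python
from math import comb


def borda_consensus(rankings):
    """Combine ranked gene lists from several methods.

    rankings: dict mapping method name -> list of genes, best first.
    Returns (genes ordered by consensus, dict of Borda scores).
    A lower score means a better consensus rank.
    """
    # M: maximum number of predictions made by any method (assumption).
    m_max = max((len(r) for r in rankings.values()), default=0)
    positions = {
        method: {gene: i + 1 for i, gene in enumerate(ranked)}
        for method, ranked in rankings.items()
    }
    genes = set().union(*rankings.values())
    scores = {
        gene: sum(pos.get(gene, m_max) for pos in positions.values())
        for gene in genes
    }
    order = sorted(scores, key=lambda g: (scores[g], g))
    return order, scores


# Toy example with made-up method and gene labels.
order, scores = borda_consensus({
    "method_1": ["A", "B", "C"],
    "method_2": ["B", "A"],
    "method_3": ["B"],
})

# Number of subsets containing at least two of 18 methods.
n_combinations = sum(comb(18, k) for k in range(2, 19))
```

In the toy example, gene B is ranked first or second by every method and therefore obtains the lowest score, while gene C is penalized with M for each method that does not rank it.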

Different cancer types represent diverse driver prediction challenges

For the purpose of this study, we selected 15 different cancer types from TCGA for which exome sequencing, copy number, and expression data (RNA-seq or arrays) were available (BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; COAD, colon adenocarcinoma; GBM, glioblastoma multiforme; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; OV, ovarian serous cystadenocarcinoma; PAAD, pancreatic adenocarcinoma; PRAD, prostate adenocarcinoma; READ, rectum adenocarcinoma; STAD, stomach adenocarcinoma; THCA, thyroid carcinoma; see Materials and Methods). The cancer types selected vary widely in cohort sizes, mutational burden per patient, and distribution of mutation types, thus representing a diverse set of challenges for driver prediction methods (Fig. 1B). For example, we noted that while some cancer types are predominantly affected by point mutations (KIRP) or CNAs (OV), others have a similar number of genes affected by both point mutations and CNAs (GBM). In addition, certain cancer types exhibited a bimodal distribution for mutational burden (READ, COAD, PRAD, and KIRC) and this could impact the distributional assumptions of some methods. The distribution of mutation frequencies across genes also showed high variation between cancer types (Supplementary Fig. S2). For example, while LUSC and OV have many genes with mutation frequency above 25%, THCA has only 3 genes with frequency above 5%, potentially impacting the sensitivity of methods that are dependent on mutation frequency for driver prediction. We additionally noted that most tumors exhibited both point mutations and CNAs (Supplementary Fig. S3A) and thus methods that take only a subset of mutation types as input may be at a disadvantage in terms of sensitivity (e.g., FI and FIC methods that only consider missense variants; Supplementary Fig. S3B).

In terms of driver prediction methods, we attempted to be as comprehensive as possible. In total, we studied 18 methods, covering five different classes of driver prediction methods (Fig. 1A; Methods), and evaluated their performance in predicting cancer driver genes in patient cohorts and in individual patients, as well as driver mutations in patients.

Driver gene prediction identifies many novel drivers but sensitivity is still a bottleneck

To evaluate the ability of various driver prediction methods to accurately differentiate between driver and passenger genes in a dataset, we compiled gold standard lists for both. Specifically, we took the union of 5 different curated lists of driver genes that have been reported before, including the widely used Cancer Gene Census list (37), a manually curated list of driver genes affected by copy number alterations (38), genes annotated as oncogenes by UniProt (http://www.uniprot.org/), a gene list derived from the Vogelstein 20/20 rule (7), and a gene list derived from literature mining (Supplementary Table S1; ref. 39). Passenger genes were defined by taking the union of two manually curated lists of known passengers from NCG4 (40) and Rubio-Perez and colleagues (Supplementary Table S1; ref. 41). These gold standards are limited in that they are not cancer type or sample specific [although drivers are frequently shared (42) and targeted (43) across cancer types], but represent an attempt to construct as comprehensive a list as possible such that novel cancer driver genes can be more effectively demarcated. The methods were evaluated on how well their predictions identified cancer driver genes based on three standard measures (as well as others as detailed below): precision (fraction of predictions that belong to the gold standard), recall (fraction of the gold standard contained in the predictions), and the F1 score that combines both precision and recall (see Materials and Methods for a more detailed description).

Because of the wide variation in the number of driver predictions from different methods (median of 10 for MutSigCV to >8,000 for MutationTaster; Supplementary Fig. S4), we restricted our analysis to either the top 10 or top 50 predictions from each method (see Supplementary Note S1; Supplementary Fig. S5 for further details). An overview of the top 50 predictions for each method can be seen in Fig. 2A. In general, most methods report a low number of passenger genes in their top 50 predictions except for FI methods (∼20% of predictions). This is as expected as FI methods are not designed to specifically exclude function altering mutations that may not be linked to cancer, unlike the FIC methods.

Figure 2.

Evaluation of cohort-level predictions across cancer types. A, Average number of genes (over 15 cancer types) among top 50 predictions that belong to different classes (known drivers, passengers, and other genes). Note that some methods have less than 50 predictions on average. B, Summary results for the evaluation of the 18 driver prediction methods according to various criteria: precision (for top 10 or top 50 predictions), recall (for top 10 or top 50 predictions), and F1 score based on comparison with gold standard of known cancer genes, stability (precision when evaluated on predictions from the full data set), and recovery (recall when evaluated on the full data set) based on downsampling to a dataset of 50 samples. Results are averaged over three replicates on the 7 cancer types with ≥200 samples. For each evaluation metric, methods with the highest score are indicated by a tick mark. Shaded cells represent methods for which the downsampling was not performed, either because they are not affected by downsampling (baseline, FI, and FIC methods) or due to high computational time requirements (DawnRank). C, Annotation of the predicted driver genes according to the number of methods they are predicted by.

The number of known cancer-associated genes reported in the top 50 predictions of different methods varied widely, from a mean of 4 for OncodriveCIS to 27 for fathmm (a majority of these belong to the Cancer Gene Census). In general, the highest sensitivity was provided by methods in the FIC and INA categories, reporting >15 known driver genes in the top 50 list. Note that the FIC methods use a machine learning approach with training sets that substantially intersect our gold standard (Fig. 2A), and thus their sensitivity to predict new driver genes may not be accurately captured here. On the other hand, methods in the CBA category were most concordant with the list of gold standard driver genes (0.5 and 0.6 for OncodriveFM and MutSigCV, respectively; Fig. 2B; Supplementary Fig. S6A and S6B). Selecting the best method in each category, we observed that all methods were more enriched for driver genes in their top predictions as expected, and methods such as fathmm and OncoIMPACT retained high precision even for predictions lower down the list (Supplementary Fig. S7). A striking aspect of the results in Fig. 2A is the large number of predicted genes that are neither passengers nor known driver genes. The majority of these genes are predicted by a single method and are likely enriched in false positives (Fig. 2C). However, genes predicted by multiple methods were strongly enriched in cancer-related functions (Supplementary Fig. S8), highlighting the fact that many more driver genes remain to be discovered, and consistent with recent work showing that more driver genes exist even in extensively studied TCGA cancer types (8).

We used the F1 score, which combines precision and recall, to rank methods and compare them against a “baseline” method that simply orders genes based on mutation frequency (Fig. 2B; Supplementary Fig. S9A and S9B; see Materials and Methods). The methods fathmm, CHASM, NetBox, DawnRank, DriverNet, and OncoIMPACT provided significantly better results than baseline for precision and F1 score, while ActiveDriver, OncodriveFM, and MutSigCV showed significant improvement in precision (Wilcoxon rank sum test P < 0.1; Supplementary Fig. S10A and S10B). The lower scores observed for MEC methods were not explained solely by their restriction to CNAs (Supplementary Fig. S11A and S11B).

To evaluate how driver gene predictions are affected by cohort size, we tested the different methods for robustness and power using a subsampling approach that compares predictions for a method to those on the full dataset (stability = precision and recovery = recall compared to results from full dataset; see Materials and Methods; ref. 6). Many methods exhibited high stability (>70%) at least for the 50 and 100 sample comparisons (ActiveDriver, MutSigCV, S2N, DriverNet, OncoIMPACT; Fig. 2B; Supplementary Fig. S12A). However, few methods exhibited recovery >50% (NetBox, OncoIMPACT), highlighting challenges in uncovering driver genes when cohort sizes are limited (Fig. 2B; Supplementary Fig. S12B). Overall, as summarized in Fig. 2B, no single method outperformed the others in all metrics.

Most methods predict no driver genes for 10% of patients but many provide robust patient-specific predictions

We next evaluated methods for their ability to accurately identify driver genes in a patient-specific manner to assess their utility for precision medicine applications. Note that not all methods provide predictions per patient and for such methods, we assumed that nominated driver genes are drivers in all patients in which they were mutated. We began by computing statistics for the number of driver genes nominated in each patient by various methods, under the assumption that reporting too few (<1) or too many driver genes (>15) may make them less useful (Fig. 3A; number of drivers per patient is generally expected to be <10; ref. 1). Interestingly, with the exception of FI methods that call a large number of driver mutations for the majority of the patients, nearly all the other methods report no drivers for >10% of patients. This could be an indication of low sensitivity but could also be due to driver events having other origins (e.g., copy-number neutral rearrangements, large translocations, regulatory, or noncoding mutations or methylation and other epigenetic events) that were not considered by these methods. The method OncoIMPACT was found to be unusual in this aspect (even compared with INA methods) as it identified at least 1 driver gene in nearly all patients. Methods belonging to the CBA and MEC categories typically identified <2 driver genes in a large fraction of the cohorts (∼40%). On the other end, some methods frequently (in >50% of cohort) identified >15 driver genes in patients, suggesting that they may be overcalling at the patient-specific level (MutationTaster, MutationAssessor, SIFT, PolyPhen2, and S2N; Fig. 3A).

Figure 3.

Evaluation of patient-specific predictions. A, Number of predicted driver genes per patient. DawnRank was excluded for this analysis as it reports all mutations for a patient with no filtering criteria provided. B, Summary results for the evaluation of the 18 driver prediction methods according to various criteria: precision, recall, and F1 score based on comparison with gold standard of known cancer genes (for top 5 patient-specific predictions), specificity, and fraction of patients without false positives (FP) in their top 5 predictions based on the introduction of decoy mutations (10% of the number of mutations in the tumor). Results are averaged over three replicates for each cancer type. For each evaluation metric, methods with the highest score are indicated by a tick mark. Shaded cells represent methods for which the introduction of missense decoy mutations was not performed, either because they only process CNA data (S2N and oncodriveCIS) or due to high computational time requirements (DawnRank).

Considering the top 5 patient-specific predictions, most methods provided similar precision and F1 score as in the cohort-level evaluation, with the network-based methods (INA) generally outperforming other approaches (Fig. 3B; Supplementary Fig. S13A and S13B). As before, CBA methods such as OncodriveFM and MutSigCV provided the best precision (Fig. 3B; Supplementary Fig. S14A and S14B).

To test robustness to noise and to estimate the specificity of the predictions at the patient-specific level, we introduced decoy passenger mutations in genes with probability weighted by the gene length (see Materials and Methods). Most methods exhibited good robustness to such noise with specificity generally higher than 95% (except for FI methods; Fig. 3B; Supplementary Fig. S15A). In particular, methods in the FIC category performed well in identifying decoy function-altering mutations, improving significantly over methods in the FI category. Also, as CBA methods explicitly model such sources of noise, they were found to have the best control over them. We also noted that most patients (>80%) do not have any of the decoy mutations in their predictions even when the overall specificity of a method is approximately 95% (Fig. 3B; Supplementary Fig. S15B). This is even more the case when only the top 5 or 10 predictions are considered, highlighting the robustness of many methods at the patient-specific level. As summarized in Fig. 3B, no single method uniformly outperformed the others at the patient-specific level either.

Prioritization of actionable driver genes is still a challenge for most individual methods

The prioritization of actionable driver genes and mutations is a key requirement for decision support systems that aid precision oncology. We evaluated the methods studied here against curated lists of actionable genes (genes that can be targeted by a drug under certain conditions) from the OncoKB (http://oncokb.org/#/) and IntOGen (41) databases (see Materials and Methods; Supplementary Table S2). Analyzing the top 5 driver predictions per patient from each method, we observed significant variability in performance, with the fraction of patients with a predicted actionable driver gene varying from 6% for OncodriveCIS to >60% for DriverNet and OncoIMPACT (Fig. 4A). The different methods provided largely nonoverlapping predictions, so that their union predicted actionable driver genes for up to 81% of patients. A breakdown of the predictions by cancer type (Fig. 4B) highlighted six cancer types (LIHC, PRAD, KIRP, OV, KIRC, BRCA) with a much lower fraction of patients with predicted actionable driver genes. This could in part be due to the lack of sensitivity of driver prediction methods, but in most cases it is explained by these cancer types being enriched for nontargetable driver genes, highlighting the need for further drug discovery efforts in these cancer types. Finally, as a positive control, we assessed the sensitivity of the methods in predicting two known actionable oncogenes, BRAF (various drugs are FDA approved for treating melanoma with V600 mutations; ref. 44) and PIK3CA (the inhibitor alpelisib is currently undergoing a clinical trial for breast cancer; ref. 45), in patients harboring known oncogenic mutations (BRAF V600 and 19 PIK3CA mutations located in the domains of the catalytic subunit; see Supplementary Table S3). We observed notable variation in the number of patients in whom the mutations were flagged as drivers by different methods (Fig. 4C), with multiple methods not reporting the genes for any patient (similar results were observed with top 10 predictions; Supplementary Fig. S16). These results highlight that differences in the underlying models of the various methods can lead to dramatically different abilities to predict actionable drivers, and that care should be exercised in interpreting and integrating results from different driver prediction systems.
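The per-patient summary described above (the fraction of patients with at least one actionable gene among a method's top 5 predictions, and the same fraction when predictions from all methods are pooled) can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the study; the function names, method labels, patient identifiers, and gene lists are hypothetical.

```python
from typing import Dict, List, Set

# Ranked driver predictions: method -> patient -> genes, best first.
Predictions = Dict[str, Dict[str, List[str]]]


def fraction_with_actionable(per_patient: Dict[str, List[str]],
                             actionable: Set[str],
                             patients: List[str],
                             top_k: int = 5) -> float:
    """Fraction of patients whose top_k predictions contain an actionable gene."""
    hits = sum(
        1 for p in patients
        if actionable.intersection(per_patient.get(p, [])[:top_k])
    )
    return hits / len(patients)


def union_fraction(predictions: Predictions,
                   actionable: Set[str],
                   patients: List[str],
                   top_k: int = 5) -> float:
    """Same fraction when a hit from any method counts for the patient."""
    hits = 0
    for p in patients:
        if any(actionable.intersection(per_patient.get(p, [])[:top_k])
               for per_patient in predictions.values()):
            hits += 1
    return hits / len(patients)


# Toy data (hypothetical, for illustration only).
patients = ["P1", "P2", "P3", "P4"]
actionable = {"BRAF", "PIK3CA"}
predictions = {
    "method_A": {"P1": ["BRAF", "TP53"], "P2": ["TP53"]},
    "method_B": {"P2": ["PIK3CA"], "P3": ["KRAS"]},
}
frac_a = fraction_with_actionable(predictions["method_A"], actionable, patients)
frac_b = fraction_with_actionable(predictions["method_B"], actionable, patients)
frac_union = union_fraction(predictions, actionable, patients)
```

In this toy example each method covers a different patient, so the pooled fraction is higher than either individual fraction, which mirrors the effect of nonoverlapping predictions noted above.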

Figure 4.

Sensitivity of methods for identifying actionable driver genes. Results are reported based on the top 5 driver predictions for each method. A, Fraction of patients with at least one actionable driver gene predicted. The dashed line represents the fraction of patients with at least one mutated actionable gene. B, Breakdown of patients according to their cancer type reveals a cluster of 6 cancer types (highlighted in gray rectangle) with a low fraction of patients with predicted actionable driver genes (complete linkage hierarchical clustering using Euclidean distance). C, Number of patients with a predicted actionable driver mutation for the genes BRAF and PIK3CA.

Low concordance across methods enables the construction of a better consensus-based approach

A comparison of driver predictions across methods revealed that, in addition to the expected differences across categories, many methods had a significant number of calls that were unique to them (Supplementary Fig. S17A). This was particularly the case for FIC methods such as fathmm and CHASM, and for network-based methods (INA) such as DawnRank, DriverNet, and OncoIMPACT. In addition, for the more sensitive methods (e.g., fathmm and OncoIMPACT), many predictions were shared by >4 methods, suggesting that agreement across methods could provide additional confidence for many of their calls. To evaluate whether consensus approaches could improve over predictions from individual methods, we evaluated a rank aggregation–based approach using all methods (BORDAall) as well as a subset of methods identified using cross-validation (ConsensusDriver; see Materials and Methods). We found that the same methods were consistently selected by ConsensusDriver across cancer types, covering a wide range of methods across categories (Supplementary Fig. S1), including CHASM, fathmm (FIC), OncodriveFM, MutSigCV (CBA), DriverNet, and OncoIMPACT (INA).
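To make the idea of rank aggregation concrete, the following is a minimal sketch of a Borda count over ranked gene lists. The exact scoring variant used for BORDAall and ConsensusDriver is described in Materials and Methods and is not reproduced here; the scoring rule below (a gene at position i earns the size of the gene universe minus i, and unranked genes earn 0), the function name, and the toy rankings are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def borda_aggregate(rankings: Dict[str, List[str]]) -> List[str]:
    """Combine ranked gene lists into one consensus ranking by Borda count.

    Assumes each list has no duplicate genes. A gene at 0-based position i
    in a list scores (n - i) points, where n is the number of distinct genes
    over all lists; a gene absent from a list scores 0 for that list.
    """
    universe = set()
    for ranked in rankings.values():
        universe.update(ranked)
    n = len(universe)

    scores = defaultdict(int)
    for ranked in rankings.values():
        for pos, gene in enumerate(ranked):
            scores[gene] += n - pos

    # Highest total score first; ties broken alphabetically for determinism.
    return sorted(universe, key=lambda g: (-scores[g], g))


# Toy rankings from three hypothetical methods.
rankings = {
    "m1": ["TP53", "KRAS", "BRAF"],
    "m2": ["KRAS", "TP53", "PIK3CA"],
    "m3": ["KRAS", "BRAF"],
}
consensus = borda_aggregate(rankings)
# Scores: KRAS 11, TP53 7, BRAF 5, PIK3CA 2
```

Because every input list contributes equally, a scheme of this kind is sensitive to which methods are included, which is one reason to select the subset of methods rather than aggregate all of them.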

Across cancer types, ConsensusDriver improved over the best individual methods (1.4× improvement compared with fathmm in median F1 score, one-sided Wilcoxon rank sum test P value = 10⁻³; Fig. 5A; Supplementary Fig. S17B), whereas BORDAall did not show a significant improvement in precision (Supplementary Fig. S18A and S18B) or in the F1 score (Supplementary Fig. S17B; see Supplementary Note S2; Supplementary Fig. S19A and S19B for comparisons with other machine learning approaches). Comparing ConsensusDriver with two consensus-based gene lists [MutSig (8) and DriverDB (46)], we noted that it improved recall and F1 performance over both of them (one-sided Wilcoxon rank sum test P value = 10⁻³ and 2 × 10⁻⁴ for F1 improvement). Overall, ConsensusDriver is a consistent improvement over individual methods and consensus-based gene lists, exhibiting a precision of 0.9 for its top 10 predictions and 0.63 over its top 50 predictions (Fig. 5B).
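The precision, recall, and F1 figures quoted above compare a ranked list of predicted genes with a gold standard list. A minimal sketch of such a top-k comparison is given below; the function name, the toy gene lists, and the choice of the full gold standard as the recall denominator are assumptions, and the study's exact definitions are those given in its Materials and Methods.

```python
from typing import List, Set, Tuple


def precision_recall_f1(ranked: List[str],
                        gold: Set[str],
                        k: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of the top-k predictions against a gold set."""
    top = ranked[:k]
    true_pos = sum(1 for gene in top if gene in gold)
    precision = true_pos / len(top) if top else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: GENE_X and GENE_Y are hypothetical false positives.
ranked = ["TP53", "KRAS", "GENE_X", "BRAF", "GENE_Y"]
gold = {"TP53", "KRAS", "BRAF", "PIK3CA"}
precision, recall, f1 = precision_recall_f1(ranked, gold, k=5)
```

Here three of the five predictions are in the gold standard (precision 3/5) and three of the four gold standard genes are recovered (recall 3/4), giving an F1 score of 2/3.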

Figure 5.

The utility of a consensus approach for driver prediction. A, Comparison with gold standard of known cancer genes (precision, recall, and F1 for the top 10 and 50 predictions) of the cohort-level predictions of ConsensusDriver, the 6 methods that it integrates (CHASM, fathmm, OncodriveFM, MutSigCV, DriverNet, and OncoIMPACT), and consensus gene lists from DriverDB and MutSig. B, Precision (fraction of predictions that belong to the gold standard) as a function of the number of predictions. C, Evaluation of patient-specific predictions for methods presented in Fig. 5A according to various criteria: precision, recall, and F1 score based on comparison with gold standard of known cancer genes (for top 5 predictions); specificity and fraction of patients without false positives (FP) in their top 5 predictions based on the introduction of decoy mutations (10% of the number of mutations in the tumor); the fraction of patients with at least one predicted driver or actionable gene in the top 5 predictions; and recall and accuracy based on comparison with lists of driver and passenger mutations (for top 5 predictions). For each evaluation metric, methods with the highest score are indicated by a tick mark. Shaded cells represent experiments that we were not able to perform on DriverDB and MutSig as they only provide gene lists. D, Scatter plot depicting the number of known cancer genes and actionable genes in the top 5 patient-specific predictions of ConsensusDriver across cancer types.

At the sample-specific level, ConsensusDriver is largely better than individual methods across metrics [e.g., it provides a 1.5× improvement over OncoIMPACT in recall (one-sided Wilcoxon rank sum test P < 2 × 10⁻¹⁶) and a 1.35× improvement in F1 score (one-sided Wilcoxon rank sum test P = 2.4 × 10⁻¹⁶); Fig. 5C], with the exception of precision (versus MutSigCV and OncodriveFM) and the fraction of patients without false-positive predictions (versus MutSigCV). It arguably provides a better trade-off, though, by improving the fraction of samples with a predicted driver gene (from 0.8 for MutSig to 0.99) and with predicted actionable driver genes (from 0.36 for MutSigCV to 0.67). This improved sensitivity is also accompanied by high specificity (0.99) for ConsensusDriver (Fig. 5C).

Evaluation of methods for their ability to predict driver mutations showed that, despite being trained with a driver gene list, ConsensusDriver exhibits high recall (91% vs. 79% for DriverNet) as well as accuracy (93% vs. 87% for DriverNet; Fig. 5C). For genes mutated at a low frequency in the cohort (<2%), ConsensusDriver is able to correctly leverage the predictions of some of its constituent methods to identify known driver mutations (Supplementary Figs. S20 and S21). In addition, it retains higher discriminatory power than other methods in distinguishing driver from passenger mutations when restricted to known cancer genes (accuracy = 0.89 vs. 0.76 for DriverNet; Supplementary Fig. S22). An illustration of this can be seen for the NRAS gene in the TCGA gastric cancer cohort (mutation frequency < 2%), where ConsensusDriver predicts the known oncogenic mutations G12C as a driver for one patient and Q61R as a passenger for another patient. Deeper analysis of patient transcriptomes and proteomes reveals that the G12C mutation is accompanied by NRAS upregulation, significant deregulation of the RAS signaling pathway (12/71 genes from associated OncoIMPACT module, hypergeometric test P < 0.05), and on average a 2.7-fold increase in AKT phosphorylation over NRAS wild-type samples. In contrast, Q61R is accompanied by NRAS downregulation and little or no effect on RAS pathway expression and AKT phosphorylation. The notion that G12C and Q61R may be driver and passenger mutations, respectively, in the context of gastric cancer is further supported by analysis in an independent cohort of 167 patients that showed poor prognosis associated with NRAS codon 12/13 mutations and no patients with NRAS codon 61 mutations (47).
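For readers unfamiliar with the hypergeometric enrichment test mentioned above, the following sketch shows how an upper-tail P value of this kind can be computed. Only the module size (71) and the number of deregulated module genes (12) come from the text; the background size and the total number of deregulated genes are hypothetical placeholders, as is the function name, so the resulting value illustrates the calculation and does not reproduce the reported result.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: number of genes in the background
    successes:  number of deregulated genes in the background
    draws:      number of genes in the module being tested
    k:          number of deregulated genes observed in the module
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# 12 of 71 module genes deregulated (from the text); the background of
# 20,000 genes with 1,000 deregulated is a hypothetical assumption.
p_value = hypergeom_upper_tail(k=12, population=20000, successes=1000, draws=71)
```

With these placeholder background numbers, about 3.6 deregulated genes would be expected in a module of 71 by chance, so observing 12 yields a small upper-tail probability.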

The additional sensitivity of ConsensusDriver helped establish that, with the exception of THCA, PRAD, and KIRP, most of the patients analyzed here have at least one known cancer gene in their predictions (Fig. 5D). The fraction of patients with actionable predicted driver genes is, however, lower, as many known driver genes are still not targetable (e.g., ovarian cancer, where most patients harbor a TP53 mutation, exhibits the lowest fraction of patients with a predicted actionable gene).

We provide the first systematic evaluation of different classes of driver prediction methods over a large number of cancer types. As the community still lacks standard evaluation protocols, we identified various criteria to evaluate predictions at the cancer-type level (concordance and sensitivity over known cancer genes, and stability/recovery of predictions upon subsampling) and at the patient-specific level (number of driver genes per patient, concordance with gold standard lists, robustness to noise mutations). The availability of our preformatted datasets, predictions from evaluated methods, and a package of tools to study new predictions provides a useful resource and a standardized framework to evaluate any newly developed method against a diverse panel of state-of-the-art methods and on a large number of cancer types.

A key result of our analysis is that no single method (or category of methods) generally outperforms the others; instead, there are specific pros and cons that need to be taken into consideration when selecting a method for analyzing new datasets. For example, FIC methods are more appropriate for the analysis of a small number of samples when only exome data are available, while CBA methods should be selected for large-scale exome sequencing datasets, and INA methods provide greater sensitivity when genomic and transcriptomic data are available. In general, our study highlights the value of integrative methods: for example, methods that are restricted to point mutations, not surprisingly, have a large drop in sensitivity in cancer types with a significant number of CNA events. In the ovarian cancer dataset, the best CBA method predicts only 3 known cancer genes compared with 18 using the best INA method. Furthermore, INA methods that integrate expression data (DriverNet, DawnRank, and OncoIMPACT) show, in most analyses, better results than methods that analyze only genomic data (NetBox and HotNet2). Further work is thus needed in this area, particularly in developing methods that incorporate information from other data types (e.g., miRNA-seq) and other mutation types (e.g., noncoding mutations).

Our study also provides a detailed analysis of the driver predictions at the patient level. It highlights the robustness (low false positive rate, high concordance with the gold standard) of driver gene predictions, but also the lack of sensitivity (significant fraction of patients with 0 to 1 driver predicted) of the vast majority of methods, with methods integrating expression having the best performance. In terms of prioritizing actionable genes or predicting driver mutations, most methods have even more severe limitations and integrating methods with different underlying models could help ameliorate this problem.

There are several limitations to our work. First, we limited our analysis to a single data source (TCGA, which currently provides the most comprehensive coverage of cancer types with genomic and transcriptomic data) and to a set of well-cited methods with software implementations that we were able to use successfully. Second, our evaluations were based on gold standard lists of driver genes and driver mutations that are not cancer-type or patient specific. They thus do not necessarily reflect the heterogeneity of cancers and lack direct evidence that a specific mutation has a functional role in a particular tumor. Other large-scale initiatives have tried to bridge this gap and provide cell line–specific shRNA (48) or drug resistance profiles (49). The results of these studies could potentially be used to generate more refined gold standards. However, such analysis will not come without drawbacks, as (i) the cell lines used typically do not have normal controls and thus mutation calls can be error-prone and (ii) the experiments are limited to measuring cell growth and thus miss other relevant phenotypes (e.g., motility, invasiveness). Nevertheless, large, experimentally derived, and patient-specific gold standard driver mutation lists are needed to further advance the development and evaluation of new driver prediction methods.

Overall, our study highlights that while existing driver prediction methods can have limited sensitivity as a function of the data types and modeling assumptions used, their diversity in fact provides an avenue for better consensus methods, as demonstrated by the novel consensus method proposed here (ConsensusDriver). Development of methods that harness new sources of information might thus provide greater benefits than refinement of existing paradigms for driver discovery. The design of a consensus-based approach requires careful selection of methods, as demonstrated by the poor performance of Borda using all methods. The methods that are combined should be orthogonal enough to produce different sets of false positives and ideally sensitive enough to provide an intersecting set of true positives. The leave-one-out based selection approach used for ConsensusDriver allowed us to perform this task automatically, and helped select a set of methods with complementary strengths: high specificity of CBA methods, high sensitivity of FIC methods, and integration of different mutation types in addition to high sensitivity in INA methods. Our extensive evaluations suggest that ConsensusDriver provides a good trade-off between sensitivity and specificity not only for cohort-level cancer gene prediction, but also for the prediction of patient-specific driver mutations. In the context of precision oncology, ConsensusDriver's ability to integrate information across methods and accurately differentiate oncogenic from nononcogenic mutations, even for genes mutated in a single patient (Supplementary Fig. S21), should be very useful. Note that ConsensusDriver is not fundamentally limited to the analysis of bulk tumors at a single time point, and can be applied to longitudinal as well as spatially related data, including single-cell datasets for which mutation and transcriptome information are available (50). From a practical point of view, we provide an easy-to-use package to run 18 different driver prediction methods, as well as to aggregate their results into consensus predictions that are largely superior to the individual methods, thus serving as a valuable toolbox for precision oncology efforts.

A toolbox that contains scripts to reproduce results presented in this paper and to evaluate results from newly developed methods is available at https://github.com/CSB5/driver_evaluation. The site also contains results for each driver prediction method on all fifteen cancer types and the necessary input files (such as normalized expression, differential expression, mutations, and copy number alteration lists). The ConsensusDriver package is freely available under the MIT license at https://hub.docker.com/r/csb5gis/consensusdriver and allows users to run individual driver prediction methods as well as the consensus algorithm.

No potential conflicts of interest were disclosed.

Conception and design: D. Bertrand, S. Drissler, N. Nagarajan

Development of methodology: D. Bertrand, S. Drissler, B.K. Chia, N. Nagarajan

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D. Bertrand, S. Drissler, B.K. Chia, I.B. Tan, N. Nagarajan

Writing, review, and/or revision of the manuscript: D. Bertrand, B.K. Chia, J.Y. Koh, C. Suphavilai, I.B. Tan, N. Nagarajan

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): B.K. Chia, J.Y. Koh, C. Li

Study supervision: N. Nagarajan

We thank Dr. Anders Skanderup and Dr. Asif Javed for insightful comments and suggestions on the manuscript. This work was supported by funding from the Agency for Science, Technology and Research (A*STAR), Singapore.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature 2009;458:719–24.
2. Vogelstein B, Kinzler KW. Cancer genes and the pathways they control. Nat Med 2004;10:789–99.
3. Garraway LA, Lander ES. Lessons from the cancer genome. Cell 2013;153:17–37.
4. Bashashati A, Haffari G, Ding J, Ha G, Lui K, Rosner J, et al. DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer. Genome Biol 2012;13:R124.
5. Hou JP, Ma J. DawnRank: discovering personalized driver genes in cancer. Genome Med 2014;6:56.
6. Bertrand D, Chng KR, Sherbaf FG, Kiesel A, Chia BKH, Sia YY, et al. Patient-specific driver gene prediction and risk assessment through integrated network analysis of cancer omics profiles. Nucleic Acids Res 2015;43:e44.
7. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA, Kinzler KW. Cancer genome landscapes. Science 2013;339:1546–58.
8. Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA, Golub TR, et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 2014;505:495–501.
9. Garraway LA. Genomics-driven oncology: framework for an emerging paradigm. J Clin Oncol 2013;31:1806–14.
10. Garay JP, Gray JW. Omics and therapy - a basis for precision medicine. Mol Oncol 2012;6:128–39.
11. Hortobagyi GN. Trastuzumab in the treatment of breast cancer. N Engl J Med 2005;353:1734–6.
12. Gunturu KS, Woo Y, Beaubier N, Remotti HE, Saif MW. Gastric cancer and trastuzumab: first biologic therapy in gastric cancer. Ther Adv Med Oncol 2013;5:143–51.
13. Gonzalez-Perez A, Mustonen V, Reva B, Ritchie GRS, Creixell P, Karchin R, et al. Computational approaches to identify functional genetic variants in cancer genomes. Nat Methods 2013;10:723–9.
14. Cheng F, Zhao J, Zhao Z. Advances in computational approaches for prioritizing driver mutations and significantly mutated genes in cancer genomes. Brief Bioinform 2016;17:642–56.
15. Ng PC. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 2003;31:3812–4.
16. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat Methods 2010;7:248–9.
17. Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods 2014;11:361–2.
18. Reva B, Antipin Y, Sander C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 2011;39:e118.
19. Carter H, Chen S, Isik L, Tyekucheva S, Velculescu VE, Kinzler KW, et al. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res 2009;69:6660–7.
20. Gonzalez-Perez A, Deu-Pons J, Lopez-Bigas N. Improving the prediction of the functional impact of cancer mutations by baseline tolerance transformation. Genome Med 2012;4:89.
21. Shihab HA, Gough J, Cooper DN, Stenson PD, Barker GLA, Edwards KJ, et al. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat 2013;34:57–65.
22. Martelotto LG, Ng CK, De Filippo MR, Zhang Y, Piscuoglio S, Lim RS, et al. Benchmarking mutation effect prediction algorithms using functionally validated cancer-related missense mutations. Genome Biol 2014;15:484.
23. Gnad F, Baucom A, Mukhyala K, Manning G, Zhang Z. Assessment of computational methods for predicting the effects of missense mutations in human cancers. BMC Genomics 2013;14 Suppl 3:S7.
24. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 2013;499:214–8.
25. Dees ND, Zhang Q, Kandoth C, Wendl MC, Schierding W, Koboldt DC, et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res 2012;22:1589–98.
26. Tamborero D, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 2013;29:2238–44.
27. Gonzalez-Perez A, Lopez-Bigas N. Functional impact bias reveals cancer drivers. Nucleic Acids Res 2012;40:e169.
28. Mermel CH, Schumacher SE, Hill B, Meyerson ML, Beroukhim R, Getz G. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 2011;12:R41.
29. Reimand J, Bader GD. Systematic analysis of somatic mutations in phosphorylation signaling predicts novel cancer drivers. Mol Syst Biol 2013;9:637.
30. Tokheim CJ, Papadopoulos N, Kinzler KW, Vogelstein B, Karchin R. Evaluating the evaluation of cancer driver genes. Proc Natl Acad Sci U S A 2016;113:14330–5.
31. Akavia UD, Litvin O, Kim J, Sanchez-Garcia F, Kotliar D, Causton HC, et al. An integrated approach to uncover drivers of cancer. Cell 2010;143:1005–17.
32. Tamborero D, Lopez-Bigas N, Gonzalez-Perez A. Oncodrive-CIS: a method to reveal likely driver genes based on the impact of their copy number changes on expression. PLoS One 2013;8:e55489.
33. Hautaniemi S, Ringnér M, Kauraniemi P, Autio R, Edgren H, Yli-Harja O, et al. A strategy for identifying putative causes of gene expression variation in human cancers. J Franklin Inst 2004;341:77–88.
34. Louhimo R, Lepikhova T, Monni O, Hautaniemi S. Comparative analysis of algorithms for integration of copy number and expression data. Nat Methods 2012;9:351–5.
35. Cerami E, Demir E, Schultz N, Taylor BS, Sander C. Automated network analysis identifies core pathways in glioblastoma. PLoS One 2010;5:e8918.
36. Leiserson MDM, Vandin F, Wu H-T, Dobson JR, Eldridge JV, Thomas JL, et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat Genet 2014;47:106–14.
37. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, et al. A census of human cancer genes. Nat Rev Cancer 2004;4:177–83.
38. Santarius T, Shipley J, Brewer D, Stratton MR, Cooper CS. A census of amplified and overexpressed human cancer genes. Nat Rev Cancer 2010;10:59–64.
39. Pletscher-Frankild S, Pallejà A, Tsafou K, Binder JX, Jensen LJ. DISEASES: Text mining and data integration of disease–gene associations. Methods 2015;74:83–9.
40. An O, Pendino V, D'Antonio M, Ratti E, Gentilini M, Ciccarelli FD. NCG 4.0: the network of cancer genes in the era of massive mutational screenings of cancer genomes. Database (Oxford) 2014;2014:bau015.
41. Rubio-Perez C, Tamborero D, Schroeder MP, Antolín AA, Deu-Pons J, Perez-Llamas C, et al. In silico prescription of anticancer drugs to cohorts of 28 tumor types reveals targeting opportunities. Cancer Cell 2015;27:382–96.
42. Zack TI, Schumacher SE, Carter SL, Cherniack AD, Saksena G, Tabak B, et al. Pan-cancer patterns of somatic copy number alteration. Nat Genet 2013;45:1134–40.
43. Redig AJ, Jänne PA. Basket trials and the evolution of clinical trial design in an era of genomic medicine. J Clin Oncol 2015;33:975–7.
44. Ascierto PA, Kirkwood JM, Grob J-J, Simeone E, Grimaldi AM, Maio M, et al. The role of BRAF V600 mutation in melanoma. J Transl Med 2012;10:85.
45. Massacesi C, Di Tomaso E, Urban P, Germa C, Quadt C, Trandafir L, et al. PI3K inhibitors as new cancer therapeutics: implications for clinical trial design. Onco Targets Ther 2016;9:203–10.
46. Cheng W-C, Chung I-F, Chen C-Y, Sun H-J, Fen J-J, Tang W-C, et al. DriverDB: an exome sequencing database for cancer driver gene identification. Nucleic Acids Res 2014;42:D1048–54.
47. Takahashi N, Yamada Y, Taniguchi H, Fukahori M, Sasaki Y, Shoji H, et al. Clinicopathological features and prognostic roles of KRAS, BRAF, PIK3CA and NRAS mutations in advanced gastric cancer. BMC Res Notes 2014;7:271.
48. Cowley GS, Weir BA, Vazquez F, Tamayo P, Scott JA, Rusin S, et al. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Sci Data 2014;1:140035.
49. Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW, et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 2012;483:570–5.
50. Baslan T, Hicks J. Unravelling biology and shifting paradigms in cancer with single-cell sequencing. Nat Rev Cancer 2017;17:557–69.