Abstract
Major Finding: Driver mutations were identified in cryptic splice regions, 5′ UTRs, and rarely mutated genes.
Approach: Probabilistic deep learning enabled genome-wide modeling of mutation rates in different cancers.
Impact: This work introduces a publicly available resource to identify drivers of cancer genome-wide.
Computational methods to discover key drivers of cancer have provided insights into mechanisms of tumor progression and highlighted opportunities for therapeutic intervention. The identification of driver mutations requires distinguishing between somatic mutations that undergo positive selection within tumors and neutral mutations that reflect background mutational processes. Because mutation rates vary widely depending on genomic context, efforts to characterize driver mutations have often been limited to protein-coding sequences and specific noncoding elements. To address these limitations, Sherman, Yaari, Priebe, and colleagues developed a probabilistic deep learning model that identifies driver mutations by estimating the rate of somatic mutation across the entire genome in a given type of cancer. Integrating datasets of somatic mutations spanning 37 tumor types from the Pan-Cancer Analysis of Whole Genomes with epigenetic information from healthy tissues, the model predicted mutation rates based on properties that impact DNA repair and cancer-specific mutational processes. After generating a cancer-specific genome-wide map of mutation rates, the model was used to search for single-nucleotide variants (SNV) with evidence of positive selection; it not only exceeded the performance of more limited methods but also required less runtime. At the pan-cancer level, SNVs at intronic cryptic splice sites in tumor suppressor genes such as TP53 and SMAD4 occurred significantly more often than expected given baseline mutation rates, whereas oncogenes were not enriched for intronic cryptic splice SNVs, suggesting that positively selected cryptic splice mutations likely cause a loss of function. In addition to cryptic splice sites, the model also identified candidate driver mutations within the 5′ untranslated regions (UTR) of TP53 and the transcription factor ELF3.
Notably, the model enabled identification of rare driver genes, showing that the distribution of SNVs within a gene that is a rare driver in one cancer mirrored the distribution seen in a cancer in which that gene is a more common driver. In summary, this study provides a tool to enhance the understanding of putative drivers of cancer throughout the genome.
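The general logic summarized above, comparing the number of SNVs observed in a region with the number expected under a background mutation rate, can be illustrated with a simple one-sided Poisson test. This is only a minimal sketch and not the authors' model: the published method uses a probabilistic deep learning model to estimate the background rate, whereas here the expected count is simply supplied as a number. The function name `poisson_upper_tail` and the example counts are hypothetical.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    Illustrative stand-in for an excess-of-mutations test: a small value
    means the region carries more SNVs than the background rate predicts.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    if observed <= 0:
        return 1.0
    # Accumulate P(X <= observed - 1) term by term, then take the complement.
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical numbers: a region where the background model expects 2 SNVs
# across the cohort, but 10 are observed.
p_value = poisson_upper_tail(observed=10, expected=2.0)
print(f"upper-tail p-value: {p_value:.3g}")
```

In practice, a test of this kind would be repeated across many regions, so the resulting p-values would need correction for multiple testing before any region is called a candidate driver.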
Note: Research Watch is written by Cancer Discovery editorial staff. Readers are encouraged to consult the original articles for full details. For more Research Watch, visit Cancer Discovery online at http://cancerdiscovery.aacrjournals.org/CDNews.