Gene fusions encode oncogenic drivers in hematological and solid tumors and are often associated with dramatic clinical responses to the appropriate targeted agents. In principle, massively parallel paired-end sequencing can identify structural rearrangements in tumor genomes and transcriptomes. However, computational methods for identifying gene fusions are varied, still evolving, and largely trained on cell line data. We sought to develop systematic methods to characterize known oncogenic gene fusions and to discover novel gene fusions in cancer. RNA-Seq data for approximately 3,400 clinical cases from 16 cancer types were obtained from the Cancer Genomics Hub (CGHub) of The Cancer Genome Atlas (TCGA). We surveyed the performance of several gene fusion callers and chose two (deFuse and TopHat) for further method development. An analysis pipeline was developed and executed in parallel on a high-performance computing cluster. Filtering and annotation were conducted on the aggregated data as a post-processing step to enable exploratory analyses of various filters. We optimized filtering approaches on datasets that included known standards (e.g., TMPRSS2-ERG in prostate adenocarcinoma and PML-RARA in acute myeloid leukemia) to enrich for these and other gene fusions with correct 5’-3’ orientation while excluding cases with ambiguous breakpoints and spanning reads, alignment errors, and read-through transcripts from adjacent genes. Predicted fusions were summarized within each disease by the occurrence of unique genes participating in fusions with multiple partners and of unique gene pairs. Elevated expression was observed downstream of the predicted breakpoint of the 3’ gene in cases positive for predicted fusions, which added important confirmatory evidence.
Using this approach, we characterized the incidence and distribution of 31 known gene fusions, including the oncogenic fusions EML4-ALK and CCDC6-RET, and expanded the number of gene partners identified in combination with oncogenes such as ROS1. We also nominated over 100 novel gene fusion pairs. One example of a novel gene fusion susceptible to available targeted therapy was FGFR3-TACC3, found in 4% of bladder cancers, 2% of squamous cell lung carcinomas, and 1% each of glioblastomas and head and neck squamous cell carcinomas. Computational methods are now poised to complement biochemical approaches in the definition of the gene fusion landscape in cancer.
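The post-processing idea described above (filter aggregated fusion calls, then summarize recurrence within each disease) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the record fields, the function names, and the minimum spanning-read threshold of 3 are all assumptions chosen for illustration, and real deFuse or TopHat output would first need to be parsed into this shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCall:
    """One predicted fusion in one sample. All field names are hypothetical."""
    sample: str
    disease: str
    gene5: str                  # 5' partner gene
    gene3: str                  # 3' partner gene
    spanning_reads: int         # read pairs supporting the junction
    correct_orientation: bool   # True if the 5'-3' orientation is consistent
    adjacent_genes: bool        # True if partners are neighbors (read-through candidate)


def passes_filters(call, min_spanning_reads=3):
    """Keep calls with enough read support, correct orientation, and no read-through.

    The default threshold is an arbitrary illustrative value, not one from the study.
    """
    return (
        call.spanning_reads >= min_spanning_reads
        and call.correct_orientation
        and not call.adjacent_genes
    )


def summarize(calls, min_spanning_reads=3):
    """Summarize filtered calls within each disease.

    Returns two dicts:
      pair_counts[(disease, gene5, gene3)] -> number of distinct positive samples
      partner_counts[(disease, gene)]      -> number of distinct fusion partners
    """
    pair_samples = {}
    partners = {}
    for c in calls:
        if not passes_filters(c, min_spanning_reads):
            continue
        pair_samples.setdefault((c.disease, c.gene5, c.gene3), set()).add(c.sample)
        partners.setdefault((c.disease, c.gene5), set()).add(c.gene3)
        partners.setdefault((c.disease, c.gene3), set()).add(c.gene5)
    pair_counts = {key: len(samples) for key, samples in pair_samples.items()}
    partner_counts = {key: len(genes) for key, genes in partners.items()}
    return pair_counts, partner_counts
```

Counting distinct samples rather than raw calls keeps a fusion reported by both callers, or reported twice in one case, from inflating its apparent incidence. Dividing a pair count by the number of cases profiled for that disease would give an incidence estimate of the kind reported above.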
Citation Format: Seth E. Sadis, Nickolay A. Khazanov, Armand R. Bankhead, Dinesh Cyanam, Paul D. Williams, Sean F. Eddy, Peter J. Wyngaard, Daniel R. Rhodes. High-throughput, systematic analysis of paired-end next-generation sequencing data to characterize the gene fusion landscape in cancer. [abstract]. In: Proceedings of the 104th Annual Meeting of the American Association for Cancer Research; 2013 Apr 6-10; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2013;73(8 Suppl):Abstract nr 3173. doi:10.1158/1538-7445.AM2013-3173