Abstract
Over the past several years, cancer genome characterization initiatives such as The Cancer Genome Atlas, the International Cancer Genome Consortium, and the Tumor Sequencing Project have produced an explosion of genomic data. The pace of data production has accelerated with the adoption of next-generation sequencing technologies and with large-scale efforts to discover the breadth of genomic variation in humans. Comprehensive analysis of these datasets requires the coordinated use of Web-based data repositories and applications, desktop analysis tools and visualizers, and single-purpose algorithms. However, the effort required to transfer data between tools, convert between data formats, and manage results often prevents researchers from using the wealth of methods available to them. Many integrative genomics and translational “bench to bedside” discoveries are possible with combinations of existing tools, but the transitions required between those tools put such analyses out of reach for most researchers. Cloud technologies have produced a new wave of applications that transfer next-generation sequence data directly from sequencers to a compute cloud for storage and analysis, but these systems are at an early stage of maturity, and many are tied to commercial sequencing vendors.
GenomeSpace, http://www.genomespace.org, is an environment that brings together diverse computational tools, enabling scientists without programming skills to combine their capabilities easily. It aims to offer a common space in which to create, manipulate, and share an ever-growing range of genomic analysis tools. GenomeSpace supports cloud-based data storage and analysis, multi-tool analytic workflows, automatic conversion between data formats, and straightforward connection of new tools to the environment. A set of six “GenomeSpace-enabled” seed tools developed by collaborating organizations provides a comprehensive platform for the analysis of cancer data: Cytoscape (UCSD), Galaxy (Penn State University), GenePattern (Broad Institute), Genomica (Weizmann Institute), Integrative Genomics Viewer (Broad Institute), and the UCSC Genome Browser (UCSC). The extensible design of the system has enabled a wider range of cancer analyses through the addition of ArrayExpress (European Bioinformatics Institute), InSilico DB (University of Brussels), geWorkbench (Columbia University), and Cistrome (Dana-Farber Cancer Institute).
We show how researchers can use GenomeSpace to effortlessly combine the capabilities of all of these tools in several cancer research scenarios.
Citation Format: Michael Reich, John Liefeld, Helga Thorvaldsdottir, Marco Ocana, Thorin Tabor, DK Jang, Jill P. Mesirov. GenomeSpace: an environment for frictionless bioinformatics. [abstract]. In: Proceedings of the 104th Annual Meeting of the American Association for Cancer Research; 2013 Apr 6-10; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2013;73(8 Suppl):Abstract nr 5141. doi:10.1158/1538-7445.AM2013-5141