Abstract
Reproducibility is essential to Open Science. If a finding cannot be reproduced by independent research groups, its relevance is extremely limited, regardless of its validity. It is therefore crucial for scientists to describe their experiments in sufficient detail so that they can be reproduced, challenged, and built upon. However, due to recent technological advances in the biological and computational sciences, experimental protocols, data analysis, and interpretation have become increasingly complex. This has made reproducing research findings more challenging, with some researchers going as far as suggesting that the biomedical sciences are experiencing a "reproducibility crisis". To address these issues, we developed ORCESTRA, a cloud-based platform that provides a transparent, reproducible, and flexible computational framework for processing and sharing high-throughput multimodal biomedical data. The platform processes genomic and pharmacological profiles of cancer samples through automated pipelines executed by Pachyderm, a data versioning and orchestration tool. ORCESTRA creates an integrated and fully documented data object known as a PharmacoSet (PSet), which can be used for future analyses with the Bioconductor PharmacoGx package. A PSet includes cell line and drug annotations, along with molecular and pharmacological data from the largest studies and consortia. Our platform is currently being expanded to additional data types, including toxicogenomic data, xenograft pharmacogenomic data, radiomics, and clinical genomic data. The automated pipelines can be accessed via a web interface (www.orcestra.ca). Users can view and download existing datasets or request a new one by selecting pipeline parameters. The web application provides features to improve the user experience and to accommodate different scenarios for ORCESTRA deployment.
These include a personal account for saving PSets, a dashboard for checking the status of a requested pipeline, email notification upon pipeline completion, handling of pipeline requests while the Pachyderm cluster is offline, and a “manual push” of those requests once the cluster is back online.
Funding: This project is supported by CIHR, under the frame of ERA PerMed.
Citation Format: Anthony Mammoliti, Petr Smirnov, Minoru Nakano, Zhaleh Safikhani, Sisira Nair, Arvind Singh Mer, Chantal Ho, Gangesh Beri, Benjamin Haibe-Kains. ORCESTRA: A platform for orchestrating and sharing high-throughput multimodal data analyses [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PR-07.