The Cancer Genomics Cloud (CGC), powered by Seven Bridges, makes large datasets accessible for analysis from any environment, which is critical as the volume of data continues to grow. This growth in the size and complexity of datasets, and in the number of locations where they are stored, has been especially prominent in single-cell genomics and can complicate data analysis. Here we combined the rich datasets available through the Human Cell Atlas (HCA) with the tools and computational resources of the CGC to give any scientist a path to explore complex biological questions. To demonstrate these capabilities, we processed a single-cell dataset of immune and stromal cells isolated from the tumor and lymph nodes of a mouse model of melanoma, using an optimized version of the SmartSeq2 pipeline recommended by the HCA. The dataset contains 13.3k paired-end sequencing files for an estimated 6.6k cells, with a total size of 371.60 GB. The entire dataset was transferred from the Human Cell Atlas Data Portal to the CGC, where it is accessible to all users.
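As a quick sanity check on the reported dataset size, the per-file and per-cell figures can be derived from the numbers above. This is a back-of-the-envelope sketch only: it assumes decimal units (1 GB = 1000 MB) and that each cell contributes one read-1 and one read-2 FASTQ file, and it uses the rounded counts as reported ("13.3k" files, "6.6k" estimated cells), so the results are approximate.

```python
# Figures reported in the abstract; file and cell counts are rounded ("13.3k", "6.6k").
TOTAL_SIZE_GB = 371.60
N_FILES = 13_300
N_CELLS = 6_600  # estimated cell count


def dataset_summary(total_gb: float, n_files: int, n_cells: int) -> dict:
    """Approximate per-file and per-cell figures.

    Assumption: decimal units, i.e. 1 GB = 1000 MB.
    """
    total_mb = total_gb * 1000
    return {
        # Close to 2 if each cell has one read-1 and one read-2 file (paired-end).
        "files_per_cell": n_files / n_cells,
        "mb_per_file": total_mb / n_files,
        "mb_per_cell": total_mb / n_cells,
    }


summary = dataset_summary(TOTAL_SIZE_GB, N_FILES, N_CELLS)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

Under these assumptions the ratio of files to cells comes out at about 2, consistent with paired-end sequencing of each cell, and the average file is roughly 28 MB (about 56 MB per cell).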
The goal of this study was to process a single-cell dataset of the tumor microenvironment with the SmartSeq2 pipeline and to investigate changes in the transcriptome profiles of endothelial tumor cells over time. This approach differs from that of the original study, which aimed to analyze the immune and stromal cells (Davidson et al. bioRxiv 2018). To align and quantify the raw sequencing reads in the CGC execution environment, we implemented the SmartSeq2 pipeline in the Common Workflow Language. The quantification results were used for cell clustering, followed by pseudotime analysis of the endothelial tumor cells. For the clustering and pseudotime analyses, we created interactive notebooks in the RMarkdown format that can be executed in the RStudio environment on the CGC. In summary, this work demonstrates the value of reproducible workflows that run in multiple environments, and of data sharing, which can bring new insights to light from existing data.
Citation Format: Nemanja Vucic, Manisha Ray, Dalibor Veljkovic, Stefan Cidilko, Brandi Davis-Dusenbery. Single-cell analysis on the Cancer Genomics Cloud reveals changes in the transcriptome profiles of endothelial tumor cells over time - novel insights from a public dataset of a mouse model of melanoma [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 4414.