Introduction. The NCI-funded Cancer Genomics Cloud (CGC) and the NCI Predictive Oncology Model and Data Clearinghouse (MoDaC) advance NCI computing infrastructure and tools that aim to reduce the burden of cancer on patients. The CGC provides a collaborative, cloud-based computational infrastructure that co-locates computation, bioinformatics workflows, and more than 3 PB of data for researchers. MoDaC provides a publicly available resource generated from the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) Program and the Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium. The NCI-sponsored MoDaC program aims to apply machine learning toolsets to identifying novel treatments for cancer patients. We present our progress in collaborating with MoDaC to make machine learning models available on the CGC.

Methods. Our bioinformatics teams migrated MoDaC tools to the CGC, defined standards for releasing models, and collected recommendations for decreasing the time required to make MoDaC tools available on the CGC. We mirrored the ATOM Modeling Pipeline (AMPL), a drug discovery platform, on the CGC by making its Git repository accessible in the cloud through Jupyter Notebook, which supports interactive analysis. We translated AMPL and JDACS4C models into Common Workflow Language (CWL). Converting these machine learning (ML) models into CWL supports reproducible execution, scalable deployment, and computational portability.
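
As an illustration of what wrapping a model in CWL can look like, the following minimal Python sketch builds a CWL v1.2 CommandLineTool description and serializes it as JSON, which CWL accepts alongside YAML. It is not the description used for the AMPL or JDACS4C models: the script name `predict.py`, the container image, the `--model` and `--input` flags, and the output file name `predictions.csv` are all hypothetical placeholders.

```python
import json


def make_cwl_tool(script_name, docker_image):
    """Build a minimal CWL v1.2 CommandLineTool description as a dict.

    All names passed in or used below (script, image, flags, output file)
    are hypothetical placeholders, not taken from the MoDaC release.
    """
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        # Command run inside the container: python <script_name> ...
        "baseCommand": ["python", script_name],
        # Pinning a container image is what makes the run portable.
        "requirements": {"DockerRequirement": {"dockerPull": docker_image}},
        "inputs": {
            # Trained model file, passed as: --model <path>
            "model": {"type": "File", "inputBinding": {"prefix": "--model"}},
            # Compounds to score, passed as: --input <path>
            "compounds": {"type": "File", "inputBinding": {"prefix": "--input"}},
        },
        "outputs": {
            # File the script is assumed to write in its working directory.
            "predictions": {
                "type": "File",
                "outputBinding": {"glob": "predictions.csv"},
            },
        },
    }


tool = make_cwl_tool("predict.py", "example.org/hypothetical-model:latest")
cwl_text = json.dumps(tool, indent=2)
print(cwl_text)
```

Because the description declares its inputs, outputs, and container explicitly, any CWL-compliant runner can execute the same tool without changes to the model code, which is the portability property noted above.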

Results. The AMPL drug discovery platform on the CGC includes data ingestion and curation, featurization, model training and tuning, prediction generation, and visualization and analysis functionality, available as Jupyter notebooks and CWL workflows. The release consists of cheminformatics tools for integrating cancer treatment features into deep-learning graph models. JDACS4C ML models ported to the CGC include classifiers (tumor and normal-tumor pairs), autoencoders (gene expression), drug response predictors (single and combination), and multitask convolutional neural networks (information extraction from cancer pathology reports).

Conclusion. We adapted the MoDaC drug discovery and machine learning tools into cloud-native resources on the CGC, supporting interactive and GUI-driven analysis. The release supports both technical users and users new to machine learning, opening access to a broader user base than those who have traditionally had access to ML toolsets. These MoDaC toolsets will support pre-clinical study evaluation, treatment identification, and experimental design. Moreover, existing MoDaC-AMPL tutorials on the CGC support distributed ML drug discovery training. Lastly, collaborating with MoDaC teams identified standardization approaches that can reduce the time and effort needed to make these tools widely available across the NIH-NCI computational infrastructure.

Citation Format: Soner Koc, Vojislav Varjacic, Miona Rankovic, Marijeta Slavkovic-Ilic, Aleksandar Danicic, Sean Black, Naomi Ohashi, Titli Sarkar, Zelia Worman, Jack DiGiovanna, Brandi Davis-Dusenbery, Dennis A. Dean. Collaborating to ensure data-driven drug discovery on the Cancer Genomics Cloud: Realizing the possibilities for MoDaC and ATOM. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5356.