Abstract
Current high-throughput ligand-based and structure-based virtual screening methods typically rely on identifying a specific functional target for the screen; however, the relationship between cancer genetic phenotype and the specific functional target is not fully understood at the outset of drug development. Cancer cell line (CCL) screening panels are routinely used to study the relationship between cellular genetics and ligand structure. We propose combining the speed of machine learning-based virtual screening techniques with the more than four million in vitro cancer cell line data points generated by the NCI-60, GDSC, and CTRP cancer cell line screens. We train a deep regression neural network on an integrated CCL dataset to predict the area under the dose-response curve (AUC). Molecules are characterized using descriptors from the Mordred package and a message-passing network applied to the molecular structure, and cell lines are represented by LINCS100 RNA-seq and SNP quantification data. We present two styles of validation: one on randomly selected drugs unseen during training, and one on unseen scaffolds. On unseen drugs, the enrichment factor is 42% for the top 1% of predictions and the BEDROC score is 0.25, where active compounds are defined as those in the bottom 2% of the overall AUC distribution from the NCI-60. Using the model as a screening tool in inference mode, we scored 1 million Enamine REAL lead-like compounds on all NCI-60 and CCLE cell lines; the complete screening results (>450M compounds screened) are forthcoming. Initial results show that compounds ranked highly by the model have a higher rate of bioactivity based on PubChem listings alone. We verify the robustness of the virtual screen by identifying functional groups common to highly scored drugs and by showing that predictions are stable under small perturbations to both the featurization and the molecular structure.
For molecules listed in PubChem, the top 10% of predicted compounds had 10 times the bioactivity ranking of the bottom 10% of predictions from our model. We present a new high-throughput method for cancer drug virtual screening using CCL dose-response data. Our method uses aggregated data from in vitro CCL panels to train the dose-response model and is a multiclass model across all common CCLs. While we believe this technique is valuable to the community as a tool for integrating experimental cancer cell line data into existing virtual screening infrastructure, we acknowledge that verification with in vitro testing outside the model validation data needs to be addressed in future work.
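The enrichment-style evaluation described above can be illustrated with a minimal sketch. It assumes the common definition of the enrichment factor (hit rate among the top-ranked predictions divided by the overall hit rate) and assumes that a lower AUC indicates a more active compound; the abstract does not spell out either convention. The function names `label_actives` and `enrichment_factor` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def label_actives(auc, active_quantile=0.02):
    """Flag compounds whose measured AUC lies in the lowest `active_quantile`.

    Assumption: lower AUC means stronger response, so the bottom 2% are "active".
    """
    auc = np.asarray(auc, dtype=float)
    threshold = np.quantile(auc, active_quantile)
    return auc <= threshold


def enrichment_factor(pred_auc, is_active, top_fraction=0.01):
    """Hit rate in the top-ranked fraction divided by the overall hit rate.

    Compounds are ranked by predicted AUC, lowest (most active) first.
    """
    pred_auc = np.asarray(pred_auc, dtype=float)
    is_active = np.asarray(is_active, dtype=bool)
    if not is_active.any():
        raise ValueError("at least one active compound is required")
    n_top = max(1, int(round(top_fraction * pred_auc.size)))
    top_idx = np.argsort(pred_auc, kind="stable")[:n_top]
    hit_rate_top = is_active[top_idx].mean()
    hit_rate_all = is_active.mean()
    return hit_rate_top / hit_rate_all
```

Under this definition, a value of 1 corresponds to random ranking, and with 2% of compounds labeled active the largest possible value for the top 1% of predictions is 50.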
Citation Format: Austin Clyde, Arvind Ramanathan, Rick Stevens. Virtual screening with deep learning using cancer cell line dose-response data [abstract]. In: Proceedings of the AACR Special Conference on Advancing Precision Medicine Drug Development: Incorporation of Real-World Data and Other Novel Strategies; Jan 9-12, 2020; San Diego, CA. Philadelphia (PA): AACR; Clin Cancer Res 2020;26(12_Suppl_1):Abstract nr 36.