Over the past decade, high-throughput molecular profiling technologies have revolutionized cancer research. Petabytes of omics data (e.g., genomic, transcriptomic, proteomic, epigenomic, and metabolomic data) have been generated from thousands of patient, animal model, and cell line samples, notably through large consortium projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). These rich, high-throughput cancer omics data provide an unprecedented opportunity to systematically characterize cancer-related molecular mechanisms and identify biomarkers and therapeutic targets. However, this tidal wave of molecular data also presents a major challenge: cancer researchers must analyze the data and extract meaningful biological and clinical insights efficiently. This is particularly true for the many researchers who have little or no bioinformatics and statistical expertise. Many efforts have been made to overcome this challenge. First, programming languages such as Python, R, and Perl now offer specially designed modules and libraries for the easy analysis and visualization of omics data. However, these tools still require users to acquire programming skills, which is not feasible for most experimental researchers. Second, many web-based and stand-alone bioinformatics databases and applications have been developed to let users explore and analyze cancer omics data through a user-friendly, interactive interface. However, these tools usually focus on one specific type of molecular data, provide only predefined analyses, and do not allow users to customize analytic and visualization tasks. Moreover, users still have to spend considerable time identifying appropriate tools, learning distinct user interfaces and procedures, and keeping track of the status and updates of these rapidly evolving tools.
As a result, a substantial barrier still prevents many cancer researchers from performing cancer omics data analyses in an intuitive, efficient, and reproducible way. To address this challenge, our team has developed DrOncoRight (https://drbioright.org), an open-access, natural language-oriented, artificial intelligence (AI)-driven analytics platform for analyzing and visualizing cancer omics data. A key feature of this platform is that it allows users to ask biological questions in natural language (text or voice). It automatically interprets users' intentions, identifies relevant cancer genomic datasets, performs diverse analyses, and returns the results promptly and in a visually attractive form. Further, the platform improves itself through active learning based on user feedback. Taken together, by combining cutting-edge AI, informatics, and analytic technologies, DrOncoRight represents a revolutionary approach to next-generation cancer science that can greatly increase the efficiency and reproducibility of data analysis in cancer research.
Citation Format: Jun Li, Hu Chen, Yumeng Wang, Mei-Ju Chen, Han Liang. DrOncoRight: A natural language-oriented analytics platform for cancer omics data [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 3383.