Introduction: The druggable genome is defined as a set of roughly 3000 genes encoding proteins for which drugs can be readily designed to modify their activity. Nearly half of the genes of the druggable genome encode G protein-coupled receptors (GPCRs), ion channels, protein kinases, or nuclear receptors. To date, the majority of the genes and proteins from these four families have been scarcely investigated, and therefore little is known about their molecular interactions, biological functions, and roles in disease. Illuminating the Druggable Genome (IDG) is an NIH Common Fund project that supports experimental and computational efforts to investigate under-studied druggable genome targets. Our objective is to perform computational analyses on existing biological data to prioritize under-studied targets based on their likely disease relevance.

Methods: We collected and organized information from a diverse set of 23 resources about GPCRs, ion channels, protein kinases, and nuclear receptors. These resources cover four categories of information: curated physical interactions (e.g. kinase-substrate phosphorylations), literature-derived annotations (e.g. Gene Ontology terms), physical properties (e.g. structural domains), and data from large-scale profiling studies (e.g. gene expression profiles of cancer cell lines and other disease models). We distilled the information from each resource into attribute tables in which the rows are targets and the columns are their attributes. This enabled us to build target and attribute networks for unsupervised learning, and to cross-validate and combine evidence from diverse resources for supervised learning. We integrated kinase similarity matrices using regularized logistic regression to predict phosphorylation reactions between kinases. We also used gene expression data from The Cancer Genome Atlas (TCGA) to perform enrichment analysis against several gene signatures for druggable genome targets, including signatures of differentially expressed genes (DEG) following target knockdown, signatures of DEG following target over-expression, and signatures of proteins reported to interact with targets. We used an unsupervised approach to obtain consensus scores for the relative likelihood that a target is dysregulated in a tumor sample, and then performed hierarchical clustering to match groups of cancer patients with potential novel targets.
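
As a rough illustration of the attribute-table step described above, the following sketch turns a small table (targets as rows, attributes as set members) into a target-by-target similarity matrix. The abstract does not state which similarity measure was used, so the Jaccard index here is an assumption, and the target names, attribute labels, and function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two attribute sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(attribute_table):
    """Build {target: {target: similarity}} from {target: set_of_attributes}.

    Jaccard similarity is an assumed choice, not the measure reported
    in the abstract.
    """
    targets = sorted(attribute_table)
    sim = {t: {t: 1.0} for t in targets}  # each target is identical to itself
    for s, t in combinations(targets, 2):
        score = jaccard(attribute_table[s], attribute_table[t])
        sim[s][t] = score
        sim[t][s] = score  # the matrix is symmetric
    return sim


# Hypothetical toy attribute table: rows are targets, entries are attributes.
toy_table = {
    "KINASE_A": {"GO:1", "GO:2", "GO:3"},
    "KINASE_B": {"GO:2", "GO:3", "GO:4"},
    "KINASE_C": {"GO:5"},
}
toy_sim = similarity_matrix(toy_table)
print(toy_sim["KINASE_A"]["KINASE_B"])  # 0.5 (2 shared attributes out of 4)
```

One such matrix per resource would give the set of similarity matrices that the supervised step combines.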

Results: We computed receiver operating characteristic (ROC) curves to assess the performance of kinase similarity matrices as predictors of kinase-kinase phosphorylation reactions. We investigated the performance of the following similarity matrices, individually and after integration using cross-validated, regularized logistic regression: similarity of DEG after kinase knockdown, phylogenetic similarity, similarity of kinase knockout phenotypes, similarity of curated annotations, and similarity of curated interactions. We found that the predictive performance of the individual similarity matrices varied, with areas under the ROC curve ranging from 0.57 to 0.82, and that integrating the similarity matrices improved performance, with an area under the ROC curve of up to 0.86. The predicted kinome network can be used to prioritize under-studied kinases by searching for connections between kinases known to play a role in cancer or other diseases. Using an unsupervised approach to identify potentially dysregulated targets in patient tumor samples, we found that within a cancer type (e.g. acute myeloid leukemia), clusters of patients emerged with distinct sets of enriched targets. Under-studied genes within these sets are potentially novel targets that may be investigated further to confirm disease relevance and therapeutic potential.
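
The comparison of individual and integrated predictors described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the data are made-up kinase-pair similarity scores, the function names and hyperparameters (`lam`, `lr`, `epochs`) are hypothetical, the model is a plain L2-regularized logistic regression fit by gradient descent, and the combined score is evaluated in-sample rather than by cross-validation.

```python
import math


def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_l2(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Batch gradient descent for L2-regularized logistic regression.

    The penalty lam is applied to the weights only, not the intercept.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


# Made-up data: each row is a kinase pair scored by two similarity matrices;
# label 1 means a phosphorylation reaction between the pair is known.
X = [(0.9, 0.2), (0.2, 0.9), (0.7, 0.6), (0.6, 0.7),
     (0.5, 0.1), (0.1, 0.5), (0.3, 0.3), (0.4, 0.2)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

auc_1 = roc_auc([x[0] for x in X], y)  # first similarity alone: 0.8125
auc_2 = roc_auc([x[1] for x in X], y)  # second similarity alone: 0.84375
w, b = fit_logistic_l2(X, y)
combined_auc = roc_auc(predict(X, w, b), y)  # in-sample, not cross-validated
print(auc_1, auc_2, combined_auc)
```

On this toy data the combined score ranks the pairs better than either similarity alone, but the numbers are illustrative only and bear no relation to the areas under the ROC curve reported above. A faithful evaluation would score held-out pairs in each cross-validation fold.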

Conclusions: The attribute tables, similarity matrices, and data integration pipelines we developed will enable prioritization of targets for cancer and other diseases.

Citation Format: Andrew D. Rouillard, Avi Ma'ayan. Data integration for illuminating the druggable genome. [abstract]. In: Proceedings of the AACR Special Conference on Computational and Systems Biology of Cancer; Feb 8-11 2015; San Francisco, CA. Philadelphia (PA): AACR; Cancer Res 2015;75(22 Suppl 2):Abstract nr B1-25.