Long noncoding RNAs (lncRNAs) represent a large, diverse and tissue-specific class of transcripts that are involved in gene regulation. Recent large-scale cancer sequencing efforts indicate that lncRNAs are an important component of the cancer transcriptome and may play a critical role in carcinogenesis and drug sensitivity. Accurate profiling of lncRNAs, however, remains a challenge because they are expressed at significantly lower levels than mRNAs and therefore require deep paired-end total RNA sequencing, which can be prohibitively expensive. Additionally, the previous-generation microarrays that constitute the vast majority of GEO and ArrayExpress datasets do not provide comprehensive lncRNA coverage.
Here we propose a lncRNA expression imputation (LEXI) framework to reconstruct the lncRNA transcriptome of cancer cells from their mRNA expression profiles. Our goal is to provide a tool that harnesses the enormous wealth of publicly available cancer mRNA datasets to discover novel lncRNAs associated with carcinogenesis and drug sensitivity. The LEXI approach learns the patterns of mRNA expression associated with each lncRNA across a diverse cohort of cancer cells and then uses them to predict the lncRNA expression profiles of uncharacterized cells.
We developed LEXI by evaluating the performance of various machine-learning algorithms, benchmarked in a cross-validation study across a cohort of 675 cancer cell lines and 9755 pan-cancer tissues. We adapted the LEXI framework based on optimal performance and computation time, and show that LEXI accurately predicts lncRNA expression profiles in both cell lines and tissues. To demonstrate the utility of LEXI, we reconstruct the lncRNA transcriptome of over 1000 cell lines and 2000 TCGA samples, and compare the imputed values with RNA-seq-measured lncRNA levels in the corresponding samples. We further show that the expression levels of MALAT1, HOTAIR, CCAT1 and other established cancer-associated lncRNAs are accurately predicted across cancer types and can be used to discover novel associations in uncharacterized phenotypes. LEXI will be available as a free resource for researchers to easily obtain lncRNA profiles using their own mRNA data.
Citation Format: Aritro Nath, Paul Geeleher, R. Stephanie Huang. Leveraging protein coding gene expression profiles to accurately impute lncRNA transcriptome of cancer cells [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2017; 2017 Apr 1-5; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2017;77(13 Suppl):Abstract nr 3481. doi:10.1158/1538-7445.AM2017-3481