Long noncoding RNAs (lncRNA) play important roles in maintaining morphology and function of tissues, and their regulatory effectiveness is closely associated with spatial expression. To provide a comprehensive spatial atlas of expression for lncRNA, we propose LncSpA (http://bio-bigdata.hrbmu.edu.cn/LncSpA) to explore tissue-elevated (TE) lncRNA across human normal tissues and adult and pediatric cancer tissues. In total, 71,131 and 12,007 TE lncRNAs and 634 clinical-related TE lncRNAs were identified across 38 normal and 33 adult cancer tissues. Moreover, 4,688 TE and 413 clinical-related lncRNAs were identified in pediatric cancer. Through quick search or query options, users can obtain eight major types of detailed information for each lncRNA via various visualization techniques, including qualitative and quantitative spatial expression in different resources, coexpressed mRNAs, predicted function, known disease association, and the potential to serve as diagnostic or prognostic markers. LncSpA will be a valuable resource to understand lncRNA functions across tissues and cancers, leading to enhanced therapeutic strategies in precision oncology.
LncSpA is a new interactive resource that provides the spatial expression pattern of lncRNA across thousands of normal and cancer samples representing major tissue types.
Long noncoding RNAs (lncRNA) control various crucial biological functions to maintain morphology and function of tissues (1). Their precise regulatory effectiveness is closely associated with spatial expression patterns across tissues, whose dysfunction often influences disease development and progression (2). Thus, global spatial characterization of lncRNA expression and distribution across tissues under both physiologic and pathologic states would improve our understanding of lncRNA functions in complex diseases.
The first step in exploring spatial expression patterns is to identify genes with tissue-elevated (TE) expression in a certain tissue or group of tissues. TE genes usually serve as biomarkers of specific biological processes or of the tissues in which they are expressed (3). Moreover, lncRNAs are known to be more tissue specific than protein-coding genes (4). It has been shown that disease genes, including lncRNAs, generally tend to be expressed in a limited number of tissues. For example, a liver-specific lncRNA, highly upregulated in liver cancer (HULC), was found to be strikingly upregulated in liver hepatocellular carcinoma (LIHC; ref. 5) and to regulate cell proliferation, survival, and migration/invasion. These results suggest that characterization of spatial expression is an appropriate way to understand lncRNA functions across tissues and cancers.
Because of the importance of spatial expression features, several databases that offer tissue-sensitive information about expression and interactome for protein-coding genes have been developed (6–10). However, spatial expression atlases for lncRNAs across different human tissues and adult or pediatric cancer types are still limited. To fill this gap, we constructed a user-friendly resource, LncSpA (LncRNA spatial atlas of expression), available at http://bio-bigdata.hrbmu.edu.cn/LncSpA/. LncSpA provides comprehensive information about spatial expression for lncRNAs. Various types of visualization and tables are used to characterize TE lncRNAs, their predicted functions based on coexpressed mRNAs, their disease associations, and their potential as diagnostic or prognostic markers. LncSpA not only helps computational investigators perform integrative analysis of TE lncRNAs in tissues of interest, but also enables experimental scientists to analyze their own data in the context of other related public data.
Materials and Methods
LncRNA transcriptome resources across normal tissues
Four widely available transcriptome datasets were collected (Fig. 1A; Supplementary Table S1), including the Genotype-Tissue Expression (GTEx) consortium (12), Human BodyMap 2.0 (HBM2.0; ref. 13), Human Protein Atlas (HPA; ref. 14), and the FANTOM5 project. First, we identified the top 1,000 lncRNAs with the highest expression variation in each resource. Seventy-four lncRNAs ranked among the top 1,000 in all four resources. On the basis of the expression of these lncRNAs, we observed that the same tissues from different data resources clustered together after t-SNE dimension reduction (Fig. 1B). High correlations were also observed for the same tissues among different resources (Fig. 1C), suggesting that lncRNA expression in these resources is consistent at both the tissue and sample levels.
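The selection of shared highly variable lncRNAs can be sketched as follows. This is a minimal illustration, not the pipeline used to build LncSpA: the text does not state how expression variation was measured, so population variance across tissues is assumed here, and the function names, resource labels, and lncRNA identifiers are hypothetical.

```python
from statistics import pvariance


def top_variable(expr, k):
    """Return the k lncRNAs with the largest variance across tissues.

    expr maps a lncRNA identifier to its expression values, one per tissue.
    Variance as the measure of "expression variation" is an assumption.
    """
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    return set(ranked[:k])


def shared_top_variable(resources, k):
    """Intersect the top-k sets computed independently in every resource."""
    top_sets = [top_variable(expr, k) for expr in resources.values()]
    return set.intersection(*top_sets) if top_sets else set()


# Toy input with hypothetical identifiers; the study used k = 1,000.
resources = {
    "resource_1": {"lncA": [0.0, 10.0], "lncB": [0.0, 5.0], "lncC": [1.0, 1.0]},
    "resource_2": {"lncA": [0.0, 8.0, 0.0], "lncB": [1.0, 1.0, 1.0], "lncC": [0.0, 6.0, 0.0]},
}
shared = shared_top_variable(resources, k=2)
```

Ranking within each resource before intersecting avoids comparing expression units (for example FPKM and CPM) directly across resources.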
LncRNA transcriptome across cancer types
We obtained lncRNA and gene transcriptomes across 33 representative cancer types generated by The Cancer Genome Atlas (TCGA; http://cancergenome.nih.gov/), covering 11,093 samples (Supplementary Table S1). Moreover, we also downloaded lncRNA and gene expression data for 711 samples across seven pediatric cancer types from the TARGET project (Fig. 1A; ref. 10). The clinical data of these patients were also downloaded.
Collecting lncRNA–disease association
All the known lncRNA–disease associations were collected from LncRNADisease, Lnc2Cancer, and exoRBase, which curated the literature-reported disease or cancer-related lncRNAs.
Creating TE lncRNA list
To create the TE lncRNA list in each resource, we identified lncRNAs that have at least 5-fold higher expression levels in one tissue compared with all other tissues (3, 15). Moreover, TE lncRNAs were further classified into three subcategories that reflect different degrees of elevated expression in a particular tissue: “tissue specific (TS),” “tissue enriched (TER),” and “tissue enhanced (TEH).” (i) TS lncRNAs were expressed only in a particular tissue, where the expression thresholds were set to 0.1 for fragments per kilobase per million (FPKM) and 0.06 for counts per million (CPM), based on the similar proportion of expressed lncRNAs; (ii) TER lncRNAs had at least 5-fold higher expression in a particular tissue compared with every other tissue; and (iii) TEH lncRNAs had at least 5-fold higher expression in a particular tissue compared with the average expression level across all other tissues. Similarly, we also identified cancer TE lncRNAs in each cancer type. To integrate TE lncRNAs identified from the four resources and further ensure the reliability of TE lncRNAs in normal tissues, we focused on 34 tissues that were investigated in at least two projects. TE lncRNAs in the “integration” set were restricted to those identified in the same tissue from at least two resources. Moreover, TE protein-coding genes in each tissue under normal or cancer states were also identified.
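The three subcategories and the integration rule described above can be sketched as follows. This is an illustrative reading of the stated criteria rather than the code behind LncSpA. It assumes that "expressed" means at or above the detection threshold, that the top tissue must itself be expressed, and that the categories are assigned in the order TS, TER, TEH; the function names and toy values are hypothetical.

```python
from collections import Counter


def classify_te(expr, detect=0.1, fold=5.0):
    """Classify one lncRNA from a dict mapping tissue -> expression.

    detect is the expression threshold (0.1 for FPKM, 0.06 for CPM in the text).
    Returns (tissue, label) with label in {"TS", "TER", "TEH"}, or None.
    """
    if len(expr) < 2:
        return None
    top_tissue = max(expr, key=expr.get)
    top = expr[top_tissue]
    others = [value for tissue, value in expr.items() if tissue != top_tissue]
    if top < detect:  # assumption: the top tissue must be expressed
        return None
    if all(value < detect for value in others):
        return top_tissue, "TS"   # expressed in this tissue only
    if top >= fold * max(others):
        return top_tissue, "TER"  # 5-fold above every other tissue
    if top >= fold * (sum(others) / len(others)):
        return top_tissue, "TEH"  # 5-fold above the mean of the others
    return None


def integrate(calls_by_resource, min_resources=2):
    """Keep (lncRNA, tissue) pairs called TE in at least min_resources resources."""
    counts = Counter(
        pair for calls in calls_by_resource.values() for pair in set(calls)
    )
    return {pair for pair, n in counts.items() if n >= min_resources}


# Toy profile: 30 is 3-fold above the next tissue but 7.5-fold above the mean.
example = classify_te({"liver": 30.0, "brain": 10.0, "testis": 1.0, "lung": 1.0})
```

Comparing with `top >= fold * reference` instead of dividing avoids a division by zero when the other tissues have no detectable expression.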
Identifying coexpressed mRNAs for TE lncRNAs
The coexpressed mRNAs for each TE lncRNA were identified using the Pearson correlation coefficient (PCC). Users can set different PCC thresholds, the spatial pattern of mRNAs, or the number of resources supporting coexpression to visualize the coexpressed lncRNA-mRNA subnetwork, which is rendered with the tool echarts. Moreover, TE mRNAs in the same tissue are highlighted in the coexpression subnetwork by node color.
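The coexpression step can be sketched as follows. The PCC itself is standard, but the default threshold of 0.8 and the use of the absolute value (so that negatively correlated mRNAs are also retained) are assumptions made for illustration; LncSpA leaves the threshold to the user. The function names and gene identifiers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def coexpressed_mrnas(lnc_profile, mrna_profiles, threshold=0.8):
    """Return {mRNA: PCC} for mRNAs with |PCC| >= threshold."""
    hits = {}
    for gene, profile in mrna_profiles.items():
        r = pearson(lnc_profile, profile)
        if abs(r) >= threshold:
            hits[gene] = r
    return hits


# Toy profiles over four samples; identifiers are placeholders.
lnc = [1.0, 2.0, 3.0, 4.0]
mrnas = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [4.0, 3.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 2.0],
}
partners = coexpressed_mrnas(lnc, mrnas)
```

The returned dictionary maps each retained mRNA to its PCC, which is the information needed to draw weighted edges in a lncRNA-mRNA subnetwork.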
Function prediction of TE lncRNAs
After users select the coexpressed mRNAs in the above step, the genes are submitted online to the GREAT tool to predict enriched functions of the lncRNA (16), including Gene Ontology categories, pathways, human phenotypes, and disease ontology terms.
Identifying cancer clinical–related TE lncRNAs
First, a t test was used to identify differentially expressed lncRNAs in each of the 17 cancer types with more than five tumor-adjacent normal samples. TE lncRNAs with fold changes >2 or <0.5 and Padjusted < 0.05 were identified as differentially expressed. To identify survival-related lncRNAs in cancer, both Cox regression analysis and the log-rank test were performed. Univariate and multivariate Cox regression analyses were used to evaluate the association between survival and the expression level of each TE lncRNA. We ranked patients with cancer based on the expression of each TE lncRNA and divided them into high- and low-expression groups. The difference in survival between the two groups was evaluated by the Kaplan–Meier method. LncRNAs with P < 0.05 were identified as prognostic markers in cancer.
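The differential expression filter can be sketched as follows. The fold-change cutoffs and the 0.05 threshold come from the text, but the multiple-testing correction is not named there, so the Benjamini-Hochberg procedure is assumed. The raw P values are taken as given (for example, from a t test computed elsewhere), and the function names, lncRNA identifiers, and numbers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def differential_lncrnas(records, fc_up=2.0, fc_down=0.5, alpha=0.05):
    """Apply the fold-change and adjusted-P filter.

    records: list of (name, mean_tumor, mean_normal, raw_p).
    Returns {name: "up" or "down"} for lncRNAs passing both criteria.
    """
    adjusted = benjamini_hochberg([record[3] for record in records])
    hits = {}
    for (name, tumor, normal, _), p_adj in zip(records, adjusted):
        if normal <= 0:
            continue  # fold change is undefined without normal expression
        fold_change = tumor / normal
        if p_adj < alpha and (fold_change > fc_up or fold_change < fc_down):
            hits[name] = "up" if fold_change > fc_up else "down"
    return hits


# Toy records with placeholder identifiers and made-up values.
records = [
    ("lncA", 40.0, 5.0, 0.001),
    ("lncB", 2.0, 10.0, 0.004),
    ("lncC", 12.0, 10.0, 0.002),
    ("lncD", 30.0, 10.0, 0.2),
]
calls = differential_lncrnas(records)
```

In this toy input, lncC has a small P value but a fold change of only 1.2, and lncD has a 3-fold change but a large P value, so neither passes both criteria.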
LncSpA was built with Java Server Pages in a Tomcat container (v 6.0). All data in LncSpA are stored and managed in a MySQL database (v 5.5.48). Figures on the result pages are produced by Highcharts 7.1.2. The website has been tested on several popular web browsers, including Google Chrome (preferred), Firefox, and Apple Safari.
On the basis of lncRNA spatial transcriptomes in 20,519 samples, covering 38 normal tissues and 33 cancer types, an integrative pipeline was designed to comprehensively identify TE lncRNAs in each resource. In total, 8,875, 9,837, 13,337, 10,718, and 74,767 TE lncRNAs were identified from Integration, GTEx, HPA, HBM2.0, and FANTOM, respectively (Supplementary Table S2). We found higher numbers of TE lncRNAs in brain and testis tissues (Fig. 1D). In addition, 12,007 cancer TE lncRNAs were identified across 33 cancer types from TCGA, ranging from 64 in CESC to 3,356 in LAML, with a median of 226 TE lncRNAs per cancer type (Fig. 1D). Moreover, 634 clinical-related TE lncRNAs were identified, of which 399 were significantly differentially expressed and 267 were associated with patient survival. We also identified 4,688 TE lncRNAs and 413 clinical-related lncRNAs in pediatric cancer.
We provide a user-friendly web interface that enables users to query the database in a few flexible steps (Fig. 2). Users can switch between adult and pediatric cancer types by clicking the button on the homepage. Users can also click the corresponding buttons on the homepage to enter the “Browse,” “Search,” and “Download” pages for browsing, searching, and downloading all TE lncRNAs in LncSpA. TE lncRNAs can be browsed through the “LncRNA-Centric,” “Normal-Centric,” “Cancer-Centric,” and “Clinical-Centric” pages. On the LncRNA-Centric page, all TE lncRNAs are organized in a hierarchical structure based on chromosomal localization. On the other three browse pages, normal, adult cancer, and pediatric cancer tissues are organized in a hierarchical structure based on anatomic classification in the human body map. Moreover, users can quickly enter the “Searching Result” pages by clicking a tissue of interest in the human body figure on the homepage. In the “Search” section, four query options are provided, based on normal tissues of interest, cancer tissues of interest, lncRNA names, or chromosomal location. In addition, statistical information about LncSpA can be accessed from the “Statistics” page. All data in the database can be freely downloaded from the “Download” page. A detailed tutorial showing how to browse and query data is available on the “Help” page.
To illustrate the use of the resource, we provide two examples of LIHC-related lncRNAs. Users can retrieve the information by selecting LIHC on the “Cancer-Centric” browsing page (Supplementary Fig. S1A), by clicking the corresponding circle on the human body figure on the homepage, or by entering LIHC in the “Cancer-Centric” search interface (Supplementary Fig. S1B and S1C). All TE lncRNAs in LIHC are then displayed on the result page (Supplementary Fig. S1D). Similarly, corresponding “LncRNA-Centric” browsing and searching modes are also provided, for example with lncRNA HULC as input, in which case all tissue entries with elevated expression are returned. By clicking the details button in these tables, users can obtain more details for each individual entry. A hyperlink leads to the detail result page for the TE lncRNA HULC in LIHC, where eight major types of information are provided (Supplementary Fig. S1E–S1L). (i) Basic annotation information is provided first. (ii) The TE cancer tissue (LIHC), the subclassification of TE, and the corresponding expression level are listed in a table, followed by a bar chart of expression across cancer tissues. (iii) Qualitative and quantitative spatial expression patterns in normal tissues (with liver as the TE tissue) are given in order from the robust TE tissue (integration) to each individual resource. (iv) Coexpression between HULC and mRNAs is shown in a network view. In addition, the correlation information is listed in a table, and users can select different thresholds to filter the lncRNA-mRNA coexpression subnetwork. (v) Coexpressed mRNAs are further submitted to GREAT to obtain enriched functions for HULC, identifying its various relations to physiologic and pathologic liver tissue. (vi) Evidence for the association between LIHC and HULC is shown from Lnc2Cancer, LncRNADisease, and exoRBase. (vii) HULC is readily seen to be upregulated in LIHC, suggesting HULC as a candidate diagnostic marker. (viii) The results of regression analysis and the Kaplan–Meier survival plot indicated that HULC was a protective factor in LIHC.
Moreover, we identified another lncRNA, LINC00261, in liver cancer (Supplementary Fig. S2A–S2J), which was downregulated in cancer. The Kaplan–Meier plot shown in LncSpA allows users to investigate the prognostic effect of this lncRNA, and we found that high expression of this lncRNA was associated with significantly better survival of patients with liver cancer (Supplementary Fig. S2A–S2J). Taken together, these eight panels provide detailed information for understanding the function of TE lncRNAs in tissues under both physiologic and pathologic states.
Moreover, we provided several potential applications for understanding the cancer biology of lncRNAs based on the LncSpA resource (Supplementary Data S1). The LncSpA resource can help in analyzing the expression variation of lncRNAs across tissues and cancers (Supplementary Fig. S3A–S3E), identifying clinical-associated lncRNAs (Supplementary Fig. S4A–S4C), exploring tissue and cancer similarity (Supplementary Fig. S5A–S5C), prioritizing cancer lncRNAs (Supplementary Fig. S6A–S6D), predicting the function of lncRNAs (Supplementary Fig. S7), and comparing adult and pediatric cancer (Supplementary Fig. S8A and S8B). All these applications demonstrate that LncSpA paves the way for further in-depth investigation of the function of lncRNAs in cancer.
In summary, LncSpA is a comprehensive resource for investigating the spatial expression patterns of lncRNAs across normal tissues and adult and pediatric cancer types. A user-friendly interface was designed for querying, browsing, and downloading the TE lncRNAs of interest. Eight major types of information for TE lncRNAs are provided for visualizing and understanding their function in physiologic and pathologic phenotypes. In the future, we will continue to update LncSpA to include more samples across tissues and cancer types and maintain it as a valuable resource. Moreover, as single-cell sequencing data are increasingly emerging, we will include these data to provide more spatial expression patterns of lncRNAs. TE lncRNAs in LncSpA are promising candidate therapeutic targets in precision oncology. We believe that LncSpA will be a valuable resource for both experimental and computational researchers to bridge the knowledge gap from lncRNA expression to phenotypes.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: Y. Li, J. Xu, X. Li
Development of methodology: D. Lv, K. Xu, X. Jin, J. Li
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): D. Lv, K. Xu, X. Jin, J. Li, Y. Shi, M. Zhang, X. Jin
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D. Lv, K. Xu, X. Jin, J. Li, Y. Li, J. Xu
Writing, review, and/or revision of the manuscript: D. Lv, K. Xu, X. Jin, J. Li, Y. Shi, M. Zhang, X. Jin, Y. Li, J. Xu, X. Li
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): D. Lv, K. Xu, X. Jin, J. Li
Study supervision: Y. Li, J. Xu, X. Li
This work was supported by the National Key R&D Program of China (2018YFC2000100); the National Natural Science Foundation of China (61873075, 31871338, and 31970646); Natural Science Foundation for Distinguished Young Scholars of Heilongjiang Province (JQ2019C004); and Heilongjiang Touyan Innovation Team Program.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.