Abstract
Recent progress in second-generation sequencing has led to an explosive discovery of new long noncoding RNAs (lncRNA). Genome-scale analysis of the transcriptome, in conjunction with studies on chromatin modifications at the epigenetic level, identified lncRNAs as a novel type of noncoding transcripts longer than 200 nucleotides. These transcripts were later found to be major participants in various physiologic processes and diseases, especially human cancers. LncRNAs have been found to function as novel types of oncogenes and tumor suppressors during cancer progression through various mechanisms, which endows them with the potential to serve as reliable biomarkers and novel therapeutic targets for cancers. Mol Cancer Ther; 17(9); 1816–23. ©2018 AACR.
Introduction
In recent years, there has been an explosive growth in the identification of long noncoding RNAs (lncRNA). Identification of new lncRNAs has been facilitated by progress in the technology of second-generation sequencing. This has allowed genome-wide analyses of the transcriptome and chromatin modifications at the epigenetic level. Previous research has shown the majority of the human genome produces a great number of transcripts without protein coding potential (1, 2). LncRNAs are one such class of noncoding transcripts. They are operationally defined as RNA molecules longer than 200 nucleotides that do not appear to have canonical protein-coding potential, i.e., they contain no classic open reading frame (3–8). Considering that nearly 70% of the human genome is transcribed to RNA products, among which only a minority encodes proteins (2), the number of lncRNA genes appears very large. Following the initial cloning of lncRNAs such as XIST (9) and H19 (10, 11) from cDNA libraries, two independent groups reported that the number of lncRNA genes is no less than that of protein-coding genes by using the tiling array technology (12, 13). Progress in tiling arrays (12–15), analysis of chromatin epigenetic modification (16), computational analysis of cDNA libraries (17, 18), and RNA sequencing (1, 19–21) have demonstrated that thousands of lncRNAs are widely expressed in human with noticeable specificity in tissue distribution. Recently, the GENCODE consortium (version 27) has reported 27,908 manually annotated and evidence-supported human lncRNAs originating from 15,778 gene loci (1, 2).
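The operational definition above (longer than 200 nucleotides, no classic open reading frame) can be illustrated with a minimal Python sketch. The function names and the 100-codon ORF cutoff are illustrative assumptions, not criteria taken from the cited studies; real annotation pipelines rely on far richer evidence.

```python
# Minimal sketch of the operational lncRNA definition.
# ASSUMPTIONS: function names are hypothetical, and the 100-codon ORF cutoff
# is an illustrative threshold, not one stated in this review.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (ATG included, stop excluded) of the longest
    ATG-to-stop open reading frame on the given strand."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def looks_like_lncrna(seq: str, max_orf_codons: int = 100) -> bool:
    """True if the transcript is longer than 200 nt and lacks a long ORF."""
    return len(seq) > 200 and longest_orf_codons(seq) < max_orf_codons
```

A transcript that passes this filter is only a candidate; as discussed later in the review, some annotated noncoding RNAs encode short peptides that a cutoff of this kind would miss.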
Many studies have revealed universal features of lncRNAs to help better understand this new class of transcripts. First, lncRNAs are independent transcriptional units. Many of the first identified lncRNAs were located in genomic regions that did not contain any previously known genes, for example, lncRNAs HOTAIR and H19 (14, 22). Now it is widely recognized that lncRNAs are unlikely to exist as “transcriptional noise” or “gene trash” as perceived before. Second, lncRNA transcripts contain fewer exons than mRNAs and utilize canonical splicing sites. Third, lncRNA transcripts are under weaker selection pressures than protein-coding ones during evolution, and many are primate-specific. It is reasonable to conclude that lncRNAs bear more possibility in nucleic acid variations during evolution because they have no need to preserve sequences critical for coding conserved amino acids of functional protein domains. Under such weaker selection constraints, primates might have flexibly evolved a set of specific lncRNAs to help accommodate unique physiologic functions of the species. Fourth, lncRNA transcription is under the control of epigenetic modifications of chromatin similar to that for protein-coding mRNAs. Many lncRNAs were initially identified by large-scale sequencing of cDNA libraries and subsequently characterized by transcriptional signatures from RNA pol II binding and epigenetic modifications of chromatin (3, 14). The transcription of lncRNAs is regulated by DNA modifications and histone codes similar to that for protein-coding mRNAs. Fifth, lncRNAs and protein-coding mRNAs also utilize the same transcriptional machinery. Transcription of the majority of lncRNAs is RNA pol II–dependent. LncRNA transcripts usually have a 5′ terminal methylguanosine cap and are polyadenylated. There is also evidence showing certain lncRNAs deficient in 3′ poly(A) tails, which are probably produced from RNA pol III promoters (23, 24). 
In addition, the splicing of nucleolar RNAs generates a subset of lncRNAs lacking in 5′ caps and 3′ poly(A) tails (25). This suggests that lncRNAs share the same molecular machinery for biogenesis with protein-coding mRNAs. Crosstalk and reciprocal regulation in the transcription of lncRNAs and protein-coding mRNAs may exist. Last, lncRNA expression is comparatively low and highly tissue-specific. Comprehensive analysis of human lncRNAs revealed that lncRNAs are generally expressed at lower levels than protein-coding genes, and they manifest more tissue-specific expression patterns. For example, a large number of lncRNAs are expressed specifically in the brain (1).
Molecular Mechanisms of lncRNAs
Despite the numerous functions lncRNAs possess, they utilize some common working mechanisms. These common mechanisms involve control of epigenetic modifications of chromatin, gene transcription, and mRNA stability, as well as protein translation (16, 19, 26–30). First, lncRNAs usually regulate gene expression by modulating epigenetic modifications of chromatin. LncRNA transcripts can form scaffold structures and mediate the interaction between chromatin and epigenetic modification factors. For example, lncRNA HOTAIR recruits the silencing complexes PRC2 and LSD1/CoREST/REST to targeted DNA regions for H3K27 methylation and H3K4 demethylation (27), and consequently represses the expression of a large number of HOTAIR target genes, such as HOXD cluster genes.
Second, lncRNAs promote or repress gene expression at the transcriptional level through various mechanisms. A specific group of lncRNAs termed enhancer RNAs (eRNA) has been suggested to upregulate gene expression by acting as transcriptional enhancers (19, 28). On the other hand, lncRNAs can act on transcription factors to control gene expression. The lncRNA NRON (noncoding repressor of NFAT) suppresses nucleocytoplasmic shuttling of the transcription factor NFAT, leading to repression of downstream gene transcription (30). In some cases, lncRNAs function as decoys for transcription factors or compete for DNA binding sites with transcription factors. For example, the lncRNA PANDA can sequester NF-YA protein away from its target genes involved in apoptosis (31). Interestingly, lncRNAs also regulate gene expression by directly affecting the organization of the nuclear compartment. For example, MALAT1 and TUG1, which reside in distinct nuclear structures (interchromatin granules and Polycomb bodies, respectively), can cooperatively control the relocation of growth-control genes between the two nuclear compartments and mediate the assembly of coactivators or corepressors to regulate gene expression in response to growth signals (32).
Third, in addition to the functions in epigenetic and transcriptional control on gene expression, lncRNAs also have proven activities in posttranscriptional regulation, affecting the processing, stability, and translational efficiency of mRNAs. MALAT1 was reported to modulate the nuclear localization and levels of phosphorylated Ser/Arg splicing factors, which directly affects the alternative splicing pattern of target gene pre-mRNAs (33). Like protein-coding genes, some lncRNAs and transcripts from pseudogenes act as miRNA sponges to sequester miRNAs away from target transcripts and promote the stabilization of mRNA molecules (29, 34, 35). LncRNAs also exert their functions in translational control. The association of lincRNA-p21 and the general translation repressor Rck was shown to repress the translational efficiency of β-Catenin and JUNB mRNAs (26). Similarly, a translational regulatory lncRNA (treRNA) was also reported to inhibit the translation of E-Cadherin mRNA (36).
Meanwhile, in addition to the noncoding functions supported by the large body of evidence described above, recent studies have implied that lncRNAs may also perform their functions by being translated into proteins. Using ribosome profiling or ribosome footprinting methods, these studies showed that lncRNAs are physiologically associated with ribosomes, and the pattern of ribosome protection suggested that lncRNA transcripts may be translated into short peptides (37–40). Furthermore, Schier and colleagues identified a 58-amino acid peptide encoded by Toddler, a previously annotated noncoding RNA in zebrafish, and established its essential role in embryonic signaling (41). Nevertheless, whether lncRNAs play their biological roles through encoding short peptides remains controversial (42). This remains an attractive field of study that will help to fully elucidate the working mechanisms of lncRNAs.
LncRNAs in Human Cancers
Although the biological functions of lncRNAs have not been fully explored, a growing number of studies indicate that one of the most prominent functions of lncRNAs is their regulatory role in the development of human cancers. LncRNAs are active participants in tumor development by widely modulating cancer cell proliferation, apoptosis, migration, metabolism, and differentiation (6–8, 15, 31, 43–45). The lncRNA HOTAIR promotes metastasis of cancers through PRC2-mediated epigenetic modification of target genes (15). Correspondingly, increased expression of HOTAIR correlates with poor prognosis in many types of cancers (46, 47). ANRIL represents another typical example of an oncogenic lncRNA. ANRIL expression is upregulated in prostate cancer. Its association with both CBX7/PRC1 (48) and SUZ12/PRC2 (49) leads to repression of the tumor suppressor INK4A/INK4B. The lncRNA FAL1, whose gene copy number is amplified, shows increased expression levels in several types of cancers. FAL1 enhances the growth rate of cancer cells by interacting with BMI1/PRC1 to suppress CDKN1A expression (50). Similarly, the lncRNA DANCR, which is widely overexpressed in many types of human cancers, was found to promote cancer progression partly by modulating CDKN1A expression (51). A TP53-regulated lncRNA, LINP1, enhances double-strand DNA break repair through the nonhomologous end joining pathway and desensitizes triple-negative breast cancer cells to radiotherapy (52). The lncRNA ceruloplasmin (NRCP) modulates cancer metabolism and promotes cancer growth by increasing the expression of genes related to glycolysis (45). In addition, a large number of other lncRNAs were found to be associated with tumorigenesis, such as PCAT-1 (20), PANDA (31), MALAT-1 (53), lincRNA-p21 (44), and LINC00673 (54).
It is worth noting that genomic alterations and transcriptional deregulation of lncRNA genes are widely present in cancer cells. The somatic copy-number alterations (SCNA) of lncRNA genes across 5,860 tumor samples from 13 tumor types from the Cancer Genome Atlas project were extensively analyzed. Copy-number gain and loss were found at high frequency, affecting on average 13.16% and 13.53% of lncRNA genes, respectively (55). In line with the genome-wide landscape of SCNAs, individual lncRNAs with copy-number variation were identified. For example, the oncogenic lncRNA FAL1, whose gene copy number is amplified, shows increased expression levels in many types of cancers (50). Focal amplifications containing both the lncRNA SAMMSON and the melanoma-specific oncogene MITF were mapped to chromosome 3p13 and 3p14. SAMMSON was characterized as a critical player in promoting melanoma cell growth and survival (56). In addition to SCNAs, SNPs are another common genomic alteration in lncRNA genes associated with cancer risk and clinical outcomes. Genome-wide studies have indicated frequent overlap between lncRNAs and cancer-related SNPs (55, 57). The tumor suppressor lncRNA LINC00673 antagonizes pancreatic cancer cell proliferation by promoting degradation of the oncogene PTPN11. A germline variant of LINC00673 has impaired activity and sensitizes cells to malignant transformation (54). NBAT-1 is an lncRNA that controls neuroblastoma cell proliferation and invasion. Interestingly, an SNP in the intron of NBAT-1 indicates high risk and poor clinical outcome of neuroblastoma (58). Besides genomic alterations, transcriptional deregulation of lncRNA genes is another mechanism responsible for their differential expression patterns in cancers. A study showed that DNA methylation patterns in the promoter regions of lncRNA genes are intrinsically distinct in cancerous and normal tissues (55).
Furthermore, lncRNA transcription is under control of important oncogenes or tumor suppressors. A subset of lncRNA genes is either suppressed (such as CONCR and LINP1) or promoted (such as lincRNA-p21 and DINO) by TP53 (44, 52, 59, 60). The oncogenes RAS and MYC are also involved in transcriptional regulation of lncRNAs such as Orilnc1 and DANCR (51, 61).
LncRNAs May Serve as Biomarkers and Therapeutic Targets for Human Cancers
The explosion in lncRNA research has evoked great enthusiasm for the possibility of lncRNAs serving as diagnostic markers and therapeutic targets for human cancer. The unique features of lncRNAs, i.e., their differential expression patterns in cancer, temporal and tissue specificity, and diverse biological functions, are expected to be advantageous in clinical applications. The first inspiring case of an lncRNA as a diagnostic marker is the FDA-approved testing of PCA3 for detection of prostate cancer (62). Together with the traditional serum prostate-specific antigen (PSA) testing, PCA3 testing has made diagnosis of this disease more accurate and specific. Potential lncRNA diagnostic markers for other types of cancers have been reported, including hepatocellular carcinoma and gastric cancer (63, 64). In addition, lncRNAs have been widely investigated as prognostic markers for cancer patients. Previous studies have revealed a correlation between lncRNA HOTAIR expression and colorectal cancer recurrence, lymph node metastasis, and poor prognosis (46, 65). Moreover, lncRNAs are considered ideal biomarkers due to their presence in body fluids, for example as exosome-contained RNA (66–68), which enables noninvasive diagnosis of cancers. Although numerous studies have shown differential expression of lncRNAs in tumors compared with normal tissues, extensive research and clinical data are needed to confirm the consistency of lncRNA expression in tumor tissue and body fluids for diagnostic accuracy and specificity. Despite limited progress in the development of lncRNA biomarkers so far, the diversity in biological functions of lncRNAs endows them with great potential to serve as therapeutic targets for human cancers. A supportive case is lncRNA MALAT1, an oncogene first discovered in metastatic lung adenocarcinoma patients (53) and later found to be highly expressed in lung, breast, and prostate cancers (69). MALAT1 plays critical roles in promoting proliferation and metastasis of cancer cells. Animal studies indicated that silencing MALAT1 expression by antisense oligonucleotide (ASO) efficiently impaired the tumor growth and metastatic ability of breast and lung cancer cells, respectively (70, 71), implying a potential therapy for cancers by manipulating oncogenic lncRNA expression.
Strategies for Developing Targeted Therapy for Cancers with lncRNAs
As critical players in tumor development, lncRNAs represent promising therapeutic targets for treating cancers. However, targeting lncRNA molecules is different from targeting protein-coding transcripts. First, the lack of protein products encoded by lncRNAs means that targeting methods are restricted to RNA molecules only, and methods for targeting RNA are currently limited. Second, unlike proteins with conserved domains and specific conformations, which serve as good targeting sites for small-molecule drugs, the three-dimensional structures of lncRNAs have been poorly explored. Furthermore, it has been speculated that lncRNA functions are independent of conserved three-dimensional structures, as they show relatively low evolutionary conservation of sequence compared with protein-coding genes. This makes structure-based drug design and screening difficult for lncRNA targets. Third, the working mechanisms and regulatory networks of most lncRNAs are not fully understood. This increases the difficulty of developing specific targeting strategies for lncRNAs.
Despite the mentioned problems, diverse targeting methods for lncRNAs are under investigation (Figure 1 and Table 1). Using small interfering RNA (siRNA) and ASO to induce loss-of-function effects on lncRNAs represents the most common strategy. Similar strategies for targeting protein-coding genes have proven successful, and the systemic delivery methods have been widely explored (72–74). It is well established that both siRNA and ASO function by base pairing with mRNA to form double-stranded RNAs or RNA-DNA hybrids, resulting in degradation of target mRNA by the RNAi mechanism or by RNase H activity. This method has been adapted for knocking down lncRNAs, with effects as potent as those seen in mRNA silencing. For example, MALAT1 expression was suppressed by ASO to impair the tumor growth and metastatic ability of breast and lung cancer cells (70, 71). Furthermore, siRNA and ASO are usually subjected to certain modifications to enhance their stability while retaining targeting specificity and alleviating the interferon induction effect. Such modifications include adding 2-nt 3′ overhangs to siRNA, 2′-O-methyl to siRNA and ASO, and locked nucleic acid to ASO. To overcome poor stability and increase intracellular uptake of siRNA and ASO, delivery methods such as lipid-based carriers, polymersomes, and biocompatible nanoparticles can be employed. Last, the subcellular localization of lncRNAs should be considered when adapting the siRNA/ASO strategy. For lncRNAs located in the cytoplasm, both siRNA and ASO have satisfactory silencing effects. But for those located in the nucleus, ASO would be a better choice than siRNA due to the lack of RNAi machinery in the nuclear compartment. Meanwhile, considering that some lncRNAs function as sponges for miRNAs to alleviate the suppressive effects of miRNAs on endogenous target mRNAs (34, 54), targeting lncRNAs by miRNAs would also be a promising strategy. The prominent advantages of using miRNAs include their compatibility with the endogenous regulatory machinery for lncRNAs and the well-established delivery methods (74, 75).
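The base-pairing principle behind ASO targeting can be sketched in a few lines of Python: the oligonucleotide is the DNA reverse complement of the chosen site on the target RNA. This is a minimal illustration only. The function names and the 20-nt default length are assumptions, and the sketch ignores the chemical modifications, off-target screening, and RNA accessibility that practical design requires.

```python
# Minimal sketch: derive an antisense DNA oligo for a window of a target RNA.
# ASSUMPTIONS: function names and the 20-nt default length are illustrative.

# Watson-Crick partner (as a DNA base) for each RNA base of the target.
RNA_TO_DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def design_aso(target_rna: str, start: int, length: int = 20) -> str:
    """Return the DNA oligo (written 5'->3') that base pairs with
    target_rna[start:start + length]."""
    target_rna = target_rna.upper()
    if start < 0 or start + length > len(target_rna):
        raise ValueError("target window falls outside the transcript")
    site = target_rna[start:start + length]
    # Reverse the site so that the antisense strand reads 5'->3'.
    return "".join(RNA_TO_DNA_PARTNER[base] for base in reversed(site))


def gc_fraction(oligo: str) -> float:
    """Fraction of G and C bases, a rough proxy for duplex stability."""
    oligo = oligo.upper()
    return (oligo.count("G") + oligo.count("C")) / len(oligo)
```

For instance, `design_aso("AUGGCCAUUG", 0, 6)` yields the hexamer that pairs with `AUGGCC`; a real candidate would then be filtered against the rest of the transcriptome.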
Strategies | Effects | Advantage | Limitation |
---|---|---|---|
siRNA, ASO, miRNA | RNA degradation | Specific, potent effect | Stability, delivery |
Cas9 | Gene knockout, DNA editing, gene mutation, etc. | Versatile and long-term effect | Genomic context dependent, PAM requirement, targeting efficiency, in vivo delivery |
PspCas13b | RNA cleavage | Programmable targeting, less sequence constraint | Targeting efficiency, in vivo delivery |
REPAIR (dCas13-ADAR) | RNA editing | Programmable editing | Targeting efficiency, in vivo delivery |
Small molecules | RNA binding | Convenient in vivo delivery | Nonspecificity, nonprogrammable targeting |
Abbreviations: ADAR, adenosine deaminase acting on RNA; Cas9, CRISPR-associated 9; dCas13, catalytically inactive PspCas13b; PspCas13b, CRISPR-associated 13 from Prevotella sp. P5-125; REPAIR, RNA Editing for Programmable A to I Replacement.
Besides traditional methods using siRNA, ASO, and miRNA, the recently characterized clustered regularly interspaced short palindromic repeats (CRISPR)–associated deoxyribonuclease Cas9 gene editing system may be employed to achieve loss-of-function effects on lncRNAs. In this system, the Cas9 endonuclease is targeted to specific DNA sequences with a protospacer adjacent motif (PAM) by single-guide RNAs (sgRNA) that are artificially designed (76–79). This system has been widely applied for targeted DNA cleavage. For oncogenic lncRNA genes with well-characterized genomic localization, Cas9 can be directed by a pair of specifically designed sgRNAs targeting the 5′ and 3′ ends of the genes to generate knockout loci. In this way, successful knockouts of lncRNA-21A, UCA1, and AK023948 in various human cell lines have been accomplished (80). Similarly, utilizing a paired-guide RNA library, CRISPR-Cas9–mediated genome-scale deletion of lncRNAs was achieved (81). The Cas9 system is emerging as a powerful tool for targeting lncRNAs in cancer. Major advantages of the Cas9 system include, but are not limited to, programmable targeting effects (e.g., gene knockout, as well as gene mutation and transcriptional activation) and convenience for high-throughput screening of lncRNA knockout phenotypes. Furthermore, a recent development of the Cas9 system was reported by the Liu group: fusing the Cas9 protein with a cytidine deaminase enzyme enables conversion of cytidine to uridine, thereby leading to a C → T (or G → A) substitution. This DNA base editing system can mediate base conversion in a programmable manner, without requiring double-strand DNA cleavage or DNA repair templates (82). Thus, this modified Cas9 system bears great potential, especially in correcting tumorigenic SNPs in lncRNA genes with higher efficiency and accuracy. In addition, in vivo delivery methods for the Cas9 system are developing rapidly.
These include viral vectors of lentivirus (LV) and adeno-associated viruses (AAV) for delivering DNAs encoding Cas9 and sgRNA (83–86); nonviral vectors of lipid-based carriers and polymers for delivering the DNA, mRNA, and proteins of the Cas9 system (87–89); and physical approaches, such as microinjection, electroporation, and hydrodynamic injection, for delivering DNA, mRNA, and proteins of the Cas9 system (88, 90). These versatile approaches enable efficient in vivo delivery of the Cas9 system for lncRNA targeting in cancer treatment in the future. However, the Cas9 strategy for lncRNA knockout is genomic context dependent. It is well known that numerous lncRNAs are transcribed from bidirectional promoters, or overlap with promoters or bodies of sense or antisense genes. This unique feature of lncRNAs may expose their neighboring genes to the risk of inadvertent deregulation when using the Cas9 system for lncRNA gene knockout purposes. A study showed that only one third of 15,929 lncRNA loci may be safely targeted for gene knockout via the Cas9 system without perturbation of neighboring genes (91). Thus, careful study is needed before adopting the Cas9 system for targeting of any lncRNA gene.
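The paired-sgRNA knockout strategy described above can be sketched as a simple PAM scan. The following is a minimal Python illustration that assumes the widely used S. pyogenes Cas9, which recognizes a 5′-NGG-3′ PAM immediately 3′ of a 20-nt protospacer. The review does not specify the PAM, so that choice, the function names, and the restriction to the forward strand are all assumptions made for brevity.

```python
# Minimal sketch: find candidate protospacers and pick a pair flanking a locus.
# ASSUMPTIONS: S. pyogenes Cas9 (NGG PAM, 20-nt protospacer), forward strand
# only, hypothetical function names. Off-target and efficiency scoring omitted.
from typing import List, Optional, Tuple

Hit = Tuple[int, str]  # (0-based protospacer start, protospacer sequence)
PAM_LEN = 3


def find_protospacers(dna: str, guide_len: int = 20) -> List[Hit]:
    """Return every guide_len-mer that is directly followed by an NGG PAM."""
    dna = dna.upper()
    hits = []
    # i is the index of the 'N' of the PAM; the protospacer ends just before it.
    for i in range(guide_len, len(dna) - 2):
        if dna[i + 1:i + 3] == "GG":
            hits.append((i - guide_len, dna[i - guide_len:i]))
    return hits


def pick_flanking_pair(dna: str, gene_start: int, gene_end: int,
                       guide_len: int = 20) -> Optional[Tuple[Hit, Hit]]:
    """Pick the closest target site lying entirely upstream of gene_start and
    the closest lying entirely downstream of gene_end (end exclusive)."""
    hits = find_protospacers(dna, guide_len)
    upstream = [h for h in hits if h[0] + guide_len + PAM_LEN <= gene_start]
    downstream = [h for h in hits if h[0] >= gene_end]
    if not upstream or not downstream:
        return None
    return upstream[-1], downstream[0]
```

In line with the caveat about genomic context, a fuller version would also check whether the span between the two cut sites overlaps promoters or bodies of neighboring genes before a pair is accepted.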
In addition to the Cas9 system, novel CRISPR-Cas systems capable of directly targeting RNA, rather than working through DNA recognition, have been established. The class 2 type VI CRISPR-Cas effectors, Cas13a and Cas13b, were recently identified by the Zhang group as single-component programmable RNA-guided RNA-targeting RNases that have both RNA processing and RNA cleavage activities (92, 93). Cas13b from Prevotella sp. P5-125 (PspCas13b) is an RNA-targeting RNase that does not require a PAMmer for target RNA recognition. PspCas13b displays consistent, robust, and specific knockdown of a reporter mRNA in mammalian cells. Thus, it holds great promise as a powerful tool for knocking down lncRNA transcripts in cancer therapy, especially for oncogenic lncRNAs that are expressed at higher levels in cancer cells than in normal tissues and are unsuitable for gene knockout by the traditional Cas9 system. Furthermore, the PspCas13b system was modified into an RNA-editing system by fusing the catalytically inactive PspCas13b (dCas13b) with the catalytic domain of adenosine (A) to inosine (I) deaminase 2 (ADAR2), namely RNA Editing for Programmable A to I Replacement (REPAIR; refs. 92, 93). The REPAIR system can convert A to I in mRNA molecules without disturbing genomic sequences, which avoids unexpected effects such as frameshift and nonsense mutations of the genome. Furthermore, this system could in theory be adapted to other types of base editing platforms, e.g., C-to-U editing, if dCas13b is fused with other RNA base editors. Collectively, the REPAIR system offers programmable editing of a specific nucleotide without sequence constraints (no PAM requirement) and independence from the endogenous DNA repair pathways that are usually needed for the Cas9 system. Hopefully, the REPAIR system will function as an accurate tool for editing lncRNAs in cancer.
For example, the REPAIR system may reduce cancer risk by altering a single nucleotide in tumorigenic SNPs of lncRNAs (54, 58). It might also be used to regulate the association of lncRNAs and onco-protein partners by changing specific nucleotides critical for the interaction.
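The effect of an A-to-I edit on a transcript can be pictured with a toy Python model. It relies on the well-known fact that inosine base pairs like guanosine, so an edited position is read as G by cellular machinery and in sequencing. The function names are hypothetical, and the model ignores guide design, editing efficiency, and off-target deamination.

```python
# Toy model of REPAIR-style A-to-I editing on an RNA string.
# ASSUMPTIONS: hypothetical function names; "I" is used to mark inosine.


def edit_a_to_i(rna: str, position: int) -> str:
    """Return the transcript with the adenosine at `position` deaminated."""
    rna = rna.upper()
    if rna[position] != "A":
        raise ValueError("REPAIR-style editing acts on adenosine only")
    return rna[:position] + "I" + rna[position + 1:]


def read_by_machinery(rna: str) -> str:
    """Inosine is interpreted as guanosine by base-pairing machinery."""
    return rna.upper().replace("I", "G")
```

Under this model, a risk-associated A allele in a transcript would be read as G after editing, while the genomic DNA remains unchanged.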
Small molecules represent a huge resource for drugging proteins. Whether RNAs can also directly bind small-molecule compounds has long been an interesting question. Scientists were pessimistic about the answer until a recent breakthrough in the field (94). A small molecule, LMI070, was found to bind to the pre-mRNA of the spinal muscular atrophy–related gene SMN2, which encodes survival motor neuron. The association boosts the processing of exons and translation of the protein product to antagonize the disease (95). Later, by screening a library of noncoding RNAs against a library of small molecules to find strong interactions, another small molecule, targaprimir-96, was identified as a binding partner of primary miR-96 at a key processing site. This interaction blocks the maturation of miR-96 and induces apoptosis in cancer cells (96). Using the same strategy, the authors also found a small molecule, targaprimir-210, which binds primary miR-210 (97). These inspiring results demonstrate that both coding and noncoding RNAs are druggable, just like proteins. Although no studies on drugging lncRNAs with small molecules have been reported, these findings provide one more prospective strategy for targeting lncRNAs for cancer therapy.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Acknowledgments
We apologize to colleagues whose work was not discussed or cited in this review due to the space limitation. This work was supported, in whole or in part, by the US Department of Defense (PC140683 to C.V. Dang), the National Key Research and Development Program of China (2017YFA0105501 to X. Zhong), the Guangdong Province Science and Technology Project (2015A020212019 to X. Zhong), the US National Institutes of Health (R01CA142776 to L. Zhang, R01CA190415 to L. Zhang, P50CA083638 to L. Zhang, P50CA174523 to L. Zhang, P50CA083639 to A.K. Sood, P50CA098258 to A.K. Sood, and R35 CA209904 to A.K. Sood), the Breast Cancer Alliance (L. Zhang and C.V. Dang), the Frank McGraw Memorial Chair in Cancer Research (A.K. Sood), the American Cancer Society Research Professor Award (A.K. Sood), the Marsha Rivkin Center for Ovarian Cancer Research (L. Zhang), the Basser Center for BRCA (L. Zhang), the Harry Fields Professorship (L. Zhang), and the Kaleidoscope of Hope Ovarian Cancer Foundation (L. Zhang).