Abstract
Approximately half of the human genome consists of repetitive sequence attributed to the activities of mobile DNAs, including DNA transposons, RNA transposons, and endogenous retroviruses. Of these, only long interspersed elements (LINE-1 or L1) and sequences copied by LINE-1 remain mobile in our species today. Although cells restrict L1 activity by both transcriptional and posttranscriptional mechanisms, L1 derepression occurs in developmental and pathologic contexts, including many types of cancers. However, we have limited knowledge of the extent and consequences of L1 expression in premalignancies and cancer. Participants in this NIH strategic workshop considered key questions to enhance our understanding of mechanisms and roles the mobilome may play in cancer biology. Cancer Res; 76(15); 4316–9. ©2016 AACR.
Introduction
Mobile DNAs, including DNA transposons, RNA transposons, and endogenous retroviruses, are highly abundant sequences that make up a major portion of eukaryotic genomes. Human genomes no longer possess active DNA transposons, which integrate via a “cut-and-paste” mechanism; instead, they are enriched in active retrotransposons that integrate via RNA intermediates. Human retrotransposons include long and short interspersed elements (LINE and SINE), composite SINE-R, VNTR, and Alu (SVA) elements, and long terminal repeat (LTR) elements. LINEs and LTRs are autonomous, or protein-coding, elements, although only LINE-1 (L1) is considered competent to mobilize in humans. L1 sequences represent approximately 17% of the human genome; however, the vast majority of the estimated 500,000 copies are incapable of mediating retrotransposition due to substantial truncations in the 5′ region. In contrast, recently inserted, full-length L1s encode proteins that can “copy-and-paste” L1 sequences to new locations in the genome. In addition to mobilizing L1 RNA (in cis), L1 proteins can also transpose other repetitive elements and endogenous RNAs in trans.
Interest in L1 relationships to cancer biology dates back to the 1980s. Early L1 work included sequence characterization of expressed full-length L1 in teratocarcinoma cells (1) and the demonstration of retrotransposition in cultured HeLa cells (2). A somatically acquired L1 insertion was shown to disrupt the adenomatous polyposis coli (APC) tumor suppressor gene in a case of colorectal cancer (3). Nearly a decade of advances in DNA sequencing have both underscored the importance of L1 activity in causing heritable variation through retrotransposition in the germline and also demonstrated that widespread somatic retrotransposition occurs in many cancers. On September 25, 2015, the NCI (Rockville, MD) sponsored a strategic workshop to assess the potential impact of somatic retrotransposition on cancer initiation and progression. A panel of experts in mammalian mobile DNAs, chaired by K.H. Burns and J.D. Boeke, considered recent advances and challenges in the field. The goals were to define key research priorities and discuss ways to accelerate our understanding of mobile DNAs in cancer. The following is a summary of the key topics discussed.
Retroelement Biology and Its Regulation
L1 is an approximately 6-kb DNA sequence composed of a 5′ untranslated region (UTR); two sense-strand open reading frames (ORF), which encode two proteins, ORF1p and ORF2p; a 3′ UTR; and a poly(A) homopolymer. The 5′ UTR is rich in CpG dinucleotides and contains antisense promoter activity and a recently discovered antisense ORF dubbed ORF0 (4, 5). Transcribed L1 mRNA is exported to the cytoplasm as a bicistronic transcript that encodes both ORF1p, an RNA-binding protein, and ORF2p, which possesses endonuclease and reverse transcriptase domains required for L1-mediated retrotransposition. ORF1p and ORF2p form a ribonucleoprotein (RNP) complex with target RNAs (i.e., L1, Alu, SVA) or endogenous mRNAs in the cytoplasm, and this complex gains access to the nucleus through a poorly understood process. Transposition of L1 sequences to new positions in the genome takes place through target-primed reverse transcription (TPRT). In this reaction, the ORF2p endonuclease first nicks the DNA double helix at a TTTT/A consensus motif, allowing the nicked DNA to interact with the poly(A) tail of the mRNA, and ORF2p then uses this DNA to prime a reverse transcription reaction. Canonical L1-mediated integrations can be recognized by target site duplications flanking the insertion site, the presence of poly(A) tracts at the 3′ end of the insertion, and, in the case of processed pseudogenes, their lack of intronic sequences.
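The sequence hallmarks of a canonical insertion described above can be illustrated with a minimal Python sketch. It assumes a fully resolved insertion allele split into an upstream flank, the inserted sequence on its sense strand, and a downstream flank. The function names and the length thresholds (`min_tsd`, `min_poly_a`) are hypothetical choices made for illustration, not values taken from the workshop or from any published caller.

```python
def target_site_duplication(left_flank: str, right_flank: str, max_len: int = 30) -> str:
    """Return the longest sequence that both ends the upstream flank and
    begins the downstream flank (the duplicated target site), or ''."""
    left, right = left_flank.upper(), right_flank.upper()
    for k in range(min(max_len, len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


def poly_a_tail_length(insert_seq: str) -> int:
    """Count the uninterrupted run of A residues at the 3' end of the insert."""
    seq = insert_seq.upper()
    return len(seq) - len(seq.rstrip("A"))


def has_tprt_hallmarks(left_flank: str, insert_seq: str, right_flank: str,
                       min_tsd: int = 5, min_poly_a: int = 8) -> bool:
    """True when the allele shows both a target site duplication and a 3' poly(A)
    tract; both thresholds are illustrative assumptions."""
    tsd = target_site_duplication(left_flank, right_flank)
    return len(tsd) >= min_tsd and poly_a_tail_length(insert_seq) >= min_poly_a
```

Real insertions are often 5′ truncated, inverted, or carry imperfect duplications, so a practical caller would need tolerant matching rather than the exact string comparison used here.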
J.D. Boeke discussed the identification of host proteins that interact with L1 ORF1 and ORF2 proteins, on the hypothesis that these interactions will elucidate mechanisms of TPRT and promote a better understanding of how cells contend with LINE-1 activity (6). He also described a transgenic mouse model harboring an inducible L1 gene-trap cassette useful for forward genetic screens in cancer. These mice surprisingly demonstrated L1 activity–dependent, nonheritable coat color phenotypes, suggesting that some cell populations are highly susceptible to somatic L1 activity (7). Tim Bestor (Columbia University, New York, NY) provided evidence for an L1-derived gene involved in control of L1 activity, designated ECAT11 or L1TD1. Evidence suggests that ECAT11/L1TD1 is a partially rearranged L1 ORF1 that is under strong selection in species with active L1 and under neutral selection in species without active L1. John Moran (University of Michigan, Ann Arbor, MI) emphasized the need to elucidate L1 genomic insertion patterns, highlighting a key technical challenge in the field. Somatically acquired L1 insertions, he explained, may be targeted to specific genomic intervals owing to biases of either the retroelement or the chromatin state of a particular cell; that is, certain intervals may be more accessible than others. Alternatively, de novo insertions may occur randomly and subsequently be subject to natural selection pressures during the expansion of a cell lineage. Participants stressed the need for more in vivo experimental model systems to query the L1 life cycle, its interactions with host factors, and the characteristics of insertion site preferences.
Roles of Retroelements in Neurobiology
Over the past several years, a number of groups have focused on analyses of somatic retrotransposon insertions in neurons, both in Drosophila (8, 9) and in single-cell analyses of human neurons (10–12). The field now largely agrees that somatic retrotransposition can cause some genetic variation in neuronal populations. However, differences in methodologies, both during the genome amplification required for single-cell sequencing and in the bioinformatic analyses, have resulted in substantial discrepancies in estimates of how much L1 retrotransposition contributes to somatic variation and, by extension, its significance to the biology of mature neurons. Alice Eunjung Lee (Harvard Medical School, Boston, MA) reported that even infrequent somatic insertions recovered by sampling different areas of the brain can inform models of neural development, using shared insertion sites as markers of common cell ancestry (11). Others are considering roles for retrotransposons in the central nervous system (CNS) that are not necessarily tied to their insertion sites. Specifically, Alysson Renato Muotri (University of California, San Diego, La Jolla, CA) discussed how unchecked L1 activity may play a role in the pathogenesis of certain human neurologic conditions. He described how loss of TREX1, a cytosolic 3′-5′ DNA exonuclease involved in innate immunity, led to increased L1 activity and decreased survival of human neurons, at least in tissue culture. See the article by Richardson and colleagues (2014) for a review of L1 activity in neuronal tissue (13).
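The idea of using shared insertion sites as markers of common ancestry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input layout (a mapping from sample name to a set of `(chromosome, position)` calls) and the function name `shared_insertions` are hypothetical, and the sketch is not the method used in the cited study.

```python
from itertools import combinations

# Hypothetical representation of one somatic insertion call.
Site = tuple[str, int]


def shared_insertions(calls: dict[str, set[Site]]) -> dict[tuple[str, str], set[Site]]:
    """For every pair of samples, return the insertion sites called in both.

    A somatic insertion found in two samples suggests that the sampled cells
    descend from the progenitor in which the insertion arose.
    """
    shared = {}
    for a, b in combinations(sorted(calls), 2):
        shared[(a, b)] = calls[a] & calls[b]
    return shared
```

Pairs with a non-empty intersection are candidates for a common progenitor. In practice, amplification artifacts and false-positive calls, the sources of discrepancy noted above, would have to be excluded before drawing any lineage conclusion.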
Effects of Aging and the Environment
An additional emerging area of interest is in understanding the extent of somatic retroelement expression and transposition in cellular and organismal aging and, by extension, how this relates to cancer. It is important to consider that although derepression of retrotransposons in advanced age may have negligible consequences for the fecundity of a species, its occurrence may create susceptibility to age-related diseases. Retroelements, including L1, have been shown to increase expression as a consequence of aging, a change that can be attenuated by caloric restriction in mice (14). Vera Gorbunova (University of Rochester, Rochester, NY) discussed the role of SIRT6, a chromatin regulator that mono-ADP-ribosylates KAP1 and PARP1 and deacetylates histones, in regulating L1 expression. SIRT6 binds L1 5′UTR sequences in the genome and directs heterochromatin formation to suppress transcription. Exposure to ionizing radiation or oxidative stress results in a displacement of SIRT6 away from L1 5′UTRs to sites of DNA strand breaks, thereby allowing reexpression of L1 genes. These findings suggest a relationship between accumulated DNA damage and increased L1 expression with age (15).
L1 activity also seems to be influenced by environmental factors, such as inversion of day–night cycles, which disrupts circadian pathways. Victoria Belancio (Tulane University, New Orleans, LA) highlighted the role of the circadian-responsive melatonin receptor 1 (MT1) in regulating L1 activity. When overexpressed in cell culture systems, MT1 reduces L1 transcript and protein levels as well as L1-mediated retrotransposition. Furthermore, perfusion of human prostate cancer xenografts in nude rats with serum collected from human subjects at night, which contains high levels of melatonin, suppressed L1 expression in tumor cells (16). In contrast, L1 RNA is expressed when tumors are perfused with serum collected from the same subjects during the daytime or at night after bright light exposure.
Retroelement Activity in Cancer
A direct link between somatic L1 activity and genome instability associated with oncogenesis has been difficult to assess. One case study in 1992 reported a somatic L1 insertion in the last exon of APC as a causal factor in the subject's development of colon cancer (3). Despite this observation, the significance of recent reports demonstrating that L1 insertions are found in many human clinical tumor samples by next-generation sequencing is not yet clear (17–27). Major points for discussion that have emerged in the L1 field include the prevalence of L1 expression and activity in many cancers and whether L1 expression or retrotransposition affects cancer progression.
K.H. Burns began the discussion by highlighting an interrogation of ORF1p protein expression by IHC. This revealed tumor-specific localization to be a common feature of many types of high-grade malignant cancers (21). She described follow-up studies showing extensive somatic L1 retrotransposition in pancreatic adenocarcinoma. Somatic L1 insertions occur at varying rates in the course of the clonal evolution and metastasis of these malignancies (22). She also proposed that inherited polymorphisms in repetitive element insertions may account for a portion of heritable cancer risks. Susan Logan (New York University, New York, NY) presented work from her laboratory on germline mechanisms suppressing L1 activity. A yeast two-hybrid screen using androgen receptor (AR) as bait led to the discovery of AR-trapped clone 27 (ART-27; ref. 28), which in subsequent studies was found to interact with unconventional prefoldin RPB5 interactor (URI) to regulate AR target genes (29). She described a transcriptional repressor complex composed of ART-27, URI, and KRAB-associated protein 1 (KAP1), the last of which has been implicated in L1 regulation. ART-27 conditional knockout resulted in a loss of germ cells in mouse testes. Whether the loss of germ cells was due to toxicity associated with excessive L1 activity will be explored in future studies. Prescott Deininger (Tulane University, New Orleans, LA) emphasized technical challenges in understanding L1 RNA expression. He stressed the importance of identifying characteristics that distinguish L1 loci as potentially retrotransposition competent (“hot”) versus quiescent. Additional technical challenges discussed include limits to our ability to uniquely map mobile element reads from recently inserted L1 sequences and the fact that many active L1s are not represented in the reference genome. Solving these problems will allow for a deeper understanding of the role of specific L1 loci in disease.
The session concluded with Peter Park (Harvard Medical School, Boston, MA) presenting his group's analysis of sequencing data generated by The Cancer Genome Atlas (TCGA) consortium, which suggests some recurrent somatic L1 integration sites in cancers and a potential impact of L1 insertions on the expression of surrounding gene loci. He underscored a need for standardization among analytic pipelines and for new ways of storing, analyzing, and sharing data, considering that TCGA alone houses approximately 750 TB of raw sequencing data for its approximately 2,500 sequenced cases.
Summary
Mobile DNAs give rise to interspersed repeats, sequences that make up approximately half of our genomes. These sequences have been historically understudied, both because their significance is unknown and because high copy number repeats can pose experimental challenges to high-throughput genomic analyses. As enabling technologies are maturing at a fast pace, neither impediment now seems insurmountable, and interest in defining the functional mobilome in health and diseases, such as cancer, has never been greater.
In particular, the intersection of the L1 field and cancer biology has recently been burgeoning with intriguing questions. New sequencing technologies and bioinformatics tools, as well as new reagents for detecting L1 ORF1p, have revealed L1 expression and somatic retrotransposition in many human cancers. Models of cancer initiation and progression presume the involvement of genomic stresses, and the extent to which the mobilome acts as a contributor to these processes remains an area open for further study. Arguably, the most imperative question is whether this activity causes or promotes malignancy. Furthermore, why such a vast heterogeneity of retrotransposition activity is evident both between tumor types and even among tumors of the same type is unknown. Whether these differences can be attributed to the inherited complement of active L1 loci or differences in how tumors control L1 expression and activity is not clear. Beyond L1, analysis of other mobile elements, Alu and SVA, as well as more ancient sequences, has yet to be thoroughly undertaken in cancer.
There are several needs in the field at the moment. In particular, curated databases with robust quality control mechanisms to catalog these structural variants will be essential for continued progress. Standardization of accurate and accessible methods for calling insertions in genomic sequencing data would enable more samples to be compared and promote future functional exploration of these sequences. More development in the challenging area of single cell L1 mapping, where different laboratories have reached different conclusions on the extent of somatic insertions in neuronal development, is especially needed. Experimentally, there is a need for more disease-relevant in vitro and in vivo model systems to interrogate mobile elements in cell-specific contexts, in the germline and CNS, as well as in a variety of normal and malignant tissues. Finally, a better understanding of how host systems restrict L1 and other active retrotransposons, and what results when those cellular systems fail, will provide important perspectives on cancer biology. We expect that these efforts will increase our knowledge of the biology of mobile DNAs and their contributions in cancer.
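As one illustration of what comparing insertion calls across pipelines could involve, the Python sketch below pairs calls from two call sets that fall on the same chromosome within a breakpoint tolerance. Everything here is an assumption made for illustration: the function name `concordant_calls`, the `(chromosome, position)` representation, and the default 50-bp tolerance are hypothetical and do not describe any existing pipeline or standard.

```python
def concordant_calls(calls_a, calls_b, tolerance=50):
    """Pair each call in calls_a with the first call in calls_b that lies on the
    same chromosome within `tolerance` bp; calls without a partner are skipped."""
    matched = []
    for chrom_a, pos_a in calls_a:
        for chrom_b, pos_b in calls_b:
            if chrom_a == chrom_b and abs(pos_a - pos_b) <= tolerance:
                matched.append(((chrom_a, pos_a), (chrom_b, pos_b)))
                break  # stop at the first partner found for this call
    return matched
```

The greedy first-match rule keeps the sketch short but allows one call in the second set to be paired with several calls in the first. A shared standard would need an agreed matching rule in addition to an agreed tolerance.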
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.