The emergence of whole-genome annotation approaches is paving the way for the comprehensive annotation of the human genome across diverse cell and tissue types exposed to various environmental conditions. This has already unmasked the positions of thousands of functional cis-regulatory elements integral to transcriptional regulation, such as enhancers, promoters, and anchors of chromatin interactions that populate the noncoding genome. Recent studies have shown that cis-regulatory elements are commonly the targets of genetic and epigenetic alterations associated with aberrant gene expression in cancer. Here, we review these findings to showcase the contribution of the noncoding genome and its alteration in the development and progression of cancer. We also highlight the opportunities to translate the biological characterization of genetic and epigenetic alterations in the noncoding cancer genome into novel approaches to treat or monitor disease.
Significance: The majority of genetic and epigenetic alterations accumulate in the noncoding genome throughout oncogenesis. Discriminating driver from passenger events is a challenge that holds great promise to improve our understanding of the etiology of different cancer types. Advancing our understanding of the noncoding cancer genome may thus identify new therapeutic opportunities and accelerate our capacity to find improved biomarkers to monitor various stages of cancer development. Cancer Discov; 6(11); 1215–29. ©2016 AACR.
The Identification and Annotation of Cis-Regulatory Elements in the Human Genome
The Human Genome Is More Than a Collection of Genes
The diploid human genome consists of over 6 billion bases of DNA that provide the genetic basis for our phenotypic individuality (1, 2). Approximately 20,000 genes are encoded in the human genome and are transcribed into ∼80,000 transcripts that are subsequently translated into various proteins. Despite the importance of proteins in diverse cellular processes, protein coding sequences account for under 2% of the human genome (less than 120 million bases of DNA in the diploid genome). The role for the remaining noncoding bases (∼98%) is a source of investigation and debate (1, 3). Nearly half of the noncoding genome consists of repetitive elements, including interspersed satellites, short interspersed nuclear elements, long interspersed nuclear elements, ribosomal DNA, DNA transposons, and retrotransposons, that affect various biological functions (4). Additionally, the noncoding genome harbors nonrepetitive elements, including cis-regulatory elements such as promoters, enhancers, and anchors of chromatin interactions (5, 6). These cis-regulatory elements are directly involved in modulating gene expression and noncoding RNA transcription through long-range chromatin interactions (Fig.1A–D; refs. 7, 8). Identifying and characterizing functional noncoding elements within the genome hold great promise to improve our understanding of the human genome in health and disease. In this review, we focus on the progress in noncoding functional element annotation and recent advances demonstrating the central role of genetic and epigenetic alterations affecting noncoding cis-regulatory elements of relevance to cancer initiation and progression.
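The coding-sequence figure quoted above follows from simple arithmetic. The sketch below reproduces it, assuming a diploid genome rounded to 6 billion bases and treating "under 2%" as an upper bound of exactly 2%; the function names are illustrative only.

```python
# Back-of-the-envelope check of the genome figures quoted in the text.
DIPLOID_GENOME_BP = 6_000_000_000  # "over 6 billion bases", rounded down (assumption)
CODING_FRACTION = 0.02             # "under 2%", taken as an upper bound (assumption)


def coding_bases(genome_bp: int, coding_fraction: float) -> int:
    """Upper bound on the number of protein-coding bases."""
    return round(genome_bp * coding_fraction)


def noncoding_fraction(coding_fraction: float) -> float:
    """Share of the genome that does not encode protein."""
    return 1.0 - coding_fraction


if __name__ == "__main__":
    print(coding_bases(DIPLOID_GENOME_BP, CODING_FRACTION))
    print(noncoding_fraction(CODING_FRACTION))
```

With these inputs the upper bound is 120 million coding bases, and the remaining share is roughly 98% of the genome.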
Identification and Annotation of Noncoding Functional Elements across the Genome
The noncoding genome has historically been overlooked because of technical limitations hindering the characterization of its genetic and epigenetic nature. Recent advances in whole-genome annotation, inclusive of next-generation sequencing technologies, now offer the means to effectively delineate functional noncoding regions of the human genome. This annotation takes into account multiple definitions of functionality to incorporate the evolutionary, genetic, and molecular biology perspectives.
From an evolutionary perspective, comparative analyses are commonly used to identify conservation of DNA sequences across related species (9). Genetic elements that are retained across species are generally considered biologically important and are thus considered functional. In a recent study, about 8% of the human genome was reported to be under evolutionary constraint (10). Taking into account protein-coding sequences, this implies that functionality can be ascribed to approximately 6% of the noncoding genome (10, 11). Identifying conserved DNA sequences through sequence alignment along a linear genome, however, disregards the three-dimensionality of the genome in which the sequence identity of cis-regulatory elements regulating the same gene, for instance, may be conserved across species despite localizing to neighboring yet distinct positions along the linear genome of different species. This is suggested by a comparative study of cis-regulatory elements between the human and mouse genomes revealing conservation at the level of transcription factor networks between these two species (12). Similar multispecies comparative analyses based on the chromatin binding profiles for multiple transcription factors with different DNA-binding motifs exhibit conserved DNA-binding sequence preferences but with limited binding event alignment across species (13–15). This supports the integration of epigenomics and comparative genomics to assist in the identification of functional elements in the context of the evolutionary perspective.
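The estimate of roughly 6% can be reproduced by subtraction. The sketch below assumes, for illustration only, that all protein-coding sequence (taken as 2% of the genome) falls within the 8% under evolutionary constraint. It reports the constrained noncoding share both as a fraction of the whole genome and as a fraction of the noncoding genome, since the two differ slightly.

```python
from typing import Tuple


def constrained_noncoding(constrained: float, coding: float) -> Tuple[float, float]:
    """Constrained noncoding share as (fraction of genome, fraction of noncoding genome).

    Assumes every coding base is itself under constraint (a simplification).
    """
    share_of_genome = constrained - coding
    share_of_noncoding = share_of_genome / (1.0 - coding)
    return share_of_genome, share_of_noncoding


if __name__ == "__main__":
    of_genome, of_noncoding = constrained_noncoding(0.08, 0.02)
    print(f"{of_genome:.3f} of the genome, {of_noncoding:.3f} of the noncoding genome")
```

Under these assumptions both values round to about 6%, which is why the two phrasings are often used interchangeably.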
From a genetic perspective, functional elements of the genome are defined by the ability of a variation in their DNA sequence, either a structural alteration or a single nucleotide variant (SNV), to cause quantifiable phenotypic changes, inclusive of differential gene expression. This is exemplified by the mutations reported in the TERT gene promoter in patients with melanoma that increase TERT gene expression (16, 17). Until recently, the genetic approach was hampered by low-to-modest throughput methodologies. The recent development of high-throughput in vitro assays, however, including the Massively Parallel Reporter Assay (MPRA), Massively Parallel Functional Dissection (MPFD) assay, Self-Transcribing Active Regulatory Region sequencing (STARR-seq), and Protein Binding Microarrays (PBM), is now allowing us to measure biochemical differences across cis-regulatory elements and genetically modified variants (18–21). These assays are further complemented with newly designed in silico approaches such as IntraGenomic Replicates (IGR) and Function-Based Prioritization of Sequence Variations (Fun-seq) that predict changes in DNA–protein interaction induced by genetic alterations (22–26). Together, these technologies enable researchers to assess whether thousands of genetic alterations found across the genome quantifiably change phenotypic traits, thereby accelerating the identification of functional noncoding elements based on a genetic perspective.
From a molecular biology perspective, functional noncoding elements are identified based on biochemical measurements. Following up on the work from independent laboratories characterizing biochemical activity across the noncoding genome, the Encyclopedia of DNA Elements (ENCODE) project launched in 2003 significantly accelerated the identification of biochemically active noncoding elements of the human genome. This was made possible using a series of high-throughput assays, including chromatin immunoprecipitation sequencing (ChIP-seq), RNA sequencing, and DNase I hypersensitive site sequencing (DNase-seq), across a collection of normal and cancer cell lines from different tissues of origin (27–29). Overall, this led the ENCODE project to predict biochemical activity across approximately 80% of the human genome (27). Other initiatives, including The Functional Annotation of the Mammalian Genome (FANTOM) project and the International Human Epigenome Consortium (IHEC), inclusive of the Roadmap Epigenomics project, Blueprint, DEEP, Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC), and Core Research for Evolutional Science and Technology (CREST), are further contributing to the biochemical characterization of the human genome to identify functional elements (30–32). Overall, these efforts led to the annotation of a diverse set of functional elements, inclusive of cis-regulatory elements, populating the noncoding genome to establish transcriptional programs in a lineage-specific manner across many cell and tissue types.
In accordance with the aberrant changes in gene expression profiles promoting cellular dedifferentiation and pluripotency during cancer development, cis-regulatory elements integral to transcriptional programs are garnering attention in the field of cancer genetics.
Cis-Regulatory Elements Are Targets of Genetic and Epigenetic Alterations in Cancer
Genetic and Epigenetic Alterations Target Promoters in Cancer
Promoters located upstream of transcription start sites correspond to the basic unit of regulation required for the expression of any transcript (33). The initiation of gene transcription involves the recruitment of coactivator proteins to assist in a series of steps, culminating in the assembly of the transcription preinitiation complex consisting of general transcription factors (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH) and RNA polymerase II. The activation of RNA polymerase II results in transcriptional elongation (34, 35).
Genetic Alterations Populate Promoters in Cancer.
Given the fundamental role of promoters in transcription, genetic alterations targeting their underlying sequences can directly contribute to aberrant gene expression (Fig.2A). This is exemplified by the recurrent somatic mutations identified in the TERT gene promoter in multiple human cancers, including melanoma, glioma, medulloblastoma, lung adenocarcinoma, thyroid cancer, bladder cancer, and hepatocellular carcinoma (16, 17, 36–40). This same promoter is also affected by genetic predisposition, specifically by the rs2853669 single-nucleotide polymorphism (SNP) located 246 base pairs upstream from the start codon (17). These genetic alterations create DNA recognition motifs for members of the E26 transformation-specific (ETS) transcription factor family, leading to the increased binding of ETS factors, promoting an increase in TERT expression (16, 17, 41). The continued TERT gene expression normally suppressed in somatic cells can result in the aberrant lengthening of telomeres to favor cellular immortalization and oncogenesis (42). In addition to the TERT gene promoter, recent whole-genome analyses of various cancer types have also reported a significant burden of mutations at the WDR74, MED16, SDHD, and TFPI2 gene promoters (41, 43). Furthermore, genetic alterations in the SDHD and TERT promoters were shown to discriminate patient outcome in a collection of cancer types, including melanoma, glioma, medulloblastoma, thyroid cancer, liver cancer, and bladder cancer, supporting their usefulness as biomarkers for patient stratification (36–38, 41, 43–46).
Structural variations involving promoters can also contribute to oncogenesis. A prototypical example consists of a fusion between the ETS factor ERG proto-oncogene and the promoter region of the TMPRSS2 gene through an intronic deletion on chromosome 21q22.2–3 (47). The TMPRSS2–ERG (T2E) fusion, reported in approximately 50% of patients with prostate cancer, places ERG gene expression under the control of the androgen-regulated TMPRSS2 promoter, resulting in an oncogenic increase in ERG transcript and protein levels (47, 48). The increase in ERG protein resulting from the T2E fusion can upregulate the expression of target genes that favor prostate cancer cell migration and invasion, including CXCR4 and ADAMTS1 (47, 49), and has been associated with higher-grade prostate cancer (50). Fusion of the TMPRSS2 promoter with other ETS family members, such as ETS Variant 1, 4, and 5 (ETV1, ETV4, ETV5), has also been reported in another 5% to 10% of patients with prostate cancer (48). More recently, structural variations including deletions, duplications, inversions, and translocations associated with breakpoints at chromosome 9p24 that cluster within the 3′-untranslated region (UTR) of the PD-L1 gene (also known as CD274) were reported in multiple cancers, including adult T-cell leukemia, large B-cell lymphoma, and stomach adenocarcinoma (51). These aberrant structural alterations targeting the 3′-UTR of PD-L1 elevate the stability of PD-L1 transcripts and expression, suggested to aid cancer cells in escaping antitumor immunity (51). Together, these studies showcase that promoters can be targeted by either inherited or acquired genetic alterations, inclusive of both SNVs and structural variants, that contribute to oncogenesis.
Epigenetic Alterations Accumulate at Promoters in Cancer.
The activity of cis-regulatory elements is greatly dependent on their chromatin accessibility. Promoters found in compacted chromatin, known as heterochromatin, are inactive, whereas those found in accessible chromatin, known as euchromatin, are actively engaged in transcriptional regulation. Epigenetic modifications, inclusive of DNA methylation and histone modifications or variants, readily influence chromatin accessibility by affecting the density of nucleosomes (Fig.2B; refs. 29, 52, 53). Changes in chromatin accessibility, either increasing or decreasing its compaction, through epigenetic alterations can directly affect cancer development.
This is highlighted by changes in DNA methylation at CpG dinucleotides commonly reported at promoters of target genes or noncoding transcribed regions across different types of cancers (Fig.2B; refs. 54, 55). For example, the promoters of numerous tumor suppressor genes such as RASSF1A, BRCA1, APC, MLH1, and p16 (CDKN2A) are hypermethylated in osteosarcoma, endometrial carcinoma, glioblastoma, and pancreatic, breast, colorectal, ovarian, and non–small cell lung cancers (56–63). The hypermethylation of these promoters correlates with the reduced expression of their associated gene (58–60, 64–66). Similar results were reported for noncoding transcribed regions, inclusive of microRNAs (miR) and long noncoding RNAs (lncRNA). For instance, the hypermethylation of the miR-124a promoter is associated with reduced miR-124a expression in leukemia, lymphoma, and colon, breast, and lung cancers (67). Similarly, DNA hypermethylation of the bidirectional miR-34b/c promoter relates to miR-34b/c silencing in colorectal cancer cells, and is favorable to colony formation (68). Finally, DNA hypermethylation of the MEG3 lncRNA promoter is linked with reduced MEG3 expression in multiple cancers (69–71) and associates with poor prognosis in patients with gastric cancer (72). In summary, aberrant DNA hypermethylation can have an impact on both coding and noncoding gene promoters to affect oncogenesis. Notably, differential methylation status at promoters can inform on clinical outcome. For instance, DNA methylation of the GSTP1 promoter on chromosome 11q13 can discriminate malignant from normal prostate tissue (73, 74). Methylation status of the HOXD3 promoter on chromosome 2q31 also segregates low-grade prostate cancer from intermediate- and high-grade prostate cancer (75). Moreover, the methylation status of the MGMT promoter on chromosome 10q26 in patients with glioma has been suggested to be a predictor of treatment response and post-treatment survival to temozolomide and alkylating agents (76, 77). These studies support that DNA methylation patterns at promoters can also potentially serve as biomarkers to stratify patients with cancer for treatment response and distinct clinical outcomes.
Conversely, promoters can undergo DNA demethylation as cancer develops (78), and the loss of DNA methylation at promoters associates with the overexpression of the corresponding gene. Demethylation at the ELMO3 gene promoter, for instance, is associated with its overexpression in human lung cancer (79). In accordance with the proposed role of ELMO3 in cellular migration, the overexpression of ELMO3 has been documented in metastatic lung cancer (79, 80). Promoter demethylation driving the aberrant expression of uPA, involved in tumor progression and metastasis, is similarly reported in invasive prostate cancer (81, 82). The treatment of invasive prostate cancer PC-3 cells with S-adenosylmethionine, previously shown to favor hypermethylation, inhibits uPA gene expression and cell invasion in vitro, suggesting that the inhibition of promoter demethylation may be a potential therapeutic strategy against aberrantly activated tumor-promoting genes (81).
Promoters can also be epigenetically marked by specific histone modifications (Fig.2B), and aberrant fluctuation of these modifications has been linked to oncogenesis. The co-occupancy of lysine 4 and 27 trimethylation on histone H3 (H3K4me3 and H3K27me3, associated with activation and repression of transcription, respectively) defines a “bivalent” state found at the promoters of genes poised for expression (83–85). These bivalent promoters can transit into either active (H3K4me3-positive and H3K27me3-negative) or silent (H3K4me3-negative and H3K27me3-positive) states during cell differentiation (85). During colorectal cancer initiation, gains and losses of the H3K4me3 modification at promoters are associated with differential gene expression (86). The loss of the H3K27me3 modification has also been linked to aberrant activation of oncogenic gene transcription, including MKI67 and CD133, a proliferation marker and a cancer stem cell marker, respectively (87). Moreover, the loss of both H3K4me3 and H3K27me3 modifications is associated with aberrant gains in promoter methylation in colorectal cancer (87). Apparent gains and losses of the H3K27me3 modification at promoters also discriminate androgen deprivation–resistant from androgen deprivation–sensitive prostate cancer cells, suggesting a role for epigenetic alterations at promoters during cancer progression (88). Unfortunately, these observations do not delineate the causal role of changes in either H3K4me3 or H3K27me3 at promoters in cancer. Future work relying on the recent development of epigenetic editing technologies, such as Transcription Activator-Like Effector (TALE) or deactivated Cas9 (dCas9) fused with epigenetic writer or eraser proteins (e.g., TALE–TET1, TALE–LSD1, dCas9–p300; refs. 89–93), will provide an opportunity to directly assess the role for the changes in histone modifications targeting promoters in oncogenesis. In support of this approach, increased IL1RN gene expression was achieved in cells expressing the dCas9–p300 fusion protein targeted to the IL1RN promoter, which allowed for lysine 27 acetylation of histone H3 (H3K27ac; ref. 91).
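The promoter states defined by H3K4me3 and H3K27me3 can be written as a small lookup. The sketch below is illustrative only: it assumes each mark has already been called as present or absent (for example, from ChIP-seq peaks), and the "unmarked" label for promoters lacking both marks is a hypothetical name that does not appear in the studies cited here.

```python
def promoter_state(h3k4me3: bool, h3k27me3: bool) -> str:
    """Classify a promoter from the presence or absence of two histone marks."""
    if h3k4me3 and h3k27me3:
        return "bivalent"   # poised for expression
    if h3k4me3:
        return "active"     # H3K4me3-positive, H3K27me3-negative
    if h3k27me3:
        return "silent"     # H3K4me3-negative, H3K27me3-positive
    return "unmarked"       # hypothetical label: neither mark detected


if __name__ == "__main__":
    for marks in [(True, True), (True, False), (False, True), (False, False)]:
        print(marks, promoter_state(*marks))
```

In this framing, the transitions described during differentiation or cancer correspond to a promoter moving from one returned label to another between two samples.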
Genetic and Epigenetic Alterations Target Enhancers in Cancer
Enhancers are cis-regulatory elements, found tens to thousands of base pairs away from the promoter of their target transcript, that can modulate its expression independently of orientation. They serve to modulate the activation of promoters and fine-tune transcription in a cell type–specific manner, a property that renders enhancer activity modulation ideal for genetic and epigenetic alterations to affect cell identity, abrogate cellular differentiation, and promote oncogenesis (94).
Genetic Alterations at Enhancers Affect Gene Expression in Cancer.
Genetic predispositions for various traits and diseases identified through genome-wide association studies preferentially map to noncoding cis-regulatory elements, particularly enhancers, in a disease- and tissue-specific manner (Fig.3A; ref. 95). Risk SNPs found in enhancers can change the DNA recognition motifs of transcription factors to alter their binding to the chromatin, directly affecting the transactivation potential of enhancers, and modulate target transcript expression (95).
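One common way to reason about how a SNP changes a DNA recognition motif is to score the reference and variant alleles against a position frequency matrix and compare the log-odds. The sketch below illustrates the idea with a toy 4-bp matrix that is entirely hypothetical: it does not represent any real transcription factor, and the uniform background frequency is also an assumption.

```python
import math

# Toy position frequency matrix for a hypothetical 4-bp motif.
# These values are made up for illustration and are NOT a real motif.
TOY_PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # uniform base composition (assumption)


def log_odds_score(site: str, pfm=TOY_PFM) -> float:
    """Sum of per-position log2(motif frequency / background frequency)."""
    if len(site) != len(pfm):
        raise ValueError("site length must match motif length")
    return sum(math.log2(col[base] / BACKGROUND)
               for base, col in zip(site.upper(), pfm))


def allele_effect(ref_site: str, alt_site: str) -> float:
    """Positive when the variant allele matches the motif better than the reference."""
    return log_odds_score(alt_site) - log_odds_score(ref_site)


if __name__ == "__main__":
    # The variant allele (C at the last position) restores the preferred base.
    print(allele_effect("AGTA", "AGTC"))
```

A positive difference suggests stronger predicted binding for the variant allele and a negative one suggests weaker binding. Such scores are only predictions and would need to be confirmed experimentally.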
This is exemplified by the colorectal cancer risk–associated SNP rs6983267 identified on chromosome 8q24 (96, 97) that maps to an enhancer containing a consensus TCF4 recognition motif upstream of the MYC gene (98, 99). The variant risk allele of this SNP increases the binding of TCF4 to the enhancer compared with the reference allele, driving the aberrant overexpression of MYC (98, 99). In agreement with the 8q24 region physically interacting with the MYC promoter in other cancer types (100), the rs6983267 locus is associated with the risk of developing other cancers, including liver, lung, and prostate cancers (101, 102). More recently, multiple lymphoma risk–associated SNPs (rs2445610, rs13255292, rs7826019, and rs59602790) were shown to map to subtype-specific lymphoma cis-regulatory elements within the chromosome 8q24 risk locus (103). Separately, the risk-associated SNP rs1859961 maps to a prostate cancer–specific enhancer in the chromosome 17q24.3 prostate cancer risk locus that regulates SOX9 gene expression (104). Aberrant SOX9 gene expression is associated with increased risk for prostate cancer and is involved in prostate oncogenesis in mice (105). The rs4784227 SNP at the chromosome 16q12.1 breast cancer risk locus provides further evidence of genetic alterations targeting enhancers in cancer. The variant risk allele of the rs4784227 SNP changes the sequence of a Forkhead DNA recognition motif within an enhancer that regulates the transcription of the TOX3 gene, favoring the binding of the FOXA1 transcription factor (106). The increased binding of FOXA1 represses the transactivation ability of the enhancer through the recruitment of the transcriptional repressor Groucho/TLE, resulting in the decreased expression of the TOX3 tumor suppressor gene (106).
Finally, SNPs can also target units of enhancers referred to as clusters of regulatory elements (CORE), such as super-enhancers or stretch-enhancers, which correspond to multiple enhancers in close proximity to each other (Fig.3A; refs. 107, 108). This is showcased by the rs2168101 SNP mapping to a tissue-specific super-enhancer near the LMO1 neuroblastoma oncogene on chromosome 11p15 (109). The variant allele of this SNP disrupts the binding of GATA3 to lower the expression of LMO1, reducing the risk of developing neuroblastoma (109, 110).
Although these examples consist of single functional SNPs changing the activity of enhancers, recent work has demonstrated that multiple SNPs within a risk locus can affect distinct enhancers, classifying these as multiple enhancer variant (MEV) risk loci (111). These MEVs have been reported to contribute to disease onset, including cancer. For instance, three SNPs (rs12352658, rs7847449, and rs10759944) in linkage disequilibrium with each other within the chromosome 9q22 thyroid cancer risk locus can change the transactivation potential of two different enhancers that physically interact with the promoters of the FOXE1 and PTCSC2 genes (112). This likely accounts for the reduced expression of the FOXE1 and PTCSC2 genes associated with the 9q22 risk locus in normal thyroid tissue from patients with cancer (113). Overall, the functional interpretation of risk loci identified through genome-wide association studies showcases the contribution of genetic alteration in enhancers to promote cancer development.
Similar to inherited genetic variants, acquired somatic mutations can alter enhancer activity and contribute to oncogenesis (Fig.3A). Although enhancers are typified by a reduced mutational density compared with the genome found in heterochromatin, argued to result from active DNA repair in these elements (114), mutations do preferentially accumulate in enhancers present in the tissue from which the tumor originates (115). For example, a putative enhancer region of the PAX5 gene essential for the commitment of lymphoid progenitors into the B-cell lineage located on chromosome 9p13 is found to be recurrently mutated in chronic lymphocytic leukemia tumors (116, 117). The mutations mapping to this enhancer are significantly correlated with altered PAX5 gene expression. Moreover, the CRISPR/Cas9-mediated deletion and introduction of mutations at this enhancer region in B cells reduced PAX5 gene expression, suggesting that these mutations directly alter the activity of the enhancer to disrupt PAX5 expression (116). Similarly, heterozygous 2–18 base pair indel mutations mapping to an intergenic site 7.5 kilobases upstream of the TAL1 transcription start site reported in T-cell acute lymphoblastic leukemia (T-ALL) change the enhancer landscape by creating binding motifs for the MYB transcription factor (118). This allows MYB binding to the chromatin followed by the recruitment of its binding partner CBP, a lysine acetyltransferase, leading to the formation of a super-enhancer upstream of the TAL1 oncogene to drive its overexpression (118).
In addition to point mutations, enhancer activity is also affected by structural variants in cancer, such as inversions, translocations, and copy-number alterations (Fig.3A). In medulloblastoma, the GFI1 and GFI1B loci are translocated from transcriptionally silent chromatin regions into the proximity of active super-enhancers, presenting them as novel oncogenic drivers (119). Similarly, the repositioning of an enhancer near the GATA2 gene on chromosome 3q21 to an ectopic region near the EVI1 gene through inversions and translocations was reported in acute myeloid leukemia (120). This leads to the formation of a super-enhancer that physically interacts with the EVI1 promoter through chromatin interactions, reducing the expression of GATA2 while simultaneously increasing the expression of the EVI1 proto-oncogene (120). In glioblastoma, the aberrant expression of the TERT gene is also suggested to be affected by the rearrangement of a super-enhancer normally found on chromosome 10q22 to the TERT gene promoter located on chromosome 5p15 (121). Likewise, the overexpression of MYC reported in multiple myeloma is mediated through a translocation of a 3′ IgH super-enhancer adjacent to the MYC oncogene (107). Moreover, as a result of the fusion between MYB and QKI (MYB–QKI) in angiocentric gliomas, active enhancers including two super-enhancers demarcated by H3K27ac are translocated from the QKI gene locus located on chromosome 6q26 to the MYB gene locus located on chromosome 6q23 to support aberrant MYB expression (122). The MYB gene is also targeted by translocations in adenoid cystic carcinoma with the NFIB and TGFBR3 loci (123). The translocations juxtapose enhancers, including super-enhancers, to the MYB locus, giving rise to a positive feedback loop regulating the aberrant expression of this potent oncogene mediated by the binding of the MYB protein to the translocated super-enhancer (123). 
Finally, copy-number alterations in regions that harbor super-enhancers can also contribute to aberrant gene expression (Fig.3A). For instance, two focal amplification events of regions harboring super-enhancers were identified and associated with the aberrant expression of the MYC oncogene in uterine corpus endometrial carcinoma and lung adenocarcinoma (124). The CRISPR/Cas9-mediated deletion of a 1.7-kb enhancer, part of the super-enhancer region driving MYC overexpression in lung adenocarcinoma NCI-H2009 cells, led to a significant reduction in MYC expression and impaired clonogenic growth, suggesting that super-enhancer amplification can prompt aberrant gene expression (124). Additional amplification events of regions inclusive of super-enhancers associated with an increase in gene expression are also observed near the KLF5, USP12, and PARD6B genes in head and neck squamous cell carcinoma, colorectal cancer, and hepatocellular carcinoma, respectively (124).
In summary, various types of genetic alterations targeting enhancers can adversely modulate their activity to affect normal transcription and gene expression and contribute to cancer development.
Epigenetic Alterations Accumulate at Enhancers in Cancer.
Enhancer activity is also subject to epigenetic regulation (Fig.3B). Aberrant DNA methylation observed at enhancers in cancer was suggested to be more closely related to changes in gene expression than at promoters (125). This may partly be due to differential transcription factor binding. Transcription factors are suggested to bind DNA hypomethylated enhancers more readily than DNA methylated enhancers, as exemplified by the enrichment of FOXQ1 binding within DNA hypomethylated enhancers previously implicated in colorectal cancer oncogenesis (125–127). DNA hypomethylated enhancers responsive to ESR1 binding in breast cancer are also suggested to be critical for the development of ER-positive breast cancer (128). Moreover, aberrant enhancer DNA hypomethylation during oncogenesis is suggested to associate with the upregulation of cancer-related gene expression, whereas DNA hypermethylation at enhancers correlates with reduced target gene expression (128, 129). In support of this, a putative DNA hypomethylated enhancer is associated with the increased expression of its target genes, including the MYC and RNF43 oncogenes, and DNA hypermethylation at enhancers is associated with the reduced expression of DAXX and GET4 in breast cancer (126, 128). These studies suggest a linkage between aberrant DNA methylation at enhancers and its potential role in altering transcription factor binding and gene expression in cancer development.
Enhancers permissive to transcription factor binding are commonly flanked by nucleosomes mono- and dimethylated on lysine 4 of histone H3 (H3K4me1 and H3K4me2; refs. 130–133). Moreover, active enhancers are further discriminated from poised enhancers by being flanked with H3K27ac nucleosomes (Fig.3B; ref. 134). Genome-wide profiling for H3K4me1 in both normal colon epithelia and colorectal cancer cells revealed thousands of enhancers, termed variant enhancer loci (VEL), that are either lost or gained in colorectal cancer cells compared with normal colon crypts, suggestive of ectopic enhancer activity in the process of cancer initiation (86). These VEL associate with differential expression of their putative target gene in normal versus colon cancer cells (86). Specifically, enhancers active in normal colon but inactive in colorectal cancer cells are found near genes that are part of the normal colon gene expression profile and vice versa (86). VEL also characterize cancer progression. For instance, thousands of enhancers active in endocrine therapy-sensitive breast cancer cells are no longer active in endocrine therapy–resistant cells (135). This change in enhancer usage reflects differences in the transcriptional machinery that inform on alternative therapeutic strategies (135). Moreover, cells resistant to gamma-secretase inhibitor (GSI) in T-ALL appear to be epigenetically labile, as they can readily reactivate a transcriptional program typical of GSI-sensitive cells upon GSI withdrawal. Furthermore, these cells are sensitive to BRD4 inhibition (136), known to antagonize the activity of super-enhancers (137).
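Conceptually, variant enhancer loci are identified by comparing enhancer calls between two conditions. The sketch below is a minimal illustration of that comparison, not the method used in the cited studies: it assumes enhancers are given as (chromosome, start, end) half-open intervals, treats any overlap as the same enhancer, and uses made-up coordinates.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def variant_enhancer_loci(normal: List[Interval],
                          tumor: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Return (gained, lost) enhancers in tumor relative to normal."""
    gained = [t for t in tumor if not any(overlaps(t, n) for n in normal)]
    lost = [n for n in normal if not any(overlaps(n, t) for t in tumor)]
    return gained, lost


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    normal = [("chr1", 100, 200), ("chr1", 500, 600)]
    tumor = [("chr1", 150, 250), ("chr2", 500, 600)]
    print(variant_enhancer_loci(normal, tumor))
```

The pairwise comparison is quadratic in the number of enhancers, which is acceptable for a sketch. Genome-scale analyses typically rely on sorted intervals or dedicated interval tools.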
The cause of epigenetic alterations at enhancers is still under investigation, but whole-exome sequencing of tumor samples supports a role for genetic alterations in chromatin remodeling factors (138). This is exemplified by the mutational load in EP300 (p300), ARID1A, CREBBP (CBP), MLL3/4, and LDB1 genes reported in bladder cancer, hepatocellular carcinoma, non-Hodgkin lymphoma, medulloblastoma, breast cancer, and colon cancer (139–145). Mutations in MLL3/4 are proposed to destabilize the MLL3/4 protein, reduce its binding to transcription factors, or inactivate its catalytic domain that can affect the methylation of nucleosomes at enhancers (138, 146). Likewise, evidence suggests that mutations in the tumor suppressor ARID1A gene (147) can impinge upon the activity of this SWI/SNF chromatin remodeling complex subunit to favor oncogenesis (148). Overall, this warrants further characterization of mutations in chromatin factors to delineate their impact on cis-regulatory element activation.
Anchors of Chromatin Interaction Are Targets of Genetic and Epigenetic Alterations in Cancer
Chromosomes are organized into a hierarchy of chromatin interactions that coordinate the interplay between enhancers and promoters to regulate the expression of their target transcripts (Fig.1). Chromatin interactions, also referred to as chromatin loops, mediate the communication between diverse types of cis-regulatory elements separated by large genomic distances at the kilobase scale by bringing them into close physical proximity. Megabase-scale chromatin interactions define topologically associated domains (TAD) separated by boundaries that are broadly conserved across cell and tissue types and demarcate active from inactive chromatin domains (Fig.1A and B; refs. 149–151). Smaller-range chromatin interactions anchored at promoters facilitate the interactions with enhancers in a cell type–specific manner and relate to cell type–specific gene expression profiles (Fig.1C; refs. 152–155). These promoter–enhancer chromatin interactions are constrained within TADs because TAD boundaries limit their formation across adjoining TADs, insulating target gene promoters from aberrant enhancer activity (Fig.1D; ref. 156). Chromatin interactions are mediated by factors that recognize the DNA sequence at loop anchors in conjunction with intermediary proteins. The anchors that define TAD boundaries are occupied by the CCCTC-binding protein (CTCF), which recognizes a specific 12–base pair consensus motif, and the cohesin complex consisting of RAD21, STAG1, SMC1a, and SMC3 (Fig.1D; ref. 157).
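The insulation principle described above, in which promoter–enhancer interactions are largely confined to a single TAD, can be sketched in Python. This is a simplified illustration under stated assumptions: a single chromosome, TAD boundaries reduced to sorted point coordinates, and insulation treated as absolute. The function names and coordinates are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def tad_index(position: int, boundaries: Sequence[int]) -> int:
    """Return the index of the TAD containing `position`.

    `boundaries` must be sorted; n boundaries delimit n + 1 domains.
    """
    return bisect_right(boundaries, position)


def same_tad(promoter: int, enhancer: int, boundaries: Sequence[int]) -> bool:
    """True when no boundary separates the promoter from the enhancer."""
    return tad_index(promoter, boundaries) == tad_index(enhancer, boundaries)


# Hypothetical boundary coordinates (bp) on one chromosome, for illustration only.
boundaries = [1_000_000, 2_500_000]
```

Under these assumptions, a promoter at 1,200,000 bp and an enhancer at 1,400,000 bp share a TAD, whereas an enhancer at 900,000 bp lies across a boundary. Deleting a boundary from the list merges the two adjoining domains, mirroring how boundary loss can expose a promoter to enhancers in the neighboring TAD.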
Although the vast majority of TAD boundaries harbor CTCF/cohesin binding sites, cobinding of these factors does not necessarily create these boundaries. In fact, several studies have shown that the majority of CTCF/cohesin binding sites do not block physical long-range chromatin interactions and are therefore considered to be located outside of TAD boundaries (152, 155, 158). A subset of CTCF and cohesin cobound sites are implicated in interactions involving anchors within a few hundred kilobases from each other, such as promoter–enhancer or enhancer–enhancer interactions (152, 159, 160). Although CTCF binding appears to be directed to distal cis-regulatory elements as opposed to promoters (149, 161, 162), the chromatin interaction factor ZNF143 directly occupies promoters (153, 154, 163, 164) to anchor chromatin interactions (Fig.1C; ref. 154). Studies that have examined genome-wide interaction maps in conjunction with transcription factor binding profiles have identified additional factors that preferentially occupy anchors of chromatin interactions (150, 153, 154). Some of these additional proteins found at anchors, such as the Mediator complex, assist in the formation of chromatin interactions (161). However, the role for most of the factors present at anchors of chromatin interactions remains to be determined.
Genetic Alterations Target Anchors of Chromatin Interaction.
Maintaining the genetic identity of anchors of chromatin interaction ensures appropriate chromatin folding to guide the regulation of transcriptional programs in normal cells. Chromatin interaction frequencies can be affected by genetic alterations targeting anchors of chromatin interaction (Fig.4A), as showcased by the rs12913832 human pigmentation-associated risk SNP mapping to the HERC2 enhancer modulating the loop interaction with the OCA2 promoter (165). Similarly, the ZC3HAV1 gene locus harbors a functional SNP rs13228237 capable of altering the interaction frequency between the ZC3HAV1 gene promoter and a distal enhancer located 200 kilobases away by imposing an allele-specific bias in the binding of the chromatin interaction factor ZNF143 (154). Furthermore, an analysis of mutations reported in the International Cancer Genome Consortium (ICGC) pan-cancer database revealed that the DNA recognition motifs for CTCF and ZNF143 are among the motifs with the highest average number of cancer-associated mutations (166). Moreover, in a study of 213 colorectal tumors, mutations were reported to accumulate in the DNA recognition motif for CTCF and its flanking sequences (167). Variation in the sequences flanking core transcription factor binding sites has a significant impact on binding. Indeed, these variations can explain why factors from the same family, which often recognize nearly identical core recognition motifs, have distinct genome-wide binding profiles and serve different biological functions in vivo (168–172).
Although these studies did not distinguish between mutations affecting TAD boundaries versus intra-TAD promoter–enhancer or enhancer–enhancer interactions, several recent reports have focused on the role of genetic alterations at TAD boundaries. For instance, CTCF binding sites that define TAD boundaries show a striking enrichment for mutations compared with non-boundary CTCF binding sites in liver and esophageal carcinomas (173). Moreover, the CRISPR/Cas9-mediated deletion of two CTCF/cohesin binding sites commonly mutated in T-ALL results in the loss of TAD boundaries and leads to a significant increase in the expression of LMO2 and TAL1, two proto-oncogenes involved in hematopoiesis (173). Hence, genetic alterations to the anchors of chromatin interaction can disrupt the activity of noncoding regulatory elements and affect downstream target gene expression.
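An enrichment of mutations at boundary versus non-boundary CTCF binding sites can be framed as a 2 × 2 contingency comparison. The Python sketch below computes an odds ratio and a one-sided Fisher exact P value using only the standard library. The choice of test is an assumption made for illustration, since the statistical approach of the cited study is not described here, and the function names and counts are hypothetical.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the table [[a, b], [c, d]].

    a: mutated boundary sites        b: unmutated boundary sites
    c: mutated non-boundary sites    d: unmutated non-boundary sites
    Raises ZeroDivisionError if b * c == 0.
    """
    return (a * d) / (b * c)


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(X >= a) under the hypergeometric null with fixed table margins."""
    boundary_total = a + b
    mutated_total = a + c
    n = a + b + c + d
    upper = min(boundary_total, mutated_total)
    tail = sum(
        comb(boundary_total, k) * comb(n - boundary_total, mutated_total - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, mutated_total)


# Hypothetical counts chosen only to exercise the functions; not data from any study.
toy_or = odds_ratio(3, 1, 1, 3)
toy_p = fisher_one_sided(3, 1, 1, 3)
```

With real data, mutation rates would also need to be normalized for local sequence context and coverage before such a comparison is interpreted.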
Aberrant Epigenetic Modifications Target Anchors of Chromatin Interactions in Cancer.
A distinctive feature of CTCF binding sites is the absence of DNA methylation (174–176). Genome-wide CTCF binding in multiple cell types negatively correlates with DNA methylation (176, 177). Changes to the DNA methylation profile at anchors of chromatin interactions can compromise CTCF binding and its activity (Fig.4B). This was recently reported in human IDH-mutant gliomas that exhibit a CpG island methylator phenotype (CIMP), characterized by genome-wide hypermethylation at CTCF/cohesin binding sites (178). Moreover, IDH1 mutants were shown to be sufficient to drive the CIMP phenotype in gliomas, resulting in aberrant gene expression programs (179). Hypermethylation of CTCF/cohesin binding sites interferes with CTCF binding to the chromatin, which results in altered chromatin interactions and aberrant expression of the PDGFRA oncogene (178). The CIMP phenotype has been described in several other cancer types, including colorectal, breast, and endometrial cancers (180). Although the effect of genome-wide methylation on the binding of chromatin interaction factors in these cancer types is yet to be assessed, results in gliomas combined with the well-established mutual exclusivity between CTCF binding and DNA methylation on the chromatin suggest that epigenetic alterations affecting chromatin interactions might be common across many cancer types.
Clinical Implications for the Functional Noncoding Genome
Identifying Therapeutic Opportunities and Biomarkers within the Functional Noncoding Cancer Genome
Specific factors are recruited to cis-regulatory elements, including BRD4, a chromatin reader featuring two N-terminal bromodomains that bind to acetylated histones to subsequently recruit transcriptional activators (137, 181). BRD4 inactivation with bromodomain inhibitors such as JQ1 and iBET can inhibit cis-regulatory element activity, as reported for super-enhancers that can drive oncogene overexpression. This is showcased by the repression of aberrant MYC expression, which halts proliferation, in various malignancies, including medulloblastoma, B-cell acute lymphoblastic leukemia, acute myeloid leukemia, Merkel cell carcinoma, and GSI-resistant T-ALL (136, 181–186). Moreover, BRD4 inhibition induces differentiation and growth arrest in patient-derived NUT midline carcinoma cells (186). These examples suggest that chemical modulation of cis-regulatory element activity can be of benefit to treat cancer.
The identification of genetic and epigenetic alterations in functional noncoding cis-regulatory elements also provides a source of biomarkers to monitor cancer development (187–190). For instance, genetic alterations mapping to the TERT promoter are associated with advanced cancer staging and poor survival in patients with glioma and bladder and thyroid cancers (40, 45, 191–193). In patients with glioma in particular, mutations in the TERT promoter have been suggested to confer radioresistance and resistance to temozolomide treatment (191, 194). Moreover, aberrant hypermethylation of the TERT promoter can serve as a predictive biomarker for poor survival in patients with ependymoma (195). Similarly, the CIMP phenotype has shown promise as a discriminator for patient stratification and outcome in ependymoma, glioma, colorectal cancer, and hepatocellular carcinoma (180, 196–201). These studies collectively suggest that genetic and epigenetic alterations targeting noncoding elements offer new opportunities for biomarker discovery in cancer to guide treatment and disease monitoring.
In summary, evidence supports the critical role of the noncoding genome in maintaining normal transcriptional programs and cell identity. Genetic and epigenetic alterations targeting functional noncoding cis-regulatory elements reported in cancer can alter these transcriptional programs and promote oncogenesis. These alterations shed light on tumor biology and also reveal new biomarkers for patient stratification associated with distinct outcomes. Hence, the comprehensive characterization of the noncoding cancer genome offers a promising avenue to delineate new therapeutic opportunities, identify biomarkers for disease monitoring, and ultimately improve patient care.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
The authors apologize to colleagues whose work was not cited in this review due to space limitations. The authors thank Drs. Swneke D. Bailey and Paul Guilhamon (Princess Margaret Cancer Centre), Jérôme Eeckoute (Université de Lille), as well as Michael D. Wilson (Department of Molecular Genetics, University of Toronto) for their valuable comments on this review.
The National Cancer Institute at the National Institutes of Health (R01CA155004 to M. Lupien), the Princess Margaret Cancer Foundation (to M. Lupien), and The Canadian Cancer Society (CCSRI702922 to M. Lupien) supported this work. M. Lupien holds an investigator award from the Ontario Institute for Cancer Research, a new investigator salary award from the Canadian Institutes of Health Research, and a Movember Rising Star award from Prostate Cancer Canada (RS2014-04).