Abstract
Mutations in protein-coding genes are well established as the basis for human cancer, yet how alterations within noncoding genome, a substantial fraction of which contain cis-regulatory elements (CRE), contribute to cancer pathophysiology remains elusive. Here, we developed an integrative approach to systematically identify and characterize noncoding regulatory variants with functional consequences in human hematopoietic malignancies. Combining targeted resequencing of hematopoietic lineage–associated CREs and mutation discovery, we uncovered 1,836 recurrently mutated CREs containing leukemia-associated noncoding variants. By enhanced CRISPR/dCas9–based CRE perturbation screening and functional analyses, we identified 218 variant-associated oncogenic or tumor-suppressive CREs in human leukemia. Noncoding variants at KRAS and PER2 enhancers reside in proximity to nuclear receptor (NR) binding regions and modulate transcriptional activities in response to NR signaling in leukemia cells. NR binding sites frequently colocalize with noncoding variants across cancer types. Hence, recurrent noncoding variants connect enhancer dysregulation with nuclear receptor signaling in hematopoietic malignancies.
We describe an integrative approach to identify noncoding variants in human leukemia, and reveal cohorts of variant-associated oncogenic and tumor-suppressive cis-regulatory elements including KRAS and PER2 enhancers. Our findings support a model in which noncoding regulatory variants connect enhancer dysregulation with nuclear receptor signaling to modulate gene programs in hematopoietic malignancies.
See related commentary by van Galen, p. 646.
This article is highlighted in the In This Issue feature, p. 627
Introduction
Advances in genome sequencing have provided critical insights into the role of pathogenic DNA alterations as cancer drivers. However, current efforts are focused on protein-coding sequences of genomes (exomes), which comprise only about 1% of human genome. How alterations within noncoding genome contribute to cancer pathophysiology remains unclear. Similarly, genome-wide genotype–phenotype association studies continue to reveal noncoding sequences that are altered in human diseases, although identification of causal elements remains difficult, impeding drug development and therapeutics.
Enhancers are noncoding cis-regulatory elements (CRE) that determine cell identity by coordinating spatiotemporal gene expression. Major progress has been made to identify candidate enhancers by genome-wide mapping of enhancer-associated histone modifications including H3 Lys27 acetylation (H3K27ac) and H3 Lys4 monomethylation (H3K4me1), transcription factors (TF), and chromatin features (1–4). The biological significance of enhancers is underscored by gene-expression studies showing the deterministic role of enhancers in directing cell type–specific transcription (2, 5–7). Highly marked clusters of enhancers or superenhancers associated with developmental or cancer-related genes have been identified in various cell types (8–11). These studies suggest a model in which a small set of lineage-defining enhancers determine cell identity in development and cancer.
Emerging evidence points to a critical role of noncoding regulatory variants as cancer drivers (12, 13). For instance, recurrent mutations in the TERT promoter create new binding motifs for ETS family TFs to enhance gene transcription in familial and sporadic melanoma (14, 15). In the context of leukemia, recurrent noncoding mutations upstream of the TAL1 proto-oncogene introduce de novo binding sites for the MYB oncoprotein in T-cell leukemia. MYB associates with these sites with CBP, RUNX1, and TAL1 to create an oncogenic superenhancer and activates TAL1 expression (16). In another example, a single chromosomal rearrangement repositions a GATA2 enhancer to ectopically activate EVI1 and confer GATA2 haploinsufficiency in inv(3)/t(3;3) acute myeloid leukemia (AML; ref. 17). These studies raise important questions about the extent to which noncoding variants contribute to cancer development, and how they work. Moreover, the functional impact of noncoding somatic mutations found in cancer genome sequencing has not been systematically investigated or validated. The main challenge is to distinguish noncoding cancer drivers from numerous nonfunctional passenger mutations and to establish the causality of noncoding variants in cancer pathophysiology.
Human nuclear receptors (NR) are a family of signaling-activated TFs that play integral roles in development and cancer (18). The PPARs (PPARα, β/δ, and γ) function as obligate heterodimers with the retinoid X receptors (RXRα, β, and γ) in the absence of ligands and bind to hormone response elements together with corepressor proteins. The retinoic acid receptors (RARα, β, and γ) also heterodimerize with RXRs and bind constitutively to DNA in the absence of ligands. Binding of agonist ligands to PPAR–RXR or RAR–RXR complexes leads to dissociation of corepressors and recruitment of coactivator proteins, resulting in transcriptional activation of downstream target genes. The ability of NRs to rapidly and dynamically respond to various developmental and environmental clues by modulating gene programs makes them versatile cellular “sensors.” As such, NRs have historically served as biomarkers for classification of several solid tumors, including breast and prostate cancers, and as targets for hormone therapy (19). In the hematopoietic system, the fusion oncogene PML–RARA provides a diagnostic biomarker for acute promyelocytic leukemia (APL) and the molecular basis for oncoprotein-targeted therapy (20).
In this work, we developed an integrative approach to identify recurrent noncoding variants by targeted resequencing of cis-regulatory elements in human leukemia. A major limiting step toward understanding noncoding cancer variants is the lack of a high-throughput platform to assess their functional effects. To that end, we employed enhanced dCas9-based epigenetic editing and performed CRE perturbation screens. This revealed a cohort of variant-associated oncogenic and tumor-suppressive CREs, and established a new molecular link between noncoding regulatory variants and NR signaling in modifying gene programs in hematopoietic malignancies.
Results
Identification of Noncoding DNA Alterations by Targeted CRE Resequencing
To determine whether human leukemia genomes harbor recurrent noncoding alterations, we generated mutational landscapes of CREs by targeted resequencing (Fig. 1A; Supplementary Fig. S1). We first annotated hematopoietic lineage–associated CREs based on genome-wide profiling of active enhancer- or promoter-associated H3K27ac in various normal and diseased hematopoietic cell types, including CD34+ hematopoietic stem/progenitor cells (HSPC), lymphocytes, macrophages, monocytes, erythroblasts, lymphoma, acute lymphoblastic leukemia (ALL), and AML [total 72 chromatin immunoprecipitation sequencing (ChIP-seq) datasets for 51 normal and 21 disease samples, respectively; Supplementary Table S1; Supplementary Fig. S1A and S1B]. We then designed a target enrichment panel consisting of 25.2 Mb DNA sequences and 161,328 capture oligos for 22,262 blood cell–associated CREs and 86 leukemia-associated genes including DNMT3A, TET2, IDH2, and EZH2 (Supplementary Table S2; Supplementary Fig. S1). Targeted resequencing was performed on 120 primary samples or leukemia cell lines consisting of 45 AML or myelodysplastic syndrome (MDS) samples, 24 lymphoma samples, 19 ALL samples, and 32 control samples (29 matched lymphocytes from AML or MDS samples, two CD34+ HSPCs from healthy donors, and one mixed genomic DNA control; Supplementary Table S3). The samples were sequenced to an average depth of 80× and 97% on-target enrichment.
A major challenge in noncoding genome-sequencing studies has been the lack of a consensus approach for de novo mutation detection with maximized sensitivity and confidence. We sought to identify somatic noncoding variants including single-nucleotide variants (SNV) and insertions/deletions (INDEL) with a higher confidence, rather than to ascertain a larger number of possible mutations with maximal sensitivity. To this end, we integrated 5 commonly used mutation callers including GATK (21), MuTect (22), Strelka (23), Scalpel (24), and VarScan2 (25), and employed a combination of 4 callers (GATK, MuTect, Strelka, and VarScan2) to identify somatic SNVs and another combination (GATK, Strelka, VarScan2, and Scalpel) for somatic INDELs in 29 tumor/normal paired AML samples, respectively (Supplementary Figs. S1C and S2A). After filtering with RepeatMasker region, high-confidence (Tier I) somatic SNVs and INDELs were identified as those called by at least 2 of 4 mutation callers (Supplementary Fig. S2A; Supplementary Table S4). We next identified noncoding variants in all leukemia or lymphoma samples and cell lines by GATK (21) followed by removing common variants found in the Single Nucleotide Polymorphism Database (dbSNP) and/or RepeatMasker regions. The resulting variants were identified as Tier II somatic SNVs and INDELs (Supplementary Fig. S2B; Supplementary Table S4). The majority of the identified high-confidence somatic SNVs (76.7%) and INDELs (82.4%) have variant allele frequencies (VAF) less than 50% (Supplementary Fig. S2C).
We applied the same mutation calling pipeline to the 86 leukemia-associated genes included in the target enrichment panel, and identified recurrent protein-coding gene mutations (Supplementary Fig. S3A). The mutation rate of coding genes is comparable to or slightly higher than in other published studies (Supplementary Fig. S2D). By combining somatic variants in all leukemia and lymphoma samples, we identified 0 to 446 (median 75) coding mutations and 8 to 530 (median 99) noncoding mutations per megabase of DNA in each sample, respectively. The frequencies of both coding and noncoding mutations were significantly higher in AML, lymphoma, and ALL samples compared with normal HSPCs or the mixed genomic DNA control (Supplementary Fig. S3A and S3B), but not significantly different between samples isolated from bone marrow (BM) and peripheral blood (PB), male and female, AML subtypes (FAB M0 to M4), or different age groups (Supplementary Fig. S3C–S3E). By clustering analysis of all noncoding variants across the genome, we identified 4,076 mutational hotspots containing noncoding variants within 100 bp of each other (Fig. 1B; see Methods).
To identify recurrently mutated CREs harboring somatic noncoding variants, we mapped the identified SNVs and INDELs to the 22,262 H3K27ac-associated CREs, respectively. CREs containing SNVs in at least six independent leukemia samples or INDELs in at least three leukemia samples were identified as recurrently mutated CREs based on a binominal distribution test (Supplementary Fig. S2E; see Methods). We reasoned that multiple independent noncoding variants may affect the same CRE by affecting the binding sites (or motifs) of TFs in each CRE. Thus, we required that the same CRE be recurrently mutated in independent samples, whereas the noncoding variants could be identical or different within the same CRE. By this analysis, we identified 1,836 recurrently mutated CREs containing leukemia-associated noncoding variants (Supplementary Fig. S2E). The gene targets and functional roles of the vast majority of the identified variant-associated CREs in leukemia pathophysiology remain unknown.
High-Throughput Functional Analysis of Recurrently Mutated CREs in Leukemia
A major limiting step toward understanding noncoding variants in cancer pathophysiology is the lack of a high-throughput and unbiased approach to assess their functional effects. To overcome this challenge, we recently engineered enhanced CRISPR/dCas9-based epigenetic perturbation systems, enCRISPRi and enCRISPRa, for targeted modulation of CRE activities in native chromatin in situ and in vivo (Fig. 2A; ref. 26). Specifically, we employed a single guide RNA (sgRNA) design containing two MS2 hairpins recognized by the MCP RNA-binding domains (27). For CRE inhibition (enCRISPRi), we fused dCas9 with LSD1 (or KDM1A), which catalyzes the removal of H3 Lys4 mono- and dimethylation (H3K4me1/2). We next engineered CRE-targeting sgRNAs containing MS2 hairpins to recruit the MCP–KRAB repressor domains. For CRE activation (enCRISPRa), we fused dCas9 with the core domain of p300, which catalyzes H3K27ac, together with sgRNA–MS2 to recruit the MCP–VP64 activator domains. Because H3K27ac and H3K4me1/2 are the signature histone marks for active enhancers and/or promoters (2, 28), by inducible expression of dCas9–LSD1 (or dCas9–p300), sgRNA–MS2, and MCP–KRAB (or MCP–VP64), we engineered a CRE–targeting dual repressor or activator system (Fig. 2A; ref. 26).
To investigate the functional role of the recurrently mutated CREs in human leukemia, we performed enCRISPRi- and enCRISPRa-mediated perturbation screens in an established human AML cell line, MKPL-1 (Fig. 2A). We first designed a pooled sgRNA library containing 4 or 5 individual sgRNAs per CRE (total 9,224 sgRNAs for 1,836 recurrently mutated CREs) and 198 nontargeting sgRNAs as negative controls (Supplementary Fig. S4A; Supplementary Tables S5 and S6). We then generated AML cell lines stably expressing doxycycline-inducible enCRISPRi or enCRISPRa transgenes (Fig. 2B). Upon lentiviral transduction of pooled sgRNAs at multiplicity of infection (MOI) < 0.5, the transduced cells were selected, induced for enCRISPRi or enCRISPRa expression, and cultured for 28 days before harvesting the genomic DNA. The sgRNA sequences were PCR-amplified and measured by next-generation sequencing to determine the enrichment or dropout in day 28 (T28) relative to day 0 (T0) cells. We performed independent enCRISPRi and enCRISPRa perturbation screens (each with three replicate experiments) using the same pooled sgRNAs, respectively. The results from replicate screens were highly consistent (Fig. 2C; Supplementary Fig. S4B–S4D). Moreover, the negative control sgRNAs displayed no or minimal dropout or enrichment relative to CRE-targeting sgRNAs in enCRISPRi and enCRISPRa screens (P = 0.69 and 0.472 by Welch t test, respectively; Supplementary Fig. S4E and S4F).
We reasoned that CREs showing strong leukemia cell growth inhibition (e.g., sgRNA dropout in T28 relative to T0) upon enCRISPRi-mediated repression and growth enhancement (e.g., sgRNA enrichment in T28 relative to T0) upon enCRISPRa-mediated activation should nominate candidate “oncogenic” CREs. Likewise, CREs showing strong growth enhancement upon enCRISPRi and growth inhibition upon enCRISPRa should nominate candidate “tumor-suppressive” CREs. To this end, we combined results from enCRISPRi and enCRISPRa perturbation screens of 1,836 recurrently mutated CREs, and identified 131 candidate oncogenic CREs with significant sgRNA dropout by enCRISPRi (log2 fold change ≤ −1.5 in T28 relative to T0) and sgRNA enrichment by enCRISPRa (Fig. 2D; Supplementary Fig. S4E and S4F; Supplementary Table S7). Similarly, we identified 87 candidate tumor-suppressive CREs, which displayed significant sgRNA dropout by enCRISPRa (log2 fold change ≤ −1.5 in T28 relative to T0) and sgRNA enrichment by enCRISPRi (Fig. 2D; Supplementary Fig. S4E and S4F; Supplementary Table S7). We next categorized all CREs into annotated promoters and enhancers, and identified the candidate oncogenic or tumor-suppressive promoters and enhancers, respectively (Supplementary Fig. S5A and S5B; Supplementary Table S7). The candidate oncogenic or tumor-suppressive promoters include those of several known leukemia-associated genes such as RUNX1, TAL1, and MSI2 (Supplementary Fig. S5A). Importantly, although several candidate enhancers locate in the proximity of known leukemia-associated genes such as MYB and BCAT1, the vast majority of their gene targets are unknown, and their functional and mechanistic roles in leukemia remain unexplored. In addition, the oncogenic CRE-associated genes displayed strong dependency by CRISPR/Cas9-based screens in multiple AML cell lines (Supplementary Fig. S6; ref. 29). By comparing the 218 identified functional CREs (131 oncogenic and 87 tumor-suppressive) with CREs containing noncoding mutational hotspots, we observed that 82 of 218 (37.6%) functional CREs overlapped with hotspot-containing CREs, indicating that the functional CREs are enriched with noncoding mutational hotspots (P = 4.0e-15 by hypergeometric distribution; Fig. 2E).
To validate the efficacy of enCRISPRi- and enCRISPRa-mediated CRE perturbations on target gene expression, we cloned individual sgRNAs for 14 CREs including 5 candidate oncogenic CREs, 4 tumor-suppressive CREs, and 5 other CREs (see Methods; Supplementary Table S5). Upon coexpression of enCRISPRi or enCRISPRa with individual CRE-specific or nontargeting (sgGal4) sgRNAs in MKPL-1 cells, we noted that enCRISPRi resulted in significant downregulation of 13 of 14 genes relative to sgGal4 (≥2-fold, P < 0.05; Fig. 2F; Supplementary Fig. S7A–S7C). Similarly, enCRISPRa resulted in significant upregulation of 12 of 14 genes (≥2-fold, P < 0.05). Importantly, we also observed significant gene repression or activation when targeting enCRISPRi or enCRISPRa to 4 of 5 other CREs (Fig. 2F; Supplementary Fig. S7C), whereas no significant sgRNA enrichment or dropout was observed in the enCRISPRi/a screens (Fig. 2D). These results not only provide direct evidence to validate the efficacy of enCRISPRi/a in transcriptional repression/activation of CRE-associated genes, but also demonstrate that the lack of functional effect on AML cell growth (e.g., sgRNA enrichment or dropout) was not due to insufficient CRE repression/activation. To determine the efficacy of enCRISPRi/a across cell lines, we performed similar analyses in MOLM-13 AML and Jurkat ALL cells. We found that 13 of 14 genes were significantly repressed in AML (MKPL-1 and MOLM13) and ALL (Jurkat)cells by enCRISPRi (≥2-fold, P < 0.05), and 10 of 14 genes were significantly activated in AML and ALL cells by enCRISPRa (≥2-fold, P < 0.05; Fig. 2F; Supplementary Fig. S7A–S7C). By comparing the gene-expression changes of all 14 CRE perturbations in AML (MKPL-1) and ALL (Jurkat) cells, we observed a significant and positive correlation of the targeted CREs (Pearson correlation coefficient R = 0.844, P < 0.0001; Fig. 2G). These results demonstrate that enCRISPRi and enCRISPRa resulted in consistent repression or activation of CRE-associated genes in both AML and ALL cells.
An Enhancer Controlling the KRAS Proto-Oncogene Harbors a Noncoding Variant in Leukemia
By enCRISPRi and enCRISPRa perturbation screens, we identified CRE4399, located approximately 135 kb upstream of the KRAS gene, as one of the variant-associated oncogenic enhancers (Fig. 2D; Supplementary Fig. S5B). KRAS is a proto-oncogene that encodes a member of the small GTPase superfamily. Activating mutations in KRAS are common in solid tumors but relatively rare in hematologic malignancies (30, 31); however, KRAS expression is increased and associates with poor survival in patients with AML (Supplementary Fig. S8A and S8B). CRE4399 displays strong chromatin accessibility by DNase I hypersensitivity (DHS), Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), and enrichment of H3K27ac signals, and interacts with the KRAS promoter through DNA looping by RNA Pol II (RNAPII)–mediated ChIA-PET or Hi-C analyses in normal CD34+ HSPCs and leukemia cell lines (K562 and THP1; Fig. 3A; Supplementary Fig. S8C), consistent with a putative enhancer element for KRAS. One of the identified noncoding variants locates at chr12:25538881:C>T within CRE4399 in an AML sample. To establish the functional requirement for variant-associated CRE4399 in KRAS gene regulation, we first performed CRISPR/Cas9-mediated knockout (KO) of CRE4399 in MKPL-1 cells (Supplementary Fig. S10A and S10B). We found that the expression of KRAS mRNA was significantly impaired in multiple single cell–derived CRE4399 KO clones, whereas the neighboring genes within the same topologically associating domain (TAD) were not affected (Fig. 3B; Supplementary Fig. S10A and S10B). These results demonstrate that CRE4399 functions as a gene-distal enhancer for KRAS and is required for KRAS expression in AML cells.
We next determined the requirement of CRE4399 for AML cell growth by doxycycline-inducible enCRISPRi-mediated negative selection competition analysis in MKPL-1 cells (Fig. 3C; Supplementary Fig. S8D; see Methods). We found that inducible repression of the KRAS enhancer (CRE4399) by independent sgRNAs resulted in consistent and significant inhibition of AML cell growth in vitro. Finally, we assessed the functional role of the KRAS enhancer in AML growth in vivo using xenotransplantation assays (Fig. 3D). As the positive control, KO of KRAS coding DNA sequences (KRAS KO) significantly impaired MKPL-1 cell growth in xenotransplanted NOD-scid IL2Rgnull (NSG) mice (Fig. 3D and E), consistent with the oncogenic role of KRAS in AML. Importantly, KO of KRAS enhancer (two independent single cell–derived CRE4399 KO1 and KO2 clones) also impaired AML growth, resulting in less tumor burden with significantly decreased bioluminescence signals of the luciferase-expressing AML cells 4 weeks post-transplantation (Fig. 3D and E). Compared with wild-type (WT) control cells, KRAS enhancer KO cells led to significantly less leukemic burden in BM and PB of xenotransplanted recipients by flow cytometry and bloodsmear analyses (Fig. 3F and G). These results demonstrate that KRAS and its variant-associated enhancer are required for AML cell growth in vitro and in vivo.
KRAS Enhancer Variants Colocalize with NR Binding Sites
Noncoding variants can interfere with transcriptional regulation by modulating the DNA-binding activity of TFs. By surveying candidate TF binding sites (or motifs) overlapping with or near KRAS enhancer–associated variants, we identified several motifs for NR family TFs including PPARG (PPARγ) and RXRA (RXRα) that colocalize with the KRAS enhancer variant (Fig. 3H). The canonical sequence of PPAR response element (PPRE) is composed of two AGGTCA-like motifs directionally aligned with a single nucleotide spacer. When heterodimerized with RXRA, the PPARG–RXRA motif may contact two half sites from PPARG and RXRA, respectively. It is important to note that PPARs may recognize a 12-bp DNA sequence (WAWVTRGGBBAH) instead of the generally accepted 6-bp sequence (AGGTCA) in vitro and in vivo (32). The optimized RXRA hexad binding sequence is RGKTYA, which makes the optimal PPAR–RXRA binding sequence WAWVTRGGBBAHRGKTYA (32). The predicted PPARG–RXRA motif at the KRAS enhancer appears to match the optimized PPRE with the AML-associated noncoding variant overlapping with one of the half-sites (Supplementary Fig. S8E). We next determined whether the KRAS enhancer harbors recurrent mutations in other leukemia samples or other types of human cancers. By analyzing the targeted CRE sequencing and whole-genome sequencing (WGS) datasets from International Cancer Genome Consortium (ICGC; http://icgc.org), Catalogue of Somatic Mutations in Cancer (COSMIC; ref. 33), and cBioPortal (34), we identified 13 additional variants at the KRAS enhancer in 18 individual cancer samples including T-cell acute lymphoblastic leukemia (T-ALL), breast, lung, and pancreatic cancers (Fig. 3I; Supplementary Fig. S8E; Supplementary Table S8). Importantly, 8 of 14 unique mutation sites at the KRAS enhancer locate in the proximity (±10 bp) of binding sites for PPARG–RXRA and/or PPARs (Fig. 3I; Supplementary Fig. S8E), suggesting that noncoding variant–mediated alterations of NR binding at the KRAS enhancer may be a generalizable mechanism in cancer cells.
PER2, a Clock Gene, Is Controlled by a Distal Enhancer in Leukemia Cells
By CRE perturbation screens, we also discovered that CRE12661 near the PER2 gene was among the candidate tumor-suppressive enhancers in AML cells (Fig. 2D; Supplementary Fig. S5B). PER2 acts as a negative regulator of CLOCK and BMAL1 genes in controlling circadian rhythm (35). Notably, Per2-mutant mice are deficient in circadian clock function and are unusually cancer-prone (36). These mice show a neoplastic growth phenotype and an increased sensitivity to γ radiation, indicating a tumor-suppressive role. Lowered PER2 expression is common in many tumors including AML (37, 38) and is associated with poor survival (Supplementary Fig. S9A and S9B), whereas forced expression of PER2 leads to growth inhibition and apoptosis of cancer cells (37). Emerging evidence in several in vivo models also points to important roles of other core circadian genes in tumorigenesis (39, 40). Disruption of the circadian machinery leads to changes in cellular function such as cell division and metabolism, both highly relevant to cancer. Despite these findings, the function and regulation of PER2 in human leukemia remain poorly understood.
Located approximately 11 kb downstream of the PER2 gene, CRE12661 displays strong chromatin accessibility and H3K27ac, and interacts with the PER2 promoter by DNA looping in normal CD34+ HSPCs and leukemia cell lines (Fig. 4A; Supplementary Fig. S9C), consistent with a putative gene-distal enhancer for PER2. Using similar approaches, we found that CRISPR/Cas9–mediated CRE12661 KO resulted in significant downregulation of PER2 without affecting neighboring genes within the same TAD in MKPL-1 cells (Fig. 4B; Supplementary Fig. S10C and S10D). Moreover, inducible enCRISPRa-mediated activation of CRE12661 resulted in consistent and significant inhibition of MKPL-1 cell growth in vitro (Fig. 4C; Supplementary Fig. S9D), consistent with the tumor-suppressive function of PER2 enhancer in leukemia. To establish the requirement of PER2 enhancer in leukemia development in vivo, we xenografted CRE12661 KO MKPL-1 cells (two independent single cell–derived KO1 and KO2 clones) together with PER2 overexpression (OE) cells as controls. We observed that CRE12661 KO significantly promoted AML growth, tumor burden, and leukemic phenotypes in xenotransplanted NSG mice, whereas PER2 OE significantly inhibited leukemia development in vivo (Fig. 4D–G). Together, these results not only identify a variant-associated gene-distal enhancer for PER2, but also demonstrate that PER2 and its enhancer play a tumor-suppressive role in AML cells.
Noncoding Variants at a PER2 Enhancer Colocalize with NR Binding Sites
By targeted CRE resequencing, we identified two high-confidence noncoding variants within the PER2 enhancer (CRE12661) including chr2:239142998:A>CA in an AML sample. By surveying candidate TF binding sites near the PER2 enhancer variant, we also identified enrichment of PPARG–RXRA and PPARA–RXRA motifs within 10 bp of the variant (Fig. 4H; Supplementary Fig. S9E). The predicted PPARG–RXRA motif at the PER2 enhancer contains only one conserved half-site that matches the canonical PPRE, suggesting that it may be a noncanonical PPRE or a weak canonical PPRE (41). Of note, the identified AML-associated variant (chr2:239142998:A>CA) at the PER2 enhancer locates 6 bp upstream of the predicted PPARG–RXRA motif (Supplementary Fig. S11A). The 5′ flanking sequences at PPAR–RXR binding sites have also been shown to be essential for PPAR binding and NR subtype specificity (42–44). Therefore, the AML-associated noncoding variant at PER2 enhancer may also interfere with NR function by affecting the 5′ flanking sequence of the PPARG–RXRA binding site. Moreover, analysis of the targeted CRE resequencing and WGS datasets from ICGC, COSMIC, and cBioPortal revealed 10 additional noncoding variants at the PER2 enhancer in 15 individual cancer samples (Fig. 4I; Supplementary Table S8). Importantly, 9 of 11 unique mutation sites at the PER2 enhancer are present in the proximity (±10 bp) of binding sites for PPARG–RXRA and/or PPARs (Fig. 4I; Supplementary Fig. S9E), consistent with a broad role for noncoding variants in modulating NR binding at the PER2 enhancer in cancer cells.
Noncoding Variants Modulate NR-Mediated Transcriptional Activity at the KRAS and PER2 Enhancers
To directly assess whether the noncoding variants impair NR-dependent transcriptional activities at KRAS and PER2 enhancers, we performed enhancer reporter assays in three independent cell types including AML (MKPL-1), erythroleukemia (K562), and 293T (Fig. 5A–C; Supplementary Fig. S11B–S11E). Compared with the no-enhancer control, the WT KRAS enhancer sequence significantly activated luciferase reporter expression in all three cell types (3.0- to 7.9-fold, P < 0.01; Supplementary Fig. S11B). The leukemia variant–containing KRAS enhancer sequence (MUT) further increased reporter expression (6.1- to 15.4-fold relative to control, P < 0.01; 1.4- to 2.2-fold relative to WT, P < 0.01). Importantly, activation of RXRs and RARs by 9-cis-RA (45), PPARs by bezafibrate (46), or PPARγ (PPARG) by rosiglitazone (47), but not PPARα or PPARβ (fenofibrate and GW0742), further enhanced WT KRAS enhancer activity (2.0- to 3.2-fold relative to DMSO, P < 0.001; Fig. 5B; Supplementary Fig. S11D), consistent with the presence of PPRE at the KRAS enhancer. Both PPARG and RXRA are expressed in three cell types (Supplementary Fig. S11F). These results suggest that the KRAS enhancer contains a functional PPRE, and PPARG–RXRA can activate KRAS enhancer upon ligand-induced NR signaling. More importantly, the mutant KRAS enhancer displayed increased enhancer activity upon activation of RXR–RAR and PPARG signaling (4.4- to 5.8-fold relative to DMSO, P < 0.001; Fig. 5B; Supplementary Fig. S11D), whereas depletion of PPARG and RXRA significantly impaired NR-mediatedactivation of KRAS WT and MUT enhancers in AML cells (Supplementary Fig. S11G–S11I), suggesting that the noncoding variant at the KRAS enhancer increased both the basal and NR-induced enhancer activities in leukemia cells. Consistent with these findings, we found by ChIP-seq and ChIP-qPCR that RXRA and PPARG strongly associate with the endogenous KRAS enhancer in multiple human cancer cell types including MKPL-1 AML cells (Fig. 5D–G).
To test whether the noncoding variants indeed enhance NR binding at the KRAS enhancer, we performed a series of electrophoretic mobility shift assay (EMSA) experiments (Fig. 5H). First, we confirmed the binding of PPARG and RXRA individually or together using the validated PPARG–RXRA motif-containing DNA probe (Supplementary Fig. S11J; Supplementary Table S5). Second, we compared the binding affinity of PPARG–RXRA to the WT or MUT CRE4399 DNA sequence, and observed a modest but significant increase in PPARG/RXRA binding at the MUT sequence (Fig. 5H; Supplementary Fig. S11K). Third, the PPARG–RXRA-binding probe, but not the nonspecific competitor, effectively abolished protein binding to both WT and MUT sequences, indicating specific binding of PPARG–RXRA. Finally, PPARG and RXRA antibodies further increased the mobility shift of protein–DNA complexes (supershift; Fig. 5H), suggesting that binding to the KRAS enhancer sequence involves PPARG–RXRA. A significant increase in supershift was also observed at the MUT relative to WT sequence (Fig. 5H; Supplementary Fig. S11K). Together, these results support a model in which a noncoding variant at the KRAS enhancer functions to increase the DNA binding of NR proteins and enhance NR-mediated transcriptional activation of KRAS expression in leukemia cells.
We next investigated the role of an AML-associated variant (chr2:239142998:A>CA) at the PER2 enhancer with or without NR agonists. The PER2 WT enhancer sequence activated luciferase reporter expression by 2.3- to 35.3-fold, which was significantly impaired by the variant-containing MUT enhancer sequence (Supplementary Fig. S11C). Activation of RXR–RAR by 9-cis-RA, PPARs by bezafibrate, or PPARG by rosiglitazone further enhanced WT PER2 enhancer activity by 3.9- to 4.8-fold; however, the variant-containing MUT enhancer displayed decreased activity (Fig. 5C; Supplementary Fig. S11E). Depletion of PPARG and RXRA significantly impaired the PER2 WT and MUT enhancer activity (Supplementary Fig. S11I), suggesting that the PER2 enhancer variant attenuates both the basal and NR-induced enhancer activities. RXRA and PPARG strongly associated with the PER2 enhancer in cancer cell types including AML cells (Fig. 5E–G). By a series of EMSA experiments, we observed that the PER2 MUT enhancer sequence significantly impaired the binding of PPARG and RXRA proteins (Fig. 5I; Supplementary Fig. S11K). Similar to the KRAS enhancer, PPARG–RXRA binding to the PER2 enhancer was effectively abolished by the PPARG–RXRA-binding probe and supershifted by PPARG–RXRA antibodies. Together, these results support a model in which the noncoding variant at the PER2 enhancer functions to impair NR-mediated PER2 activation in leukemia.
Knock-In of Noncoding Variants Modulates KRAS and PER2 Enhancer Activity In Situ
Enhancers function by recruiting sequence-specific TFs, coregulators, and chromatin factors in a native chromatin to control spatiotemporal gene transcription. We next determined whether the leukemia-associated variants affect endogenous enhancer activity by site-specific knock-in (KI) of noncoding variants to the WT enhancer sequences in leukemia cells. To this end, we designed sgRNAs targeting the endogenous KRAS (CRE4399) or PER2 (CRE12661) enhancer sequence, respectively. We next coexpressed the Cas9 nuclease, KRAS or PER2 enhancer–targeting sgRNA, and the single-strand donor oligo (ssDNA donor) containing the targeted variant and PAM site mutation, which rendered the targeted KI allele resistant to Cas9 cleavage, in human K562 leukemia cells (Fig. 6A). Upon Cas9/sgRNA-mediated cleavage of the targeted WT enhancer DNA, the KI allele containing the noncoding variant was introduced by homology directed repair (HDR).
We generated two independent single cell–derived KI cell lines, KI-Mut1 and KI-Mut2, each containing noncoding variants at one of the KRAS (CRE4399) or PER2 (CRE12661) enhancer sequences as monoallelic KI, respectively (Supplementary Fig. S12A and S12B). We first determined the effects of enhancer variants on baseline and NR-induced gene expression. Compared with the unmodified WT cells, cells with KI of KRAS enhancer variant (chr12:25538881:C>T) had increased baseline KRAS expression in DMSO-treated KI-Mut cells (2.7- and 2.2-fold, P < 0.001; Fig. 6B). Upon activation of NR signaling by treatment with 1 μmol/L 9-cis-RA and 5 μmol/L rosiglitazone, the expression of KRAS was further increased in KI-Mut cells relative to WT cells (5.5- and 5.1-fold, P < 0.001). These results suggest that the leukemia-associated noncoding variant functions to enhance NR-mediated transcriptional activation of the KRAS enhancer in leukemia cells. In contrast, KI of the PER2 enhancer variant (chr2:239142998:A>CA) led to modest but significant downregulation of baseline and NR-induced PER2 expression (Fig. 6C), consistent with the role of PER2 enhancer variant in attenuating its activity in leukemia. We next determined the effects of enhancer variants on the binding of NRs in native chromatin by ChIP assays using antibodies for PPARG and RXRA in KI-Mut cells. The NR-immunoprecipitated or input control DNA was quantified for allele frequency by amplicon sequencing. Compared with input DNA, PPARG–RXRA ChIP DNA were significantly enriched with KI-Mut relative to WT alleles in two independent KI cell lines (Fig. 6D), suggesting that the KRAS enhancer variant enhanced NR chromatin occupancy in leukemia cells. In contrast, the PER2 KI-Mut allele was significantly depleted relative to WT allele in PPARG–RXRA ChIP DNA (Fig. 6E), suggesting that the PER2 enhancer variant impaired NR chromatin occupancy in leukemia cells. Therefore, by site-specific KI to endogenous enhancer sequences, our findings demonstrate that the leukemia-associated noncoding variants modulate NR chromatin occupancy at KRAS and PER2 enhancers in situ, resulting in altered enhancer function and aberrant expression of KRAS and PER2 in leukemia cells.
We next determined the effect of KRAS or PER2 enhancer KI on leukemia development by xenotransplantation of WT or KI cells to NSG mice. We found that KRAS enhancer KI enhanced leukemogenesis, resulting in significantly increased leukemia burden in the peripheral blood of recipients (Supplementary Fig. S12C). These results are consistent with the oncogenic role of KRAS enhancer variant in AML by enhancing NR-mediated activation of the proto-oncogene KRAS. Similarly, we noted enhanced leukemia development by PER2 enhancer KI (Supplementary Fig. S12D), consistent with the function of PER2 enhancer variant in leukemia by impairing NR-mediated activation of the tumor suppressor PER2. Finally, we performed motif enrichment analysis of DNA sequences surrounding the noncoding variants (±25 bp) at the functionally validated oncogenic CREs from enCRISPRi and enCRISPRa screens (Fig. 2D). We also identified PPAR, RXRA, and RARA as the top enriched NR motifs relative to genomic background (Fig. 6F), suggesting that alterations of NR transcriptional activity by noncoding variants are frequent events in human leukemia. Our analyses of two examples (KRAS and PER2) strongly support the context-specific function of noncoding variants that may confer either loss-of-function or gain-of-function activity (Fig. 6G). Importantly, loss-of-function and gain-of-function activities associated with enhancer variants can be explained by the interference with DNA binding of NRs and the signaling-induced transcriptional activity for at least a subset of affected CREs in vitro and in native chromatin. Taken together, these studies support a model in which pathogenic noncoding variants cooperate with NRs to rewire signaling-dependent gene programs in cancer.
Discussion
Noncoding Variants in Cancer Pathophysiology
Compared with coding mutations that often contribute to cancer development through “qualitative” changes in protein expression, structure, or function, noncoding variants display several unique features including (i) “quantitative” control of gene expression through modulation of cis-regulatory elements and/or 3-D genome organization; (ii) integration with gene transcription through modulation of DNA binding of TFs, chromatin-modifying complexes, and/or structural proteins; (iii) interaction with cellular signaling pathways and downstream effectors, as illustrated in this study. These features allow a more “plastic” and reversible control of gene expression in response to altered growth and signaling pathways, which may be important for the maladaptation of cancer cells in tumor microenvironments.
Given that noncoding sequences comprise nearly 99% of the human genome, and the incidences of noncoding variants significantly outnumbered coding variants, a major challenge is to assign functional relevance and prioritize noncoding variants for in-depth mechanistic studies. The development of the CRE-targeting epigenetic editing system allowed us to systematically interrogate the impact of CRE repression or activation on cell growth phenotypes, and prioritize functionally validated CREs in an orthogonal fashion with mutational analyses. One caveat of using cell growth phenotypes as the functional “readout” is that CRE variants may affect other pathologic features such as cancer invasion and/or escape from immune surveillance that are likely missed in growth-based studies. Integrative analyses of both CRE repression (enCRISPRi) and activation (enCRISPRa) screens provide higher confidence to establish causality between CRE activity and growth phenotypes. It is reassuring that the top ranked “oncogenic” or “tumor-suppressive” CREs consist of promoters or enhancers in the proximity of several known leukemia-associated genes such as RUNX1, MYB, TAL1, and MSI2. More importantly, we also identified variant-associated enhancers for KRAS and PER2 with unexpected but strong effects on cell growth upon activation or repression. Overall, the mutational analyses (Fig. 1), CRE perturbation screens (Fig. 2), and functional (Figs. 3 and 4) and molecular (Figs. 5 and 6) studies support the model in which recurrent noncoding variants impinge on CREs and associated genetic pathways as critical regulatory nodes to control cancer cell phenotypes.
We noted heightened mutation frequency at noncoding mutational hotspots relative to protein-coding sequences or other genomic regions. The increased mutation frequency at the noncoding regulatory genome may be due to mechanisms associated with DNA accessibility, replication timing, chromatin organization, and/or impaired nucleotide excision repair (48–51). The higher sequencing coverage in this study may also increase the detection sensitivity in heterogeneous tumor samples. It is also important to note the limitations of using the human reference genome in enhancer annotation and mutation analyses. For instance, genome rearrangements may cause aberrant CRE function by juxtaposition and/or reorganization of chromatin structure, and these aberrations can be missed by short-read sequencing. In addition, it is possible that some of the highly recurrent noncoding mutations may be due to misannotation and/or nonrepresentative sequences in the reference genome. Hence, our studies highlight the importance of follow-up studies, such as the enCRISPRi and enCRISPRa screens described here, to identify “functionally” relevant CREs or genes for detailed molecular studies.
KRAS and PER2 Enhancer Variants in Hematopoietic Malignancies
By demonstrating that the KRAS enhancer variant confers gain-of-function activity to promote KRAS expression in leukemia cells, our results provide a mechanism that may explain the lack of KRAS hotspot coding mutations compared with other RAS family proteins such as NRAS. NRAS hotspot coding mutations (e.g., G12D, G13D, and Q61R) are prevalent in hematologic malignancies, whereas KRAS mutations are rarely found (30, 31). KrasG12D mutation was found to impair normal hematopoietic stem cell (HSC) self-renewal and differentiation of multiple hematopoietic lineages even in the heterozygous state, resulting in rapid and fatal myeloproliferative diseases in mice (52–55), suggesting that KRAS coding mutations may be detrimental to normal HSCs. In contrast, modest overexpression (∼2-fold) of the WT KRAS protein in mice increased HSC proliferation and repopulating activity, and improved hematopoietic regeneration and survival following high-dose irradiation (56). Noncoding variants at the KRAS enhancer resulted in quantitative changes in KRAS expression without qualitative alterations in KRAS protein, which may be advantageous for the clonal progression of leukemia-initiating cells without impairing normal HSCs. Hence, the dosage regulation of proto-oncogenes through altered noncoding cis-regulatory elements may expand the current catalog of genetic alterations as cancer drivers.
Our studies focusing on KRAS and PER2 also support the context-dependent function of noncoding variants that may confer loss-of-function or gain-of-function activity. Interestingly, alterations at both enhancers can be explained by the interference with NRs to enhance or impair NR binding. Because KRAS and PER2 variant–containing enhancers are located 135 kb and 11 kb away from the coding sequences, respectively, it remains challenging to assess how enhancer variants affect allele-specific gene transcription at native chromatin in response to NR signaling. Nonetheless, site-specific KI of enhancer variants resulted in alterations of KRAS or PER2 expression in leukemia cells, respectively. Moreover, motifs for NRs are highly enriched in variant-associated enhancers in leukemia and other cancers, suggesting that altered NR binding by noncoding variants may be a generalizable mechanism to deregulate proto-oncogenes or tumor suppressors. These findings contrast with coding mutations that usually function through distinct mechanisms for proto-oncogenes (gain-of-function) and tumor suppressors (loss-of-function). Rather, enhancer variants for proto-oncogenes or tumor suppressors may converge on the same regulatory pathways, such as those reported here, and represent a unique feature of pathogenic noncoding variants.
Noncoding Variants Link Enhancer Dysfunction with NR Signaling
Noncoding regulatory variants may modulate NR function in human diseases by affecting NR expression or chromatin binding. For instance, deregulated estrogen receptor signaling due to noncoding variants at cis-regulatory elements controlling the estrogen receptor-α (ESR1) gene underlies the persistent expression of ESR1 in breast cancers (57). Natural genetic variations at PPARG binding sites modulate PPARG-binding activity and subsequent expression of quantitative trait loci associated with disease risk and drug response in diabetes (58). Here, we uncover a new molecular link between NR signaling and noncoding variants through dysregulated enhancer function in human leukemia. In hematopoiesis, RXRA controls a set of genes required for HSC self-renewal, and its downregulation allows the differentiation of myeloid lineages (20). Noncoding variants may enhance or attenuate PPAR–RXR binding at a subset of enhancers (e.g., KRAS and PER2) to modulate NR-mediated transcriptional programs. Altered expression of NR target genes may impair HSC self-renewal and/or lineage differentiation to promote the development of hematologic disorders. Moreover, physiologic or therapeutic activation of NR signaling may further cooperate with variant-containing enhancers, as illustrated in this study, resulting in amplification of the effects associated with dysregulated transcriptional programs. Therefore, our results support a mechanism underlying deregulated NR signaling through interactions with noncoding regulatory variants in human cancer. Comprehensive categorization of noncoding variants may identify new diagnostic biomarkers that are independent of protein-coding genes. Pathogenic noncoding variants may also serve as potential therapeutic targets through genetic (e.g., enCRISPRa, enCRISPRi, or base editing) or pharmacologic (e.g., NR agonists and antagonists) approaches. Hence, our study provides an integrative framework to identify and functionally dissect noncoding regulatory variants in human diseases including cancer.
Methods
Patient Samples
Primary human leukemia and lymphoma samples were collected for diagnosis and deidentified for our study. This study was approved by the Institutional Review Board at UT Southwestern (IRB STU 122013-023). Samples were frozen in FBS with 10% DMSO and stored in liquid nitrogen.
Cells and Cell Culture
Human MKPL-1, MOLM-13, K562, and Jurkat cells were cultured in RPMI 1640 medium containing 10% FBS, 1% penicillin/streptomycin. 293T cells were cultured in DMEM containing 10% FBS, 1% penicillin/streptomycin. For agonist treatment, cells were cultured with 9-cis-RA (1 μmol/L, Sigma-Aldrich #R4643), ATRA (1 μmol/L, Sigma-Aldrich #R2625), bezafibrate (60 μmol/L, Sigma-Aldrich #B9273), fenofibrate (30 μmol/L, Sigma-Aldrich #F6020), GW0742 (0.05 μmol/L, Sigma-Aldrich #G3295), or rosiglitazone (5 μmol/L, Sigma-Aldrich #R2408) for 48 to 72 hours, respectively. All cultures were incubated at 37°C in 5% CO2. All cell lines were tested for Mycoplasma contamination. No cell lines used in this study were found in the database of commonly misidentified cell lines that are maintained by the International Cell Line Authentication Committee and NCBI BioSample.
Mice and Xenograft Experiments
NOD-scid IL2Rgnull (NSG) mice were purchased from and maintained at the animal core facility of UT Southwestern (Dallas, TX). Animal studies were approved and conducted under the oversight of the Institutional Animal Care and Use Committee at UT Southwestern. In brief, luciferase cassette was amplified and cloned into pLVX-Puro vector (Clontech, catalog no. 632164). Lentivirus was produced to infect the target cells. Puromycin selection (1 μg/mL) was performed 3 days after infection. Six-to-8-week-old female NSG mice were sublethally irradiated (2.5 Gy) a half day before the transplantation. Cells (1 × 106/mice) were resuspended in PBS (200 μL/mice) and intravenously transplanted. Transplanted mice underwent in vivo bioluminescence imaging at various time points to evaluate the tumor growth. Briefly, following intraperitoneal injection of 150 mg/kg d-luciferin (Gold Biotechnology), mice were imaged, and bioluminescence intensity was quantitated using Living Image 3.2 acquisition and analysis software (Caliper Life Sciences). Total flux values were determined by drawing regions of interest of identical size over each mouse and were presented in photons (p)/second (sec). PER2 and KRAS enhancer KO cells were analyzed side-by-side using the same control mice; thus, the same control bioluminescence images were used in Figs. 3D and 4D. Four weeks after transplantation, the PB and BM cells were assessed for engraftment by flow cytometry. Bloodsmear was performed and samples were stained with May–Grunwald–Giemsa. For enhancer KI xenotransplantation, cells were transduced with pLVX-EF1a-IRES-zsGreen1 lentivirus and intravenously transplanted as described above, followed by intraperitoneal injections of 10 mg/kg rosiglitazone and 1 mg/kg 9-cis-RA twice a week for 6 weeks prior to PB collection and analysis.
Plasmids
To generate the inducible dCas9-p300 expression vector, the p300 core domain was amplified from the pcDNA-dCas9-p300-Core vector (Addgene, #61357) and cloned into MluI/BstXI digested pHR-TRE3G-KRAB-dCas9-P2A-mCherry backbone (59). To generate the inducible dCas9-LSD1 expression vector, LSD1 open reading frame (ORF) was amplified and cloned into the pHR-TRE3G-KRAB-dCas9-P2A-mCherry to replace the KRAB domain. To generate the enCRISPRa sgRNA vector, the MCP-VP64-IRES-mCherry cassette was amplified from the pJZC34 vector (Addgene, #62331) and cloned into BsrGI/EcoRI digested lenti-sgRNA (MS2)-zeo backbone (Addgene, #61427). Then the mCherry cassette was replaced by puromycin or zsGreen1 by In-Fusion HD Cloning Kit (Clontech). To generate the enCRISPRi sgRNA vector, the KRAB sequence was amplified from the pLV hU6-sgRNA hUbC-dCas9-KRAB-T2A-Puro vector (Addgene, #71236) and cloned into the enCRISPRa sgRNA vector to replace VP64. The lentiviral shRNAs for PPARG (TRCN0000001673) and RXRA (TRCN0000330707) in pLKO.1 vector were obtained from Sigma-Aldrich.
Annotation of Blood CRE Repertories and Target Genes
We collected 72 independent H3K27ac ChIP-seq datasets from 17 studies (5, 7, 8, 16, 60–71) including normal (lymphocyte, monocyte, macrophage, erythroblast, and CD34+ HSPC) and diseased hematopoietic cell types (AML, erythroleukemia, T-ALL, T-lymphoma, and B-lymphoma; 51 normal and 21 disease samples; Supplementary Table S1). The erythroleukemia K562 cell line was grouped with AML and MDS samples in this study. ChIP-seq raw reads were aligned to the human (hg19) genome assembly using Bowtie2 (72) with –k 1. Only tags that uniquely mapped to the genome were used. Peaks were identified by MACS version 1.4.2 (73) with P value of 10−5, and ranked by fold enrichment and P value in each dataset. Then 150 bp upstream and downstream regions from summits of the top 30% ranked peaks were combined on the basis of subsets in normal and disease groups, respectively. To minimize the number of overlapping regions, the peaks within a distance of 200 to 400 bp were clustered and the overlapping peaks were then merged using Galaxy interval operation tools (https://usegalaxy.org/). In total, we defined 22,262 H3K27ac-associated CREs with an average size of 952 bp. For the genome-wide analysis, the nearest neighbor genes within 50 kb of a given gene-distal CRE were assigned as the CRE-associated genes. For CREs for which no gene was found within 50 kb, the associated genes for these CREs were denoted as NA. For the single gene locus (e.g., KRAS and PER2), the CREs were assigned to candidate target gene(s) by the following criteria: (i) the presence of CRE–promoter interactions based on Hi-C and/or ChIA-PET analyses; (ii) the expression of the candidate target gene(s) was significantly impaired upon CRISPR/Cas9-mediated KO of the candidate CREs. Human genome assembly (hg19) was used for all gene annotation analyses.
Targeted CRE Resequencing
Targeted resequencing of CREs was performed using Ovation Target Enrichment System (NuGEN) following the manufacturer's protocols. Briefly, probes were designed based on both sense and antisense DNA sequences of the targeted genomic region every 200∼250 bp. Genomic DNA (gDNA) from human primary leukemia samples, control samples, or leukemia cell lines were isolated by DNeasy Blood & Tissue Kit (Qiagen) and quantified. After fragmentation to approximately 500 bp using Covaris acoustic shearing following the manufacturer's protocol, 10∼500 ng of gDNA fragments were end-repaired and individually ligated with barcoded adaptors. Following purification using AMPure XP beads (Beckman Coulter), samples were combined for multiplexed target enrichment. During this step, the targeting probes were annealed and extended to create amplifiable DNA molecules. Following a second bead purification, qPCR was performed to determine the optimal number of PCR cycles to use in the library amplification step. After library amplification, the multiplexed, target enriched libraries were purified, quantified, and sequenced on an Illumina NextSeq 500 system using the 150 bp high-output sequencing kit. Initial data processing and alignments were performed using commonly used analytic tools. Specifically, for each sample, the FASTQ files were aggregated into single files for read 1 and 2. The adaptor and low-quality bases (quality < 20) were trimmed using trim_galore version 0.4.0 (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) with parameter –stringency 1 -e 0.1 –length 20. Bowtie version 2.1.0 (72) was used to align the read pairs for each sample. PCR duplications were marked using a customized pipeline. Then Picard (version 1.127; https://broadinstitute.github.io/picard/) was used to sort and convert SAM to BAM files. RealignerTargetCreator/IndelRealigner from The Genome Analysis Toolkit (GATK version 3.3; ref. 21) was used to realign the reads around insertions and deletions. BaseRecalibrator/PrintReads from GATK was used to perform base quality score recalibration. The resulting BAM files were then used as input for mutation discovery analysis.
Mutation Discovery and Validation
For tumor–normal pairs of primary leukemia samples, the SNVs were identified using GATK (21), MuTect version 1.1.4 (22), Strelka version 1.0.14 (23), and VarScan2 version 2.3.9 (25). The INDELs were identified using GATK, Strelka version 1.0.14, VarScan2 version 2.3.9, and Scalpel version 0.4.1 (24). MuTect version 1.1.4 and Strelka version 1.0.14 were run with default parameters. VarScan2 version 2.3.9 and Scalpel version 0.4.1 were run in somatic mode with the recommended filtering scheme. High-confidence (Tier I) somatic SNVs were defined as SNVs called by at least two out of four mutation callers (GATK, MuTect, Strelka, and VarScan2). High-confidence (Tier I) somatic INDELs were defined as INDELs called by at least two of four callers (GATK, Strelka, Scalpel, and VarScan2). Variants located in RepeatMasker regions were excluded from our analysis. RepeatMasker regions were downloaded from UCSC (https://genome.ucsc.edu/cgi-bin/hgTables). For all leukemia or lymphoma samples and cell lines, SNVs and INDELs were identified using GATK HaplotypeCaller. SNVs were filtered using –filterExpression “QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0.” INDELs were filtered using –filterExpression “QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0.” Common genetic variants included in dbSNP (build 138) were filtered. The resulting variants were identified as the Tier II somatic SNVs and INDELs. The variants in coding regions were annotated by Annovar version 2015-06-17 (74). Synonymous SNVs were excluded. To validate the overall targeted resequencing and mutation calling pipelines, we included 3 control samples including two normal CD34+ HSPCs from healthy donors and one mixed genomic DNA control (Ctrl-17) containing equal molar ratio of gDNA from 17 individual noncancer cell types.
To identify recurrently mutated CREs, we determined the number of independent samples using a binomial distribution as previously described (75) with modifications. Briefly, we assumed that the observed number of mutated samples, k, for a CRE followed a binomial distribution, binomial (n, pi), where n was the total number of samples with mutation data and pi was the estimated sample mutation rate for CRE i under the null hypothesis that the region was not recurrently mutated. We could therefore compute the following P value:
Here, we assumed that pi depended on the length Li of the CRE and the estimated nucleotide mutation rate qi for the region under the null hypothesis as follows:
The background mutation frequency qi was estimated by dividing the total number of observed mutations by the length of all CREs. By this analysis, we found that the probability of observing a 952 bpCRE (the average size of all CREs) harboring SNVs in ≥ 2 of 29 tumor–normal pairs, ≥ 2 of 28 leukemia samples, and ≥ 2 of 29 leukemia cell lines is 0.027. Similarly, the probability of observing a 952 bp CRE harboring INDELs in ≥ 1 of 29 tumor–normal pairs, ≥ 1 of 28 leukemia samples, and ≥ 1 of 29 leukemia cell lines is 0.098.
Identification of Noncoding Mutational Hotspots
Mutational hotspots were identified as described previously (75) with modifications. Specifically, all high-confidence Tier I mutations within 100 bp of each other were merged using BEDTools (76) into hotspot clusters until no cluster was found within 100 bp of another cluster. Clusters with only one or two mutations were removed from further consideration. A P value was calculated for each cluster using the negative binomial distribution, taking into account the length of the candidate hotspot cluster, the number of mutations in the cluster, and a background mutation rate for the cluster. The cluster background mutation rate was calculated as the mean of the background mutation probability for each sample that had a mutation represented in the cluster. The background mutation probability of each sample was calculated as the total number of mutations divided by the target region size. P values were adjusted for multiple testing with the multtest R package using the Benjamini-Hochberg method, and hotspot clusters were ranked accordingly. For the illustration of the top 100 recurrently mutated hotspots, Tier II noncoding mutations were included to compare mutations across different leukemia and lymphoma samples.
Motif Analysis
We scanned motifs around high-confidence somatic variants (Tier 1) at KRAS and PER2 enhancers using MotifScan tool (http://github.com/shao-lab/MotifScan). Briefly, to search for candidate motif targets in the given DNA sequences, the program scanned the sequences with a window of the same length as the motif, and defined raw motif score of the sequence S in the window as the ratio of the probability to observe target sequence S given the motif's Position Weight Matrix (PWM) M and the probability to observe S given the genome background B. For each annotated motif, we modeled the genome background distribution of motif scores by randomly sampling the genome 106 times, and defined the targets of this motif as the candidates with motif scores higher than the cutoff. The PWM matrix of motifs was downloaded from the Jaspar database (77). The enrichment of each motif was represented by the ratio of motif target densities at target regions compared to random control regions, together with a P value calculated from hypergeometric distribution.
ChIP and Data Analysis
ChIP-seq or ChIP-qPCR was performed as described previously (78) using antibodies for H3K27ac (Abcam, ab4729), PPARG (Santa Cruz Biotechnology, sc-271392), or RXRA (Santa Cruz Biotechnology, sc-515928) in HL60, K562, MKPL-1, MV4-11, THP1, and U937 cells, respectively. ChIP-seq libraries were generated using NEBNext Ultra II DNA library prep kit following the manufacturer's protocol (NEB), and sequenced on an Illumina NextSeq 500 system using the 75 bp high-output sequencing kit. ChIP-seq raw reads were aligned to the human hg19 genome assembly using Bowtie2 (72) with the default parameters. Only tags that uniquely mapped to the genome were used.
ATAC-seq and Data Analysis
ATAC-seq was performed in K562, MKPL-1, MV4-11, NB4, and U937 cells as described previously (79). ATAC-seq raw reads were trimmed to remove adaptor sequence and aligned to human genome assembly (hg19) using Bowtie2 (72) with default parameters. Only tags that uniquely mapped to the genome were used.
ChIA-PET and Hi-C Data Analysis
The RNAPII ChIA-PET data in K562 cells was downloaded from the Gene Expression Omnibus (GEO) with the accession number GSM970213. The in situ Hi-C data for K562 cells and CD34+ HSPCs were downloaded from GEO with accession numbers GSM1551618 and GSM2861708, respectively. The in situ Hi-C data for THP1 cells were downloaded from Juicebox (80). The processed bigwig or hic files by Juicer (80, 81) were visualized using epigenome browser (https://epigenomegateway.wustl.edu/) as ARC or heat map.
RNA Isolation and qRT-PCR Analysis
Total RNA was isolated using RNeasy Plus Mini Kit (Qiagen) following manufacturer's protocol. For quantitative RT-PCR (qRT-PCR) analysis, RNA was reverse-transcribed using iScript cDNA Synthesis Kit (Bio-Rad) following manufacturer's protocol. qRT-PCR was performed using the iQ SYBR Green Supermix (Bio-Rad) with CFX384 Touch Real-Time PCR Detection System (Bio-Rad). PCR amplification parameters were 95°C (3 minutes) and 45 cycles of 95°C (15 seconds), 60°C (30 seconds), and 72°C (30 seconds). Primer sequences are listed in Supplementary Table S5.
CRE Perturbation Screen
To generate inducible enCRISPRi and enCRISPRa stable cell lines, MKPL-1 cells were transduced with lentiviruses expressing dCas9-LSD1 (or dCas9-p300) and rtTA. Doxycycline was added following infection and flow cytometry was used to sort cells expressing mCherry and BFP. These cells were then grown in the absence of doxycycline until mCherry fluorescence returned to uninduced levels. 1,836 recurrently mutated CREs were selected for the perturbation screens. For each CRE, 4 or 5 sgRNAs were designed. Briefly, we divided each CRE into five equal segments, and then designed two or more sgRNAs in the segment closer to the ATAC-seq peak summit. All sgRNAs were designed and selected to minimize off-target cleavage based on publicly available filtering tools (http://crispr.mit.edu/). Negative control sgRNAs were also selected from the previous CRISPRi/a library (59). DNA oligonucleotide library synthesis was completed on a programmable microarray using a B3 Synthesizer (CustomArray). Full-length oligonucleotides (96 nt) were amplified for 15 cycles by PCR using Phusion High-Fidelity DNA Polymerase (referred to as PCR1). PCR2 was performed to remove barcodes and replace with enCRISPRa/i sgRNA vector homology sequences. The amplified sequences were purified for the Gibson assembly reaction. The enCRISPRi/a sgRNA vectors were digested by BsmBI (NEB), dephosphorylated, and purified. Gibson assembly was performed to generate the sgRNA library following manufacturer's protocol. Briefly, 1 μL of the Gibson Assembly reaction was added to 25 μL E.cloni 10G ELITE electrocompetent cells (Lucigen, 60052-4) and electroporated using Bio-Rad MicroPulser. The transformation was plated onto prewarmed 24.5 cm2 bioassay plates with ampicillin. The colonies were counted to calculate the library coverage (> 200×). All colonies were collected, and maxiprep was performed to isolate the sgRNA library. Lentiviruses were produced as described previously (5).
To perform high-throughput pooled enCRISPRi/a screening, the inducible enCRISPRi/a cell lines were infected with lentiviruses containing the sgRNA library at MOI < 0.5. Two days after infection, cells were selected with 1 μg/mL puromycin for 3 days, and then cultured in fresh medium for 48 hours. Cells were expanded to get enough coverage of sgRNAs (>1,000×) and gDNA were extracted (T0). The remaining cells were cultured in medium with doxycycline (1 μg/mL) at a density between 500,000 and 1,000,000 cells/mL to maintain the library coverage of at least 1,000 cells per sgRNA. Genomic DNA was harvested from all samples at day 28 (T28), and the sgRNA-containing regions were amplified by PCR and sequenced on an Illumina NextSeq 500 sequencer. All the primers are listed in Supplementary Table S5.
Three replicate experiments of each enCRISPRi or enCRISPRa screen were performed. sgRNA sequences were extracted from FASTQ files and matched to the sgRNA sequences of the enCRISPRi/a screen library. Reads of each sgRNA were counted and normalized to the total read counts for each sample. Pearson correlations between replicates were calculated using the log2-transformed sgRNA counts. Normalized sgRNA growth phenotypes were quantified as described previously (59). Specifically, the phenotype of each sgRNA was calculated by log2 transformed sgRNA ratio between end (T28) and starting time points (T0). To calculate phenotype relative to control samples, we further divided them by the median of negative control phenotypes. MAGeCK (82) was used to identify CREs positively or negatively enriched in each screen. We first determined the abundance of sgRNAs using MAGeCK “count” module in the raw FASTQ files. Then, we used MAGeCK “test” module to test for robust gene-level enrichment with the following arguments: mageck test -k path/to/count_file -t end_timepoint -c start_timepoint -n sample_name –normcounts-to-file –norm-method total –gene-lfc-method secondbest. We normalized sgRNA counts by total read counts in each sample and calculated log2 fold change (LFC) of each CRE using sgRNA with the second strongest LFC.
To validate enCRISPRi and enCRISPRa screen results, we cloned 14 individual CRE-specific sgRNAs for 5 candidate oncogenic CREs (CRE18528, CRE18532, CRE19253, CRE726, and CRE4399), 4 tumor-suppressive CREs (CRE16157, CRE9362, CRE11193, and CRE12661), and 5 other CREs (CRE17030, CRE4896, CRE2305, CRE13899, and CRE9836) into the MCP-KRAB-IRES-zsGreen1 (for enCRISPRi) and MCP-VP64-IRES-zsGreen1 (for enCRISPRa) vectors, respectively. Except for two CREs (CRE4399 and CRE12661), the other CREs were annotated promoters. Individual enCRISPRi and enCRISPRa experiments using CRE-specific (sgCRE) or control (sgGal4) sgRNAs were performed in AML (MKPL-1 and MOLM-13) and ALL (Jurkat) cells, respectively, followed by the analysis of gene expression of CRE-associated genes after 72 hours of doxycycline-induced enCRISPRi or enCRISPRa expression. All sgRNA and primer sequences are listed in Supplementary Table S5.
Negative Selection Competition Assay
Negative selection competition assays were performed to investigate the functional role of candidate CREs that may regulate the proliferation of MKPL-1 cells. Briefly, sgRNAs were cloned into the enCRISPRa or enCRISPRi sgRNA vectors containing a zsGreen1 reporter. Lentiviruses were produced to infect the enCRISPRa- or enCRISPRi-expressing MKPL-1 cells. Cells were analyzed for zsGreen1 expression 3 days after infection by flow cytometry. Then doxycycline (1 μg/mL) was added to the medium to induce the expression of dCas9-p300 or dCas9-LSD1. Flow cytometry was repeated every 4 days to measure the percentage of sgRNA-expressing cells (zsGreen1+) and the data were normalized to the starting time point. All sgRNA sequences are listed in Supplementary Table S5.
Enhancer KO by CRISPR/Cas9
Enhancer KO in MKPL-1 cells was performed following published protocols (83, 84) with modifications. Briefly, sgRNAs for site-specific cleavage of genomic targets were designed following described guidelines, and sequences were selected to minimize off-target cleavage based on publicly available filtering tools (http://crispr.mit.edu/). Oligonucleotides were annealed in the following reaction: 10 μmol/L guide sequence oligo, 10 μmol/L reverse complement oligo, T4 ligation buffer (1×), and 5U of T4 polynucleotide kinase with the cycling parameters of 37°C for 30 minutes, 95°C for 5 minutes, and then ramp down to 25°C at 5°C/minute. The annealed oligos were cloned into the pSpCas9(BB) (pX458) vector (Addgene, #48138) using a Golden Gate Assembly strategy including: 100 ng of circular pX458 plasmid, 0.2 μmol/L annealed oligos, buffer 2 (1×; NEB), 20 U of BbsI restriction enzyme, 0.2 mmol/L ATP, 0.1 mg/mL BSA, and 750 U of T4 DNA ligase (NEB) with the cycling parameters of 20 cycles of 37°C for 5 minutes, 20°C for 5 minutes; followed by 80°C incubation for 20 minutes. To induce deletions of enhancer DNA regions, CRISPR/Cas9 constructs were transfected into MKPL-1 cells by nucleofection using the ECM 830 Square Wave Electroporation System (Harvard Apparatus). Each construct was directed to flanking the target genomic regions. To enrich for deletion, the top 1% to 5% of GFP+ cells were sorted 48 to 72 hours post-transfection and plated in 96-well plates. Single cell–derived clones were isolated and screened for CRISPR-mediated deletion. PCR amplicons were analyzed by Sanger DNA sequencing to confirm nonhomologous end-joining–mediated repair upon double-strand break formation. The positive single cell–derived clones containing deletion of the targeted sequences were expanded and processed for analysis.
Enhancer Reporter Assays
The genomic sequences containing WT or mutant allele of the enhancers were amplified and cloned into the firefly luciferase reporter constructs pGL4.24[luc2P/minP]. The reporter constructs (1 μg) were cotransfected with pRL-SV40-Renilla luciferase constructs (50 ng) into 293T, K562, or MKPL-1 cells maintained in DMEM containing 10% charcoal/dextran-treated FBS (HyClone) using FuGENE 6 (Promega) or ECM 830 Square Wave Electroporation System (Harvard Apparatus) according to the manufacturer's protocols. The NR ligands were added to the medium 8 hours after transfection. Cells were maintained in DMEM containing 10% charcoal/dextran-treated FBS and harvested after 48 hours, and luciferase activity was measured by Dual-Luciferase Assay system (Promega). Briefly, cells were washed twice with PBS. Cell lysates were prepared by adding 100 μL of PLB to each well, followed by incubation for 15 minutes at room temperature. Cell lysates were transferred to white opaque 96-well microplates (three replicates for each sample), and firefly luminescence was measured after adding 30 μL LAR II. After adding an equal volume of Stop&Glo Reagent to the wells, the Renilla luminescence was measured. To determine whether the sensitivity to the mutated enhancers was dependent on PPARG and RXRA expression, shRNA lentiviruses were produced to transduce MKPL-1 cells. Puromycin selection (1 μg/mL) was performed 3 days after infection. Selected cells and control cells were maintained in DMEM containing 10% charcoal/dextran-treated FBS for 48 hours prior to electroporation, and the luciferase activity was measured by Dual-Luciferase Assay system as described above.
EMSA
EMSA was performed using a Thermo Fisher Scientific LightShift Chemiluminescent EMSA Kit following the manufacturer's instructions. Briefly, human PPARG and RXRA ORFs were amplified and cloned to the lentiviral vectors pLVX-EF1a-IRES-zsGreen1 (Clontech, #631982) and pLVX-EF1a-IRES-mCherry (Clontech, #631987), respectively. Lentiviruses were produced as described previously (5). MKPL-1 cells were then infected with both lentiviruses and zsGreen1+mCherry+ cells were sorted and expanded. Cell nuclear extracts were prepared using NE-PER Nuclear and Cytoplasmic Extraction Reagents (Thermo Fisher Scientific) according to the manufacturer's protocols. Unlabeled and biotin-labeled probes were synthesized and annealed on a PCR thermal cycler (Bio-Rad) by heating up the mixture at 95°C for 5 minutes and lowering the temperature to room temperature at a ramp of 0.1°C/second. EMSA reactions included 1× binding buffer, 50 ng/μL poly(dI-dC), 2.5% glycerol, 0.05% Nonidet P-40, 5 mmol/L MgCl2, 2 μL nuclear extracts, and 20 fM biotin-labeled probes. Specificity of mobility shifts was analyzed by including 8 pmol/L unlabeled WT or mutant PPAR–RXR competitor oligonucleotides. Supershift assays were performed by including 2 μg antibodies against PPARG (Santa Cruz Biotechnology, sc-7273X) and/or RXRA (Santa Cruz Biotechnology, sc-515929X). Reactions were incubated at room temperature for 20 minutes, size-separated on a 6% DNA retardation gel, and transferred to a charged nylon membrane (Hybond-XL, GE/Amersham Biosciences). Membranes were cross-linked at 120 mJ/cm2 using Spectrolinker XL-1500. Free or protein-bound biotin-labeled probes were detected using streptavidin-horseradish peroxidase conjugates and chemiluminescent substrates according to the manufacturer's protocols. Intensity of the resulting bands was measured by ImageJ. All probe sequences are listed in Supplementary Table S5.
Site-Directed Knock-In of Enhancer Variants
The sgRNAs for site-specific cleavage of genomic targets were cloned into the pSpCas9(BB) (pX458) vector (Addgene, #48138) using a Golden Gate Assembly strategy. To generate independent single cell–derived knock-in clones containing the noncoding variants, 5 μg sgRNA vector, 3 μmol/L ssDNA donor, and cell suspension were combined and electroporated using the ECM 830 Square Wave Electroporation System (Harvard Apparatus) according to the manufacturer's protocol. To enrich for knock-in cells, the top 1% to 5% of GFP+ cells were sorted 48 to 72 hours post-transfection and plated in 96-well plates. Single cell–derived clones were isolated and screened for CRISPR-mediated knock-in. Briefly, genomic DNA was extracted and amplified by specific genotyping primers. The PCR products were then subjected to NcoI (CRE4399) or BssSI (CRE12661) digestion, respectively. Positive clones were further validated by Sanger DNA sequencing. The validated single cell–derived knock-in clones were expanded and processed for subsequent analyses. Sequences of sgRNAs, ssDNA donors, and genotyping primers are listed in Supplementary Table S5.
To determine the effects of enhancer variants on the baseline and NR-induced target gene expression, the single cell–derived knock-in clones were maintained in DMEM containing 10% charcoal/dextran-treated FBS for 48 hours before treatment with DMSO or NR agonists (1 μmol/L of 9-cis-RA and 5 μmol/L rosiglitazone). Total RNA was isolated after 48 hours and qRT-PCR analysis was performed to measure KRAS and PER2 expression. To determine the effects of enhancer variants on the chromatin occupancy of NRs, the single cell–derived knock-in clones were maintained in DMEM containing 10% charcoal/dextran–treated FBS for 48 hours before the treatment with NR agonists (1 μmol/L 9-cis-RA and 5 μmol/L rosiglitazone). Cells were fixed after 48 hours and ChIP experiments were performed using antibodies for PPARG (Santa Cruz Biotechnology, sc-271392) and RXRA (Santa Cruz Biotechnology, sc-515928). The NR ChIP or input control DNA were PCR amplified and quantified for allele frequency by amplicon sequencing.
Quantification and Statistical Analysis
Statistical details including N, mean and statistical significance values are indicated in the text, figure legends, or Methods. Error bars in the experiments represent SEM or SD from either independent experiments or independent samples. All statistical analyses were performed using GraphPad Prism, and the detailed information about statistical methods is specified in figure legends or Methods.
Data and Software Availability
All raw and processed targeted CRE resequencing, ATAC-seq, and ChIP-seq data are available in GEO: GSE137656.
Disclosure of Potential Conflicts of Interest
M. Chen reports receiving a commercial research grant from ALAB, has received speakers bureau honoraria from EUSA Pharma, and is a consultant/advisory board member for ALAB/ZIWEI. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: K. Li, Y. Zhang, X. Liu, Z. Gu, M. Ni, J. Xu
Development of methodology: K. Li, Y. Zhang, X. Liu, Y. Liu, Z. Gu, H. Cao, M. Chen, W. Chen, M. Ni, J. Xu
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): K. Li, X. Liu, Z. Gu, J. Xu
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K. Li, Y. Zhang, X. Liu, Y. Liu, K.E. Dickerson, M. Ni, J. Xu
Writing, review, and/or revision of the manuscript: K. Li, Y. Zhang, Y. Liu, K.E. Dickerson, M. Ni, J. Xu
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): K. Li, Y. Zhang, X. Liu, Z. Shao, J. Xu
Study supervision: M. Ni, J. Xu
Acknowledgments
We thank Liqiang Wang, Yi Du, and David Trudgian at UTSW BioHPC for assistance, Lin Li at CRI for technical support, Jing Zhang at University of Wisconsin-Madison for discussion, and other Xu laboratory members for helpful discussion and technical support. K. Li and Y. Liu were supported by the Cancer Prevention and Research Institute of Texas (CPRIT) training grants (RP160157). X. Liu was supported by the American Heart Association postdoctoral fellowship (18POST34060219). J. Xu is a Scholar of The Leukemia & Lymphoma Society (LLS) and an American Society of Hematology (ASH) Scholar. This work was supported by the NIH grants R01DK111430 and R01CA230631, by the Leukemia Texas Foundation research award, by CPRIT grants (RR140025, RP180504, RP180826, and RP190417), and by the Welch Foundation grant I-1942 (to J. Xu).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.