Abstract
Acute myeloid leukemia (AML) with complex karyotype is characterized by high genomic complexity, including frequent TP53 mutations and chromothripsis. Genomic rearrangements can reposition active enhancers near proto-oncogenes, leading to their aberrant expression; however, a comprehensive understanding of these events in AML is still incomplete. To facilitate the discovery of such “enhancer hijacking” events, we developed Pyjacker, a computational tool, and applied it to 39 AML samples with complex karyotype. Pyjacker identified several enhancer hijacking events in AML patient samples, including aberrant expression of MNX1, which can result from del(7)(q22q36) and is associated with hijacking of a CDK6 enhancer. MNX1 activation occurred in 1.4% of patients with AML and showed significant co-occurrence with BCOR mutations. Through a xenograft mouse model, we demonstrated that MNX1 is required for leukemia cell fitness. Pyjacker is an easy-to-use, accurate, and broadly applicable tool for identifying consequences of genomic events driving tumorigenesis, especially when germline genomic data are missing.
This study examines the consequences of structural alterations in AML and demonstrates that proto-oncogene activation by enhancer hijacking is an understudied pathomechanism. MNX1 overexpression demonstrates that deletions on chromosome 7q can not only lead to haploinsufficiency but also to activation of oncogenes by enhancer hijacking.
Introduction
Acute myeloid leukemia (AML) is a disease characterized by a block in differentiation and uncontrolled proliferation of myeloid progenitor cells. AML is a very heterogeneous disease and has been divided into several subgroups based on recurrent cytogenetic alterations [e.g., t(15;17)(q24.1;q21.2), inv(16)(p13.1q22), or t(8;21)(q22;q22.1)] and mutations (e.g., in NPM1, TP53, or CEBPA; refs. 1–3). Complex karyotype AML (ckAML) is a subtype with dismal prognosis, and there is currently an incomplete understanding of the pathogenetic mechanisms driving this disease (4). ckAML is defined by the presence of at least three cytogenetic alterations in the absence of any of the recurrent class-defining lesions. It accounts for 10% to 12% of all AML cases and is more frequent among older patients (4). ckAML samples often harbor TP53 mutations, which are associated with a high frequency of chromothripsis, defined as the shattering of certain chromosomes and refusion in random order, resulting in highly rearranged chromosomes with loss of chromosomal material (5–7). Deletions in ckAML are more frequent than gains, and the most common deletions affect chromosome arms 5q, 7q, 17p, and 12p, whereas gains mostly occur on 8q, 11q, and 21q (4, 8, 9). According to Knudson’s two-hit hypothesis, deletions in cancer usually lead to the complete inactivation of a tumor-suppressor gene whose other copy is also inactivated, for example, by a mutation. However, apart from TP53 on 17p, the search for tumor-suppressor genes with both copies inactivated in ckAML has been unsuccessful (4), and the current paradigm is that copy number alterations (CNA) in ckAML lead to gene dosage effects driving tumorigenesis (10), in which a higher or lower gene copy number results in a higher or lower gene expression, respectively.
Deletions of chromosomal segments on 7q are one of the most common structural alterations in AML, occurring in 10% of patients (2, 11). 7q deletions are frequently seen in ckAML but can also be found as a sole abnormality, in which it is still associated with a poor prognosis (12). The clustering of these deletions in certain regions on 7q has been used for more than 20 years as an indication for the presence of a tumor-suppressor gene within the minimally deleted region. However, the search for a gene with a second (epi)genetic hit has not been successful (13). Consequently, the most plausible explanation for these highly recurrent clustered deletions is that they lead to haploinsufficiency of genes in the deleted region, in which a lower copy number results in reduced gene expression, and that this haploinsufficiency is sufficient to drive cancer. Of note, many haploinsufficient genes located in the deleted regions of 7q encode enzymes that regulate genome-wide epigenetic patterns or transcription factors such as CUX1, EZH2, KMT2C, or KMT2E (13–15).
In addition to CNAs, structural variants (SV) can create fusion proteins or remove or create new enhancer–promoter interactions. For example, 5% of all AML cases harbor an inv(3)(q21q26.2) or a t(3;3)(q21;q26.2), which repositions the GATA2 enhancer in close vicinity of MECOM, leading to aberrant MECOM expression and GATA2 haploinsufficiency (16). A few other genes have been reported to be activated by enhancer hijacking in AML, including BCL11B in acute leukemias with a mixed phenotype (17) and MNX1 in pediatric AML with t(7;12)(q36;p13) (18, 19). Because ckAML samples harbor many, often cytogenetically cryptic, genomic rearrangements, we hypothesized that some of them could lead to enhancer hijacking events, activating yet-undiscovered oncogenes.
Recently, several computational methods have been developed to search for genes activated by enhancer hijacking. CESAM (20), SVExpress (21), and HYENA (22) perform a linear regression of gene expression depending on the presence of breakpoints nearby. These methods have successfully identified genes recurrently activated by enhancer hijacking, but they cannot detect genes activated in only a few samples. cis-X (23) can detect enhancer hijacking events in single samples using monoallelic expression, but this method is not very flexible and requires matched normal samples, which are rarely available for AML samples. NeoLoopFinder (24) follows a very different approach: it detects neo-loops in HiC data and does not use gene expression.
In this study, we developed a new tool, “Pyjacker,” which detects putative enhancer hijacking events occurring in single samples using RNA sequencing (RNA-seq) and whole-genome sequencing (WGS) without matched normal samples. We applied Pyjacker to 39 ckAML samples using WGS and RNA-seq and identified genes known to be activated by enhancer hijacking as well as candidate genes that, to the best of our knowledge, have been previously overlooked. We focused on MNX1, a gene encoding a homeobox transcription factor, which is mapped to chromosome band 7q36.3, that is located outside of the most commonly deleted regions found in AML with del(7q). We profiled 31 MNX1-expressing cases with WGS and discovered that del(7q) can lead to hijacking of the CDK6 enhancer driving MNX1 expression, resulting in a shared gene expression profile with pediatric AML with MNX1 activation. We showed that MNX1 knockdown reduces leukemic cell fitness in patient-derived xenograft (PDX) competition assays, demonstrating its essentiality.
Results
Pyjacker: Detection of Enhancer Hijacking with WGS and RNA-seq
We developed Pyjacker, a computational method to detect enhancer hijacking events occurring in single samples using WGS, RNA-seq, and enhancer information, without the need for matched normal samples (Supplementary Table S1). The aim of Pyjacker is to detect rearrangements that lead to a very strong overexpression of a gene that is not typically expressed or expressed only at a low level in the wild-type (WT) state. Detecting events leading to more moderate effects would not be feasible in single samples. For each gene, samples are divided into “candidate samples” which have breakpoints near the gene and “reference samples” which do not (see “Methods” section for details). Reference samples are used to compute the mean and standard deviation (SD) of the expression of this gene in the absence of enhancer hijacking, and candidate samples are tested for overexpression compared with this reference distribution (Fig. 1A). If a gene is activated by enhancer hijacking, we would expect most of the expression to come from the rearranged allele. Heterozygous SNPs are identified in the WGS data, and if these SNPs are covered in the RNA-seq data, Pyjacker tests whether the expression is mostly monoallelic (Fig. 1A). Using the breakpoint information and a list of putative enhancers, Pyjacker identifies enhancers coming close to the gene and scores the event depending on the strength of the enhancers coming close to the gene. As enhancers are cell type–specific, we used in this study chromatin immunoprecipitation sequencing (ChIP-seq) data against H3K27ac and P300 from myeloid cell lines (Supplementary Table S2) because these marks are found on active enhancers (25, 26). This enhancer information can be omitted if it is not available. The overexpression, monoallelic expression, and enhancer scores are combined into an empirical score which reflects how likely the gene is to be expressed because of a genomic rearrangement. The scores are aggregated across samples for each gene in order to give more weight to the recurrently activated genes. To estimate the FDR, “null scores” are computed by only including the “reference samples” and randomly assigning some of them to the “candidate samples,” thus reflecting the distribution of scores in the absence of enhancer hijacking. Finally, the Benjamini–Hochberg method is used to correct for multiple testing and provides a ranked list of genes putatively activated by a structural rearrangement, with corresponding FDRs. Pyjacker is flexible, and we provide an end-to-end Nextflow pipeline to run Pyjacker, starting from BAM files. We note that fusion transcripts can also result in monoallelic overexpression, when the 3′ fusion partner is not normally expressed, although this would be a different mechanism than enhancer hijacking. Various methods can be used to detect fusion transcripts from RNA-seq data, like STAR-Fusion (27) or Arriba (28). If a list of fusions generated by these methods is given as input to Pyjacker, it will annotate candidate genes with the fusion status, allowing the identification of true enhancer hijacking events. Because Pyjacker needs reference samples without breakpoints near a gene to estimate the reference expression distribution, it should be run with at least 10 samples as input but works best with large cohorts. We tested Pyjacker on two existing datasets with known enhancer hijacking events: 10 AML cell lines and 120 medulloblastoma samples (29, 30). Pyjacker identified known events, like the activation of MECOM (16), MNX1 (31), and MN1 (32) in some AML cell lines and of GFI1, GFI1B, and PRDM6 in some medulloblastoma samples, as previously reported by Northcott and colleagues (refs. 29, 30; Supplementary Tables S3–S5). cis-X also identified GFI1, GFI1B, and PRDM6, but these events did not particularly stand out among the many candidate genes reported, whereas they were all among the top 10 genes identified by Pyjacker due to Pyjacker’s aggregation of scores across samples, which gives more weight to recurrently activated genes.
Detection of enhancer hijacking in 39 ckAML samples. A, Schematic representation of the main sources of information used by Pyjacker: breakpoints, overexpression, monoallelic expression, and enhancers. B, Scatter plot of genes identified by Pyjacker in 39 ckAML samples as being potentially activated by genomic rearrangements in one or more samples, in which the x-axis shows the genomic location of the genes, and the y-axis shows the FDR. Gene names for the enhancer hijacking candidates are written in bold, and if a fusion transcript is detected, the fusion partner is named. chr, chromosome.
Detection of enhancer hijacking in 39 ckAML samples. A, Schematic representation of the main sources of information used by Pyjacker: breakpoints, overexpression, monoallelic expression, and enhancers. B, Scatter plot of genes identified by Pyjacker in 39 ckAML samples as being potentially activated by genomic rearrangements in one or more samples, in which the x-axis shows the genomic location of the genes, and the y-axis shows the FDR. Gene names for the enhancer hijacking candidates are written in bold, and if a fusion transcript is detected, the fusion partner is named. chr, chromosome.
Putative Enhancer Hijacking Events in 39 ckAML Samples
We profiled 39 ckAML samples with WGS and RNA-seq. These were diagnostic blood or bone marrow (BM) samples from patients enrolled in the ASTRAL-1 clinical trial which included older patients with AML (median age: 77 years, Supplementary Table S6; ref. 33). These samples carried some of the alterations most frequently found in ckAML (34), including biallelic TP53 alterations (64%, N = 25), del(7q) (69%, N = 27), del(5q) (67%, N = 26), and chromothripsis (43%, N = 17; Supplementary Fig. S1; Supplementary Tables S6–S10).
Pyjacker was applied to these 39 samples and detected 19 candidate genes with an FDR < 20% (Fig. 1B; Supplementary Table S11). Among them were many of the genes which had previously been reported to be activated by enhancer hijacking in AML, including MECOM (two samples), MNX1 (one sample), and BCL11B (one sample). In addition, Pyjacker identified several genes that had not been reported before and which represent interesting candidate oncogenes to be verified in future studies. For 9 of the 19 genes, no fusion transcript was detected, suggesting enhancer hijacking as the underlying activation mechanism: MECOM, MNX1, BCL11B, SLC22A10, EPO, ISM2, GSX2, CLEC10A and P2RY12. In order to evaluate how recurrent the upregulation of these genes is in AML, we used datasets from The Cancer Genome Atlas (TCGA)-LAML (1), BEAT-AML (35), and TARGET-AML (36) cohorts. We found that most of the genes identified by Pyjacker were recurrently overexpressed in these other AML cohorts, albeit at low frequencies (Supplementary Fig. S2). However, some genes were not found overexpressed in these three other AML cohorts, which suggests either that their activation is a very rare event in AML, that they are false positives, or that their overexpression in our cohort was a passenger event of chromothriptic rearrangements. For example, the activations of TEKT1 (in 16PB3075) and of SLC22A10 (in 15KM20146) were due to complex rearrangements which also contained SVs within TP53 (Supplementary Fig. S3A–S3F). Thus, these rearrangements might have been selected for because of the TP53 disruption rather than TEKT1 or SLC22A10 activation.
Activation of MECOM and Its Homolog PRDM16 by the GATA2 Enhancer
The only gene identified by Pyjacker in more than one sample from this cohort was MECOM, found to be monoallelically overexpressed in two samples (Fig. 2A and B; Supplementary Fig. S4A–S4C). In both cases, the rearrangements were more complex than those found in samples with inv(3) or t(3;3) AML, which are the most frequent rearrangements responsible for MECOM activation. One sample had chromothripsis on chromosome 3 (Fig. 2C), whereas the other one had several rearrangements between chromosome 3 and chromosome 14 (Supplementary Fig. S4A). Even though these rearrangements were very complex, they still resulted in the juxtaposition of MECOM to a GATA2 enhancer (next to RPN1) harboring enhancer marks in myeloid cell lines (Fig. 2D), which is the same enhancer that activates MECOM in the more common inv(3) and t(3;3) (16). Interestingly, the GATA2 enhancer was also reported by Pyjacker to activate PRDM16 in another sample (16KM11270) through a translocation t(1;3)(p36;q21) (Fig. 2E–G). PRDM16 is a homolog of MECOM (also known as PRDM3; ref. 37), and they are both H3K9me1 methyltransferases (38), so their overexpression could play a similar role in AML. This t(1;3) translocation has been reported before as a rare event (37), and PRDM16 has also recently been reported to be overexpressed as a result of a rare t(1;2)(p36;p21) translocation (39). Even though the expression of PRDM16 was monoallelic in this sample (Fig. 2F), which is a strong indicator of activation by enhancer hijacking, the FDR reported by Pyjacker was high (47%) because several samples without breakpoints near PRDM16 had a higher expression than this sample (Fig. 2E). MECOM is also expressed in samples without breakpoints nearby (40), although to a lesser extent, suggesting an additional activation mechanism for MECOM and PRDM16 besides enhancer hijacking.
Activation of MECOM and its homolog PRDM16 by a GATA2 enhancer. A, Expression of MECOM in all samples in TPM, ranked by expression of MECOM, in which samples 15PB19457 and 15KM20146 with breakpoints near MECOM are highlighted in green. B, Variant allele frequencies in WGS (DNA) and RNA-seq for SNPs in MECOM for sample 15PB19457 (major allele frequencies in blue and minor allele frequencies in red). C, Copy numbers (CN) and SVs on chromosome 3 for sample 15PB19457. Copy number losses are indicated in blue, and gains in red. SVs are shown as arcs at the top, in which the color indicates the orientation of the breakpoint: blue for deletion, red for duplication, and purple for inversion. In the chromosome ideogram, the three regions that are displayed with a zoom-in in (D) are highlighted in colors, with colors matching the arrows in (D). D, ChIP-seq tracks for P300 and H3K27ac in the myeloid cell lines MOLM-1 and Kasumi-1 in the region around MECOM for the rearranged chromosome of sample 15PB19457. The putative enhancer is highlighted in orange. E, Expression of PRDM16 in all samples, ranked by PRDM16 expression, in which sample 16KM11270 with a breakpoint near PRDM16 is highlighted in green. F, Variant allele frequencies in WGS (DNA) and RNA-seq for SNPs in PRDM16 in TPM for sample 16KM11270 (major allele frequency in blue and minor allele frequency in red). G, ChIP-seq tracks for P300 and H3K27ac in the myeloid cell lines MOLM-1 and Kasumi-1 in the region around PRDM16 on the rearranged chromosome of sample 16KM11270. The putative enhancer is highlighted in orange. chr, chromosome.
Activation of MECOM and its homolog PRDM16 by a GATA2 enhancer. A, Expression of MECOM in all samples in TPM, ranked by expression of MECOM, in which samples 15PB19457 and 15KM20146 with breakpoints near MECOM are highlighted in green. B, Variant allele frequencies in WGS (DNA) and RNA-seq for SNPs in MECOM for sample 15PB19457 (major allele frequencies in blue and minor allele frequencies in red). C, Copy numbers (CN) and SVs on chromosome 3 for sample 15PB19457. Copy number losses are indicated in blue, and gains in red. SVs are shown as arcs at the top, in which the color indicates the orientation of the breakpoint: blue for deletion, red for duplication, and purple for inversion. In the chromosome ideogram, the three regions that are displayed with a zoom-in in (D) are highlighted in colors, with colors matching the arrows in (D). D, ChIP-seq tracks for P300 and H3K27ac in the myeloid cell lines MOLM-1 and Kasumi-1 in the region around MECOM for the rearranged chromosome of sample 15PB19457. The putative enhancer is highlighted in orange. E, Expression of PRDM16 in all samples, ranked by PRDM16 expression, in which sample 16KM11270 with a breakpoint near PRDM16 is highlighted in green. F, Variant allele frequencies in WGS (DNA) and RNA-seq for SNPs in PRDM16 in TPM for sample 16KM11270 (major allele frequency in blue and minor allele frequency in red). G, ChIP-seq tracks for P300 and H3K27ac in the myeloid cell lines MOLM-1 and Kasumi-1 in the region around PRDM16 on the rearranged chromosome of sample 16KM11270. The putative enhancer is highlighted in orange. chr, chromosome.
Aberrant EPO Expression and EPO Receptor Amplification in Acute Erythroleukemia
Among the genes identified by Pyjacker, an interesting candidate was EPO. To our knowledge, this gene has never been reported to be activated by enhancer hijacking in human leukemias, although it has been found to be overexpressed due to genomic rearrangements in a mouse model of erythroleukemia (41, 42). EPO is not expressed in normal hematopoietic cells, but it is instead produced in the kidneys when blood oxygen levels are low, and it stimulates red blood cell proliferation by binding to its receptor [EPO Receptor (EPOR)] and activating the JAK/STAT pathway (43–45). Because EPO promotes survival, proliferation, and differentiation of erythroid progenitor cells (46), it may drive acute erythroleukemia (AEL), a rare subtype of AML enriched for complex karyotypes. In this ckAML cohort, the AEL sample 15KM18875 had high EPO expression (Fig. 3A). Although no samples from the TCGA-LAML, BEAT-AML, and TARGET-AML cohorts expressed EPO, we found that among three AEL cohorts profiled with RNA-seq (47–49), one sample from each cohort expressed EPO (Fig. 3B), indicating that EPO expression is a rare but recurrent event in AEL. In sample 15KM18875, a 100 kb region on chromosome 7 around EPO was duplicated and fused with a region on chromosome 11 (Fig. 3C) such that an extrachromosomal circular DNA (eccDNA) was formed (Fig. 3D). eccDNAs are rather common in cancer, but they are often amplified, whereas sample 15KM18875 displayed an average copy number of less than one eccDNA per cell. This eccDNA is therefore subclonal, but it is unclear whether most cells have one copy or whether a small percentage of cells contain numerous copies. The chromosome 11 portion of the eccDNA contains a putative enhancer with P300 and H3K27ac peaks in the leukemic cell line K562 with erythroid features (50), so this enhancer might be responsible for the activation of EPO in this sample. In addition to high EPO expression, we also observed very high expression of the EPOR in 15KM18875 (Fig. 3E), which was due to a massive amplification of EPOR on chromosome 19 (Fig. 3F). Chromosome 19 harbored patterns of chromothripsis, as well as foldback inversions, suggesting that the amplifications were caused by breakage–fusion–bridge cycles (51). Rearrangements of EPOR are well-known in acute lymphoblastic leukemia (52), and amplification of EPOR has recently been reported as a recurrent driver event in AEL (49). High EPOR expression could make the cells very sensitive to EPO, thus increasing the fitness advantage provided by endogenous EPO expression by the leukemic cells. In both the Iacobucci and colleagues (47) and Fagnan and colleagues (48) cohorts, the sample with EPO expression also had outlier high EPOR expression, indicating that EPO is recurrently overexpressed together with EPOR.
Aberrant EPO expression might cooperate with EPOR amplification in AEL. A,EPO expression in all samples in TPM, with sample 15KM18875 with EPO overexpression highlighted in green. B, Proportion of samples with nonzero EPO expression in three AEL cohorts profiled with RNA-seq (47–49). C, Copy numbers (CN) and SVs on chromosome 7 (containing EPO) and chromosome 11 in sample 15KM18875. Copy number losses are indicated in blue, and gains in red. SVs are shown at the top, with arcs connecting breakpoints or lines indicating the chromosome of the other side of the breakpoint (for C, D, and F the colors of SVs indicate the orientation: blue for deletion, red for duplication, purple for inversion, and green for interchromosomal SV). D, A 300 kb circular piece of DNA containing EPO and a putative enhancer (highlighted in orange), with P300 and H3K27ac peaks in the erythroid cell line K562. E,EPOR expression in TPM in all samples, with sample 15KM18875 highlighted in green. F, Copy numbers and SVs on chromosome 19 for sample 15KM18875. chr, chromosome.
Aberrant EPO expression might cooperate with EPOR amplification in AEL. A,EPO expression in all samples in TPM, with sample 15KM18875 with EPO overexpression highlighted in green. B, Proportion of samples with nonzero EPO expression in three AEL cohorts profiled with RNA-seq (47–49). C, Copy numbers (CN) and SVs on chromosome 7 (containing EPO) and chromosome 11 in sample 15KM18875. Copy number losses are indicated in blue, and gains in red. SVs are shown at the top, with arcs connecting breakpoints or lines indicating the chromosome of the other side of the breakpoint (for C, D, and F the colors of SVs indicate the orientation: blue for deletion, red for duplication, purple for inversion, and green for interchromosomal SV). D, A 300 kb circular piece of DNA containing EPO and a putative enhancer (highlighted in orange), with P300 and H3K27ac peaks in the erythroid cell line K562. E,EPOR expression in TPM in all samples, with sample 15KM18875 highlighted in green. F, Copy numbers and SVs on chromosome 19 for sample 15KM18875. chr, chromosome.
The Homeobox Genes GSX2 and MNX1 Can Be Activated by Atypical Rearrangements
Among the top Pyjacker hits were two homeobox genes, GSX2 and MNX1, which were overexpressed in samples 16PB5693 and 15PB8708, respectively. Both samples have breakpoints near the respective genes, and in sample 15PB8708, heterozygous SNPs in MNX1 confirmed monoallelic expression (Fig. 4A–C). Homeobox genes are often upregulated in AML (53), so the activation of homeobox genes by enhancer hijacking could be a driver event. Both GSX2 and MNX1 are known to be activated by rare but recurrent translocations to the ETV6 locus: GSX2 by t(4;12)(q11-q12;p13) in adult AML (54) and MNX1 by t(7;12)(q36;p13) in pediatric AML (19). In this study, however, GSX2 and MNX1 were activated by atypical mechanisms. Sample 16PB5693 was affected by a chromothripsis event involving multiple chromosomes, and several genomic segments, including GSX2, were amplified (Fig. 4D). In the WT state, the putative enhancer is located less than 1 Mb away from GSX2 but in a different topologically associating domain (TAD; Fig. 4E). In sample 16PB5693, a deletion removed the TAD boundary, which likely enabled GSX2 to interact with the enhancer. In addition to GSX2 upregulation, the recurrent t(4;12) translocation frequently leads to PDGFRA activation and to an ETV6::CHIC2 fusion transcript (55). Sample 16PB5693 only had GSX2 expression without PDGFRA expression and without fusion transcript, suggesting that GSX2 expression is the driving event. In sample 15PB8708, a 230 kb segment in the CDK6 region, containing two putative enhancers, was duplicated and inserted next to MNX1 (Fig. 4F and G). The breakpoints were verified by genomic PCR (Supplementary Fig. S5A–S5C; Supplementary Table S12). This hematopoietic super-enhancer has already been reported to be involved in enhancer hijacking events in AML, activating BCL11B (17) or EVI1 (56). MNX1 was expressed in a rather high proportion of the TCGA-LAML and BEAT-AML cohorts (2/179 and 17/707 samples with MNX1 expression, respectively), and in some cases, the karyotype contained rearrangements near MNX1 on 7q36 [del(7)(q21q36) for TCGA-AB-2847, del(7)(q22q36) for BA2921, and t(7;7)(q22;q36) for BA2802], indicating that MNX1 expression could be due to enhancer hijacking in some of these samples.
The homeobox genes GSX2 and MNX1 can be activated by atypical mechanisms. A,GSX2 expression in all samples in TPM, with sample 16PB5693 with GSX2 expression highlighted in green. B,MNX1 expression in all samples in TPM, with sample 15PB8708 with MNX1 overexpression highlighted in green. C, Variant allele frequencies in WGS and RNA-seq for an SNP in MNX1 in sample 15PB8708 (major allele frequency in blue and minor allele frequency in red). D, Circos plot showing CNAs and SVs in sample 16PB5693 for the chromosomes involved in a chromothripsis event. Copy number (CN) losses are indicated in blue, and gains in red. SVs are shown as arcs at the center, with interchromosomal breakpoints in green, duplications in red, deletions in blue, and inversion in purple. E, HiC data from HSPCs (19) and ChIP-seq data from myeloid cell lines in the region around GSX2. The putative enhancer is highlighted in orange, and the region in gray is deleted in sample 16PB5693. F, CNs and breakpoints on chromosome 7 for sample 15PB8708. In the chromosome ideogram, regions highlighted in red and teal correspond to the regions shown in (G), with matching colors. G, ChIP-seq tracks for P300 and H3K27ac in the myeloid cell lines MOLM-1 and Kasumi-1 in the region around MNX1 on the rearranged chromosome of sample 15PB8708. Enhancers of the CDK6 region are highlighted in orange. chr, chromosome.
The homeobox genes GSX2 and MNX1 can be activated by atypical mechanisms. A,GSX2 expression in all samples in TPM, with sample 16PB5693 with GSX2 expression highlighted in green. B,MNX1 expression in all samples in TPM, with sample 15PB8708 with MNX1 overexpression highlighted in green. C, Variant allele frequencies in WGS and RNA-seq for an SNP in MNX1 in sample 15PB8708 (major allele frequency in blue and minor allele frequency in red). D, Circos plot showing CNAs and SVs in sample 16PB5693 for the chromosomes involved in a chromothripsis event. Copy number (CN) losses are indicated in blue, and gains in red. SVs are shown as arcs at the center, with interchromosomal breakpoints in green, duplications in red, deletions in blue, and inversion in purple. E, HiC data from HSPCs (19) and ChIP-seq data from myeloid cell lines in the region around GSX2. The putative enhancer is highlighted in orange, and the region in gray is deleted in sample 16PB5693. F, CNs and breakpoints on chromosome 7 for sample 15PB8708. In the chromosome ideogram, regions highlighted in red and teal correspond to the regions shown in (G), with matching colors. G, ChIP-seq tracks for P300 and H3K27ac in the myeloid cell lines MOLM-1 and Kasumi-1 in the region around MNX1 on the rearranged chromosome of sample 15PB8708. Enhancers of the CDK6 region are highlighted in orange. chr, chromosome.
MNX1 Is Expressed in 1.4% of all AML Cases, Often with del(7)(q22q36)
To estimate the frequency of aberrant MNX1 expression in AML cases, we performed an unbiased qRT-PCR screen of three different AML cohorts (Rotterdam, Ulm, and Jena; Fig. 5A). In a total of 2,293 cases across five cohorts [three qRT-PCR cohorts and public RNA-seq data from TCGA-LAML (1) and BEAT-AML (35)], we estimated the frequency of MNX1-expressing samples to be 1.4% of all AML cases (Supplementary Table S13). We also screened del(7q) and ckAML cases and found a higher proportion of MNX1-expressing samples in these selected groups [8.70% in del(7q) and 2% in ckAML; Supplementary Table S13].
MNX1 is expressed in 1.4% of all AML cases, often with del(7)(q22q36). A, qRT-PCR screen for MNX1 expression in three AML cohorts (Rotterdam, Ulm, and Jena). B, Fifteen MNX1-expressing samples with del(7)(q22q36) profiled with WGS, with a zoom-in around the breakpoints (hg19 reference). The blue rectangles indicate the genomic regions that are retained, and dashed lines represent breaks. C, Percentage of samples with mutations in frequently mutated genes for MNX1+ samples with breakpoints near MNX1, MNX1+ samples without breakpoints, and TCGA-LAML samples. D, scRNA-seq analysis for MNX1+ and control del(7q) AML samples. Left, UMAP showing cell type labels of 53,479 cells integrated across eight patients. Right, UMAP highlighting MNX1 expression (top) and the presence of a del(7q) (bottom) as predicted for patients with del(7q) (n = 4) and patients with del(7q) and MNX1 activation (n = 4). chr, chromosome; UMAP, Uniform Manifold Approximation and Projection.
MNX1 is expressed in 1.4% of all AML cases, often with del(7)(q22q36). A, qRT-PCR screen for MNX1 expression in three AML cohorts (Rotterdam, Ulm, and Jena). B, Fifteen MNX1-expressing samples with del(7)(q22q36) profiled with WGS, with a zoom-in around the breakpoints (hg19 reference). The blue rectangles indicate the genomic regions that are retained, and dashed lines represent breaks. C, Percentage of samples with mutations in frequently mutated genes for MNX1+ samples with breakpoints near MNX1, MNX1+ samples without breakpoints, and TCGA-LAML samples. D, scRNA-seq analysis for MNX1+ and control del(7q) AML samples. Left, UMAP showing cell type labels of 53,479 cells integrated across eight patients. Right, UMAP highlighting MNX1 expression (top) and the presence of a del(7q) (bottom) as predicted for patients with del(7q) (n = 4) and patients with del(7q) and MNX1 activation (n = 4). chr, chromosome; UMAP, Uniform Manifold Approximation and Projection.
We performed WGS on 23 MNX1-expressing primary AML samples (whole blood or BM) taken at diagnosis, which we combined with WGS data of eight samples provided by the Munich Leukemia Laboratory (MLL), resulting in a total of 31 MNX1-expressing samples profiled with WGS. The data for the eight samples from the MLL were processed with the MLL pipeline as previously described (57), whereas the 23 other samples were processed in the same way as the 39 ckAML samples described in this article. Fifteen samples had a large del(7)(q22q36) starting within CDK6 and ending before MNX1 (Fig. 5B; Supplementary Table S14), indicating that MNX1 could be activated by an enhancer in the CDK6 region in those samples. Interestingly, this is the same region that is duplicated and inserted next to MNX1 in sample 15PB8708 (Fig. 4F and G). Four samples had other rearrangements near MNX1, including a smaller del(7q) between the T-cell receptor β locus and MNX1 (Supplementary Figs. S6A–S6D, S7A, and S7B), which supports the notion that other enhancers apart from CDK6 might activate MNX1. Indeed, we had previously found a MYB enhancer in GDM-1 cells (31) and an ETV6 enhancer in t(7;12)(q36;p13) pediatric AML (19) to drive aberrant MNX1 expression. Twelve samples had no rearrangements near MNX1, suggesting that MNX1 may also be activated through other mechanisms.
Samples with MNX1 rearrangements had a unique mutational spectrum with an absence of NPM1 and FLT3 mutations (0/19), as well as a very high frequency of BCOR mutations (10/19), which are usually rare in AML (2/200 in TCGA-LAML), although they have recently been reported to have a frequency of about 10% in AML with del(7q) (Fig. 5C; Supplementary Table S15; ref. 11). BCOR mutations were accompanied by BCORL1 (2/10) and NCOR2 (1/10) mutations indicating a potential synergistic effect of multiple hits on this gene family. We also found NCOR1 (1/9) and NCOR2 (1/9) mutations in BCOR WT cases, indicating that they might play a similar role as BCOR mutations. MNX1-expressing samples without breakpoints near MNX1 did not share this mutational landscape but had a particularly high frequency of mutations in NPM1 (8/12; ref. 58). MNX1, however, has not been shown to be in the NPM1 gene signature in previous studies. In pediatric AML, MNX1 can be expressed as a result of a translocation t(7;12), which very often co-occurs with trisomy 19 (19). However, trisomy 19 was not found in this cohort of adult MNX1-expressing samples.
We profiled 22/31 MNX1-positive (MNX1+) samples with RNA-seq and found that they had a different gene expression signature, depending on whether the sample had a breakpoint near MNX1 or not (Supplementary Figs. S8A–S8F and S9; Supplementary Table S16). MNX1-rearranged samples had a gene expression signature similar to t(7;12)(q36;p13) pediatric AML (19, 59, 60), with, for example, an upregulation of AGR2, KRT72, and KRT73. Downregulated genes included several key cancer- and hematopoiesis-associated genes: HLX, TFEC, GFI1, GAPT, SPRY2, TLE4, ACVR1B, BIK, EVI2B, PIK3CG, INPPL1 (SHIP2), MYD88, MACC1, CSF3, and CD177. MNX1–non-rearranged samples had a different gene expression signature with a significant upregulation of HOXA13, CCL1, and CX3CR1 and a downregulation of DLK1 and DDIT4L. MNX1 expression was slightly lower than in MNX1-rearranged cases, and some of the downregulated genes also showed intermediate levels in MNX1–non-rearranged samples.
Next we performed single-cell RNA-seq (scRNA-seq) on eight AML samples [four MNX1+ and four MNX1-negative (MNX1−) with del(7q); Supplementary Fig. S10] to investigate the expression of MNX1 and the presence of del(7q) at the single-cell level. We integrated scRNA-seq data for 53,479 cells across all patients and annotated the cell types by projecting the data onto a reference atlas (Fig. 5D; ref. 61). We mainly captured myeloid progenitors and leukemic blasts, consistent with the disease phenotype. We observed that del(7q) was present in virtually all leukemic blasts across both groups (MNX1− and MNX1+), suggesting that this genomic alteration was an early event in leukemogenesis in these patients. In MNX1+ cases, MNX1 was constitutively expressed in all blasts, indicating that cells with MNX1 activation might have a proliferative advantage.
Putative Enhancers in the CDK6 Region Interact with MNX1 in del(7q) AML
Because most samples with MNX1 activation have breakpoints in CDK6, we set out to identify the corresponding enhancer. To investigate whether MNX1 may interact with the CDK6 locus in selected del(7)(q22q36) samples, we performed circular chromosome conformation capture (4C) using a 5′ part of MNX1 as viewpoint in two primary AML samples (2KFQ and MTM9) and one human PDX (AML-661) with del(7q). In all three cases analyzed, we detected interactions between MNX1 and the CDK6 locus (Fig. 6A). We confirmed these interactions by reciprocal 4C using the CDK6 locus as viewpoint (Supplementary Fig. S11). We further narrowed down the CDK6-derived enhancer to roughly 200 kb by combining the genomic information from the CDK6 duplication of ckAML sample 15PB8708 and from the deletion margins of the del(7q) samples (Fig. 6B). Open chromatin profiling by Assay for Transposase-Accessible Chromatin by Sequencing (ATAC-seq) and enhancer mark profiling by antibody-guided chromatin tagmentation sequencing (ACT-seq) in two patient samples and one PDX sample with del(7)(q22q36) revealed several enhancer candidates, two of which coincided with P300 and H3K27ac peaks in the MOLM-1 cell line (Fig. 6B). Comparing intensities of common peaks, we considered the rightmost enhancer (chr7:92384001–92385000, hg19) located immediately at the deletion border as the strongest candidate and inserted it as a homology-directed repair donor template via CRISPR/Cas close to MNX1 into one of the two chromosomes 7 of the induced pluripotent stem cell (iPSC) line ChiPSC22 (Fig. 6C and D). Two heterozygous cell lines were confirmed by WGS. Upon differentiation into hematopoietic stem and progenitor cells (HSPC), the engineered, but not the WT, HSPCs showed MNX1 expression as validated by RNA-seq, although at a significantly lower level than in patient samples (Fig. 6E). Therefore, this rightmost enhancer is not sufficient to induce the high MNX1 expression observed in del(7)(q22q36) patients alone and might require additional enhancers from this region. To recapitulate the genomic configuration of MNX1 expressors with del(7q), we generated a heterozygous del(7)(q22q36) in the iPSC/HSPC model. However, del(7q) iPSCs could not be differentiated into HSPCs and therefore did not show MNX1 activation. Taken together, MNX1 activation in del(7q)(q22q36) AML could be traced to a region of 200 kb, including parts of CDK6. Identifying the precise location of the enhancer(s) will require future work.
Putative enhancers in the CDK6 region interact with MNX1 in del(7q) AML. A, Chromatin interaction detected with 4C in the region around CDK6 using MNX1 as viewpoint, for three different del(7)(q22q36) samples and one control sample (GDM-1 cell line) without del(7q). B, The 200 kb search region based on the enhancer duplication (sample 15PB8708) and the sample with the leftmost deletion (MLL215704), with tracks for enhancer marks: ATAC-seq in del(7q) samples MTM9 and 2KFQ, ATAC-seq and ACT-seq against H3K27ac and H3K4me1 in the PDX sample AML-661 derived from a del(7q) patient, and ChIP-seq against P300 and H3K27ac in the MOLM-1 cell line. The putative enhancers are highlighted in orange. C, Copy number (CN) profile and SVs on chromosome 7 in the engineered cell line validating the insertion of the 1 kb region. D, Circos plot for the same cell line showing the absence of other rearrangements. Copy number losses are indicated in blue, and gains in red. SVs are shown as arcs at the center, with interchromosomal breakpoints in green, duplications in red, deletions in blue, and inversion in purple. E,MNX1 expression in TPM for the parental ChiPSC22 HSPCs (n = 5, from independent differentiation experiments) compared with the engineered cell with the enhancer insertion (enh ins; n = 8, from independent differentiation experiments for two different cell lines). **, P < 0.01 using a two-sided t test. chr, chromosome.
Putative enhancers in the CDK6 region interact with MNX1 in del(7q) AML. A, Chromatin interaction detected with 4C in the region around CDK6 using MNX1 as viewpoint, for three different del(7)(q22q36) samples and one control sample (GDM-1 cell line) without del(7q). B, The 200 kb search region based on the enhancer duplication (sample 15PB8708) and the sample with the leftmost deletion (MLL215704), with tracks for enhancer marks: ATAC-seq in del(7q) samples MTM9 and 2KFQ, ATAC-seq and ACT-seq against H3K27ac and H3K4me1 in the PDX sample AML-661 derived from a del(7q) patient, and ChIP-seq against P300 and H3K27ac in the MOLM-1 cell line. The putative enhancers are highlighted in orange. C, Copy number (CN) profile and SVs on chromosome 7 in the engineered cell line validating the insertion of the 1 kb region. D, Circos plot for the same cell line showing the absence of other rearrangements. Copy number losses are indicated in blue, and gains in red. SVs are shown as arcs at the center, with interchromosomal breakpoints in green, duplications in red, deletions in blue, and inversion in purple. E,MNX1 expression in TPM for the parental ChiPSC22 HSPCs (n = 5, from independent differentiation experiments) compared with the engineered cell with the enhancer insertion (enh ins; n = 8, from independent differentiation experiments for two different cell lines). **, P < 0.01 using a two-sided t test. chr, chromosome.
Knockdown of MNX1 Reduces Tumor Load of AML PDX Cells In Vivo
After having demonstrated that MNX1 can be activated by enhancer hijacking in AML, we investigated whether MNX1 plays a role in the maintenance of established leukemias. To approximate the clinical situation, we studied a patient’s AML cells growing in mice using PDX model AML-661 which harbors a del(7)(q21.13;q36.3) and expresses MNX1. Using lentiviruses, we stably expressed two different constructs in each cell, namely CRE-ERT2 in which CRE becomes activated by addition of tamoxifen (TAM) and a CRE-inducible short hairpin RNA (shRNA) cassette in two different versions, for knockdown of either MNX1 or a control gene. The two knockdown constructs were molecularly marked by different fluorochromes to distinguish the two populations by flow cytometry, before and after induction of the knockdown by TAM. In vivo experiments were performed in a competitive approach, injecting a mixture of cells with MNX1 or control knockdown in a 1:1 ratio into the same mouse (Fig. 7A; ref. 62). In the first, constitutive experiment, MNX1 and control knockdowns were induced by TAM in vitro before transplantation of PDX cells into mice (Fig. 7A). After a period of several weeks of leukemic growth in mice, allowing initial engraftment in the orthotopic niche and later following substantial proliferation within the BM and dissemination to extramedullary sites in the blood, cells with MNX1 knockdown showed a pronounced disadvantage compared with cells with control knockdown in all organs studied (Fig. 7B), suggesting that lack of MNX1 reduced fitness of PDX AML-661 cells in vivo. To distinguish the effect of MNX1 knockdown on engraftment versus proliferation, a second experiment was performed in which MNX1 and control knockdowns were induced after the leukemic disease was readily established in mice by systemic treatment of mice with TAM (Fig. 7C). Again, cells with MNX1 knockdown had a remarkable disadvantage over control cells, most prominently in spleen and peripheral blood (PB), indicating that MNX1 knockdown reduced in vivo growth of AML-661 cells (Fig. 7D). As the effect was stronger in the first constitutive compared with the second inducible experiment, both biologic processes of cell engraftment and in vivo proliferation might rely on expression of MNX1.
Knockdown of MNX1 reduces tumor load of AML PDX cells in vivo. A, Scheme depicting the experimental setup of the in vivo constitutive experiment. AML-661 PDX cells expressing the cassettes for both CRE-ERT2 and the shRNA addressing MNX1 or a control gene were amplified in mice. Fresh PDX cells were stimulated with TAM (single dose, 200 nmol/L, 72 hours) to induce the knockdown in vitro. Cells with knockdown were enriched by flow cytometry gating on the respective fluorochrome markers GFP (knockdown of MNX1) and T-Sapphire (control knockdown). The two populations were mixed at a 1:1 ratio and injected into mice. The ratio between both populations was measured at advanced leukemic disease in different organs (more than 60% hCD33+ cells in PB). B, The results of the experiment described in (A) using five mice. C, Scheme depicting the experimental setup of the in vivo inducible experiment. The cell populations described in (A) were mixed in a 1:1 ratio and injected into 13 mice. Fourteen days after injection, three mice were sacrificed (N = 3) to quality control the 1:1 ratio of the two cell populations by flow cytometry. TAM (50 mg/kg) was orally administered to the 10 remaining mice. Five mice were sacrificed 3 days later to measure the rate of shRNA induction by TAM. At an advanced stage of leukemia, the remaining five mice were sacrificed to determine the ratio between the control vs. MNX1 knockdown populations. D, The results of the experiment described in (C). P values determined by a one-tailed unpaired t test. **, P < 0.01; ****, P < 0.0001.
Knockdown of MNX1 reduces tumor load of AML PDX cells in vivo. A, Scheme depicting the experimental setup of the in vivo constitutive experiment. AML-661 PDX cells expressing the cassettes for both CRE-ERT2 and the shRNA addressing MNX1 or a control gene were amplified in mice. Fresh PDX cells were stimulated with TAM (single dose, 200 nmol/L, 72 hours) to induce the knockdown in vitro. Cells with knockdown were enriched by flow cytometry gating on the respective fluorochrome markers GFP (knockdown of MNX1) and T-Sapphire (control knockdown). The two populations were mixed at a 1:1 ratio and injected into mice. The ratio between both populations was measured at advanced leukemic disease in different organs (more than 60% hCD33+ cells in PB). B, The results of the experiment described in (A) using five mice. C, Scheme depicting the experimental setup of the in vivo inducible experiment. The cell populations described in (A) were mixed in a 1:1 ratio and injected into 13 mice. Fourteen days after injection, three mice were sacrificed (N = 3) to quality control the 1:1 ratio of the two cell populations by flow cytometry. TAM (50 mg/kg) was orally administered to the 10 remaining mice. Five mice were sacrificed 3 days later to measure the rate of shRNA induction by TAM. At an advanced stage of leukemia, the remaining five mice were sacrificed to determine the ratio between the control vs. MNX1 knockdown populations. D, The results of the experiment described in (C). P values determined by a one-tailed unpaired t test. **, P < 0.01; ****, P < 0.0001.
Discussion
Reports have indicated enhancer hijacking as a mode of proto-oncogene activation in AML (16, 17, 19). In this study, we developed Pyjacker, a computational method for the systematic detection of enhancer hijacking events using WGS, RNA-seq data, and enhancer information. Pyjacker is versatile and applicable to many cancer types, but in this study we focused on ckAML. In 39 ckAML samples, Pyjacker detected 19 genes putatively activated by SVs in at least one sample with an FDR <20%. This indicates the importance of enhancer hijacking in ckAML, although it is not as frequent as the most recurrent deletions in 5q and 7q. We found known genes activated by enhancer hijacking such as MECOM, BCL11B, and MNX1 and identified multiple potential novel oncogenes in AML. This is in line with a recent study using HiC and WGS data for the detection of neo-loops in 25 AML samples, which also identified enhancer-promoter, as well as silencer-promoter interactions in AML, together suggesting an overlooked repertoire of leukemic events (63). That study used NeoLoopFinder to identify hundreds of new 3D contacts across SVs, much more than the number of putative enhancer hijacking events that we identified with Pyjacker in this study. We believe that focusing on strong gene overexpression is more robust to identify critical oncogenes than relying on HiC data, which might include many false positives where the gene expression does not vary a lot. In addition, RNA-seq data is more commonly available than HiC data, especially for large cohorts, which makes Pyjacker more widely applicable.
GSX2 is a homeobox gene which is overexpressed in AML samples with the rare t(4;12)(q12;p13) translocation (54), but this translocation also often leads to overexpression of PDGFRA and fusions involving ETV6, the most frequent being ETV6::CHIC2 (55). In this study, we found a different rearrangement causing only GSX2 overexpression without these additional effects, suggesting that activation of GSX2 might be the driver event in the t(4;12) translocation and that understanding the role of GSX2 in leukemogenesis could be important for therapeutic targeting.
EPO is another putative novel oncogene activated by enhancer hijacking in a small fraction of AEL samples. EPO had already been found to be activated by structural rearrangements in a mouse model of erythroleukemia, resulting in growth factor independence (41, 42). In this study, we found one human AEL sample with EPO overexpression linked to a genomic rearrangement. Although EPO activation is rare, it seems to be recurrent in AEL, as we identified it in three additional cohorts (47–49), including a previously reported out-of-frame fusion transcript YWHAE::EPO which was probably selected because it led to EPO upregulation (48). In addition, EPO overexpression seems to cooperate with amplifications of the gene coding for its receptor, a phenomenon recently described in AEL (49), because expression of EPO was found to co-occur with EPOR amplification.
Some identified genes were not found to be expressed in other cohorts, indicating that they may be very rare driver events, false positives, or passenger events which were selected for as part of a complex rearrangement. For example, both TEKT1 and SLC22A10 overexpression co-occurred with complex genomic rearrangements involving multiple chromosomes, which also disrupted TP53.
We focused validation experiments on MNX1 because it was, among the top Pyjacker hits, the second (following MECOM) most recurrently expressed gene in other cohorts (1, 35). We found that MNX1 is expressed in 1.4% of all AML cases, often with del(7)(q22q36). Activation of MNX1 with del(7q) had been reported before (64), and in his study we showed that the mechanism underlying the activation is a hijacking of a CDK6 enhancer. Del(7q) is a recurrent event in AML and currently explained by haploinsufficiency of one or several genes, including EZH2, KMT2C, KMT2E, and CUX1 (11, 13–15). Our findings show that, in addition to haploinsufficiency of the deleted genes, del(7q) can also lead to enhancer hijacking of MNX1. In one sample, a CDK6 enhancer was duplicated and inserted next to MNX1, without deletion, which makes it very likely that MNX1 activation is important for leukemogenesis and not merely a passenger side effect of del(7q). MNX1 upregulation had previously been observed in infant AML with t(7;12)(q36;p13) and was shown to transform fetal HSPCs in mice (19, 65). In this study, we showed that both constitutive and in vivo inducible knockdown of MNX1 in competitive assays in an AML PDX model greatly reduced the fitness of the leukemic cells, which demonstrates that MNX1 is a dependency gene in adult AML. However, only 8% of del(7q) AML cases have MNX1 expression, so enhancer hijacking cannot explain all del(7q) cases, and haploinsufficiency of genes in the deleted region remains the likely main consequence of del(7q). We found that this subgroup of MNX1-rearranged adult AML samples have a unique mutational profile with a much higher rate of BCOR mutations (53%) than other AML samples (1%) as well as del(7q) AML (10%; ref. 11). This differs from pediatric AML cases with t(7;12) which do not have these co-occurring BCOR mutations but instead frequently harbor trisomy 19 (19), an alteration that we did not detect in adult MNX1-rearranged cases. This group of adult MNX1-rearranged patients had a gene expression signature that is similar to t(7;12) pediatric AML (59), suggesting that therapeutic strategies targeting MNX1 could be jointly investigated for both pediatric and adult MNX1-rearranged AML cases. Suppression of key genes involved in hematologic malignancies, including HLX, TFEC, GFI1, EVI2B, TLE4, and MYD88, all shared with pediatric AML, suggest a transcriptional repressor activity for MNX1 in AML affecting cell proliferation and myeloid differentiation. As pediatric AML with MNX1 activation has a different activation event, does not have chr7q deletions or BCOR mutations, and is seen in infants at a different developmental state, the overlap of dysregulated key genes strongly connects the observed gene dysregulation to MNX1 activity and not to confounding factors. We also identified a subgroup of MNX1-expressing cases without genomic rearrangements near MNX1 which do not share the gene expression signature of the MNX1-rearranged cases. The expression of MNX1 in these samples remains unexplained, but we observed that they have a very high frequency of NPM1 mutations (67%), which might be linked to MNX1 expression, as NPM1 mutations have been shown to upregulate homeobox genes (58).
Taken together, our data suggest that the numerous genomic rearrangements in ckAML often lead to enhancer hijacking, a molecular event that may have been previously underestimated compared with oncofusions and CNAs. Understanding how the genes activated by this mechanism drive leukemia, or finding ways to stop this aberrant expression, could pave the way for personalized treatments targeting specific oncogenes.
Methods
Pyjacker Details
Identification of “Candidate Samples” with Breakpoints near a Gene
Only genes whose expression is greater than 1 transcript per million (TPM) in at least one sample are considered. For each gene, Pyjacker identifies “candidate samples” with a breakpoint near the gene and which may therefore overexpress this gene because of the rearrangement. Because promoter–enhancer interactions occur within TADs, Pyjacker selects samples which have a breakpoint in the same TAD as the gene. Any list of TADs can be provided, and in the present analysis, we used TADs derived from publicly available HiC data from HSPCs (Supplementary Table S17; ref. 19). To avoid missing events due to imprecise TAD boundaries, Pyjacker extends the TADs by 80 kb on each side. We note that this TAD extension did not impact the results on the ckAML cohort, as all reported events had breakpoints within the TAD of the activated gene, but it might improve the robustness in other cohorts. If a list of TADs is not provided as input, Pyjacker will instead consider all samples with breakpoints within a user-specified distance to the gene (1.5 Mb by default). All “candidate samples” for a particular gene will be scored to test whether these samples express this gene because of a structural rearrangement.
Overexpression Score
If a gene is activated by enhancer hijacking in a sample, we expect this sample to have a higher expression for this gene, compared with “reference samples” which do not have breakpoints near the gene. In order to remove the effect of amplifications and to focus on genes activated by enhancer hijacking, the expression values in TPM are corrected for copy number, if CNA data are provided: the expression values are multiplied by 2/(copy number). The expression values are then log-transformed: . Then, Pyjacker computes the mean μ and SD σ of the gene expression in reference samples (which do not have breakpoints near the gene). For each candidate sample, Pyjacker computes the number of SDs away its expression lies from the mean, in which the SD is increased in order to avoid extreme scores when all reference samples have the same expression: in which is the expression of the gene in the candidate sample. This overexpression score is then transformed so that it is positive when the expression is more than two SDs above the mean and negative otherwise and to avoid very high or very low overexpression scores which would have a disproportionate effect on the final score: if , , else .
Allele-Specific Expression Score
If a gene is activated by enhancer hijacking, we would expect only the allele on the rearranged chromosome to be expressed, resulting in monoallelic expression. For each gene and each sample, heterozygous SNPs are identified in the WGS data, and if there is coverage in the RNA-seq, the numbers of reference and alternative reads in the RNA-seq data are counted. For each SNP, Pyjacker computes the log-likelihood ratio between monoallelic and biallelic expression. For monoallelic expression, we assume a mixture of two β-binomial distributions for the allelic read counts, with means centered on 2% and 98% (to account for possible low expression from the other allele). For biallelic expression, we assume a β-binomial distribution centered on 50%. The log-likelihood ratios from all SNPs in the gene are then combined to get the allele-specific expression (ASE) score, by averaging the log-likelihood ratios but still giving a higher score if several SNPs are present: in which is the number of SNPs in the gene. This score is positive if the allelic information supports a monoallelic expression, negative if it supports a biallelic expression, and close to 0 if it is unclear. We note that if no heterozygous SNPs are present in a gene in a sample, the ASE score will be 0, but this does not preclude the gene from being identified by Pyjacker if the overexpression and enhancer scores are positive. The ASE score is set to 0 for genes with copy number lower than two or greater than four, for genes on sex chromosomes, and for imprinted genes (if a list of imprinted genes is provided as input). If allelic read counts are not provided as input, Pyjacker can still be run and will in this case not use the ASE score, which will result in a higher FDR.
Enhancer Score
A genomic rearrangement is more likely to result in enhancer hijacking if it brings a strong enhancer close to the target gene. Pyjacker can optionally take as input a list of enhancers, scored for enrichment of enhancer marks by ROSE (see section “Identification of myeloid enhancers” for the ChIP-seq data that we used in this study; refs. 66, 67). The list of enhancers provided must be derived from the same cell type as the cancer samples studied. If no enhancer data are available, the enhancer score will be set to 0.
Pyjacker identifies all enhancers which, after the rearrangement, likely come to the same TAD as the gene. This is done by considering the position and orientation of the breakpoints, but each breakpoint is considered independently, which might miss some enhancers in case of complex rearrangements with clustered breakpoints. Enhancers are ranked according to their enrichment, and Pyjacker computes the enhancer score by adding all scores but putting more weight on the strongest enhancers: in which is the number of enhancers and is the enrichment for the -th strongest enhancer.
Combined Score
The weights can be set by the user, but their default values which should work well in all cases are , , , and . is 1 if the gene is deleted in the sample and 0 otherwise.
Aggregated Gene Score across Samples
In order to give more weight to genes which are activated in multiple samples, Pyjacker aggregates the scores from all samples for each gene:
in which is the score from sample .
FDR
The gene scores reflect how likely a gene is to be activated by structural rearrangements in the cohort studied, but the values are somewhat arbitrary. In order to get a more interpretable FDR, Pyjacker computes a null distribution for these scores in the absence of enhancer hijacking. For each gene, the true “candidate samples” are excluded, and instead 1, 2, or 3 (number chosen randomly) random samples are chosen from the reference samples (without breakpoints near the gene) to be considered candidate samples and scored. This results in a list of null scores, in which only pairs of (gene, sample) without enhancer hijacking are used. The length of this list is equal to the number of genes (), so to increase the size of the list (and thus get more precise P values), this process is repeated times ( by default), in which each time different random samples are selected for each gene, resulting in a list of null scores. This null distribution is used to compute an empirical P value for each gene. Finally, the Benjamini–Hochberg correction is used to correct for multiple testing, which results in an FDR.
AML Cell Lines Used to Test Pyjacker
We tested Pyjacker using 10 AML cell lines: THP-1, LAMA-84, MONOMAC-1, MV-4-11, HEL92.1.7, EOL-1, OCI-AML3, GDM-1, MOLM-1, and MUTZ-3. Some of these cell lines had known enhancer hijacking events: MECOM in MOLM-1 and MUTZ-3 (16), MNX1 in GDM-1 (31), and MN1 in MUTZ-3 (32). WGS and RNA-seq data for THP-1, LAMA-84, MONOMAC-1, MV-4-11, HEL92.1.7, and EOL-1 were retrieved from the Cancer Cell Line Encyclopedia (68). RNA-seq and WGS data of GDM-1 were retrieved from Gene Expression Omnibus (GEO) accession GSE221753 and Sequence Read Archive (SRA) accession SRR23087016 (31). RNA-seq data of OCI-AML3 were retrieved from GEO accession GSE209777 (69). WGS data for OCI-AML3 and WGS and RNA-seq data for MOLM-1 and MUTZ-3 were performed for this study (see “Data Availability”). The sequencing data from cell lines were processed in the same way as patient samples (see below).
Medulloblastoma Dataset
AML Patient Samples
The 39 ckAML samples were derived from a prospective clinical trial (NCT02348489) conducted in older, unfit, previously untreated patients with newly diagnosed AML (70). This clinical trial was conducted according to the Declaration of Helsinki, and written consent was obtained from the patients. Patient sex, age at diagnosis, and karyotype information are provided in Supplementary Table S6, but race, ethnicity, risk category, and disease stage were not available. Data on targeted DNA sequencing of this cohort and in part of EPIC BeadChip array analysis were previously reported by Jahn and colleagues (33). For this study, we selected 39 ckAML blood or BM samples (median age: 77 years), which had at least three CNAs detectable from the EPIC array data and for which sufficient material was still available for further profiling. Detailed patient characteristics, including sex, age, and cytogenetics, are provided in Supplementary Table S6.
Generation and Processing of WGS Data
For both primary patient samples (blood or BM) and cell lines, DNA was isolated as previously described (19). The DNA was sequenced with NovaSeq 6000 S4, with read length of 2 × 150 bp and a coverage of 50× to 70× for each sample. The WGS data were aligned to the GRCh37 reference genome using Burrows-Wheeler Aligner-MEM [arXiv:1303.3997v2 (q-bio.GN)]. SVs were called with Manta (71), CNAs were called with Control-FREEC (72) and SNVs with Mutect2 (bioRxiv 10.1101/861054). Because no matched normal samples were available to identify somatic mutations, we only looked for SNVs in 52 genes known to be recurrently mutated in AML, as previously described (19). Chromothripsis was determined using ShatterSeek (73) using a criterion of at least 10 copy number switches in one chromosome. The WGS data processing, starting from the aligned BAM files, was done using a Nextflow workflow: https://github.com/CompEpigen/wf_WGS. All WGS plots were made using Figeno (74).
RNA-seq
RNA was isolated as previously described (19). The RNA was sequenced with NovaSeq 6000 S2, with read length 2 × 101 bp and 180 to 250 million reads per sample. The RNA-seq data were processed using the nf-core rnaseq workflow v3.9, with alignment using STAR (75) and quantification using Salmon (76). Fusion transcripts were detected using Arriba (28). For ASE, we detected heterozygous SNPs in WGS data using HaplotypeCaller and used GATK ASEReadCounter to get allele-specific read counts in RNA-seq data at positions in which a heterozygous SNP was found. Differential gene expression analysis was run using the deseq2 (77) package v1.42.0 with log fold change shrinkage applied by the ashr (78) algorithm v2.2-63. Batch correction was applied for the MLL cohort following the generation of vst-transformed gene expression values for single gene expression visualization. The TARGET pediatric AML RNA-seq dataset was downloaded from UCSC XENA and analyzed using the same approach as the adult AML cohort. For cases with multiple sample points, primary specimens were selected over recurrent samples. BM samples were preferentially used over blood-derived samples, yielding overall two unique cases with the t(7;12)(q36;p13) karyotype. The IDs of the samples from the TARGET-AML cohort that were used, together with their t(7;12) status, origin (blood or BM), and recurrence, are provided in Supplementary Table S18. The Balgobind and colleagues (59) pediatric AML cohort and its corresponding GEO GSE17855 Affymetrix U133 Plus 2.0 microarray dataset was analyzed using the Limma (79) package v3.58.1 using the empirical Bayes algorithm with default settings. Cases with unknown karyotype were not considered.
Validation of Breakpoints by Genomic PCR
PCR to confirm translocation t(1;3) in sample 16KM11270 was done with 10 cycles of touchdown from 59°C to 54.5°C, and 30 cycles at 54°C annealing temperature. PCR to confirm breakpoint 1 in sample 15PB8708 was done with 69°C annealing temperature for 35 cycles, and PCR for breakpoint 2 with 10 cycles of touchdown from 70°C to 65°C and 30 cycles at 65°C. Q5 High-Fidelity PCR Kit (New England Biolabs, #E0555) and, depending on the PCR reaction (primers in Supplementary Table S12), 20 to 160 ng genomic DNA were used. PCR products were analyzed on 1.2% Tris-borate, ethidium bromide–stained agarose gels, and gel images were recorded using a Bio-Rad Geldoc GO system (#12009077).
scRNA-seq of Patients with del(7q) AML
scRNA-seq was performed for eight AML samples: four MNX1+ samples [three with del(7q) and one with an alternative rearrangement)] and four control MNX1− samples with del(7q). Only the MNX1+ sample with alternative rearrangement (15PB8708) was part of the initial 39 ckAML samples, the seven others come from other AML samples. Names and provenance (BM or PB) for these eight samples are provided in Supplementary Fig. S10. Cryopreserved samples from BM and PB were thawed at 37°C for 2 minutes before transferring to a 50-mL tube. Cells were diluted by adding incremental 1:1 volumes of DMEM/F12 media (Thermo Fisher Scientific) for 5 times with 1-minute wait in between each step. Cells were centrifuged at 300 rcf for 5 minutes and resuspended in 2 mL PBS (Thermo Fischer Scientific) + 0.04% BSA (Milteny Biotec). Libraries were generated using 20,000 single cells as input to the Chromium Controller using Chromium Next GEM Single-Cell 3′ Kit v3.1 (10× Genomics). From the single-cell sequencing libraries, we generated between 632 and 803 M (between 60,000 and 80,000 reads per cell) reads per sample using an Illumina NovaSeq 6000 S4 FlowCell. For processing (alignment to reference genome GRCh38, generation of count matrix) raw sequencing reads, Cell Ranger v7.1.0 was used. Subsequent analyses, including normalization (log-normalize), generation of a low-dimensional representation, and cluster annotation, were conducted using the Seurat v5 software package (80). Batch integration was performed with Canonical Correlation Analysis using Seurat’s IntegrateData function (81). For facilitating cluster annotation, we projected our data to the Triana and colleagues (61) reference atlas using scMap (82). We used Numbat (83) for inferring copy number losses and gains from the single-cell transcriptomic data. A cell was annotated as having del(7q) if the probability of the deletion as returned by Numbat was larger than 0.5.
Identification of Myeloid Enhancers
We used public ChIP-seq data for H3K27ac and P300 from three myeloid cell lines: K562 [data from the ENCODE project (84), accessions ENCSR000AKP and ENCSR000EGE], MOLM-1 [data from ArrayExpress accession E-MTAB-2224 (16)], and Kasumi-1 (data from GEO accession GSE167163; bioRxiv 10.1101/2022.09.14.507850). We used ROSE (66, 67) to score and rank super-enhancers, in which transcription start sites were excluded. ROSE normally takes as input a single ChIP-seq experiment, but we found that the ranking was very variable depending on the dataset being used, so we used the six ChIP-seq datasets mentioned above and averaged the ROSE scores. The average ROSE scores were used as input to Pyjacker in order to compute the enhancer score.
MNX1 Expression Screen
For public cohorts profiled with RNA-seq, we considered a sample to be MNX1+ if its expression of MNX1 was higher than 5 TPM, as it was 0 in most samples. Because most MNX1+ samples had expression values for MNX1 greater than 100 TPM, we chose this threshold of 5 TPM to avoid noise from samples with very low MNX1 expression.
For qRT-PCR, cDNA was generated from blood or BM AML samples with random hexamers and Superscript III reverse transcriptase (Invitrogen, #56575). Analysis (primers in Supplementary Table S12) was done with a primaQUANT CYBR mix (Steinbrenner Laborsyteme GmbH, #SL-9902) on a Roche LightCycler 480. Relative expression was determined with the 2−Δ(Ct) method using Ct values of GAPDH or porphobilinogen deaminase (PBGD) for normalization. For each cohort, we computed the mean and SD for these values and considered samples to be MNX1+ if their values were higher than the mean plus three times the SD.
PDX Model
PB samples from a patient with AML at first and second relapses were obtained from the Department of Internal Medicine III, Ludwig-Maximilians-Universität, Munich, Germany. Specimens were collected for diagnostic purposes. Written informed consent was obtained from the patient under the AMLCG Registry study (DRKS00020816). The study was performed in accordance with the ethical standards of the responsible committee on human experimentation (written approval by the Research Ethics Boards of the medical faculty of Ludwig-Maximilians-Universität, Munich, numbers 068-08 and 222-10) and with the Helsinki Declaration of 1975, as revised in 2013.
The PDX models AML-491 and AML-661 were established from primary patient cells at first and second relapses. The PDX cells harbored a del(7)(q21.13q36.3) and several AML-related mutations (Supplementary Table S19). Positive MNX1 expression was determined via RNA-seq, and PDX cells were genetically modified as previously outlined in Zeller and colleagues (85). TAM-inducible shRNA constructs were generated as described in Carlet and colleagues (62) for two individual MNX1 shRNAs (76 & 82) and Renilla control shRNAs. CreERT2 and the shRNA cassettes were stably integrated into the AML-661 PDX model via lentiviral transduction using third-generation packaging plasmids [pMDLg/pRRE (Addgene#12251), pRSV-Rev (Addgene#12253), and the VSV-G envelope–expressing plasmid pMD2.G (Addgene#12259)] with the addition of polybrene (Sigma Aldrich, order no. H9268). CreERT2/shMNX1-76, CreERT2/shMNX1-82, CreERT2/shRenilla-1, and CreERT2/shRenilla-2 transgenic cells were enriched using a BD FACSAria III Cell Sorter (BD Biosciences) and serially transplanted into donor mice for amplification.
Animal trials were performed in accordance with the current ethical standards of the official committee on animal experimentation (written approval by Regierung von Oberbayern, [email protected]; ROB-55.2Vet-2532.Vet_02-16-7 and ROB-55.2-2532.Vet_02-20-159). In general, PDX cells were amplified in 10- to 26-week-old male or female NOD.Cg-PrkdcscidIl2rgtm1Wjl/SzJ mice (The Jackson Laboratory). Mice were kept in animal rooms of the Laboratory Animal Breeding and Husbandry Unit of Helmholtz Zentrum München under specified pathogen-free conditions with a 12/12-hour light cycle. The animal rooms of the barriers were fully air-conditioned with a temperature of 20°C to 24°C and 45% to 65% humidity according to Annex A of the European Convention 2007/526 EC. The maximum stocking density of the cages corresponds to Annex III of the 2010/63 European Union. The cages were constantly filled with structural enrichment, and the animals had unlimited access to food and water. During the experiment, mice were kept in individually ventilated cages (IVC). The cages were only opened one at a time at a cage-changing station, and the experimenter’s gloves were disinfected with disinfectant each time before a mouse was removed from the cage. Hygiene monitoring was carried out at least quarterly in accordance with the current Federation of European Laboratory Animal Science Associations (FELASA) recommendation: In the animal housing areas equipped with IVC systems, exhaust dust from the IVC ventilation units was tested for all FELASA-listed pathogens by PCR.
4C
About 2 million cells per sample were used for 4C according to van de Werken and colleagues (86). Two rounds of restriction digestion/T4 DNA ligation were applied, using BglII in combination with NlaIII. In a first PCR step, second ligation products, inverse primers (Supplementary Table S20), and Q5 high-fidelity enzyme (New England Biolabs, #M0491) were used with reaction conditions at 98°C for 30 seconds; 10 cycles with 98°C for 15 seconds; 63°C, 57°C, or 54°C, depending on the viewpoint, for 20 seconds with 0.5°C touchdown per cycle; 72°C for 2 minutes; 30 or 25 cycles with 98°C for 15 seconds; 58°C, 52°C, or 49°C, depending on the viewpoint, for 20 seconds; and 72°C for 2 minutes, finally followed by 72°C for 1 minute. Purification of PCR products, generation of sequencing libraries, and sequencing were done as described previously (31). PCR products were purified with HighPrep beads (Biozym, cat. no. 220002), and their concentrations were determined by Qubit dsDNA HS Assay (Thermo Fisher Scientific, cat. no. Q32854). The sequencing libraries were generated with about 5 ng purified PCR products by RT-PCR to monitor amplification progress using a LightCycler 480 (Roche) and 25 μL reaction volumes using KAPA2G Robust HotStart ReadyMix (Merck, cat. no. KK5702) at 95°C for 3 minutes (initial melting) and 95°C for 20 seconds, 62°C for 15 seconds, and 72°C for 40 seconds (cycling). Each 0.75 μL of primers (stock concentration 10 μmol/L) Tn5mCP1n (AATGATACGGCGACCACCGAGATCTACACTCGTCGGCAGCGTC) and Tn5mCBar (CAAGCAGAAGACGGCATACGAGAT[barcode]GTCTCGTGGGCTCGG) were used. Sequencing libraries resulting from PCR products were bead-purified, DNA concentration was determined by Qubit dsDNA HS Assay, and product sizes were determined by TapeStation 4150 analysis with D1000 High Sensitivity Assay (Agilent, cat. no. 5067- 5585). Sequencing libraries were pooled in equimolar ratios and analyzed on a NextSeq 550 machine (Illumina), midoutput, 75 PE mode.
ACT-seq
Genome-wide targeting of histone modifications was done by ACT-seq according to Carter and colleagues (87) with some modifications using a self-prepared pA-Tn5ase protein (31) and using the antibodies listed in Supplementary Table S21. To generate a pA-Tn5 transposome (pA-Tn5ome), pA-Tn5ase and Tn5ME-A+B load adaptor were mixed such that both components had a concentration of 3.3 μmol/L in complex formation buffer (CB). pA-Tn5ome–antibody (pA-Tn5ome–ab) complexes were generated by mixing 1 μL pA-Tn5ome with 0.8 μL CB and 0.8 μL antibody solution. Per tagmentation and pA-Tn5ome–ab complex binding, 50,000 cells were used. For normalization of sequence reads between biological replicates, about 4,000 permeabilized nuclei of yeast Saccharomyces cerevisiae, prepared according to (88) and incubated with pA-Tn5ome–ab complex targeting yeast H2B, were spiked into each pA-Tn5ome–ab complex/cell mix. Tagmentation at 37°C for 30 minutes was started by addition of 10 μmol/L MgCl2 (final concentration) followed by a 30-minute proteinase K (20 μg; Qiagen, #19133) treatment at 55°C. DNA was purified using MinElute Kit (Qiagen, #28004) and eluted with 20 μL elution buffer. Sequencing libraries were generated under real-time conditions to monitor amplification progress using a LightCycler 480 in 50 μL reaction mixes consisting of 20 μL tagmented DNA eluate, 25 μL NEBNext High-Fidelity 2× Mix (New England Biolabs, cat. no. M0541), 0.5 μL 100xSYBRGreen, and each 2.5 μL primer Tn5McP1n and Tn5mCBar (stock concentration 10 μmol/L, see above). Reaction conditions were 72°C for 5 minutes; 98°C, 30 seconds; cycling with 98°C, 10 seconds; 63°C, 10 seconds; and 72°C, 10 seconds. PCR products were purified with HighPrep beads. DNA concentration and fragment size were determined as described above. Six to eight differently barcoded libraries were multiplexed and sequenced as described above on a NextSeq 550 system.
ATAC-seq
ATAC-seq was done essentially as described by Corces and colleagues (89) using about 50,000 cells and the Nextera DNA Library Prep Kit (Illumina, #15028212). In brief, cells were lysed in ATAC-RSB buffer containing 0.5 μL NP40 10%, 0.5 μL Tween-20 10%, and 0.5 μL digitonin 1% followed by short incubation in ATAC-RSB containing 0.1% Tween-20. Tagmentation was done in a 50 μL mix at 37°C for 30 minutes in a thermomixer (Eppendorf, comfort 5355) with 1,000 rpm. Reactions were stopped by addition of 20 μL 5 mol/L guanidinium thiocyanate, and DNA was purified with 140 μL HighPrep beads. Libraries were generated under real-time conditions and processed as described for ACT-seq, but cycling conditions were 98°C for 10 seconds; 63°C, 30 seconds; and 72°C, 30 seconds.
4C Sequencing, ACT-seq, and ATAC-seq Data Analysis
4C sequencing data processing and analysis were done with the pipe4C pipeline (90) using single reads starting with a BglII site containing viewpoint primer; the pipe4C pipeline was applied with default parameters under R v3.6.2. ACT-seq and ATAC-seq data were analyzed as described previously (31). Upstream processing of ATAC-seq and ACT-seq data was performed using TrimGalore v0.4.4 (RRID: SCR_011847) together with Cutadapt v1.14 (RRID: SCR_011841) applying the nondefault parameters “—paired,” “—nextera,” “—length_1 35,” and “—length_2 35” to perform adapter and quality trimming. Bowtie2 v2.2.6 (RRID: SCR_016368) was used with the “—very sensitive” flag and a maximum insertion length of 2,500 bp to map trimmed reads against the GRCh37/hg19 reference genome. Aligned reads belonging to the same lane-multiplexed library were combined using SAMtools merge v1.5 (RRID: SCR_006525). PCR duplicates were removed by means of Picard MarkDuplicates v2.17.4 for ATAC-seq but not ACT-seq data. Discordant mappings and alignments with a Phred score below 20 were removed using SAMtools view. Trimmed ACT-seq reads were additionally aligned against the Saccharomyces cerevisiae R64 reference genome and post-aligned as described above. To derive a library-specific scaling factor, the multiplicative inverse of the number of filtered alignments against the yeast genome was calculated. This normalization leads to signal ranges in BigWig files and Integrative Genomics Viewer browser tracks close to zero. Coverage tracks were generated using the bamCoverage functionality of deepTools v3.1.1 (RRID: SCR_016366) with the nondefault parameters “—ignoreForNormalization chrM chrY chrX” and “—effectiveGenomeSize 2652783500” as well as the “—scaleRatio” option to specify the spike-in–derived scaling factor. ATAC-seq accessibility signals were smoothed by centering a 73 bp window on the transposition event’s midpoint of each read using a custom script; the resulting tag coordinates were used for all downstream analyses. The analysis procedures were implemented as fully containerized workflows using the Common Workflow Language v1.0. BigWig tracks were visualized using Figeno (74).
CRISPR/Cas9-Mediated Enhancer Insertion
A 1 kb region (chr7:92384001–92385000, GRCh37/hg19) containing a putative enhancer was inserted upstream of the MNX1 promoter (chr7:156816239, GRCh37/hg19) in ChiPSC22 (Takara Bio Europe) by CRISPR/Cas9 editing as previously described (91). In short, ChiPSC22 cells were nucleofected with the Cas9 ribonucleoprotein complex and a homology-directed repair (HDR) donor template containing the putative enhancer sequence and 200 bp homology arms on each site. The CRISPR RNA was designed using the Alt-R Custom Cas9 crRNA Design Tool (Integrated DNA Technologies), and the HDR donor template was ordered as double-strand DNA HDR Donor Block (Integrated DNA Technologies). Per 20 μL transfection, 500 ng of the HDR Donor Block were used. Clones with successful integration of the enhancer on one allele were selected by PCR using the following primers: AAAAGGACATGGGGATGCGT and GAAGCTGATCTTCCCTGAGGTT. Two cell lines were validated using WGS. Cell lines were differentiated to HSPCs as previously described (91). RNA was isolated from HSPCs using RNeasy Plus Mini Kit (Qiagen) and sequenced as described above.
Competitive MNX1 Knockdown In Vivo Assays
Constitutive Knockdown
Transgenic AML PDX cells were isolated from BM of donor mice and cultured in StemPro-34 medium (Thermo Fisher Scientific) with penicillin/streptomycin, L-glutamine (both Thermo Fisher Scientific), 10 ng/mL hrFLT3L (R&D Systems), 10 ng/mL hrSCF, 10 ng/mL hrTPO, and 10 ng/mL hrIL3 (all Peprotech; ref. 92) at a density of 106 cells/mL at 37°C, 5% CO2. For ex vivo flipping of the shRNA cassettes, the cells were treated using 200 nmol/L (Z)-4-hydroxytamoxifen (Sigma Aldrich, #H7904). This induces flipping of the shRNA cassette, which leads to the expression of the respective shRNA and a switch of the expressed fluorochrome from mTagBFP to eGFP and from iRFP720 to T-Sapphire, respectively. Cells harboring the flipped cassette were enriched via FACS. MNX1 shRNA– and Renilla control shRNA–expressing cells were mixed in a 1:1 ratio and injected into three mice per MNX1 shRNA via tail vein injection (1 × 106 cells per population, 2 × 106 per mouse). The individual input mixes were measured by flow cytometry for each animal before injection as an input sample (Supplementary Fig. S12A–S12E). Outgrowth of tumor cells was monitored by repeated blood samplings and staining for hCD33+ cells (BD Pharmingen PE Mouse Anti-Human CD33, clone WM53, cat. no. 555450; RRID: AB_395843). At an advanced stage of leukemia (hCD33+ cells >60%), mice were sacrificed and PDX cells were isolated from the BM, spleen, and blood.
Inducible Knockdown
In vivo induction of the MNX1 shRNA expression was performed according to Carlet and colleagues (62). Transgenic AML PDX cells were isolated from BM of donor mice. CreERT2/shMNX1 and CreERT2/shRenilla transgenic cells were mixed in a 1:1 ratio and injected into mice via tail vein injection (N = 13; 1 × 106 cells per population and mouse). Fifty mg/kg TAM (Sigma Aldrich, #T5648) was administered once 14 days after transplantation via oral gavage as previously described. Mice were sacrificed on the day of TAM administration without receiving TAM, 3 days after TAM administration, and at an advanced stage of leukemia (hCD33+ cells >60%).
Statistical Analyses
The FDR for Pyjacker was computed by converting the scores into empirical P values and correcting for multiple testing, as described above. For the analysis of the MNX1 in vivo knockdown, we compare the ratio of the two flipped cell populations by performing two-tailed unpaired t tests using Prism 10 (GraphPad Prism).
Data Availability
WGS and RNA-seq data of patient samples are available at the European Genome-phenome Archive under the accession EGAS50000000743. All preprocessed data used as input to Pyjacker for the ckAML cohort are provided in the GitHub repository at https://github.com/CompEpigen/pyjacker/tree/main/data. WGS data of the cell line OCI-AML3 and WGS and RNA-seq data of the cell lines MOLM-1 and MUTZ-3 were uploaded to the SRA under project PRJNA1140384. The source code for Pyjacker is available at https://github.com/CompEpigen/pyjacker. This manuscript describes pyjacker version 1.1.2, which is archived at Zenodo https://doi.org/10.5281/zenodo.14516090. A Code Ocean capsule reproducing Pyjacker’s results on the ckAML dataset is also available at https://codeocean.com/capsule/6670298/tree/v2. The Nextflow workflow used to prepare Pyjacker’s inputs, starting from BAM files, is available at https://github.com/CompEpigen/wf_WGS.
Authors’ Disclosures
E. Jahn reports personal fees and other support from AstraZeneca outside the submitted work. F. Brown-Burke reports grants from DKFZ during the conduct of the study. M. Schönung reports other support from Joachim Herz Foundation during the conduct of the study. S.S. Mughal reports grants from DFG (FOR2674) outside the submitted work. B. Brors reports grants from DFG (German Research Council) during the conduct of the study and grants from Hector Foundation II, German Federal Ministry of Research and Education (BMBF): DEEP-HCC/Li-Sym Cancer, and Fritz Thyssen Foundation outside the submitted work. A.-K. Eisfeld reports grants from the NCI and American Cancer Society during the conduct of the study, as well as other support from Karyopharm, personal fees from VJ HemeOnc and Dava Oncology, and Syndax outside the submitted work. L. Bullinger reports grants from Bayer and Jazz Pharmaceuticals and personal fees from Abbvie, Amgen, Astellas, Bristol Myers Squibb, Daiichi Sankyo, Gilead, Glaxo Smith Kline, Janssen, Jazz Pharmaceuticals, Novartis, Otsuka, Pfizer, Roche, Sanofi, and Servier outside the submitted work. K. Döhner reports grants from Astex Pharmaceuticals during the conduct of the study. H. Döhner reports grants from Astex during the conduct of the study. D.B. Lipka reports grants from the German Research Foundation (DFG) during the conduct of the study and personal fees from Infectopharm GmbH outside the submitted work. No disclosures were reported by the other authors.
Authors’ Contributions
E. Sollier: Data curation, software, formal analysis, investigation, visualization, methodology, writing–original draft, writing–review and editing. A. Riedel: Investigation, visualization, methodology, writing–original draft, writing–review and editing. U.H. Toprak: Formal analysis, investigation, writing–review and editing. J.A. Wierzbinska: Investigation, writing–review and editing. D. Weichenhan: Investigation, writing–review and editing. J.P. Schmid: Investigation, visualization, writing–review and editing. M. Hakobyan: Investigation, writing–review and editing. A. Touzart: Investigation, writing–review and editing. E. Jahn: Resources, data curation, writing–review and editing. B. Vick: Resources, writing–review and editing. F. Brown-Burke: Investigation, writing–review and editing. K. Kelly: Investigation, writing–review and editing. S. Kelekci: Investigation, writing–review and editing. A. Pejkovska: Investigation, writing–review and editing. A. Goyal: Investigation, writing–review and editing. M. Bähr: Investigation, writing–review and editing. K. Breuer: Software, writing–review and editing. M.-J.M. Chen: Investigation, writing–review and editing. M. Llamazares-Prada: Visualization, writing–review and editing. M. Hartmann: Investigation, writing–review and editing. M. Schönung: Investigation, writing–review and editing. N. Correia: Investigation, writing–review and editing. A. Trumpp: Supervision, writing–review and editing. Y. Abdullah: Investigation, writing–review and editing. U. Klingmüller: Supervision, writing–review and editing. S.S. Mughal: Investigation, writing–review and editing. B. Brors: Supervision, writing–review and editing. F. Westermann: Supervision, writing–review and editing. E. Ulrich: Formal analysis, writing–review and editing. R.J. Autry: Formal analysis, supervision, writing–review and editing. M. Schlesner: Supervision, writing–review and editing. S. Vosberg: Resources, writing–review and editing. T. Herold: Resources, writing–review and editing. P.A. Greif: Resources, writing–review and editing. D. Pfeifer: Resources, writing–review and editing. M. Lübbert: Resources, writing–review and editing. T. Fischer: Resources, writing–review and editing. F.H. Heidel: Resources, writing–review and editing. C. Gebhard: Resources, writing–review and editing. W. Walter: Resources, data curation, writing–review and editing. T. Haferlach: Supervision, writing–review and editing. A.-K. Eisfeld: Resources, writing–review and editing. K. Mrózek: Resources, writing–review and editing. D. Nicolet: Resources, writing–review and editing. L. Bullinger: Resources, writing–review and editing. L. Smeenk: Resources, writing–review and editing. C. Erpelinck-Verschueren: Resources, writing–review and editing. R. Mulet-Lazaro: Resources, writing–review and editing. R. Delwel: Supervision, writing–review and editing. A. Ernst: Supervision, writing–review and editing. M. Scherer: Supervision, methodology, writing–review and editing. P. Lutsik: Supervision, methodology, writing–review and editing. I. Jeremias: Supervision, writing–review and editing. K. Döhner: Resources, writing–review and editing. H. Döhner: Resources, writing–review and editing. D.B. Lipka: Conceptualization, supervision, writing–review and editing. C. Plass: Conceptualization, supervision, funding acquisition, writing–original draft, project administration, writing–review and editing.
Acknowledgments
We thank the Genomics and Proteomics Core Facility, the Omics IT and Data Management Core Facility, and the Single-Cell Open Lab of the DKFZ Heidelberg. We thank Ilaria Iacobucci and Charles Mullighan (St. Jude Children’s Research Hospital) for sharing their AEL RNA-seq dataset. We thank June Takeda and Seishi Ogawa (Kyoto University) for sharing their AEL RNA-seq dataset. This work was supported in part by the German Funding Agency (DFG) through SPP1463 (to D.B. Lipka and C. Plass), FOR 2674 (to H. Döhner, K. Döhner, L. Bullinger, and C. Plass) and SFB 1074 (to H. Döhner, K. Döhner, L. Bullinger, and C. Plass), the Carreras Foundation (C. Plass), the Helmholtz Foundation, the Heidelberg Center for Personalized Oncology, and the NCT Personalized Oncology Program (project #HIPO-030 to D.B. Lipka and C. Plass). F.H. Heidel was supported by the Thuringian state program ProExzellenz (RegenerAging - FSU-I-03/14) and through project grants of the German Research Council (DFG) HE6233/4-2 project number 320028127, HE6233/9-1 project number 453491106, and HE6233/10-1 project number 505859092. M. Scherer is supported through a postdoctoral fellowship by the Dr. Rurainski Foundation. M. Schönung is supported by the Joachim Herz Foundation (Add-on Fellowships for Interdisciplinary Life Science). A-K. Eisfeld was supported by R01CA262496, R01CA284595-01, R01CA283574-01, Leukemia & Lymphoma Society, and the American Cancer Society.
Note: Supplementary data for this article are available at Blood Cancer Discovery Online (https://bloodcancerdiscov.aacrjournals.org/).