Lineage-ambiguous leukemias are high-risk malignancies of poorly understood genetic basis. Here, we describe a distinct subgroup of acute leukemia with expression of myeloid, T lymphoid, and stem cell markers driven by aberrant allele-specific deregulation of BCL11B, a master transcription factor responsible for thymic T-lineage commitment and specification. Mechanistically, this deregulation was driven by chromosomal rearrangements that juxtapose BCL11B to superenhancers active in hematopoietic progenitors, or focal amplifications that generate a superenhancer from a noncoding element distal to BCL11B. Chromatin conformation analyses demonstrated long-range interactions of rearranged enhancers with the expressed BCL11B allele and association of BCL11B with activated hematopoietic progenitor cell cis-regulatory elements, suggesting BCL11B is aberrantly co-opted into a gene regulatory network that drives transformation by maintaining a progenitor state. These data support a role for ectopic BCL11B expression in primitive hematopoietic cells mediated by enhancer hijacking as an oncogenic driver of human lineage-ambiguous leukemia.
Lineage-ambiguous leukemias pose significant diagnostic and therapeutic challenges due to a poorly understood molecular and cellular basis. We identify oncogenic deregulation of BCL11B driven by diverse structural alterations, including de novo superenhancer generation, as the driving feature of a subset of lineage-ambiguous leukemias that transcend current diagnostic boundaries.
This article is highlighted in the In This Issue feature, p. 2659
Acute leukemias of ambiguous lineage (ALAL) remain a formidable diagnostic and therapeutic challenge (1, 2). Such leukemias either show limited lineage differentiation or exhibit immunophenotypic features of multiple lineages, most commonly myeloid and either T- or B-lymphoid lineage, and are often termed mixed phenotype acute leukemia (MPAL). Early T-cell precursor acute lymphoblastic leukemia (ETP-ALL), although often considered a subtype of T-ALL due to expression of cytoplasmic CD3, also exhibits lineage ambiguity with lack of expression of specific conventional T-ALL markers (e.g., CD1a, CD8, and CD5), along with aberrant expression of myeloid or stem cell markers (3, 4). Thus, lineage-ambiguous leukemias are classified by immunophenotypic features rather than by genomic or biological features, which leaves the choice of appropriate therapy unclear, and treatment commonly fails. An improved understanding of the biological basis of these leukemias is thus desirable.
Several observations suggested that the ambiguous immunophenotype of such leukemias results from the interaction of leukemia-initiating genetic alterations and the hematopoietic cell in which these alterations arise (4, 5). Genomic analyses of ETP-ALL showed that such leukemias commonly harbor mutations in genes regulating myeloid maturation, kinase signaling, and chromatin modification, which are often observed in myeloid leukemias; furthermore, the transcriptional profile of ETP-ALL is similar to that of normal or leukemic myeloid stem cells, which suggests a hematopoietic progenitor as the cell of origin (4, 6). Similar analyses of MPAL showed distinct patterns of hematopoietic transcription factor alterations, such as ETV6, WT1, and RUNX1 in cases with T-lineage and myeloid features (T/myeloid MPAL) and rearrangement of ZNF384 in B/myeloid MPAL (5). ZNF384 rearrangement is also observed in B-ALL cases that lack evidence of myeloid differentiation yet have a genomic and transcriptomic profile otherwise indistinguishable from that of B/myeloid MPAL (7, 8), and such cases may shift phenotype between MPAL and typical ALL or acute myeloid leukemia (AML) during disease progression, suggesting a common etiology (5). Importantly, genotyping and xenotransplantation analyses of purified hematopoietic progenitors from MPAL samples demonstrated that lineage plasticity is independent of genetic variegation and inherent to all leukemic cells in a given sample (5). Thus, it has been proposed that subsets of ALAL, such as T/myeloid MPAL and ETP-ALL, should be considered a distinct entity (9), but the oncogenic drivers and biological relationship of the various types of lineage-ambiguous acute leukemia to typical myeloid and lymphoid leukemias remain poorly understood. Moreover, genomic sequencing efforts of leukemia samples may fail to capture alterations that deregulate oncogene expression, such as those driven by noncoding alterations (10–12).
Here, using integrated genomic analyses of a large acute leukemia cohort, we identify diverse, predominantly noncoding structural variants driving enhancer hijacking events that deregulate BCL11B in hematopoietic progenitor cells as the driver oncogenic events of a distinct subtype of lineage-ambiguous leukemia.
To define gene expression–based subtypes of acute leukemia and the driver genomic alterations, we performed gene-expression profiling and genetic alteration analysis on transcriptome sequencing (RNA-seq) data from 2,574 adult and childhood cases of acute leukemia, including 775 T-lineage ALL, 262 AML, 126 MPAL, and 1,411 B-ALL cases representative of established subtypes (ref. 7; Supplementary Fig. S1A and S1B; Supplementary Tables S1 and S2). As observed in prior analyses restricted to B-ALL (7), subtypes defined by leukemia-initiating somatic chromosomal alterations or sequence mutations exhibited distinct gene-expression profiles. In addition, the subset of B/myeloid MPAL that harbored ZNF384 rearrangements or recurrent alterations observed in typical B-ALL (e.g., high hyperdiploidy, BCR–ABL1, low hypodiploidy, or PAX5 P80R/R38C) clustered with respective B-ALL molecular subtypes, indicating that these cases are canonical B-ALL subtypes rather than MPAL cases of distinct genetic basis.
To further explore the genomic basis of lineage-ambiguous leukemia, we repeated these analyses after exclusion of canonical B-ALL subtypes to enable better resolution of the transcriptional relationships among samples of T and/or myeloid lineage (Fig. 1A; Supplementary Fig. S1C; Supplementary Table S2). T-ALL, AML, and T/myeloid MPAL cases predominantly clustered according to shared genomic alterations that allowed discrimination of known T-ALL and AML subtypes, including clear delineation of recently described subtypes such as T-ALL with rearrangement of SPI1 (13), and identification of a distinct cluster of T-ALL cases with recurrent rearrangements of LMO2, primarily to STAG2, a member of the cohesin complex, as well as SIK3 and FOXJ3. Notably, we observed poor clustering according to immunophenotype in lineage-ambiguous cases (T/myeloid MPAL, ETP or near-ETP ALL; Fig. 1A).
This analysis also identified a distinct cluster of 61 cases that included T/myeloid MPAL (N = 23; 37.7%), ETP-ALL (N = 21; 34.4%), AML (N = 10; 16.4%), acute undifferentiated leukemia (AUL; N = 2; 3.3%), and five cases described as T-ALL but lacking data to determine ETP-ALL/MPAL status (Fig. 1A; Supplementary Fig. S1C; Supplementary Table S3). The immunophenotype of these cases was cytoplasmic CD3+, CD2+, CD34+, CD117+, negative for CD1a, surface CD3, CD5, and CD8, and variable for myeloperoxidase (MPO), a critical marker distinguishing ETP-ALL from T/myeloid MPAL. T-cell receptor gene rearrangements were absent in all cases (Supplementary Table S4; Supplementary Fig. S2), suggesting a hematopoietic progenitor cell of origin.
BCL11B Rearrangement in Lineage-Ambiguous Leukemia
Initial analysis of RNA-seq data of cases in this cluster identified six (9.8%) cases with chimeric fusion genes involving the T-cell transcription factor gene BCL11B (RUNX1–BCL11B and ZEB2–BCL11B). However, all cases with matched transcriptomic and whole-genome sequencing (WGS) data (N = 53) exhibited monoallelic expression of BCL11B, suggesting BCL11B deregulation as the unifying driver alteration (Fig. 1B; Supplementary Fig. S3A–S3C; Supplementary Table S5). We did not observe monoallelic BCL11B expression in canonical T-ALL (N = 59 cases examined), with the exception of cases with TLX1/3 deregulation, to which BCL11B is rearranged (Fig. 1B; Supplementary Table S5; refs. 14, 15).
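The logic of calling monoallelic expression from matched WGS and RNA-seq data can be sketched as an exact binomial test on RNA read counts at a germline heterozygous SNP. The sketch below is illustrative only: the read counts, the 0.5 null allele fraction, and the function name are assumptions and do not represent the pipeline or thresholds used in this study.

```python
from math import comb

def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial P value for allelic imbalance at one heterozygous SNP."""
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))

# Hypothetical RNA read counts (reference allele, alternate allele).
balanced = binomial_two_sided_p(26, 24)  # consistent with biallelic expression
skewed = binomial_two_sided_p(49, 1)     # consistent with monoallelic expression
print(f"balanced P = {balanced:.3f}, skewed P = {skewed:.2e}")
```

In practice, evidence would be combined across all informative SNPs in the gene, and a single site with low coverage would not be sufficient on its own.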
Analysis of co-occurring genomic alterations showed that 49 of 61 (80%) BCL11B group cases harbored activating internal tandem duplication (ITD) or D835Y mutations in FLT3, greatly exceeding the frequency observed in other leukemia subtypes (Fig. 1C; Supplementary Fig. S4A–S4C; Supplementary Tables S2 and S3). Thus, of those T/myeloid MPAL samples with FLT3 alterations, 19 of 20 (95%) belonged to the BCL11B group, in contrast to only 4 of 30 (13.3%) which lacked FLT3 mutations (Supplementary Fig. S4D). Irrespective of the presence of FLT3 mutations, all BCL11B group samples exhibited elevated FLT3 expression levels as compared with non-BCL11B group T/myeloid MPAL, ETP-ALL, AML, and T-ALL cases (Fig. 1C; Supplementary Figs. S5 and S6). BCL11B expression was similarly high in this group as compared with non-BCL11B group MPAL, ETP-ALL, and AML (Supplementary Fig. S5). Mutations in WT1 (19 of 61, 31%) and alterations of RUNX1 (16 of 61, 26%) were also common (Fig. 1C; Supplementary Tables S6–S9; Supplementary Fig. S6).
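The association between FLT3 alterations and BCL11B group membership among T/myeloid MPAL cases can be illustrated with a two-sided Fisher's exact test on the counts quoted above (19 of 20 versus 4 of 30). This is a minimal sketch using only the Python standard library; the choice of test and the function name are assumptions, as the text does not state which test, if any, was applied to these counts.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)))

# Rows: T/myeloid MPAL with / without FLT3 alterations.
# Columns: in the BCL11B group / not in the BCL11B group.
in_group_mut, out_group_mut = 19, 20 - 19
in_group_wt, out_group_wt = 4, 30 - 4
p_value = fisher_exact_two_sided(in_group_mut, out_group_mut, in_group_wt, out_group_wt)
print(f"{in_group_mut / 20:.1%} vs {in_group_wt / 30:.1%}, P = {p_value:.2e}")
```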
To determine the genomic basis for BCL11B deregulation, we analyzed WGS data that identified structural variations (SV) involving BCL11B in all cases with available data (N = 53; Fig. 2A; Supplementary Table S3). All SV breakpoints, except for one corresponding to a ZEB2–BCL11B fusion, occurred either upstream or downstream of the BCL11B gene body and involved eight distinct partner loci. The most common was rearrangement of BCL11B to the gene desert upstream of ARID1B on chromosome 6 (N = 23; 37.7%), whereas eight (13.1%) cases harbored rearrangement near the “blood enhancer cluster” (BENC; ref. 16) located distal to MYC within the CCDC26 gene on chromosome 8. Other rearrangements mapped to CDK6 on chromosome 7 (N = 3; 4.9%), ETV6 on chromosome 12 (N = 1; 1.6%), and SATB1 on chromosome 3 (N = 1; 1.6%). In addition, 13 (21.3%) cases harbored focal amplification of a 2.5 kb noncoding region 730 kb downstream of BCL11B on chromosome 14. These BCL11B SVs were otherwise not identified in WGS analysis of 5,550 pediatric and adult hematologic malignancies, 344 pediatric brain tumors, and 797 pediatric solid tumors (refs. 17–19; Supplementary Table S10).
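As a quick arithmetic check, the percentages quoted for each alteration are consistent with a denominator of all 61 BCL11B group cases rather than the 53 cases with WGS data. The sketch below simply tallies the counts given in the text; the dictionary labels are shorthand introduced here for illustration.

```python
GROUP_SIZE = 61  # all BCL11B group cases; the quoted percentages match this denominator

# Case counts per BCL11B alteration, as listed in the text.
sv_counts = {
    "ARID1B (chr6)": 23,
    "CCDC26/BENC (chr8)": 8,
    "CDK6 (chr7)": 3,
    "ETV6 (chr12)": 1,
    "SATB1 (chr3)": 1,
    "focal amplification (chr14)": 13,
}

percentages = {name: round(100 * n / GROUP_SIZE, 1) for name, n in sv_counts.items()}
for name, pct in percentages.items():
    print(f"{name}: {sv_counts[name]} cases ({pct}%)")

# Alterations not itemized with counts in this paragraph account for the remainder.
listed_total = sum(sv_counts.values())
print(f"itemized here: {listed_total} of 53 cases with WGS data")
```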
We verified the presence of this entity in an independent cohort of 91 adults with T-lineage leukemia (70 with RNA-seq), which included 36 ETP-ALL or T/myeloid MPAL cases (Supplementary Table S11); 14/70 (20%) cases were assigned to the BCL11B group, which comprised 13 of the 36 (36%) ETP-ALL and T/myeloid MPAL samples (Supplementary Fig. S7). Survival of adult patients with BCL11B-rearranged leukemia in this cohort was favorable compared with ETP and non-ETP T-ALL, with median relapse-free survival of 9.78 years and overall survival of 9.9 years (Supplementary Table S12; Supplementary Fig. S8A and S8B).
BCL11B encodes a C2H2 zinc finger transcription factor of central importance in T-cell development (20–23), and its expression coincides with the onset of T-lineage commitment in both mouse and human (21, 24, 25). Continued BCL11B expression is required to promote T-cell differentiation and repress alternate lineage fate choices, including myeloid and natural killer cell fates that are retained in thymic precursors lacking BCL11B (21, 22). One mechanism by which BCL11B directs lineage fate is through recruitment of corepressor complexes to silence genes involved in alternate fate lineages (26), for example, by closing chromatin binding sites for the myeloid transcription factor PU.1 (27). Consistent with its role in T-cell development, BCL11B is commonly targeted by diverse somatic alterations in T-ALL, including deletions and sequence mutations enriched in the DNA-binding zinc finger domains that are postulated to result in loss of function (28–30), as well as rearrangements, particularly a cryptic t(5;14)(q35;q32) translocation with TLX3, in which distal BCL11B enhancer elements drive overexpression of the TLX3 oncogene (15, 31, 32). However, we did not observe these previously described BCL11B alterations in this subgroup.
Chromatin State of BCL11B Rearrangement Partner Loci
Allele-specific BCL11B expression coupled with chromosomal rearrangements suggested that the genetic alterations of BCL11B observed in this subtype of leukemia are distinct from the loss-of-function or TLX3-deregulating alterations observed in typical T-ALL and may result in oncogenic deregulation of BCL11B. To investigate a potential oncogenic role of genomic loci rearranged to BCL11B, we examined the chromatin context of genomic breakpoints at the partner loci in multiple hematopoietic cell types. Whether in gene deserts (e.g., ARID1B) or genic regions (e.g., CDK6), rearrangement breakpoints occurred in proximity to cord blood (cb) CD34+ hematopoietic stem and progenitor cell (HSPC) superenhancers as assessed by histone 3 lysine 27 acetylation (H3K27ac) chromatin immunoprecipitation (ChIP-seq; refs. 33, 34; Fig. 2B), which were diminished or absent in more committed T-cell progenitors and nonhematopoietic cell types (Fig. 2C; Supplementary Fig. S9A and S9B). Importantly, these chromosomal rearrangements, which included reciprocal translocations and more complex SVs (Supplementary Figs. S10A, S10B, and S11A–S11D), always resulted in juxtaposition of an HSPC enhancer locus in cis with the upstream or downstream regions of BCL11B, often several hundred kilobases from the BCL11B promoter (as exemplified by rearrangements near ARID1B; Supplementary Fig. S12). Notably, the CCDC26 rearrangements at chromosome 8q24.21 occurred near two well-characterized MYC enhancers, BENC (16) and the NOTCH1-driven MYC enhancer (N-Me; ref. 35), that play roles in normal and malignant hematopoiesis (36). Because BCL11B is normally repressed in HSPCs (37), a putative cell of origin for these leukemias (5), these observations raise the notion that BCL11B SVs deregulate BCL11B expression through hijacking of active enhancers in a primitive hematopoietic cell.
Genomic Organization of Rearranged BCL11B Loci
To directly demonstrate an enhancer hijacking mechanism for BCL11B deregulation, we used HiChIP (38) to simultaneously profile H3K27ac and chromatin architecture in primary leukemia samples. We first performed H3K27ac HiChIP in normal human cbCD34+ HSPCs and 2 T-ALL cell lines [TAL1-deregulated Jurkat (39) and BCL11B–TLX3 rearranged DND-41 (40)] to assess the stage-specific configuration of the BCL11B locus. Consistent with its normally silent state in HSPCs, BCL11B was devoid of H3K27ac in cbCD34+ cells (Fig. 3A). In contrast, in both DND-41 and Jurkat T-ALL cells, multiple long-range chromatin interactions were identified between the BCL11B promoter and distal enhancer elements located 1 Mb downstream of BCL11B (Fig. 3B and C). This enhancer cluster resides near a noncoding RNA termed ThymoD, which is required for BCL11B activation during thymocyte development in mice (41, 42) and is responsible for driving oncogenic TLX3 expression in T-ALL cases harboring the t(5;14)(q35;q32) translocation. These data highlight the dynamic configuration of the BCL11B locus at distinct stages of hematopoietic development and in T-lineage leukemias.
We next performed H3K27ac HiChIP in three primary leukemia samples representative of recurrent BCL11B SV rearrangements. We first examined an ETP-ALL case with an ARID1B–BCL11B rearrangement (Fig. 3D). Similar to normal cbCD34+ cells, the distal ARID1B enhancer was also active in the BCL11B-rearranged leukemia sample, and HiChIP demonstrated that this enhancer, as well as two additional H3K27ac regions farther upstream of ARID1B, formed long-range chromatin contacts with BCL11B following rearrangement.
To more accurately visualize the structural configuration of the rearranged ARID1B–BCL11B locus and identify true cis interactions, we remapped the HiChIP data using a patient-specific reference genome (see Methods; Fig. 3D, bottom panel) and used MAPS (43) to call significant chromatin interactions. This revealed that BCL11B not only forms long-range contacts with multiple enhancers derived from the ARID1B locus, in particular with the more distal enhancer peak, but also demonstrated high levels of H3K27ac throughout the entire BCL11B gene body. This chromatin structure is notably different from the DND-41 and Jurkat T-ALL cell lines, where the BCL11B promoter primarily interacts with the distal ThymoD element (DND-41) and the BCL11B 3′ untranslated region (UTR; Jurkat; Fig. 3B and C; Supplementary Fig. S13). Moreover, sequential RNA–DNA FISH demonstrated the presence of BCL11B RNA colocalized with ARID1B enhancer DNA on one allele, whereas the second BCL11B allele only exhibited DNA FISH signal, providing orthogonal evidence of allele-specific BCL11B expression driven by the hijacked ARID1B enhancer (Supplementary Fig. S14A and S14B).
We observed similar evidence that HSPC superenhancers drove BCL11B expression in two additional leukemia samples: an AUL case with a CCDC26/BENC–BCL11B rearrangement (Fig. 3E) and a T/myeloid MPAL case with a CDK6–BCL11B rearrangement (Fig. 3F). The CCDC26/BENC rearrangement results in translocation of the BENC superenhancer to 380 kb upstream of BCL11B, within the SETD3 gene body. Despite this distance, HiChIP demonstrated direct interactions between the rearranged superenhancer and BCL11B (Fig. 3E), and sequential RNA–DNA FISH demonstrated colocalization of the BENC genomic sequence and BCL11B RNA signal only on the rearranged allele (Supplementary Fig. S14C). The CDK6 SV resulted in rearrangement of two strong enhancer elements downstream of BCL11B that formed significant chromatin interactions with the BCL11B promoter (Fig. 3F). RNA and DNA FISH also demonstrated colocalization of CDK6-derived enhancer RNA and nascent BCL11B mRNA at the rearranged locus (Supplementary Fig. S14D). Notably, none of these BCL11B-rearranged leukemia cases showed evidence of ThymoD or N-Me activity, the two T-cell enhancers clearly present in CD34+ progenitors isolated from human thymus (Fig. 3D–F; Supplementary Fig. S15A and S15B). The enhancer landscapes (H3K27ac) of all rearranged loci in primary leukemia samples, normal cbCD34+ HSPCs, normal thymocytes (27), and T-ALL cell lines are shown in Supplementary Fig. S15A–S15H.
De Novo Superenhancer Generation through High Copy Tandem Amplification of a Noncoding Region
One fifth of cases in this subgroup harbored a high copy amplification of a 2.5-kb, evolutionarily conserved noncoding element 730 kb downstream of BCL11B on chromosome 14 (Fig. 4A; Supplementary Fig. S16A–S16D). Genomic quantitative PCR of 4 cases demonstrated 15 to 20 copies of this element (Supplementary Fig. S16E), suggesting that amplification above a threshold copy number is important for optimal function of this amplification, which we term BETA (BCL11B enhancer tandem amplification). To resolve the structure and genomic location of BETA, we performed long-read DNA sequencing using the PacBio platform for two cases (SJMPAL011911, estimated to contain 14 copies for a final size of 35 kb, and SJTALL005006, estimated to contain 17 copies for a final size of 42.5 kb). The longest subreads from each case confirmed that the 2.5 kb element is present in a tandem array at the endogenous site of the original element (Supplementary Fig. S17A and S17B; Supplementary Table S13). Reads spanning the entire amplification, including nonamplified flanking DNA sequence, were obtained in SJMPAL011911, demonstrating that the amplification arose in cis with BCL11B, rather than as an extrachromosomal DNA fragment or elsewhere in the genome.
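The array sizes quoted above follow directly from the estimated copy number multiplied by the 2.5-kb element length. The sketch below reproduces that arithmetic; the helper name is introduced here for illustration, and the calculation assumes identical head-to-tail copies with no intervening sequence.

```python
ELEMENT_KB = 2.5  # size of the amplified noncoding element, from the text

def array_size_kb(copies: int, element_kb: float = ELEMENT_KB) -> float:
    """Total length of a head-to-tail tandem array of identical copies."""
    return copies * element_kb

# Estimated copy numbers for the two cases sequenced on the PacBio platform.
estimated_copies = {"SJMPAL011911": 14, "SJTALL005006": 17}
sizes = {case: array_size_kb(n) for case, n in estimated_copies.items()}
for case, size in sizes.items():
    print(f"{case}: {estimated_copies[case]} copies, about {size} kb")
```

A single read can only span the full amplification together with its unique flanking sequence if it is longer than the array itself, which is why long-read sequencing was needed to place the array at the endogenous locus.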
In contrast to the chromosomal rearrangement partners of all other BCL11B group cases, the chromatin state of this element exhibited weak H3K27 acetylation in normal hematopoietic progenitor cells, including cbCD34+ and CD34+ thymocytes (Fig. 4A and B; refs. 27, 34), whereas more committed myeloid and lymphocyte populations surveyed by the Epigenome Roadmap Project (33) lacked active chromatin marks (Supplementary Fig. S18A and S18B). Thus, despite this weak signal, any regulatory activity encoded by this element likely acts specifically in hematopoietic progenitor cells and not more differentiated cell types. Analysis of transcription factor motifs present in this element identified recurrent ZNF384 and MEIS1/2/3 binding motifs (Supplementary Fig. S18B), and we did not find any evidence of somatically acquired mutations within BETA. We hypothesized that, following high copy tandem amplification, this element would be transformed into a potent enhancer capable of activating BCL11B, similar to the other enhancers involved in BCL11B rearrangements. In support of this, all cases with the amplification exhibited evidence of enhancer-derived transcriptional activity on analysis of RNA-seq data, which was absent in samples lacking the amplification (Fig. 4B; Supplementary Fig. S16A).
To more conclusively demonstrate that this amplification generates a strong enhancer, we performed H3K27ac HiChIP on the two samples analyzed with PacBio sequencing (SJMPAL011911, a T/myeloid MPAL sample, and SJTALL005006, an ETP-ALL sample). In both cases, HiChIP demonstrated high levels of H3K27ac signal over the amplified region uniquely in these cases, which was absent in three BCL11B group samples lacking BETA, normal cbCD34+ HSPCs, and two T-ALL cell lines (Fig. 4C). BETA also formed multiple long-range chromatin loops with BCL11B, supporting its role as a direct regulator of BCL11B expression (Fig. 4D). Similar to the other BCL11B group cases, the BCL11B gene itself was demarcated by H3K27ac along the entire gene body, a pattern not observed in normal thymocyte precursors or the Jurkat or DND-41 T-ALL cell lines (Supplementary Fig. S15A). To determine whether this pattern reflected H3K27ac enrichment along the linear genome, or rather resulted from interactions with the H3K27ac-anchored amplification, we performed H3K27ac ChIP-seq in SJTALL005006 to assess the one-dimensional chromatin state of BCL11B. This confirmed that the HiChIP signal does indeed reflect a broad domain of H3K27ac over the entire 100-kb BCL11B gene body, independent of chromatin looping, making this one of the largest domains of contiguous H3K27ac in the genome of SJTALL005006 (Supplementary Fig. S19A and S19B).
Examples of oncogenic enhancer amplifications have been reported previously (35, 44, 45), most notably duplications; however, the high copy number of this tandem amplification led us to question the mechanism of its formation. Tandem amplifications have been reported to result from genomic instability resulting from short stretches of sequence homology that lead to DNA secondary structures during replication (46, 47). In support of this as a plausible mechanism for BETA formation, two homologous “self-chain” sequences flank the 2.5-kb element, and the left and right breakpoints of all 11 samples with WGS data available overlap these flanking homology blocks (Supplementary Figs. S16D and S17A).
Interestingly, we noted that the ThymoD element—normally active in thymocyte progenitors and T-ALL synchronously with the N-Me MYC enhancer—was weakly active in the ETP-ALL case, which otherwise showed an immature progenitor enhancer landscape as marked by the lack of activation of N-Me (Fig. 4D; Supplementary Fig. S15B). As N-Me and ThymoD are typically activated at the same developmental stage (namely, in CD34+ CD1a− thymocytes; Supplementary Fig. S15A and S15B), this suggested that BETA may also be able to activate an otherwise dormant T-cell enhancer to contribute to BCL11B upregulation. Consistent with this, BETA formed significant chromatin interactions with the ThymoD element as well as the BCL11B gene (Fig. 4D).
Integrated Analysis of Gene Expression and BCL11B Chromatin Occupancy in BCL11B-Rearranged Leukemia
Collectively, these data support that BCL11B expression is driven by HSPC-derived superenhancers. Therefore, to investigate whether BCL11B group leukemias exhibit an HSPC-like gene-expression state, we performed CIBERSORT deconvolution analysis on our cohort of leukemia transcriptomes using signatures derived from normal HSPC and mature populations (Fig. 5A and B). The BCL11B group samples showed strongest enrichment for HSPC subtypes, particularly hematopoietic stem cell (HSC) and multilymphoid progenitor (MLP) signatures, and this enrichment was significantly higher as compared with non-BCL11B group T/myeloid MPAL, ETP-ALL, AML, and T-ALL (Supplementary Fig. S20). We next performed combined single-cell (sc) ATAC-seq/RNA-seq in two BCL11B group samples to functionally connect BCL11B expression with open chromatin signatures in individual cells. Consistent with the bulk gene-expression analysis, the scATAC-seq open chromatin profiles of both leukemia samples were significantly enriched for the long-term HSPC (LT-HSPC) and the activated HSPC (Act-HSPC) signatures (Fig. 5C; Supplementary Fig. S21A and S21B). Importantly, BCL11B expression correlated with this enrichment (Fig. 5D), consistent with a role for BCL11B in driving a progenitor cell gene-expression program in this leukemia subtype.
To directly identify BCL11B-regulated genes, we performed BCL11B ChIP-seq in four primary BCL11B group samples (ETP-ALL cases SJTALL005006 and SJALL068666, T/myeloid MPAL case SJALL067671, and case SJALL067672 lacking MPO data) and the DND-41 T-ALL cell line, and compared these data with publicly available BCL11B binding profiles of normal CD34+ and CD34− thymocytes (21). BCL11B binding was widespread, with 8,437 genes showing evidence of promoter-proximal BCL11B binding in all four BCL11B group samples (Supplementary Fig. S22A). These genes, on average, exhibited higher expression in BCL11B group samples than genes lacking promoter-proximal BCL11B binding (Supplementary Fig. S22B). Due to the high number of BCL11B-bound genes, we examined the subset with promoters bound by BCL11B specifically in the BCL11B group leukemias and not in DND-41 or normal proT cells (N = 387, 4.58%). These genes were enriched for pathways related to TGFβ receptor signaling (including BMP4, SMAD3, SPP1, WNT1, SMAD5, and ENG) and HSC differentiation (including LYL1, CIITA, SPI1, LMO2, NFATC2, and GATA2; Supplementary Fig. S22C and S22D). Additionally, 78/387 (20.2%) genes within this subset were upregulated in the BCL11B leukemia group compared with all other non–B-ALL leukemia samples. These genes were significantly enriched for biological processes related to lymphocyte proliferation, cytokine-mediated signaling, and regulation of kinase activity, including genes coexpressed with FLT3 (e.g., IL3RA; Supplementary Fig. S22E and S22F; ref. 48). Thousands of BCL11B binding sites were also identified at promoter-distal (>10 kb from a transcription start site) genomic locations in the BCL11B group leukemia samples (range, 5,953–22,150; Supplementary Fig. S22A). Motif enrichment analysis of these putative regulatory elements identified RUNX1 as the top-enriched motif in all samples, including T-ALL and normal proT cells, consistent with previously reported findings in mouse thymocytes (refs. 26, 49; Supplementary Fig. S22G). RUNX1 expression was significantly higher in the BCL11B group compared with non-BCL11B group T/myeloid MPAL (P < 0.001), ETP-ALL (P = 0.0013), and AML samples (P = 0.0043), but not T-ALL (P = NS; Supplementary Fig. S5), suggesting possible collaboration between these factors in driving lineage-ambiguous leukemia.
We next assessed whether BCL11B occupancy corresponded to the enriched open chromatin HSPC signature identified from single-cell analysis. For this analysis, three chromatin signatures had been previously identified from bulk ATAC-seq which span human HSPC populations and show distinct patterns of enrichment among single cells [LT-HSPC, Act-HSPC, and myeloid-erythroid progenitor (MEP); Fig. 5E, bottom; ref. 50]. We calculated the enrichment of BCL11B binding in each single cell using ChromVAR (51). Enrichment of BCL11B binding with the Act-HSPC open chromatin signature was consistently highest in the BCL11B group leukemia samples (Pearson correlation = 0.64–0.78) compared with DND-41 cells (Pearson correlation = 0.33), normal CD34+ thymocytes (Pearson correlation = 0.44), or normal CD34− thymocytes (Pearson correlation = 0.24; Fig. 5E). As an example of the putative functional connection between BCL11B and the progenitor cell gene-expression program characteristic of this leukemia subtype, increased BCL11B binding can be seen at the GATA2 locus (Fig. 5F), including occupancy of noncoding regions upstream and downstream of the GATA2 gene, which is absent in normal thymocytes and DND-41 cells.
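The Pearson correlations reported above summarize, across single cells, how closely the accessibility of BCL11B-bound sites tracks a given HSPC chromatin signature. A minimal sketch of that calculation is shown below; the per-cell scores and variable names are hypothetical stand-ins for ChromVAR-style deviation scores and are not data from this study.

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-cell scores for six cells (not real data).
bcl11b_site_score = [-1.2, -0.4, 0.1, 0.6, 1.1, 1.8]  # accessibility at BCL11B-bound sites
act_hspc_score = [-0.9, -0.7, 0.3, 0.2, 1.4, 1.3]     # Act-HSPC signature enrichment

r = pearson(bcl11b_site_score, act_hspc_score)
print(f"Pearson correlation = {r:.2f}")
```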
Finally, we used single-cell RNA-seq to investigate transcriptional heterogeneity within these leukemias. We analyzed three BCL11B-rearranged leukemia samples (one AML, one ETP-ALL, and one T/myeloid MPAL) and examined the expression patterns of BCL11B with the two main lineage-defining markers MPO (for myeloid lineage) and CD3E (for T-lineage; Supplementary Fig. S23A–S23C). MPO and CD3E were expressed in distinct subpopulations of the ETP-ALL and T/myeloid MPAL sample, consistent with bilineal potential of the leukemia. In contrast, BCL11B was expressed throughout the cell population, present in both MPO+ and CD3E+ cells, lending support that aberrant BCL11B expression drives the lineage ambiguity of these leukemias. Notably, MPO expression was highly variable between cells in individual cases, highlighting the limited utility of MPO to classify such cases.
BCL11B Drives Lineage Aberrancy In Vitro
These results support that SVs occurring in an uncommitted, extrathymic hematopoietic progenitor cell result in deregulated activation of BCL11B to initiate development of lineage-ambiguous leukemia. However, whether BCL11B expression is sufficient to drive immunophenotypic aberrancy in the absence of normal T-cell differentiation inputs (e.g., Notch signaling in the thymic microenvironment; refs. 52, 53) is unknown. To address this, we used lentiviral transduction to express BCL11B or an empty vector control in cbCD34+ HSPCs isolated from three different donors and performed RNA-seq on GFP-sorted cells 96 hours after transduction in order to examine the acute effects of ectopic BCL11B expression on transcriptional regulation. Samples clustered predominantly by BCL11B overexpression status (Fig. 6A), with 669 genes upregulated and 564 genes downregulated compared with empty vector control [fold change ≥2, false discovery rate (FDR) < 0.05; Fig. 6B; Supplementary Table S14]. Consistent with its known role as a driver of T-lineage specification, upregulated genes were significantly associated with gene sets related to T-cell differentiation, whereas downregulated genes were negatively correlated with gene sets related to myeloid differentiation (Fig. 6C; Supplementary Table S15). Exemplars of these trends include components of the T-cell coreceptor and critical diagnostic marker CD3 (CD3D, CD3E, and CD3G), and the lymphoid regulator interleukin 7 receptor (IL7R), which were completely repressed in control CD34+ HSPCs but were significantly upregulated following BCL11B overexpression (Fig. 6D). Conversely, the expression of key myeloid differentiation markers, MPO and LYZ, as well as the myeloid transcription factor gene SPI1 (encoding PU.1), was significantly downregulated, consistent with previously reported roles for BCL11B as a repressor of myeloid differentiation (21, 22). 
These results demonstrate that BCL11B is sufficient to drive a T-lineage expression program in hematopoietic progenitor cells in the absence of Notch signaling.
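The differential expression thresholds used above (fold change ≥2, FDR < 0.05) can be illustrated with a small filter that applies the Benjamini-Hochberg adjustment to raw P values. This is a sketch under stated assumptions: the fold changes and P values are invented for illustration, GENE_X and GENE_Y are placeholder names, Benjamini-Hochberg is assumed as the FDR procedure, and a twofold decrease is encoded as a fold change of 0.5 or lower.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical genes: (name, fold change BCL11B vs. empty vector, raw P value).
genes = [
    ("CD3E", 8.0, 0.0001),
    ("IL7R", 4.0, 0.001),
    ("MPO", 0.25, 0.002),
    ("GENE_X", 1.2, 0.03),  # significant but below the fold-change threshold
    ("GENE_Y", 3.0, 0.4),   # large change but not significant
]
fdr = benjamini_hochberg([p for _, _, p in genes])
up = [g for (g, fc, _), q in zip(genes, fdr) if fc >= 2 and q < 0.05]
down = [g for (g, fc, _), q in zip(genes, fdr) if fc <= 0.5 and q < 0.05]
print("upregulated:", up)
print("downregulated:", down)
```

Requiring both criteria excludes genes that are statistically significant but change little, as well as genes with large but unreliable changes.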
To examine the impact of these expression changes on cellular differentiation, and to comodel with the highly recurrent FLT3-ITD alteration observed in this subgroup (Fig. 1C), we repeated this experiment and included conditions with or without cotransduction of FLT3-ITD. Colony-forming assays confirmed the myeloid differentiation block predicted from RNA-seq, where regardless of FLT3-ITD expression status, BCL11B-transduced cells yielded significantly fewer colonies compared with the empty vector control (Fig. 6E). Hematopoietic differentiation was also assessed in liquid culture media supplemented with either myeloid or lymphoid-promoting cytokines. We observed a distinct population of cells expressing cytoplasmic CD3 (cCD3) uniquely in BCL11B-transduced cells, which was most prominent in the lymphoid condition with coexpression of FLT3-ITD (Fig. 6F and G). The majority of cCD3+ cells were GFP+ (encoded by the lentiviral vector expressing BCL11B), consistent with BCL11B driving the immunophenotype. In contrast, cMPO expression was restricted to the GFP− population of BCL11B-transduced cells, which is most apparent in myeloid differentiation media (Fig. 6H and I), whereas in empty vector control cells, a prominent cMPO+ population could be observed in both GFP+ and GFP− cells. Taken together, these results provide evidence that deregulated expression of BCL11B in otherwise normal HSPCs can activate genes characteristic of T-cell differentiation, block myeloid differentiation, and drive expansion of a subpopulation with a T-lineage immunophenotype (cCD3+).
We next examined the transformation potential of BCL11B and/or FLT3-ITD in mouse bone marrow HSPC colony-forming assays. BCL11B overexpression was sufficient to promote replating of up to four serial passages, whereas FLT3-ITD-transduced cells failed to replate after a single passage (Fig. 6J). However, the combination of BCL11B and FLT3-ITD led to the strongest replating capacity, which was accompanied by increased cell numbers per colony in each passage (Fig. 6K). In summary, these in vitro results demonstrate that not only can ectopic BCL11B expression in HSPCs drive expression of T-lineage genes in the absence of normal Notch signaling, but BCL11B may also synergize with constitutive FLT3 activity to drive transformation of hematopoietic progenitor cells.
Lineage-ambiguous leukemias remain a significant diagnostic and therapeutic challenge due to lack of clarity surrounding their cellular origins and the molecular drivers of their characteristically ambiguous immunophenotype. Here, we identified multiple convergent genomic alterations deregulating BCL11B in hematopoietic progenitor cells as the unifying driver event of a subset of lineage-ambiguous leukemias, which has major implications for our understanding of the mechanistic role of transcription factor deregulation in leukemogenesis, as well as the taxonomy of leukemia.
BCL11B encodes a T-cell transcription factor whose expression is normally regulated by a complex series of molecular events specifically in cells entering T-lineage specification (24, 25). In uncommitted, extrathymic HSPCs, the chromatin state of the BCL11B promoter is demarcated by high levels of repressive H3K27 trimethylation to maintain the gene in a silent state (37). Although loss-of-function and overexpression studies have elucidated critical roles for BCL11B in promoting and maintaining commitment to the T-lineage, little is known about the functional consequences of aberrant BCL11B expression in a cell type which otherwise actively represses this gene.
Our results highlight multiple roles for BCL11B in the pathogenesis of acute leukemia with T-lineage features: as a tumor suppressor in typical T-lineage ALL, with loss-of-function deletions or mutations; by provision of enhancers that activate nonhematopoietic oncogenes such as TLX3 in a subset of typical T-ALL; and here, cell stage–specific deregulation by existing or de novo superenhancers that are maximally or uniquely active in HSPCs. In some respects, these findings recapitulate deregulation of other primitive (TAL1/LMO2) or nonhematopoietic (TLX1/3) transcription factor genes in T lymphoid leukemogenesis, but two features here are distinctive: hijacking of a regulator of the mature lymphoid series in HSPCs, and the diverse range of normal and neoenhancers that mediate this activation. These findings also resolve long-standing speculation regarding the cell of origin of lineage-ambiguous leukemia. Two models of the cellular origins have been promulgated: transformation of a progenitor that retains myeloid and T lymphoid differentiation potential or trans/dedifferentiation of a transformed T-cell to achieve a more stem cell–like state (5, 9, 54). Here, integration of knowledge of BCL11B gene regulation, developmental stage–specific chromatin states, genomic analyses, and functional modeling supports the former model: that a hematopoietic stem or early progenitor cell, rather than a committed T-cell, is the cell of origin of BCL11B-deregulated leukemia.
Normally, BCL11B transcription is activated from both alleles within a few cell divisions of each other at the CD34+ stage of thymocyte development (21, 24); thus, the presence of uniformly monoallelic expression of BCL11B in this subgroup indicates an aberrant, premature activation mechanism. The observation that all SVs resulted in juxtaposition of the BCL11B locus to regions in the genome harboring HSPC superenhancers provides a plausible mechanism of ectopic BCL11B expression. Although these superenhancers were also identified in CD34+ thymic precursors, two key T-cell enhancer elements were highly active in thymic precursors but not in cbCD34+ HSPCs. These enhancers include the ThymoD element, which normally activates BCL11B at the earliest stages of T-cell differentiation (41), and N-Me, which normally contributes to high MYC expression in developing thymocytes (35). Consistent with the cbCD34+ HSPC chromatin state, neither of these elements was active in the five cases of BCL11B-deregulated leukemias examined by H3K27ac HiChIP, apart from weak ThymoD activity in one BETA-containing case. Moreover, T-cell receptor gene rearrangements were not detected in any BCL11B group leukemia sample, whereas more than half of non-BCL11B ETP-ALL and T/myeloid MPAL leukemias harbored such rearrangements, supporting the conclusion that BCL11B group leukemias originate prior to T-lineage commitment.
Additionally, we demonstrated that the gene-expression pattern of BCL11B group leukemias was more similar to normal HSPCs than T-lineage cells. Using combined single-cell ATAC-seq/RNA-seq in two primary BCL11B group samples, we further demonstrated that BCL11B mRNA expression levels correlated with enrichment for the HSPC open chromatin signature. BCL11B ChIP-seq in four primary BCL11B group leukemias revealed widespread binding of BCL11B at thousands of promoters and putative cis-regulatory elements. The strong enrichment of RUNX1 motifs at these sites suggests collaboration between these factors in driving the progenitor cell expression program. Previous investigation of BCL11B occupancy in various murine T-lineage contexts has illuminated a complex logic to BCL11B gene regulatory function that is highly dependent on cell type, existing chromatin landscape, and availability of other chromatin binding complexes and factors (23, 26, 49). The fact that BCL11B binds promiscuously in BCL11B group leukemia samples and that this binding correlates strongly with the open chromatin signature of normal HSPCs suggests that ectopic BCL11B expression leads to aberrant co-option into a preexisting stem/progenitor gene-expression program reinforced by the myeloid lineage-repressing activities of BCL11B.
Finally, we modeled the initial stages of ectopic BCL11B overexpression in human cbCD34+ cells and mouse HSPCs. Acute overexpression of BCL11B was sufficient to induce T-cell differentiation gene-expression programs and repress myeloid programs, which translated to a myeloid differentiation block in vitro and, when combined with FLT3-ITD, emergence of a population of cells expressing cCD3. Notch signaling is indispensable for the normal initialization of T-cell differentiation (52, 53), and it has been suggested that the T-lineage immunophenotype characteristic of ETP-ALL and T/myeloid MPAL might reflect transformation of a thymic progenitor cell (9, 55). However, our results demonstrate that HSPCs aberrantly expressing BCL11B and high levels of FLT3 activity can acquire a T-lineage immunophenotype in the absence of thymus-dependent Notch signaling. Moreover, BCL11B and FLT3-ITD overexpression in mouse lineage-negative HSPCs conferred self-renewal properties, thereby demonstrating a key readout of the transformation potential of putative oncogenes. These in vitro results collectively demonstrate oncogenic properties of ectopic BCL11B expression in HSPCs: aberrant activation of T-lineage genes coincident with a block in myeloid differentiation in human cells and enhanced self-renewal properties in a standard mouse hematopoietic transformation assay.
While sharing features of lineage ambiguity and stemness, MPAL and ETP-ALL remain arbitrarily classified by cell-surface immunophenotype and are often distinguished by a single, variable marker, myeloperoxidase, rather than leukemia-driving genomic aberrations. This confounds clinical management and the pursuit of more efficacious therapy based on a sound understanding of leukemogenesis. Here we demonstrated that one third of ETP-ALL and T/myeloid MPAL, together with a small number of AML cases, comprise a distinct group unified by structural alterations deregulating BCL11B. Although the number of cases with outcome data from the ECOG cohorts was small, BCL11B-rearranged cases had a superior outcome to non–BCL11B-rearranged ETP-ALL. Several prior case reports have described chromosomal rearrangements in myeloid, lymphoid, or mixed phenotype leukemia consistent with rearrangement of BCL11B (31, 56–63). An additional recent series also identified recurrent BCL11B rearrangements to ARID1B, CCDC26, CDK6, and ZEB2 in 20 cases of lineage-ambiguous leukemia identified by DNA FISH analysis (64). Thus, although several BCL11B rearrangements were identified by conventional diagnostic approaches, this study did not identify the BETA enhancer tandem amplification, which occurs in more than 20% of BCL11B group cases, or the less common rearrangements near ETV6, SATB1, and RUNX1. The ease with which BETA cases can be identified from RNA-seq might increase discovery of these cases in the future.
In summary, our findings show that BCL11B deregulation is a subtype-defining event and support the definition of a new entity of lineage-ambiguous leukemia, BCL11B-deregulated ALAL, that includes one third of ETP-ALL and T/myeloid MPAL cases as currently defined by the World Health Organization (65). Identification of this group transcends traditional methods of diagnosis to provide much needed clarity to the classification of lineage-ambiguous leukemias and demonstrates that a subset of T/myeloid MPAL and ETP-ALL originate in a hematopoietic progenitor cell prior to the initiation of T-lineage differentiation. Further studies are needed to examine the potential for inhibition of FLT3 signaling, in view of the frequent mutations in this gene in this subtype, and detailed analysis of the relative role of BCL11B in leukemic transformation according to cell of origin.
Patients were included with diagnoses of ALL, AML, ALAL, and ETP-ALL. Patients were treated at St. Jude Children's Research Hospital, the Children's Oncology Group, the Tokyo Children's Cancer Study Group, the Japan Association of Childhood Leukemia Study or underwent diagnostic testing by the Munich Leukemia Laboratory (MLL), with available tumor RNA-seq data (Supplementary Tables S1 and S2). Patients and/or their guardians provided written informed consent in accordance with the Declaration of Helsinki. The study was approved by the Institutional Review Board (IRB) of St. Jude Children's Research Hospital, and for cord blood studies, the Duke University IRB. BCL11B structural variants were examined in all hematologic malignancies subjected to WGS at the MLL (Supplementary Table S10), and all childhood cancers studied by the Pediatric Cancer Genome Project. An independent cohort of adult T-lineage ALL cases with available immunophenotypic data enrolled on ECOG-ACRIN and CALGB protocols, including cases of T/myeloid MPAL and ETP-ALL immunophenotype, were also studied to determine the prevalence of the BCL11B immunophenotype, and the prevalence of the BCL11B transcriptomic signature in this cohort (Supplementary Table S11).
Analysis of Patient Outcome
Cox proportional hazards models were used to compare clinical outcome (relapse-free survival and overall survival) distribution of each group to CD1a+ T-ALL in the ECOG-ACRIN and CALGB cohorts.
Cell Culture (Cell Lines and Cord Blood CD34+ Cells)
The DND-41 and Jurkat cell lines were grown in RPMI-1640 supplemented with 10% fetal bovine serum and 50 U/mL penicillin, 50 μg/mL streptomycin, and 2 mmol/L L-glutamine. Cells were maintained at a density of 0.3 × 10⁶–1.0 × 10⁶ cells per mL. DND-41 and Jurkat cells were validated by short tandem repeat analysis and were negative for Mycoplasma spp. Mononuclear white blood cells were separated from total umbilical cord blood using Ficoll Accu-Prep (Accurate Chemical AN5511) density gradient centrifugation. CD34+ cells were isolated using the CD34 MicroBead Kit UltraPure (Miltenyi Biotec 130-100-453) according to the manufacturer's instructions. Cell purity was assessed by flow cytometry using APC-Cy7-CD34 (clone 581, BioLegend 343513). CD34+ cells were maintained in HSC expansion media (ref. 66; StemSpan SFEM II; STEMCELL Technologies 09655) supplemented with the following cytokines each at 100 ng/mL: IL6 (PeproTech 200-06), thrombopoietin (PeproTech 300-18), SCF (300-07), and FLT3 ligand (PeproTech 300-19), and 1 μmol/L StemRegenin 1 (Cayman Chemical 10625).
For all analyses of the pan-leukemia cohort, transcriptome sequencing using Illumina library preparation and sequencers, data processing and gene-expression quantitation, fusion gene detection, tSNE analysis, and hierarchical clustering were performed as previously described (7). For analysis of BCL11B overexpression in cbCD34+ cells, RNA-seq was performed as follows. RNA was quantified using the Quant-iT RiboGreen RNA assay (Thermo Fisher) and quality checked by the 2100 Bioanalyzer RNA 6000 Nano assay (Agilent) or 4200 TapeStation High-Sensitivity RNA ScreenTape assay (Agilent) prior to library generation. Libraries were prepared from total RNA with the TruSeq Stranded Total RNA Library Prep Kit according to the manufacturer's instructions (Illumina 20020599). Libraries were analyzed for insert size distribution using the 2100 BioAnalyzer High-Sensitivity kit (Agilent), 4200 TapeStation D1000 ScreenTape assay (Agilent), or 5300 Fragment Analyzer NGS fragment kit (Agilent). Libraries were quantified using the Quant-iT PicoGreen ds DNA assay (Thermo Fisher) or by low pass sequencing with a MiSeq nano kit (Illumina). Paired-end 100 cycle sequencing was performed on a NovaSeq 6000 (Illumina). Total stranded RNA-seq data were processed by the internal AutoMapper pipeline. Briefly, the raw reads were first trimmed using Trim-Galore v0.60 (67), mapped to the human genome assembly (GRCh38) using STAR v2.7 (68) and then the gene-level values were quantified using RSEM v1.31 (69) based on GENCODE v31 annotation. To identify differentially expressed genes, only confidently annotated protein coding transcripts were considered, and low count genes were removed from analysis using a counts per million cutoff of 10. Normalization factors were generated using the TMM (70) method. Counts were normalized using voom (71). Voom-normalized counts were analyzed using the lmFit and eBayes functions of the limma (72) software package. 
Genes with an absolute fold change of ≥2.0 and FDR <0.05 were considered differentially expressed.
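The thresholds above can be restated as a minimal Python sketch. It assumes that per-gene log2 fold changes and FDR-adjusted P values are already available (for example, exported from limma); the function names, the dictionary layout, and the example values are hypothetical and are not part of the study's pipeline.

```python
import math

def is_differentially_expressed(log2_fc, fdr, min_abs_fc=2.0, max_fdr=0.05):
    """True if the absolute fold change is >= min_abs_fc and FDR < max_fdr."""
    # A fold change of 2 on the linear scale equals 1 on the log2 scale.
    return abs(log2_fc) >= math.log2(min_abs_fc) and fdr < max_fdr

def split_by_direction(results):
    """Split a {gene: (log2_fc, fdr)} mapping into up- and downregulated gene lists."""
    up, down = [], []
    for gene, (log2_fc, fdr) in results.items():
        if not is_differentially_expressed(log2_fc, fdr):
            continue
        (up if log2_fc > 0 else down).append(gene)
    return up, down

# Hypothetical example values, for illustration only.
example = {
    "GENE_A": (1.5, 0.01),    # upregulated, passes both thresholds
    "GENE_B": (-2.0, 0.001),  # downregulated, passes both thresholds
    "GENE_C": (3.0, 0.20),    # large change but fails the FDR cutoff
    "GENE_D": (0.5, 0.001),   # significant but below the fold-change cutoff
}
up_genes, down_genes = split_by_direction(example)
```

Applying both criteria jointly, rather than the FDR alone, restricts the gene lists to changes that are both statistically supported and of appreciable magnitude.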
Gene Set Enrichment Analysis
Gene set enrichment analysis (73) was performed using the mSigDB C2 gene sets and an in-house curated list of gene sets.
For somatic structural variants, five SV callers were implemented in the workflow for SV calling, including Delly (74), Lumpy (75), Manta (76), GRIDSS (77), and novoBreak (78). The SV calls passing the default quality filters of each caller were merged using SURVIVOR (79) and genotyped by SVtyper (80). The intersected call sets were manually reviewed for the supporting soft-clipped and discordant read counts at both ends of a putative SV site using the Integrative Genomics Viewer. Focal inspection of BCL11B-involved rearrangements was performed to identify the exact breakpoints with BLAT (81).
Mutation and Variant Detection
The genetic alterations, including sequence mutations and copy-number alterations, were collected from previous publications or called specifically for this project using established locally developed pipelines (82, 83). All variant calls were further manually reviewed for read depth and to remove artifacts. Published SNVs and indels, SVs, and copy-number variation results were downloaded for the TARGET-AML (84–86), MPAL (5), and Japan-TALL (13) cohorts. Somatic SNVs and indels were identified from Illumina-based WGS and WES data with Bambino version 1.6 (87). For CGI-based WGS data, SNVs and indels were used from our previous TARGET variant calls (84). For additional ALL and MPAL WES/WGS samples, we used our in-house ensemble approach as described (83) to call SNVs and indels with multiple published tools, including Mutect2 version 22.214.171.124 (88), SomaticSniper version 1.0 (89), VarScan2 version 2.4.3 (90), MuSE version 1.0 (91), and Strelka2 version 2.9.10 (92). The consensus calls by at least two callers were considered as confident mutations. Strelka2 version 2.4.7 (92) and Pindel (93) were used for SNV and indel detection for the MLL cohorts as previously described (94). Somatic SVs were identified from Illumina-based WGS data using CREST version 1.0 (95). SVs for CGI-based WGS data were obtained from our previous study, which included filtering to remove germline SVs (84). Manta version 0.28.0 (76) was used for SV detection and GATK4 version 126.96.36.199 for copy-number analysis for the MLL cohort as previously described (94). Preprocessing and copy-number analysis of Illumina SNP array data were performed as described (96). Affymetrix SNP array data were analyzed using Rawcopy (97). CONSERTING (98) was used for somatic copy-number alteration detection from WGS data.
PacBio Sequencing and Analysis
High-molecular-weight DNA was purified from cryopreserved primary leukemia cells using the Gentra DNA Extraction and Purification Kit (Qiagen). Shearing, library preparation, and size selection were performed according to the manufacturer's protocol (SMRTbell Express Template Prep Kit 2.0). After library preparation, the samples were processed through primer annealing and polymerase binding. These steps were performed using sequencing primer v4 and Sequel II Binding Kit 2.0 following the manufacturer's protocol. The samples were evaluated for fragment sizes using the Fragment Analyzer gDNA kit, sheared using a Megaruptor 3.0, and size selected using the PippinHT, targeting fragments between 18 and 21 kb. Samples were sequenced on a PacBio Sequel II instrument in CCS mode using two SMRTcells per sample, with 70 pmol/L loading concentration, 2 hours of preextension time, and a 30-hour movie time. The resulting sequencing data indicated that the samples had a mean length of 18–21 kb. The PacBio reads were aligned against the human genome (hg38) using pbmm2 of PacBio SMRTtools v.8.0. Structural variants were identified (pbsv, SMRTtools). Reads aligning to the region of BETA (hg38 chr14:98541905–98544383) were extracted and de novo assembled using Canu (99). The assembled contigs were compared with the reference BETA sequence and its flanking sequences using zPicture (100). We also generated circular consensus reads (ccs, SMRTtools) to estimate the copy number of BETA (Supplementary Table S13).
Cis-X (ref. 11; version 1.4.0) was used to discover regulatory noncoding variants that lead to allele-specific expression (ASE) of the corresponding target gene. Recommended parameters (cis-X run … -w 10 -r 10 -f 5) were used to detect ASE genes. To account for intronic SNPs that typically have lower depth in RNA-seq, we lowered the coverage threshold for RNA-seq to five to include more markers for the ASE genes. Additionally, ASE runs were called if a minimum of four sequential markers showed significant ASE or monoallelic expression. BCL11B was considered to have ASE when overlapping with the ASE runs identified.
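The run-calling rule described above (a minimum of four sequential markers showing ASE) can be illustrated with a short Python sketch. This is not the cis-X implementation; it assumes each heterozygous marker has already been reduced to a Boolean flag, in genomic order, and the function name and example flags are hypothetical.

```python
def find_ase_runs(marker_is_ase, min_run=4):
    """Return (first_index, last_index) for each run of >= min_run consecutive True flags.

    marker_is_ase: Boolean flags, one per heterozygous marker, ordered by
    genomic position (True = significant ASE or monoallelic expression).
    """
    runs = []
    start = None
    for i, flag in enumerate(marker_is_ase):
        if flag and start is None:
            start = i  # a new run begins at this marker
        elif not flag and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None  # the run was broken by a non-ASE marker
    # Close a run that extends to the last marker.
    if start is not None and len(marker_is_ase) - start >= min_run:
        runs.append((start, len(marker_is_ase) - 1))
    return runs

# Hypothetical flags: a run of three (too short) followed by a run of five.
flags = [True, True, True, False, True, True, True, True, True, False, True]
runs = find_ase_runs(flags)
```

Requiring several consecutive markers guards against calling ASE from a single noisy SNP, at the cost of missing genes that carry few heterozygous markers.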
T-cell Receptor Rearrangement Analysis
The fastq files of WGS data were aligned to reference V, D, J, and C genes of T-cell receptors (TCR), and clonotypes were assembled, using MiXCR software (101). The TCR reference used in the present study was the default included in the MiXCR software: TRA/TRD, NG_001332.2; TRB, NG_001333.2; TRG, NG_001336.2. Called TCR rearrangements were filtered by excluding those with (i) cloneCount < 5 or (ii) cloneFraction < 0.05.
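The clonotype filter can be sketched in Python as follows. The sketch assumes that a clonotype is excluded when it meets either criterion, and that the MiXCR clone table has already been parsed into dictionaries keyed by the cloneCount and cloneFraction fields named above; the function name and the example records are hypothetical.

```python
MIN_CLONE_COUNT = 5
MIN_CLONE_FRACTION = 0.05

def filter_clonotypes(clonotypes, min_count=MIN_CLONE_COUNT,
                      min_fraction=MIN_CLONE_FRACTION):
    """Keep clonotypes with cloneCount >= min_count and cloneFraction >= min_fraction."""
    return [
        c for c in clonotypes
        if c["cloneCount"] >= min_count and c["cloneFraction"] >= min_fraction
    ]

# Hypothetical clone table rows, for illustration only.
example = [
    {"cloneId": 0, "cloneCount": 120, "cloneFraction": 0.60},  # kept
    {"cloneId": 1, "cloneCount": 4, "cloneFraction": 0.02},    # fails both criteria
    {"cloneId": 2, "cloneCount": 8, "cloneFraction": 0.04},    # fails the fraction criterion
    {"cloneId": 3, "cloneCount": 4, "cloneFraction": 0.30},    # fails the count criterion
]
kept = filter_clonotypes(example)
```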
HiChIP Library Construction
All H3K27ac HiChIP experiments were performed using the Arima-HiC+ kit (Arima Genomics A101020) according precisely to the manufacturer's protocols [Arima-HiC+ document numbers A160168 v00 (HiChIP) and A160169 v00 (library preparation)], with minor modifications during the cell preparation step for cell lines and patient samples as follows: for cell lines and CD34+ cells, freshly cultured cells were counted using trypan blue, and a cell suspension volume corresponding to 10 million live cells was washed once with PBS (–Ca2+ −Mg2+) and then cross-linked in a volume of 5 mL PBS with 2% formaldehyde for 10 minutes at room temperature, followed by quenching with Stop Solution 1 according to the Arima protocol. For cryopreserved patient samples, one vial of cells was thawed in a 37°C water bath, and the entire contents of the vial were immediately transferred to 5 mL PBS where cells were counted before proceeding with crosslinking as above. The total number of cells used per patient sample ranged from 1.6 million to 8 million cells. The H3K27ac antibody was from Active Motif (am91194). Uniquely barcoded HiChIP libraries were pooled and sequenced to an average depth of 370 million reads on an Illumina NovaSeq instrument.
HiChIP Data Analysis
HiChIP data were processed and analyzed using MAPS (43) v1.9 as implemented within the Arima Genomics bioinformatics pipeline available at https://github.com/ijuric/MAPS. The following program versions were used: BWA (102) v0.7.12, SAMtools (103) v1.10, and deepTools (104) v3.4.0. For visualization of 2D heatmaps, .hic files were generated with Juicer tools (105) and uploaded to the St. Jude Protein Paint cloud-based server (106). To call significant HiChIP loops, MAPS requires a corresponding ChIP-seq peak set. Because we did not have sufficient patient material to generate ChIP-seq in every sample, we used H3K27ac ChIP-seq data from human CD34+ cells (34) and manually added peak regions corresponding to the BCL11B promoter and the region demarcated by BETA breakpoints, which are not marked by H3K27ac in normal CD34+ cells. To overcome the limitation of interchromosomal read-pair mapping for read pairs spanning chromosome 14 rearrangement junctions, we generated patient-specific reference genomes to allow us to call loops and visualize the actual configuration of the BCL11B rearrangement in these cases. To generate patient-specific genome files for the samples SJMPAL011914, SJAUL068292, and SJALL068279, the breakpoint positions were used to first generate fasta chromosome files corresponding to the reciprocal rearrangement involving chromosome 14 and the partner chromosome. The wild-type chromosome 14 and partner chromosome were then removed from the genome fasta file to prevent HiChIP read pairs mapping to more than two locations. Mappability was computed on the patient-specific genome file using GEM (107, 108) v2.0.14 and converted to bigwig format; mappability scores, GC percentage, and effective fragment lengths for nonoverlapping 5 kb bins were then computed using the genomic features generator scripts within the Arima bioinformatics pipeline to generate patient-specific genomic features files. These files were then used to run the standard MAPS pipeline.
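The construction of a patient-specific reference can be illustrated with a simplified Python sketch. It assumes a balanced reciprocal translocation in which both segments keep their forward orientation and no bases are lost or gained at the junctions; real rearrangements may involve inversions or microdeletions, which are not handled here. The function names, the derivative chromosome names, and the toy sequences are hypothetical.

```python
def reciprocal_derivatives(seq_a, bp_a, seq_b, bp_b):
    """Build both derivative chromosomes of a balanced reciprocal translocation.

    bp_a and bp_b are 0-based offsets; sequence before the offset stays on its
    own chromosome and sequence from the offset onward is exchanged.
    """
    der_a = seq_a[:bp_a] + seq_b[bp_b:]
    der_b = seq_b[:bp_b] + seq_a[bp_a:]
    return der_a, der_b

def build_patient_genome(genome, name_a, bp_a, name_b, bp_b):
    """Replace the two wild-type chromosomes with their derivative chromosomes."""
    der_a, der_b = reciprocal_derivatives(genome[name_a], bp_a, genome[name_b], bp_b)
    # Drop the wild-type copies so read pairs cannot map to both versions.
    patient = {name: seq for name, seq in genome.items()
               if name not in (name_a, name_b)}
    patient["der_" + name_a] = der_a
    patient["der_" + name_b] = der_b
    return patient

# Toy sequences standing in for whole chromosomes.
toy_genome = {"chr14": "AAAACCCC", "chr6": "GGGGTTTT", "chr1": "ACGT"}
patient_genome = build_patient_genome(toy_genome, "chr14", 4, "chr6", 2)
```

In this toy example the derivative of chr14 carries the first four bases of chr14 joined to the distal part of chr6, so contacts across the junction become intrachromosomal in the patient-specific coordinate system.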
ChIP-seq was performed using the SimpleChIP kit (Cell Signaling Technologies; 56383) according to the manufacturer's instructions. Briefly, freshly harvested DND-41 or thawed cryopreserved cells from primary patient samples were washed once in PBS and resuspended in PBS at a concentration of 1 million cells/mL. Formaldehyde (37%) was added to a final concentration of 1%, and cells were cross-linked for 10 minutes at room temperature. Cross-linking was quenched with the addition of glycine at a final concentration of 0.125 mol/L and incubated at room temperature for five minutes. Cells were spun at 4°C, washed once more with ice-cold PBS, and flash frozen in aliquots of 4 million cells. For each immunoprecipitation, one aliquot of 4 million cross-linked cells was used. Cells were sheared in Covaris microTUBEs (Covaris 520045) at the equivalent of 2 million cells per tube using a Covaris E220 instrument with the following settings: peak power = 140 watts, duty factor = 5, cycles/burst = 200, time = 9 minutes, temperature = 4°C. The following antibodies were used: 1 μg of H3K27ac (Active Motif am91194) or a cocktail of 4 antibodies for BCL11B, 0.5 μg each (Cell Signaling Technologies, 12120; Abcam, ab18465; Bethyl Laboratories, A300-383A and A300-385A). Immunoprecipitated DNA was prepared for Illumina sequencing using the Kapa Hyper Prep Kit (Kapa KR0961) and 11 (H3K27ac) or 19 (BCL11B) PCR cycles. Libraries were sequenced to an average depth of 387 million reads.
ChIP-seq Data Analysis
ChIP-seq reads were quality trimmed using Trim_Galore v0.4.4 with the cutadapt program (109), and FastQC was used to filter reads with quality scores >20 for downstream analysis. Reads were mapped to the hg38 reference genome using BWA (102) v0.7.12 and duplicates were marked using biobambam2 (110) v2.0.87. Coverage tracks were generated using deepTools (104) v3.4.0 with RPKM used for normalization, and a bin size of 10. ChIP-seq peaks were called using MACS2 (111) v2.1.1 with a minimum FDR cutoff of 0.01 using default parameters for both H3K27ac and BCL11B with input DNA used as the control.
Motif Enrichment Analysis
A 50-bp window centered at the summit of each promoter-distal (10 kb from RefSeq TSSs) ChIP-seq peak was used as input to the findMotifsGenome.pl program [HOMER (113) v4.10] using -size given with default background parameters. JASPAR (114) was used to identify motif instances in BETA. For this, the sequence corresponding to hg38 chr14:98541905–98544383 was used as input, and all human CORE motifs were searched with a relative profile score threshold cutoff of 90%.
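The preparation of motif-search input can be sketched in Python. The sketch builds a fixed-width, BED-style (0-based, half-open) window around each peak summit and, as an assumption, interprets "promoter-distal" as lying more than 10 kb from every supplied TSS; the function names and example coordinates are hypothetical, and the motif search itself is left to HOMER.

```python
def summit_window(chrom, summit, width=50):
    """Return a (chrom, start, end) interval of the given width centered on the summit."""
    half = width // 2
    start = max(0, summit - half)  # never extend past the chromosome start
    return (chrom, start, start + width)

def is_promoter_distal(summit, tss_positions, min_distance=10_000):
    """True if the summit lies more than min_distance bp from every TSS (assumed rule)."""
    return all(abs(summit - tss) > min_distance for tss in tss_positions)

# Hypothetical summits and TSS positions on a single chromosome.
tss_on_chrom = [10_000, 45_000]
summits = [50_000, 80_000]
windows = [
    summit_window("chr14", s) for s in summits
    if is_promoter_distal(s, tss_on_chrom)
]
```

Here the summit at 50,000 lies 5 kb from a TSS and is dropped, whereas the summit at 80,000 is retained and yields a 50-bp window.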
Genomic PCR Validation of BCL11B Rearrangements
Genomic DNA was extracted from cryopreserved bone marrow aspirates taken at the time of diagnosis using the phenol–chloroform organic extraction method with the exception of sample SJMPAL068275, which, due to very low sample amount, was subjected to whole-genome amplification using the Qiagen REPLI-g kit (Qiagen 150345). Fifty nanograms was used for each PCR reaction (or 1 μL of a 1:100 dilution for sample SJMPAL068275) with Phusion High-Fidelity DNA Polymerase (New England Biolabs; M0530L) with the following primer pairs designed to amplify patient-specific breakpoints on the allele containing the BCL11B gene: SJMPAL042793 (For: ACGGTTGATTTCACTGCGAC; Rev: CTGTGCCATAACATGCGGAA), SJMPAL068275 (For: GCAGCTATCAAATCCAGAGGC; Rev: TAGTACCGGCTGTTGGAGAG), SJMPAL011914 (For: CATTGTCCCTCCAAACCTGC; Rev: TTCACCGATGGAAACCTGGA). Cycling conditions were 98°C for 30 seconds followed by 32 cycles of 98°C for 10 seconds, 63°C for 20 seconds, 72°C for 45 seconds, and a final extension of 2 minutes at 72°C. PCR reactions were cleaned with the Qiagen PCR Purification kit and submitted for Sanger sequencing using the primers used in each PCR reaction.
BETA Copy-Number Analysis
BETA DNA copy number was quantified using the TaqMan Copy-Number Reference Assay with a custom FAM-labeled TaqMan probe targeted to a sequence within a region of the BETA amplification shared by all samples (For: TGAACCGAGCAGAAGTGACAA; Rev: CTGTTAACCTCTCTATTCCTCTTTGTGTT; reporter: 5′-ACCAATCCATCACCCCAGAGCC). A VIC-labeled probe targeting the RNAseP gene was used as the internal reference (Thermo Fisher; 4403326). Reactions were prepared using the TaqPath ProAmp Master Mix (Thermo Fisher; A30866) and performed on samples with available genomic DNA (SJMPAL011911, SJMPAL011912, SJTALL005006, and three immunophenotypically defined subpopulations of SJMPAL040459). Genomic DNA from healthy normal cord blood CD34+ cells and a non-BETA sample (SJMPAL011914) were used as additional copy-number controls.
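The text does not state how copy number was derived from the assay readout. One common approach for duplex TaqMan copy-number assays is the comparative Ct (ΔΔCt) method, sketched below in Python under the assumptions that the calibrator sample carries two copies of the target and that both amplicons amplify with approximately 100% efficiency. The function name and the Ct values are hypothetical.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_calibrator, ct_ref_calibrator,
                         calibrator_copies=2):
    """Estimate target copy number by the comparative Ct method.

    delta Ct = Ct(target) - Ct(reference), computed per sample;
    delta-delta Ct = delta Ct(sample) - delta Ct(calibrator).
    Each Ct unit is assumed to correspond to a twofold difference in template.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta = delta_sample - delta_calibrator
    return calibrator_copies * 2 ** (-delta_delta)

# Hypothetical Ct values: the target amplifies two cycles earlier, relative to
# the reference gene, in the test sample than in the diploid calibrator.
estimate = relative_copy_number(24.0, 25.0, 26.0, 25.0)
```

With these values ΔΔCt is −2, which corresponds to a fourfold increase over the two-copy calibrator, that is, an estimate of eight copies.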
DNA and RNA FISH
Cryopreserved leukemia cells were thawed, washed in PBS, and applied to slides by cytocentrifugation followed by fixation in 4% PFA containing 0.5% Tween 20 and 0.5% NP-40 for 10 minutes. Following fixation, slides were stored in 70% ethanol at −20°C until ready to proceed with hybridizations. Fixed slides were prepared for RNA hybridization by dehydration in 80% and 100% ethanol for two minutes each. Denatured probes (10 μL) for both an enhancer RNA (either BENC, CDK6, or the ARID1B enhancer) and nascent RNA for BCL11B were combined and applied to dehydrated slides and allowed to hybridize at 37°C overnight. RNA-hybridized slides were then washed in 50% formamide and 2× SSC at 37°C for five minutes and mounted in Vectashield mounting medium containing DAPI. 3-D images were then acquired, coordinates were recorded, and slides were then treated with RNaseA in order to erase the RNA signals. Slides were then treated in 0.2N HCl followed by denaturation in 70% formamide and 2× SSC at 80°C for 10 minutes. Denatured slides were then hybridized with an appropriate set of bacterial artificial chromosome (BAC) DNA probes for detection of DNA for either BENC, CDK6, or the ARID1B enhancer combined with BCL11B, and the same microscopic fields that were imaged after RNA FISH were imaged again after DNA FISH. The two sets of images were then compared, which allowed visualization of the locations of enhancer RNA and nascent BCL11B RNA together with DNA from these same targets. This allowed direct visualization in cis of allele-specific activation of BCL11B while in contact with specific enhancers. For detection of BCL11B expression that occurs as a result of acquisition of a de novo enhancer 730 kb downstream of BCL11B, a similar experimental approach was used. 
First, an RNA FISH experiment designed to detect nascent BCL11B expression was performed and imaged followed by DNA FISH for a 2.5-kb segment of DNA that is specifically amplified in cis when BCL11B is expressed (BETA) combined with DNA FISH for BCL11B. The same fields as were imaged after RNA FISH were imaged again after DNA FISH and the images were compared. This experiment showed that BCL11B expression occurs only on the allele that contains BETA. BAC probes used were: BCL11B (RP11-844K17, chr14:99634644–99795981 and RP11-876E22 chr14:99552293–99744370); BCL11B flanking (RP11-151N10 chr14:98799667–98991003 and RP11-45Oc22 chr14:99900658–100090060); BETA (WI2-1744B12 chr14:98984071–99024847); CDK6 (RP11-1102K14 chr7:92302711–92463709 and RP11-809H24 chr7:92117011–92306530); ARID1B enhancer (RP11-808B8 chr6:156458627–156620361 and RP11-263N6 chr6:156057468–156215697); BENC (RP11-17E16 chr8:130493286–130632903). All coordinates referenced for FISH are hg19.
Frozen mononuclear cells from diagnosis bone marrow samples from patients SJMPAL068275 (T/myeloid MPAL, CCDC26/BENC rearranged), SJAML068287 (AML, CDK6 rearranged), and SJTALL005006 (ETP-ALL, BETA) were thawed and stained with a cocktail of human-specific antibodies, including BV605 Mouse Anti-CD45 (BD Biosciences; 564048), PE mouse anti-CD3 (BD Pharmingen; 555340), BUV805 mouse anti-CD13 (BD Biosciences; 749264), BV421 mouse anti-CD2 (BD Biosciences; 744873), APC mouse anti-CD34 (BD Biosciences; 340441), and PerCP/Cyanine5.5 anti-CD38 (BioLegend; 356614). For all samples, the following populations were sorted on a BD FACSAria (BD Biosciences): leukemia blast cells (CD45dim/CD2+/CD3−) and stem/progenitor cells (CD34+CD45dim/neg out all blast negative cells). Sorted cells were washed three times with 1× PBS (calcium and magnesium free) containing 0.04% weight/volume BSA (Thermo Fisher Scientific; AM2616) and counted. From each population, 5,000 cells were combined and loaded on Chromium Next GEM Chip G (10× Genomics; PN-2000177) with Chromium Next GEM Single-Cell 3′ GEM Kit v3.1 (10× Genomics; PN-1000130) and Chromium Next GEM Single-Cell 3′ v3.1 Gel Beads (PN-2000164) according to standard manufacturer's protocols. Briefly, oil partitions of single-cell and oligo-coated gel beads in emulsions (GEM) were captured, and reverse transcription was performed, resulting in cDNA tagged with a cell barcode and unique molecular index (UMI). Next, GEMs were broken and pooled fractions were recovered. Silane magnetic beads were used to purify the first-strand cDNA from the post GEM–RT reaction mixture. Barcoded, full-length cDNA was amplified via PCR and quantified using an Agilent Bioanalyzer High-Sensitivity chip (Agilent Technologies; 5067–5585 and 5067–5593). Enzymatic fragmentation and size selection were used to optimize the cDNA amplicon size. 
To construct the final libraries, P5, P7, a sample index, and TruSeq Read 2 (5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′) were added via end repair, A-tailing, adaptor ligation, and PCR (Chromium Next GEM Single-Cell 3′ Library Kit v3.1, PN-1000158, 10× Genomics). Final library quality was assessed using an Agilent Bioanalyzer High-Sensitivity chip (Agilent Technologies; 5067–5584 and 5067–5585). Samples were then sequenced on the Illumina NovaSeq with a 28 (barcode + UMI) + 91 (read) setting, with a median depth of >50,000 reads per cell for the majority of the samples.
Single-cell data were aligned and quantified using the Cell Ranger (v4.0.0) pipeline (http://www.10xgenomics.com) against the human genome GRCh38 (refdata-gex-GRCh38-2020-A). Cells with fewer than 200 or more than 6,000 features, or with mitochondrial content higher than 75%, were removed. Clusters of cells were identified using Seurat (v3.1.0, https://satijalab.org/seurat) with UMAP reduction and characterized based on gene expression of major hematopoietic cell types.
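The cell-level quality filter described above can be illustrated with a short sketch. This is not the Seurat code used in the study; it is a plain-Python restatement of the stated thresholds, and the `CellQC` record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellQC:
    """Per-cell QC metrics (hypothetical record used only for illustration)."""
    barcode: str
    n_features: int   # number of genes detected in the cell
    pct_mito: float   # percentage of counts from mitochondrial genes


# Thresholds as stated in the text.
MIN_FEATURES = 200
MAX_FEATURES = 6000
MAX_PCT_MITO = 75.0


def passes_qc(cell: CellQC) -> bool:
    """Keep a cell unless it has <200 or >6,000 features, or >75% mitochondrial content."""
    return (MIN_FEATURES <= cell.n_features <= MAX_FEATURES
            and cell.pct_mito <= MAX_PCT_MITO)


def filter_cells(cells: List[CellQC]) -> List[CellQC]:
    """Return only the cells that pass all three QC criteria."""
    return [c for c in cells if passes_qc(c)]


if __name__ == "__main__":
    demo = [
        CellQC("AAAC", 150, 5.0),    # too few features
        CellQC("AAAG", 2500, 10.0),  # passes
        CellQC("AAAT", 7000, 3.0),   # too many features
        CellQC("AACA", 1200, 80.0),  # mitochondrial content too high
    ]
    print([c.barcode for c in filter_cells(demo)])
```

Cells exactly at a threshold are retained here, following the wording "fewer than", "more than", and "higher than" in the text.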
Multimodal Single-Cell ATAC-seq/RNA-seq
Frozen mononuclear cells from diagnosis bone marrow samples from patients SJMPAL011913 (T/myeloid MPAL, ARID1B–BCL11B rearranged) and SJTALL005006 (ETP-ALL, BETA) were thawed and enriched for live cells using the Dead Cell Removal Kit (Miltenyi Biotec 130-090-101) according to the manufacturer's instructions. Live cells were washed twice with cold PBS + 0.04% BSA (Miltenyi Biotec; 130-091-376), counted using a Countess II FL Automated Cell Counter (Thermo Fisher Scientific A27974), and 0.7–0.8 million cells were used for nuclei isolation, according to the Nuclei Isolation for Single-Cell Multiome ATAC + Gene Expression Sequencing User Guide (version CG000365 Rev B). Briefly, cells were spun at 300 rcf for five minutes at 4°C and resuspended in 100 μL Lysis Buffer (prepared according to 10× Genomics' instructions and containing 10 mmol/L Tris-HCl pH 7.4, 10 mmol/L NaCl, 3 mmol/L MgCl2, 0.1% Tween-20, 0.1% Nonidet P40 Substitute, 0.01% Digitonin, 1% BSA, 1 mmol/L DTT, 1 U/μL Roche RNase inhibitor and water) and incubated on ice for four minutes, then 1 mL chilled wash buffer (prepared according to 10× Genomics' instructions) was added before spinning. Cells were washed three times in wash buffer and resuspended in chilled diluted nuclei buffer (10× Genomics). To confirm complete lysis and nuclei concentration, an aliquot was inspected by Trypan Blue staining. GEM generation and single-cell libraries were prepared according to the Chromium Next GEM Single-Cell Multiome ATAC + Gene Expression User Guide (CG000338 Rev B). Briefly, following transposition, GEMs were generated by combining barcoded Gel Beads, transposed nuclei, a Master Mix that includes reverse transcription (RT) reagents, and Partitioning Oil on a Chromium Next GEM Chip J (10× Genomics; PN-2000264). 
Incubation of the GEMs in a thermal cycler for 45 minutes at 37°C and for 30 minutes at 25°C generated 10× barcoded DNA from the transposed DNA (for ATAC) and 10× barcoded, full-length cDNA from poly-adenylated mRNA (for GEX). This was followed by a quenching step that stopped the reaction. Next, GEMs were broken, and pooled fractions were recovered. Silane magnetic beads were used to purify the first-strand cDNA from the post GEM-RT reaction mixture. Barcoded transposed DNA and barcoded full-length cDNA from poly-adenylated mRNA were preamplified by PCR and the products were used as input for both ATAC library construction and cDNA amplification for gene-expression library construction. Libraries were sequenced on the Illumina NovaSeq according to the 10× Genomics setting, with a median depth of >50,000 reads per nucleus for the majority of the samples.
Comparison of Leukemia Transcriptional Signatures with Normal Hematopoietic Populations
Purified hematopoietic populations from healthy umbilical cord blood were sorted and subjected to RNA-seq as previously described in Xie and colleagues (115). Signature matrix generation was performed through CIBERSORTx (116) using a q-value cutoff of 0.05. Deconvolution analysis on 2,467 pan-leukemia samples was performed through CIBERSORTx with default settings, returning enrichment scores of each hematopoietic population for each patient. Signature enrichment scores of the purified populations were normalized to sum to one in each sample. Using these normalized scores, a neighborhood graph was generated for the leukemia samples using n = 15 nearest neighbors, and UMAP dimensionality reduction was performed using a minimum effective distance of 0.1 and a spread of 1.5. Signature scores were subsequently scaled for visualization.
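The per-sample normalization step can be sketched as follows. This is an illustrative restatement, not the study's code; the population labels in the example are placeholders and the function name is hypothetical. It assumes the enrichment scores are non-negative with a positive total.

```python
from typing import Dict


def normalize_to_unit_sum(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sample's population scores so that they sum to one.

    Assumes non-negative scores with a positive total; raises otherwise.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum to be normalized")
    return {population: value / total for population, value in scores.items()}


if __name__ == "__main__":
    # Placeholder population names and values, for illustration only.
    sample = {"population_A": 0.6, "population_B": 0.3, "population_C": 0.3}
    normalized = normalize_to_unit_sum(sample)
    print(normalized, sum(normalized.values()))
```

Normalizing within each sample puts all samples on a common compositional scale before the nearest-neighbor graph and UMAP embedding are computed.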
Single-nucleus data from the joint scATAC/RNA experiments were aligned and quantified using the cellranger-arc (v1.0.1) pipeline (http://www.10xgenomics.com) against the human genome GRCh38 (refdata-gex-GRCh38-2020-A). Signac was used to generate an LSI representation of both scATAC-seq and scRNA-seq data and a joint UMAP representation using both modalities. chromVAR was used to calculate the enrichment of hematopoietic signatures from Takayama and colleagues (50) within each single cell from the ATAC-seq data, and the correlation between hematopoietic signature enrichment and BCL11B gene expression was computed over all single cells.
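The final step, correlating a signature's per-cell enrichment with BCL11B expression, can be sketched as below. The text does not state which correlation coefficient was used; Pearson correlation is assumed here purely for illustration, and the function name and example values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors (one value per cell)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up values: signature enrichment and BCL11B expression per cell.
    signature_enrichment = [0.1, 0.4, 0.35, 0.8, 0.9]
    bcl11b_expression = [0.0, 1.2, 0.9, 2.5, 2.8]
    print(pearson(signature_enrichment, bcl11b_expression))
```

A rank-based coefficient such as Spearman's would be a reasonable alternative when expression values are sparse or heavily skewed.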
Comparison of BCL11B Binding with Single-Cell ATAC-seq
As described in Takayama and colleagues (50), cells from three sorted CD34+CD38−CD45RA− and three CD34+CD38+ populations were processed on the 10× Genomics single-cell ATAC-seq platform, and 12,414 single cells were retained based on default cellranger QC criteria and chromVAR depth filtering (51). Cellranger-reanalyze (1.1, 10× Genomics) was subsequently used to map reads over the sites identified in the bulk ATAC-seq catalog, and read counts were binarized. chromVAR (with default settings; ref. 51) was used to calculate the enrichment of each of the hematopoietic signatures identified via nonnegative matrix factorization in each single cell, as well as for BCL11B ChIP-seq in DND-41 and SJTALL005006 cells. The UMAP package in R (117) was used with default parameters to reduce the dimensionality of the enrichment of the signatures in each cell for visualization, with 21 outlier cells excluded from subsequent analyses.
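Binarization of the read counts, as mentioned above, simply records whether a site has any reads in a given cell. The following is a minimal sketch under that assumption; the function name and the small peak-by-cell matrix are hypothetical.

```python
from typing import List


def binarize(counts: List[List[int]]) -> List[List[int]]:
    """Convert a peak-by-cell count matrix to 0/1: 1 if any read overlaps the site."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]


if __name__ == "__main__":
    # Rows are sites from the bulk ATAC-seq catalog, columns are cells (made-up counts).
    counts = [
        [0, 3, 1],
        [2, 0, 0],
    ]
    print(binarize(counts))
```

Because single-cell ATAC-seq counts per site are mostly 0, 1, or 2, binarizing treats accessibility as present or absent rather than quantitative.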
293T cells were cultured in DMEM/10% FBS supplemented with 1× penicillin–streptomycin–glutamine (Invitrogen). To produce concentrated lentivirus, 6 μg CL20c-MSCV lentiviral backbones were cotransfected with packaging vectors (3 μg pCAG-kGF1-1R, 1 μg pCAG-VSVG, and 1 μg pCAG4-RTR2) into 293T cells using PEIpro (Polyplus). Lentiviral vectors were CL20-MSCV-IRES-GFP (empty vector GFP), CL20-MSCV-IRES-mCherry (empty vector mCherry), CL20-MSCV-BCL11B-IRES-GFP, or CL20-MSCV-FLT3-ITD-IRES-mCherry. Vector supernatants were collected 48 hours after transfection, clarified by centrifugation at 330 × g for five minutes and filtered through a 0.22 μm strainer. Lentiviral vector containing supernatants were adjusted to 300 mmol/L NaCl, 50 mmol/L Tris pH 8.0, and loaded onto an Acrodisc Mustang Q membrane (Pall Life Sciences) according to the manufacturer's instructions using an Akta Avant chromatography system (GE Healthcare Bio-Sciences). After washing the column with 10 column volumes of 300 mmol/L NaCl, 50 mmol/L Tris pH 8.0, viral particles were eluted from the column using 2 mol/L NaCl, Tris pH 8.0 directly onto a PD10 desalting column (GE Healthcare) according to the manufacturer's instructions. Vector containing flow-through was diluted with an equal volume of X-VIVO 10 media (Lonza) or phosphate-buffered saline containing 1% human serum albumin (Grifols Biologicals) to achieve an approximate 50-fold concentration from the starting material, 0.22 μm sterile filtered, aliquoted and stored at −80°C.
For lentiviral production to infect mouse lineage-negative cells, 10 μg of each CL20 viral expression vector was mixed with 2 μg of each packaging vector (CAG4-Eco, CAG-KGP1–1R, CAG4-RTR2) and transfected into 293T cells using FuGene HD transfection reagent (Promega E2311). Viral supernatant was collected 48 and 72 hours after transfection and filtered through a 0.45-μm strainer.
Transduction of cbCD34+ HSPCs
cbCD34+ cells were expanded for three days in HSC expansion media (ref. 66; StemSpan SFEM II; STEMCELL Technologies; 09655) supplemented with the following cytokines each at 100 ng/mL: IL6 (PeproTech; 200-06), thrombopoietin (PeproTech; 300-18), SCF (PeproTech; 300-07), and FLT3 ligand (PeproTech; 300-19) and then transduced at a concentration of 4 × 10^6 cells/mL using 50% volume of concentrated virus with cytokines adjusted accordingly. For transcriptional analysis of BCL11B overexpression, cbCD34+ cells were transduced with CL20-MSCV-GFP (empty vector) or CL20-MSCV-BCL11B-IRES-GFP. For in vitro differentiation (CFU and liquid culture), cbCD34+ cells were transduced with two viruses: GFP and mCherry empty vectors, CL20-MSCV-BCL11B-IRES-GFP + mCherry empty vector, CL20-MSCV-FLT3-ITD-IRES-mCherry + GFP empty vector, or CL20-MSCV-BCL11B-IRES-GFP + CL20-MSCV-FLT3-ITD-IRES-mCherry vectors. Transduced cells were isolated by FACS and then expanded an additional 48 hours in HSC expansion media before cells were harvested for RNA isolation, or plated immediately for in vitro differentiation.
Transduction of Mouse Lineage-Negative HSPCs
Freshly filtered virus was spun onto 6-well non-tissue culture–treated plates coated with RetroNectin (Takara T100B) for 90 minutes at 3,000 RPM at 4°C. Mouse lineage-negative HSPCs were obtained from six- to eight-week-old wild-type C57BL/6 bone marrow using the EasySep Mouse Hematopoietic Progenitor Cell Isolation Kit (STEMCELL Technologies; 19856). Cells were infected on RetroNectin-coated plates for 48 hours with CL20-MSCV lentivirus expressing the genes of interest (GFP + mCherry empty vectors, CL20-MSCV-BCL11B-IRES-GFP + mCherry empty vector, CL20-MSCV-FLT3-ITD-IRES-mCherry + GFP empty vector, or CL20-MSCV-BCL11B-IRES-GFP + CL20-MSCV-FLT3-ITD-IRES-mCherry). Cells were maintained in IMDM supplemented with 20% FBS, 50 ng/mL mSCF, 40 ng/mL mFlt3, 30 ng/mL mIL6, 20 ng/mL mIL3, and 10 ng/mL mIL7 (all from PeproTech) with 1× penicillin/streptomycin for the duration of the transduction.
In Vitro Differentiation of Human CD34+ HSPCs
For in vitro differentiation, GFP+/mCherry+ cells were isolated by FACS and washed once with PBS. For colony-forming assays, 1,500 cells were plated in triplicate in 1.1 mL Methocult supplemented with recombinant human SCF, IL3, IL6, EPO, G-CSF, and GM-CSF (STEMCELL Technologies; H4435E). Colonies were counted after 14 days. For liquid culture differentiation, cells were plated at a density of 20,000 cells/mL in myeloid differentiation media [StemSpan SFEM II (STEMCELL Technologies; 09655) supplemented with 10 ng/mL GM-CSF, 10 ng/mL G-CSF, 10 ng/mL IL6, 10 ng/mL IL3, and 100 ng/mL SCF], or lymphoid differentiation media [StemSpan SFEM II (STEMCELL Technologies; 09655) supplemented with 1,000 U/mL IL2, 5 ng/mL IL3, 20 ng/mL IL7, 20 ng/mL SCF, and 10 ng/mL FLT3-L]. All recombinant human cytokines were purchased from PeproTech. Half-volume of media was added after three days, and cells were analyzed by flow cytometry on day 7. Surface and intracellular markers were stained for flow-cytometry analysis using the Fix & Perm Cell Permeabilization Kit (Life Technologies; GAS004) according to the manufacturer's instructions. Briefly, cultured cells (0.5–1.0 × 10^6) were washed twice with PBS/BSA [0.5% BSA (w/v) in Dulbecco PBS (DPBS; 1×; Gibco; 14190-144)] and resuspended in 200 μL of PBS/BSA. After adding 5 μL normal rabbit serum (Thermo Fisher Scientific; 10510), a cocktail of cell-surface marker antibodies, containing CD45-Pac Orange (5 μL; Life Technologies; MHCD4530), CD235a-PE (5 μL; Beckman Coulter; IM2211U), CD34-PerCP-Cy5.5 (20 μL; BD Biosciences; 347213), CD33-PE-Cy7 (10 μL; BD Biosciences; 333949), CD7-AlexaFluor700 (5 μL; BD Biosciences; 561603), and CD135 (Flt-3)-APC (5 μL; BioLegend; 313308), was added to each tube and incubated in the dark at room temperature for 10 minutes. Cells were washed twice with PBS/BSA and resuspended in 200 μL Fix & Perm medium A, incubated at room temperature for 15 minutes, and washed once with PBS/BSA. 
Cells were resuspended in 200 μL Fix & Perm medium B, and a cocktail of intracellular marker antibodies containing CD3-APC-H7 (5 μL; BD Biosciences; 641406) and MPO-eFluor450 (5 μL; Invitrogen; 48-1299-42) was added and incubated at room temperature for 15 minutes in the dark. After washing twice with PBS/BSA, the stained cells were resuspended in 0.5 mL of PBS/BSA, and at least 50,000 single-cell events were acquired on a BD LSRFortessa cytometer. Data were analyzed using DIVA 8 and FlowJo v10.
Colony-Forming Assays in Mouse Lineage-Negative HSPCs
GFP+/mCherry+ cells were isolated using FACS, washed once with PBS, and plated in triplicate at a density of 5,000 cells per dish in 1.1 mL Methocult (StemCell Technologies M3434) with mGM-CSF added at a final concentration of 10 ng/mL. Every seven days, 5,000 cells were replated in triplicate, for a total of four passages.
RNA-seq, WGS, and PacBio data generated from primary patient samples for this study have been deposited at the European Genome-phenome Archive under accession EGAS00001004810. HiChIP, ChIP-seq, and RNA-seq (cbCD34+ only) data have been deposited at the Gene Expression Omnibus under accession GSE165209.
S. Bendig reports personal fees from MLL Munich Leukemia Laboratory outside the submitted work. I. Iacobucci reports honoraria from Amgen not relevant to the work presented here. A. Stengel reports personal fees from MLL Munich Leukemia Laboratory outside the submitted work. M. Rashkovan reports grants from Damon Runyon Cancer Research Foundation during the conduct of the study. S. Luger reports personal fees from Daiichi-Sankyo, Jazz, BMS, Agios, AbbVie, and Syros, and grants from Biosight, Celgene, Hoffmann-La Roche, and Kura outside the submitted work. V. Wang reports grants from NCI during the conduct of the study. M.L. Loh reports personal fees from MediSix Therapeutics outside the submitted work. C. Pui reports grants from NCI during the conduct of the study; personal fees from Adaptive Biotechnology Inc, Novartis, Amgen, and Erytech outside the submitted work. M.V. Relling reports grants from NIH/NCI and grants and personal fees from Servier outside the submitted work; and spouse on board of BioSkryb, Inc. and scientific advisory board chair of Princess Máxima Center. W.E. Evans reports grants from NIH/NCI outside the submitted work; and board member of BioSkryb, Inc., scientific advisory board chair of Princess Máxima Center. A.A. Ferrando reports grants from NCI and grants from Alex's Lemonade Stand Foundation during the conduct of the study; personal fees from Ayala Pharmaceuticals, Bristol Myers Squibb, SpringWorks Therapeutics, and VantAI, grants from The Chemotherapy Foundation, Hyundai, Leukemia and Lymphoma Society, and Pershing Square Sohn Cancer Research Alliance, and other support from the American Society of Hematology outside the submitted work. J.E. Dick reports grants from Celgene-BMS and other support from Trillium Therapeutics Inc outside the submitted work. C. Haferlach reports other support from MLL Munich Leukemia Laboratory outside the submitted work. C.G. Mullighan reports personal fees from Illumina during the conduct of the study; grants from AbbVie and Pfizer, and other support from Amgen outside the submitted work. No disclosures were reported by the other authors.
L.E. Montefiori: Data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. S. Bendig: Formal analysis and investigation. Z. Gu: Data curation, formal analysis, investigation, visualization, and methodology. X. Chen: Software, formal analysis, investigation, visualization, and methodology. P. Pölönen: Formal analysis, validation, investigation, and methodology. X. Ma: Formal analysis, investigation, and visualization. A. Murison: Resources, formal analysis, investigation, visualization, and methodology. A. Zeng: Resources, data curation, formal analysis, and investigation. L. Garcia-Prat: Resources, data curation, formal analysis, and investigation. K. Dickerson: Investigation and methodology. I. Iacobucci: Formal analysis, investigation, and methodology. S. Abdelhamed: Data curation, formal analysis, investigation, and methodology. R. Hiltenbrand: Data curation, investigation, and methodology. P.E. Mead: Resources, formal analysis, investigation, and methodology. C.M. Mehr: Investigation and methodology. B. Xu: Formal analysis, investigation, visualization, and methodology. Z. Cheng: Formal analysis, investigation, visualization, and methodology. T.-C. Chang: Data curation, formal analysis, investigation, and methodology. T. Westover: Investigation. J. Ma: Formal analysis. A. Stengel: Data curation and formal analysis. S. Kimura: Software, formal analysis, investigation, and methodology. C. Qu: Software, formal analysis, investigation, visualization, and methodology. M.B. Valentine: Investigation, visualization, and methodology. M. Rashkovan: Resources and investigation. S. Luger: Resources. M.R. Litzow: Resources. J.M. Rowe: Resources. M.L. den Boer: Resources. V. Wang: Data curation and formal analysis. J. Yin: Data curation and formal analysis. S.M. Kornblau: Resources. S.P. Hunger: Resources. M.L. Loh: Resources. C. Pui: Resources. W. 
Yang: Data curation, formal analysis, and investigation. K.R. Crews: Resources. K.G. Roberts: Resources and data curation. J.J. Yang: Resources. M.V. Relling: Resources. W.E. Evans: Resources. W. Stock: Resources. E.M. Paietta: Resources, data curation, formal analysis, and investigation. A.A. Ferrando: Resources, data curation, formal analysis, and investigation. J. Zhang: Resources, data curation, software, formal analysis, supervision, investigation, visualization, and methodology. W. Kern: Resources, data curation, formal analysis, and investigation. T. Haferlach: Conceptualization, resources, data curation, software, formal analysis, supervision, funding acquisition, validation, and investigation. G. Wu: Data curation, formal analysis, supervision, investigation, visualization, and methodology. J.E. Dick: Resources, supervision, investigation, visualization, writing–review and editing. J.M. Klco: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, validation, investigation, visualization, project administration, writing–review and editing. C. Haferlach: Conceptualization, resources, data curation, formal analysis, supervision, validation, investigation, project administration, writing–review and editing. C.G. Mullighan: Conceptualization, resources, supervision, investigation, writing–original draft, and project administration.
This study was supported by grants from the NIH R35 CA197695 and P30 CA021765 (to C.G. Mullighan), R01 CA216391 (to J. Zhang), R35 CA210065 and P30 CA103696 (to A.A. Ferrando), F32 CA254140 (to L.E. Montefiori), P50 GM115279-01 (to M.V. Relling and M.L. Loh), R00 CA241297 (to Z. Gu), The St. Jude Children's Research Hospital Chromatin Collaborative, the Henry Schueler 41&9 Foundation (to C.G. Mullighan), a St. Baldrick's Foundation Robert J. Arceci Innovation award (to C.G. Mullighan), an Alex's Lemonade Stand Foundation award (to C.G. Mullighan), a Burroughs Wellcome Fund Career Award for Medical Scientists (to J.M. Klco), and the Leukemia and Lymphoma Society's Career Development Program Special Fellow (to Z. Gu). This study was conducted in part by the ECOG–ACRIN Cancer Research Group (Peter J. O'Dwyer, MD, and Mitchell D. Schnall, MD, PhD, group co-chairs) and supported by the NCI of the NIH under the following award numbers: U10CA180820, U10CA180794, UG1CA189859, UG1CA232760, UG1CA233234, UG1CA233290, U24CA196171, and U10CA180821 (to E.M. Paietta and M.R. Litzow). M. Rashkovan is a Damon Runyon-Sohn Pediatric Cancer Fellow supported by the Damon Runyon Cancer Research Foundation (DRSG-2017). This study was supported by funds to J.E. Dick from the Princess Margaret Cancer Centre through funding provided by Ontario Ministry of Health, Princess Margaret Cancer Centre Foundation, Ontario Institute for Cancer Research through funding provided by the Government of Ontario, Canadian Institutes for Health Research (RN380110-409786), Canadian Cancer Society (grant #703212), Terry Fox New Frontiers Program Project Grant, and a Canada Research Chair.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.