SETD2 is the sole histone methyltransferase responsible for H3K36me3, with roles in splicing, transcription initiation, and DNA damage response. Homozygous disruption of SETD2 yields a tumor suppressor effect in various cancers. However, SETD2 mutation is typically heterozygous in diffuse large B-cell lymphomas. Here we show that heterozygous Setd2 deficiency results in germinal center (GC) hyperplasia and increased competitive fitness, with reduced DNA damage checkpoint activity and apoptosis, resulting in accelerated lymphomagenesis. Impaired DNA damage sensing in Setd2-haploinsufficient germinal center B (GCB) and lymphoma cells associated with increased AICDA-induced somatic hypermutation, complex structural variants, and increased translocations including those activating MYC. DNA damage was selectively increased on the nontemplate strand, and H3K36me3 loss was associated with greater RNAPII processivity and mutational burden, suggesting that SETD2-mediated H3K36me3 is required for proper sensing of cytosine deamination. Hence, Setd2 haploinsufficiency delineates a novel GCB context–specific oncogenic pathway involving defective epigenetic surveillance of AICDA-mediated effects on transcribed genes.
Our findings define a B cell–specific oncogenic effect of SETD2 heterozygous mutation, which unleashes AICDA mutagenesis of nontemplate strand DNA in the GC reaction, resulting in lymphomas with heavy mutational burden. GC-derived lymphomas did not tolerate SETD2 homozygous deletion, pointing to a novel context-specific therapeutic vulnerability.
Diffuse large B-cell lymphomas (DLBCL) are aggressive and heterogeneous tumors arising from B cells transiting the germinal center (GC) humoral immune response (1). GCs are dynamic and transient anatomic structures that develop in secondary lymphoid organs following T cell–dependent antigen activation of mature B cells (2). Germinal center B (GCB) cells called centroblasts (CB) undergo rapid rounds of clonal expansion and somatic hypermutation (SHM) to diversify the immunoglobulin (Ig) locus, after which they become postreplicative centrocytes (CC) that compete for T-cell help based on B-cell receptor affinity for cognate antigen, with selected cells exiting the GC reaction and transitioning to become either memory B cells or antibody-producing plasma cells (2). As such, GCB cells are specialized for tolerance of DNA damage (1) and DLBCLs characteristically manifest a high burden of somatic mutations and genomic structural lesions.
SHM is initiated by the deamination of cytosines to uracils at single-stranded DNA (ssDNA) by the enzyme activation-induced cytosine deaminase (AICDA), leading to breaks and nicks on DNA that get repaired with the introduction of mutations to increase the antigen binding affinity (3–5). Although AICDA is targeted to the Ig heavy chain (IgH) locus, it can act at other actively transcribed and accessible regions of the genome, resulting in abundant off-target mutations (6). AICDA-mediated mutagenesis during the GC reaction occurs alongside the rapid proliferation of these cells, and many replicating GCB cells undergo genomic damage-induced apoptosis (7). In order to survive, GCB cells partially attenuate DNA damage–sensing mechanisms and prevent checkpoint engagement that would otherwise impair the process of SHM, for example, by the BCL6 transcriptional repressor that attenuates the actions of ATR (8). This scenario makes GCB cells highly prone to malignant transformation, with a majority of B-cell tumors, including DLBCLs, originating from this process (1). Further compounding this effect, many DLBCLs manifest evidence of undergoing further AICDA-induced mutagenesis and are composed of highly proliferative cells reflecting their GC origin (9, 10).
From the genetic perspective, a dominant theme of somatic mutations in DLBCL is highly recurrent mutations in chromatin modifier genes and transcription factors (11, 12). This is critically linked to the extensive and rapid waves of epigenetic reprogramming experienced by B cells as they transit the GC reaction and undergo a variety of phenotypic transitions. Much of this process is controlled by dynamic activation and repression of gene enhancers and promoters by EZH2, CREBBP, KMT2D, EP300, and TET2 (13–18), all of which are mutated at high frequencies in DLBCL (11, 12). The outcome of perturbation in their function was shown to include disruption of normal homeostatic interactions within the immune microenvironment, disrupted immune surveillance, and impaired exit from the GC reaction (19). Collectively, these findings point to disruption in both epigenetic programming of gene regulatory elements and genomic instability as hallmark characteristics of DLBCLs.
Among the highly recurrent DLBCL-associated somatic mutations in chromatin modifiers, those affecting histone methyltransferase SETD2 stand out from the others as being at the crossroads between epigenetic regulation of transcriptional activation and elongation as well as DNA damage sensing (20). Notably, SETD2 mutations in DLBCL are especially highly recurrent in patients with African ancestry, who also experience inferior clinical outcomes (21). Mechanistically, SETD2 is the sole histone methyltransferase that can trimethylate H3K36, a mark that is primarily localized to the coding region of actively transcribed genes (22). Within gene bodies, SETD2 was shown to interact with the phosphorylated C-terminal domain of RNA polymerase II (RNAPII) and to deposit H3K36me3 at actively transcribed genes (23). H3K36me3 was found to be involved in the recruitment of RNA splicing machinery through recruitment of MRG15 and ZMYND11 and preventing spurious transcription initiation by binding DNMT3B (24–26). Other studies suggest that SETD2-mediated H3K36me3 contributes to sensing of DNA damage through various pathways including mismatch repair (MMR) and recruitment of LEDGF and subsequent activation of homologous recombination (HR; refs. 27, 28).
Homozygous loss of function of SETD2 occurs in a variety of solid tumors and leukemias and leads to defective DNA damage repair and impaired transcription through a variety of potential mechanisms (20, 29–31). SETD2 mutations in solid cancers occur as missense and nonsense mutations at roughly equal proportions, with no obvious hotspot but a preference for mutations within the SET domain (20). In early B-cell development, homozygous SETD2 deficiency results in defective VDJ recombination due to aberrant end joining of DNA breaks (32, 33). However, in DLBCLs, SETD2 is mostly affected by heterozygous missense mutations without loss of heterozygosity, and its role in the highly specialized context of the GC reaction is unknown. Herein, we set out to investigate the role of SETD2 in the humoral immune response and how reduced dosage through heterozygous loss of function could contribute to the malignant transformation of these highly specialized B cells.
Setd2 Haploinsufficiency Induces GC Hyperplasia and Dark Zone Polarization
A survey of publicly available genomic profiling data sets (n = 1,917 DLBCLs) revealed the presence of missense (94%) and nonsense (6%) mutations of SETD2 in 5% of cases overall, which was similar in all cohorts (refs. 11, 34, 35; Fig. 1A). Of these, ∼60% of missense mutations scored as being likely deleterious, including introduction of bulky residues such as phenylalanine, introduction of amino acids with opposing ionic charge, and hydrophobic residues in place of charged residues (Fig. 1A and B). In a cohort of 574 DLBCL patients with structural variants (SV), there was heterozygous loss of SETD2 in 7% of patients, with rare cases showing homozygous deletion (Supplementary Fig. S1A). In contrast, homozygous deletion or loss of heterozygosity is reported to range between 42% and 100% in solid tumors with frequent SETD2 mutations, and 27% in acute myeloid leukemia and acute lymphoblastic leukemia (36–39). Based on the LymphGen classification, we observed that SETD2 mutations were most abundant in the ST2 subtype of DLBCL (Supplementary Fig. S1B), which mostly have GCB-like transcriptional profiles (40). Examining SETD2 gene expression in murine and human splenic naïve B (NB) and GCB cell populations, we observed robust gene expression (Supplementary Fig. S1C and S1D).
To understand the impact of SETD2 loss of function in GCB cells, we crossed mice bearing a floxed Setd2 exon 3 allele with the Cd19-cre strain, expressing cre recombinase in pre–B cells (30, 41). Cd19wt/creSetd2wt/wt (Setd2wt/wt), Cd19wt/creSetd2wt/− (Setd2wt/−), and Cd19wt/creSetd2−/− (Setd2−/−) mice (Supplementary Fig. S1E) were immunized with the T cell–dependent antigen sheep red blood cells (SRBC) to induce GC development and sacrificed 7 days later, when GCs were fully formed. Routine histologic staining of spleens [hematoxylin and eosin (H&E) and B220 staining] revealed intact tissue architecture and normal-appearing B-cell follicles in Setd2wt/− and Setd2−/− animals (Fig. 1C). Notably, immunostaining with the GC-specific lectin peanut agglutinin (PNA) revealed enlarged GCs in Setd2wt/− but not Setd2−/− mice (Fig. 1C–F). Flow cytometry analysis similarly revealed significantly increased abundance of GCB cells in Setd2wt/− but not Setd2−/− animals, both at days 7 and 14 after immunization when GCs are resolving (Fig. 1G; Supplementary Fig. S1F). We confirmed allele dose-dependent reduction of Setd2 by qPCR in sorted GCB cells (Supplementary Fig. S1G). Further analysis of GCB cell populations revealed an increased ratio of CBs to CCs in both Setd2wt/− and Setd2−/− mice, indicating that perturbations of GC polarity may explain the hyperplastic phenotype (Fig. 1H). We found no perturbation in any other mature B-cell populations or memory B cells in Setd2wt/− mice (Supplementary Fig. S1H–S1N), although mature B cells were significantly reduced in Setd2−/− mice, consistent with previous reports (32). Finally, attempts to generate SETD2 knockout (KO) human DLBCL cells failed to yield any homozygous clones in four of five cell lines (Supplementary Fig. S1O and S1P). Given that SETD2 mutations are recurrently heterozygous in DLBCLs (Fig. 1A; Supplementary Fig. S1A), homozygous loss is deleterious in DLBCL cell lines, and heterozygous Setd2 deletion in mice resulted in a distinct and clearly preneoplastic phenotype in contrast to homozygous deletion, we subsequently focused specifically on the Setd2wt/− setting.
Setd2wt/− GCB Cells Manifest Superior Fitness Due to Reduced Apoptosis
GC hyperplasia could be caused by either increased proliferation or reduced rates of apoptosis. To explore proliferation effects, we immunized Setd2wt/wt and Setd2wt/− mice, and 7 days later injected them with 5-ethynyl-2′-deoxyuridine (EdU) 120 minutes prior to harvesting splenocytes. We found no difference in EdU incorporation, consistent with a similar rate of cells transiting S-phase (Fig. 2A; Supplementary Fig. S2A). We crossed our Setd2 mice with the cell-cycle reporter strain Rosa26-Fucci2a, which marks cells with different fluorescent signals based on their cell-cycle status (42), and did not observe any differences in cell-cycle distribution in Setd2wt/− versus Setd2wt/wt GCB cells (Fig. 2B; Supplementary Fig. S2B). A high fraction of GCB cells undergo apoptosis (7), and accordingly we observed that 20% of Setd2wt/wt cells stained as annexin V+ DAPI− (Fig. 2C), whereas the fraction of apoptotic GCB cells was significantly reduced in Setd2wt/− mice. A similar result was obtained by staining GCB cells for cleaved caspase-3 (Fig. 2D). We also performed IHC stains for cleaved caspase-3 in the spleens of immunized Setd2wt/− and Setd2wt/wt mice and again observed significant reduction in the abundance of apoptotic cells within GCs (delineated by PNA staining; Fig. 2E). Therefore, GC hyperplasia is attributed mainly to decreased apoptosis and not changes in proliferation or cell-cycle dynamics.
To test whether the improved survival of Setd2wt/− GCB cells can confer a clear fitness advantage, we performed mixed bone marrow (BM) chimera experiments in which Cd45.1+;Setd2wt/wt and Cd45.1/2+;Setd2wt/− BM cells were mixed at 50:50 or 75:25 ratios and transplanted into lethally irradiated Cd45.2+ syngeneic recipients. After engraftment, mice were immunized with SRBCs and sacrificed 3, 10, or 20 days later (Fig. 2F). Analysis of GCB cells at the different time points showed no advantage for Setd2wt/− over Setd2wt/wt at the initiation of the GC reaction (day 3; Fig. 2G and H) after normalization to their respective NB-cell populations by flow cytometry. However, a clear and significant competitive advantage for Setd2wt/− GCB cells did emerge by day 10 and persisted until the end of the GC reaction (day 20; Fig. 2G and H). Notably, this advantage was observed even when Setd2wt/wt cells were transplanted at the higher (75:25) ratio to Setd2wt/− cells. In contrast, there was no competitive advantage for Setd2wt/− NB cells normalized to total B-cell populations at any ratio (Fig. 2I). Moreover, the advantage for Setd2wt/− GCB cells was most significant among CBs, the highly proliferative fraction of GCB cells that undergo SHM (Fig. 2J). Consistent with the above findings related to proliferation and apoptosis, the competing Setd2wt/− GCB cells displayed lower levels of cleaved caspase-3 but no differences in the proliferation marker PCNA compared with Setd2wt/wt GCB cells in the same mice (Supplementary Fig. S2C and S2D). These data suggest that Setd2 haploinsufficiency confers a fitness advantage on GCB cells, associated with reduced rates of apoptotic cell death.
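The normalization step in these chimera experiments can be illustrated with a minimal sketch. It assumes the readout is the Setd2wt/− share of GCB cells divided by the Setd2wt/− share of NB cells in the same mouse, which is one plausible reading of "normalization to their respective NB-cell populations". The function name and the example counts are hypothetical and are not values from this study.

```python
def normalized_competition_ratio(gcb_het, gcb_wt, nb_het, nb_wt):
    """Return the Setd2wt/- share of GCB cells relative to its share of NB cells.

    A value of 1.0 indicates no competitive advantage in the GC; values above
    1.0 indicate enrichment of Setd2wt/- cells among GCB cells.
    All arguments are cell counts (hypothetical flow cytometry events).
    """
    if min(gcb_het, gcb_wt, nb_het, nb_wt) < 0:
        raise ValueError("counts must be non-negative")
    gcb_total = gcb_het + gcb_wt
    nb_total = nb_het + nb_wt
    if gcb_total == 0 or nb_total == 0 or nb_het == 0:
        raise ValueError("GCB and NB populations must be non-empty")
    gcb_fraction = gcb_het / gcb_total  # Setd2wt/- share within the GC
    nb_fraction = nb_het / nb_total     # Setd2wt/- share of the input NB pool
    return gcb_fraction / nb_fraction


# Hypothetical mouse: NB cells are 50:50, GCB cells are 75:25 in favor of Setd2wt/-.
ratio = normalized_competition_ratio(gcb_het=750, gcb_wt=250, nb_het=500, nb_wt=500)
```

Dividing by the NB fraction corrects for unequal engraftment of the two donor marrows, so that a skewed input ratio (for example 75:25) is not mistaken for a GC-specific advantage.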
Setd2 Haploinsufficiency Impairs DNA Damage Sensing
Much of the apoptosis occurring in GCB cells is attributed to the DNA damage generated by both AICDA-induced hypermutation and stress from rapid replication (7). Therefore, we wondered whether the improved survival of Setd2wt/− GCB cells was linked to impaired DNA damage sensing, which can occur when there is reduction in SETD2-mediated H3K36 trimethylation (43). Along these lines, performing flow cytometry in spleens of SRBC-immunized mice, we detected significantly decreased levels of the early DNA damage marker γH2AX in Setd2wt/− GCB cells (Fig. 3A). Repeating this experiment in mice injected with EdU 2 hours prior to euthanasia showed that the reduction in γH2AX was not observed in actively dividing EdU+ GCB cells, suggesting the defect occurs after DNA replication (Fig. 3B). ssDNA damage sensing that occurs during SHM triggers phosphorylation of CHK1, which we observed also to be significantly reduced in Setd2wt/− GCB cells (ref. 44; Fig. 3C).
The DNA damage–sensing actions of SETD2 are linked to its generation of H3K36me3, which can subsequently recruit DNA repair proteins such as LEDGF (45). Therefore, to examine the global effect of Setd2 haploinsufficiency on chromatin states, we performed unbiased histone posttranslational modification mass spectrometry on purified Setd2wt/− or Setd2wt/wt GCB cells. We observed significant reduction in the abundance of H3K36me3 in both replication-dependent H3.1 and replication-independent H3.3 isoforms in Setd2wt/− GCB cells (Fig. 3D), with modest reciprocal increases in H3K36me1 and to a lesser extent in H3K36me2. This prompted us to measure the abundance of LEDGF in the chromatin and nonchromatin cell fractions, which revealed a significant reduction in the abundance of chromatin-bound LEDGF in Setd2wt/− GCB cells (Fig. 3E). H3K36me3-dependent LEDGF loading at sites of DNA damage recruits TIP60, which mediates H4K16 acetylation (46). Along these lines, our histone mass spectrometry analysis revealed significant reduction of H4K16ac in Setd2wt/− GCB cells with reciprocal gain of unmodified H4K16 (Fig. 3F).
Finally, we performed H3K36me3 CUT&RUN in Setd2wt/− or Setd2wt/wt GCB cells (Fig. 3G) to further validate the mass spectrometry data and ascertain whether H3K36me3 loss was associated with specific genes. Unsupervised analysis focused on gene bodies, where most H3K36me3 is located (27), revealed clearly distinct profiles in Setd2wt/− GCB cells (Fig. 3G). We then performed a supervised analysis focused on gene bodies with significant H3K36 trimethylation. This analysis revealed 2,368 differentially methylated gene bodies in Setd2wt/− GCB cells, almost all having reduction of H3K36me3 in Setd2wt/− GCB cells (Fig. 3H; Supplementary Fig. S3). To determine whether this reduction in H3K36me3 might lead to accrual of DNA damage, we measured the relative abundance of single-nucleotide variants (SNV) occurring within differentially H3K36-trimethylated gene bodies and observed significantly increased abundance of SNVs (P < 1e−300) and a highly significant correlation between H3K36me3 reduction and gain of SNVs (P < 1e−300; Fig. 3I and J).
H3K36me3 Loss Is Associated with Increased SHM and Off-Target AICDA Mutations
The dominant mechanism of mutagenesis in GCB cells is AICDA-mediated SHM (47). Indeed, sequencing analysis of the Ig JH4 variable region locus in GCB cells revealed a significantly higher abundance of mutations in the Setd2-haploinsufficient state (Fig. 4A). Moreover, Setd2wt/− JH4 alleles often contained higher numbers of point mutations per allele (Fig. 4B) and a significant increase in the proportion of noncanonical AICDA-mediated A>T mutations (Fig. 4C). Linking this finding to Setd2 deficiency, we observed a marked reduction in H3K36me3 at the JH4 locus (Fig. 4D). In addition to SHM, AICDA also mediates class-switch recombination (CSR) in GCB cells, an effect that was intact in Setd2wt/− resting B cells induced to undergo CSR ex vivo (Supplementary Fig. S4A and S4B).
In addition to on-target Ig loci, AICDA is known to induce off-target mutagenesis at accessible chromatin throughout the genome, contributing to development of lymphomas (6). Certain genes have been shown to be more susceptible to AICDA mutagenesis (6, 48), and indeed we observed that these canonical murine and human AICDA off-target genes, including critical DLBCL oncogenes, featured a significant reduction of gene body H3K36me3 (Fig. 4E; Supplementary Fig. S4C). One of the genes known to be most affected by AICDA off-target mutagenesis is PIM1, a highly prevalent lymphoma oncogene (49). Sequencing regions of Pim1 known to be affected by AICDA revealed a significantly higher burden of somatic mutations in Setd2wt/− GCB cells (Fig. 4F). Examining H3K36me3 across the Pim1 locus revealed a clear reduction of this histone mark across the coding region in Setd2wt/− GCB cells, including the 5′ region most affected by SHM (ref. 6; Fig. 4G).
H3K36me3 is normally deposited by SETD2 in actively transcribed genes, suggesting a link between transcriptional activation and H3K36me3. However, RNA sequencing (RNA-seq) performed in Setd2wt/wt or Setd2wt/− CBs and CCs showed lack of distinct expression profiles and virtually no differentially expressed genes between genotypes (Supplementary Fig. S4D). Unexpectedly, we did observe significant enrichment for induction of expression at genes that lose H3K36me3 in Setd2wt/− GCB cells (Fig. 4H). Mapping RNA-seq reads to gene bodies suggests that increased reads could be linked to RNAPII processivity, as reads were diffusely distributed across these loci (Fig. 4I). Similarly, we performed Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) in these cells and did not observe significant changes in chromatin accessibility, although there was a minor trend toward increased reads in gene bodies that lost H3K36me3 (Supplementary Fig. S4E and S4F). To gain further insight into this mechanism, we performed nascent transcription precision nuclear run-on sequencing (PRO-seq; ref. 50), calculated an RNAPII processivity score by dividing RNA-seq/PRO-seq counts per gene, and observed an overall increase of RNAPII processivity in Setd2wt/− GCB cells (Fig. 4J). This suggests that loss of H3K36me3 perturbs the role of RNAPII in transcription-coupled detection of DNA damage, thus leading to increased processivity.
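The processivity score described above (RNA-seq counts divided by PRO-seq counts per gene) can be sketched as follows. This is an illustrative calculation only: the pseudocount, the restriction to genes present in both data sets, and the median log2 summary are assumptions introduced here, and the gene names and counts in the usage example are hypothetical.

```python
import math
import statistics


def processivity_scores(rna_counts, pro_counts, pseudocount=1.0):
    """Per-gene ratio of steady-state (RNA-seq) to nascent (PRO-seq) counts.

    Both inputs map gene name -> normalized read count. Genes missing from
    either assay are skipped. The pseudocount (an assumption, not from the
    paper) avoids division by zero for genes with no PRO-seq signal.
    """
    scores = {}
    for gene, rna in rna_counts.items():
        if gene not in pro_counts:
            continue
        scores[gene] = (rna + pseudocount) / (pro_counts[gene] + pseudocount)
    return scores


def median_log2_change(scores_het, scores_wt):
    """Median log2(Setd2wt/- score / Setd2wt/wt score) over shared genes."""
    shared = sorted(set(scores_het) & set(scores_wt))
    if not shared:
        raise ValueError("no genes shared between genotypes")
    return statistics.median(
        math.log2(scores_het[g] / scores_wt[g]) for g in shared
    )


# Hypothetical counts for two genes in each genotype.
wt = processivity_scores({"Pim1": 99, "Myc": 49}, {"Pim1": 99, "Myc": 49})
het = processivity_scores({"Pim1": 399, "Myc": 99}, {"Pim1": 99, "Myc": 49})
shift = median_log2_change(het, wt)
```

A positive median shift would indicate more steady-state signal per unit of nascent transcription in Setd2wt/− cells, which is how an overall increase in the score would appear under these assumptions.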
Setd2 Haploinsufficiency Results in Accelerated Lymphomagenesis
The fitness effect of Setd2 haploinsufficiency along with its hypermutator phenotype pointed to a likely haploinsufficient tumor suppressor role. To explore such effects, we crossed the Setd2-floxed mice with the Rosa26lox-stop-lox-BCL2-IRES-GFP strain (51) for conditional expression of the canonical lymphoma oncoprotein BCL2 and GFP in B cells, generating Cd19wt/cre;Rosa26-BCL2GFP;Setd2wt/− (Setd2/BCL2), Cd19wt/cre;Rosa26-BCL2GFP;Setd2wt/wt (BCL2), Cd19wt/cre;Rosa26wt/wt;Setd2wt/− (Setd2), and Cd19wt/cre;Rosa26wt/wt;Setd2wt/wt (cre) mice. BCL2 was selected as a cooperating oncogene because it is upregulated in both GCB and activated B-cell (ABC) DLBCLs (40). Exploring a publicly available set of patients with RNA-seq performed in isolated tumor cells and mutational profiles, we confirmed that BCL2 is generally highly expressed in SETD2-mutant patients (Supplementary Fig. S5A). A cohort of 40 mice per genotype was generated through BM transplantation to lethally irradiated C57BL/6 recipients, which were then immunized three times to stimulate GC formation. To explore whether Setd2 haploinsufficiency could accelerate lymphomagenesis, we sacrificed animals at a time point prior to mice manifesting overt disease (182 days after BM transplantation; Fig. 5A).
Gross examination of spleens revealed marked splenomegaly in Setd2/BCL2 as compared with BCL2, Setd2, or cre mice (Fig. 5B and C). Histologic evaluation of splenic sections from Setd2/BCL2 animals showed disruption of splenic architecture by a heterogeneous B-cell population with variable nuclear size and amount of cytoplasm. There was striking expansion and disruption of GCs as shown by PNA and Ki-67 staining in Setd2/BCL2 versus BCL2 spleens, consistent with the effect of Setd2 haploinsufficiency inducing GC hyperplasia and impairing apoptosis (Fig. 5D). In marked contrast, spleens in BCL2 mice retained their follicular structure, although the follicles were enlarged and somewhat distorted. There were enlarged but well-delimited GCs within these follicular structures, with B cells appearing more uniform and consisting of tightly packed centrocytic cells. GCB cells occupied 3-fold greater area in Setd2/BCL2 versus BCL2 mice (Fig. 5E). Flow cytometry from these spleens showed significantly greater abundance of B cells in both Setd2/BCL2 and BCL2 mice as compared with Setd2 and controls, almost 100% of which were GFP+ (Supplementary Fig. S5B–S5D), whereas T cells did not express GFP. GC expansion in the spleen of Setd2/BCL2 animals was confirmed by flow cytometry (Supplementary Fig. S5E). Examination of other lymphoid tissues revealed massive enlargement of lymph nodes in Setd2/BCL2 versus BCL2 mice, again with architecture completely effaced by heterogeneous expanded areas of proliferative GCB cells (Supplementary Fig. S5F–S5H). There was also extensive perivascular infiltration of B220+ lymphoma cells into other tissues, including the kidney and liver, in Setd2/BCL2 mice (Supplementary Fig. S5I and S5J). Overall, these data indicate that Setd2 haploinsufficiency results in significant acceleration and dissemination of lymphomagenesis and yields a distinctive proliferative GC phenotype in these tumors.
Despite this, plotting overall survival yielded no difference between Setd2/BCL2 and BCL2 mice (P = 0.4703, log-rank Mantel–Cox test; Supplementary Fig. S5K). However, mortality in BCL2 mice was not associated with overt and widespread aggressive lymphoma, but instead was most likely due to thrombotic microangiopathy resulting in severe renal disease, as previously reported in BCL2 transgenic mice (Supplementary Fig. S5L–S5N; refs. 52, 53). In contrast, Setd2/BCL2 mice showed 100% diffuse lymphoma penetrance and accordingly significantly greater tumor burden as compared with BCL2 mice, as manifested by greater spleen weights as well as greater fraction of GFP+ and B220+ tumor cells within spleens (Fig. 5F–I; Supplementary Fig. S5O and S5P). Moreover, Setd2/BCL2 displayed significantly greater frequency of major lymphadenopathy (37% vs. 9%), defined as lymph nodes greater than 100 mg in weight (Fig. 5J). Histologic analysis of end-stage Setd2/BCL2 lymphoid tissues showed total effacement of normal architecture by diffuse sheets of large and highly proliferative B cells, consistent with full transformation into DLBCL (Fig. 5K). In contrast, the spleens in terminal BCL2 mice contained infiltrates of smaller and heterogeneous cells, with only small islands of highly proliferative cells (Fig. 5K). B-cell receptor profiling of terminal Setd2/BCL2 and BCL2 lymphomas revealed that Setd2/BCL2 lymphomas were more likely to consist of dominant clonal populations, whereas BCL2 lymphomas were highly polyclonal (Supplementary Fig. S5Q and S5R). Gene expression profiles of GFP+ tumor cells (54) indicated that these lymphomas were distributed across the board as GCB, ABC, and unclassified based on the cell-of-origin classification system (Supplementary Fig. S5S). 
Intraperitoneal adoptive transfer of tumor cells from Setd2/BCL2 mice readily led to their engraftment (3/3) in RAG1KO mice within 30 to 70 days, leading to their death due to abdominal lymphomas that retained the histologic appearance of the original primary B-cell lymphoma (Fig. 5L–N). These tumors could additionally be engrafted into tertiary RAG1KO, NOD-SCID, and C57BL/6 (syngeneic immunocompetent) mice (Fig. 5O). In contrast, none of the terminal BCL2 lymphomas engrafted in RAG1KO recipients. Collectively, the data show that Setd2 deficiency leads to the formation of highly malignant and invasive high-grade DLBCLs with superior fitness to initiate lymphomas in recipient mice.
SETD2 Lymphomas Display a High Abundance of Clustered AICDA Signature Mutations Skewed to Nontemplate Strand DNA
The evidence for reduced sensing of DNA damage induced during the GC reaction led us to consider whether Setd2 haploinsufficiency might contribute to lymphomagenesis by enhancing mutagenesis and genetic heterogeneity, which are linked to increased tumor fitness (55). To address this question, we performed whole-genome sequencing (WGS) in the sorted GFP+ tumor cells from three Setd2/BCL2 and three BCL2 terminal mice, using as germline control the DNA from GFP− BM cells from the original donor animals. We noted a significant increase in the global abundance of SNVs in Setd2/BCL2 tumors (∼1,800 vs. 900 per tumor), as well as SNVs located in gene exons (∼12 vs. 3 per tumor; Fig. 6A and B) and the frequency of regional clustering of point mutations (Fig. 6C). Focusing on coding mutations, we identified 30 genes with exonic nonsynonymous mutations in Setd2/BCL2 lymphomas that included canonical AICDA off-target mutation genes (6), such as Pax5 and Cd83, compared with only five exonic mutations in BCL2 lymphomas (Fig. 6D; Supplementary Table S1). Known AICDA off-target genes featured more frequent and abundant SNVs in Setd2/BCL2 tumor cells as compared with BCL2-only mice (Supplementary Fig. S6A). Along these lines, the mutation profiles of Setd2/BCL2 lymphomas were significantly enriched for canonical (C>T) and noncanonical (T>C) AICDA-type mutations (Fig. 6E).
Further analysis revealed that the increase in mutation burden in Setd2/BCL2 lymphomas was mainly due to highly clustered SNVs (<1 kb), which represent a significantly greater fraction of differential point mutations in these tumors (Fig. 6F). Even though far fewer in number, canonical and noncanonical AICDA-associated mutations tend to be more clustered than the other base substitution types in BCL2 lymphomas, yet this difference was still far greater in the Setd2/BCL2 lymphomas, in which such mutations were far more abundant (Fig. 6G). In contrast, nonclustered mutations had comparable mutation profiles in both Setd2/BCL2 and BCL2 lymphomas (Fig. 6H). Off-target AICDA mutations are more abundant at highly transcribed genes (6), raising the possibility that this SETD2-associated clustered mutation pattern might reflect DNA damage occurring on ssDNA during RNAPII elongation, where the nontemplate strand is more vulnerable to attack by AICDA. Indeed, we observed significant overrepresentation of nontemplate strand AICDA mutations occurring in Setd2/BCL2 lymphomas, which were also far more abundant than in BCL2 lymphomas (Fig. 6I). Moreover, there was also greater skewing of clustered mutations to the nontemplate DNA strand, especially at canonical and noncanonical AICDA-associated mutations, in Setd2/BCL2 tumors (Fig. 6J; Supplementary Fig. S6B and S6C). Finally, examining three independent cohorts of human DLBCL patients showed a significantly increased burden of SNVs in SETD2-mutant versus SETD2 wild-type lymphomas (Fig. 6K). These findings suggest that SETD2 is important for repair of AICDA-induced DNA damage and that its loss of function leads to genomic instability.
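The two classifications used in this analysis, clustered versus nonclustered SNVs and template versus nontemplate strand, can be sketched in a few lines. This is a simplified illustration and not the pipeline used in the study: it assumes an SNV counts as clustered when a neighboring SNV on the same chromosome lies less than 1 kb away, it handles only canonical C:G>T:A transitions for strand assignment, and all function names and coordinates are hypothetical.

```python
def clustered_fraction(positions_by_chrom, max_gap=1000):
    """Fraction of SNVs with at least one other SNV closer than max_gap bp."""
    clustered = 0
    total = 0
    for positions in positions_by_chrom.values():
        pos = sorted(positions)
        total += len(pos)
        for i, p in enumerate(pos):
            near_left = i > 0 and p - pos[i - 1] < max_gap
            near_right = i + 1 < len(pos) and pos[i + 1] - p < max_gap
            if near_left or near_right:
                clustered += 1
    return clustered / total if total else 0.0


def deaminated_strand(ref, alt, gene_strand):
    """Assign a C:G>T:A transition to the strand that carried the cytosine.

    ref/alt are reference-strand (+) alleles; gene_strand is '+' or '-'.
    For a '+' gene the '+' strand is the nontemplate (coding) strand.
    Returns 'nontemplate', 'template', or None for other substitutions.
    """
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")
    if (ref, alt) == ("C", "T"):
        cytosine_strand = "+"  # cytosine was on the reference strand
    elif (ref, alt) == ("G", "A"):
        cytosine_strand = "-"  # cytosine was on the opposite strand
    else:
        return None
    return "nontemplate" if cytosine_strand == gene_strand else "template"


# Hypothetical SNV coordinates: two of four SNVs fall within 1 kb of each other.
frac = clustered_fraction({"chr1": [100, 600, 5000], "chr2": [10]})
```

Under these assumptions, a nontemplate skew would show up as an excess of "nontemplate" calls among AICDA-type transitions within transcribed genes.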
Increased Burden of AICDA-Associated Structural Genomic Lesions in SETD2-Mutant Lymphomas
AICDA is known to contribute to structural genomic lesions arising from GCB cells, such as translocations that activate MYC (56). Strikingly, an overview of SVs revealed significantly greater abundance of translocations, duplications, and deletions in Setd2/BCL2 lymphomas (Fig. 7A). Using a more complex genome graph-based SV analysis method (57), we found more highly complex genomic lesions in Setd2/BCL2 lymphomas including chromoplexy, kataegis, and rigma (Fig. 7B). There was an average of 13 translocations per sample in Setd2/BCL2 and four per sample in BCL2 lymphomas. We noted the presence of translocations involving the Ig loci (one in BCL2, and three in Setd2/BCL2 tumors). Notably, two thirds of Setd2/BCL2 lymphomas (but 0/3 BCL2 lymphomas) had either an IgH–Myc translocation or IgK–Pvt1 [a lncRNA adjacent to Myc that regulates MYC stability and function (ref. 58; Fig. 7C and D)], also known to be mediated through AICDA (59). The Ig locus translocation in the BCL2 tumor was not proximal to any annotated genes. In a Setd2/BCL2 tumor, we also identified an IgH translocation involving Gadd45b, a prosurvival factor in GCB cells and tumors (ref. 60; Fig. 7C). A gene set enrichment analysis (GSEA) of RNA-seq data from murine tumors revealed significant enrichment for canonical MYC target gene sets, and Myc transcript abundance was significantly higher in Setd2/BCL2 lymphomas, suggesting that deregulation of MYC is a common event in Setd2-driven lymphomagenesis (Fig. 7E–G; Supplementary Fig. S7A and S7B). These data again point to Setd2 haploinsufficiency impairing DNA damage sensing, facilitating severe genetic instability due to AICDA, including the development of canonical Myc translocations. Similarly, Trp53 deficiency was shown to facilitate development of Myc translocations and malignant transformation due to the actions of AICDA (61).
Notably, SETD2 and TP53 mutations tend to be mutually exclusive in human DLBCL patients, but not in a pan-cancer analysis, suggesting context-dependent phenocopy of SETD2 and TP53 in GCB cells (Fig. 7H). Finally, examination of the NCI cohort of human DLBCL patients with SV data indicated that SETD2-mutant or copy number–deficient lymphomas also manifested significantly greater burden of copy-number loss across the genome (Fig. 7I; Supplementary Fig. S7C). Collectively, these data suggest that Setd2 deficiency can enhance lymphoma fitness by fostering genomic instability through the actions of AICDA, enabling acquisition of numerous canonical AICDA-induced genomic lesions that confer selective advantage, such as translocations of the Myc locus.
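One common way to evaluate mutual exclusivity between two genes is a one-sided Fisher exact test on the 2 × 2 table of mutation status. The text does not state which test was used for Fig. 7H, so the sketch below is an assumption offered for illustration; the function name and the patient counts are hypothetical.

```python
from math import comb


def fisher_left_tail(both, only_a, only_b, neither):
    """One-sided Fisher exact P value for mutual exclusivity.

    Returns the hypergeometric probability of observing `both` or fewer
    co-mutated cases given the table margins. Small values suggest the two
    genes are co-mutated less often than expected by chance.
    """
    counts = (both, only_a, only_b, neither)
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    n = sum(counts)
    if n == 0:
        raise ValueError("table is empty")
    row_a = both + only_a   # cases mutated in gene A
    col_b = both + only_b   # cases mutated in gene B
    lowest = max(0, row_a + col_b - n)  # smallest feasible co-mutation count
    denom = comb(n, col_b)
    tail = sum(
        comb(row_a, k) * comb(n - row_a, col_b - k)
        for k in range(lowest, both + 1)
    )
    return tail / denom


# Hypothetical cohort: 0 co-mutated, 20 SETD2-only, 80 TP53-only, 300 wild type.
p_value = fisher_left_tail(both=0, only_a=20, only_b=80, neither=300)
```

Exact integer arithmetic with `math.comb` keeps the calculation free of floating-point underflow until the final division, which is convenient for cohort-sized tables.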
Here, we show that SETD2, a histone methyltransferase responsible for depositing H3K36me3 at gene bodies in actively transcribed genes, has unique functions as a haploinsufficient tumor suppressor in GC-derived lymphomas. The reason for this is a synthetic DNA damage susceptibility scenario that is unique to GCB cells due to the effect of AICDA, a cytosine deaminase that normally induces ssDNA lesions to drive SHM of the Ig loci, as well as double-strand DNA breaks during CSR in B cells entering the GC reaction (3). Upon sensing DNA damage, most cells trigger checkpoint mechanisms to permit DNA damage repair prior to proceeding through the cell cycle. However, GCB cells uniquely attenuate expression of DNA damage–sensing mechanisms such as ATR and TP53 through the actions of the transcriptional repressor BCL6, thus allowing these cells to continuously proliferate and accumulate genetic lesions (8, 62). Perturbation of DNA damage sensing in GCB cells disrupts their function; for example, Atm KO mice have decreased GC size due to accumulation of DNA damage and increased rates of cell death (63). This scenario may cause GCB cells to be exceptionally dependent on the remaining DNA damage sensing and repair mechanisms to repair AICDA-induced damage and thus vulnerable to lymphomagenesis arising from loss of function in repair mechanisms. This is the case in MMR-deficient mice, where loss of Msh2 in combination with BCL6 overexpression results in GCB-cell lymphomas with signs of genomic instability (64). Hence, our data point to SETD2 as a novel and critical DNA damage response mechanism in GCB cells required to restrict their potential for malignant transformation, at least in part by triggering DNA damage–associated apoptosis mechanisms. GCB cells that fail to engage in productive interactions with T cells may also undergo apoptosis (7), and we cannot exclude that such effects could also occur here.
Mechanistically, the effect of SETD2 deficiency seemed to link to both sensing and repair of double- and single-strand DNA breaks, and hence is relevant to damage occurring during both CSR and SHM (65). Double-strand breaks can also occur during AICDA mutagenesis within the GC reaction during SHM (47), which may be reflected by the increased abundance of structural genomic lesions in Setd2-haploinsufficient lymphomas. The finding of increased Myc translocations and MYC target gene activation in Setd2-haploinsufficient lymphomas is reminiscent of the effect of compound H2ax and Trp53 loss of function in GCB cells, which also results in increased incidence of such lesions (61, 66). Hence, SETD2 mutation may be one of the many mechanisms through which DLBCLs induce aberrant MYC expression. It is striking that loss of one Setd2 allele in the GC context is sufficiently deleterious to mimic biallelic deletion of Trp53. The fact that there were increased complex genomic lesions, such as rigma and chromoplexy, speaks to the significant dependency of GCB cells on SETD2 to keep order in the genome in the context of their high mutational risk due to AICDA.
SETD2 has been mechanistically linked to transcription due to its role in depositing H3K36me3 at actively transcribed genes through its interaction with phosphorylated, elongating RNAPII, which is enhanced through transcriptional activation signals such as H3.3S31 phosphorylation (23, 67). However, it is not clear whether the H3K36me3 mark is required for transcriptional activation. Instead, it has been mostly linked with changes in nucleosomal occupancy and prevention of spurious intragenic transcriptional initiation through its interaction with the FACT complex and DNMT3B, respectively (68, 69). Alternatively spliced exons display lower levels of H3K36me3, and loss of SETD2 has been implicated in altering exon usage via recruitment of splicing factors including MRG15, ZMYND11, and hnRNP proteins (25, 26, 70), a key process disrupted in SETD2-mutated colorectal cancers (31). The presence of H3K36me3 is linked instead to a greater tendency of DNA damage to induce HR repair, thus maintaining the integrity of coding genes (45). This function is especially critical in GCB cells, where the predominant DNA repair mechanism is nonhomologous end joining (NHEJ) and HR is critical for reducing the abundance of off-target mutations at coding genes (71, 72). Indeed, disruption of HR through loss of Brca2 or Xrcc2 led to impaired GC reactions and accumulation of DNA breaks, respectively (71, 73). The effect of Setd2 heterozygous loss of function may thread the needle and induce just the right level of impaired DNA damage response during transcription to reduce the efficiency of HR and likely MMR without otherwise impairing GCB cell functions. SETD2 itself is likely not sensing damage but instead may be marking transcribed chromatin at risk for damage, providing a histone code setting that can be recognized by DNA damage proteins such as LEDGF upon DNA damage signaling.
Although SETD2 has been proposed to also mediate putative epigenetic effects relevant to transcriptional regulation (20), our data suggest that its role as a tumor suppressor is primarily through the described AICDA–DNA damage effect. Along these lines, we did not observe significant changes in gene expression profiles in Setd2wt/− GCB cells or any significant differential chromatin accessibility by ATAC-seq, whereas such effects on gene expression and chromatin accessibility were highly prominent in renal cell carcinomas with homozygous SETD2 deletion (74). This is in contrast to the impact of somatic mutations of other chromatin modifier proteins such as EZH2, CREBBP, KMT2D, and TET2, which result in dramatic perturbation of transcription and repressive chromatin modification profiles (13–18). Moreover, these other mutations are strongly associated with GCB-DLBCLs, whereas SETD2 mutations are not particularly linked to lymphomas with a specific transcriptional signature. These considerations nonetheless do not completely rule out subtle epigenetic effects of SETD2 haploinsufficiency, and future efforts to illuminate such mechanisms are certainly warranted. However, the data point to mutagenesis as the most likely force for natural selection favoring outgrowth of Setd2-deficient GCB cells through events such as Myc translocations and point mutations of other oncogenes and tumor suppressors. Our finding of substantial DNA base pair variation in the form of AICDA-associated “mutational noise” supports this notion, suggesting that SETD2 deficiency induces genetic heterogeneity as a potential metric of GCB cell fitness allowing outgrowth of lymphoma-initiating cells (55). Fitness may be further conferred by decreased cell death observed in Setd2-haploinsufficient GCB cells due to suppressed DNA damage checkpoint activation. 
The combination of increased genetic diversity and decreased apoptosis presents a scenario allowing for malignant transformation to readily occur, and is likely a significant driving force in humans given the concordance of mutagenic phenotypes in both murine and human SETD2-deficient/mutated lymphomas.
In summary, we provide a novel context-specific tumor suppressor mechanism for Setd2 haploinsufficiency in GCB cells, where it cooperates with GC-specific expression of AICDA to impair a newly defined critical role of SETD2 in restricting off-target mutagenesis during CSR and SHM. These findings make it tempting to speculate whether SETD2 haploinsufficiency could serve as a therapeutic vulnerability for patients with lymphoma, by targeting SETD2 with newly developed SETD2 inhibitors (75). This question is of particular interest and potential impact given the overrepresentation of SETD2 mutations in lymphoma patients of African ancestry (21), who also manifest inferior clinical outcomes as compared with the Caucasian population. In contrast to solid tumors and leukemias, which often feature homozygous loss of function, SETD2 mutations in DLBCLs are virtually always heterozygous, suggesting a critical dependency on the remaining SETD2 allele. This dependency might reflect the importance of this genomic checkpoint mechanism in maintaining the viability of these tumors and could provide a potential avenue for precision therapy for SETD2-mutant patients, with special benefit to patients of African ancestry.
The following mouse strains were obtained from The Jackson Laboratory: B6 Cd45.2 (C57BL/6J, RRID:IMSR_JAX:000664), Cd19Cre [B6.129P2(C)-Cd19tm1(cre)Cgn/J, RRID:IMSR_JAX:006785], B6 Cd45.1 (B6.SJL-PtprcaPepcb/BoyJ, RRID:IMSR_JAX:002014), and RAG1KO (B6.129S7-Rag1tm1Mom/J, RRID:IMSR_JAX:002216). SETD2fl/fl mice were a generous gift from Scott Armstrong (30). Rosa26-BCL2-GFP mice for the lymphoma studies were obtained from H. Christian Reinhardt (51). The R26-Fucci2a model was developed by Ian J. Jackson (42). All mouse experiments were conducted using unbiased age- and sex-matched specimens. Unless stated otherwise, all animals were 8 to 16 weeks of age at the time of experimentation. All procedures were approved, and animals were maintained according to guidelines established by the Research Animal Resource Center of Weill Cornell Medicine. All mice were monitored until any one of several criteria for euthanizing was met, including severe lethargy and more than 10% body weight loss, in accordance with the Weill Cornell Medicine Institutional Animal Care and Use Committee–approved animal protocol (protocol #2011-0031).
GC Assessment in Mice
To induce GC formation, age- and sex-matched mice were immunized intraperitoneally with 0.5 mL of 2% SRBC suspended in PBS (Cocalico Biologicals, 20-1334A).
BM cells were harvested from the femur and tibia of 8- to 12-week-old donor mice and treated with red blood cell lysis solution (Qiagen, 158904). One million cells were injected into lethally irradiated C57BL/6J host mice (2 doses of 450 rad, Rad Source Technologies RS 2000 Biological Research X-ray Irradiator) through the retro-orbital sinus. Transplanted mice were used for experiments 6 to 8 weeks after transplant.
Mice were sacrificed at the indicated time points, with organs harvested and mononuclear cells purified using Histopaque gradient centrifugation (Atlanta Biologicals, I40650). Single-cell suspensions were resuspended in flow buffer (PBS + 2% FBS, 2 mmol/L EDTA), blocked with mouse FC block (BD Biosciences, 553141, RRID:AB_394656), and stained with the following fluorescent-labeled anti-mouse antibodies: from BD Biosciences: APC annexin V (dilution 1:100, 550475, RRID:AB_2868885), APC anti-CD138 (dilution 1:500, 558626, RRID:AB_1645216), APC anti-IgG1 (dilution 1:500, 550083, RRID:AB_393553), BV421 anti-CD95 (dilution 1:500, 562633, RRID:AB_2737690), BV786 anti-B220 (dilution 1:500, 563894, RRID:AB_2738472), FITC anti-B220 (dilution 1:500, 553087, RRID:AB_394617), FITC anti-CD23 (dilution 1:500, 553138, RRID:AB_394653), and PerCp-Cy5.5 anti-CD19 (dilution 1:500, 551001, RRID:AB_394004); from Thermo Fisher Scientific: APC anti-CD4 (dilution 1:500, 17-0041-81, RRID:AB_469319), APC anti-CD38 (dilution 1:500, 17-0381-81, RRID:AB_469381), APC anti-IgM (dilution 1:500, 17-5790-82, RRID:AB_469458), PE anti-CXCR4 (dilution 1:250, 12-9991-82, RRID:AB_891391), PE-Cy7 streptavidin (dilution 1:500, RRID:AB_10116480), PerCp-Cy5.5 anti-CD45.1 (dilution 1:500, 45-0453-82, RRID:AB_1107003); and from BioLegend: APC-Cy7 anti-CD38 (dilution 1:500, 102728, RRID:AB_2616968), APC-Cy7 anti-CD45.2 (dilution 1:500, 109824, RRID:AB_830789), BV510 anti-IgD (dilution 1:500, 405723, RRID:AB_2562742), PE-Cy7 anti-CD23 (dilution 1:500, 123420, RRID:AB_1953277), PE-Cy7 anti-CD86 (dilution 1:500, 105014, RRID:AB_439783), PE-Cy7 anti-CD138 (dilution 1:500, 142514, RRID:AB_2562198), PerCp-Cy5.5 anti-CD38 (dilution 1:500, 102728, RRID:AB_2616968), PerCp-Cy5.5 anti-CD95 (dilution 1:500, 152610, RRID:AB_2632905), and PerCp-Cy5.5 anti-GL7 (dilution 1:500, 144610, RRID:AB_2562979).
Intracellular staining was performed by first staining membrane-bound targets, then fixing and permeabilizing cells, followed by staining of intracellular targets. Fixation/permeabilization was performed using BD Cytofix/Cytoperm (BD Biosciences, 554714, RRID:AB_2869008), and cells were then stained with the following fluorescent-labeled antibodies: Alexa Fluor 647 anti-phospho-histone H2A.X S139 (Cell Signaling Technology, dilution 1:100, 9720, RRID:AB_10692910), Alexa Fluor 647 anti-active caspase-3 (BD Biosciences, dilution 1:100, 560626, RRID:AB_1727414), and PE anti-phospho-CHK1 S345 (Cell Signaling Technology, dilution 1:100, 12268, RRID:AB_2797863).
DAPI (Thermo Fisher Scientific, D1306, RRID:AB_2629482) or Ghost Dye Violet 510 (Tonbo Biosciences, 13-0870) was used for the exclusion of dead cells. Data were collected on the FACSCanto II or Fortessa (BD Biosciences) flow cytometry analyzer and analyzed using the FlowJo software package (TreeStar, RRID:SCR_008520).
NB and GCB cells were sorted from the spleens of mice immunized with SRBC for 7 days. Briefly, single-cell suspensions were stained with anti-B220, anti-Fas, and anti-CD38. CBs and CCs were separated from the GC fraction using anti-CXCR4 and anti-CD86 antibodies. DAPI was used to exclude dead cells. Cell sorting was performed on a BD Influx cell sorter in the Weill Cornell Medicine (WCM) Flow Cytometry Core Facility. Magnetic bead cell isolation was performed for GCB cells using the PNA MicroBeads kit (Miltenyi Biotec, 130-110-479) or for resting B cells using the anti-CD43 MicroBeads kit (Miltenyi Biotec, 130-049-801) according to the manufacturer's protocol. Cell purity was confirmed by flow cytometry, and all samples had over 90% cell purity of selected populations.
Histology and IHC
Mouse organs were fixed with 4% formaldehyde and embedded in paraffin. Tissue processing and staining was performed by the Laboratory of Comparative Pathology (Memorial Sloan Kettering Cancer Center). Briefly, 5-μm sections were deparaffinized, with heat antigen retrieval in citrate buffer pH 6.4 and endogenous peroxidase activity blocked with 3% hydrogen peroxide in methanol. Indirect IHC was performed with anti-species specific biotinylated secondary antibodies followed by avidin horseradish peroxidase or avidin-AP and developed by Vector Blue or DAB color substrates (Vector Laboratories). Sections were counterstained with hematoxylin. Immunofluorescence slides were stained with the following secondary antibodies: Donkey anti-Rat Alexa Fluor 488 (Invitrogen, A21208, RRID:AB_2535794) and Donkey anti-Rabbit Alexa Fluor 594 (Invitrogen, A21207, RRID:AB_141637). The following antibodies were used: biotin-conjugated anti-B220 (BD Biosciences, 550286, RRID:AB_393581), anti-PNA (Vector Laboratories, B1075, RRID:AB_2313597), anti-CD138 (BD Biosciences, 553712, RRID:AB_394998), anti–cleaved caspase-3 (Asp175; Cell Signaling Technology, 9661, RRID:AB_2341188), and anti–Ki-67 (Cell Signaling Technology, 12202, RRID:AB_2620142). Slides were scanned using a Zeiss Mirax Slide Scanner, and photomicrographs were examined using an Aperio eSlide Manager (Leica Biosystems). QuPath software was used to quantify GC area.
Genomic DNA and RNA Extraction
Genomic DNA was extracted using QuickExtract DNA Extraction Solution (Epicenter, QE09050) or DNeasy Blood and Tissue Kit (Qiagen, 27106). Total RNA was extracted from cells using TRIzol (Thermo Scientific, 15596018). RNA concentration was determined by Qubit Fluorometric Quantification (Thermo Scientific), and integrity was verified by an Agilent 2100 Bioanalyzer (Agilent Technologies).
NB and GCB cells, at least 300,000 each from Setd2wt/wt and Setd2wt/− mice, were sorted by FACS. CUT&RUN was performed following CUTANA protocol v1.5.2 (Epicypher). Briefly, cells were washed, immobilized onto Concanavalin-A beads (Bangs Laboratories, Inc., BP531), and incubated overnight at 4°C with 0.01% digitonin and 0.5 μg H3K36me3 antibody (Epicypher, 13-0035) or anti-rabbit IgG (antibodies-online Inc., ABIN101961, RRID:AB_10775589). CUT&RUN-enriched DNA was purified using a Monarch DNA Cleanup Kit (NEB, T1030S) and 5 to 10 ng used to prepare sequencing libraries with Ultra II DNA Library Prep Kit (NEB, 9645L). DNA libraries were sequenced on a Hi-Seq 2 × 150 bp sequencer (Illumina) at GENEWIZ.
Data were aligned to mm10 genome using BWA-MEM (76, 77), and fragments per kilobase of transcript per million fragments mapped–normalized coverage tracks were generated using the bamCoverage tool from the deepTools framework (78). H3K36me3 peaks were called with SICER2 (ref. 79; window size 200 bp, redundancy_threshold 1), and genes bound by H3K36me3 were identified as those with >40% of their respective gene bodies covered by called peaks (n = 9,225 genes). Genes showing gain or loss of H3K36me3 were calculated using normalized read counts within H3K36me3-bound genes using the deepTools multiBigwigSummary tool (fold change >1.5, nominal Wilcoxon P < 0.05; n = 1 gain, n = 2,368 loss). Read density plots were generated using the deepTools computeMatrix and plotHeatmap tools.
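The gene-body coverage criterion above (>40% of the gene body covered by called peaks) can be illustrated with a minimal sketch. This is not the pipeline used in the study; the function names, the half-open coordinate convention, and the single-chromosome simplification are assumptions made for illustration only.

```python
from typing import List, Tuple

# Assumption: half-open [start, end) intervals, all on the same chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so each base is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(gene: Interval, peaks: List[Interval]) -> float:
    """Fraction of the gene body overlapped by (merged) peaks."""
    g_start, g_end = gene
    length = g_end - g_start
    if length <= 0:
        return 0.0
    covered = 0
    for p_start, p_end in merge_intervals(peaks):
        overlap = min(g_end, p_end) - max(g_start, p_start)
        if overlap > 0:
            covered += overlap
    return covered / length


def is_h3k36me3_bound(gene: Interval, peaks: List[Interval],
                      min_fraction: float = 0.40) -> bool:
    """Apply the >40% gene-body coverage threshold described in the text."""
    return covered_fraction(gene, peaks) > min_fraction
```

Peaks are merged before counting so that overlapping windows do not inflate the covered fraction; the comparison is strict, matching the ">40%" wording.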
Library preparation, sequencing, and postprocessing of raw data were performed at the Epigenomics Core at WCM or the New York Genome Center. Libraries were prepared using the Illumina TruSeq Stranded mRNA Library Prep Kit (Illumina, 20020594) and sequenced with PE50 paired-end sequencing, performed on an Illumina NovaSeq 6000 sequencer. Sequencing results were aligned to mm10 using STAR and annotated to RefSeq using the Rsubread R package. Differential gene expression was identified using the edgeR package (80) with thresholds of fold change >1.5 and P < 0.01, adjusted for multiple testing using Benjamini–Hochberg correction. Hierarchical clustering was performed using Euclidean distance of log transcripts per million values of genes. GSEA was performed using the GSEA algorithm (81), and pathway analysis was performed using the PAGE algorithm (82).
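The differential expression thresholds above can be sketched as follows. This is an illustration only and does not reproduce the edgeR workflow; the function names are hypothetical, and treating the fold-change cutoff as symmetric (up- or downregulated by more than 1.5-fold) is an assumption.

```python
import math
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(results: Dict[str, Tuple[float, float]],
                    min_fold: float = 1.5,
                    max_padj: float = 0.01) -> List[str]:
    """results maps gene -> (log2 fold change, raw P value)."""
    genes = list(results)
    padj = benjamini_hochberg([results[g][1] for g in genes])
    cutoff = math.log2(min_fold)  # assumption: symmetric cutoff on |log2FC|
    return [g for g, q in zip(genes, padj)
            if abs(results[g][0]) > cutoff and q < max_padj]
```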
PRO-seq Data Preparation and Processing
PRO-seq libraries were prepared according to previously described protocols (50), with the following exceptions: the run-on Master Mix contained biotinylated nucleotides at 10 mmol/L Biotin-11-ATP, 10 mmol/L Biotin-11-GTP, 100 mmol/L Biotin-11-CTP, and 100 mmol/L Biotin-11-UTP, and RNA digestion by base hydrolysis in 0.2N NaOH on ice was reduced from 8 to 6 minutes. Briefly, chromatin from 1e6 cells per sample was mixed with a 1:10,000 ratio of S2 chromatin, and mouse reads in each sample were normalized by dividing by the total number of S2 reads in the same sample. Libraries were prepared using adapters that contain a 6-bp unique molecular identifier sequence on read1. Libraries were competitively aligned to a genome generated by merging the mm10 assembly with the D. melanogaster dm3 genome assembly. Alignment was performed using the proseq2.0 pipeline developed by the Danko lab (https://github.com/Danko-Lab/proseq2.0) using the parameters -PE --RNA5=R2_5prime --UMI1=6.
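The spike-in normalization described above amounts to dividing each mouse read count by the total number of S2 (fly) reads in the same sample. A minimal sketch follows; the function name and the dictionary layout are hypothetical and are not taken from the proseq2.0 pipeline.

```python
from typing import Dict


def spike_in_normalize(mouse_counts: Dict[str, int],
                       s2_total_reads: int) -> Dict[str, float]:
    """Divide per-gene mouse read counts by the sample's total S2 read count."""
    if s2_total_reads <= 0:
        raise ValueError("S2 spike-in read count must be positive")
    return {gene: count / s2_total_reads
            for gene, count in mouse_counts.items()}
```

For example, a sample with 200 mouse reads at a gene and 100 S2 reads in total would yield a normalized value of 2.0 for that gene.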
Downstream analysis was performed in R using Genomic (83) and BRGenomics 1.1.3 (https://mdeber.github.io/index.html). Gene expression was quantified using the GENCODE v20 annotations in mouse, and differential expression was quantified using DESeq2 (84). The total number of reads around each transcription start site (TSS) or within gene bodies of annotated GENCODE v20 genes was counted. TSS regions were defined as a 200-bp window centered on gene start sites, whereas gene bodies were defined as the entirety of the gene excluding the first and last 300 bp from the TSS and the transcription end site, respectively. Raw PRO-seq counts were used as input to DESeq2 along with the mouse/fly ratios as scaling factors.
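The TSS and gene-body windows described above can be expressed as a short sketch. The analysis itself was performed in R; this Python version is only an illustration, and the `Gene` type, the half-open coordinates, and the choice to skip genes too short to trim are assumptions.

```python
from typing import NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    start: int   # leftmost coordinate (0-based, inclusive)
    end: int     # rightmost coordinate (exclusive)
    strand: str  # "+" or "-"


def tss_window(gene: Gene, width: int = 200) -> Tuple[int, int]:
    """Window of `width` bp centered on the strand-aware gene start site."""
    tss = gene.start if gene.strand == "+" else gene.end
    half = width // 2
    return (max(0, tss - half), tss + half)


def gene_body(gene: Gene, trim: int = 300) -> Optional[Tuple[int, int]]:
    """Gene excluding the first and last `trim` bp; None if nothing remains."""
    start, end = gene.start + trim, gene.end - trim
    if start >= end:
        return None
    return (start, end)
```

Because the same number of bases is trimmed from both ends, the gene-body interval does not depend on strand, whereas the TSS window does.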
Omni-ATAC-seq was performed as previously described (85). Nuclei were prepared from 50,000 Setd2wt/wt and Setd2wt/− CBs or CCs and incubated with 2.5 μL transposase (Illumina, 15028212) in a 50 μL reaction volume for 30 minutes at 37°C. Following purification of transposase fragmented DNA, the library was amplified by PCR and subjected to high-throughput sequencing on a Hi-Seq 2 × 150 bp sequencer (Illumina) at GENEWIZ.
Library preparation, sequencing, and postprocessing of raw data were performed at the New York Genome Center. DNA quality was confirmed by Fragment Analyzer (Advanced Analytics), and all samples had a genomic quality number above 8.9. WGS libraries were prepared using the TruSeq DNA PCR-Free Library Preparation Kit in accordance with the manufacturer's instructions. Briefly, 1 μg of DNA was sheared using a Covaris LE220 sonicator (Adaptive Focused Acoustics). DNA fragments underwent bead-based size selection and were subsequently end-repaired, adenylated, and ligated to Illumina sequencing adapters. Final libraries were evaluated using fluorescent-based assays including qPCR with the Universal KAPA Library Quantification Kit and Fragment Analyzer (Advanced Analytics) or BioAnalyzer (Agilent 2100). Libraries were sequenced on an Illumina NovaSeq sequencer (v1.5 chemistry) using 2 × 150 bp cycles.
Whole-genome data were processed on the New York Genome Center automated pipeline. Paired-end 150-bp reads were aligned to the GRCm38 mouse reference using the Burrows-Wheeler aligner (BWA-MEM v0.7.8) and processed using the Genome Analysis Toolkit (GATK) best-practices workflow that includes marking of duplicate reads using Picard tools (v1.83; http://picard.sourceforge.net), local realignment around insertions/deletions (indels), and base quality score recalibration via GATK v3.4.0. Tumor and normal bam files were processed through the New York Genome Center's variant calling pipeline consisting of MuTect2 (GATK v22.214.171.124; ref. 86), Strelka2 (v2.9.3; ref. 87), and Lancet (v1.0.7; ref. 88) for calling SNVs and short indels; SvABA (v0.2.1; ref. 89) for calling indels and SVs; Manta (v1.4.0; ref. 90) and Lumpy (v0.2.13; ref. 91) for calling SVs; and FACETS (v0.5.5; ref. 92), EXCAVATOR2 (v1.1.2; ref. 93), and BIC-seq2 (v0.2.6; ref. 94) for calling copy-number variants. Calls were merged by variant type [SNVs, multinucleotide variants (MNV), indels, and SVs]. SVs were converted to bedpe format, all SVs below 500 bp were excluded, and the rest were merged across callers using bedtools (95) pairtopair (slop of 300 bp, same strand orientation, and 50% reciprocal overlap). SNVs and indels with minor allele frequency of 1% or higher in either 1000 Genomes (phase III) or gnomAD (r.2.0.1) were removed. Allele counts and frequencies were used to filter the somatic callset. Variants were filtered if (i) the variant allele frequency (VAF) in the tumor sample was less than 0.0001, (ii) the VAF in the normal sample was greater than 0.2, or (iii) the depth at the position was less than 2 in either the tumor or normal sample. Variants were also filtered if the VAF in the normal sample was greater than the VAF in the tumor sample.
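The allele-count filters listed above can be summarized in a short sketch. This is not the New York Genome Center pipeline; the `Call` record and function names are hypothetical, and only the four stated criteria are encoded.

```python
from dataclasses import dataclass


@dataclass
class Call:
    tumor_alt: int     # reads supporting the variant in the tumor
    tumor_depth: int   # total reads at the position in the tumor
    normal_alt: int    # reads supporting the variant in the matched normal
    normal_depth: int  # total reads at the position in the matched normal


def vaf(alt: int, depth: int) -> float:
    """Variant allele frequency; 0.0 when there is no coverage."""
    return alt / depth if depth > 0 else 0.0


def passes_filters(call: Call) -> bool:
    """Return True if the call survives the somatic filters described in the text."""
    if call.tumor_depth < 2 or call.normal_depth < 2:
        return False  # (iii) depth below 2 in either sample
    tumor_vaf = vaf(call.tumor_alt, call.tumor_depth)
    normal_vaf = vaf(call.normal_alt, call.normal_depth)
    if tumor_vaf < 0.0001:
        return False  # (i) tumor VAF too low
    if normal_vaf > 0.2:
        return False  # (ii) normal VAF too high
    if normal_vaf > tumor_vaf:
        return False  # normal VAF exceeds tumor VAF
    return True
```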
Variant rearrangement junctions were identified using SvABA and GRIDSS with standard settings; 1-kb binned read depth was computed and corrected for GC and mappability using fragCounter. Junction-balanced genome graphs were generated from binned read depth and junction calls using JaBbA (57). Rigma are an isolated cluster of deletions at a single chromosomal locus, whereas chromoplexy are multiway reciprocal rearrangements where three or more breakpoints on distinct chromosomes are rearranged. Genome graphs and corresponding genomic data (e.g., binned coverage and allelic bin counts) were visualized using gTrack.
All mutational signature analyses were performed using R (v4.0.1). The R package MutationalPatterns (v3.2.0; ref. 96) was run on the high confidence somatic SNV calls to estimate contributions of known Catalogue of Somatic Mutations in Cancer (COSMIC) mutational signatures (v3) in the tumor sample. Nearest mutation distance (NMD) was computed for SNVs within the same tumor–normal pair and was used to partition SNVs into two groups: clustered (NMD < 1 kb) and nonclustered (NMD ≥ 1 kb) mutations. Mutations were assigned to transcriptional strands, where SNVs on the opposite strand of gene bodies were classified as transcribed. Strand bias was assessed using a Poisson test of strand asymmetry.
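The NMD-based partition described above can be illustrated as follows. The original analysis was performed in R; this Python sketch is an illustration only, and it assumes that distances are computed per chromosome, that each SNV position is unique within a sample, and that an SNV with no neighbor on its chromosome is treated as nonclustered.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Snv = Tuple[str, int]  # (chromosome, position)


def nearest_mutation_distances(snvs: List[Snv]) -> Dict[Snv, float]:
    """Distance from each SNV to its closest neighbor on the same chromosome."""
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in snvs:
        by_chrom[chrom].append(pos)
    nmd: Dict[Snv, float] = {}
    for chrom, positions in by_chrom.items():
        positions.sort()
        for i, pos in enumerate(positions):
            left = pos - positions[i - 1] if i > 0 else math.inf
            right = positions[i + 1] - pos if i < len(positions) - 1 else math.inf
            nmd[(chrom, pos)] = min(left, right)
    return nmd


def partition_by_nmd(snvs: List[Snv], threshold: int = 1000):
    """Split SNVs into clustered (NMD < threshold) and nonclustered groups."""
    nmd = nearest_mutation_distances(snvs)
    clustered = [s for s in snvs if nmd[s] < threshold]
    nonclustered = [s for s in snvs if nmd[s] >= threshold]
    return clustered, nonclustered
```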
Quantitative Real-Time PCR
cDNA synthesis from total RNA was performed using a Verso cDNA Synthesis Kit (Thermo Scientific, AB1453B). Gene expression was detected using the Fast SYBR Green Master Mix (Thermo Scientific, 4385614) on a QuantStudio6 Flex Real-Time PCR System (Thermo Scientific). Gene expression was normalized to HPRT levels using the ΔΔC(t) method, with results presented as mRNA expression.
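For reference, the ΔΔC(t) calculation can be sketched as below. This is the commonly used 2^(−ΔΔCt) formulation and assumes near-100% amplification efficiency; the function name and the choice of control (calibrator) sample are assumptions, because the text does not specify them.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Relative mRNA expression by the 2^(-ddCt) method.

    The reference gene (HPRT in the text) normalizes each sample, and the
    control sample serves as the calibrator.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** (-(delta_sample - delta_control))
```

For example, a target that crosses threshold 2 cycles earlier (relative to the reference gene) in the sample than in the control corresponds to a 4-fold higher relative expression.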
JH4 Intron Sequencing and Assessment of Pim1 Somatic Mutations
GCB cells were isolated using PNA enrichment. DNA was purified using a DNA Clean and Concentrator kit (Zymo Research, D4013). For JH4 intron, sequences were amplified from GCB-cell gDNA by PCR using JH4 forward primer (5′-GGA ATT CGC CTG ACA TCT GAG GAC TCT GC-3′), JH4 reverse primer (5′-GAC TAG TCC TCT CCA GTT TCG GCT GAA TCC -3′; ref. 97), and Phusion Hot Start II DNA polymerase (Thermo Scientific, F549S). PCR conditions used were (98C 3′) × 1, (98C 30″, 72C 1′) × 39, and (72C 10′) × 1. The Pim1 locus was amplified using primers Pim1 forward primer (5′-TTC GGC TCG GTC TAC TCT G-3′) and Pim1 reverse primer (5′-GGA GGG AAA AGT GGG TCA TAC-3′). The PCR program used was (95C 2′) × 1, (95C 1′, 65C 1′, 72C 1′) × 25, and (72C 15′) × 1. PCR products were resolved by agarose gel electrophoresis and corresponding band extracted using the QIAquick Gel Extraction Kit (Qiagen, 28704).
Purified PCR products were cloned into the Zero Blunt TOPO PCR Cloning Kit (Thermo Scientific, 450245) and grown overnight at 37°C on Kanamycin agar plates as per the manufacturer's instructions. Bacteria colony sequencing was performed by GENEWIZ using the T7 universal sequencing primer (5′-TAA TAC GAC TCA CTA TAG GG-3′) for the Pim1 locus and the JH4 sequencing primer (5′-CCA TAC ACA TAC TTC TGT GTT CC-3′) for the JH4 locus. Mutation mismatch counts were calculated from Sanger sequencing and compared with the consensus sequence using the Basic Local Alignment Search Tool (BLAST).
Genomic DNA was isolated from murine tumors using the QIAGEN DNeasy Blood and Tissue Kit (Qiagen, 69504), and 1 μg of genomic DNA per sample was used for sequencing. Sample data were generated using the immunoSEQ Assay (Adaptive Biotechnologies). The somatically rearranged mouse IGH complementarity-determining region 3 (CDR3) was amplified from genomic DNA using a two-step, amplification bias–controlled multiplex PCR approach (98). The first PCR consists of forward and reverse amplification primers specific for every V and J gene segment and amplifies the hypervariable CDR3 of the immune receptor locus. The second PCR adds a proprietary barcode sequence and Illumina adapter sequences. BCR repertoire analyses were performed using the immunoSEQ Analyzer 3.0 (Adaptive Biotechnologies). Sequences were subjected to analysis using IMGT/Vquest software to define all V, D, and J genes as well as CDR3 sequences.
Histone Mass Spectrometry
Mouse B cells were sorted by FACS as described above with 10^5 cells collected into 2N H2SO4, as described by Camarillo and colleagues (99). Cellular debris was removed by centrifugation at 4,000 × g for 5 minutes and histones were precipitated with trichloroacetic acid at 20% (v/v) overnight at 4°C. Histones were pelleted by centrifugation at 10,000 × g for 5 minutes and washed with 0.1% HCl in acetone, followed by a wash in acetone and centrifugation at 15,000 × g for 5 minutes. Pellets were dried in a fume hood and stored at −80°C. Derivatization and digestions were performed based on the procedure described by Garcia and colleagues (100). Dried histones were resuspended in 50 mmol/L ammonium bicarbonate and sodium hydroxide, and then propionic anhydride was added to the histone solution and adjusted to pH 8 with additional sodium hydroxide. Samples were incubated at 52°C for 1 hour before drying to completion in a SpeedVac concentrator. Propionylated histones were resuspended in 50 mmol/L ammonium bicarbonate and digested for 16 hours with 0.5 μg trypsin. Digests were dried in a SpeedVac concentrator and subjected to a final propionylation as described above.
Samples were resuspended in water with 0.1% trifluoroacetic acid (TFA) and analyzed by nano-LC (Dionex) on a TSQ Quantiva triple quadrupole mass spectrometer (Thermo Scientific). Peptides were loaded on a 3 cm × 150 μm trapping column, packed with ProntoSIL C18-AQ, 3 μm, 200 Å resin (New Objective) in water with 0.1% TFA for 10 minutes at 2.5 μL/minute. The peptides were eluted at 0.30 μL/minute from the trapping and PicoChip analytical column, 10 cm × 75 μm packed with ProntoSIL C18-AQ, 3 μm, 200 Å resin (New Objective) over a 45-minute gradient from 1% to 35% Nano Pump Solvent B (95% acetonitrile with 0.1% formic acid; Nano Pump Solvent A, water with 0.1% formic acid). Ions were produced by electrospray from a 10-μm emitter tip and introduced into the mass spectrometer with the following settings: collision gas pressure of 1.5 mTorr; Q1 peak width of 0.7 (FWHM); cycle time of 3 s; skimmer offset of 10 V; and electrospray voltage of 2.5 kV. All injections were performed in technical triplicate. Targeted analysis of unmodified and various modified histone peptides was performed with transitions specific to each peptide species as described previously. Raw MS files were analyzed with Skyline (v4.1) using Savitzky-Golay smoothing, and peak area assignments were manually assessed. The percent relative abundance of each histone posttranslational modification was calculated from the total peak areas exported from Skyline.
Cells were collected and washed twice with PBS. The pellet was resuspended in 200 μL of 1× Abcam Working Lysis buffer with protease inhibitor (Abcam, Ab117152) and incubated on ice for 10 minutes, followed by a 10-second vortex. The supernatant was removed and saved for analysis as the soluble fraction; 100 μL of 1× Working Extraction Buffer was added to the chromatin pellet, which was resuspended and incubated on ice for 10 minutes, followed by a 10-second vortex. Samples were sonicated twice for 20 seconds each and then centrifuged at 12,000 rpm for 10 minutes at 4°C. Chromatin supernatant was transferred to a new tube, followed by a 1:1 addition of chromatin buffer, and stored at −80°C.
Protein concentration was quantified using the BCA protein assay kit (Thermo Scientific, 23225), with a BSA standard curve. Samples were resolved by SDS-polyacrylamide gel electrophoresis, transferred to PVDF membranes, and probed with the primary antibodies LEDGF (1:2,000 dilution, Thermo Fisher Scientific, MA5-14821, RRID:AB_11009140), histone H3 (1:2,000 dilution, Abcam, Ab1791, RRID:AB_302613), and MEK1/2 (1:2,000 dilution, Cell Signaling Technology, 8727, RRID:AB_10829473), followed by horseradish peroxidase–conjugated secondary antibodies (1:5,000 dilution, Cell Signaling Technology, 7074, RRID:AB_2099233), and detected with chemiluminescence (WBKLS0500, Millipore). Densitometry was performed using ImageJ (101).
In Vitro CSR
Purified splenic resting B cells were cultured with B-cell media containing RPMI, 15% FBS, penicillin/streptomycin, 55 nmol/L β-mercaptoethanol, and 2 mmol/L L-glutamine. Induction of IgG1 switching was performed by the addition of LPS (33 μg/mL, Sigma, L4130) and mIL4 (12.5 ng/mL, R&D Systems, 404-ML). Induction of IgG3 switching was performed by the addition of LPS (33 μg/mL, Sigma, L4130).
In Silico Prediction of Mutation Impact
Secondary Tumor Transplantation
Lymph node tumors from BCL2 and Setd2/BCL2 animals were collected, and single-cell suspensions were prepared, followed by red blood cell lysis and resuspension in PBS. A total of 10 × 10^6 cells were injected into the peritoneal cavity of RAG1KO mice, which were monitored until any one of the criteria for euthanasia was met.
The DLBCL cell lines OCI-Ly7 and RIVA were grown in Iscove's modified Dulbecco's media (Thermo Scientific, 12440061) supplemented with 10% FBS (OCI-Ly7) or 20% FBS (RIVA) and antibiotics; SU-DHL-4, HBL1, and SU-DHL-2 were grown in Roswell Park Memorial Institute media (Corning, 10-040-CV) supplemented with 10% FBS (SU-DHL-4, HBL1) or 20% FBS (SU-DHL-2) and antibiotics. OCI-Ly7 was obtained from the Ontario Cancer Institute (OCI), SU-DHL-4 and RIVA from the German Collection of Microorganisms and Cell Cultures GmbH (DSMZ), SU-DHL-2 from the ATCC, and HBL1 from Jose A. Martinez-Climent (Centre for Applied Medical Research, CIMA). All cells were grown in a 37°C incubator at a 5% CO2 environment. Cell line authentication was performed on all parental and CRISPR KO cell lines at the University of Arizona Genetics Core using the short tandem repeat assay, and genetic profiles were compared with established cell line profiles. Cell lines were also routinely tested for Mycoplasma contamination in the laboratory.
Generation of SETD2 KO Cell Lines
To generate SETD2 KO cell lines (HBL1, OCI-Ly7, SU-DHL-4, and SU-DHL-2), parental cells were transduced with lentivirus expressing doxycycline-inducible Cas9 and the blasticidin resistance gene (Addgene, 83481), followed by 5 days of blasticidin selection. Cas9-expressing cells were transduced with lentivirus expressing a SETD2-specific sgRNA (SETD2G#1: 5′ AAA GAA ACA ATA GTA GAA GT 3′; SETD2G#2: 5′ AAT CTG ATG AAG ATT CTG TA 3′) and GFP (Addgene, 57822). Four days after doxycycline induction, GFP+ cells were single cell–sorted into 96-well plates and allowed to grow for at least 2 weeks. For RIVA, parental cells were electroporated with Amaxa Nucleofector Unit and the SF Cell Line 4D-Nucleofector X Kit (Lonza, PBC2-22500) to incorporate a recombinant Cas9 nuclease (Alt-R S.p. Cas9 Nuclease V3, Integrated DNA Technologies, 1081058), a SETD2-targeting Alt-R CRISPR–Cas9 crRNA (SETD2G#1: 5′ AAA GAA ACA ATA GTA GAA GT 3′; SETD2G#2: 5′ AAT CTG ATG AAG ATT CTG TA 3′; Integrated DNA Technologies), and an Alt-R CRISPR–Cas9 tracrRNA (Integrated DNA Technologies, 1075927) using the manufacturer's protocol. Forty-eight hours after electroporation, ATTO550+ single cells were sorted into 96-well plates and allowed to grow for at least 2 weeks. Clones were screened by PCR amplification of a 500-bp region encompassing the CRISPR–Cas9 cleavage site and verified by Sanger sequencing at GENEWIZ.
Quantification and Statistical Analyses
Data were analyzed with GraphPad Prism 7.01. The number of replicates, type of measurement, and statistical significance are reported in the figures and figure legends. Differences were considered statistically significant at P < 0.05 by two-sided t test or one-way ANOVA, with asterisks denoting the degree of significance (*, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001). Fisher exact tests were used for single or multiple pair-wise comparisons.
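The asterisk convention above can be written as a small lookup. The sketch below is illustrative only and does not reproduce GraphPad Prism; the function name `significance_stars` and the "ns" label for nonsignificant results are assumptions, because the text defines only the four asterisk tiers.

```python
# Illustrative mapping from a P value to the asterisk notation in the text.
# "ns" for P >= 0.05 is an assumed label; the paper defines only the asterisks.

# Thresholds ordered from most to least stringent; all comparisons are strict (<).
_THRESHOLDS = [
    (0.0001, "****"),
    (0.001, "***"),
    (0.01, "**"),
    (0.05, "*"),
]


def significance_stars(p: float) -> str:
    """Return the asterisk label for a P value, or 'ns' if P >= 0.05."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"P value must lie in [0, 1], got {p!r}")
    for cutoff, stars in _THRESHOLDS:
        if p < cutoff:
            return stars
    return "ns"


if __name__ == "__main__":
    for p in (0.2, 0.03, 0.004, 0.0005, 0.00002):
        print(p, significance_stars(p))
```

Because the thresholds are strict inequalities, a P value exactly equal to a cutoff falls into the less stringent tier (for example, P = 0.01 is labeled with one asterisk, not two).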
Data Availability
RNA-seq, CUT&RUN, PRO-seq, and ATAC-seq data have been deposited in the Gene Expression Omnibus database under accession number GSE189867. WGS data have been deposited in the BioProject database under accession number PRJNA830503.
Authors' Disclosures
W. Leung reports grants from the Leukemia & Lymphoma Society and the NCI during the conduct of the study. C. Meydan reports personal fees from Thorne HealthTech outside the submitted work. J.M. Camarillo reports grants from the NIH during the conduct of the study, as well as grants from the NIH outside the submitted work. C.R. Flowers reports grants from the Cancer Prevention & Research Institute of Texas (CPRIT Scholar in Cancer Research) during the conduct of the study, as well as personal fees from AstraZeneca, Bayer, BeiGene, BioAscend, Bristol Myers Squibb, Celgene, Curio Sciences, Denovo Biopharma, Epizyme/Incyte, Genentech/Roche, Genmab, MEI Pharmaceuticals, MorphoSys AG, Pharmacyclics/Janssen, and Seagen and grants from 4D, AbbVie, Acerta, Adaptimmune, Allogene, Amgen, Bayer, Celgene, Cellectis, EMD, Gilead, Genentech/Roche, Guardant, Iovance, Janssen Pharmaceutical, Kite, Morphosys, Nektar, Novartis, Pfizer, Pharmacyclics, Sanofi, Takeda, TG Therapeutics, Xencor, Ziopharm, the Burroughs Wellcome Fund, the Eastern Cooperative Oncology Group, the NCI, and V Foundation outside the submitted work. M. Imielinski reports personal fees from ImmPACT Bio outside the submitted work. S.S. Dave reports other support from Data Driven Bioscience outside the submitted work. S.A. Armstrong reports grants from the NIH (CA176745 and CA066996) during the conduct of the study, as well as personal fees from C4 Therapeutics, Neomorph, Mana Therapeutics, Accent Therapeutics, Cyteir, Imago Biosciences, and Vitae/Allergan and grants from Syndax, Janssen, and Novartis outside the submitted work. C.E. Mason reports personal fees from Tempus Labs outside the submitted work. A.M. Melnick reports grants from the Leukemia & Lymphoma Society and the NCI during the conduct of the study, as well as grants and personal fees from Epizyme and Janssen, and personal fees from Daiichi Sankyo and AstraZeneca outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
W. Leung: Conceptualization, resources, data curation, formal analysis, supervision, investigation, visualization, methodology, writing–original draft, writing–review and editing. M. Teater: Data curation, formal analysis, visualization, methodology, writing–review and editing. C. Durmaz: Data curation, formal analysis, visualization, methodology, writing–review and editing. C. Meydan: Data curation, formal analysis, visualization, methodology, writing–review and editing. A.G. Chivu: Data curation, formal analysis, visualization, methodology, writing–review and editing. A. Chadburn: Resources, formal analysis, investigation, writing–review and editing. E.J. Rice: Investigation, writing–review and editing. A. Muley: Investigation, writing–review and editing. J.M. Camarillo: Investigation, writing–review and editing. J. Arivalagan: Investigation, writing–review and editing. Z. Li: Formal analysis, visualization. C.R. Flowers: Resources. N.L. Kelleher: Resources, methodology, writing–review and editing. C.G. Danko: Resources, methodology, writing–review and editing. M. Imielinski: Resources, supervision, methodology, writing–review and editing. S.S. Dave: Resources, writing–review and editing. S.A. Armstrong: Resources, methodology, writing–review and editing. C.E. Mason: Resources, supervision, methodology, writing–review and editing. A.M. Melnick: Resources, supervision, funding acquisition, writing–original draft, project administration, writing–review and editing.
Acknowledgments
The authors express immense gratitude to the late Dr. Kristy Richards (Cornell University, WCM) for first conceiving of this project and for her contributions to lymphoma research. They also acknowledge Dr. Surya Seshan (Department of Pathology and Laboratory Medicine, WCM) for support in interpreting histologic disease. The work was supported by the following grants awarded to A.M. Melnick: NCI/NIH R35 CA220499, NCI/NIH P01 CA229086-01A1, LLS-SCOR 7012-16, LLS-TRP 6572-19, the Samuel Waxman Cancer Research Foundation, the Follicular Lymphoma Consortium, and the Chemotherapy Foundation. N.L. Kelleher is supported by NIGMS/NIH P41 GM108569. S.A. Armstrong is supported by the NIH grants CA176745 and CA066996. C.E. Mason thanks the Scientific Computing Unit (SCU), XSEDE Supercomputing Resources, as well as the Starr Cancer Consortium (I13-0052), the Vallee Foundation, the WorldQuant Foundation, the Pershing Square Sohn Cancer Research Alliance, the NIH (R01CA249054 and R35GM138152), and the Leukemia & Lymphoma Society grants MCL7001-18, LLS 9238-16, and LLS-MCL7001-18. The authors thank the Laboratory of Comparative Pathology, the Epigenomics Core, and the Flow Cytometry Core Facility at WCM.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.