Abstract
RNA splicing dysregulation underlies the onset and progression of cancers. In chronic lymphocytic leukemia (CLL), spliceosome mutations leading to aberrant splicing occur in ∼20% of patients. However, the mechanism for splicing defects in spliceosome-unmutated CLL cases remains elusive. Through an integrative transcriptomic and proteomic analysis, we discover that proteins involved in RNA splicing are posttranscriptionally upregulated in CLL cells, resulting in splicing dysregulation. The abundance of splicing complexes is an independent risk factor for poor prognosis. Moreover, increased splicing factor expression is highly correlated with the abundance of METTL3, an RNA methyltransferase that deposits N6-methyladenosine (m6A) on mRNA. METTL3 is essential for cell growth in vitro and in vivo and controls splicing factor protein expression in a methyltransferase-dependent manner through m6A modification-mediated ribosome recycling and decoding. Our results uncover METTL3-mediated m6A modification as a novel regulatory axis in driving splicing dysregulation and contributing to aggressive CLL.
METTL3 controls widespread splicing factor abundance via translational control of m6A-modified mRNA, contributes to RNA splicing dysregulation and disease progression in CLL, and serves as a potential therapeutic target in aggressive CLL.
See related commentary by Janin and Esteller, p. 176.
This article is highlighted in the In This Issue feature, p. 171
INTRODUCTION
Alternative splicing is a fundamental process that generates functional diversity in the transcriptome (1, 2). Advancement of RNA sequencing (RNA-seq) technology and computational algorithms for quantitative splicing analysis coupled with clinical sample annotations has not only identified RNA splicing dysregulation as a hallmark of cancer but also revealed that it underlies transformation, disease progression, and resistance to therapy (3–10). Understanding the molecular mechanisms of splicing dysregulation is key to dissecting the origins and progression of cancer and designing better therapeutic approaches.
Chronic lymphocytic leukemia (CLL), the most common form of adult mature B-cell leukemia in North America, displays transcriptome-wide RNA splicing defects (11, 12). Large-scale cancer genome sequencing has identified recurrent mutations of spliceosomal components (e.g. SF3B1) in CLL (13–16). These mutations alter splicing specificity and activate pathways that are essential for disease onset and progression (10, 16, 17). As exemplified in murine models, coexpression of Sf3b1 mutations with Atm deletion in murine B cells triggers the onset of CLL (8, 10), highlighting splicing dysregulation as a central underlying pathway in CLL. However, mutations of splicing factors are present in only about 20% of CLL samples and cannot fully explain the general splicing defects observed in this disease (18). This led us to hypothesize that transcriptional and posttranscriptional regulation of RNA splicing contributes to leukemogenesis.
N6-methyladenosine (m6A) is the most abundant and conserved internal modification of mRNA and regulates many aspects of RNA metabolism, including RNA splicing (19–24). As a key player in transcriptional and posttranscriptional control of gene expression, m6A modification is dynamically regulated by writers (METTL3/METTL14/WTAP, METTL16), erasers (ALKBH5, FTO), and readers (YTHDF1/2/3, YTHDC1/2, IGFBP1/2/3, HNRNPC, HNRNPG; refs. 22, 25–29). Emerging evidence reveals m6A modification is required for normal hematopoiesis, and alteration of m6A-modified transcripts leads to hematologic malignancies (30–32). However, the exact roles of m6A modification and its critical regulators in RNA splicing dysregulation in CLL are largely unknown. In the current study, we set out to examine the nongenetic regulation of RNA splicing dysregulation in CLL by METTL3 and m6A modification.
RESULTS
Posttranscriptional Upregulation of RNA Splicing Pathway in CLL
To study the spliceosome mutation-independent splicing defects in CLL, we analyzed RNA-seq data of B cells derived from 6 healthy donors and 36 CLL patients (13 with and 23 without SF3B1 mutations; Supplementary Table S1) using a bioinformatic pipeline that integrates transcript isoform quantification algorithms StringTie (33), LeafCutter (34), and rMATs (35) to maximally improve the power of detection of alternative splicing events, including de novo junctions. When comparing CLL with normal B cells, we identified that 5,545 and 4,017 alternative splicing events were associated with or without SF3B1 mutations, respectively, with 2,553 splicing events shared (CLL-associated events; Fig. 1A and B). When comparing CLL cells with or without SF3B1 mutations, 3,197 splicing events were differentially spliced, with 136 of these among the CLL-associated events (Fig. 1B). The proportion of shared splice variants between normal and CLL with or without SF3B1 mutations (Fig. 1B, blue and green) is greater than that of SF3B1 mutation-specific events shared with either group (Fig.1B, red, proportion test, P < 0.0001). Of note, splice variants associated with CLL had pervasive changes in intron retention (Fig. 1C), irrespective of SF3B1 mutations. As previously reported (17), SF3B1-mutant samples had cryptic 3′ splicing site selection through aberrant branch point usage (Supplementary Fig. S1A and S1B). Taken together, our results suggest that, alongside spliceosomal mutations, other shared regulators impact the alternative RNA splicing in CLL.
To discover regulators for RNA splicing, we performed transcriptomic and quantitative tandem mass tag (TMT) proteomic analyses using B cells derived from healthy donors (n = 10) and untreated CLL patients (n = 22; Fig. 1D; Supplementary Fig. S2A–S2C). 3,230 proteins were consistently detected across all samples. 2,912 genes and 328 proteins were differentially expressed between normal and CLL B cells based on RNA-seq and proteomics data, respectively (Supplementary Table S2). There is a general correlation between mRNA and protein expression, and as expected, CLL tumor-specific genes CD5 and ZAP70 were upregulated at both the RNA and protein levels (Fig. 1D). Interestingly, gene set enrichment analysis (GSEA) revealed that the RNA splicing process was highly upregulated at the protein level but not at the RNA level (Fig. 1E and F). This observation was further validated in two independent proteomic, as well as transcriptomic, data sets (refs. 17, 36–38; Supplementary Fig. S2D and S2E), highlighting that the RNA splicing pathway is likely upregulated posttranscriptionally in CLL.
Splicing Factor Upregulation Results in Splicing Dysregulation and Is an Independent Risk Factor in CLL
To determine whether the abundance of splicing factors impacts RNA splicing, we designed a dosage-dependent splicing (DDS) score system to assess protein dosage-dependent RNA splicing using matched proteomic and RNA-seq data derived from 19 CLL samples (Fig. 2A). Specifically, for each protein, we first identified two subsets of CLL samples, high (top 5 samples) and low (bottom 5 samples), based on its protein expression. Then, we systematically quantified alternative splicing events associated with the protein expression using matched RNA-seq data (Fig. 2A). Last, we assigned to each protein a DDS score, which is the number of splicing events the protein affected normalized to the protein-level difference between the high and low groups, and ranked all proteins by DDS score (Fig. 2B). A high DDS score indicates more splicing events corresponding to protein-level change. Proteins within the upper quartile of the DDS score were significantly enriched in the spliceosome complex, especially U2-type and tri-snRNP spliceosome (Fig. 2C). Upon overlapping DDS events derived from the top 10 splicing factors with 2,417 CLL-associated and SF3B1 mutation-independent splicing events (Fig. 1B; Supplementary Fig. S2F), we observed more than 35% overlap (Fig 2D, permutation test, P = 0.0087), with overlapped events significantly enriched for CLL-related pathways (Fig. 2E). These results suggest that upregulation of splicing factors indeed contributes to RNA splicing dysregulation in CLL by mechanisms yet to be elucidated.
To explore the impacts of splicing factor abundance on CLL pathogenesis, we examined the associations of spliceosome complexes and RNA-binding proteins [serine and arginine-rich (SR) and heterogeneous nuclear ribonucleoprotein (hnRNP) proteins] with clinical outcomes through a combined proteomic data set, which includes 123 samples (10 normal B cell and 113 CLL samples, 2,703 proteins) with 91 samples from a public data set (36). Because splicing factor abundance is highly correlated, we utilized “eigengenes” (39) to summarize the expression dynamics within spliceosome complexes (Fig. 2F). Eigengene is defined as the first principal component of the expression matrix, maximally allowing the diversity among different splicing factors and making it possible to calculate the associations with other features. The abundance of overall spliceosome complex was predictive of a shorter time to first treatment (TTFT) and retained significance when IGHV mutation status and 17p deletion were added (Fig. 2G and H). In particular, the abundance of spliceosome complexes A, B, Bact, C, P, as well as hnRNP proteins was all significantly associated with TTFT (Supplementary Fig. S3A). Moreover, a higher abundance of hnRNP proteins and SR proteins was also associated with inferior overall survival (OS; Supplementary Fig. S3B). Similar results were observed when only considering SF3B1 wild-type (WT) CLL samples (Supplementary Fig. S3C and S3D). These significant associations highlight that the abundance of splicing factors affects the progression of CLL.
METTL3 Potentially Regulates Splicing Factors in CLL
To decipher the mechanism underlying the widespread upregulation of splicing factors, we performed a correlational analysis between splicing factors and nonsplicing factors based on protein expression. As for each nonsplicing factor, we calculated the Pearson correlation coefficient for every detected splicing factor and then ranked these nonsplicing factors according to the number of correlation edges. METTL3, as one of the proteins with the most correlated edges with the detected splicing factors (111/143), had a 77.6% significant positive correlation (Fig. 3A). Importantly, these splicing factors were involved in splicing complexes with clinical associations (Fig. 3B). The other known RNA methyltransferase, METTL16, also showed a 58.7% positive correlation with splicing factors. These results imply that m6A writer proteins may serve as the posttranscriptional regulators for splicing factor abundance in CLL.
To explore the role of METTL3 and METTL16 in the regulation of splicing factor abundance, we first validated our proteomics results as well as computational analysis by quantifying protein expression of METTL3 and METTL16 as well as splicing factors (SF3B1, SF3A1, SF3A2, and SF3A3) using independent cohorts of samples with immunoblot (Fig. 3C; Supplementary Fig. S4A). Indeed, CLL cells exhibited upregulation of METTL3 and METTL16, which were positively correlated with spliceosome component expression. METTL3 and METTL16 appear to be regulated transcriptionally as they were consistently upregulated at both RNA and protein levels in CLL (Supplementary Fig. S4B). However, we observed no apparent RNA expression changes of splicing factors (SF3A1, SF3A2, and SF3B1), corroborated with results from RNA-seq analysis.
To assess the impacts of METTL3 and METTL16 on RNA splicing dysregulation in CLL, we identified 2,010 and 1,927 METTL3- and METTL16-associated DDS events using matched CLL RNA-seq and proteomics data. It appears that these DDS events are enriched for different cellular processes (Supplementary Fig. S4C). Consistent with this, only 56 events were shared, suggesting that METTL3 and METTL16 affect RNA splicing on different substrates (Supplementary Fig. S4D). Through the integration of CLL proteomic data with clinical annotation, we found that higher METTL3 but not METTL16 protein expression was associated with a shorter time to first therapy (log-rank test, P = 0.04; Fig. 3D; Supplementary Fig. S4E), suggesting METTL3 plays a role in driving aggressive CLL.
CLL Cells Exhibit Increased m6A Enriched for Transcripts Associated with mRNA Processing
Increased protein abundance of METTL3 in CLL motivated us to examine global m6A levels in these cells. Using liquid chromatography mass spectrometry (LC-MS) and m6A dot-blot assays, we detected an increase of m6A level (1.5–2-fold) in the mRNA derived from CLL cells compared with that of normal B cells (Fig. 3E and F; Supplementary Fig. S4F). There is no difference of m6A when analyzing total cellular RNA, confirming that increased m6A modification is specific to mRNA. Leveraging an individual-nucleotide-resolution m6A modification sequencing technology (MAZTER-seq; refs. 40, 41), we next mapped m6A sites on mRNA derived from normal (n = 5) and CLL (n = 9) B cells (Fig. 3G). The m6A writer complex preferentially deposits m6A on the consensus sequence, DRA*CH (D = G/A/U, R = G/A, A* = m6A, H = U/A/C) (25), and the m6A modification is enriched around the stop codon. As MAZTER-seq is biased toward ACA sites, we detected GGACA as the most enriched motif within DRACA, accounting for 50% of all the identified m6A sites (Supplementary Fig. S4G). Interestingly, transcripts related to RNA metabolism, mRNA processing, and RNA splicing appear to have a higher m6A density in the B-cell context (Fig. 3H; Supplementary Table S3), suggesting that these pathways are subjected to m6A-dependent modulation. Although CLL and normal B cells shared a similar m6A distribution, we observed an increased m6A level (hyper-m6A) in the 3′UTR regions of CLL cells, concomitant with a decreased m6A level (hypo-m6A) in coding sequence (CDS) regions (Fig. 3I, Binominal test, CDS: P = 0.0017; 3′UTR: P = 0.016). These location-dependent m6A changes might be caused by the upregulation of both m6A writer and eraser proteins in CLL cells (Supplementary Fig. S4A). 680 sites with differential m6A modification were identified between normal and CLL B cells with dysregulated transcripts enriched in RNA metabolism-related pathways, including RNA splicing (Fig. 3J; Supplementary Fig. S4H). These results support that m6A modification preferentially regulates splicing-related transcripts in CLL in a location-dependent manner.
METTL3 Is Essential for Cell Proliferation in CLL Cells
METTL3 has been functionally demonstrated to promote or suppress tumor growth in different cancer types (32, 42). To fully understand the function of METTL3 in CLL, we examined its impact on cell growth using cell lines and primary B cells both in vitro and in vivo. Knockout (KO) of METTL3 with CRISPR/Cas9 technology resulted in substantial inhibition of cell growth in all the mature B cell lines we tested, with Mino and HG3 cells most sensitive to METTL3 depletion (Fig. 4A; Supplementary Fig. S5A). We depleted METTL3 by shRNAs in primary normal and CLL B cells and then cocultured them with CD40L-overexpressing stromal cells. Depletion of METTL3 led to a greater growth disadvantage in CLL cells than in normal B cells (Fig. 4B; Supplementary Fig. S5B, P = 0.007). Our in vivo mouse xenograft of HG3 cells demonstrated a similar trend, with KO of METTL3 inhibiting disease progression and prolonging survival (Fig. 4C, P = 0.0064). Treatment with the METTL3 inhibitor STM2457 also resulted in growth defects in HG3 cells at a level similar to that of MOLM13 cells (Fig. 4D; Supplementary Fig. S5C and S5D), an STM2457-sensitive acute myeloid leukemia (AML) cell line (43). To determine whether METTL3 is a potential therapeutic target for CLL, we established xenograft mice using HG3 cells with doxycycline-inducible METTL3 KO and stable luciferase expression. Doxycycline diet led to a significant delay in tumor growth measured by bioluminescence imaging (Supplementary Fig. S5E). Taken together, the evidence supports that METTL3 is essential to cell growth in mature B-cell leukemia and lymphoma and is a potential therapeutic target for CLL treatment. To expand our understanding of the requirement of m6A and RNA–methylation complexes for cell growth in hematologic malignancies, we utilized DepMap data (https://depmap.org/portal/; ref. 44) to analyze the dependency of genes related to m6A modification in cell lines derived from different hematologic malignancies including AML, acute lymphocytic leukemia, CLL, diffuse large B-cell lymphoma, and multiple myeloma. It appeared that all these cell lines had higher dependency on m6A writers (Supplementary Fig. S5F) than erasers or readers for cell proliferation, highlighting the essentiality of m6A writers across hematologic malignancies.
To determine which domain of METTL3 is required for cell growth, we ectopically expressed full-length WT, or methyltransferase domain catalytic mutant (MUT), or N-terminal domain (NTD, aa1-200)-deleted METTL3 with or without catalytic domain mutation (WT ∆NTD, MUT ∆NTD) in Mino and HG3 cells with KO of METTL3 (Fig. 4E). sgRNAs and METTL3 expression constructs were tagged with mCherry and GFP, respectively. Using growth competition assay to track the ratio of double-positive cells over time, we observed that WT METTL3 completely rescued the growth defect in METTL3 KO cells. However, METTL3 MUT, WT ∆NTD, or MUT ∆NTD all lacked the potential to rescue the phenotype. Although METTL3 has been reported as an oncogene to promote the initiation and development of a variety of cancers (32, 42, 45), we failed to detect significantly enhanced cell proliferation upon overexpression of WT METTL3 in both Mino and HG3 cells (Supplementary Fig. S5G and S5H). Of note, overexpression of either catalytic mutant METTL3 (METTL3 MT) or WT ∆NTD METTL3 led to more drastic growth disadvantage in HG3 cells (Supplementary Fig. S5G) possibly due to competitively binding with another m6A writer protein METTL14, which interfered with its functions. These results highlight that both the catalytic domain and NTD of METTL3 play indispensable roles in cell growth.
Loss of METTL3 Affects the Translation of Splicing Factors
To dissect the underlying molecular mechanism by which METTL3 affects cell growth, we examined if KO of METTL3 affects B-cell receptor signaling, cell cycle, and apoptosis. Loss of METTL3 subtly increased early apoptosis (Supplementary Fig. S6A). Upon anti-IgM stimulation in JeKo-1 cells, AKT pathway activation was slightly reduced in STM2457-treated cells but ERK activation had no change (Supplementary Fig. S6B). Moreover, we observed a small but significant reduction of 5-ethynyl-2′-deoxyuridine (EdU) incorporation during the S phase and higher overall DNA content, suggesting compromised DNA synthesis and mitotic stress (Fig. 4F). Likewise, treatment with STM2457 also led to consistently reduced incorporation of EdU with subtle impact on the cell cycle (Supplementary Fig. S6C and S6D). Taken together, these changes suggest that other mechanisms beyond BCR signaling, cell cycle, or apoptosis may contribute to growth defects observed in METTL3 KO cells.
To define whether METTL3 KO affects translation, we examined O-propargyl-puromycin (OPP) incorporation into newly synthesized proteins in cells with or without METTL3 and found loss of METTL3 resulted in a decrease in OPP incorporation (Fig. 5A; Supplementary Fig. S6E). We further performed an integrated ribosome profiling (Ribo-seq) and RNA-seq analysis on HG3 cells with or without METTL3 KO to determine the transcripts for which METTL3 preferentially affects translation. Consistent with the OPP assay, loss of METTL3 decreased overall translation efficiency (TE; Fig. 5B), defined as the number of ribosome-protected fragments (RPF) divided by the mRNA expression. Analysis of the top 1,000 transcripts with decreased TE revealed that METTL3 KO preferentially affected the transcripts involved in the metabolism of RNA, cell-cycle, RNA splicing, and mRNA processing (Fig. 5B), in which splicing factors (SF3B1, SF3A1, SF3A2, and SF3A3) were confirmed by immunoblot (Fig. 5C). METTL14 expression was also decreased upon KO of METTL3. In contrast, we detected no gene-expression changes at the RNA level, highlighting the importance of posttranscriptional regulation of splicing factors by METTL3 in mature B cells (Fig. 5D).
We next determined whether the overexpression of METTL3 can rescue splicing factor expression. In response to overexpression of WT, but not catalytic mutant METTL3, splicing factor expression and METTL14 abundance were restored in METTL3 KO cells (Fig. 5E), indicating that protein expression of splicing factors is methyltransferase-dependent. Consistent with this observation, treatment with STM2457 also decreased the abundance of splicing factors (Fig. 5F; Supplementary Fig. S6F), suggesting that the NTD is functionally dependent on the catalytic domain of METTL3 in CLL. Furthermore, we found overexpression of SF3B1 in METTL3 KO cells partially rescued the growth defect in HG3 cells, highlighting splicing factors as the targets for METTL3 in CLL (Supplementary Fig. S6G). Apart from METTL3, YTHDF paralogs are reported as m6A reader proteins enhancing the translation of m6A installed transcripts (26, 46). Although KO of YTHDF2 led to growth defects, none of YTHDF paralogs affected splicing factor expression (Supplementary Fig. S6H–S6J). Taken together, our results suggest that METTL3 likely influences the translation of splicing factors by using both catalytic domain and NTD.
To determine the relationship between METTL3-regulated splicing factor expression and RNA splicing dysregulation in CLL, we measured the direct impact of splicing factor protein expression change on RNA splicing. We chose the splicing factor SF3B1 to examine this relationship as it is one of the most consistently upregulated splicing factors in CLL. Splice variants associated with the depletion of either SF3B1 or METTL3 displayed similar splicing pattern change, with intron retention as the most affected splicing type (Fig. 5G; Supplementary Fig. S6K), suggesting SF3B1 protein change is involved in the RNA splicing dysregulation in CLL. To further examine the contribution of SF3B1 and METTL3 protein expression to RNA splicing changes observed in CLL, we first identified “HG3-associated aberrant splicing events” (similar to CLL-associated events) by comparing RNA-seq data derived from HG3 cells and normal B cells. We then determined whether SF3B1 KO or METTL3 KO could rescue HG3-associated aberrant splicing events. 66.9% and 65.2% of HG3-associated aberrant splicing events showed |∆PSI| more than 0.1 upon SF3B1 KO or METTL3 KO, respectively (Supplementary Fig. S6L). In contrast, only 12.1% and 11.2% of events that are not HG3-associated aberrant splicing events had such changes (Supplementary Fig. S6L). Moreover, 85% of SF3B1 KO- and METTL3 KO-rescued events were overlapped (Supplementary Fig. S6M). These results indicated that METTL3-regulated splicing factor expression strongly influences RNA splicing in CLL.
KO of METTL3 Regulates Abundance of Splicing Factors via Modulating m6A at Stop Codons and CDS Regions
To dissect the mechanism by which METTL3 influences the abundance of splicing factors via m6A modification, we performed an integrative methylated RNA immunoprecipitation sequencing (MeRIP-seq), which identifies transcriptome-wide m6A sites, Ribo-seq, and RNA-seq analysis in cells with and without METTL3 KO. We first mapped out METTL3-responding m6A sites upon METTL3 KO. 5,875 hypomethylated and 1,409 hypermethylated sites were detected, with GGACU as the most enriched motif (Fig. 6A; Supplementary Fig. S7A), validating the role of METTL3 as an RNA methyltransferase. Differential m6A sites were primarily found in both exons and 3′UTRs, with the highest enrichment of m6A residues located near the stop codon (Supplementary Fig. S7A). Gene-expression changes negatively correlated with that of the m6A level, with upregulated genes enriched in inflammatory response and TNFα signaling pathways (Supplementary Fig. S7B). We further integrated MeRIP-seq and Ribo-seq data to determine the proteins specifically affected by METTL3-associated m6A changes. Interestingly, our results revealed that genes with reduced TE displayed not only hypomethylated but also hypermethylated sites (Supplementary Fig. S7C), possibly as direct or indirect targets of METTL3. We then explicitly examined m6A status on the mRNA of splicing factors (n = 189). 104 sites in 67 splicing factors out of 122 splicing factors with detectable m6A sites were found with dysregulated m6A status (Fig. 6B). Transcripts with hypomethylated sites were enriched at stop codon regions for most splicing factors, including SF3A2, SRSF1, SRSF5, SRSF6, SF3A3, and SF3B1, indicating these sites as direct target sites of METTL3 (Fig. 6B; Supplementary Fig. S7D). We further verified that METTL3 KO-induced methylation reduction in these splicing factors is not due to decreased RNA expression level as detected by MeRIP-PCR and qPCR (Supplementary Fig. S7E).
KO of METTL3 Influences Ribosome Recycling to Regulate Protein Translation
Although METTL3 has been reported to promote protein synthesis by binding eIF3h and forming an mRNA loop via m6A modification at stop codon regions (47, 48), the mechanism by which METTL3 KO influences translation via m6A modification is not completely elucidated. It is known that mRNA looping can facilitate faster ribosome recycling (49), which in turn could promote translation. The time of the ribosome at the stop codon regions reflects the ribosome recycling ratio. To determine whether METTL3 KO reduces translation by delaying ribosome recycling via downregulation of m6A methylation at the stop codon region, we analyzed the ribosome recycling rate by assessing the ribosomal coverage of P (peptidyl)-sites at CDS regions relative to that of stop codon regions. The change of TE was significantly correlated with that of ribosome recycling rate upon METTL3 KO (Fig. 6C, r = 0.15), and transcripts with decreased ribosome recycling rate were enriched for the RNA splicing pathway (Fig. 6C), suggesting that METTL3 regulates the splicing factor expression via ribosome recycling through m6A modification at or near the stop codon region.
To validate the functional association between m6A modification and splicing factor abundance, we used an m6A editing platform CRISPR-dCas13Rx-METTL3 to install m6A in a site-specific manner. Specifically, we fused dCasRx with the methyltransferase domain of METTL3 along with nuclear localization signal (NLS), and established stable HG3 cells with overexpression of CRISPR-dCas13Rx-METTL3 (Supplementary Fig. S8A and S8B). m6A at the hypomethylated site (m6A_3972) of SF3B1 was selected for validation as SF3B1 is one of the most consistently upregulated proteins in CLL (Supplementary Table S4, q = 0.0033). Upon introduction of m6A at the 3972 site, we observed a 1.4-fold increase of SF3B1 protein in HG3 cells, with its RNA level unchanged (Fig. 6D; Supplementary Fig. S8C), confirming METTL3 directly regulates SF3B1 expression via m6A. Collectively, these results suggest that METTL3 controls splicing factor translation via slowing ribosome recycling at stop codon regions in an m6A-dependent manner.
KO of METTL3 Affects Ribosome Pausing to Regulate Protein Translation
In addition to direct targets (Fig. 6B, hypomethylated transcripts), we also found hypermethylated transcripts associated with METTL3 KO, which may be indirect targets of METTL3. These sites were more localized at CDS and 5′ UTR region of mRNA (Supplementary Fig. S7A). m6A at the CDS region has been reported to impact the kinetics of tRNA entering into the ribosomal aminoacyl (A) sites and leading to ribosome pausing, which in turn affects ribosome elongation and reduces translational output (29, 50). We therefore sought to determine whether METTL3 KO-associated hypermethylated m6A sites at CDS regions can reduce TE via increased ribosomal pausing. We devised a ribosome pausing score (51) for each codon site in the CDS regions to assess the impact of m6A on translation elongation. Pause scores at each codon position in a transcript were calculated by dividing the number of ribosome-protected reads at that position by the average number of reads of that transcript. Within 1,798,661 ribosome-covered sites from our data, 34,975 sites (1.94%) displayed significantly increased ribosome pausing upon METTL3 KO compared with control cells, which is substantially higher than 6,788 sites with decreased pausing (Fig. 6E). Increased pausing sites were highly enriched in genes involved in RNA splicing (Fig. 6E). Of note, the sequence specificity of pausing sites was mostly enriched for GAN in the ribosomal A sites and well matched with the m6A motif (Fig. 6F; Supplementary Fig. S8D), strongly implicating ribosome pausing is tightly linked with m6A modification.
To determine whether m6A modification in these transcripts results in ribosome pausing, we performed an integrative MeRIP-seq and Ribo-seq analysis and found an elevated ribosome density on hypermethylated adenosines at the ribosomal A sites (Fig. 6G), corroborating that hypermethylation influences the decoding process. In supporting this observation, we further experimentally validated that increased m6A modification at the endogenous CDS region of SF3A3 decreased its protein expression with no impact on the overall mRNA abundance using dCasRx-METTL3 (Fig. 6H; Supplementary Fig. S8E). This result led us to consider the possible underlying mechanisms that account for m6A methylation upon METTL3 KO. Systematic examination of m6A writers and erasers revealed that METTL3 KO resulted in an increase in METTL16 and a decrease in ALKBH5 protein expression (Supplementary Fig. S8F), indicating that a complex m6A circuitry precisely controls the RNA splicing network by controlling splicing factor protein expression.
DISCUSSION
The discovery of recurrent somatic mutations in the core spliceosomal components and their significant associations with clinical outcomes emphasized the importance of genome-wide RNA splicing dysregulation and oncogenesis (13–15, 52–54). Hence, understanding the underlying mechanism of RNA splicing dysregulation, especially in cancer types without genetic lesions in spliceosomal components, becomes a high priority in cancer biology. Here, we report that RNA spliceosome abundance is an independent risk factor for poor prognosis in CLL, which contributes to splicing dysregualation and disease progression. Moreover, we discover that METTL3 regulates RNA splicing dysregulation through m6A-mediated translational control of splicing factors (Fig. 6I). Our results have implications in cancer biology, disease progression, and treatment.
Our studies highlight that the novel regulatory axis between METTL3 and splicing factors is likely to be a shared mechanism in cancer. Disruption of the regulatory axis led to cell growth defects in vitro and affected leukemia progression in vivo in CLL. Moreover, inhibition of METTL3 with STM2457 in multiple cancer types all resulted in cell death along with reduced expression of splicing factors, highly suggestive of a common regulatory axis controlling cancer cell growth. Given METTL3 is significantly upregulated in CLL and has been reported to act as an oncogene in multiple cancer types (23, 42, 55), we speculate that METTL3 may contribute to the onset and progression of CLL via regulating splicing networks. Previous studies in AML have indicated that MYC is one of the METTL3 target genes contributing to oncogenesis (23); however, our results showed that no change of MYC was detected upon inhibition or KO of METTL3 in B-cell contexts, suggesting cellular context and differentiation stage are another layer of regulation of m6A-dependent mechanisms. With oral METTL3 inhibitors entering clinical trials, we observed the therapeutic effects of METTL3 silencing in CLL xenograft mice, providing a proof-of-concept for targeting METTL3 in aggressive CLL. Future experiments with CLL murine models are expected to comprehensively elucidate the role of METTL3 in a B-cell context as well as the therapeutic effect of STM2457 in vivo.
Our results also uncover novel mechanisms that METTL3 utilizes to control the translation of splicing factors in a site-dependent fashion, which includes promoting fast ribosome recycling via m6A at stop codon regions to enhance translation and increasing ribosome pausing via m6A at CDS regions to perturb ribosomal elongation along transcripts and slow mRNA translation. This is different from previous studies, which reported that METTL3 enhances translation either by forming mRNA looping through binding with eIF3 or by inducing m6A on a subset of transcripts associated with chromatin and relieving ribosome stalling (32, 47). All these observations support that METTL3 uses multiple mechanisms to ensure proper protein regulation, implicating a critical role of METTL3 in cancer biology. This led us to propose a working model for METTL3 in the pathogenesis of CLL. Transcriptional upregulation of METTL3 in CLL causes an elevated protein expression of METTL3, which promotes m6A at stop codon regions to enhance translation of splicing factors whereas it also modulates m6A at CDS regions to remove translational machinery roadblocks to ensure upregulation of splicing factors. The increased abundance of splicing complex in turn generates aberrant RNA splicing and contributes to disease onset and progression (Fig. 6I).
Altogether, our study highlights the critical role of RNA splicing dysregulation in CLL that can be induced by a splicing factor mutation or an aberrant expression of an upstream regulator like m6A writer METTL3, which regulates the translation of splicing factor-encoding mRNAs via m6A methylation-induced ribosome pausing and recycling, leading to splicing dysregulation and progression of the disease.
METHODS
Human Samples
Heparinized blood samples were obtained from healthy donors and patients enrolled on clinical protocols with informed consent, approved by the Human Subjects Protection Committee of City of Hope (IRB#18067 and IRB#06229) or Dana-Farber Cancer Institute. All studies were performed following written informed consent from patients and in accordance with the Declaration of Helsinki. Peripheral blood mononuclear cells were isolated by density gradient centrifugation using Ficoll-Paque Medium (GE Healthcare). Normal B cells were isolated by immuno-magnetic negative selection with Pan B-cell isolation kit (MiltenyBiotec). All samples were cryopreserved with fetal bovine serum (FBS) 10% DMSO and stored in vapor-phase liquid nitrogen until the time of analysis. In total, 40 CLL samples were used from MAZTER-seq, TMT proteomics, and RNA-seq. The samples were selected based on SF3B1 mutation status and cytogenetic abnormalities to represent the spectrum of CLL.
Cell Lines and Reagents
Leukemia cell lines NALM6 (CRL-3273, ATCC), Mino (CRL-3000, ATCC), JeKo-1 (CRL-3006, ATCC), HG3 (provided by Dr. Richard Rosenquist, Uppsala University), MEC1 (ACC497, DSMZ), and MOLM13 (provided by Dr. Jianjun Chen, City of Hope) were cultured in RPMI-1640 (Invitrogen) supplemented with 10% FBS and 1% penicillin/streptomycin. Lenti-X 293T cells (TaKaRa) and LL8 cells with CD40 ligand overexpression (provided by Dr. John Chan, City of Hope) were cultured in DMEM with 10% FBS and 1% penicillin/streptomycin. When cocultured with primary B cells, LL8 cells were pretreated with 5 μg/mL Mitomycin C (Millipore Sigma) for 3 hours followed by medium change. We received these cell lines in 2018, and our recent STR profiling was conducted in 2022. All the cell lines were used within 25 passages. All cell lines tested negative for Mycoplasma.
Antibodies used in this study include anti-METTL3 (#A301-568A, Bethyl Laboratories; #ab195352, Abcam), anti-METTL14 (#HPA038002, Millipore Sigma), anti-METTL16 (#HPA020352, Millipore Sigma), anti-FTO (#ab126605, Abcam), anti-ALKBH5 (#ab195377, Abcam), anti-SF3B1 (#14434S, Cell Signaling Technology, #PA5-19679, Invitrogen), anti-SF3A1 from Reed Laboratory, anti-SF3A2 (#sc-390444, Santa Cruz Biotechnology), anti-SF3A3 (#sc-393673, Santa Cruz Biotechnology), anti-MYC (#D84C12, Cell Sig-naling Technology), anti-YTHDF1 (#57539S, Cell Signaling Technology), anti-YTHDF2 (#ab220163, Abcam), anti-YTHDF3 (#24206S, Cell Signaling Technology), anti-Annexin V (#640906, BioLegend), anti-GAPDH (#sc-365062, Santa Cruz Biotechnology), and goat F(ab’)2 anti-human IgM-UNLB (#2022-01, SouthernBiotech). Secondary antibody: Goat anti-Rabbit IgG secondary antibody, HRP (#65-6120, Invitrogen), and goat anti-mouse IgG secondary antibody, HRP (#65-6520, Invitrogen). Horseradish peroxidase activity was revealed using Clarity or Clarity Max ECL Western Blotting Substrates (#1705061 or #1705062, Bio-Rad). STM2457 (#HY-134836) was purchased from MedChemExpress.
Plasmid Constructions
Catalytic wild-type and mutant METTL3 sequence was amplified from pCSC-METTL3 (WT/MUT)-IRES-GFP plasmids (provided by Dr. Yanhong Shi, City of Hope; ref. 45) using the following primers: forward 5′-CTAGACTAGTCCACCATGTCGGACACGTGGAGCTCTATC-3′ and reverse 5′ATAAGAATGCGGCCGCCTATAAATTCTTAGGTTTAGAGATGATACCATCTGGG-3′, and then cloned into pCRII-Blunt-TOPO plasmid (Kit #450245, Thermo Fisher Scientific).
METTL3 sgRNA1 and sgRNA2 targeted sequences were silently mutated into 5′-AAGCGACCTCGCACTCACGT-3′ and 5′-GAACTAATCGAAGTTAAAAG-3′ by using Q5 Site-Directed Mutagenesis Kit (Kit #E0554S, New England Biolabs). This kit was also used to generate NTD- (aa1–187) deleted METTL3. Full-length and NTD-deleted catalytic WT and mutant METTL3 were amplified by PCR and cloned into pLVX-EF1α-IRES-zsGreen1 plasmid (TaKaRa). Nontargeting shRNA and shRNAs targeting METTL3 were amplified from pHIV7-mhC/shMETTL3-GFP plasmids (provided by Dr. Yanhong Shi, City of Hope), and further cloned into pMAX-GFP plasmid (Lonza). Cas9-sgRNA targeting to METTL3, YTHDF1, YTHDF2, YTHDF3, and SF3B1 were subcloned into either pLKO.5-mCherry or tet-pLKOsgRNA-puro plasmid (#104321, addgene). All shRNA and sgRNA sequences were shown in Supplementary Table S5. SF3B1 with N-terminal 3 × Flag tag was cloned into pLVXEF1α-IRES-zsGreen1 plasmid (TaKaRa) between SpeI and BglII. dCasRx, deactivated RfxCas13d, along with the 5′NLS was amplified from the pXR002 plasmid (#109050, addgene) using Q5 polymerase (# M0494S, New England Biolabs) and cloned into pLVX-EF1α-IRES-zsGreen1 plasmid (TaKaRa) between SpeI and NotI. A FragmentGene (GeneWiz) containing a 3′NLSXTEN-METTL3 (aa273–580)-HA sequence served as a template for amplifying and joining the METTL3 methyltransferase domain to dCasRx via NotI and BamHI, generating pLVX-dCasRx-NLS-METTL3IRES-ZsGreen1. dCasRx-METTL3 was generated by cloning an HA epitope tag (Y-P-Y-D-V-P-D-Y-A) along with methyltransferase domain of METTL3 onto the C terminus of dCasRx epitranscriptomic editors.
Lentivirus Production and Transduction
For lentivirus production, Lenti-X 293T cells were plated on 6-well plates at 3 × 105 per well in complete DMEM with 10% serum and allowed to adhere for 16 hours. Transfections were performed following the manufacturer's instruction with 3 μg of pLKO.5-sgRNA, 0.4 μg pMD2.G, 1.5 μg pSPAX2, with PEI-MAX 40K (Polyscience). The lentivirus particles were collected at 48 and 96 hours after transfection and filtered through a 0.45-μm Nalgene syringe filter SFCA (Whatman), concentrated with ultracentrifuge at 4°C, 10,000 × g for 2 hours, and resuspended in 300 μL medium. Spin infection was used in all our leukemia and lymphoma cell lines with lentivirus. Specifically, 0.5 million cells were plated in 1 mL of complete media supplemented with 8 μg/mL polybrene and concentrated virus, and then the plate was spun at 37°C for 1.5 hours at 950 × g. The medium was changed 8 hours after the spin transduction.
Xenograft Mice Models
For control and METTL3 KO HG3 xenograft mice, we first transduced nontarget control or METTL3 sgRNA into HG3 cells with Cas9 expression. Then, 6- to 8-week-old NSG mice were injected with 3 × 105 HG3 cells by intravenous injection. For doxycycline-inducible control and METTL3 KO HG3 cells xenograft mice, we generated doxycycline-inducible nontarget control or METTL3 KO cells and then overexpress luciferase. Then, we engrafted 6- to 8-week-old NSG mice by subcutaneous injection of 4.5 × 105 cells. Doxycycline diet was given 1 week after the engraftment. Whole-body bioluminescence imaging was performed with Lago X (Spectral Instruments Imaging) at 10 minutes after D-luciferin (Goldbio) intraperitoneal cavity injection. All the protocols were approved by the Institutional Animal Care and Use Committee (IACUC) of City of Hope.
Cell Proliferation Assay
To study the effects of METTL3 on cell proliferation, cells were seeded in a 24-well plate at a concentration of 20,000 to 40,000 cells per well in triplicate, and cell proliferation was assessed by absolute quantification by coupling flow cytometry and cell counting with hemocytometer. Briefly, at each time point, 150 μL cells were collected and mixed with 50 μL of precounted CFP-positive cells (5 × 105/mL). Cell proliferation was assessed by the following two steps: Calculate the ratio of mCherry+ cells (for KO cells with sgRNAs) or mCherry+GFP+ cells (for METTL3 reconstituted cells) as well as the ratio of CFP+ cells in the mixed cells. Quantify the cell density based on the counting of CFP+ cells using hemocytometer.
Cell-Cycle and OPP Run-On Assays
For the cell-cycle assay, cells were starved for 24 hours in a medium without FBS. The next day, the cells were replenished with a medium containing 10% FBS and cultured for 1 hour. EdU (Invitrogen) was then added into the medium at a concentration of 10 μmol/L for 1 hour before being collected for staining. For the OPP Run-On assay, cells were pretreated with cycloheximide (Thermo Fisher Scientific) at a concentration of 50 μg/mL for 12 hours as a positive control. OPP (Click Chemistry Tools) was added to the medium at a concentration of 20 μmol/L for 0.5 and 1 hour and then collected for staining. After either EdU or OPP treatment, the Click-&-Go Plus 488 Imaging Kit (#1314, Click Chemistry Tools) was used to assess the dye incorporation efficiency based on the manufacturer's recommended procedures. Then, the cells were stained with DAPI. All the samples were analyzed on an LSRFortessa cell analyzer. Data were analyzed with FlowJo software (Tree Star).
Electroporation of Plasmids into Normal and CLL B Cells
1 × 106 cells were collected and washed with PBS before electroporation. Cell pellet was resuspended in 20 μL electroporation buffer (Celetrix) with 4 μL (10 μg) pMAX-shRNA plasmid. The cells were then electroporated (Celetrix electroporator LE+) with the condition of 870 V, 20 ms, and 1 pulse. The electroporated cells were cocultured with LL8 cells in RPMI-1640 medium supplemented with 20% FBS, antibiotics, and 2 ng/mL recombinant human IL4 cytokines (R&D Systems) for monitoring cell survival.
Drug and Proliferation Assays
All suspension cells were plated in 96-well plates in triplicate at 5,000 to 10,000 cells/well in 200 μL medium and treated for 72 hours either with 0.5% dimethyl sulfoxide (DMSO) or different concentrations of STM2457 (0.04–40 μmol/L). After 72 hours, 100 μL aliquots of cells were collected for the day 3 survival assay. The remaining cells were continuously cultured using a fresh medium with the addition of STM2457. The ATPase-based CellTiter-Glo 2.0 Cell Viability Assay (#G9242, Promega) was used to measure cell survival on cells collected on days 3 and 7. Relative cell survival was calculated by normalizing ATP amount in the STM2457- versus DMSO-treated cells.
Immunofluorescence Staining
HEK293T expressing dCasRx-METTL3 were seeded at 5 × 105/mL and cultured on coverslips (#CLS1760-015, Chemglass) in 24-well plates. After 48 hours, the culture medium was discarded, and the coverslips were washed once with PBS gently. Cells were fixed with 4% paraformaldehyde for 15 minutes at room temperature, washed three times with PBS, and then permeabilized with PBS containing 0.2% Triton-X100 (PBST) for 15 minutes at room temperature. Cells were blocked in blocking buffer (10% BSA in PBST) for 30 minutes and stained with anti-HA antibody (Antibody #2367, Cell Signaling Technology) in blocking buffer overnight at 4°C. Cells were then washed three times with PBST and stained with anti-mouse secondary antibodies conjugated with Alexa Fluor 647 (Antibody #A10037, Thermo Fisher Scientific) in blocking buffer for 1 hour at RT. Coverslips were then washed three times in PBST and mounted on mounting media (F6057-20ML, Sigma-Aldrich).
Images were acquired using a confocal laser scanning microscope (FV3000-IX83, Olympus).
m6A LC-MS, Dot Blot, and SELECT Assays
Total RNA was extracted using TRIzol, and polyA RNA was further enriched with PolyATract mRNA isolation System IV (Kit #Z5310, Promega) in accordance with the manufacturer's instructions. For LC-MS, polyA RNA was digested by nuclease P1 (N8630, Sigma-Aldrich) for 1 hour at 42°C. Subsequently, 1 Unit alkaline phosphatase (P5931, Sigma-Aldrich) and NH4HCO3 (100 mmol/L) were added and incubated for another 1 hour at 37°C. Enzymes were removed by Ultrafiltration. Samples were loaded onto the C18 column and eluted by gradient methanol. The Ultimate 3000 system coupled with a TSQ Quantiva mass spectrometer (Thermo Fisher Scientific) was applied to quantify the levels of Adenosine and m6A. For the m6A dot blot, total RNA and polyA RNA were diluted in RNA-binding buffer and denatured at 65°C for 5 minutes. Then, one volume of 20× SSC buffer was added into the RNA samples before dotted onto Amersham Hybond-N+ membrane (#45-000-850, GE Healthcare) with Bio-Dot Apparatus (Bio-Rad).
The RNA samples were cross-linked onto the membrane via UV irradiation with the UV crosslinker (Thermo Fisher Scientific). The membrane was stained with 0.02% methylene blue (MB) as a loading control. After UV cross-linking and MB staining, the membrane was washed with PBST, blocked with 5% nonfat dry milk for 1 hour at room temperature, and incubated with the anti-m6A antibody at 4°C overnight. The membrane was then incubated with secondary HRP-conjugated goat anti-rabbit IgG antibody and developed with Clarity Max ECL Western Blotting Substrates. For the single-base elongation- and ligation-based PCR amplification method (SELECT) to determine the m6A level, the protocol was followed as previously published (56).
TMT Proteomics Sample Preparation, LC-MS, and Data Analysis
CLL and normal B cells were lysed with TEAB buffer supplemented with protease inhibitor and PMSF. Lysates (300 μg) were precipitated and digested to obtain peptides. TMT 10-plex labeling was performed according to the manual from Thermo Fisher Scientific, and peptides were fractionated via BPRP HPLC. An 1100 pump (Agilent) equipped with a degasser, and a photodiode array (PDA) detector (Thermo Fisher Scientific) was used. Peptides were subjected to a linear gradient from 3% to 25% acetonitrile in 0.125% formic acid using an Agilent 300 Extend-C18 column (Agilent) and were fractionated into a total of 96 fractions. Mass spectrometry was performed using an Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific) coupled to a Proxeon EASY-nLC 1000 liquid chromatography (LC) pump (Thermo Fisher Scientific). Peptides were detected (MS1) and quantified (MS3) in the Orbitrap. MS2 spectra were searched using the SEQUEST algorithm against a Uniprot composite database derived from the mouse proteome containing its reversed complement and known contaminants. Peptide spectral matches were filtered to a 1% false discovery rate (FDR) using the target–decoy strategy combined with linear discriminant analysis. The detected proteins were filtered to a = 200 and an isolation specificity of 0.5. Statistical analysis of the proteome was performed based on the normalized intensities of the TMT-reporter ions.
The peptide and protein abundance from TMT proteomics data was log2-transformed. Samples with Gaussian distribution of log2-transformed protein abundance were proceeded (Supplementary Fig. S2A). Additionally, bimodality coefficiency and Hartigan's dip test were used to test the bimodal and significant skew (tailing) distribution (Supplementary Fig. S2B and S2C). Outlier samples were measured by pair-wise Pearson correlation between samples. “Noisy” proteins were removed if they were detected on less than 3 technical replicates. Mean protein intensity was calculated among technical replicates. Proteins detected in all samples were retained for downstream analysis.
For differential protein expression analysis between CLL and normal B cells, we used a previously established method with minor modification (57). Size factor according to total loading for each sample first was calculated to normalize the total amount of detected peptides. An internal control sample was introduced to all three batches to normalize the batch effect. At last, log2-transformed protein intensities were normalized by quantile normalization for all samples. Differentially expressed proteins were identified using LIMMA linear model methodology (58). Proteins with more than 1.5-fold change and FDR less than 0.05 were used as significance cutoff. Differential protein expression between CLL and normal B cells was shown in Supplementary Table S2.
To summarize the protein expression of splicing factors, we first grouped splicing factors according to the HUGO group 1518 (59). In addition, we also included hnRNP and SR proteins from the Spliceosome database (60). In total, this list contains 189 proteins. Eigengenes were calculated for each group of proteins using R package WGCNA (39). 153 out of 189 splicing factors were detected in our proteomics cohort, whereas 143 of them were consistently detected in the combined cohort. Correlation was calculated by using the Pearson correlation method and P values for multiple comparisons were adjusted using the Benjamini–Hochberg correction. A correlation edge with FDR less than 0.05 was defined as a significant correlation edge. Positive correlation edge file and node information file were uploaded to the network visualization software Cytoscape for further visualization (61). Proteins involved in hnRNPs, SR proteins, SF3A/3B complex, and belonging to major spliceosome complex are color coded.
RNA-seq Sample Preparation and Data Analysis
Libraries for RNA-seq were constructed using the Stranded Total RNA Prep with Ribo-Zero Plus Kit (Illumina) and sequenced on the NovaSeq S4 platform using a paired-end 150-bp mode. Reads were aligned to the human reference genome (GRCh38/hg38) using STAR with default parameters (62). Differential expression of mRNA analysis was performed using the DESeq2 R package (63). Genes with more than 2-fold change and FDR less than 0.05 were used as significance cutoff.
For RNA splicing analysis, we established a pipeline that integrates StringTie (33), LeafCutter (34), and rMATs (35) to maximally improve the power of detection of alternative splicing events. In brief, de novo transcripts were assembled using StringTie with default parameters. LeafCutter was used to detect novel exon boundaries. With the isoform annotation file downloaded from GENCODE (release 26), all isoform information was merged to generate a comprehensive isoform annotation file using custom R script as a reference file for rMATs. Percent spliced-in (PSI) value was calculated using rMATs for each splicing event. For differential splicing analysis of cell lines, we adopted the differential splicing analysis statistical model from rMATs and used absolute delta PSI more than 0.1 and FDR less than 0.05 as the significance cutoff. For differential splicing analysis in the patient cohort, we logistically transformed PSI values followed by a Student t test to obtain the significance. The significantly dysregulated splicing events were identified as absolute delta PSI more than 0.1 and P value less than 0.05 as significance cutoff. The significantly changed splicing type was identified by using a Q-Q plot. Expected P values were calculated by using a uniform distribution. The least-square linear fit to the lower 95th percentile of points with slope λ was calculated. Higher λ represents more dysregulation in the specific splicing type.
For splicing factor dosage-dependent alternative splicing (DDS) analysis, we utilized 19 samples with matched RNA-seq and proteomics data from our cohort. As for each protein, we performed RNA splicing analysis between five CLL samples with highest abundance (high group) and five with lowest abundance (low group). DDS score was calculated as the number of significant alternative splicing events normalized to the protein abundance difference between these two groups of samples. To calculate the P value of overlap between dosage-dependent alternative splicing events with CLL-related splicing events, we randomly selected 10 proteins 10,000 times. P value was defined as the probability of higher or the same overlap.
m6A-Sensitive RNase Sequencing Sample Preparation and Data Analysis
Libraries were constructed according to the published protocol (40, 41) and sequenced on HiSeq 2500 platform using a paired-end 100-bp mode. Reads were aligned to the human reference genome (GRCh38/hg38) using STAR with default parameters (62). The cut efficiency matrix was calculated using mazter_mine with minor modification (40). Only ACA sites in line with m6A motif DRACH were included for differential m6A analysis. Delta cut efficiency was calculated as the median cut efficiency of CLL samples subtracting with that of normal B cell samples. To reduce noise, only ACA sites detected in at least 8 samples were included. The statistical significance was calculated using the Wilcoxon rank sum test. Candidate m6A sites were defined as median cut efficiency less than or equal to 0.9 for CLL patients or normal B cells. m6A density was calculated using candidate m6A sites number divided by transcript length (Fig. 3H). m6A sites with absolute delta cut efficiency greater than 0.1 and P value less than 0.05 were used as significance cutoff.
MeRIP Sequencing Sample Preparation and Data Analysis
Methylated RNA immunoprecipitation was processed by using the EpiMark N6-Methyladenosine Enrichment Kit (#E1610S, NEB). Purified RNA fragments were used for library construction using Stranded Total RNA Prep with Ribo-Zero Plus Kit (#20040529, Illumina) and sequenced with Illumina HiSeq 2500 platform using a paired-end 100-bp mode. Reads were aligned to the human reference genome (GRCh38/hg38) using STAR with default parameters (62). m6A peak calling, quantification, annotation, and differential m6A site analyses were performed using the R package exomePeak (64). De novo sequence motif discovery from m6A peaks was performed using HOMER with the command line “findMotifsGenome.pl diff_peak.sort.bed hg38 motif -rna -size 200 -len 5,6,7 -noknown -p 16 -cache 6000” (65). Visualization files were generated using igvtools with parameters “-z 5 -w 10 -e 0.” Integrative Genomics Viewer (IGV) was used to visualize the distributions of the m6A peaks.
For m6A site prediction, m6A sites were first narrowed down by a sliding window of 30 nucleotides in each m6A peak region. m6A enrichment fold change and P values relative to RNA-seq reads were calculated using the statistical model from exomePeak (64). Peak-over-median score (POM) was calculated for each nucleotide in the highest enrichment sliding window. The nucleotide with the highest POM was defined as “peak summit.” All sequences following the DRACH motif inside each m6A peak region were marked. Sites in line with the DRACH motif and closest to the peak summit were defined as the m6A sites.
Ribosome Profiling Sample Preparation and Data Analysis
Ribosome profiling was carried out following protocol from the Illumina TruSeq Ribo Profiler (Mammalian) kit (Illumina). The library was sequenced with the Illumina HiSeq 2500 platform using a single-end 50-bp mode. Reference sequences of rRNAs were downloaded from Ensembl (version 95; ref. 66). Reads aligned to rRNA sequences were removed from further analysis using the alignment tool Bowtie (67). Remaining reads were aligned to the human reference genome (GRCh38/hg38) using HISAT2 with default parameters (68). Multiple alignment reads were filtered out for downstream analysis. Quality control including reads length distribution, reading frames quantification, and 3 nucleotide periodicity were analyzed using the R package riboWaltz (69).
For the calculation of TE, the RPFs were counted in CDS regions with read length between 28 and 34 nt and not aligned to the first 15 and last 5 codons (70, 71). RPFs and RNA-seq reads for each transcript were normalized by the trimmed mean of M values (TMM) from edgeR (72, 73). TE was calculated as a ratio between normalized RPFs and normalized RNA-seq reads. The Fisher exact test was used to test the significance of the independent distribution of RPFs relative to RNA-seq reads. P values for multiple comparisons were adjusted using the Benjamini–Hochberg correction. Genes with more than 2-fold change and FDR less than 0.05 were used as significance cutoff.
For the ribosome recycling analysis, the exact P (peptidyl)-site of each RPFs reads was corrected by using R package riboWaltz according to the offset distance. Ribosome recycling rate was defined as Psite coverage of CDS regions (300-nt-long region upstream of stop codon region) divided to stop codon regions (45 nt upstream and 15 nt downstream of stop codons). Reads coverage was normalized to Reads Per Kilobase of transcript, per million mapped reads (RPKM). Differential ribosome recycling at stop codon regions was examined using the Fisher exact test to check if the RPF changes at stop codon regions are independent of those at upstream CDS regions. The odds ratio was calculated according to the recycling rate in METTL3 KO cells compared with that in control cells. Genes with |Log2 odds ratio| ≥0.5 and P value less than 0.05 were used as significance cutoff.
For the ribosome pausing analysis, the pause score at each codon position was calculated based on a previous study (51). Ribosome A (aminoacyl) site information was extracted from riboWaltz output. To identify METTL3-dependent pausing sites, for each position of the transcript, 2 × 2 contingency tables were created to perform a two-trailed Fisher exact test to compare the ratio of RPFs in the WT and KO fractions at a given position to the ratio at all other positions in that transcript (that is, the summed reads in each fraction for the entire transcript minus the position of interest), Benjamini–Hochberg correction for multiple hypothesis testing was used. METTL3-dependent pause sites were defined as Log2 odds ratio higher than 1 and FDR less than 0.05. To examine the codon enrichment of METTL3-dependent pause sites to other sites, METTL3-dependent pause sites with pause score >10 in METTL3 KO cells and at least 5 RPFs were used. The odds ratio was calculated based on the codon frequency of METTL3-dependent pause sites to other sites. Two-trailed Fisher exact test followed by Benjamini–Hochberg correction to stratify the significance.
For the analysis of m6A-related ribosome pausing, the A-site coverage ± 60 nt around the m6A sites (described above) was calculated, and the regions with less than 20 reads coverage were excluded. The A-site coverage in each transcript at this range was normalized to mean A-site coverage of this region. The mean of normalized A-site coverage at each nucleotide was calculated and plotted out.
Pathway Enrichment Analysis
Enrichment pathways were identified using two methods: GSEA based on databases from the Molecular Signature Database (MSigDB) including KEGG pathway, HALLMARK, and Reactome database. Gene sets with FDR <0.1 were considered significantly enriched pathways (74–76). A web-based pathway enrichment tool Metascape (77). Gene sets with FDR <0.05 were considered significantly enriched pathways. The pathway enrichment table related to each figure was shown in Supplementary Table S5.
Statistical Methods
Survival curves were estimated using the Kaplan–Meier method, and the log-rank test was used to assess statistical significance for TTFT and OS by combining our cohort (n = 22) with the publicly available cohort (n = 91). Univariate Cox regression analysis was used to assess the prognostic impact of spliceosome complexes. Multivariate Cox regression analysis was used to assess the independent prognostic impact from 17p deletion status, IGHV mutational status, and eigengene of spliceosome complex for outcomes in the CLL cohort. Other statistical analyses were performed using one/two-tailed Student t test, two-way ANOVA, Kolmogorov–Smirnov test, Fisher exact test, and Mann–Whitney U test, all included in the figure legends.
Data Availability
RNA-seq, MeRIP-seq, MAZTER-seq, and ribosome profiling raw and processed data have been deposited in the Gene-Expression Omnibus (GSE223731) and dbGAP (phs003191).
Authors’ Disclosures
G. Marcucci reports being a speaker for AbbVie; two clinical trials sponsored by Merck. A.V. Danilov reports grants and personal fees from Astra-Zeneca, Nurix Therapeutics, BeiGene, AbbVie, MEI, Incyte, and BMS, personal fees from Pharmacyclics and MingSight, grants from Bayer Oncology, Cyclacel, Tg Therapeutics, and SecuraBio outside the submitted work. J.R. Brown reports personal fees from AbbVie, Acerta/Astra-Zeneca, grants and personal fees from BeiGene, personal fees from Bristol Myers Squibb/Juno/Celgene, Catapult, Genentech/Roche, Grifols Worldwide Operations, Hutchmed, grants and personal fees from iOnctura, Janssen, Kite, grants and personal fees from Loxo/Lilly, MEI Pharma, personal fees from Morphosys AG, Novartis, Numab Therapeutics, Pfizer, Pharmacyclics, Rigel, grants from Gilead, Verastem/SecuraBio, and Tg Therapeutics outside the submitted work. T. Siddiqi reports Astra-Zeneca, BeiGene, BMS, Kite Pharma, AbbVie, and Celgene outside the submitted work. No disclosures were reported by the other authors.
Authors’ Contributions
Y. Wu: Data curation, validation, investigation, writing–original draft, writing–review and editing. M. Jin: Data curation, investigation, writing–original draft, writing–review and editing. M. Fernandez: Investigation. K.L. Hart: Investigation. A. Liao: Investigation. X. Ge: Methodology. S.M. Fernandes: Resources. T. McDonald: Resources. Z. Chen: Methodology. D. Roth: Methodology. L.Y. Ghoda: Resources. G. Marcucci: Resources. M. Kalkum: Methodology. R.K. Pillai: Resources. A.V. Danilov: Resources. J.J. Li: Methodology. J. Chen: Resources. J.R. Brown: Resources. S.T. Rosen: Resources. T. Siddiqi: Resources. L. Wang: Conceptualization, resources, data curation, supervision, funding acquisition, writing–original draft, project administration, writing–review and editing.
Acknowledgments
The authors would like to thank Drs. Renjang Lin and Nora Heisterkamp (City of Hope) for critical review of our manuscript. We appreciated the kind help from Jennifer Zankich Dettor, Elena Pulkinen, Ashish Streatfield, and Kelly Synold from City of Hope Hematopoietic Tissue Biorepository for extracting and compiling clinical data for annotation of the specimens. This work was supported by a Startup package from City of Hope (L. Wang), grants from the NIH (NCI) R01CA240910 and R01CA21623 (to L. Wang). Research work related to sample collection in the Hematopoietic Tissue Biorepository Core supported by the NIH (P30CA033572).
The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Note: Supplementary data for this article are available at Blood Cancer Discovery Online (https://bloodcancerdiscov.aacrjournals.org/).