Phase Separation Mediates NUP98 Fusion Oncoprotein Leukemic Transformation

NUP98 fusion oncoproteins promote leukemogenesis by undergoing phase separation in the nuclei of hematopoietic cells to form hundreds of punctate transcriptional condensates that drive aberrant gene expression and induce myeloid transformation.


RNA Sequencing Analysis
Read alignment to the Mus musculus reference genome (mm10) was performed with STAR software (v2.7.0d) 1. Quality control was performed to identify duplicated and unmapped reads; all samples were of acceptable quality for analysis. To evaluate gene expression levels, the read count for each annotated gene was calculated using HTSeq 2 (version 0.11.2). Differential gene expression analysis and regularized log transformation (rlog) of raw count data were carried out using DESeq2 3. All genes were ranked according to the fold-change and significance from the differential analysis. Gene set enrichment analysis was then performed using the Molecular Signatures Database (MSigDB, version 6.2) C2 gene sets 4.

Cell fixation and immunofluorescence
To fix cells for nuclear pore complex staining, 250,000 cells were grown on poly-L-lysine-treated (Sigma-Aldrich), sterile coverslips (VWR). HEK293T cells were first rinsed with warm 1X PBS, followed by a 20-minute incubation with 4% paraformaldehyde (Electron Microscopy Sciences), then a 5-minute incubation in 0.5% Triton-X-PBS for cell permeabilization. Cells were washed with Wash Buffer (0.1% Triton-X-PBS) and blocking was performed at room temperature (RT) for 1 hour in Blocking Buffer [10% Normal Donkey Serum (Jackson ImmunoResearch)] prior to primary antibody incubation. For fixation of hematopoietic cells and NUP98-KDM5A PDX cells, a cytocentrifuge was used to adhere cells to a glass slide by spinning at 400 rpm for 4 minutes. The cells were then rapidly rinsed in 1X PBS-5 mM EGTA, followed by incubation at -20 °C in 95% methanol-5 mM EGTA for 30 minutes. Cells were washed with 1X PBS, followed by blocking at RT for 1 hour prior to primary antibody incubation. All primary antibodies were diluted in 5% Normal Donkey Serum and added to coverslips overnight at 4 °C. The primary antibodies used were Mouse anti-NUP107 (Abcam, Mab414, RRID:AB_448181, 1:300) and Rat anti-NUP98 (GeneTex, 2H10, RRID:AB_2894964, 1:200). After primary antibody incubation, cells were washed and incubated for 45 minutes at RT with secondary antibodies conjugated to Alexa Fluor Rhodamine Red™-X or Alexa Fluor 647 (Jackson ImmunoResearch; RRID:AB_2340614), diluted 1:300 in 5% Normal Donkey Serum. Cells were washed and counterstained with DAPI (4',6-diamidino-2-phenylindole, dihydrochloride, Invitrogen) diluted in PBS (300 nM) for 2 minutes, and then mounted onto glass slides with antifade solution (90% glycerol, 0.5% N-propyl gallate).

Gel mobility shift assays
The DNA-binding assays for the HOXA9 homeodomain (HOXA9 197-272 HD) and the DNA binding-deficient mutant (HOXA9 197-272-ΔDNA; mutations include R258A, K262A, and W199G) were performed over a protein concentration range from 2.5 nM to 1.25 µM. The DNA concentration was kept constant at 10 nM. A double-stranded 20 base pair DNA oligonucleotide (sequence 5′-ACTCTATGATTTACGACGCT-3′; HOXA9 binding site, TTTAC) labeled at the 5′ end with Cy5 dye (Integrated DNA Technologies, Inc., USA) was chosen for the assay. Both the protein and DNA were present in 10 mM Tris (pH 7.5), 75 mM NaCl, 6% glycerol, 1 mM DTT, 1 mM EDTA, 6.7 ng/µL poly(2′-deoxyinosinic-2′-deoxycytidylic acid) sodium salt (dIdC), and 6.7 ng/µL BSA. After mixing the DNA and protein, the reactions were incubated at room temperature for 30 min and then run on a 16% acrylamide gel in 1x TBE (Invitrogen) buffer at a constant voltage of 100 V for 140 min. The gel was imaged for Cy5 fluorescence in a gel imager (Amersham Imager 600, GE Healthcare Life Sciences, USA).

Recombinant protein expression, purification, and fluorescent labeling
Cultures transfected with pET28a-based plasmids expressing the NHA9 and other protein constructs (see Suppl. Table 3) were grown at 37 °C to an optical density at 600 nm (OD600) of ~0.8. Protein expression was induced by the addition of 1 mM IPTG (GoldBio) and the cultures were incubated at 37 °C for 4 h. Bacterial culture (8 L) was harvested by centrifugation and lysed by sonication on ice in Buffer A (20 mM Tris, 500 mM NaCl, 1 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP), pH 7.4) containing 0.1% Triton X-100. The cell lysate was centrifuged at 30,000 x g at 4 °C for 30 minutes. The supernatant was discarded except in the cases of NHA9Midi-21FGAA, HOXA9 197-272 HD and HOXA9 197-272 HD-ΔDNA, which are found in the soluble fraction. For all other proteins, the inclusion bodies obtained were dissolved in extraction buffer [6 M guanidine HCl (GdnHCl) in Buffer A]. The solution was again sonicated and centrifuged. The supernatant obtained after centrifugation of the GdnHCl/Buffer A solutions and the soluble fractions of NHA9Midi-21FGAA, HOXA9 197-272 HD and HOXA9 197-272 HD-ΔDNA were loaded onto a Ni-NTA column (30 mL). The bound proteins were washed with two column volumes of Wash Buffer (6 M urea in Buffer A) followed by five column volumes of Wash Buffer containing 50 mM imidazole. The proteins were eluted with 6 M urea in Buffer A containing 500 mM imidazole. The 12x His tag was cleaved with TEV protease during an overnight dialysis step at room temperature against 20 mM Tris pH 7.4, 150 mM NaCl, 2 M urea, 1 mM TCEP.
For NHA9Midi-21FGAA, HOXA9 197-272 HD and HOXA9 197-272 HD-ΔDNA, the TEV digestion was carried out in PBS buffer (1x PBS; 136.9 mM NaCl, 2.68 mM KCl, 10 mM Na2HPO4, 1.7 mM KCl, pH 7.4) at 4 °C overnight. The 12x His-tagged TEV was then removed using a Ni-NTA column, and the flow-through fractions containing cleaved proteins were loaded onto an HPLC column (PLRP-S 1000A 8 µm, Agilent Technologies) equilibrated with Mobile Phase A (0.1% TFA and 5% acetonitrile in water) and eluted with a linear gradient of Mobile Phase B (0.1% TFA in acetonitrile).
The fractions with high purity were identified by SDS-PAGE, flash frozen and lyophilized. For labeling NHA9, a single cysteine construct (C188A) was created. All proteins containing a single cysteine (excluding NHA9-21FGAA, NHA9-8FA and NUP98-N) were fluorescently labeled using maleimide derivatives of Alexa Fluor dyes (Thermo Fisher Scientific), according to the manufacturer's protocol. The lyophilized proteins (400-500 µM) were mixed with Alexa Fluor dye at a 1:3 ratio in 6 M GdnHCl, 20 mM Tris, 5 mM EDTA, pH 7.0, containing 1 mM TCEP. After overnight incubation, the reaction was quenched with 50 mM DTT. The NHA9-21FGAA, NHA9-8FA and NUP98-N proteins were labeled at the N-terminus using the NHS ester of Alexa Fluor dyes (Thermo Fisher) at a 1:4 (protein:Alexa Fluor dye) mole ratio in 1 M GdnHCl, 20 mM HEPES, pH 7.2, containing 1 mM TCEP. To quench the reaction, 50 mM Tris was added. All labeled proteins were purified using HPLC (PLRP-S 1000A 8 µm, Agilent Technologies) to remove any unreacted/free dye. For imaging samples with protein concentrations between 315 nM and 20 µM, 100 nM Alexa Fluor-labeled NHA9 constructs were used; for concentrations between 10 nM and 160 nM, a final concentration of 10 nM Alexa Fluor-labeled NHA9 constructs was used.

Sequence analysis
We used a custom, Matlab-based computational pipeline called Swiss Army Knife (SAK) to analyze the amino acid sequence features of NHA9. The SAK pipeline identifies and annotates: low-complexity regions, by calculating the sequence Shannon entropy 5,6 at each position; conserved domains and features, by accessing the Conserved Domain Database (CDD) 7-9; predicted secondary structure, using the JPred4 API 10; predicted disordered regions and molecular recognition features (MoRFs), by executing IUPred2A 11; regions predicted to be enriched in pi-pi interactions and predicted prion-like domains, using PScore 12 and PLAAC 13, respectively; acidic and basic tracts in disordered regions, using ABTScore 14; hydropathy, using the Kyte-Doolittle hydropathy scale 15 as implemented in CIDER 16; position-specific amino acid identities; and enrichment of amino acids in the overall sequence compared to their frequencies in the human proteome as reported in UniProtKB release 15 17. The results of this analysis are then exported as graphical summaries showing site-specific sequence features, as shown in Suppl.

NUP98 FOs
For all quantified images, the pixel size in the x and y dimensions was 0.11 µm × 0.11 µm, with 992 total pixels in each dimension. The z-step size was 0.20 µm, and 61 planes were imaged, giving 61 pixels in the z dimension. The volume of each 3D image is therefore 61 × 992 × 992 pixels, and one cubic pixel volume (voxel) corresponds to (0.11 µm × 0.11 µm × 0.20 µm) = 0.00242 µm³. Individual puncta volumes were determined in units of cubic pixels and converted to µm³ using this conversion.
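The voxel-to-volume conversion described above can be sketched in a few lines of Python. The voxel dimensions and stack shape are taken from the text; the function and constant names are hypothetical and are not part of the published pipeline.

```python
# Voxel dimensions in micrometres, as stated in the text.
DX_UM = 0.11
DY_UM = 0.11
DZ_UM = 0.20

# Volume of one voxel (cubic pixel), approximately 0.00242 cubic micrometres.
VOXEL_VOLUME_UM3 = DX_UM * DY_UM * DZ_UM


def voxels_to_um3(n_voxels):
    """Convert a volume counted in voxels to cubic micrometres."""
    return n_voxels * VOXEL_VOLUME_UM3


def stack_voxel_count(nz=61, ny=992, nx=992):
    """Total number of voxels in one 3D image stack."""
    return nz * ny * nx
```

For example, a punctum segmented as 1,000 voxels would correspond to roughly 2.42 µm³ under this conversion.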

Background intensity correction
For each 3D image stack, background intensity correction was performed for each pixel based on the peak background intensity value of 101 a.u. in the mEGFP channel of images of untransfected cells (see Suppl. Fig. 7K). This background fluorescence value was subtracted from the fluorescence intensity of all image pixels. All 3D images were background-corrected before the image analysis described below.

Cell nuclei segmentation
Cell nuclei were segmented based on the DNA (Hoechst) channel using Cellpose 18 (v 0.6.5).
Since each 3D dataset represented only one layer of cells, cell nuclei were segmented in 2D in each z-layer and then combined into 3D stacks. Each individual z-layer intensity was normalized between the 0.25th and 99.75th percentiles and segmented with the following parameters: model type: "cyto", diameter: 120 pixels (13.2 µm), flow threshold: 0.4, probability threshold: 0. The cytoplasm model performed better than the nucleus model for our datasets.
After segmenting nuclei in individual z-layers, results were combined into 3D stacks as follows. The total area of segmented nuclei was computed in each z-layer (excluding 10 layers at each border), and the layer with the maximal total area was used as a reference for cell IDs. Cellpose segmentations of individual z-layers were then combined into a 3D stack, converted to a binary mask, and filtered with a 3D median filter (size 3 pixels) to remove objects that were detected in only one z-layer and to fill gaps between masks that were one layer wide. Finally, the binary mask of each z-layer was multiplied by the labels of the reference layer to assign cell IDs throughout the entire 3D stack.
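Two steps of this stacking procedure, choosing the reference layer and propagating its cell IDs, can be illustrated with a minimal pure-Python sketch. It assumes masks are given as nested lists of 0/1 values and labels as nested lists of integers; the 3D median filtering step is deliberately omitted, and the function names are hypothetical rather than taken from the authors' code.

```python
def pick_reference_layer(layer_areas, border=10):
    """Return the index of the z-layer with the largest total segmented area,
    ignoring `border` layers at each end of the stack."""
    if len(layer_areas) <= 2 * border:
        raise ValueError("stack is too shallow for the requested border")
    candidates = range(border, len(layer_areas) - border)
    return max(candidates, key=lambda z: layer_areas[z])


def propagate_ids(binary_layer, reference_labels):
    """Multiply a binary mask (0/1) by the reference-layer labels, pixel by
    pixel, so that every foreground pixel inherits a cell ID."""
    return [
        [b * lab for b, lab in zip(mask_row, label_row)]
        for mask_row, label_row in zip(binary_layer, reference_labels)
    ]
```

Foreground pixels that fall outside any labeled nucleus in the reference layer receive the ID 0 in this sketch, which follows directly from the multiplication described in the text.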

Puncta segmentation
Detection and segmentation of puncta in the mEGFP channel were based on applying the scale-adapted Laplacian of Gaussian (LoG) filter 19. We first detected puncta centers using the blob_log (LoG detector) function of the scikit-image library 20 with parameters: min_sigma=[1, 1.8, 1.8] (0.2 µm), max_sigma=[10, 18.2, 18.2] (2 µm), num_sigma=5, overlap=1, threshold=0.003 for the NHA9 constructs and NUP98 FOs images in HEK293T cells (threshold=0.001 for the NHA9 constructs and NUP98 FOs images in HSPC cells, and for the NUP98-KDM5A patient-derived xenograft (PDX) and control human CD34+ cells). Puncta centers were additionally filtered according to intensity relative to the background mEGFP signal. To quantify the background mEGFP signal in each image, we computed the median mEGFP intensity in each cell, and then calculated the 95th percentile of these values over the entire image. Puncta centers with an intensity lower than 3 times the background mEGFP signal were removed for the NHA9 constructs and NUP98 FOs images in HEK293T cells. The same was done for the NUP98 signal when quantifying the NUP98-KDM5A PDX and control human CD34+ cells. For the NHA9 constructs and NUP98 FOs images in HSPC cells, puncta centers with an intensity lower than 2 times the background mEGFP signal were removed. For quantifying puncta detected by the NUP98 antibody in NUP98-KDM5A PDX and control human CD34+ cells (Suppl. Fig. 7N, O), all puncta within 0.5 µm of the nuclear periphery were excluded to avoid the fluorescence signal of endogenous NUP98 at the nuclear pore complex.
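The intensity-based filtering of puncta centers can be sketched as follows. This is an illustration only: the data layout (a dictionary of per-cell pixel intensities and a list of puncta records with an "intensity" key) is hypothetical, and the percentile uses linear interpolation between ranks, which is an assumption because the text does not state the interpolation rule.

```python
from statistics import median


def percentile(values, q):
    """Percentile with linear interpolation between ranks (q in 0..100).
    The interpolation rule is an assumption, not taken from the source."""
    s = sorted(values)
    if not s:
        raise ValueError("no values given")
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def background_signal(cell_pixel_intensities):
    """Median mEGFP intensity per cell, then the 95th percentile of those
    medians over the whole image."""
    medians = [median(px) for px in cell_pixel_intensities.values()]
    return percentile(medians, 95)


def filter_puncta(centers, background, fold=3.0):
    """Remove puncta centers whose intensity is lower than fold x background."""
    return [c for c in centers if c["intensity"] >= fold * background]
```

Under the thresholds given in the text, `fold=3.0` would apply to the HEK293T images and `fold=2.0` to the HSPC images.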
To segment individual puncta, we filtered the image with five LoG filters with scales linearly interpolated between min_sigma and max_sigma using the filters.gaussian and filters.laplace functions of scikit-image (v 0.18.1). We then calculated the pixel-wise maximum of the filter responses. The result was thresholded at 0.001 (relative intensity in LoG scale units) for the NHA9 constructs and NUP98 FOs images in HEK293T cells and processed with a distance transform watershed using the puncta centers as seeds. For the NHA9 constructs and NUP98 FOs images in HSPC cells, and for the NUP98-KDM5A PDX and control human CD34+ cells, the threshold for puncta segmentation was 0.0005.

Conversion of mEGFP fluorescence intensity to protein concentration
We converted the total mEGFP fluorescence intensity per pixel volume to the GFP molar concentration based on the calibration plot shown in Suppl. Fig. 1A. This was done for the following parameters: [G-NHA9 construct], [LP], and [DP], as reported in the main and supplementary figures. For this calibration, 22 mEGFP protein solutions were prepared, as well as a buffer-only solution (without mEGFP), with concentrations of mEGFP ranging from 1 nM to 100 µM. For each solution, a z-stack of 61 images (0.2 µm step size, spanning 12.2 µm in total) was recorded at each of 6 different positions using a single fluorescence channel for mEGFP. We used the image analysis pipeline discussed above to determine the total mEGFP fluorescence intensity from the images. We computed that one arbitrary unit (a.u.) of total mEGFP fluorescence intensity per cubic pixel volume corresponds to an mEGFP concentration of 20 nM (Suppl. Fig. 1A). We used this conversion factor to convert mEGFP fluorescence intensities per unit volume into mEGFP molar concentrations (i) for each cell nucleus ([G-NHA9 construct]), (ii) for the light phase of each nucleus ([LP]), and (iii) within puncta within each nucleus, determined as the average over all detected puncta ([DP]). We used the same conversion method for the images of G-NUP98 FOs in HEK293T cells and G-NHA9 constructs in HSPC cells.
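As a minimal sketch of how this calibration factor might be applied, the snippet below converts a background-corrected total intensity over a region into a molar concentration. The factor of 20 nM per a.u. per voxel is from the text; the function names and the choice to pass total intensity and voxel count separately are assumptions made for illustration.

```python
# Calibration factor from the text: 1 a.u. of mEGFP intensity per voxel
# corresponds to 20 nM mEGFP.
NM_PER_AU_PER_VOXEL = 20.0


def intensity_to_nM(total_intensity_au, n_voxels):
    """Convert background-corrected total intensity in a region of n_voxels
    voxels to an mEGFP concentration in nM."""
    if n_voxels <= 0:
        raise ValueError("region must contain at least one voxel")
    return NM_PER_AU_PER_VOXEL * total_intensity_au / n_voxels


def nM_to_uM(conc_nM):
    """Convert nM to µM."""
    return conc_nM / 1000.0
```

For instance, a nucleus of 1,000 voxels with a total corrected intensity of 5,000 a.u. would have a mean of 5 a.u. per voxel, or 100 nM (0.1 µM) mEGFP.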

Extraction of thermodynamic features from image analysis
We applied several selection criteria during the analysis of thermodynamic parameters based upon the confocal fluorescence microscopy images of the NHA9 constructs. We excluded (i) cells with a [G-NHA9 construct] value below ~0.02 µM, determined as the background level of fluorescence in the mEGFP channel of untransfected HEK293T cells (Suppl. Fig. 1B), and (ii) incomplete cells, determined as cells containing puncta near the periphery of the images using a 5-pixel cutoff in the xy plane. We used the same cell selection criteria for the images of G-NUP98 FOs in HEK293T cells and G-NHA9 constructs in HSPC cells.
We computed the average mEGFP concentration within puncta (dense phase) per cell, where n is the number of puncta per cell and the volume of each punctum is in units of cubic pixels. We calculated the average mEGFP concentration in the non-puncta-containing region of the nucleoplasm (light phase) per cell.
From the quantification of the individual cell nuclei and individual puncta, we obtained the average concentration of the NHA9 constructs / NUP98 FOs in the nucleoplasm (defined as the light phase) per cell ([LP]) and the average concentration within puncta (defined as the dense phase) per cell ([DP]).
The ratio of the concentration of the dense phase to that of the light phase is defined as the partition coefficient 21, $K_p = [\mathrm{DP}]/[\mathrm{LP}]$. The Gibbs free energy of transfer (into puncta) is defined as $\Delta G_{Tr} = -RT \ln(K_p)$, where $R$ is the gas constant ($1.98 \times 10^{-3}$ kcal/mol/K) and $T$ is the temperature (310 K).
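The two definitions above translate directly into code. The sketch below uses the gas constant and temperature given in the text; the function names are hypothetical, and the inputs are assumed to be [DP] and [LP] in the same concentration units.

```python
import math

R_KCAL_PER_MOL_K = 1.98e-3  # gas constant, kcal/mol/K (value from the text)
T_KELVIN = 310.0            # temperature, K (value from the text)


def partition_coefficient(dense_conc, light_conc):
    """K_p = [DP] / [LP]; both concentrations must share the same units."""
    if dense_conc <= 0 or light_conc <= 0:
        raise ValueError("concentrations must be positive")
    return dense_conc / light_conc


def transfer_free_energy(kp, temperature=T_KELVIN):
    """Gibbs free energy of transfer into puncta, in kcal/mol:
    dG_Tr = -R * T * ln(K_p)."""
    return -R_KCAL_PER_MOL_K * temperature * math.log(kp)
```

With these constants, a tenfold enrichment in puncta (K_p = 10) corresponds to a transfer free energy of about -1.41 kcal/mol, and K_p = 1 gives zero.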

Quantification of nuclear peripheral puncta in HEK293T and mHSPCs
We quantified the nuclear peripheral puncta using a distance transform method: we calculated the Euclidean distance to the nucleus border for each pixel of the nucleus mask, and then averaged those values over the pixels of each individual punctum. In this way we estimated the average distance of all puncta pixels to the nucleus border (per punctum). To obtain the percentage of nuclear peripheral puncta for each construct, we counted the puncta within a distance ≤ 0.5 µm of the nuclear periphery and compared that with the total number of puncta.
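The final counting step can be sketched as below, assuming the per-pixel border distances have already been produced by a distance transform. Two assumptions are made here and are not stated in the source: distances are converted from pixels to micrometres with the 0.11 µm xy pixel size, and the cutoff is inclusive (distance ≤ 0.5 µm). The function names are hypothetical.

```python
def mean_border_distance_um(pixel_distances_px, pixel_size_um=0.11):
    """Average distance to the nucleus border over all pixels of one punctum,
    converted from pixels to micrometres (pixel size is an assumption)."""
    if not pixel_distances_px:
        raise ValueError("punctum has no pixels")
    return pixel_size_um * sum(pixel_distances_px) / len(pixel_distances_px)


def percent_peripheral(puncta_distances_um, cutoff_um=0.5):
    """Percentage of puncta whose mean border distance is within the cutoff."""
    if not puncta_distances_um:
        raise ValueError("no puncta given")
    n_peripheral = sum(1 for d in puncta_distances_um if d <= cutoff_um)
    return 100.0 * n_peripheral / len(puncta_distances_um)
```

For four puncta with mean border distances of 0.1, 0.5, 0.6 and 2.0 µm, two fall within the cutoff, giving 50% peripheral puncta.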

Statistical analysis of the image quantification data
Generalized linear mixed effects models 22 were used to account for the random effects of image position while comparing cellular imaging characteristics across two constructs. A Poisson model with log link was used for analysis of puncta counts. A Gaussian model with identity link was used for analysis of all other imaging characteristics. Puncta count and the transfer free energy (ΔG_Tr) (see the main text Methods) were not transformed prior to analysis. All other cellular imaging variables were log10-transformed prior to analysis so that they would be more accurately modeled by a normal distribution. All p-values are two-sided. No multiple testing adjustments were performed. All pairwise p-values were calculated using the R packages lme4 (v 1.1-26) and lmerTest (v 3.1-3).
All plots of puncta and thermodynamic features were generated in R (v 4.1.0) using ggscatter from ggpubr (v 0.4.0) and ggplot2 (v 3.3.5).
Suppl. Video 6. Time-lapse confocal microscopy video of a fusion event in a HEK293T cell expressing G-NUP98-LNP1 acquired at an experimental frame rate of 1.2 seconds/frame. Video frame rate is 7 frames/second. Time format is mm:ss.
A, Calibration of the total mEGFP fluorescence intensity vs. mEGFP concentration. From the slope based on linear fitting, we determined that one unit of mEGFP intensity per pixel volume corresponds to 20 nM. The volume of each image is 60,027,904 pixels³ (992 × 992 × 61). B, Density plot of mEGFP concentration from untransfected HEK293T cells used to determine the fluorescence background in the mEGFP channel of the microscope; this was used to establish a fluorescence intensity threshold (in µM mEGFP units) below which cells were rejected for analysis; the median value of 0.02 µM mEGFP was used as the threshold value for analysis of microscopy images for all G-NHA9 constructs and the empty EGFP vector control. C, Quantification of nuclear puncta detected by the HA antibody in the presence and absence of an mEGFP tag. In the puncta # (/10³ µm³) vs. average nuclear HA intensity plot, the boxed regions highlight equivalent fluorescence intensity ranges for each condition; n = 750 for HA-mEGFP-NHA9 and 760 for HA-NHA9. D, Concentration-dependent turbidity values for NHA9 protein monitored by UV absorbance at 340 nm.
of the effects of mutations in the HOXA9 homeodomain on sequence-specific DNA binding activity. A, Results of gel mobility shift assays with a 20 base pair double-stranded DNA oligonucleotide and the HOXA9 homeodomain (HOXA9 197-272 HD) and the DNA binding-deficient mutant (HOXA9 197-272-ΔDNA; mutations include R258A, K262A, and W199G). B, Still images of multiple time-points from a confocal fluorescence microscopy time-lapse video (Suppl. Video 3) for a fusion event in a HEK293T cell expressing G-NHA9-ΔDNA. C, Confocal fluorescence micrographs of Alexa 488-labeled NUP98-N condensates prepared in vitro with increasing protein concentration. The micrographs are presented as maximum intensity projections of 13 confocal planes offset by 0.5 µm per plane. D, Representative confocal microscopy images of live HEK293T cells expressing G-NUP98-N (green). DNA is stained with Hoechst dye (blue). The merged image of G-NHA9-ΔDNA puncta with stained DNA is included for comparison. E-G, Plots of puncta # (/10³ µm³) (E), V_p (µm³) (F), and K_p (K_p = [DP]/[LP]) (G) vs. [G-NHA9 construct] for G-NUP98-N (teal) and G-NHA9-ΔDNA (red). Data are plotted on a semi-log (y-axis: log10) scale. The pairwise p-value between G-NUP98-N vs. G-NHA9-ΔDNA is shown in each plot (E-G) (n = 846 and 780 in (E) including the cells with zero punctum and n = 318 and 254 in (F-G) excluding the cells with zero punctum, respectively, for G-NUP98-N vs. G-NHA9-ΔDNA).
2 (cont.). H, Results of the sequence analysis pipeline show sequence features, including Shannon entropy, predicted secondary structure, predicted disorder, presence of cation-pi and pi-pi interactions, prion-like domains, acidic and basic tracts, and the occurrence and enrichment of amino acids within the HOXA9 region of NHA9. I, Concentration-dependent turbidity values for the C-terminal HOXA9 region of NHA9 in the presence (red) and absence (black) of 10% PEG monitored by UV absorbance at 340 nm. J, Still images of multiple time-points from a confocal fluorescence microscopy time-lapse video (Suppl. Video 4) for a fusion event in a HEK293T cell expressing G-NUP98-N. K, Confocal micrographs of fluorescence recovery of a single G-NUP98-N punctum in HEK293T cells at different times after photobleaching. FRAP recovery curve (right) for a photo-bleached NUP98-N punctum (teal) with the recovery curve for G-NHA9-ΔDNA (red) included for comparison. Individual puncta were manually tracked at different times and recovery was plotted as the mean ± the standard deviation (S.D.; n = 20).
of multiple FG residues in the FG-rich IDR of NHA9Midi disrupts puncta formation in cells. A, Representative images of live HEK293T cells expressing G-NHA9Midi (top, green) and G-NHA9Midi-21FGAA (bottom, green). DNA is stained with Hoechst dye (blue). B-E, Plots of puncta # (/10³ µm³) (B), V_p (µm³) (C), K_p (K_p = [DP]/[LP]) (D), and ΔG_Tr (kcal/mol) (E) vs. [G-NHA9Midi construct] for G-NHA9Midi (lime green) and G-NHA9Midi-21FGAA (light blue). Data are plotted on a semi-log (y-axis: log10) scale in (B-D). The pairwise p-value between G-NHA9Midi vs. G-NHA9Midi-21FGAA is shown in each plot (B-E) (n = 731 and 791 in (B) including the cells with zero punctum, and n = 122 and 5 in (C-E) including the cells with zero punctum, respectively, for G-NHA9Midi and G-NHA9Midi-21FGAA). F, Representative images of fixed HEK293T cells expressing G-NHA9 and stained with a nuclear pore complex marker (NUP107, magenta). DNA is labeled with DAPI dye (blue). G, Quantitation of percent of total puncta localized within 0.5 µm of the nuclear periphery for all G-NHA9 constructs in HEK293T cells.
[Figure panel labels: N-term FG repeats, GLEBS domain, C-term FG repeats, HOXA9; NUP98N, N-terminal FG repeats; NUP98N, C-terminal FG repeats]
Suppl. Figure 8. Analysis of the amino acid features in the NHA9 sequence. Results of the sequence analysis pipeline show sequence features, including Shannon entropy, predicted secondary structure, predicted disorder, presence of cation-pi and pi-pi interactions, prion-like domains, acidic and basic tracts, and the occurrence and enrichment of amino acids within NHA9 (top), the N-terminal FG-motif region of NUP98-N within NHA9 (bottom left), and the C-terminal FG-motif region of NUP98-N within NHA9 (bottom right). The number of residues in each region is given above its respective bar in the bottom panel.