Phase Separation Mediates NUP98 Fusion Oncoprotein Leukemic Transformation

NUP98 fusion oncoproteins promote leukemogenesis by undergoing phase separation in the nuclei of hematopoietic cells to form hundreds of punctate transcriptional condensates that drive aberrant gene expression and induce myeloid transformation.


RNA Sequencing Analysis
Read alignment to the Mus musculus reference genome (mm10) was performed with STAR software (v2.7.0d) 1 . Quality control was performed to identify duplicated and unmapped reads; all samples were of acceptable quality for analysis. To evaluate gene expression level, the read count for each annotated gene was calculated using HTSeq 2 (version 0.11.2). Differential gene expression and regularized log transformation (rlog) on raw count data were carried out using DESeq2 3 . All the genes were ranked according to the fold-change and significance from differential analysis. Gene set enrichment analysis was then performed using molecular signatures database (MSigDB) (version 6.2) C2 genes 4 .

Cell fixation and immunofluorescence
To fix cells for nuclear pore complex staining, 250,000 cells were grown on Poly-L-Lysine-treated (Sigma-Aldrich), sterile coverslips (VWR). HEK293T cells were first rinsed with warm 1X PBS, followed by a 20-minute incubation with 4% Paraformaldehyde (Electron Microscopy Sciences), then a 5-minute incubation in 0.5% Triton-X-PBS for cell permeabilization. Cells were washed with Wash Buffer (0.1% Triton-X-PBS) and blocking was performed at room temperature (RT) for 1 hour in Blocking Buffer [10% Normal Donkey Serum (Jackson ImmunoResearch)] prior to primary antibody incubation. For fixation of hematopoietic cells and NUP98-KDM5A PDX cells, a cytocentrifuge was used to adhere cells to a glass slide by spinning at 400 rpm for 4 minutes. The cells were then rapidly rinsed in 1X PBS-5 mM EGTA, followed by incubation at -20 °C in 95% Methanol-5 mM EGTA for 30 minutes. Cells were washed with 1X PBS, followed by blocking at RT for 1 hour prior to primary antibody incubation. All primary antibodies were diluted in 5% Normal Donkey Serum and added to coverslips overnight at 4 °C. The primary antibodies used were Mouse anti-NUP107 (Abcam, Mab414, RRID:AB_448181, 1:300) and Rat anti-NUP98 (GeneTex, 2H10, RRID:AB_2894964, 1:200). After primary antibody incubation, cells were washed and incubated for 45 minutes at RT with secondary antibodies conjugated to Alexa Fluor Rhodamine Red™-X or Alexa Fluor 647 (Jackson ImmunoResearch; RRID:AB_2340614) diluted 1:300 in 5% Normal Donkey Serum. Cells were washed and counterstained with DAPI (4',6-Diamidino-2-Phenylindole, Dihydrochloride, Invitrogen) diluted in PBS (300 nM) for 2 minutes, and then mounted onto glass slides with antifade solution (90% glycerol, 0.5% N-propyl gallate).

Gel mobility shift assays
The DNA-binding assays for the HOXA9 homeodomain (HOXA9(197-272) HD) and for the DNA binding-deficient mutant (HOXA9(197-272)-ΔDNA; mutations include R258A, K262A, and W199G) were performed using a protein concentration range from 2.5 nM to 1.25 µM. The DNA concentration was kept constant at 10 nM. A double-stranded 20 base pair DNA oligonucleotide (sequence 5′-ACTCTATGATTTACGACGCT-3′; HOXA9 binding site, TTTAC) labeled at the 5′ end with Cy5 dye (Integrated DNA Technologies, Inc., USA) was chosen for the assay. Both the protein and DNA were present in 10 mM Tris (pH 7.5), 75 mM NaCl, 6% glycerol, 1 mM DTT, 1 mM EDTA, 6.7 ng/µL Poly(2′-deoxyinosinic-2′-deoxycytidylic acid) sodium salt (dIdC), and 6.7 ng/µL BSA. After mixing the DNA and protein, the reactions were incubated at room temperature for 30 min and then run on a 16% acrylamide gel in 1x TBE (Invitrogen) buffer at a constant voltage of 100 V for 140 min. The gel was imaged in a gel imager (Amersham Imager 600, GE Healthcare Life Sciences, USA) for Cy5 fluorescence.

Cultures transfected with pET28a-based plasmids expressing the NHA9 and other protein constructs (see Suppl. Table 3) were grown at 37 °C to an optical density at 600 nm (OD600) ~ 0.8. Protein expression was induced by the addition of 1 mM IPTG (GoldBio) and the cultures were incubated at 37 °C for 4 h. 8 L of bacterial culture were harvested by centrifugation and constructs were used; however, for concentrations between 10 nM to 160 nM, a final concentration of 10 nM Alexa Fluor-labeled of the NHA9 constructs was used.

Sequence analysis
We used a custom, MATLAB-based computational pipeline called Swiss Army Knife (SAK) to analyze the amino acid sequence features of NHA9. The SAK pipeline identifies and annotates low-complexity regions by calculating the sequence Shannon entropy 5,6 at each position, conserved domains and features by accessing the Conserved Domain Databases (CDD) 7-9 , predicted secondary structure using the JPred4 API 10 , predicted disordered regions and molecular recognition features (MoRFs) by executing IUPred2A 11 , regions predicted to be enriched in Pi-Pi interactions and predicted prion-like domains using PScore 12 and PLAAC 13 , respectively, acidic and basic tracts in disordered regions using ABTScore 14 , hydropathy utilizing the Kyte-Doolittle hydropathy scale 15 as implemented in CIDER 16 , position-specific amino acid identities, and enrichment of amino acids for the overall sequence as compared to frequencies in the human proteome as reported in UniProtKB release 15 17 . The results of this analysis are then exported as graphical summaries showing site-specific sequence features, as shown in Suppl.

NUP98 FOs
For all quantified images, the pixel size in the x and y dimensions was 0.11 µm × 0.11 µm, with 992 total pixels in each dimension. The z-step size was 0.20 µm, and 61 planes were imaged, giving 61 pixels in the z dimension. The volume of each 3D image is therefore 61 × 992 × 992 pixels, and one cubic pixel volume (voxel) corresponds to (0.11 µm × 0.11 µm × 0.20 µm) = 0.00242 µm³. Individual puncta volume was determined in units of cubic pixels and converted to µm³ using this conversion factor.
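The voxel-to-volume conversion described above can be sketched in a few lines. The pixel size, z-step, and stack shape come from the text; the constant and function names are illustrative only.

```python
# Voxel geometry as stated in the text; all names here are illustrative.
PIXEL_XY_UM = 0.11            # lateral pixel size in x and y (micrometers)
Z_STEP_UM = 0.20              # axial step size (micrometers)
STACK_SHAPE = (61, 992, 992)  # (z, y, x) size of each 3D image in pixels

# Volume of one cubic pixel (voxel) in cubic micrometers: 0.00242
VOXEL_VOLUME_UM3 = PIXEL_XY_UM * PIXEL_XY_UM * Z_STEP_UM


def puncta_volume_um3(n_voxels):
    """Convert a punctum volume from a voxel count to cubic micrometers."""
    return n_voxels * VOXEL_VOLUME_UM3
```

For example, a punctum occupying 500 voxels would correspond to 500 × 0.00242 µm³ = 1.21 µm³.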

Background intensity correction
For each 3D image stack, background intensity correction was performed for each pixel based on the peak background intensity value at 101 a.u. in the mEGFP channel of untransfected images (see Suppl. Fig. 7K). For background correction, this background fluorescence value was subtracted from the fluorescence intensity of all image pixels. All the 3D images were used for image analysis after background correction, as described below.
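As a minimal sketch of this correction, a constant background can be subtracted from every voxel with NumPy. The value of 101 a.u. is taken from the text; the function name and the optional clipping of negative values to zero are assumptions made for illustration and are not stated in the methods.

```python
import numpy as np

BACKGROUND_AU = 101.0  # peak background intensity (a.u.) reported in the text


def subtract_background(stack, background=BACKGROUND_AU, clip=True):
    """Subtract a constant background value from every voxel of an image stack.

    clip=True sets negative results to zero. This clipping is an assumption
    made for this sketch; the text only states that the value is subtracted.
    """
    corrected = np.asarray(stack, dtype=np.float64) - background
    if clip:
        corrected = np.clip(corrected, 0.0, None)
    return corrected
```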

Cell nuclei segmentation
Cell nuclei were segmented based on the DNA (Hoechst) channel using Cellpose 18 (v 0.6.5).
Since each 3D dataset represented only one layer of cells, cell nuclei were segmented in 2D in each z-layer and then combined into 3D stacks. Each individual z-layer intensity was normalized between the 0.25th and 99.75th percentiles and segmented with parameters: model type: "cyto", diameter: 120 pixels (13.2 µm), flow threshold: 0.4, probability threshold: 0. The cytoplasm model performed better than the nucleus model for our datasets.
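The per-layer intensity normalization could look like the sketch below, which rescales one z-layer so that the 0.25th and 99.75th percentiles map to 0 and 1. The function name, the clipping of values outside that range, and the handling of flat images are assumptions; the Cellpose call itself is not shown.

```python
import numpy as np


def normalize_layer(layer, low_pct=0.25, high_pct=99.75):
    """Rescale a 2D z-layer so the given percentiles map to 0 and 1.

    Values outside the percentile range are clipped to [0, 1]; this clipping
    is an assumption of the sketch, not something stated in the text.
    """
    layer = np.asarray(layer, dtype=np.float64)
    lo, hi = np.percentile(layer, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: nothing to rescale.
        return np.zeros_like(layer)
    return np.clip((layer - lo) / (hi - lo), 0.0, 1.0)
```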
After segmenting nuclei in individual z-layers, results were combined into 3D stacks as follows: The total area of segmented nuclei was computed in each z-layer (excluding 10 layers at each border), and the layer with maximal total area was used as a reference for cell IDs. Cellpose segmentations of individual z-layers were then combined into a 3D stack, converted to a binary mask, and filtered with a 3D median filter (size 3 pixels), to remove objects that were detected in only one z-layer and fill gaps between masks that were one-layer wide. Finally, the binary mask of each z-layer was multiplied by the labels of the reference layer to assign cell IDs throughout the entire 3D stack.
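The stacking procedure above can be sketched with NumPy and SciPy as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function name is hypothetical, the input is assumed to be a (z, y, x) integer array of per-layer 2D labels with 0 as background, and `scipy.ndimage.median_filter` stands in for whatever 3D median filter was actually used.

```python
import numpy as np
from scipy import ndimage as ndi


def combine_layers(label_layers, border=10, median_size=3):
    """Combine per-layer 2D nucleus labels into one consistent 3D label stack.

    label_layers: (z, y, x) integer array, 0 = background. Label values may
    differ between layers because each layer was segmented independently.
    Returns the 3D label stack and the index of the reference layer.
    """
    labels = np.asarray(label_layers)
    binary = labels > 0

    # Reference layer: maximal total segmented area, excluding `border`
    # layers at each end of the stack.
    areas = binary.reshape(binary.shape[0], -1).sum(axis=1)
    inner = np.arange(border, labels.shape[0] - border)
    ref_z = inner[np.argmax(areas[inner])]

    # A 3D median filter on the binary mask removes objects present in only
    # one z-layer and fills gaps that are one layer wide.
    smoothed = ndi.median_filter(binary.astype(np.uint8), size=median_size)

    # Multiply every z-layer of the mask by the reference-layer labels so
    # that cell IDs are consistent through the whole stack.
    return smoothed * labels[ref_z][np.newaxis, :, :], ref_z
```

With the 61-layer stacks described in the text, `border=10` leaves 41 candidate reference layers.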

Puncta segmentation
Detection and segmentation of puncta in the mEGFP channel were based on applying the scale-adapted Laplacian of Gaussian (LoG) filter 19 . We first detected puncta centers using the blob_log (LoG detector) function of the scikit-image library 20  constructs and NUP98 FOs images in HSPC cells, and NUP98-KDM5A patient-derived xenograft (PDX) and the control human CD34+ cells). Puncta centers were additionally filtered according to intensity relative to the background mEGFP signal. To quantify the background mEGFP signal in each image, we computed the median mEGFP intensity in each cell, and then calculated the 95th percentile of these values over the entire image. Puncta centers with an intensity lower than 3 times the background mEGFP signal were removed for the NHA9 constructs and NUP98 FOs images in HEK293T cells. The same was done for the NUP98 signal in quantifying the NUP98-KDM5A PDX and the control human CD34+ cells. For the NHA9 constructs and NUP98 FOs images in HSPC cells, puncta centers with an intensity lower than 2 times the background mEGFP signal were removed. For quantifying puncta detected by the NUP98 antibody in NUP98-KDM5A PDX and control human CD34+ cells (Suppl. Fig. 7N, O), all puncta within 0.5 µm of the nuclear periphery were excluded to avoid the fluorescence signal of endogenous NUP98 at the nuclear pore complex.
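The background-relative filtering of puncta centers can be illustrated as below. The sketch assumes that candidate centers are already available as integer pixel coordinates (for example from a LoG blob detector) and that the intensity of a center is read at its single center pixel; both points, and all function names, are assumptions rather than details given in the text.

```python
import numpy as np


def background_signal(image, cell_labels, percentile=95.0):
    """Median intensity in each cell, then a percentile over all cells."""
    ids = np.unique(cell_labels)
    ids = ids[ids != 0]  # label 0 is treated as background
    medians = [np.median(image[cell_labels == i]) for i in ids]
    return float(np.percentile(medians, percentile))


def filter_centers(image, centers, background, fold=3.0):
    """Keep centers whose intensity is at least `fold` times the background.

    Intensity is sampled at the center pixel only (an assumption).
    fold=3 corresponds to the HEK293T setting and fold=2 to the HSPC setting.
    """
    centers = np.asarray(centers, dtype=int).reshape(-1, image.ndim)
    intensities = image[tuple(centers.T)]
    return centers[intensities >= fold * background]
```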
To segment individual puncta, we filtered the image with five LoG filters with scales linearly interpolated between min_sigma and max_sigma using the filters.gaussian and filters.laplace functions of scikit-image (v 0.18.1). We then calculated the pixel-wise maximum of filter responses. The result was thresholded at 0.001 (relative intensity in LoG scale units) for the NHA9 constructs and NUP98 FOs images in HEK293T cells and processed with a distance transform watershed using the puncta centers as seeds. For the NHA9 constructs and NUP98 FOs images in HSPC cells, and NUP98-KDM5A PDX and the control human CD34+ cells, the threshold for puncta segmentation was 0.0005.
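A rough sketch of the multi-scale LoG response and thresholding step is given below. It uses `scipy.ndimage.gaussian_filter` and `scipy.ndimage.laplace` as stand-ins for the scikit-image `filters.gaussian` and `filters.laplace` functions named in the text, negates the Laplacian so that bright puncta give a positive response, and applies no scale normalization; these choices are assumptions. The values of `min_sigma` and `max_sigma` are left as parameters, and the seeded distance-transform watershed is not shown.

```python
import numpy as np
from scipy import ndimage as ndi


def multiscale_log_mask(image, min_sigma, max_sigma, n_scales=5, threshold=0.001):
    """Threshold the pixel-wise maximum of LoG responses over several scales.

    Returns a boolean foreground mask and the maximum-response image.
    The threshold is only meaningful if `image` is scaled the same way as
    the images it was chosen for (an assumption of this sketch).
    """
    image = np.asarray(image, dtype=np.float64)
    sigmas = np.linspace(min_sigma, max_sigma, n_scales)

    # Gaussian smoothing followed by a Laplacian, negated so that bright
    # blobs on a dark background produce positive values.
    responses = [-ndi.laplace(ndi.gaussian_filter(image, s)) for s in sigmas]

    response = np.max(responses, axis=0)  # pixel-wise maximum across scales
    return response > threshold, response
```

The resulting mask would then be split into individual puncta with a watershed seeded at the detected puncta centers, as described in the text.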

Conversion of mEGFP fluorescence intensity to protein concentration
We converted the total mEGFP fluorescence intensity per pixel volume to the GFP molar concentration based on the calibration plot shown in Suppl. For each solution, 61 z-stack images (0.2 µm step size, spanning 12.2 µm in total) were recorded at 6 different positions using a single fluorescence channel for mEGFP. We used the image analysis pipeline discussed above to determine the total mEGFP fluorescence intensity from the images. We computed that one absorbance unit of total mEGFP fluorescence intensity per cubic pixel volume corresponds to a mEGFP concentration of 20 nM (Suppl. Fig. 1A). We used this conversion factor to convert mEGFP fluorescence intensities per unit volume into mEGFP molar concentrations, i) for each cell nucleus ([G-NHA9 construct]), ii) for the light phase of each nucleus ([LP]), and iii) within puncta within each nucleus, determined as the average over all detected puncta ([DP]). We used the same conversion method for the images of G-NUP98 FOs in HEK293T cells and G-NHA9 constructs in HSPC cells, as well.
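The conversion amounts to scaling the mean intensity per voxel by the calibration factor. A minimal sketch follows; the factor of 20 nM per intensity unit per voxel is from the text, while the names and the error handling are illustrative.

```python
NM_PER_INTENSITY_UNIT = 20.0  # nM per unit of intensity per voxel (from the text)


def intensity_to_nM(total_intensity, n_voxels):
    """Convert total background-corrected intensity in a region to nM.

    total_intensity: summed mEGFP intensity over the region (e.g. a nucleus,
    its light phase, or its puncta); n_voxels: number of voxels in the region.
    """
    if n_voxels <= 0:
        raise ValueError("n_voxels must be positive")
    return NM_PER_INTENSITY_UNIT * total_intensity / n_voxels
```

For instance, a region of 1,000 voxels with a summed intensity of 5,000 has a mean of 5 units per voxel, which converts to 100 nM.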

Extraction of thermodynamic features from image analysis
We applied several selection criteria during the analysis of thermodynamic parameters based upon the confocal fluorescence microscopy images of the NHA9 constructs, as follows. We where R is the gas constant (= 1.98 × 10⁻³ kcal/mol/K) and T is temperature (= 310 K).

Quantification of nuclear peripheral puncta in HEK293T and mHSPCs
We quantified the nuclear peripheral puncta using a distance transform method: we calculated the Euclidean distance to the nucleus border for each pixel of the nucleus mask, and then averaged those values over the pixels of each individual punctum. In this way, we estimated the average distance of all pixels in a punctum to the nucleus border. To obtain the percentage of nuclear peripheral puncta for each construct, we counted the puncta within a distance ≤ 0.5 µm from the nuclear periphery and compared that number with the total number of puncta.

[LP] ) (D), and ΔG Tr (kcal/mol) (E) vs. [G-NHA9 Midi construct] for G-NHA9 Midi (lime green) and G-NHA9 Midi -21FGAA (light blue). Data are plotted on a semi-log (y-axis: log10) scale in (B-D). The pairwise p-value between G-NHA9 Midi vs. G-NHA9 Midi -21FGAA is shown in each plot (B-E) (n = 731 and 791 in (B) including the cells with zero punctum, and n = 122 and 5 in (C-E) including the cells with zero punctum, respectively, for G-NHA9 Midi and G-NHA9 Midi -21FGAA). F, Representative images of fixed HEK293T cells expressing G-NHA9 and stained with a nuclear pore complex marker (NUP107, magenta). DNA is labeled with DAPI dye (blue). G, Quantitation of percent of total puncta localized within 0.5 µm of the nuclear periphery for all G-NHA9 constructs in HEK293T cells.

Suppl. Figure 6. Characterization of the developmental features of lin-HSPCs transduced with G-NHA9 constructs. A, Schematic for colony forming unit assay. B, Representative images of individual colonies from colony forming assays for lin-HSPCs expressing G-NHA9, G-NHA9-8FA, G-NHA9-21FGAA, and G-NHA9 Midi . C, Immunophenotyping of lin-HSPCs expressing G-NHA9, G-NHA9-8FA, G-NHA9-21FGAA, and G-NHA9 Midi after three weeks of growth in methylcellulose containing myeloid and erythroid growth factors. Data shown are from the mCherry- and mEGFP-positive, CD117-negative, FCε receptor-negative live singlet population in a representative experiment.
D, Volcano plots for RNA-seq data for empty vector versus G-NHA9 or mutants. RNA sequencing was performed for lin-HSPCs expressing empty vector, G-NHA9 or mutants after one week of growth in methylcellulose containing myeloid and erythroid growth factors. N = 5 for each condition.