Gene fusions frequently result from rearrangements in cancer genomes. In many instances, gene fusions play an important role in oncogenesis; in other instances, they are thought to be passenger events. Although regulatory element rearrangements and copy number alterations resulting from these structural variants are known to lead to transcriptional dysregulation across cancers, the extent to which these events result in functional dependencies with an impact on cancer cell survival is variable. Here we used CRISPR-Cas9 dependency screens to evaluate the fitness impact of 3,277 fusions across 645 cell lines from the Cancer Dependency Map. We found that 35% of cell lines harbored either a fusion partner dependency or a collateral dependency on a gene within the same topologically associating domain as a fusion partner. Fusion-associated dependencies revealed numerous novel oncogenic drivers and clinically translatable alterations. Broadly, fusions can result in partner and collateral dependencies that have biological and clinical relevance across cancer types.
This study provides insights into how fusions contribute to fitness in different cancer contexts beyond partner-gene activation events, identifying partner and collateral dependencies that may have direct implications for clinical care.
Structural variants (SV), including insertions, deletions, copy number alterations, translocations, and other complex rearrangements, play an integral role in oncogenesis (1–3). These somatic events are particularly enriched in pediatric cancers, which are classically characterized by otherwise low mutational burdens; however, their importance to cancer spans across the spectra of age and histology (4, 5). Prior work demonstrated that SVs contribute to transcriptional dysregulation through cis-regulatory element rearrangement (such as “enhancer-hijacking”), copy number alterations leading to changes in gene dosage, and other variations on these phenomena (6–11). Changes in RNA expression within gene sets have been used to interpret SVs relevant for cancer pathogenesis. However, it is not known whether these expression changes at the level of individual genes create functional dependencies with an impact on cancer cell survival (12).
Genome-scale CRISPR-Cas9 knockout screening has enabled exploration of the functional importance of individual genes in numerous cancer contexts, and a collaborative effort has conducted CRISPR-Cas9 loss-of-function screening across established cancer cell lines to develop the Cancer Dependency Map (DepMap) project (n = 769 cell lines to date; refs. 13–15). Picco and colleagues previously demonstrated that CRISPR-Cas9 screening could be applied to understand the functional impact of partner genes in well-known and some less-well-characterized fusions across 371 cell lines (16). However, the full extent to which fusions create (i) functional dependencies on partner genes, or (ii) “collateral dependencies” on genes within the same topologically associating domains (TAD) as the fusion partners is unknown. We hypothesized that a significant number of fusions create dependencies on fusion partners as well as collateral genes through cis-regulatory element rearrangement, copy number change, or variations on these two processes (Fig. 1A). We therefore integrated fusion calls and dependency data in cell lines from DepMap to characterize the impact of fusions on cancer cell survival as mediated through partner and collateral dependencies, reasoning that a subset of these occurrences would reveal novel insights into cancer biology and therapeutic target development opportunities with potential for clinical translation.
Materials and Methods
DepMap data source
We used a publicly available collection of annotated cell lines previously compiled and characterized by the Cancer Cell Line Encyclopedia (CCLE), as well as associated fusion calls, genome-scale CRISPR knockout screening (dependency data), RNA expression, copy number alterations, mutation calls, and sgRNA locations from the following DepMap release (Supplementary Table S1): “DepMap, Broad (2020): DepMap 20Q2 Public. figshare. Dataset.” https://doi.org/10.6084/m9.figshare.12280541.v4. All subsequent genomic analyses related to these cell lines were based on alignment to hg38.
Fusion calls and CRISPR–Cas9 dependency data
RNA sequencing (RNA-seq)-based methods have been used extensively for fusion detection (17). The STAR-Fusion pipeline version 1.6.0 was previously used to identify fusions from RNA-seq data across cell lines in the CCLE (https://github.com/STAR-Fusion/STAR-Fusion/wiki). Among 1,299 cell lines, 21,999 unique fusions were identified after a preliminary filter was applied as described in the DepMap documentation. We applied further filtering to select for high-confidence fusion calls (18), retaining only fusions with FFPM (fusion fragments per million total reads) > 0.1, 5′ GT and 3′ AG dinucleotide breakpoints associated with canonical splicing (19, 20), >5 junction reads, and involvement of protein-coding partners or the IgH locus. This produced a list of 5,093 high-confidence fusions in 1,075 cell lines.
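The high-confidence filter described above can be sketched as a simple predicate. This is a minimal illustration only: the record field names (`ffpm`, `junction_reads`, `left_dinuc`, `right_dinuc`, `left_gene`, `right_gene`) and the `IGH` label are hypothetical and do not reflect the actual DepMap or STAR-Fusion column names. The sketch also assumes that each partner must be either protein-coding or the IgH locus, which is one reading of the criterion.

```python
# Hypothetical label for the IgH locus; the real annotation may differ.
IGH_LABEL = "IGH"


def is_high_confidence(fusion, coding_genes):
    """Return True if a fusion call passes all four filters.

    `fusion` is a dict with hypothetical keys; `coding_genes` is a set of
    protein-coding gene symbols.
    """
    # 5' GT and 3' AG dinucleotides indicate canonical splicing at the breakpoint.
    canonical_splice = (
        fusion["left_dinuc"] == "GT" and fusion["right_dinuc"] == "AG"
    )
    # Assumption: every partner is protein-coding or the IgH locus.
    partners_ok = all(
        gene in coding_genes or gene == IGH_LABEL
        for gene in (fusion["left_gene"], fusion["right_gene"])
    )
    return (
        fusion["ffpm"] > 0.1
        and fusion["junction_reads"] > 5
        and canonical_splice
        and partners_ok
    )


def filter_fusions(fusions, coding_genes):
    """Keep only the fusion records that pass every filter."""
    return [f for f in fusions if is_high_confidence(f, coding_genes)]
```

Applied to a list of call records, `filter_fusions` returns only the calls that meet every threshold; a call failing any single criterion is dropped.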
We used dependency data for 18,119 genes across 769 cell lines, processed as previously described in DepMap documentation to obtain CERES gene scores (correcting for gene-independent CRISPR-Cas9 cutting in copy number amplified regions), and subsequently converted to dependency probability scores, which estimate the probability that a score represents a true dependency. As per DepMap documentation, because dependency probability scores take screening quality into account and may be utilized to identify which cell lines are more sensitive to CRISPR knockout when stratifying by the presence of a biomarker, they were used for all analyses. We took the intersection of cell lines with high-confidence fusion calls and those with dependency data to arrive at a starting set of 3,277 fusions across 645 cell lines.
Using fusions as a biomarker to identify associated dependencies across DepMap through genome-scale screening
For each of the 3,277 fusions, all cell lines were stratified by the presence or absence of the fusion of interest, and the mean dependency probability score for each of 18,119 protein-coding genes was calculated for each group. As 94% of fusions were observed in only a single cell line, a two-sample t test with the assumption of equal variance was carried out as a screen to identify genes that were likely to be dependencies based on the difference in scores between the two groups. We assumed that, for each fusion, all cell lines were drawn from the same underlying population with a uniform variance. Correction for multiple-hypothesis testing was done using the Benjamini–Hochberg method to arrive at Q values (using Q < 0.05 to identify genes having a significant effect on cell survival). Building upon the consensus a priori threshold for a true dependency in the DepMap data set of an absolute dependency probability >0.5, we additionally focused only on differential dependencies for this analysis (those with a dependency probability score difference > 0.5 between cell lines with the fusion of interest and those without), to stringently highlight the genes having the most significant unique impact on cell lines with the fusion of interest. Q values were −log10 transformed for data visualization.
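The screen described above combines three steps: a pooled-variance (equal-variance) two-sample t statistic, Benjamini–Hochberg adjustment, and a differential-dependency filter. The following is a minimal standard-library sketch of those steps, not the authors' code; all function names are illustrative. Converting the t statistic to a P value requires a t-distribution tail function (for example `scipy.stats.t.sf`), which is left out here to keep the sketch dependency-free.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(with_fusion, without_fusion):
    """Equal-variance two-sample t statistic and its degrees of freedom.

    A group with a single cell line contributes no variance information
    (its weight n - 1 is zero), so the pooled variance then comes entirely
    from the other group.
    """
    n1, n2 = len(with_fusion), len(without_fusion)
    ss1 = (n1 - 1) * variance(with_fusion) if n1 > 1 else 0.0
    ss2 = (n2 - 1) * variance(without_fusion) if n2 > 1 else 0.0
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(with_fusion) - mean(without_fusion)) / se, df


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted Q values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone Q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def is_differential_dependency(q_value, mean_with, mean_without):
    """Apply the Q < 0.05 and score-difference > 0.5 criteria."""
    return q_value < 0.05 and (mean_with - mean_without) > 0.5
```

With one fusion-positive cell line, the degrees of freedom reduce to the number of fusion-negative lines minus one, which is why the equal-variance assumption is needed for the test to be defined at all.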
To identify associated dependencies for each fusion, we selected fusion partners and genes in close proximity to fusion partners. Prior studies have shown that TAD boundaries are largely invariant between cell types (21). Although rearrangements that result in fusions will inevitably disrupt many TAD boundaries in cancer cells, we used TAD boundaries from a normal endothelial cell line as a generalizable approximation of genes in close proximity to fusion partners (22, 23). Genes in close proximity were identified on the basis of TADs from a Hi-C experiment done on an endometrial microvascular endothelial cell line; we downloaded the BED file with the identifier ENCFF633ORE from ENCODE and utilized the UCSC LiftOver tool to convert hg19 coordinates for TAD boundaries to hg38 coordinates (https://genome.ucsc.edu/cgi-bin/hgLiftOver; refs. 24, 25). Using gene coordinates from the BioMart package, SQL queries were carried out to assign each gene to a single TAD, with the exception of genes falling in TAD boundaries, which were assigned to upstream and/or downstream TADs within 40 kb of the starting gene coordinates (Supplementary Tables S2 and S3; ref. 26). Fusion-associated dependencies that were in close proximity to fusion partners were defined as collateral dependencies based on this assignment.
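The gene-to-TAD assignment above was carried out with SQL queries; the same logic can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions: TADs are given as `(chrom, start, end, tad_id)` tuples (a hypothetical layout, not the ENCODE BED schema), a gene is represented by its start coordinate, and a gene lying in a boundary is assigned to any TAD whose nearest edge is within 40 kb of that start.

```python
BOUNDARY_WINDOW = 40_000  # 40 kb, as stated in the text


def assign_tads(gene_chrom, gene_start, tads):
    """Return the TAD identifiers assigned to a gene.

    If the gene start lies inside a TAD, that single TAD is returned.
    Otherwise the gene is treated as lying in a boundary and is assigned to
    the upstream and/or downstream TADs within BOUNDARY_WINDOW of its start.
    """
    same_chrom = [t for t in tads if t[0] == gene_chrom]

    containing = [t[3] for t in same_chrom if t[1] <= gene_start <= t[2]]
    if containing:
        return containing[:1]

    nearby = []
    for _, start, end, tad_id in same_chrom:
        # Distance from the gene start to the nearest edge of this TAD.
        if gene_start < start:
            distance = start - gene_start
        else:
            distance = gene_start - end
        if distance <= BOUNDARY_WINDOW:
            nearby.append(tad_id)
    return nearby


def collateral_genes(partner_tads, gene_to_tads):
    """Genes sharing at least one TAD with a fusion partner."""
    partner_set = set(partner_tads)
    return sorted(
        gene for gene, tad_ids in gene_to_tads.items()
        if partner_set.intersection(tad_ids)
    )
```

A gene in a boundary can therefore map to zero, one, or two TADs, whereas a gene inside a TAD always maps to exactly one.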
We defined a fusion-dependency pairing as a unique combination of a partner or collateral dependency with a given fusion. To obtain a conservative total count of fusion-dependency pairings, we counted only unidirectional transcripts for fusions that had both forward and reverse fusion transcripts detected by RNA-seq. Of the 104 fusions identified to contribute the 112 unique partner fusion-dependency pairings, STAR-Fusion was able to predict in-frame versus out-of-frame status for 75; manual curation to include BCL2-IgH and RUNX1-IgH as out-of-frame fusions led to a total of 77 fusions for which in-frame versus out-of-frame status was available. Of the 77 fusions with a predicted protein coding consequence, 50 (65%) were predicted to be in-frame whereas 27 (35%) were predicted to be out-of-frame (Supplementary Table S4). To obtain a count of fusion-dependency pairings accounting for complex rearrangements within TADs, we removed instances where a dependency was associated with multiple fusions in the same cell line (retaining partner dependencies in the rare cases that a gene was a partner and collateral dependency in the same cell line); this count was used as a comparator for permutation schemes.
Structural variant analysis in cell lines
Structural variant calls from WGS data were available for 329 cell lines using the SvABA structural variant caller as previously described (1). For the 209 cell lines with dependency data for which structural variant data were available, we evaluated whether fusions (those with associated dependencies and those without) were seen as part of a rearrangement, or if one or both partners were seen as part of separate rearrangements.
For the 103 fusion-cell contexts with associated dependencies and concurrent WGS structural variant data, we evaluated whether fusions appeared to be associated with simple or complex structural variants. For each fusion-cell context, we quantified the number of structural variants in the same TAD that did not involve at least one of the fusion partners. We used the ShatterSeek analysis package (27) to synthesize copy number and structural variant data in five different cell lines harboring fusion-associated dependencies (NCIN87, DU4475, THP1, HCC38, and PANC1) to visualize the relationship of fusions with simple and complex structural variants in these cell lines.
Enrichment of SV partner and collateral genes among absolute dependencies
We analyzed structural variant calls involving at least one coding partner across 209 cell lines that also had dependency data. For each cell, we (i) identified genes that were absolute dependencies (those with a dependency probability score > 0.5, consistent with the DepMap consensus a priori threshold for a true dependency), (ii) identified genes that were “partners” in a SV in that cell line, and (iii) identified genes that were “collateral” genes relative to a SV in that cell line, based on our previous definition of normal TADs.
Across all cell lines, we observed the total instances where a partner gene was an absolute dependency (vs. not a dependency) compared with other genes, calculated ORs, and determined significance using Fisher exact tests. We repeated this analysis independently for the collateral genes. We carried out this analysis for structural variants in aggregate, as well as for fusions, translocations (interchromosomal rearrangements detected by WGS), inversions, deletions, and duplications individually. Fusions in this analysis were those identified from RNA-seq using STAR-Fusion (as previously described) and with additional evidence in the WGS data of a SV correlate (manifesting as one or multiple of the other SV categories of deletion, duplication, inversion, or translocation). Fusion calls meeting these criteria were identified in 162 of the 209 cell lines with WGS and dependency data. Desiring a family-wise error rate (FWER) < 0.05, we used a Bonferroni correction for the 12 hypotheses tested to obtain a threshold of P < 0.004 to ascertain significant associations. Analysis of ORs among individual cells was focused only on the 162 cell lines with fusion calls, and distributions were compared using two-sided t tests.
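The enrichment test above rests on a 2 × 2 contingency table (partner or collateral gene vs. other gene, dependency vs. not a dependency). The sketch below shows the odds ratio, a two-sided Fisher exact test, and the Bonferroni threshold using only the standard library. It is intended for small illustrative tables; function names are hypothetical, and the exact-arithmetic approach would be slow for genome-scale counts, where a library routine would be preferable.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        # Small relative tolerance guards against floating-point ties.
        if prob(k) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the FWER at alpha."""
    return alpha / n_tests
```

For the 12 hypotheses described above, `bonferroni_threshold(0.05, 12)` gives approximately 0.0042, consistent with the stated threshold of P < 0.004.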
Copy number and mutational analysis in fusion-associated dependencies
Gene level copy number and mutation data were matched to each fusion-dependency-cell line context. Mutations identified included silent, missense, splice site, nonsense, and in-frame deletions. Copy number data were log2 transformed with a pseudo count of 1. We used <0.5 as the threshold for copy number loss, and >1.5 as the threshold for copy number gain.
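The copy number thresholds above can be expressed as a small classifier. This sketch assumes that the thresholds of 0.5 and 1.5 apply to the log2(x + 1) transformed values, so that a relative copy number of 1 maps to a transformed value of 1; the function name and the "neutral" label are illustrative.

```python
import math


def classify_copy_number(relative_cn):
    """Classify a relative copy number value as loss, gain, or neutral.

    The value is log2 transformed with a pseudocount of 1 before the
    thresholds are applied (assumption: thresholds are on this scale).
    """
    transformed = math.log2(relative_cn + 1)
    if transformed < 0.5:
        return "loss"
    if transformed > 1.5:
        return "gain"
    return "neutral"
```

Under this reading, a relative copy number of 3 transforms to 2 and is called a gain, whereas a value of 0.2 transforms to roughly 0.26 and is called a loss.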
Identifying COSMIC fusion genes, COSMIC cancer census genes, and kinases
Known Catalogue of Somatic Mutations in Cancer (COSMIC) fusions were utilized from the COSMIC fusion census (https://cancer.sanger.ac.uk/cosmic/fusion). A list of known cancer driver genes from the COSMIC cancer census (https://cancer.sanger.ac.uk/cosmic/curation) and a curated list of kinases from a prior study (29) were used to annotate remaining fusions with potentially interesting biology.
Evaluating FOXR1 overexpression and fusion-calling in clinical samples
A total of 12,747 clinical tumor samples with log2-normalized TPM RNA expression data available through the UCSC Treehouse Childhood Cancer Project (Treehouse Tumor Compendium V11 Public PolyA; ref. 30) were screened for significant FOXR1 overexpression, using a threshold of log2 (TPM + 1) > 2. The histologies of specific samples with FOXR1 overexpression were identified, and given the recurrence of neuroblastoma as a histologic subtype, four neuroblastoma samples from the TARGET study were selected for further evaluation of the presence of a FOXR1 fusion (sample IDs: TARGET-30-PASUCB, TARGET-30-PASPBZ, TARGET-30-PASSWW, TARGET-30-PARBAJ). Two independent fusion-callers, STAR-Fusion and FusionCatcher (31), were used to call fusions from hg19-aligned FASTQ RNA-seq files of these samples available for controlled access download through dbGaP (phs000218.v22.p8). WGS-based copy number variant calls from the Complete Genomics CNV pipeline (https://target-data.nci.nih.gov/Public/NBL/WGS/L3/copy_number/CGI/), specifically at the 11q23.3 locus where FOXR1 is located, were additionally evaluated in these samples.
Experimental validation of FOXR1 fusion dependency in cell lines
143B (obtained from the Broad Institute; used within 6 months of collection; authenticated by L. Guenther using STR profiling; no mycoplasma testing conducted) and CALU6 (obtained from ATCC; used within 6 months of collection; authenticated by ATCC using STR profiling; mycoplasma testing conducted by ATCC, mycoplasma not detected) cells were grown in Eagle's minimum essential medium (EMEM) supplemented with 10% FBS and penicillin–streptomycin (Thermo Fisher Scientific, MT30002CI). To validate FOXR1 fusion dependencies, we used CRISPR-Cas9 sgRNAs targeting FOXR1 to knock out PAFAH1B2-FOXR1 in 143B cells and RPS25-FOXR1 in CALU6 cells. The sgRNA sequences are as follows: sgFOXR1–1 5′ GAGACCTCCAGCTTTCCAGG 3′; sgFOXR1–2 5′ GGAAGATGCCAGCTGCTCAG 3′; sgFOXR1–3 5′ TGAGACCTCCAGCTTTCCAG 3′. We infected cells with either nontargeting (NT) or FOXR1-targeting sgRNA, selected cells with puromycin (1 μg/mL), and assessed growth using CellTiter-Glo according to the manufacturer's instructions. Immunoblotting was performed to confirm knockout of the fusion-oncoproteins using anti-FOXR1 (21942–1-AP; Thermo Fisher Scientific). Anti-vinculin (13901; Cell Signaling Technology) was used as a loading control. For crystal violet staining, cells were fixed and stained using crystal violet solution (1% crystal violet, 20% methanol) for 20 minutes, washed, and imaged.
Analysis of drug data
AUC values representing sensitivity to compounds and associated metadata were utilized from the Cancer Therapeutics Response Portal and available through DepMap (32–34). The AUC for venetoclax for the B-ALL cell line with the BCL2-IgH fusion was compared with other B-ALL cell lines without this fusion, as well as all other cell lines. We took an unbiased approach to screening for compound sensitivity for multiple myeloma cell lines, stratifying by the presence or absence of the IgH-NSD2 fusion, and using a two-sample t test with the assumption of equal variance. We corrected for multiple-hypothesis testing using the Benjamini–Hochberg method to arrive at Q values. We annotated compounds for the inhibition of kinases to identify compounds of interest.
Analysis of histone ChIP-seq and DNASE-seq data
We downloaded BigWig files from ENCODE for the histone marks of interest for the KMS11 cell line with IgH-NSD2 fusion, NCIH929 cell line with IgH-NSD2 fusion, MM1S cell line without IgH-NSD2 fusion, and peripheral blood mononuclear cell lines. We additionally utilized BigWig files from ENCODE for DNASE-seq analysis of the NCIH929 cell line with IgH-NSD2 fusion, RPMI8226 cell line without IgH-NSD2 fusion, and normal B cell lines (Supplementary Table S5). Epigenetic data were visualized using the Integrative Genomics Viewer (IGV), version 2.8.2.
Calculation of phenotypic kill scores in CRISPR spheroid models
We normalized CRISPR guide dropout for genes of interest to nontargeting guides to calculate phenotypic kill scores as described previously (35). CRISPR sgRNA data from three spheroid models derived from NCIH23, NCIH1975, and NCIH2009 cell lines were analyzed as follows: for each replicate, the count of sgRNAs targeting coding genes and nontargeting sgRNAs was normalized for two replicates at day 21 relative to day 0, and log2-fold change values were calculated. The median and SD of log2-fold change of nontargeting guides for each replicate were calculated, and the log2-fold change values for targeting sgRNAs were normalized using these values to yield phenotypic Z (phenotypic kill) scores as described previously (35). The distribution of phenotypic kill scores for all guides targeting EML4 in the NCIH23 spheroid model (containing the THADA-MTA3 fusion) was compared with the distribution of phenotypic kill scores for all guides targeting EML4 in the NCIH1975 and NCIH2009 spheroid models. This analysis was repeated and the mean of phenotypic kill scores was calculated for all previously defined nonessential genes, available through DepMap.
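The phenotypic Z score calculation above can be sketched as follows for a single replicate. This is an illustration rather than the published pipeline (35): the pseudocount of 1 is an assumption added to avoid division by zero, any library-size normalization of counts is assumed to have been done beforehand, and the function names are hypothetical.

```python
import math
from statistics import median, stdev


def log2_fold_change(count_day21, count_day0, pseudocount=1.0):
    """log2 fold change of an sgRNA count at day 21 relative to day 0.

    The pseudocount is an assumption of this sketch, not taken from the text.
    """
    return math.log2((count_day21 + pseudocount) / (count_day0 + pseudocount))


def phenotypic_z_scores(targeting_lfc, nontargeting_lfc):
    """Normalize targeting-guide log2 fold changes to nontargeting guides.

    Each targeting value is centered on the median of the nontargeting
    guides and scaled by their sample standard deviation.
    """
    center = median(nontargeting_lfc)
    spread = stdev(nontargeting_lfc)
    return [(value - center) / spread for value in targeting_lfc]
```

Strongly negative scores indicate guides that dropped out far more than the nontargeting controls did in the same replicate.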
Across all SVs, partner and collateral genes demonstrate the greatest enrichment among dependencies in the context of fusions
We first aimed to demonstrate that fusion partner genes and collateral genes, respectively and independently, were significantly enriched among functional cancer dependencies. Through analysis of whole-genome sequencing (WGS), RNA-seq-based fusion calls, and genome-scale dependency data for 209 cancer cell lines, we identified >26,000 SVs (translocations, inversions, deletions, duplications, and gene fusions) to assess for enrichment of partner genes (those directly involved in the SV of interest) and collateral genes (those in the same TAD as the SV of interest) among absolute dependencies (Supplementary Table S6, Materials and Methods). We found that there was a significant enrichment of partner genes among dependencies in the context of fusions, and either no enrichment or depletion of partner genes among dependencies in the context of all other SV groups (Fig. 1B; Supplementary Fig. S1A). There was significant enrichment of collateral genes among dependencies across all SV groups, but this enrichment was greatest for fusions as compared with all other SVs (Supplementary Figs. S1B and S1C). For fusions, we additionally carried out iterative enrichment analyses removing a single cell line at a time (Supplementary Materials and Methods; Supplementary Tables S7 and S8), as well as multivariate logistic regression analyses (Supplementary Tables S9 and S10), and confirmed that these observations were not driven by any single cell line or disease category (Supplementary Materials and Methods). These results were robust to variations in TAD size and definitions (Supplementary Fig. S2A–S2C; Supplementary Materials and Methods). Thus, the strongest enrichment of partner genes and collateral genes among dependencies occurred in the context of fusions.
Fusion-associated differential dependencies include partners and collateral genes, occurring more than would be expected by chance
We then developed a statistical framework to identify and nominate biologically relevant fusion-associated partner and collateral dependencies (Materials and Methods). Specifically, we performed dedicated analyses to (i) assess for differential dependencies in the context of fusions, (ii) validate the presence of fusions and associated dependencies through multiple approaches, and (iii) evaluate the relationship of fusions with simple or complex SVs. First, we assessed 3,277 fusions present in 645 cell lines with corresponding genome-scale dependency data (Materials and Methods, Supplementary Fig. S3A, range of mean fusions per cell line: 1–20, Supplementary Fig. S3B; Supplementary Table S11). For each fusion, we carried out a statistical genome-scale screen to identify genes that had differential dependencies leading to increased fitness in the cell line(s) containing the fusion (Materials and Methods). On the basis of our preceding enrichment analysis, we hypothesized that a subset of fusions would lead to expression changes or activation of proto-oncogenes, creating differential gene dependencies resulting from the rearrangements themselves. Thus, within each fusion-dependency relationship, genes that were identified as being differential dependencies selectively in cell lines with the fusion (range 0–260 genes, mean 37 genes) were evaluated to identify fusion partner dependencies and collateral dependencies, independently (referred to as fusion-associated dependencies hereafter; Fig. 1C, Materials and Methods).
Across all fusions, 363 (11%) had at least one fusion-associated dependency. Fusion-associated dependencies were observed in 223 cell lines (35%) and occurred in greater than half of leukemia, breast cancer, multiple myeloma, bone cancer, liposarcoma, and other sarcoma cell lines (Fig. 1D). We identified 659 unique fusion-dependency pairings in total (112 partner, 547 collateral; Supplementary Table S12); accounting for complex rearrangements (by removing instances of dependencies associated with multiple fusions in the same cell line), we observed 483 fusion-dependency pairings (100 partner, 383 collateral; Fig. 2A, Materials and Methods). Of 223 cell lines with fusion-associated dependencies, 207 (93%) had at least one hotspot driver mutation (range 1–25 driver mutations, mean 2.5 driver mutations; Supplementary Materials and Methods; Supplementary Table S13). Cell lines without fusion-associated dependencies had a significantly increased number of hotspot driver mutations (range 1–62 driver mutations, mean 3.4 driver mutations; Supplementary Fig. S4A, P = 0.003, two-sided t test), although the proportion of cell lines with hotspot driver mutations was comparable (383 of 422 cell lines, 91%, P = 0.456, Chi-squared test). Fusion-associated dependencies contributed to cancer cell fitness uniquely, even in the presence of other hotspot driver mutations.
Next, to demonstrate that this phenomenon of fusion-associated dependencies was occurring more than would be expected by chance, we carried out fusion-label permutation testing (breaking the link between fusions and dependency scores to create a null distribution, Fig. 2B) and gene-label permutation testing (breaking the link between genes and dependency scores to create a null distribution, Supplementary Figs. S4B and S4C) for partner and collateral dependencies, respectively and independently. Our observed counts of partner and collateral fusion-dependency pairings were significantly greater than those expected by chance (P < 0.001, Supplementary Materials and Methods).
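The label-permutation idea above can be illustrated with a generic sketch: shuffle the fusion labels to break their link to the dependency scores, recount pairings under each shuffle to build a null distribution, and compare the observed count against it. The `count_pairings` callable is a hypothetical stand-in for the full pairing-discovery procedure, and the add-one P value estimator is a common convention assumed here, not necessarily the one used in the study.

```python
import random


def fusion_label_null(fusion_labels, count_pairings, n_permutations=1000, seed=0):
    """Build a null distribution of pairing counts by shuffling fusion labels.

    `count_pairings` is a hypothetical callable that takes a list of labels
    (one per cell line, in a fixed cell line order) and returns the number
    of fusion-dependency pairings found under that labeling.
    """
    rng = random.Random(seed)
    labels = list(fusion_labels)
    null_counts = []
    for _ in range(n_permutations):
        rng.shuffle(labels)  # breaks the link between labels and scores
        null_counts.append(count_pairings(labels))
    return null_counts


def permutation_p_value(observed, null_counts):
    """One-sided empirical P value with the add-one correction."""
    at_least_as_extreme = sum(1 for count in null_counts if count >= observed)
    return (1 + at_least_as_extreme) / (1 + len(null_counts))
```

With this estimator the smallest attainable P value is 1 / (number of permutations + 1), so the number of permutations sets the resolution of the test.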
Because of the non-Gaussian distribution of dependency scores and the small numbers of cell lines with any given fusion, we performed additional cell line permutation-based FDR estimation as an approach to fusion-associated dependency discovery, and found that 459 fusion-dependency pairings (70%) identified by our genome-scale screen met the FDR threshold < 0.05 by this approach as well (Fig. 2C, Supplementary Materials and Methods, Supplementary Fig. S5, Supplementary Table S14). The fusion-dependency pairings identified by both approaches were most likely to have biological relevance, and we therefore prioritized these pairings for further study.
Furthermore, for the subset of cell lines with corresponding WGS (Fig. 2D), there was evidence for the presence of a correlated structural variant in 86 of 103 fusion-cell contexts (83%, Materials and Methods), comparable with an evaluation of WGS correlates to fusions detected by RNA-seq in clinical samples (10). This proportion was not significantly different between fusions with associated dependencies and those without (83% vs. 76%, P = 0.11, Fisher exact test; Supplementary Fig. S6A). We also evaluated whether the fusions with associated dependencies from cell lines were identified in a prior study of clinical samples from The Cancer Genome Atlas (TCGA; ref. 29). Of 363 fusions with associated dependencies, 295 were in cell lines with the same histology as tumors in the TCGA (Fig. 2E; Supplementary Fig. S6B; Supplementary Table S15). Of those, 23 (8%) had an exact fusion match in the TCGA, though 291 fusions (99%) had a partner that was seen as part of a fusion in the TCGA, supporting the relevance of preclinical dependency analysis to patient cohorts (Supplementary Materials and Methods). These analyses provided additional orthogonal validation of the fusions identified for study.
Finally, we also sought to determine when gene fusions with associated dependencies were part of simple or larger complex SVs. Gene fusions varied with regard to their association with other SVs in close proximity: 37% of fusions had no additional SVs in the same TAD beyond those involved in the fusion, 41% had 1 to 5 other same-TAD SVs, 13% had 6–10 other same-TAD SVs, and 9% had >10 other same-TAD SVs (Materials and Methods, Supplementary Fig. S6C). In some cases of collateral dependencies, fusions could be more directly linked to the dependency in question (Supplementary Figs. S7A–S7C), whereas in other cases, they were proxies for larger complex SVs collectively contributing to cis-regulatory element rearrangement and copy number change (Supplementary Figs. S7D and S8A–S8B). Thus, for fusions with partner and collateral dependencies in cancer cell lines, there were multiple lines of evidence for their enrichment and clinical relevance to support further investigation.
Copy number, mutational, and transcriptional landscapes intersect with fusion-associated dependencies
We next evaluated copy number alterations and somatic mutations for their potential to contribute to the development of fusion-associated dependencies (Materials and Methods). Partner dependency genes were amplified in 65 of 212 fusion-dependency-cell contexts (31%), whereas collateral dependency genes were amplified in 405 of 588 fusion-dependency-cell contexts (69%, Fig. 3A). This high rate of copy number amplification of fusion-associated dependency genes aligned with prior work showing rearrangements and copy number alterations are highly inter-related (36). However, mutations involving fusion-associated dependency genes were relatively infrequent, with partner dependency genes harboring mutations in 16 of 217 fusion-dependency-cell contexts (7%), and collateral dependency genes harboring mutations in 27 of 589 fusion-dependency-cell contexts (5%, Fig. 3B).
Prior studies demonstrated that fusions can contribute to transcriptional dysregulation in patient samples (6, 7, 9). Thus, we next evaluated RNA expression data for all fusions with orthogonal dependency data to determine the degree of overlap between genome-scale pan-cancer unbiased overexpression and dependencies for fusion-associated genes (Fig. 3C; Supplementary Table S16; Supplementary Materials and Methods). Of 631 fusion-associated overexpressed partner genes, 40 were also dependencies (6%). Similarly, of 1,400 fusion-associated overexpressed collateral genes, 70 were dependencies (5%). Although many fusions led to overexpression of associated genes, only a small proportion of these cases were deemed essential to cell survival through this screening modality.
We noted that 40 of 112 (36%) partner dependencies and 70 of 547 (13%) collateral dependencies were differentially overexpressed in an unbiased manner. However, we observed at least a log2-fold change TPM > 1 in 59% of all fusion-associated dependencies without correcting for genome-scale significance (Supplementary Table S12). This was significantly greater than the 5% of fusion-associated dependencies with log2-fold change TPM < −1 (Supplementary Fig. S9A and S9B, P < 0.001, Fisher exact test). Certain well-characterized fusion-associated dependencies did not meet these criteria for overexpression. For instance, BCR and ABL1 were strong dependencies associated with the BCR–ABL1 fusion, but did not meet the threshold of log2-fold change TPM > 1. Similarly, for the KMT2A fusions established as childhood myeloid and lymphoid leukemia drivers (37, 38), KMT2A was a dependency but did not demonstrate significant overexpression in cell lines harboring these fusions compared with all cell lines without these fusions (Supplementary Fig. S10A–S10D). Thus, essential fusions can induce modest expression changes in associated genes that are context-specific without manifesting as unbiased overexpression, but these events are still important to cell survival and support expression dysregulation as the mechanism leading to fusion-associated dependencies.
COSMIC fusions demonstrate utility of CRISPR-Cas9 for identifying essential genes
Many kinases and COSMIC Cancer Census genes appear among other fusions with associated dependencies. To further explore the biological relevance of fusion-associated dependencies, we examined whether recurrent biologically established fusions defined by COSMIC (“COSMIC fusions”) could be recovered using genome-scale CRISPR-Cas9 loss-of-function screening (39). Across all high-confidence fusion calls in cell lines with genome-scale dependency data, we identified 35 unique COSMIC fusions: 19 fusions had a partner dependency, 1 was associated with a collateral dependency, and 15 had no associated dependency (Fig. 3D). For fusions such as BCR–ABL1 and PAX3–FOXO1, both partner genes were differential dependencies. For EWSR1–FLI1, only FLI1 was a differential dependency, as EWSR1 is a common essential gene in many different cell contexts (Fig. 3E; Supplementary Figs. S11A–S11F). For some fusions resulting from unbalanced rearrangements like EWSR1–ERG, single-guide-RNA (sgRNA) location could preclude a partner screening as a dependency (Fig. 3F; Supplementary Figs. S12A and S12B; Supplementary Table S17; Supplementary Materials and Methods). Using COSMIC fusions as a positive control for this screening modality, we demonstrated that CRISPR-Cas9 loss-of-function screening could identify fusion-associated dependencies in 20 of 35 (57%) cases where we would have expected them to exist.
We also assessed partner and collateral dependencies that involved either kinases or COSMIC cancer census genes, with the hypothesis that many of these would have biological relevance (Supplementary Fig. S13A). The BCL2–IgH fusion has been previously reported in various hematologic malignancies (40, 41), and it was observed concurrently with a BCL2 missense mutation in a B-ALL cell line (JM1), contributing to a partner dependency on BCL2. Cancer Therapeutics Response Portal compound screening data for JM1 showed it was highly sensitive to the BCL2 inhibitor venetoclax when compared with all other cancer cell lines, with median sensitivity in comparison with other B-ALL cell lines (Supplementary Fig. S13B; refs. 32–34). Genes that were copy number amplified, such as MDM2 and ERBB2, were involved in fusions as well, and sometimes associated with multiple collateral dependencies (Supplementary Fig. S13C; Supplementary Fig. S14A–S14D). A study from the PCAWG consortium showed that the OE33 esophageal cancer cell line was characterized by an inversion around ERBB2, disrupting a TAD boundary and leading to the fusion of two TADs (21). We found that the ERBB2-JUP fusion in the OE33 cell line was associated with four collateral dependencies in addition to the partner dependency on JUP. The known formation of a neo-TAD provides a mechanistic explanation for expression changes and the consequent presence of multiple dependencies in close proximity to ERBB2 in this cell line. Therefore, through the biological priors of COSMIC fusions and other COSMIC genes involved in known structural variants, we demonstrated that genome-scale CRISPR-Cas9 loss-of-function screening was effective in identifying true fusion-associated dependencies.
Transcription factors are recurrent fusion-associated dependencies
Having established the biological and statistical bases for partner and collateral dependencies, we next assessed statistically significant fusion-dependency pairings (those identified by multiple approaches) for functional relevance (Supplementary Table S12; Materials and Methods). Among recurrent fusion-associated dependencies, several Forkhead-box transcription factors (42, 43) were essential to cancer cell survival. We observed three instances of intrachromosomal FOXR1 fusions associated with FOXR1 as a differential dependency, occurring independently in osteosarcoma, lung adenocarcinoma, and bladder carcinoma cell lines (Fig. 4A). All three fusions preserved the active Forkhead domain of FOXR1. There have been previous reports of intrachromosomal fusions involving FOXR1, a member of the Forkhead-box family, in rare cases of neuroblastoma. The FOXR1 fusions in cell lines were associated with the overexpression of FOXR1, which is normally only seen in embryogenesis (Supplementary Fig. S15A–S15C; refs. 44, 45). Therefore, there was strong statistical evidence for FOXR1 fusions creating fusion-associated dependencies with implications for oncogenesis.
To demonstrate the clinical relevance of FOXR1 fusions, we used FOXR1 overexpression in a cohort of >12,000 clinical samples as a preliminary screen for identifying clinical samples that may harbor this fusion (Materials and Methods; Supplementary Fig. S16A). Among clinical samples with the highest degree of FOXR1 overexpression were four neuroblastoma samples from the TARGET study (Supplementary Fig. S16B; ref. 45). We identified FOXR1 fusions in all four neuroblastoma samples, and demonstrated that in the three cases of intrachromosomal fusions, there was associated copy number alteration at the 11q23.3 locus where FOXR1 resides (Supplementary Table S18).
Having established the clinical relevance of FOXR1 fusions, we next validated the observed FOXR1 dependencies in two cell line models. In the osteosarcoma cell line 143B harboring a PAFAH1B2–FOXR1 fusion and associated FOXR1 dependency, we found that CRISPR-mediated knockout of the fusion led to a significant reduction in cell growth (Fig. 4B). Similarly, in the lung cancer cell line CALU6 harboring an RPS25–FOXR1 fusion and associated FOXR1 dependency, we demonstrated that CRISPR-mediated knockout of the fusion resulted in decreased cell growth (Fig. 4C). Thus, FOXR1 fusions created a dependency on the fusion partner FOXR1, which was integral to cancer cell survival when the fusion was present.
FOXA1 is another member of the Forkhead-box family, and contributes to oncogenesis in different cancers, playing a central role in prostate cancer (42). Prior work demonstrated that rearrangement in the FOXA1 TAD leads to the hijacking of a nearby enhancer known as FOXMIND and contributes to FOXA1 overexpression in a prostate cancer cell line, VCAP (46). We observed the presence of a TTC6–MIPOL1 fusion in two prostate cancer cell lines (including VCAP), as well as in one colorectal cancer, one breast cancer, and one lung cancer cell line. TTC6 and MIPOL1 flank the FOXA1 locus; for the prostate cancer and colorectal cancer cell lines with the TTC6–MIPOL1 fusion and dependency data available, FOXA1 was a strong collateral dependency (Supplementary Fig. S17A). In other cell lines with the fusion but without dependency data available, FOXA1 was highly overexpressed. TTC6–MIPOL1 has been previously reported as a recurrent adjacent gene rearrangement in breast cancers (47). Our results suggest that this previously described rearrangement contributes to oncogenesis through FOXA1 overexpression not only in prostate cancer, but in other cancers as well.
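Because a collateral dependency is defined as a gene in the same TAD as a fusion partner, the lookup behind an observation like this can be sketched as an interval query. The TAD boundaries, gene names, and coordinates below are hypothetical, and representing each gene by a single position (for example, its transcription start site) is a simplifying assumption.

```python
from bisect import bisect_right

# Hypothetical TADs on one chromosome: sorted, non-overlapping,
# half-open (start, end) intervals in base pairs.
TADS = [(0, 1_000_000), (1_000_000, 2_500_000), (2_500_000, 4_000_000)]
TAD_STARTS = [start for start, _ in TADS]


def tad_index(position):
    """Return the index of the TAD containing `position`, or None if outside all TADs."""
    i = bisect_right(TAD_STARTS, position) - 1
    if i >= 0 and TADS[i][0] <= position < TADS[i][1]:
        return i
    return None


def collateral_candidates(partner_position, gene_positions):
    """Return genes (name -> position) that share a TAD with the fusion partner."""
    partner_tad = tad_index(partner_position)
    if partner_tad is None:
        return []
    return sorted(
        gene for gene, pos in gene_positions.items() if tad_index(pos) == partner_tad
    )


# Hypothetical neighboring genes and a fusion partner at 1.5 Mb
genes = {"GENE_X": 1_200_000, "GENE_Y": 2_400_000, "GENE_Z": 3_000_000}
candidates = collateral_candidates(1_500_000, genes)
```

Genes returned by such a query are only candidates; whether any of them is a collateral dependency is determined by the dependency data.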
Finally, among other transcription factors, we observed HNF1A as a collateral dependency associated with two distinct fusions in gastric cancer cell lines, and it was associated with a mean log2-fold change of 4 in expression (Supplementary Fig. S17B). The context specificity suggests that rearrangements in close proximity to HNF1A may contribute to its overexpression and resulting essentiality in some gastric carcinoma cell lines. In summary, we established that fusions contribute to oncogenesis in several instances by creating partner and collateral dependencies on transcription factors.
Clinical applicability of fusion-associated dependencies
We finally examined whether highly recurrent, clinically observed fusions created potentially clinically actionable collateral dependencies. Approximately 15% of patients with multiple myeloma have a t(4;14) translocation, which is associated with poor prognosis (48). In this translocation, the IgH enhancer is juxtaposed with NSD2 and FGFR3, which lie in close proximity to each other, leading to aberrant expression of both genes. Because FGFR3 overexpression is not universal in t(4;14) cases, prior studies have reached different conclusions about which gene is most relevant to oncogenesis in the presence of this rearrangement (49–52). Here, five t(4;14) multiple myeloma cell lines with dependency data were identified as having an IgH–NSD2 fusion. Compared with multiple myeloma cell lines without this fusion, both FGFR3 and NSD2 were overexpressed (Fig. 5A; Supplementary Figs. S18A–S18C). However, only FGFR3 was a strong dependency in these cell lines (Fig. 5B; Supplementary Table S19). Concurrent FGFR3 mutations were seen in three of the five cell lines (missense in KMS18 and OPM2, silent in KMS11). FGFR3 remained a dependency in two of the cell lines without concurrent FGFR3 missense mutations (KMS26 and KMS34), supporting IgH–NSD2 as the primary molecular lesion driving this dependency.
A multiple myeloma cell line with the fusion (KMS11) was characterized by increased H3K27ac and H3K9ac, and relatively decreased H3K27me3, at FGFR3 reflective of an active transcriptional state in the presence of the fusion (Fig. 5C; Supplementary Figs. S19A and S19B; refs. 22, 23). In addition, the top two statistically significant therapies in this context were cediranib and lenvatinib, which are multikinase inhibitors that also have established anti-FGFR activity (Fig. 5D; refs. 53–55). Although not statistically significant for an unbiased screen, FGFR3 inhibitors AZD4547 and nintedanib also demonstrated increased activity against multiple cell lines with the IgH–NSD2 fusion (Fig. 5D). Integrating collateral dependencies with matching epigenetic and therapeutic data, we found FGFR3 to be the targetable dependency in t(4;14) multiple myeloma cell lines.
Finally, given the patient-specific nature of many fusion-associated dependencies, we evaluated whether such events could be translated to spheroid models, which have demonstrated utility for patient-derived prospective precision cancer medicine studies (56). Han and colleagues performed genome-scale CRISPR screening for dependencies in multiple spheroid models, one of which was derived from the NCIH23 lung cancer cell line with a THADA–MTA3 fusion (35). In addition to being a partner dependency in DepMap cancer cell lines (C10, NCIH3122) with the EML4–ALK COSMIC fusion (57), we observed EML4 to be a collateral dependency in the NCIH23 cell line with the THADA–MTA3 fusion. There was strong evidence for the presence of the THADA–MTA3 fusion in the NCIH23 cell line from RNA-seq and WGS data (Fig. 6A).
In evaluating phenotypic kill scores for sgRNAs targeting EML4, there was increased dependency on EML4 in the spheroid model derived from the NCIH23 cell line in comparison with the spheroid models derived from cell lines without the THADA–MTA3 fusion (P = 0.013, two-sided t test; Fig. 6B; Supplementary Fig. S20A). To ensure that this was not the case for all genes, we evaluated mean phenotypic kill scores for sgRNAs targeting nonessential genes as defined by Hart and colleagues, and observed a similar distribution in spheroid models with and without the THADA–MTA3 fusion (Fig. 6B; Supplementary Fig. S20B; ref. 58). EML4 as a fusion-associated dependency appeared to be relevant in three-dimensional cancer models, establishing relevance for discovering potentially actionable fusion-associated dependencies from clinical cancer samples.
In this study, we demonstrated that many fusions contribute to cancer cell survival by creating partner and collateral dependencies. We also showed that while fusions frequently lead to transcriptional dysregulation, which is the likely intermediate mechanism for creating fusion-associated dependencies when they exist, there is only modest overlap between the unbiased overexpression and dependency spaces. Not all transcriptional dysregulation resulting from structural variation contributes directly to cancer cell survival. CRISPR–Cas9 dependency screening provides significantly more insight into essential gene expression changes, which often do not manifest as pan-cancer genome-scale overexpression but still confer a fitness advantage.
We leveraged WGS data to demonstrate that fusions could arise from simple structural variants directly contributing to collateral dependencies, or alternatively be proxies for more complex rearrangements contributing to the development of collateral dependencies. Although in the latter case, the precise role of the fusion in the development of the collateral dependency was more difficult to define, we reasoned that fusions still contributed meaningfully to cancer cell survival in many of these instances because the enrichment of collateral genes among dependencies was greatest for fusions as compared with other SVs.
We showed that specific fusion-associated dependencies had biological and clinical relevance. The FOXR1 fusions were associated with dependency on FOXR1 in different cancer cell contexts; we validated these dependencies in cell lines and demonstrated that FOXR1 fusions also occur in a subset of clinical samples. Similarly, FGFR3, a targetable kinase, was the key dependency in t(4;14) multiple myeloma cell lines that harbored an IgH–NSD2 fusion. We also showed that the implications of fusion-associated dependencies extended beyond two-dimensional cell line space, as exemplified by the dependency on EML4 in the context of a THADA–MTA3 fusion persisting in spheroid models.
There were limitations in our methodology. By focusing our analysis on fusions, applying standardized TAD boundaries to account for variability across cancer cell lines, and relying on loss-of-function screening, we likely underestimated the total impact of structural variants on cancer cell survival. Our genome-scale screen, which relied on a modified t test approach, was inherently limited by the non-Gaussian distribution of dependency probability scores and small numbers of cell lines with any given fusion. We addressed this limitation through an alternative permutation-based identification of fusion-associated dependencies and found substantial overlap in these two approaches, but also differences to suggest that some of our fusion-associated dependencies were more likely to be false positives. Comparison of these approaches also showed that limiting hypotheses to partners or collateral genes increased the discovery of fusion-associated dependencies, suggesting our genome-scale approach may have underestimated how frequently fusions create partner and collateral dependencies.
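The idea behind a permutation-based alternative to a t test can be sketched as follows. This is a generic one-sided label-permutation test on the difference in mean dependency probability, written under the assumption that higher scores mean stronger dependency; the study's actual permutation scheme is not specified in this section, and the function name, default parameters, and scores are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(fusion_scores, other_scores, n_permutations=10_000, seed=0):
    """One-sided permutation p-value for mean(fusion) - mean(other) being large.

    Makes no Gaussian assumption: the null distribution is built by
    repeatedly shuffling which cell lines are labeled as carrying the fusion.
    """
    rng = random.Random(seed)
    observed = mean(fusion_scores) - mean(other_scores)
    pooled = list(fusion_scores) + list(other_scores)
    n_fusion = len(fusion_scores)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        permuted = mean(pooled[:n_fusion]) - mean(pooled[n_fusion:])
        if permuted >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical dependency probability scores for one gene
with_fusion = [0.95, 0.90]
without_fusion = [0.10, 0.05, 0.20, 0.15, 0.10, 0.05, 0.12, 0.08]
p_value = permutation_p_value(with_fusion, without_fusion, n_permutations=2_000)
```

With only two fusion-positive lines among ten, there are just 45 distinct labelings, so the smallest attainable p-value is about 1/45. This illustrates why small numbers of cell lines with any given fusion limit statistical power regardless of the test used.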
Regarding some unbalanced rearrangements, sgRNA location relative to fusion breakpoints failed to capture what would likely be true dependency on a partner gene, reducing the sensitivity of CRISPR in identifying fusion partner genes important to cancer cell survival. In other cases, fusion partners still screened as dependencies even though the sgRNAs targeted sequence outside the fusion transcript. We reasoned that these examples may represent an alternate mechanism by which a translocation contributes to a partner gene becoming a dependency (interruption of one allele through involvement in a fusion may lead to the contralateral allele becoming essential for cell survival), or that they are balanced translocations for which the reciprocal transcripts were simply not detected (Supplementary Table S17; Supplementary Figs. S21A and S21B).
We also sought to understand whether there were idiosyncratic effects of CRISPR–Cas9 that could lead to the creation of false-positive fusion-associated dependencies. Despite the high rate of copy-number amplification among fusion-associated dependencies, we reasoned that because copy-number correction through the CERES algorithm was incorporated into dependency probability scores, a nonspecific copy-number effect was unlikely to be the primary explanation for most fusion-associated dependencies. We also considered whether CRISPR–Cas9 would differentially identify false-positive fusion-associated dependencies at fragile sites and found that only two of 363 fusions with associated dependencies had a partner in a known fragile site and that none of the dependencies themselves were located in fragile sites (59). We concluded that the rate of false-positive fusion-associated dependencies was likely to be low.
Broadly, our research provides new insight into how fusions contribute to fitness in different cancer contexts going beyond their straightforward partner-gene activation events, demonstrating that some of the identified partner and collateral dependencies may have direct implications for clinical care. Future studies are needed for further experimental validation of the regulatory elements involved in the fusion-associated dependencies identified in this work. As WGS of cancer cell lines increases, we can broaden the scope of our approach to more fully characterize the impact of structural variation on cancer cell survival.
R. Gillani reports grants from NIH during the conduct of the study. B.K.A. Seong reports grants from Department of Defense (CA181249) during the conduct of the study. N.V. Dharia reports grants and other support from St. Baldrick's Foundation during the conduct of the study and other support from Genentech, Inc., outside the submitted work. M.X. He reports grants from NIH and grants from NSF during the conduct of the study and personal fees from Amplify Medicines / Ikena Oncology outside the submitted work. J.S. Boehm reports grants from Cancer Dependency Map Consortium during the conduct of the study. F. Vazquez reports grants from Novo Ventures and grants from Dependency Map Consortium outside the submitted work. J.M. McFarland reports other support from the Dependency Map Consortium during the conduct of the study and other support from the Dependency Map Consortium outside the submitted work. K. Stegmaier reports grants from NIH during the conduct of the study, grants from Novartis, personal fees from Rigel Pharmaceuticals, Kronos Bio, and AstraZeneca, and other support from Auron Therapeutics outside the submitted work. E.M. Van Allen reports personal fees from Tango Therapeutics, Genome Medical, Invitae, Monte Rosa Therapeutics, Manifold Bio, Illumina, Enara Bio, and Janssen; grants from Novartis and BMS outside the submitted work; and also has institutional patents filed on chromatin mutations and immunotherapy response, and methods for clinical interpretation pending. No disclosures were reported by the other authors.
R. Gillani: Conceptualization, formal analysis, investigation, writing–original draft, writing–review and editing. B.K.A. Seong: Validation, writing–original draft. J. Crowdis: Conceptualization, visualization, methodology. J.R. Conway: Conceptualization, visualization, methodology, writing–review and editing. N.V. Dharia: Conceptualization, methodology, writing–review and editing. S. Alimohamed: Conceptualization, visualization, methodology. B.J. Haas: Conceptualization, methodology. K. Han: Resources, data curation, validation. J. Park: Conceptualization, writing–review and editing. F. Dietlein: Conceptualization, methodology. M.X. He: Conceptualization, methodology. A. Imamovic: Conceptualization, methodology. C. Ma: Conceptualization, methodology. M.C. Bassik: Resources, data curation, validation. J.S. Boehm: Conceptualization, methodology. F. Vazquez: Conceptualization, methodology. A. Gusev: Conceptualization, methodology. D. Liu: Methodology, writing–original draft. K.A. Janeway: Conceptualization, methodology. J.M. McFarland: Conceptualization, methodology, writing–review and editing. K. Stegmaier: Conceptualization, validation, methodology, writing–review and editing. E.M. Van Allen: Conceptualization, supervision, investigation, methodology, writing–original draft, writing–review and editing.
This work was supported by a Research Training Grant in Pediatric Oncology NIH T32 CA136432 11 (to R. Gillani), NIH R37 CA222574 (to E.M. Van Allen), R01 CA227388 (to E.M. Van Allen), U01 CA233100 (to E.M. Van Allen), Innovation in Cancer Informatics Award (to E.M. Van Allen), NIH 5R35 CA210030 (to K. Stegmaier), Department of Defense PRCRP Horizon Award CA181249 (to B.K.A. Seong), and Julia's Legacy of Hope St. Baldrick's Foundation Fellowship (to N.V. Dharia). The authors thank K. Salari, M. Meyerson, B. Crompton, and L. Guenther for helpful feedback on analysis and findings.