The landscape of structural variants (SV) in multiple myeloma remains poorly understood. Here, we performed comprehensive analysis of SVs in a large cohort of 752 patients with multiple myeloma by low-coverage long-insert whole-genome sequencing. We identified 68 SV hotspots involving 17 new candidate driver genes, including the therapeutic targets BCMA (TNFRSF17), SLAM7, and MCL1. Catastrophic complex rearrangements termed chromothripsis were present in 24% of patients and independently associated with poor clinical outcomes. Templated insertions were the second most frequent complex event (19%), mostly involved in super-enhancer hijacking and activation of oncogenes such as CCND1 and MYC. Importantly, in 31% of patients, two or more seemingly independent putative driver events were caused by a single structural event, demonstrating that the complex genomic landscape of multiple myeloma can be acquired through few key events during tumor evolutionary history. Overall, this study reveals the critical role of SVs in multiple myeloma pathogenesis.
Previous genomic studies in multiple myeloma have largely focused on single-nucleotide variants, recurrent copy-number alterations, and recurrent translocations. Here, we demonstrate the crucial role of SVs and complex events in the development of multiple myeloma and highlight the importance of whole-genome sequencing to decipher its genomic complexity.
See related commentary by Bergsagel and Kuehl, p. 221.
This article is highlighted in the In This Issue feature, p. 215
Whole-genome sequencing (WGS) studies have demonstrated the importance of structural variants (SV) in the initiation and progression of many cancers (1–8). Functional implications of SVs include gene dosage effects from gain or loss of chromosomal material, gene regulatory effects such as super-enhancer hijacking, and gene fusions (9). The basic unit of SVs are pairs of breakpoints, classified as either deletion, tandem duplication, translocation, or inversion, which can manifest as simple events or form complex patterns where multiple SVs are acquired together, often involving multiple chromosomes (1–8, 10).
In multiple myeloma, previous studies of SVs have had a narrow scope, usually limited to recurrent translocations involving MYC or the immunoglobulin loci (i.e., IGH, IGL, and IGK; refs. 11–17). The vast majority of established genomic drivers in multiple myeloma are single-nucleotide variants (SNV) and copy-number alterations (CNA), identified by whole-exome sequencing and array-based approaches (18–22). However, important aspects of tumor biology and evolution remain poorly explained by known genomic drivers, such as progression from precursor stages to active multiple myeloma and the development of drug resistance (12, 23–25).
We recently reported the first comprehensive study of SVs in multiple myeloma by WGS of sequential samples from 30 patients (21). Despite the limited sample set and the absence of gene expression data, our findings indicated that SVs are a key missing piece to understand the driver landscape of multiple myeloma. Of particular interest, we found a high prevalence of three main classes of complex SVs: chromothripsis, templated insertions, and chromoplexy (21). In chromothripsis, chromosomal shattering and random rejoining results in a pattern of tens to hundreds of breakpoints with oscillating copy number across one or more chromosomes (Fig. 1A and B; ref. 26). Templated insertions are characterized by focal gains bounded by translocations, resulting in concatenation of amplified segments from two or more chromosomes into a continuous stretch of DNA, which is inserted back into any of the involved chromosomes (Fig. 1C and D; refs. 4, 21). Chromoplexy similarly connects segments from multiple chromosomes, but the local footprint is characterized by copy-number loss (Fig. 1E and F; ref. 27). Importantly, these complex SVs represent large-scale genomic alterations acquired by the cancer cell at a single point in time, potentially involving multiple drivers and shaping subsequent tumor evolution (2, 27).
Here, we comprehensively characterized the role of genome-wide SVs in 752 patients with multiple myeloma, revealing novel SV hotspots, rare SVs with strong impact on gene expression, and complex events simultaneously causing multiple drivers.
Genome-Wide Landscape of Structural Variation in Multiple Myeloma
To define the landscape of simple and complex SVs in multiple myeloma, we investigated 752 newly diagnosed patients from the CoMMpass study (NCT01454297; IA13) who underwent low-coverage long-insert WGS (median 4–8X) and whole-exome sequencing (Supplementary Table S1; Supplementary Methods). RNA sequencing was also available from 591 patients (78.6%). For each patient sample, we integrated the genome-wide somatic copy-number profile with SV data and assigned each pair of SV breakpoints as either simple or part of a complex event according to the three main classes previously identified in multiple myeloma (Fig. 1; Methods; ref. 21). Templated insertions involving more than two chromosomes were considered complex. Events involving more than three breakpoint pairs that did not fulfill the criteria for a specific class of complex event were classified as unspecified “complex” (21).
Our final SV catalog was obtained by integrating two SV calling algorithms, DELLY (10) and Manta (28), followed by a series of quality filters. First, we included all SVs called and passed by both callers. Then SVs called by a single caller were included in specific circumstances: (i) SVs supporting copy-number junctions, (ii) reciprocal translocations, and (iii) translocations involving an immunoglobulin locus (i.e., IGH, IGK, or IGL; Supplementary Methods). Using the final SV catalog, long-insert low-coverage WGS revealed a sensitivity of 91% to 92% and specificity of 97% to identify translocations involving IGH and the most common canonical drivers CCND1 and WHSC1/MMSET. Recalculating performance metrics for canonical IGH translocations using the same SV filtering criteria genome wide (i.e., without the relaxed quality requirements for immunoglobulin translocations), we observed no changes in specificity, and sensitivity of 91% for IGH-CCND1 (identical as before) and 88% for IGH-WHSC1/MMSET (down from 92%). Overall, these performance metrics were similar to what was recently described by the PCAWG consortium using standard-coverage short-read WGS (Supplementary Methods; refs. 4, 7). Furthermore, the genome-wide distribution of SV breakpoints in the low-coverage WGS series corresponded with recent pan-cancer and myeloma genomes studies, showing enrichment in regions of early replication, accessible chromatin, and active enhancer regions as defined by histone H3K27 acetylation (H3K27ac; Supplementary Fig. S1; Methods; refs. 4, 7, 21). Taken together, this suggests that low-coverage long-insert WGS provides a representative view of the SV landscape.
We identified a median of 16 SVs per patient [interquartile range (IQR) 8–32; Fig. 2A]. Chromothripsis, chromoplexy, and templated insertions involving >2 chromosomes were observed in 24%, 11%, and 19% of patients, respectively, confirming previous observations (21); 38% of patients had an unspecified complex event. One or more complex events were identified in 63% of patients (median 1; range 0–11).
In patients with newly diagnosed multiple myeloma, different SV classes showed distinct patterns of co-occurrence, mutual exclusivity, and association with recurrent molecular alterations (Fig. 2A–C). Templated insertions showed a particularly striking pattern of positive correlation with single tandem duplications (Spearman ρ = 0.55, P < 2.2 × 10−16) and negative correlation with most other SV classes (Fig. 2B). Templated insertions and single tandem duplications were both strongly enriched in patients with hyperdiploidy and MYC alterations (Fig. 2C). Chromothripsis accounted for the greatest proportion of SVs among all classes (33%), and the burden of chromothripsis SVs in each patient correlated with the number of single deletions, inversions, and unspecified complex events (Fig. 2B). Presence of chromothripsis or a deletion burden in the 4th quartile showed striking associations with known high-risk molecular features in multiple myeloma, including primary translocations of IGH with MMSET, MAF, or MAFB; high APOBEC mutational burden, and most of the recurrent aneuploidies (Fig. 2C; refs. 19, 29, 30). The strongest association was observed between chromothripsis and biallelic inactivation of TP53 [OR 6.6; 95% confidence interval (CI), 2.7–17.15, P = 4.84 × 10−6]. Chromothripsis was previously reported as a rare event in 10 of 764 patients with multiple myeloma (1.3%) using array-based copy-number analysis, and half of these patients relapsed within 10 months (31). Despite the 18-fold higher prevalence in our WGS data, the presence of chromothripsis was associated with poor clinical outcomes and retained its significance after adjustment for established clinical and molecular risk factors, in terms of both progression-free survival (PFS; adjusted HR = 1.42; 95% CI, 1.08–1.87; P = 0.014) and overall survival (OS; adjusted HR = 1.81; 95% CI, 1.23–2.65; P = 0.002; Fig. 2D–F; Supplementary Fig. S2; Methods).
Structural Basis of Recurrent Translocations and Copy-Number Alterations
To define the structural basis of canonical translocations in multiple myeloma, we identified all translocation-type events (single and complex) with one or more breakpoints involving the immunoglobulin loci (i.e., IGH, IGK, and IGL) or canonical IGH partners (e.g., CCND1, MMSET, and MYC; Fig. 3A and B; ref. 18). Templated insertions emerged as the cause of CCND1 and MYC translocations in 26% and 72% of cases, respectively (Fig. 3A). In line with its mechanism of connecting and amplifying distant genomic segments, oncogenes, and regulatory regions (e.g., super-enhancers), templated insertions of CCND1 and MYC were associated with focal amplification in 81% and 98% of cases, respectively, and involved more than two chromosomes in 42% and 44% of cases. Complex SVs involving MYC were first described in 2000 (32), including insertions of the MYC gene into a partner locus or insertion of partner loci near MYC, consistent with the current definition of templated insertions (4). Although rare, we also found examples of chromothripsis and chromoplexy underlying canonical IGH translocations, resulting in overexpression of the partner gene consistent with a driver event (Fig. 3B).
Next we went to investigate the prevalence and landscape of rare noncanonical IGH translocation partners. These events were first described in the 1990s (17), but data from a large and uniformly analyzed series have been lacking. Considering the 591 patients in our study with WGS and RNA sequencing, where aberrant gene expression could be confirmed, 31 patients (5.2%) had translocations involving at least one immunoglobulin locus (IGH = 19, IGL = 12 and IGK = 1) and a noncanonical oncogene partner, most of which were key regulators of B-cell development (e.g., PAX5 and CD40; Fig. 3B; refs. 33, 34). Noncanonical IGH translocations most commonly occurred in patients without another primary IGH translocation (15 of 19 patients, 79%), raising the possibility of noncanonical disease-initiating events. Of these, translocations involving MAP3K14 had similar prevalence (1%) as those involving CCND2 (0.8%) and MAFA (0.5%) and, which are considered among established initiating events, and showed a similar breakpoint distribution in the IGH class-switch recombination regions (Fig. 3B; Supplementary Fig. S3). Taken together, we show that different mechanisms of SV converge to aberrantly activate key driver genes in multiple myeloma, including rare events potentially involved in cancer initiation.
Next, we addressed the structural basis of recurrent CNAs (Supplementary Table S2). Aneuploidies involving a whole chromosome arm were most common (56% of 2,889 events). Among intrachromosomal CNAs, 83% could be attributed to a specific SV (Fig. 3C). There was considerable variation in the proportion and class of SVs causing gains and losses between different loci, indicating the presence of distinct underlying mechanisms being active at these sites (Fig. 3C). Highlighting the importance of complex SVs in shaping the multiple myeloma genome, 47% of all chromothripsis events resulted in the acquisition of at least one recurrent driver CNA (n = 116); the corresponding numbers for chromoplexy and templated insertions involving >2 chromosomes were 44% (n = 43) and 21.7% (n = 46), respectively.
SVs may exert their effect through altered gene dosage (i.e., the number of copies of a gene) or through indirect mechanisms such as the well-known super-enhancer hijacking involving the immunoglobulin loci (Fig. 3B; refs. 14, 35). To quantify the effect of SVs on gene expression independently from copy number, we fit a multivariate linear regression model including all expressed genes on autosomes from all patients (Fig. 3D; Supplementary Methods; ref. 36). Structural events involving immunoglobulin loci were excluded. As expected, copy number had the strongest average effect, with an increase or decrease in expression Z-score of 0.31 for each gain or loss of a gene copy (P < 2.2 × 10−16). Nonetheless, all SV classes showed significant gene expression effects independent from copy number, and these effects were in the direction expected from the consequences of each SV class (36, 37). Chromothripsis is associated with both gain and loss of function (38), and the presence of high-level gains causing outlier gene expression may have skewed our model estimates. However, chromothripsis maintained a positive effect on gene expression when limiting our analysis to genes with less than four copies (estimate = 0.11, P < 2.2 × 10−16; Supplementary Fig. S4). Although the specific implications of individual SVs may be difficult to predict, these data demonstrate that the average effects of SVs on gene expression are considerable.
Hotspots of Structural Variation
Twenty recurrently translocated regions have been previously reported in multiple myeloma, defined by a translocation prevalence of >2% within 1 Mb bins across the genome (11). These included the canonical immunoglobulin translocations, as well as MYC and recurrent partners, such as BMP6/TXNDC5, FOXO3, and FAM46C (11, 14, 39). We were motivated to expand the known catalog of genomic loci where SVs play a driver role in multiple myeloma and are therefore positively selected (i.e., SV hotspots), considering all classes of single and complex SVs. To accomplish this, we applied the Piecewise Constant Fitting (PCF) algorithm, comparing the local SV breakpoint density to an empirical background model, accounting for the propensity of complex SVs to introduce large numbers of clustered breakpoints (Methods; Supplementary Methods; Supplementary Data S1; refs. 3, 40). Overall, we identified 68 SV hotspots after excluding the immunoglobulin loci (i.e., IGH, IGL, and IGK) and five known fragile sites that are prone to focal deletions (e.g., FHIT, CCSER1, and PTPRD; Figs. 4A–D and 5; Supplementary Table S3). Fifty-three SV hotspots had not been previously reported in multiple myeloma. Two of the previously reported regions of recurrent translocation were not recapitulated by our hotspot analysis: 19p13.3 and the known oncogene MAFB on 20q12. This may be explained by the behavior of the PCF algorithm, which favors the identification of loci where breakpoints are tightly clustered compared with neighboring regions as well as the expected background. Indeed, SVs involving MAFB and 19p13.3 were identified in 1.5% and 8.1% of patients, but the breakpoints did not form distinct clusters (Supplementary Fig. S5). While MAFB is an established driver that was missed by our analysis, the implications of SVs involving 19p13.3 are unclear.
Given that SVs and CNAs reflect the same genomic events, we hypothesized that functionally important SV hotspots would be associated with a cluster of CNAs (4). We therefore performed independent discovery of driver CNAs using Genomic Identification of Significant Targets in Cancer (GISTIC; ref. 41). This algorithm identifies peaks of copy number gain or loss containing driver genes and/or regulatory elements based on the frequency and amplitude of observed CNAs (Fig. 4A; Supplementary Tables S4 and S5). In addition, we generated cumulative copy number profiles for the patients involved by SV at each hotspot. Finally, we evaluated the impact of SV hotspots on the expression of nearby genes (Supplementary Table S6) and the presence of oncogene fusion transcripts. By integrating SV, CNA, and expression data, we went on to determine the most likely consequence of each hotspot in terms of gain of function, loss of function, and potential involvement of driver genes and regulatory elements. Individual SVs within hotspots were considered as likely driver events if their functional implications corresponded to the putative driver role of that hotspot (i.e., gain or loss of function); SVs with incongruous effects were removed as likely passenger events (Supplementary Methods).
Gain-of-function hotspots (n = 49) were defined by clustered SVs associated with copy number gains as well as translocation-type events with or without oncogene fusions (Figs. 4 and 5; Supplementary Fig. S6; Supplementary Table S3). There was a strong tendency for templated insertions and tandem duplications to co-occur (Spearman ρ = 0.71, P = 1.56 × 10−8) across hotspots, with a similar pattern being observed genome wide (ρ = 0.57, P < 2.2 × 10−16), supporting a strong association between these events. Strikingly, gain-of-function hotspots showed 8.4-fold enrichment of super-enhancers as compared with the remaining mappable genome (2.5 vs. 0.3 super-enhancers per Mb, Poisson test P < 2.2 × 10−16) and 10.5-fold enrichment of transcription factors involved in key regulatory networks in multiple myeloma (Poisson test P = 1.64 × 10−8; ref. 42). Among gain-of-function hotspots, 16 were associated with well-defined myeloma oncogenes (e.g., WHSC1/MMSET, CCND1, IRF4, and MAP3K14; refs. 11, 18) and 17 involved a novel putative driver gene. Of particular interest, TNFRSF17 was involved by SVs in 2.5% of patients (n = 19) and encodes B-Cell Maturation Antigen (BCMA), a therapeutic target of chimeric antigen receptor T-cells (CAR-T), as well as monoclonal and bispecific antibodies (Fig. 4B; refs. 43, 44). Furthermore, we report two novel SV hotspots on chromosome 1q23 involving putative driver genes with therapeutic implications: SLAMF7 (involved by SV in 2.8%, n = 21), target of the mAb elotuzumab (Fig. 4C; refs. 43, 45), and MCL1 (3%, n = 23), an apoptotic regulator implicated in resistance to the BCL2 inhibitor venetoclax (46, 47) and a promising therapeutic target in its own right (48). Additional novel putative driver genes were BTG2, CCR2, PRKCD, FBXW7, IRF2, NRG2/UBE2D2, SOX30, NEDD9, GLCCI1, TBXAS1/HIPK2, POU2AF1, KLF13, USP3/HERC1, and TNFRSF13B. We also confirmed previous reports that virtually all SVs involving MYC resulted in its overexpression, including deletions and inversions acting to reposition MYC next to the super-enhancers of NSMCE2 roughly 2 Mb upstream (16).
Loss-of-function hotspots (n = 19) were defined by SVs causing copy number loss, most commonly single deletions, but also inversions and complex SVs (Figs. 4 and 5; Supplementary Fig. S6; Supplementary Table S3). We identified loss of 12 known tumor suppressor genes in multiple myeloma, including CDKN2C, SP3, SP140, RPL5, and CYLD. FAM46C stood out as involved by both SVs causing copy-number loss and translocation-type events, which sometimes resulted in gene fusions. This is consistent with its known role as a tumor suppressor while also serving as a target for super-enhancer hijacking (39, 49).
Taken together, we identified 29 SV hotspots involving genes with established tumor suppressor or oncogene function in multiple myeloma; 17 additional hotspots, all gain of function, involved novel putative driver genes (Supplementary Table S3; all putative driver hotspots are shown in Fig. 4B–D; Supplementary Fig. S6; individual patient summary is in Supplementary Table S7; ref. 50).
Each patient had a median of two hotspots involved by a putative driver SV (IQR 1–3), and the number of hotspots involved was strongly associated with the overall SV burden (Spearman ρ = 0.46; P < 2.2 × 10−16). This association became even stronger when SV breakpoints associated with a single event were considered together (Spearman ρ = 0.51; P < 2.2 × 10−16). Reanalyzing published data from tandem duplication hotspots in breast cancer revealed similar results (Spearman ρ of 0.7 and 0.62 for rearrangement signatures 1 and 3, respectively; P < 2.2 × 10−16; Supplementary Fig. S7A–S7C; ref. 40). Extending this observation beyond SVs, there was a strong correlation between SNV burden in multiple myeloma and the number of SNVs in known driver genes (refs. 20, 21; ρ = 0.38, P < 2.2 × 10−16), which remained significant when restricting the analysis to established SNV hotspots within driver genes (ρ = 0.11, P = 0.001). These data indicate that genomic drivers continue to accumulate and provide selective advantage through the disease course despite multiple drivers already being present, consistent with our recent observations from reconstructing the timeline of driver acquisition in multiple myeloma (21, 51, 52) and multiregion WGS performed at autopsy (53).
Templated Insertions and Chromothripsis Exemplify Highly Clustered versus Chaotic Breakpoint Patterns
SVs of different classes showed different propensities to form hotspots. Templated insertion breakpoints were the most likely to be in a hotspot (logistic regression OR 4.04; 95% CI, 3.65–4.49, P < 2.2 × 10−16), with chromothripsis breakpoints being the least likely (OR 0.48; 95% CI, 0.43–0.54; P < 2.2 × 10−16; Fig. 6A and B). This difference remained when considering structural events instead of individual breakpoints, with 66% of 544 templated insertions involving one or more hotspot versus 43% of 244 chromothripsis events (Fisher test OR, 2.66; 95% CI, 1.91–3.65; P = 7.14 × 10−10) despite the vastly higher complexity of chromothripsis events as compared with templated insertions (median 17 vs. 2 breakpoint pairs in each event, Wilcoxon test P < 2.2 × 10−16).
The differences between templated insertions and chromothripsis could be clearly illustrated by the genome-wide distribution of breakpoints and association with number changes (Fig. 6A). Templated insertions were associated mainly with focal copy-number gain in 80.1% of cases (95% CI, 78%–82%) and only rarely with copy-number losses (5.6%; 95% CI, 4.6%–6.7%). Gains were almost exclusively single copy (92.3% of 1,317 gains), highlighting the stability of these events. In contrast, an important feature of chromothripsis is its ability to cause both gain and loss of function as part of the same event (38). Indeed, the breakpoints of chromothripsis were associated with chromosomal loss in 53.8% of cases (95% CI, 52%–55.6%) and gain in 37.6% (95% CI, 36%–39.4%). Templated insertions were predominantly associated with gain of a single copy (Fisher test OR 2.25 vs. chromothripsis; P < 2.2 × 10−16), while chromothripsis dominated for gains of two (OR 1.7, P = 7.07 × 10−5), 3 (OR 13.9, P < 2.03 × 10−14) or more than three copies (OR 40.7, P < 2.2 × 10−16; Fig. 6C). The probability that focal gains involved a multiple myeloma super-enhancer was highest when associated with templated insertions (55%, logistic regression OR 2.76; P < 2.2 × 10−16) and lowest when associated with chromothripsis (21%, logistic regression OR 0.61; P = 7.43 × 10−5; Fig. 6D). In contrast to solid tumors, where chromothripsis may result acquisition of >50 copies (1, 4, 6, 54), we observed no segments with more than nine copies in this series (Fig. 1B).
Consistent with widely different underlying mechanisms, the genome-wide distribution of templated insertion breakpoints could be predicted from genomic features such as active enhancer regions, replication time, and open chromatin, but this was not the case for chromothripsis (Supplementary Methods). To test whether the clustered nature of templated insertion breakpoints can be explained solely by the local sequence context (e.g., active enhancers) or constitute real hotspots subjected to positive selection, we repeated our PCF-based hotspot analysis for templated insertions alone. Despite the considerably lower power of this analysis as compared to the combined analysis presented above, 75% of hotspots containing six or more templated insertions were confirmed (21 out of 28), including novel putative drivers such as FBXW7 and TNFRSF17 (BCMA; Fig. 6A; Supplementary Table S8).
Because the distribution of chromothripsis breakpoints did not follow a predictable pattern, we performed separate hotspot analysis searching for regions that violated the assumption of a uniform distribution across the genome. In contrast to templated insertions, where hotspots were strongly clustered on key driver genes and regulatory regions, hotspots of chromothripsis were much larger, spanning from a few to tens of megabases (Fig. 6A; Supplementary Table S9). This is consistent with mechanisms where templated insertions exert gene regulatory effects disproportionate to the level of copy-number gain, while the effects of chromothripsis manifest as large deletions involving recurrent regions as well as high-level amplifications and local regulatory effects (Fig. 3C and D).
Multiple Driver Alterations Caused by the Same Structural Event
In 31% of patients (n = 235), two or more seemingly independent recurrent CNAs or putative driver translocations were caused by the same SV (Fig. 7A and B). The most common event classes were templated insertions causing chains of gain-of-function events in 12.7% of patients, most commonly including MYC. Chromothripsis caused two or more driver alterations in 7.2% of patients, commonly involving large deletions as well as translocation and/or amplification of oncogenes. Unbalanced translocations simultaneously causing oncogene translocations and large deletions involving tumor suppressor genes were identified in 4.4% of patients. Notably, 12 patients with canonical IGH-MMSET translocations had large deletions of 14q caused by the same unbalanced translocation, including TRAF3 (14q32) and often MAX (14q23; n = 10), contributing to the known association between these events (21). Overall, SVs represents a recurrent mechanism for tumors to acquire multiple drivers simultaneously, demonstrating that the full genomic landscape of multiple myeloma can be acquired through a few key events during tumor evolutionary history (27).
We describe the first comprehensive analysis of SVs in a large series of multiple myeloma patients with paired WGS and RNA sequencing. Previous studies of SVs in multiple myeloma have focused on translocations without consideration of complex events (11, 15, 55, 56), and our previous WGS study of 30 patients lacked both the expression correlate and the power to perform comprehensive driver discovery (21). Here, applying a robust statistical approach (40), we identified 68 SV hotspots, 53 of which have not previously been reported. Integrated analysis of copy-number changes, gene expression and the distribution of SV breakpoints revealed 17 new potential driver genes, including the emerging therapeutic targets TNFRSF17 (BCMA; refs. 43, 44), SLAMF7 (43, 45) and MCL1 (48), the latter of which has also been implicated in resistance to the BCL2 inhibitor venetoclax (46). With all of these targets either currently or imminently in clinical use, it will be of great clinical importance to determine the impact of these genomic alterations as predictive biomarkers for treatment response.
From a pan-cancer perspective, the SV landscape of multiple myeloma is characterized by a lower SV burden and less genomic complexity than in many solid tumors (1, 4, 7). For example, we did not find any classical double minute chromosomes with tens to hundreds of amplified copies or any of the recently proposed complex SV classes pyrgo, rigma, and tyfonas (1). Nonetheless, we found that complex SVs play a crucial role in shaping the genome of patients with multiple myeloma, most importantly chromothripsis, chromoplexy, and templated insertions. A common feature of these SV classes is simultaneously deregulating multiple driver genes as part of a single event. Such multi-driver events are of particular importance in myeloma progression, as they can provide an explanation for the rapid changes in clinical behavior that are frequently seen in the clinic (23). In myeloma precursor disease, understanding these evolutionary patterns will be crucial for early diagnosis of those patients who are on a trajectory toward progression and may benefit from early intervention (23).
Of immediate translational relevance, chromothripsis emerged as a strong independent predictor for high-risk disease, detectable in 24% of newly diagnosed patients by WGS, providing a rationale for the inclusion of chromothripsis in clinical risk scores. The prevalence of chromothripsis in multiple myeloma is higher than what has been reported in previous studies likely for two reasons: (i) use of WGS resolution able to integrate SV and CNA data, and (ii) applying the most updated criteria to define chromothripsis (4, 21, 57).
The use of low-coverage long-insert WGS is a potential limitation of this study. We have applied extensive quality control measures to ensure specificity of our SV calls but may have overlooked a fraction of real SVs, particularly those present at the subclonal level. Thus, the results reported in this study will be skewed toward events acquired in the early phases of tumor evolutionary history, driving tumor initiation and progression, and going on to be present in the dominant tumor clone at diagnosis. Future studies using higher coverage WGS may reveal greater SV burden and additional hotspots, including subclonal events that may be selected at relapse.
Gene deregulation by SV is a major contributor to the biology of multiple myeloma, constituting a hallmark feature of its genome. For decades, the defining features of multiple myeloma pathogenesis and heterogeneity has been hijacking of the IGH super-enhancers to oncogenes such as CCND1 and MMSET. Our findings reveal how simple and complex SVs shape the driver landscape of multiple myeloma, with events ranging from common CNAs and canonical translocations to a large number of SV hotspots. These results focus attention on the importance of SVs in multiple myeloma and on the use of WGS analyses to fully understand its driver landscape and identify novel therapeutic targets.
Patients and Somatic Variant Calling
We analyzed data from 752 patients with newly diagnosed multiple myeloma enrolled in the CoMMpass study (NCT01454297; IA13). Each sample underwent low-coverage long-insert WGS (median 4–8X) and whole-exome sequencing. The median physical coverage was 39 (5th percentile = 28 and 95th percentile = 53). The median insert size was 852 bp (5th percentile = 701 and 95th percentile = 949). Paired-end reads were aligned to the human reference genome (HRCh37) using the Burrows Wheeler Aligner (BWA; v0.7.8). SV calling was performed using DELLY (v0.7.6; ref. 10) and Manta (v.1.5.0; ref. 28) Similarly to recent PCAWG articles, we developed a filtering process based on different criteria (see Results and Supplementary Methods; ref. 7).
tCoNuT was used to call CNAs (https://github.com/tgen/MMRF_CoMMpass/tree/master/tCoNut_COMMPASS). To externally validate the tCoNuT workflow, we compared our results to copy-number profiles generated using controlFREEC (Supplementary Methods; refs. 20, 58). The final catalog of high-confidence SVs was obtained by integrating DELLY and Manta calls with copy-number data and applying a series of quality filters (Supplementary Methods).
RNA-Sequencing Analysis and Fusion Calling
RNA sequencing of 591 samples was performed to a target coverage of 100 million reads. Paired-end reads were aligned to the human reference genome (HRCh37) using STAR v2.3.1z (59). Transcript per million (TPM) gene expression values were obtained using Salmon v7.2 (60). For fusion calling, we employed TopHat2 v2.0.11 with the TopHat-Fusion-post module (61).
Classification of SVs
Each pair of SV breakpoints (i.e., deletion, tandem duplication, inversion, or translocation) was classified as a single event or as part of a complex event (i.e., chromothripsis, chromoplexy, or unspecified complex), as described previously (4, 21).
Translocation-type events were classified as single when involving no more than two breakpoint pairs and two chromosomes, subdivided into reciprocal translocations, unbalanced translocations, templated insertion, or unspecified translocation, as described previously (4, 21). Templated insertions could be either simple or complex, depending on the number of breakpoints and chromosomes involved, but was always defined by translocations associated with copy-number gain. Chromothripsis was defined by the presence of 10 or more interconnected SV breakpoint pairs associated with: (i) clustering of breakpoints, (ii) randomness of DNA fragment joins, and (iii) randomness of DNA fragment order across one or more chromosomes (4, 26, 57). The thresholds of 10 breakpoints was imposed as a stringent criterion to avoid overestimating the prevalence of chromothripsis. Chromoplexy was defined by interconnected SV breakpoints across >2 chromosomes associated with copy-number loss. Patterns of three or more interconnected breakpoint pairs that did not fall into either of the above categories were classified as unspecified “complex” (21).
Mutational Signature Analysis
SNV calls from whole-exome sequencing were subjected to mutational signature fitting using the previously described R package mmsig (51, 52). High APOBEC mutational burden was defined by an absolute contribution of APOBEC mutations (mutational signatures 2 and 13) in the 4th quartile among patients with evidence of APOBEC activity (51).
Structural Basis for Recurrent CNAs in Multiple Myeloma
We applied the following workflow to determine the structural basis for each recurrent CNA in multiple myeloma (Supplementary Table S2). First, we identified in each patient every genomic segment involved by recurrent copy-number gain or loss. Gains were defined by total copy number >2; loss as a minor copy number = 0. Second, whole-arm events were defined when >90% of the arm had the same copy-number state. Third, for segments that did not involve the whole arm, we searched for SV breakpoints responsible for the CNA within 50 kb of the copy-number segment ends. Finally, and intrachromosomal CNAs without SV support were classified as unknown.
Gene Expression Effect of SV Involvement
We used multivariate linear regression to determine the independent effect of SV involvement on gene expression after accounting for the effect of gene dosage (i.e., copy number). All expressed genes on autosomes were included in the analysis, defined as genes with >0 TPM expression in >25% of patients and a median expression level of >1. Gene expression values then underwent Z-score normalization. Genes involved by SVs were defined separately for deletion/tandem duplication–type SVs and translocation/inversion types. For deletions and tandem duplications, genes were considered involved if overlapping the SV ± 10 kb. For translocations and inversions, genes within 1 Mb to either side of each breakpoint were considered involved. All single and complex SVs with one or more breakpoints within 1 Mb of either immunoglobulin loci were excluded to prevent the results from being dominated by the effects of immunoglobulin enhancers. Linear regression was performed for all patients and all genes pooled together, including the total copy number of each gene as a linear feature.
Copy-Number Changes Associated with SV Breakpoints
To determine the genome-wide footprint of copy-number changes resulting from SVs, we employed an “SV-centric” workflow, as opposed to the CNA-centric workflow described above. For each SV breakpoint, we searched for a change in copy number within 50 kb. If more than one CNA was identified, we selected the shortest segment. Deletion and amplification CNAs were defined as changes from the baseline of that chromosomal arm. As a baseline, we considered the average copy number of the 2 Mb closest to the telomere and centromere, respectively. This is important because deletions are often preceded by large gains, particularly in patients with hyperdiploidy (21). In those cases, we are interested in the relative change caused by deletion, not the total copy number of that segment (which may still be ≥2). We estimated the proportion of breakpoints associated with copy-number gain or loss across patients, collapsing the data in 2 Mb bins across the genome. Confidence intervals were estimated using bootstrapping and the quantile method. For the purposes of plotting (Figs. 4A and 6A), we divided the SV-associated CNAs into bins of 2 Mb. The resulting cumulative CNA plot shows the number of patients with an SV-associated deletion or amplification.
Hotspots of SV Breakpoints
To identify regions enriched for SV breakpoints, we employed the statistical framework of PCF. In principle, the PCF algorithm identifies regions where SVs are positively selected on the basis of enrichment of breakpoints with short inter-breakpoint distance compared with the expected background and surrounding regions. We used the computational workflow previously described by Glodzik and colleagues (40). In brief, negative binomial regression was applied to model local SV breakpoint rates under the null hypothesis (i.e., absence of selection), taking into account local features such as gene expression, replication time, nonmapping bases, and histone modifications. The PCF algorithm can define hotspots without the use of binning, based on a user-defined smoothing parameter and threshold of fold enrichment compared with the background. This allows identification of hotspots of widely different size, depending on the underlying biological processes. Moreover, there was no global threshold for the inter-breakpoint distance required to define a hotspot; instead, the genome was searched for local regions with higher than expected breakpoint density compared with local context and the background model. To avoid calling hotspots driven by highly clustered breakpoints in a few samples, we also set a minimum threshold of eight samples involved (∼1% of the cohort) to be considered hotspot, as reported previously (40). Despite this threshold, we found that complex SVs with tens to hundreds of breakpoints in a localized cluster (particularly chromothripsis) came to dominate the results. To account for this, we ran the PCF algorithm in two different ways: (i) considering all breakpoints of nonclustered SVs (simple classes and templated insertions) and (ii) including all SV classes but randomly downsampling the data to include only one breakpoint per 500 kb per patient. The random downsampling followed by PCF analysis was repeated 1,000 times, requiring >95% reproducibility to define a hotspot. Final output from both approaches was merged for downstream analysis.
The full SV hotspot analysis workflow is provided in Supplementary Data S1, drawing on generic analysis tools that we have made available on GitHub (https://github.com/evenrus/hotspots/tree/hotornot-mm). Additional details about nomination of SV hotspots by the PCF algorithm and downstream analysis are provided in Supplementary Methods.
Functional Classification of SV Hotspots
SV hotspots were classified on the basis of local copy number and gene expression data as gain of function or loss of function; hotspots without clear functional implications were removed.
Copy-number data were integrated from two complementary approaches. First, we applied the GISTIC v2.0 algorithm to identify wide peaks of enrichment for chromosomal amplification or deletion (FDR < 0.1), using standard settings (41). Second, we considered the cumulative copy-number profiles of each hotspot, considering only the patients with SV breakpoints within the region, looking for more subtle patterns of recurrent CNA that were not picked up in the genome-wide analysis.
To determine the effects of SV hotspot involvement on gene expression, we applied multivariate linear regression analysis for each gene within 500 kb to either side of a hotspot (36). Genes were considered involved by SV if there was an SV breakpoint within 100 kb to either side of the corresponding hotspot. All SV classes were considered together, and the expression level of each gene was adjusted for the total copy number of that gene in each patient. Genes differentially expressed at FDR < 0.1 were considered statistically significant.
Identification of Putative Driver Genes Involved by SV Hotspots
Multiple lines of evidence were considered to identify driver genes involved by SV hotspots. Evidence of a putative driver gene included: (i) involved by driver SNVs in multiple myeloma (20, 21), (ii) included in the COSMIC cancer gene census (https://cancer.sanger.ac.uk/census), (iii) designated as putative driver gene in The Cancer Genome Atlas (62–65), (iv) enrichment of SV breakpoints in or around the gene, (v) nearby peak of SV-related copy-number gain or loss, (vi) SV classes and recurrent copy-number changes corresponding to a known role of that gene in cancer (i.e., oncogene or tumor suppressor), and (vii) differential gene expression. Having identified candidate driver genes involved by SV hotspots, we reviewed the literature for evidence of a role in multiple myeloma (Supplementary Table S3).
Histone H3K27ac, Super-enhancers, and Multiple Myeloma Transcription Factor Networks
Active enhancer (H3K27ac) and super-enhancer data from primary multiple myeloma cells, as well as key gene regulatory networks in multiple myeloma, were obtained from Jin and colleagues (42). Enrichment of super-enhancers and key multiple myeloma transcription factors in hotspots was assessed using a Poisson test, comparing the density within 100 kb of hotspots with the remaining mappable genome.
Templated Insertion Hotspot Analysis
We developed an empirical background model for templated insertions, which strongly outperformed a random model to predict the genome-wide distribution of templated insertion breakpoints. We then performed PCF-based hotspot analysis for templated insertions alone, searching for regions of enrichment as compared with the templated insertion background model, as described above for nonclustered SVs.
Chromothripsis Hotspot Analysis
Empirical background models showed very poor ability to predict the distribution of chromothripsis breakpoints, as may be expected if DNA breaks in chromothripsis tend to be random. To identify regions enriched for chromothripsis, we applied the PCF algorithm with a uniform background, only adjusting for nonmapping bases.
Enrichment of SV Classes Within Hotspots
We used logistic regression analysis to determine the relative probability that breakpoints of different SV classes are located within 100 kb of a hotspot. Each breakpoint was considered individually. Single deletions were considered as the reference class, and results shown as OR with 95% CI.
SV Classes Associated with Copy-number Gains
To determine the SV classes associated with focal copy-number gains, we selected all copy-number segments smaller than 3 Mb with a total copy-number of >2 and a relative change of ≥1 from the baseline of that chromosome arm (as described above). Each copy-number segment was assigned to the associated SV class or as “No SV” if no breakpoints could be found within 50 kb.
Amplification of Multiple Myeloma Super-enhancers
To determine the relative probability of super-enhancer amplification associated with different SV classes, we applied multivariate logistic regression analysis. Focal copy-number gains were assigned as associated with a super-enhancer if one was found within 100 kb of the copy number segment. Copy-number segments were grouped according to the associated SV classes: templated insertion, tandem duplication, chromothripsis, other SV, or no SV. Gains associated with other SVs were used as the reference level, and copy number was included as a continuous variable. Results were provided as OR and 95% CI for each SV category, adjusted for the effect of copy number.
Multi-driver events were defined by the involvement of two or more independent driver copy-number segments and/or SV hotspots caused by the same simple or complex SV. For example, the association between MMSET and MAX/TRAF3 deletion was often an unbalanced translocation causing two deletions: the first involving MMSET and FGFR3 on chromosome 4p and the second involving the majority of chromosome 14q (including MAX/TRAF3). Each copy-number segment was counted only once, even if more than one driver was deleted or amplified.
Data and Software Availability
All the raw data used in the study are already publicly available [database of Genotypes and Phenotypes (dbGap): phs000748.v1.p1 and EGAS00001001178]. Analysis was carried out in R version 3.6.1. Unless otherwise specified, we used Wilcoxon rank-sum test to test for differences in continuous variables between two groups; Fisher exact test for 2 × 2 tables of categorical variables; and the Bonferroni–Holm method to adjust P values for multiple hypothesis testing. The full analytic workflow in R to identify hotspots of structural variants is provided in Supplementary Data S1. All other software tools used are publicly available.
Disclosure of Potential Conflicts of Interest
M. Hultcrantz reports other from Curio Science (consulting) and Intellisphere, LLC (consulting) outside the submitted work, as well as a patent for myTYPE NGS sequencing assay pending. A. Dogan reports personal fees from Takeda, Seattle Genetics, PER, EUSA Pharma; grants and personal fees from Roche; and nonfinancial support from AbbVie outside the submitted work. N. Bolli reports personal fees from Celgene, Janssen, and Amgen outside the submitted work. E. Papaemmanuil reports other from Isabl Inc (shareholder and CEO of Isabl Inc, a whole-genome analytics company that did not participate in the present study) outside the submitted work and is CEO and shareholder of Isabl Inc, a software company that offers cancer whole-genome sequencing analysis. K.C. Anderson reports personal fees from Bristol-Myers Squibb (advisory board), Millennium (advisory board), Sanofi (advisory board), Gilead (advisory board), Janssen (advisory board), and Precision Biosciences (advisory board) outside the submitted work and is the scientific founder of C4 Therapeutics and Oncopep. P. Moreau reports personal fees from Celgene, Janssen, Amgen, and AbbVie outside the submitted work. N.C. Munshi reports personal fees from Bristol-Myers Squibb, Janssen, Amgen, Takeda, Oncopep, AbbVie, and Karyopharm outside the submitted work. O. Landgren reports grants from Amgen and Janssen; other from Takeda (independent data monitoring committee), Janssen (independent data monitoring committee), and Theradex (independent data monitoring committee); and personal fees from Janssen (advisory board), Amgen (advisory board), Sanofi (advisory board), and Bristol-Myers Squibb (advisory board) outside the submitted work. No potential conflicts of interest were disclosed by the other authors.
One of the Editors-in-Chief is an author on this article. In keeping with the AACR's editorial policy, the peer review of this submission was managed by a member of Blood Cancer Discovery's Board of Scientific Editors, who rendered the final decision concerning acceptability.
E.H. Rustad: Data curation, software, formal analysis, supervision, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. V.D. Yellapantula: Data curation, software, formal analysis, validation. D. Glodzik: Data curation, software, formal analysis, validation. K.H. Maclachlan: Formal analysis, writing–review and editing. B. Diamond: Data curation, formal analysis, validation. E.M. Boyle: Formal analysis. C. Ashby: Formal analysis. P. Blaney: Formal analysis. G. Gundem: Supervision, methodology. M. Hultcrantz: Data curation. D. Leongamornlert: Data curation, formal analysis.N. Angelopoulos: Formal analysis, supervision, methodology. L. Agnelli: Data curation, formal analysis. D. Auclair: Resources, data curation. Y. Zhang: Data curation. A. Dogan: Data curation. N. Bolli: Data curation. E. Papaemmanuil: Data curation. K.C. Anderson: Data curation. P. Moreau: Data curation. H. Avet-Loiseau: Data curation. N.C. Munshi: Data curation. J.J. Keats: Data curation, formal analysis. P.J. Campbell: Data curation, formal analysis. G.J. Morgan: Data curation, supervision, writing–review and editing. O. Landgren: Conceptualization, resources, data curation, supervision, funding acquisition, visualization, writing–original draft, project administration, writing–review and editing. F. Maura: Conceptualization, resources, data curation, software, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing.
This work is supported by the Memorial Sloan Kettering Cancer Center NCI Core Grant (P30 CA 008748), the Multiple Myeloma Research Foundation (MMRF), and the Perelman Family Foundation. M. Hultcrantz reports grants from Swedish Blood Cancer Foundation, and grants from Karolinska Institute Foundations. F. Maura is supported by the American Society of Hematology, the International Myeloma Foundation, and The Society of Memorial Sloan Kettering Cancer Center. K.H. Maclachlan is supported by the Haematology Society of Australia and New Zealand New Investigator Scholarship and the Royal College of Pathologists of Australasia Mike and Carole Ralston Travelling Fellowship Award. G.J. Morgan is supported by The Leukemia & Lymphoma Society.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.