Relapse of acute lymphoblastic leukemia (ALL) remains a leading cause of childhood cancer-related death. Prior studies have shown clonal mutations at relapse often arise from relapse-fated subclones that exist at diagnosis. However, the genomic landscape, evolutionary trajectories, and mutational mechanisms driving relapse are incompletely understood. In an analysis of 92 cases of relapsed childhood ALL incorporating multimodal DNA and RNA sequencing, deep digital mutational tracking, and xenografting to formally define clonal structure, we identified 50 significant targets of mutation with distinct patterns of mutational acquisition or enrichment. CREBBP, NOTCH1, and RAS signaling mutations arose from diagnosis subclones, whereas variants in NCOR2, USH2A, and NT5C2 were exclusively observed at relapse. Evolutionary modeling and xenografting demonstrated that relapse-fated clones were minor (50%), major (27%), or multiclonal (18%) at diagnosis. Putative second leukemias, including those with lineage shift, were shown to most commonly represent relapse from an ancestral clone rather than a truly independent second primary leukemia. A subset of leukemias prone to repeated relapse exhibited hypermutation driven by at least three distinct mutational processes, resulting in heightened neoepitope burden and potential vulnerability to immunotherapy. Finally, relapse-driving sequence mutations were detected prior to relapse using droplet digital PCR at levels comparable with orthogonal approaches to monitor levels of measurable residual disease. These results provide a genomic framework to anticipate and circumvent relapse by earlier detection and targeting of relapse-fated clones.
This study defines the landscape of mutations that preexist and arise after commencement of ALL therapy and shows that relapse may be propagated from ancestral, major, or minor clones at initial diagnosis. A subset of cases exhibits hypermutation that results in expression of neoepitopes that may be substrates for immunotherapeutic intervention.
See related video:https://vimeo.com/442838617
See related commentary by Ogawa, p. 21.
This article is highlighted in the In This Issue feature, p. 5
Relapsed acute lymphoblastic leukemia (ALL) is the second leading cause of cancer-related death in children (1). There are few targeted therapeutic approaches for relapsed ALL and outcome is frequently poor (2), even with the advent of immunotherapeutic approaches. ALL typically exhibits a relatively low burden of somatic mutations, which has allowed delineation of the nature and sequence of acquisition of genetic variants that drive treatment failure (3). These include inherited variants that are often associated with leukemia subtype (e.g., TP53 mutations and low hypodiploid ALL), founding chromosomal rearrangements (e.g., BCR-ABL1 and rearrangement of KMT2A), secondary genomic alterations (e.g., alteration of IKZF1), and somatic alterations that are enriched from minor clones or acquired after initiation of therapy (4). Mutations targeting signaling pathways, chromatin patterning, tumor suppression, and nucleoside metabolism are enriched at relapse (5). These can confer resistance to specific drugs, such as mutations in NT5C2 to thiopurines (4, 6–10) and mutations in the glucocorticoid receptor NR3C1 and acetyltransferase CREBBP to glucocorticoids (11), or confer sensitivity to targeted agents, such as RAS pathway mutations and MEK inhibition (12). Prior studies also suggest that “relapse-fated” clones commonly exist as minor clones at diagnosis; along with the predominant major clone, these originate from a common ancestral clone that undergoes divergent evolution (8).
The early identification and genetic characterization of relapse-fated clones offer the opportunity to improve treatment outcomes by anticipating relapse and adjusting therapy, or by targeting relapse-fated clones prior to the acquisition of additional mutations facilitating leukemic progression. However, prior genomic studies of relapsed ALL have typically been limited in cohort size and the extent of genomic analysis such that a rigorous analysis of the relapse driver mutations, formal delineation of clonal structure and disease progression, and deep sequencing to distinguish preexisting clones from acquired mutations has not been possible.
Patterns of Relapse in ALL
Multiple tools were used to describe mutational landscape, clonal structure, and clonal evolution for 92 children with ALL and sequential diagnosis, remission and relapse samples treated on St Jude Total Therapy Studies (13, 14), results of which may be explored at https://stjuderesearch.org/site/data/relapsed-all (Supplementary Table S1). Somatic sequence variants detected at diagnosis and relapse were subjected to confirmatory capture-based sequencing at each time point to optimize estimation of mutant allele frequency (MAF) and time acquisition of relapse-associated mutations (Supplementary Fig. S1A–S1D). The deep sequencing identified subclonal somatic mutations in a subset of the remission samples (Supplementary Table S2), raising the possibility that mutational persistence early in therapy may predict relapse, as observed in acute myeloid leukemia (AML; ref. 15). However, comparative analysis of germline mutation burden in 12 cases from this relapsed cohort, with samples at days 27 to 49 and 20 cases that did not relapse, showed no correlation between mutational burden early in therapy and likelihood of relapse (Supplementary Data).
The burden of single-nucleotide variants (SNV), short insertions/deletions (indels), and copy number alterations (CNA) increased with disease progression (Supplementary Figs. S2A–S2D and S3A–S3C; Supplementary Tables S3–S7). Across the cohort, the majority of CNAs (60%) were preserved from diagnosis to relapse, whereas the majority of SNV/indels (74%) were acquired (Supplementary Table S8). Twenty-seven tumors from 18 patients were hypermutated (>85 mutations per sample, ∼1.3 mutations/Mb; Supplementary Fig. S4A–S4F), including 9 of 14 second relapses (64%), 6 of which were already hypermutated at first relapse. Apart from an increased mutational burden at early second relapse, no relationship was observed between mutation burden and time to relapse. CREBBP mutations (n = 15 cases) were associated with a longer time to relapse (mean 4.3 vs. 2.8 years, Student t test P = 0.019; Supplementary Table S9). Notably, 8 cases with outlier early relapse harbored combinations of alterations known to be involved in relapse development (Supplementary Data).
Frequently Mutated Genes and Pathways
A total of 4,509 genes harbored nonsilent sequence mutations at diagnosis (D) or relapse (R; Supplementary Tables S3 and S4). Clonal, nonsilent SNV/indels, or focal CNAs were acquired or selected for in 125 genes at first relapse in at least 2 cases and 28 genes in at least 3 cases. Among the recurrent mutated genes (≥2 cases), 38 were known cancer/leukemia genes (16) and 87 had not previously been described (Supplementary Table S10). Using GRIN (17), a model that incorporates analysis of multimodal genomic data (Methods; Supplementary Tables S11 and S12), 23 genes were significantly mutated (q < 0.1) at diagnosis and relapse, and 50 genes significantly enriched with mutations at R1 (by rising MAF or acquisition of mutation following diagnosis). Most (n = 20, 87%) of the D-R1 shared mutated genes, but only 14 (28%) R1 specific, were known targets of mutation in cancer/leukemia.
B-ALL relapses were enriched with mutations in Ras pathway (relapse 31.3% vs. diagnosis 17.9%) and epigenetic modifiers/regulators, including PRDM2 (n = 4), PHF19 (n = 3), TET3 (n = 3), and SIN3A (n = 3), 16 of which had not been reported in ALL (49.3% vs. 29.9%; Fig. 1A and B; Supplementary Figs. S5A, S5B and S6; Supplementary Table S13). Of 61 cases with signaling pathway mutations, 31 harbored at least one Ras pathway mutation at diagnosis, with 11 cases having multiple, commonly subclonal RAS pathway mutations at diagnosis (Supplementary Table S14). Seven cases showed convergence to one or two clonal Ras pathway mutations. In contrast, only three cases showed acquisition of new RAS pathway mutations at relapse. Thus, multiclonality of signaling pathway mutations is frequent at diagnosis in ALL, indicating that they are secondary lesions in leukemia evolution, and the observed mutational extinction and convergence to clonal dominance supports a selective advantage to RAS pathway mutations in many cases. In contrast, PI3K–AKT pathway mutations were common at diagnosis in T-ALL but were often lost at relapse, suggesting that inhibition of this pathway may not reduce likelihood of relapse.
Of the genes known to play a role in the development of ALL, 29 (including IKZF1, TP53, NR3C1, TBL1XR1, and PTPN11) showed universal enrichment at relapse, whereby mutations were always retained from diagnosis to relapse (i.e., were truncal variants) or subsequently emerged at relapse (Supplementary Table S15). Of these, six were never observed at diagnosis (NT5C2, LRP1B, USH2A, APC2, PIK3R4, and NCOR2). An additional 20 genes showed extinction of mutations present at diagnosis in only a single case (e.g., VPREB1, CREBBP, ETV6, and KDM6A). Several genes not previously reported to be mutated in ALL were notable for preservation or clonal selection of mutations from diagnosis to relapse, including PTPRT (n = 6), ROBO2 (n = 5), and TRRAP (n = 5), suggesting a role in promoting leukemogenesis and relapse (Supplementary Fig. S7A and S7B; Supplementary Tables S16 and S17). Mutations in the glucocorticoid receptor (NR3C1) and purine/pyrimidine synthesis pathway (NT5C2) were frequent in both B- and T-ALL at relapse, but often as subclonal events, suggesting that additional targeting of these drug-specific resistance-driving mutations may eradicate all relapse clones in many cases.
Integration of Mutational Landscapes with Clonal Structure
Using the rise and fall of CNAs and MAF of somatic SNV/indels, most tumor pairs [n = 79 D-R1/secondary tumor (S; 86%), and n = 12 R1-R2/S (92%)], showed clonal extinction and evolution of new clones in the subsequent tumor, indicating branching evolution. One quarter of the relapses (n = 28, including 2 second relapses) arose from the major clone (MAF > 30%) present at diagnosis or the previous relapse, and half of the relapses (n = 53, of which 6 were second relapses) developed from a minor clone (Fig. 2A and B; Supplementary Table S18). Of the 53 relapses arising from a minor clone, nine (17%) had relapse-enriched mutations already present in that subclone and 26 (49%) acquired new or additional relapse-enriched mutations following diagnosis. Nineteen tumors (18%), exhibited polyclonal evolution in which multiple diagnosis clones persisted at relapse.
We found that second relapses evolved more often in a polyclonal fashion (χ2P = 0.043) and with a shorter remission time than first relapses (average 1.5 years vs. 3 years, Student t test, P = 0.0044). Compared with first relapses, variants in second relapses were more often variants preserved from subclones (10% vs. 4%) or preserved at subclonal levels (11% vs. 2%) and less often acquired (69% vs. 79%), which reflects the polyclonal evolution model (χ2P < 2.2 × 10−16). Thus, initial evolution from diagnosis to relapse is characterized by mutational convergence, and commonly, emergence from a minor clone, but subsequent progression exhibits preservation of the initially selected clones and variants.
Second Primary Leukemia
Four first relapses (SJBALL006, SJTALL142, SJPHALL005, and SJPHALL022425) and one second relapse (SJTALL049) were fully discordant for all genetic alterations (SV, CNA, SNV/Indel, and antigen receptor rearrangements) or shared only a leukemia fusion (Supplementary Tables S19 and S20), suggesting distinct second leukemias masquerading as relapse. These scenarios are important to distinguish, as second leukemias may be curable with standard therapy and multiple primary tumors suggest heritable leukemia predisposition. Both tumors in SJBALL006 harbored MEF2D-BCL9 fusions, but with unique RNA and DNA breakpoints, and evidence of the second MEF2D-BCL9 fusion at low level in the primary sample (Supplementary Tables S20 and S21). In addition, there was a constitutional gain of the chromosome 1q neuroblastoma breakpoint (NBPF; ref. 18) region on chromosome 1q that contains MEF2D and BCL9, and discordant somatic complex genomic amplifications of the NBPF region at diagnosis and relapse (Fig. 3A–D; Supplementary Tables S6 and S22). Thus, while emergence of a tumor with the same fusion partners suggests relapse, this case represents germline structural variation promoting development of multiple tumors with distinct initiating fusions and different latencies of presentation.
BCR-ABL1 cases SJPHALL005 and SJPHALL022425 lacked shared nonsilent variants at diagnosis and relapse, raising the possibility of second leukemia rather than relapse. However, whole genome sequencing (WGS) of SJPHALL005 identified identical rearrangement breakpoints at diagnosis and relapse and 65 shared somatic noncoding SNVs, demonstrating relapse from a common, ancestral clone (Supplementary Table S20). SJTALL049 developed three tumors: STIL-TAL1 rearranged T-ALL at age 6 that relapsed at age 14 plus an independent BCR-ABL1–positive chronic myeloid leukemia (CML) at age 20. Breakpoint spanning PCR revealed unique STIL-TAL1 rearrangements at diagnosis and relapse, but WGS showed 19 shared noncoding SNVs. WGS showed no shared SNVs between either of the T-ALLs and the CML, indicating that the CML developed as a second primary tumor. Thus, completely genetically distinct second tumors masquerading as relapse of acute leukemia are rare, and even if complete nonsilent mutational discordance is present, they may arise from divergent evolution soon after leukemia initiation.
Three relapses were clinically considered second tumors based on a shift to myeloid lineage (SJTALL008, SJTALL124, and SJTALL164). However, these tumors revealed shared mutations between diagnosis and relapse, indicating a common clonal origin (Supplementary Data). This recapitulates the lineage plasticity that is independent of genetic variegation we have recently described in acute leukemia of ambiguous lineage (19) and highlights the importance of genomic analysis to accurately interpret the relationship of diagnosis and relapse samples.
Tracing the Evolution of Relapse
Bulk sequencing data may fail to unambiguously define a clonal evolution model, particularly for mutationally sparse samples and those exhibiting a continuum of variant MAFs. We performed limiting dilution xenografting and sequencing of eight matched diagnosis and relapse samples (Supplementary Table S1). Most (90.5% of 232) of the somatic mutations detected in primary samples were identified in at least one diagnosis or relapse xenograft (MAF ≥ 0.01; Supplementary Fig. S8A). Of the 22 mutations not captured in xenografts, 20 were observed only in the bulk diagnosis sample and not in the relapse sample, suggesting that such nonxenografted diagnosis mutations must be present in cells that lack clonal propagating potential and are less likely to initiate relapse.
Genomic analysis including xenografting overcame the challenge of assigning mutations to individual subclones and enabled unambiguous delineation of clonal structure. Xenograft analysis of SJBALL036 (ETV6-RUNX1-like subtype) identified two linearly related clones (2.1 and 2.2) from mutations originally allocated to a common clone (Fig. 4A). Mutations in relapse-fated clone 2.1 were clonal in all xenografts, including those propagated from diagnosis, whereas the CREBBP mutations in clone 2.2 were observed in a subset of xenografts, indicating that subclone 2.2 arose from 2.1. In addition, xenografting demonstrated the selective advantage of clone 2.1 versus clone 5 that was lost at relapse and was not represented in any of the xenografts transplanted with the diagnosis sample. Furthermore, the xenografts derived from the relapse sample captured mutually exclusive variants, providing definitive evidence that clones 3 and 4 were unrelated, and represent branching evolution (Fig. 4B and C). Similarly, xenograft data of SJETV010, of which the second relapse sample is hypermutated (1,699 somatic mutations), resolved 13 clones following linear and branching patterns of evolution (Supplementary Fig. S8B and S8C). These data support the notion of branching evolution in ALL suggested by FISH and bulk sequencing analysis (20, 21), but now with mutational data enabling unambiguous clonal delineation. A subset of xenografts has also been utilized to demonstrate that relapse-fated clones may be detected at diagnosis that already exhibit resistance to therapy (22, 23).
Tracing Mutation Acquisition Prior to Relapse
As xenografting identified resistance-driving mutations in relapse-fated subclones present at diagnosis, we sought to determine whether these mutations could be detected in patient samples obtained between diagnosis and relapse. We used droplet digital PCR (ddPCR) to track the emergence of relapse-specific mutations in CREBBP, NRAS, KRAS, NT5C2, and WHSC1 in 50 samples from 5 patients (Supplementary Table S23). With an input of 500 ng DNA a frequency of >0.005% (>7 copies) could be consistently detected (Supplementary Fig. S9A–S9D; Supplementary Table S24). ddPCR identified previously undetected minor clones in three tumor samples (NRAS G12R MAF = 0.4% in SJBALL013-R1, KRAS G12S MAF = 0.4% in SJBALL022481-D, NT5C2 R39Q MAF = 0.006% in SJBALL192-R1; Fig. 5), confirming the minor clone evolution model for these tumors. Difference in the temporal dynamics and occurrence of mutations in SJBALL022481 (CREBBP and KRAS), SJBALL192 (two NT5C2 mutations), and SJTALL001 (NT5C2 and KRAS) demonstrated clonal exclusivity of these mutations in each case. Despite complete remission by conventional minimal residual disease (MRD) testing, we detected tumor-specific mutations up to 534 days after the diagnosis (SJBALL022481, CREBBP R1446C), as well as 40 days prior to relapse (SJBALL013 NRAS G12R) in complete remission bone marrow samples. Moreover, in SJBALL022481, the KRAS G12S mutation was detectable at regular intervals during the 1,169 day period between diagnosis and relapse, even though the samples were deemed complete remission. Peripheral blood samples obtained from patients with B-lineage (SJBALL192, SJHYPER127) as well as T-lineage ALL (SJTALL001) with eventual bone marrow relapse were negative or contained much lower MAFs compared with the bone marrow. Thus, leukemic cells may persist in bone marrow and may be detected at low levels during complete remission, indicating the utility of this approach for disease monitoring and early relapse detection.
Mutational Drivers of Hypermutation and Neoepitope Expression
Three percent of diagnoses, 17% of first relapse, and 64% of second relapse cases exhibited hypermutation. This was defined by an inflection at 85 mutations, approximately 1.3 mutations/Mb, a burden that was more conservative than the cutoff determined by the Segmented algorithm (ref. 24; Supplementary Fig. S4A). Hypermutation was observed in cases relapsing from minor or multiple clones of all 3 hypodiploid, 2 of 5 ETV6-RUNX1, 5 of 13 hyperdiploid, 1 Ph-like, 2 unclassified B-ALL, 1 ETP, and 4 T-ALL cases.
To understand the processes responsible for hypermutation, we used non-negative matrix factorization (NMF; ref. 25) and extracted four single-base substitution (SBS) signatures (Fig. 6A and B; Supplementary Fig. S10A and S10B). The high prevalence of hypermutation in second relapses suggests that hypermutation may be driven by treatment exposure. However, we did not uncover a mutational signature associated with treatment, such as temozolomide-associated signatures found in glioblastoma and melanoma (25). On the basis of the most prominent mutational signature (Supplementary Fig. S10B), we classified 17 hypermutated relapses from 13 patients into four groups (Fig. 6C). Two of these groups were characterized by well-established mutational processes. Group 2 (signature B mutations) resembles signatures associated with activity of the AID/APOBEC family of cytidine deaminases (26, 27). This mutational process is common in human cancer including ALL (25) and was postulated to occur in short bursts initiated by retrotransposon mobility (28). The mutational burden in this group was relatively low (99–156 acquired mutations) compared with the other three groups (group 1; 220–1,210, group 3; 860–1,218, group 4; 104–710). Group 3 cases have high contribution of signature C, which clusters with mismatch repair (MMR)–associated signatures, with highest similarity to SBS15 (Fig. 6B). Indeed, the three relapses in this group all had biallelic mutations in one of the MMR genes (see Supplementary Data) and had high levels of single-base insertions or deletions in simple repeats (Fig. 6D), a feature of MMR deficiency (29). Genetic alterations in the MMR pathway have been associated with resistance to drugs such as thiopurines in ALL (30), suggesting that this mechanism of hypermutation directly contributed to relapse in these cases.
Two signatures could not be assigned to known mutational processes. Signature D (group 4) lacked a strong bias for a particular trinucleotide context and showed similarity to multiple mutational signatures. Signature A (group 1) resembled clock-like signature SBS1, which is a known consequence of a slow but progressive accumulation of C to T transitions at CpG sites owed to spontaneous deamination of methylated cytosines and is more apparent in cancers diagnosed at older age (29). Because the patients in our cohort are young, this process appears to be accelerated by an acquired imbalance between damage and repair. This was not caused by alterations in genes encoding regulators of DNA deamination. Interestingly, the signature A mutations in SJETV010 were subclonal (MAF < 0.5%) in the first relapse, but showed a much wider spread of allele frequencies in the second relapse, suggesting an ongoing endogenous mutational process initiated in a minor clone at first relapse (Fig. 6E). Recently, an SBS1-like signature (SBS74) has been reported that appears to be associated with MMR deficiency (31. Indeed, all hypermutated ALL relapses with MMR deficiency show signature A mutations (Fig. 6C; Supplementary Data).
To further compare the characteristics of signature A with the clock-like signature SBS1, we performed WGS of relapse and remission samples of SJETV010, SJHYPER022, and SJHYPO117, followed by somatic SNV calling and mutational signature extraction. In line with our findings in the coding regions, we identified highly concordant mutation densities, as well as composition and relative contribution of the four mutational signatures (Supplementary Fig. S10C–S10F). Signature SBS1 mutations have been described to occur throughout the genome and do not show strand asymmetry in transcribed regions (25, 32). We confirmed these observations using three recently published colon organoid samples (33), which are characterized by high prevalence of signature SBS1 mutations, mainly outside gene bodies. In contrast, we found that signature A mutation density was highest in gene bodies and showed strong transcription strand bias, particularly in genes with high expression in the respective samples (Fig. 6F and G; Supplementary Fig. S10G). We did not observe strand asymmetry associated with DNA replication for either signature A or signature SBS1 (Supplementary Fig. S10H). Transcriptional strand asymmetry can be the result of more efficient repair of the transcribed strand, or increased damage on the single-stranded, nontranscribed strand, two mechanisms that show opposite correlations with expression (34). Because signature A mutations are enriched in highly expressed genes, they may originate from transcription-coupled damage at the single-stranded, nontranscribed strand, as has been described for liver cancer (34).
The high prevalence of hypermutation in relapsed ALL suggested that this may result in increased generation of expressed neoepitopes that may be exploited by immunotherapeutic approaches to enhance antitumor responses of autologous T cells. We used WGS and RNA sequencing (RNA-seq) data to infer HLA class I types from each sample (35), predicted the binding affinities of all unique 8–12 amino acid peptides corresponding to SNVs and fusion proteins (36), and developed unweighted (UPAS) and weighted putative antigenicity scores (WPAS), the latter of which incorporates predicted sample-specific peptide: MHC binding with variant-specific expression. Although we observed variation in the number of fusion-encoded, predicted MHC-binding peptides across individual fusions (0–20, median = 1), fusion-encoded neoepitopes remained unchanged over time (Supplementary Table S21). In contrast, we observed that the number of predicted HLA-binding mutant peptides (≤500 nmol/L; ref. 37) per tumor rose with disease progression as a function of increased mutation burden, and thus particularly in hypermutated samples (P < 0.001, Supplementary Table S25; Supplementary Fig. S11A). In addition, the number of predicted MHC-binding peptides per tumor was significantly correlated with disease (B- and T-ALL), disease progression (D, R1, and R2), and signatures of hypermutation (Supplementary Fig. S11B). An expression-weighted antigenicity analysis comprising the subset of missense SNVs with expression data showed a significant effect of disease progression, with median WPAS lowest in R1 and highest in R2 variants (Supplementary Fig. S11C), and was particularly marked for known cancer genes (Supplementary Fig. S11D). These patterns may correspond to variations in immunologic pressure owed to, for instance, the distinct etiologies underlying B- and T-ALL and the successive use of immunosuppressive agents in treatment, respectively.
Using multiple orthogonal approaches, we have described the patterns, dynamics, and drivers of clonal evolution in a large cohort of childhood relapsed ALL. These results have implications for the development of new approaches to monitor and treat ALL more effectively.
The scope of our study allowed us to identify relapsed enriched driver mutations more comprehensively than in prior studies and included newly identified targets of mutation as well as recurrent mutations in genes previously identified at relapse (11, 38, 39). We were able to identify distinct patterns of temporal acquisition across relapse-enriched targets of mutation, with mutations in genes such as CREBBP preserved from or acquired after diagnosis, and others in genes such as NT5C2 and USH2A only observed after initial therapy; importantly, these findings suggest a role for therapy in the induction of mutation, and/or a deleterious effect on initial leukemic fitness of these mutations (38). Mutations observed in different gene regulation pathways showed different frequencies of relapse-enriched genes between B- and T-ALL, indicating that distinct biological mechanisms drive the genetic alterations of the disease progression.
Tumors that evolved from, or retained multiple clones all had a short time to relapse, supporting a model in which early relapses are associated with dynamic clonal evolution and late relapses to a more inert pattern (40). These patterns of evolution, and description of the targets of mutation, have important implications for anticipation of relapse and modulation of therapy. Over half of relapses arise from a minor clone that commonly harbors, or acquires, relapse-enriched mutations that drive drug resistance. Analysis of this large cohort also enabled demonstration that relapse enriched drivers such as alterations of CREBBP, IKZF1, and NT5C2 are rarely lost if present at diagnosis or subsequently acquired, indicating that early detection may predict an increased likelihood of treatment failure. Moreover, in parallel studies using a subset of the xenografts described here, we have shown relapse-fated minor clones already exhibiting resistance to therapy are present at the time of diagnosis (22, 23). Thus, it is imperative that mutational profiling must now strive to achieve MRD levels of mutation detection at diagnosis or early in therapy, and we have shown the feasibility of this approach using ddPCR. An alternative approach is capture-based deep sequencing of regions of sequence and structural variation in ALL. Early detection will facilitate close monitoring and consideration of alternative treatment approaches such as intensification, immunotherapy, or novel drug approaches for drug-specific resistance (e.g., NT5C2 and thiopurines and CREBBP and corticosteroid resistance).
We show hypermutation is common at relapse, and driven by distinct mechanisms of mutation, including tumor-intrinsic processes, such as cytidine deaminase DNA editing activity (AID/APOBEC) as previously observed in diagnosis samples (41) and experimental models (42), or the acquisition of mutations that cause MMR deficiency, which may represent a mechanism of MMR-induced resistance to thiopurines in ALL (30). In addition, we identified a new SBS1-like signature (Signature A), characterized by transcriptional strand asymmetry and enrichment in expressed genes, caused by an unknown mutational mechanism that is acquired during leukemia progression. Importantly, the identification of hypermutation as a common phenomenon in relapsed ALL suggests that immunotherapeutic approaches intended to restore autoreactivity against neoantigen expression, such as checkpoint blockade, should be formally explored. Although long assumed to be poor targets for immunotherapy due to the relatively low mutation burden in comparison with other tumors (43), recent data have demonstrated that pediatric hematologic malignancies promote the generation of abundant and functional immune responses to tumor-specific neoepitopes despite the apparent inability of the immune system to control those tumors (44). In conjunction with those findings, our results suggest that not only does hypermutation drive the generation of HLA-restricted neoepitopes, but that these are expressed in an immunologically tolerized milieu that may be exploited with strategies to augment T-cell antitumor reactivity. It will now be of great interest to formally document the presence of autoreactive T-cell clones directed at neoepitopes induced by hypermutation, as we have described at diagnosis in ALL (44), and to formally test, in experimental models, whether immunomodulatory approaches can augment or restore antitumor reactivity in hypermutated ALL.
Subjects and Samples
The cohort included 92 children (31 female, 61 male) with relapsed B-progenitor (B-ALL, n = 67) or T-lineage [T-ALL, n = 25; including 7 early T-cell precursor ALL (ETP-ALL)] ALL patients treated on St. Jude Total Therapy Studies XI-XVI (median age at diagnosis 7.8 years; range, 1 month–18.7 years; Supplementary Table S1; refs. 45–47). The median time from diagnosis to first relapse was 2.7 years (range, 3 months–9.9 years). Sixteen cases developed a second relapse (range, 3 months–7 years). Relapse was very early (<1 year) in 17 patients, early (1–2 years) in 21 patients, and late (>5 years) in 14 patients. Nine cases developed a second tumor of different lineage, including 2 basal cell carcinomas, 1 B-cell lymphoma, 1 CML, and 5 AMLs. A total of 46 patients received bone marrow transplants at a median age of 12 years (range, 7 months–22 years), and all except one were allogeneic. Tumor samples with a blast percentage of less than 80% were flow sorted for the tumor population. Written informed consent was obtained from the patient and/or parent. The study was conducted in accordance with the Declaration of Helsinki, and was approved by the St. Jude Children's Research Hospital Institutional Review Board.
DNA copy number aberrations were determined using SNP 6.0 microarrays (Affymetrix) in 307 samples from 92 patients (92 diagnosis, 84 relapse, 14 second relapse, 5 second tumor, and 91 germline samples; Supplementary Table S1). Data were analyzed using reference normalization (48) and circular binary segmentation (49).
We performed whole-exome sequencing (WES) on 276 samples from 92 cases (Supplementary Table S26) and WGS on 99 samples from 36 cases (Supplementary Table S27). Exomes were captured using the TruSeq Exome Library Prep Kit (67 Mb, 1 μg DNA input) or the Nextera Rapid Capture Expanded Exome Kit (62 Mb, 50 ng DNA input; Illumina). Paired-end sequencing was performed with the HiSeq 2500 sequencer (Illumina). The data were mapped to human reference genome hg19 and variant calling was performed using Bambino (50–52). All somatic SNVs and indels identified at diagnosis or relapse were validated using NimbleGen SeqCap Target Enrichment according to manufacturer's instructions (Roche NimbleGen) and resequenced using a HiSeq 2500 genome sequencer to a mean coverage >350× (250–500 ng DNA input; Supplementary Table S28). We performed transcriptome sequencing (RNA-seq) for TRIzol extracted RNA for 115 samples obtained from 66 cases (Supplementary Table S29). We used 1 μg RNA for library preparation with the TruSeq RNA Library Prep Kit v2 (Illumina) and 2 × 100 bp paired-end sequencing was performed on a HiSeq 2500 sequencer (Illumina). RNA-seq data were mapped to human reference genome hg19 using StrongArm and fusions were identified using CICERO (53) and FusionCatcher (54, 55).
B-ALL Subtyping Based on Gene Expression Profile from RNA-seq
Read counts for annotated genes (Ensembl Homo sapiens GRCh37 v75) were called by HTSeq (version 0.6.0; ref. 56) and processed by DESeq2 R package (57) to normalize gene expression into regularized log2 values (rlog). A subtype predication model was trained by Prediction Analysis of Microarrays based on a cohort of 309 samples from our previous studies (16, 53, 58), which consists of 8 B-ALL subtypes: IGH-DUX4 (n = 42), TCF3-PBX1 (n = 41), ETV6-RUNX1 (n = 42), hyperdiploid (n = 46), MEF2D-rearranged (n = 21), KMT2A (MLL)-rearranged (n = 44), BCR-ABL1 (n = 44), and ZNF384-rearranged (n = 27). The trained model was applied with 100 evenly divided thresholds (control selected feature genes from 5,000 to 50) and the probability score was averaged to predict subtypes for the enrolled RNA-seq samples.
Mutation Analysis and Clonal Modeling
Variants with a total coverage of <5 reads (combining WGS/WES and validation) were excluded. Variants with ≤2 variant reads were considered wild type; those with 3 to 8 variant reads only considered mutant when both WES/WGS and capture validation techniques identified the variant or the variant was called with higher coverage in other (tumor) samples of the same patient. A MAF of <30% was considered subclonal, 30% to 75% heterozygous, and MAF ≥75% homozygous.
MutSigCV (59) and the Genomic Random Interval Model (GRIN; ref. 17) analyses were performed to identify potential driver lesions (Supplementary Tables S11 and S12). MutSigCV is limited to analysis of sequence mutations, whereas GRIN incorporates multimodal genomic data including sequence and structural variants with robust adjustment for background mutation rate to identify significantly altered genes/regions, and unlike CNA-specific tools such as GISTIC (60), is not influenced by full chromosomal aneuploidies.
Two-dimensional MAF plots were used to visualize the relationship between sequential samples (4). We used sciClone (14) and manual curation incorporating xenograft data to assign variants to clones. Clonal evolution was visualized using ClonEvol (61) and fishplot (13). Noncoding variants were also considered to resolve the nature of clonal evolution in cases with presumed evolution from a major clone, which resulted in; required reclassification from evolution from a major to a minor clone in 5 of 27 cases. Retention of multiple clones was deemed polyclonal evolution, and relapse from an ancestral precursor where D/R tumors share only the founding translocation and up to two SNV/indels.
Diagnosis and/or relapse tumors of 8 cases [3 ETV6-RUNX1, 1 ETV6-RUNX1-like, 2 KMT2A (MLL)-rearranged, one DUX4 and 1 case without subtype data; Supplementary Table S1] were transplanted at limiting dilution from 250,000 cells to 10 cells into the femur of 8 to 12 week-old sublethally irradiated (225 cGy) female NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ (NSG) mice. Engrafted tumor cells were harvested from bone marrow, spleen, and the central nervous system when mice displayed evidence of disease or at 30 weeks posttransplantation. Cells from the bone marrow and spleen were purified using the Miltenyi Mouse Cell Depletion Kit (Miltenyi Biotec; samples with >20% engraftment) or by cell sorting. All animal experiments were done in accordance with institutional guidelines approved by the University Health Network (Toronto, Canada) Animal Resource Centre (AUP#1117.37).
Seven relapse-associated mutations were genotyped by ddPCR technique (RainDance Technologies) using custom (NT5C2 p.P414A, CREBBP p.R1446C, NRAS p.G12R, and WHCS1 p.E1099K) or available (NT5C2 p.R39Q, KRAS p.G12S, and KRAS p. G12C) primers and probes (Supplementary Table S30). An average of 7 million droplets were generated by the RainDrop Source instrument, and emulsion PCR was performed using the C1000 Thermal Cycler (Bio-Rad). Droplet fluorescence of the amplified product was detected by the RainDrop Sense instrument, and data analysis was carried out using the RainDrop Analyst II Software. Detection limits were determined by testing serial dilutions of flow purified–mutant leukemia cells in wild-type REH cells. A frequency of >0.005% (>7 copies) could be consistently detected (Supplementary Fig. S9; Supplementary Table S24). ddPCR MAF correlated well with the MAFs called from WES (r = 0.971) or CapVal (r = 0.9964; Supplementary Fig. S9; Supplementary Table S23).
IGH and TCR Rearrangement Sequencing
IGH and TCRB loci were genotyped by ImmunoSeq (Adaptive Biotechnologies) to analyze clonal relationships of putative second tumors lacking shared genomic alteration (34 samples from 15 cases; Supplementary Table S19).
Germline copy number variants were filtered by the Database of Genomic Variants (62) and in-house databases. To prioritize germline SNV/indel variants, we filtered for rare variants (<0.01% population frequency in ExAC, dbSNP, GoNL, ESP, Wellderly, Kaviar, and Complete Genomics' 60 genomes databases) that were predicted to be deleterious (nonsense, frameshift, or canonical splice site variants, and missense variants with PhyloP>3) and were present in genes known to be associated with leukemia susceptibility and pediatric cancer syndromes (Supplementary Tables S31 and S32). Second, we used St. Jude's Medal Ceremony algorithm to identify Gold Medal variants (truncating mutations in tumor suppressors, matches to hotspots or truncating mutations in somatic mutation databases, and matches to locus-specific databases; ref. 63).
Mutational Signature Analysis
We defined hypermutation as samples containing >85 SNV/indels (∼1.3 mutations/Mb) based on the density histogram of the number of variants per sample in our cohort (Supplementary Fig. S4), which is more conserved than the cutoff determined by Segmented (ref. 24; >56 SNV/indels, 0.8 mutations/Mb).
For de novo extraction of somatic mutational signatures, we selected 22 hypermutated ALL samples from this cohort, and WES and WGS from patients with hypermutated B-ALL (n = 38) and WGS from B-cell lymphoma samples (N = 10, ICGC). Mutational signatures were extracted using NMF (64, 65) and MutationalPatterns (66). Similarity between two signatures, A and B, defined as nonnegative vectors with n mutation types, was calculated using cosine similarity. We defined two signatures to be the same if the cosine similarity is ≥0.95 (range 0–1). We calculated the cosine similarity between the mutational profiles of the samples and the mutational signatures and included the COSMIC mutational signatures and the de novo–extracted mutational signatures in this analysis.
Finally, we studied the strand asymmetry of signature A mutations in the context of transcription and replication. This analysis requires that individual mutations are assigned to a single signature. Using the number of trinucleotide changes and the signature probabilities per sample, we calculated the relative contributions of each signature for each trinucleotide context in a sample. Only trinucleotide changes with a relative contribution of ≥95% to one of the four signatures were assigned to that particular signature.
Replication timing in leukemia samples was determined using Repli-seq data obtained from five lymphoblastoid cell lines in the ENCODE project (ref. 67; GM06990, GM12801, GM12812, GM12813, GM12878), using median values per 1-kb bin (33). For intestinal organoids, we used Repli-seq data described previously (68). Predefined signature A mutations were assigned to early, intermediate, and late replicating regions, as described previously (33). Replication asymmetry analysis was done using the MutationalPatterns R package (66). On the basis of the distribution of rlog values from RNA-seq, genes were stratified into three groups for each sample: not expressed (genes with no or few supporting reads), genes with low expression (below the median level), and genes with high expression (median expression level or higher). We confirmed presence of SBS1 mutations by comparing their mutational profile with signature SBS1 for each sample (cosine similarity = 0.98; Supplementary Fig. S10). Gene expression stratification for healthy colon organoids was performed as described for the ALL samples. Testing of strand asymmetries within three groups was done using Poisson test for strand asymmetry significance testing.
To characterize the antigenic potential of missense variants, we developed putative antigenicity scores that consider predicted patient-specific peptide: MHC binding variant-specific expression. WGS and RNA-seq data were used to infer class I HLA alleles to four-digit resolution for each patient using OptiType (35). For each missense SNV and patient HLA allele, we then used NetMHCcons 1.1 (36) to predict the binding affinities of all unique, mutated peptides of lengths 8–12 amino acids, excluding peptides that could be found elsewhere in the human proteome. Predicted binding affinities are often categorized as presumptive binders (≤ 500 nmol/L) and nonbinders, which can be useful for narrowing epitope targets (44), but to estimate total antigenic potential, we also conceptualized binding affinities as correlated with probabilities of peptide: MHC binding to consider all predicted binding affinities additively. The UPAS was calculated as the natural logarithm of the summation of the inverse of all putative binding affinities; for the subset of SNVs for which expression data were available, this value was then weighted by adding the natural logarithm of 0.01 + the fraction of expressed alternate-to-total bases to generate a WPAS specifically for those variants. Each of these scores is best considered in comparison across variants, with increasingly positive scores indicative of increasing putative antigenicity.
To investigate potential correlates of antigenic potential (e.g., disease, disease progression, hypermutator status, and interactions thereof) while controlling for the nonindependence of the data owed to multiple variants across patients, we used the lme4 R package (69) to fit linear mixed models with patient as a random effect. The car R package (70) was used to assess the statistical significance of the fixed effects, and residuals of models with significant effects were verified as unbiased and homoscedastic via visual inspection.
WES, WGS, transcriptome sequencing, and SNP array data are available at the European Genome-Phenome Archive (accession no. EGAS00001003975). The genomic landscape reported in this study can be explored at the St. Jude PeCan Data Portal https://stjuderesearch.org/site/data/relapsed-all.
Disclosure of Potential Conflicts of Interest
M.V. Relling reports receiving a commercial research grant from Servier Pharmaceuticals. P.G. Thomas has received speakers bureau honoraria from Illumina and PACT Pharma and has ownership interest (including patents) in PCT/US2016/064735. J.E. Dick is a Scientific Advisory Board member at Trillium Therapeutics, reports receiving a commercial research grant from Celgene, has ownership interest in a patent licensed to University Health Network. C.G. Mullighan reports receiving commercial research grants from Pfizer, AbbVie, and Loxo Oncology, has received speakers bureau honoraria from Amgen and Pfizer, and is a consultant/advisory board member for Illumina. No potential conflicts of interest were disclosed by the other authors.
E. Waanders: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review, and editing. Z. Gu: Resources, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review, and editing. S.M. Dobson: Resources, data curation, software, formal analysis, investigation, visualization, methodology, writing–original draft, writing–review, and editing. Ž. Antić: Software, formal analysis, investigation, methodology, writing–original draft, writing–review and, editing. J.C. Crawford: Resources, data curation, software, formal analysis, investigation, methodology, writing–original draft, writing–review, and editing. X. Ma: Resources, data curation, software, formal analysis, investigation, methodology, writing–original draft, writing–review, and editing. M.N. Edmonson: Data curation, software, formal analysis, and methodology. D. Payne-Turner: Data curation, software, formal analysis, investigation, and methodology. M. van de Vorst: Data curation and investigation. M.C.J. Jongmans: Data curation. I. McGuire: Resources, data curation, and visualization. X. Zhou: Resources and visualization. J. Wang: Resources and visualization. L. Shi: Resources, software, formal analysis, investigation, visualization, and methodology. S. Pounds: Software, formal analysis, investigation, and methodology. D. Pei: Software, formal analysis, investigation, and methodology. C. Cheng: Formal analysis. G. Song: Formal analysis. Y. Fan: Formal analysis. Y. Shao: Formal analysis, validation, and investigation. M. Rusch: Resources, validation, and investigation. K. McCastlain: Resources and investigation. J. Yu: Validation and investigation. R. van Boxtel: Formal analysis, validation, and investigation. F. Blokzijl: Formal analysis. I. Iacobucci: Resources, data curation, formal analysis, validation, investigation, and methodology. K.G. Roberts: Resources, data curation, validation, investigation, methodology, and project administration. J. Wen: Resources, formal analysis, investigation, and project administration. G. Wu: Formal analysis and investigation. J. Ma: Formal analysis and investigation. J. Easton: Formal analysis, validation, and investigation. G. Neale: Validation and investigation. S.R. Olsen: Investigation. K.E. Nichols: Resources and investigation. C.-H. Pui: Resources. J. Zhang: Resources, software, and formal analysis. W.E. Evans: Resources, software, and formal analysis. M.V. Relling: Resources. J.J. Yang: Resources. P.G. Thomas: Conceptualization, resources, data curation, software, formal analysis, and investigation. J.E. Dick: Conceptualization, resources, data curation, software, formal analysis, supervision, funding acquisition, investigation, methodology, writing–original draft, project administration, writing–review, and editing. R.P. Kuiper: Conceptualization, resources, data curation, software, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review, and editing. C.G. Mullighan: Conceptualization, resources, data curation, software, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review, and editing.
We thank the Biorepository, Genome Sequencing Facility of the St. Jude Hartwell Center for Bioinformatics and Biotechnology, and the Flow Cytometry and Cell Sorting Core Facility of St. Jude Children's Research Hospital. We acknowledge W.M. Nillesen and A.R. Mensenkamp for expert advice on germline mutation interpretation and N.B. Waanders for bioinformatic assistance. This work was supported in part by the American Lebanese Syrian Associated Charities of St. Jude Children's Research Hospital, a St. Baldrick's Foundation Scholar Award, the Henry Schueler 41 & 9 Foundation, and a Robert J. Arceci Innovation Award (to C.G. Mullighan), the NCI grants P30 CA021765 (St. Jude Cancer Center Support Grant), P50 GM115279 (to C.G. Mullighan, M.V. Relling, W.E. Evans, G. Neale, J. Zhang, and J.J. Yang), Outstanding Investigator Award R35 CA197695 (to C.G. Mullighan), the Dutch Cancer Society (KUN2012-5366, to E.Waanders), the American Society of Hematology Scholar Award (to Z. Gu), the Leukemia & Lymphoma Society's Career Development Program Special Fellow (to Z. Gu), the NCI of the NIH under award number K99CA241297 (to Z. Gu), the KiKa Foundation (KiKa150 to R.P. Kuiper), the China Scholarship Council (CSC 201304910347, to J. Yu), and the Royal Netherlands Academy of Arts and Sciences Ter Meulen Grant to (J. Yu). Work to JED was supported by funds from: the Princess Margaret Cancer Centre Foundation, Ontario Institute for Cancer Research with funding from the Province of Ontario, Canadian Institutes for Health Research, Canadian Cancer Society Research Institute, Terry Fox Research Institute, Genome Canada through the Ontario Genomics Institute, and a Canada Research Chair.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.