Progression from myelodysplastic syndromes (MDS) to secondary acute myeloid leukemia (AML) is associated with the acquisition and expansion of subclones. Our understanding of subclone evolution during progression, including the frequency and preferred order of gene mutation acquisition, remains incomplete. Sequencing of 43 paired MDS and secondary AML samples identified at least one signaling gene mutation in 44% of MDS and 60% of secondary AML samples, often below the level of standard sequencing detection. In addition, 19% of MDS and 47% of secondary AML patients harbored more than one signaling gene mutation, almost always in separate, coexisting subclones. Signaling gene mutations demonstrated diverse patterns of clonal evolution during disease progression, including acquisition, expansion, persistence, and loss of mutations, with multiple patterns often coexisting in the same patient. Multivariate analysis revealed that MDS patients who had a signaling gene mutation had a higher risk of AML progression, potentially providing a biomarker for progression.

Significance:

Subclone expansion is a hallmark of progression from MDS to secondary AML. Subclonal signaling gene mutations are common at MDS (often at low levels), show complex and convergent patterns of clonal evolution, and are associated with future progression to secondary AML.

See related article by Guess et al., p. 316 (33).

See related commentary by Romine and van Galen, p. 270.

This article is highlighted in the In This Issue feature, p. 265.

Myelodysplastic syndromes (MDS) are a heterogeneous group of clonal bone marrow disorders characterized by cytopenias, abnormal differentiation, and a high risk of transformation to secondary acute myeloid leukemia (AML; refs. 1–3). Approximately 30% of MDS patients eventually progress to secondary AML, defined as >20% blasts in the blood or bone marrow. MDS patients who progress to secondary AML have inferior outcomes compared with patients with de novo AML (4–7). Understanding the mechanism driving progression from MDS to secondary AML could help identify at-risk patients for early intervention before the development of leukemia, although the best approach for early intervention has not been established.

MDS is caused by somatic DNA mutations that accumulate in hematopoietic stem/progenitor cells. Advances in next-generation sequencing have greatly enhanced our understanding of the gene mutations present in MDS and secondary AML (8–18). The clonal architecture of bone marrow cells from MDS and secondary AML patients is defined by a founding clone and subclones derived from the founding clone (12, 16, 19–22). The majority of cells in the bone marrow are clonal at the time of MDS diagnosis, and clonal evolution is a defining characteristic of progression to secondary AML. Previous studies have demonstrated that during progression to secondary AML, at least one subclone emerges or expands (10, 12, 16, 17, 23). This raises the question of whether the order of subclonal mutation acquisition is important for progression from MDS to secondary AML, as has been shown for other hematopoietic malignancies (24, 25). Additionally, it is unknown if convergent clonal evolution (i.e., the independent acquisition of similar gene mutations in separate cancer clones derived from a shared ancestral clone) affects a specific class of genes that are preferentially mutated in subclones.

An analysis of MDS and secondary AML samples has identified differences in the frequency of gene mutations between the two disease states, including enrichment of signaling gene mutations (i.e., mutations in genes related to signal transduction, survival, and/or proliferation; e.g., FLT3, RAS family members) in secondary AML (8, 11–14, 17, 21, 26–30). However, many of these studies used unpaired samples or sequenced a limited number of samples and genes, which limits the analysis of clonal evolution of signaling gene mutations within individual patients. To overcome these limitations, we sequenced paired MDS and secondary AML samples (and serial samples) from 43 patients using targeted, whole-genome, error-corrected, and single-cell DNA sequencing to determine the frequency and order of subclonal mutation acquisition, and to define the pattern of clonal evolution of signaling gene mutations during progression from MDS to secondary AML.

Signaling Gene Mutations Are Typically Acquired after MDS Diagnosis

To assess the acquisition of somatic mutations and clonal evolution during progression from MDS to secondary AML, we evaluated paired bone marrow samples from 43 patients at MDS, secondary AML, and serial time points, along with skin biopsies as a source of matched normal DNA (Table 1; detailed clinical information available in Supplementary Table S1). We initially sequenced the 43 secondary AML samples for all exons of 284 genes recurrently mutated in myeloid malignancies (RMG). Within these genes, a core set is most often mutated in MDS and AML patients, and these genes segregate into six specific cellular pathways or gene categories (8, 9, 14, 17, 31). We also manually curated additional genes to expand the common list of genes typically assigned to the activated signaling and transcription factor gene categories (Supplementary Table S2). All secondary AML patients had at least one coding mutation, and 41/43 (95%) had a coding mutation in at least one gene included in six commonly mutated pathways [e.g., spliceosome (40% of patients), epigenetic regulators/chromatin modifiers (47%), cohesin (16%), TP53 (28%), transcription factors (28%), and signaling genes (49%); Fig. 1A; Supplementary Table S3]. Patients with TP53 mutations are categorized separately because they define a unique clinical group that often had a complex karyotype (11/12 patients), fewer total RMG mutations, and either two concurrent TP53 mutations or a copy-number alteration or copy-neutral loss of heterozygosity spanning the TP53 locus (Fig. 1A; Supplementary Table S4; refs. 8, 11, 14, 17, 32).

Table 1.

Characteristics of 43 subjects with MDS and progression to secondary AML

Total cohort (n = 43)
Parameter                              Patients (percentage/range)
Sex
  Males                                29 (67%)
  Females                              14 (33%)
Age (years)
  <50                                  7 (16%)
  50–59                                8 (19%)
  60–69                                12 (28%)
  70–79                                15 (35%)
  >80                                  1 (2%)
MDS FAB subtypes
  RA                                   8 (19%)
  tMDS                                 6 (14%)
  RAEB                                 29 (67%)
Median time to progression (days)      183 (21–3324)
MDS blast count
  <5%                                  12 (28%)
  5%–9%                                14 (33%)
  10%–19%                              16 (37%)
  NA                                   1 (2%)
IPSS cytogenetic risk
  Good                                 14 (33%)
  Intermediate                         7 (16%)
  Poor                                 18 (42%)
  NA                                   4 (9%)
IPSS-R cytogenetic risk
  Very good                            0 (0%)
  Good                                 13 (30%)
  Intermediate                         6 (14%)
  Poor                                 3 (7%)
  Very poor                            17 (40%)
  NA                                   4 (9%)
IPSS risk group
  Low (0)                              2 (5%)
  Intermediate 1 (0.5–1)               14 (33%)
  Intermediate 2 (1.5–2)               13 (30%)
  High (2.5–3.5)                       7 (16%)
  NA                                   7 (16%)
IPSS-R risk group
  Very low (0–1.5)                     0 (0%)
  Low (2–3)                            4 (9%)
  Intermediate (3.5–4.5)               10 (23%)
  High (5–6)                           8 (19%)
  Very high (>6)                       14 (33%)
  NA                                   7 (16%)

Abbreviations: FAB, French–American–British; RA, refractory anemia; tMDS, therapy-related MDS; RAEB, refractory anemia with excess blasts; IPSS, international prognostic scoring system; IPSS-R, international prognostic scoring system-revised; NA, not available.

Figure 1.

Signaling gene mutations are significantly enriched at secondary AML compared with MDS using paired samples. A, Heat map representation of predicted protein-altering mutations recurrent in myeloid malignancies among 43 secondary AML (sAML) samples detected using standard sequencing. Each column represents an individual patient and each row a gene; gene mutations are indicated by colored cells, and gene categories are on the left. Cytogenetic status is indicated in the bottom row. Mutations detected in Tyr kinase genes: TYK2, CSF1R; RAS pathway genes: NRAS, NF1, CBL, PTPN11, and PTPRN; Cohesin genes: SMC1A, SMC3, STAG2, RAD21. B, The percentage of unpaired MDS (white) or secondary AML (black) samples with a mutation in each functional gene category (categories defined in A and Supplementary Table S2, with each gene assigned to only one category) detected using standard sequencing. C–H, Gene mutations identified in 43 secondary AML samples using standard sequencing were sequenced in the paired, antecedent MDS samples using error-corrected sequencing. Each data point represents one mutation. Secondary AML mutations not detected at MDS are indicated by a red dot. I, Summary, by gene category, of the secondary AML mutations that were detected in MDS samples using error-corrected sequencing. UPN, unique patient number. Fisher exact test; *, P < 0.05; **, P < 0.01; ****, P < 0.0001.

The frequency of mutations in spliceosome, signaling, and transcription factor genes was increased in this secondary AML cohort compared with our previously published cohort of 150 MDS patients who were not selected based on their development of secondary AML (P = 0.029, P ≤ 0.0001, and P = 0.035, respectively, Fisher exact test; Fig. 1B; ref. 17). To determine whether mutations identified in secondary AML samples in these enriched categories preexisted at MDS, we sequenced the paired antecedent MDS samples for all detected secondary AML mutations using error-corrected sequencing or targeted PCR sequencing. We observed that signaling gene mutations were less commonly detected at MDS (10/34; 29%) than mutations in all other gene categories [transcription factors (15/19; 79%), epigenetic/chromatin modifier genes (28/34; 82%), spliceosome genes (18/19; 95%), TP53 (16/16; 100%), and cohesin genes (6/7; 86%); P ≤ 0.01 for all comparisons, Fisher exact test; 4 FLT3-ITD variants were excluded; Fig. 1C–I]. The other categories did not differ from one another. We confirmed the absence of a subset of recurrent signaling gene mutations (e.g., NRAS mutations) at MDS using digital droplet PCR (ddPCR), and this absence indicates that these mutations are acquired in subclones (Supplementary Tables S5 and S6). However, some signaling and transcription factor gene mutations exist at MDS diagnosis with a high variant allele frequency (VAF; Fig. 1C and D), raising the question of whether they are present in the founding clone or a subclone.
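The category comparisons above can be reproduced approximately from the reported counts. The following is a minimal sketch of a two-sided Fisher exact test written with the Python standard library only; the function name and the "sum all tables no more probable than the observed one" convention are assumptions made for illustration, and the authors' statistical software is not specified in this section.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalized hypergeometric weight of a table whose top-left cell is k.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    # Sum every table that is no more probable than the observed one
    # (exact integer comparison avoids floating-point ties).
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)

# Counts quoted in the text: (mutations detected at MDS, total sAML mutations).
signaling = (10, 34)
other_categories = {
    "transcription factors": (15, 19),
    "epigenetic/chromatin modifiers": (28, 34),
    "spliceosome": (18, 19),
    "TP53": (16, 16),
    "cohesin": (6, 7),
}

p_values = {}
for name, (detected, total) in other_categories.items():
    p_values[name] = fisher_exact_two_sided(
        signaling[0], signaling[1] - signaling[0], detected, total - detected
    )
    print(f"signaling vs {name}: P = {p_values[name]:.2g}")
```

Each comparison treats a mutation as one observation, which matches how the counts are reported in the text but ignores that several mutations can come from the same patient.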

Transcription Factor Mutations Occur in Subclones Prior to Signaling Gene Mutations

To address whether a mutation occurred in the founding clone or a subclone, we defined the clonal architecture using enhanced whole-genome sequencing (eWGS, whole-genome sequencing with higher coverage of the exome) of paired MDS and secondary AML samples (plus skin as control) from 12 patients. Detected somatic mutations were orthogonally validated using error-corrected sequencing. Using the sciClone analysis of the validated eWGS variants (Supplementary Table S7), 12 of 12 signaling gene and 6 of 8 transcription factor gene mutations were confirmed to be in subclones (Fig. 2A–D; Supplementary Fig. S1A–S1F). In comparison, TP53 mutations were almost exclusively in founding clones (4/5; Supplementary Fig. S1D, S1E, S1G, and S1H), whereas epigenetic/chromatin modifier gene mutations occurred in both founding clones and subclones (9/15 in founding clones; Fig. 2A–D; Supplementary Fig. S1A, S1C, S1F, S1I, and S1J). All three cohesin gene mutations were present in subclones that preexisted at MDS (Fig. 2A and D; Supplementary Fig. S1A and S1C). Unexpectedly, more than half of the spliceosome gene mutations occurred in a subclone with a preexisting epigenetic modifier gene mutation (e.g., TET2, SETBP1, or EZH2). In addition, both founding clone spliceosome gene mutations cooccurred with an epigenetic modifier gene mutation (e.g., IDH2, TET2, and ASXL1), making it possible that they were also acquired in a subclone after an epigenetic modifier gene mutation (Fig. 2A–D; Supplementary Fig. S1A, S1C, S1I, and S1J). A similar finding was observed by Guess and colleagues, where subclonal spliceosome gene mutations cooccurred in a cell with a preexisting epigenetic modifier gene mutation (e.g., DNMT3A and IDH2) and 2 of the 3 founding clone spliceosome gene mutations cooccurred with an epigenetic modifier gene mutation (e.g., TET2; ref. 33).

Figure 2.

Whole-genome sequencing reveals the temporal acquisition of transcription factor and signaling gene mutations in subclones. A–C, Clonal analysis of enhanced whole-genome sequencing (whole-genome sequencing with additional sequencing coverage of coding bases; i.e., exome) results from 12 paired MDS and secondary AML (sAML) samples, and validation sequencing at serial time points, identifies that both transcription factor (e.g., RUNX1 and CEBPA) and signaling gene (e.g., NRAS, PTPN11, and FLT3) mutations typically occur in subclones derived from the founding clone (green). Gene names listed more than once had multiple unique mutations detected, either in the same clone or separate clones as indicated by the color. Left, each point on the plot represents one detected mutation, the VAF of each mutation is plotted at MDS and sAML, and the clustering of individual mutations defines clones, each indicated by a different color. Mutations present in the founding clone (green) are present in subclones derived from the founding clone. Right, clonal evolution from MDS to sAML is imputed from the clustering of mutation VAFs over time. The distance between the dashed lines is proportional to 100% of the bone marrow cells. When transcription factor and signaling gene mutations coexist in the same patient, the transcription factor gene mutation is typically acquired prior to the signaling gene mutation (A, B). D, The percentage of mutations acquired in a subclone (gray) based on whole-genome sequencing is shown for the six commonly mutated gene categories, with all signaling gene mutations occurring in subclones (n = 12 patients; 36 total subclones). E, Subclonal transcription factor and signaling gene mutations typically expand during progression from MDS to secondary AML. Each point represents an individual mutation. F, The percentage of bone marrow cells (calculated as twice the VAF for heterozygous mutations) harboring founding clone and subclone mutations at MDS and secondary AML for 12 patients with paired whole-genome sequencing. Up to five subclones were detected in at least one sample and were numbered based on the order of acquisition. The size of the founding clone does not significantly change during disease progression (green). However, the initial subclone (orange) that is often detectable at MDS presentation expands during disease progression. Additional subclones (purple) that are acquired after MDS presentation expand significantly during disease progression and typically contain signaling gene mutations. Clonal assignments were performed using sciClone and ClonEvol software packages. Error bars, SD. VAF, variant allele frequency; D, day number of the banked sample relative to first banking (D0). UPN, unique patient number.

In total, when eWGS (Fig. 2D) and gene-panel sequencing imputed clonality results (Fig. 2; Supplementary Table S3) are combined, at least 32 of 34 (94%) signaling gene mutations and 12 of 19 (63%) transcription factor mutations were determined to be subclonal. Of the 18 transcription factor and signaling gene mutations defined as subclonal by eWGS, 15 (9/12 signaling gene– and 6/6 transcription factor–mutated subclones) expanded at the time of progression to secondary AML (Fig. 2E). Collectively, the eWGS results show that subclones expand at progression (Supplementary Table S7; Fig. 2F) and suggest that the majority of both transcription factor and signaling gene mutations occur in subclones, raising the question of whether there is a preferred order of subclonal mutation acquisition for these two pathways.

To directly address this question, we identified eight patients who had both a transcription factor and signaling gene mutation and unequivocal clonal assignment of mutations. In two eWGS (Fig. 2A and B)- and three RMG (Supplementary Fig. S2A–S2C)-sequenced cases, a subclonal transcription factor mutation was acquired before a signaling gene mutation. Single-cell DNA sequencing of three additional secondary AML samples confirmed that transcription factor mutations were gained prior to signaling gene mutations when they cooccurred in the same cell (Fig. 3A–C). Collectively, the results suggest that there is a preferred order of subclone gene mutation acquisition, with transcription factor gene mutations occurring first, followed by the acquisition of one or more subclonal signaling gene mutations.
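The logic of ordering two mutations from single-cell genotypes can be illustrated with a small sketch. It assumes that a mutation, once acquired, is never lost, so that if every cell carrying mutation B also carries mutation A, and some cells carry A alone, then A was acquired first. The function name, the toy cell counts, and the choice of RUNX1 and NRAS as labels are hypothetical; this is not the Tapestri Insights procedure used in the study, and it ignores allelic dropout and cell doublets that a real analysis must handle.

```python
def acquisition_order(cells, first, second):
    """Infer the relationship between two mutations from per-cell genotypes.

    `cells` is a list of sets, one per cell, holding the mutations called in that cell.
    """
    with_first = sum(1 for genotype in cells if first in genotype)
    with_second = sum(1 for genotype in cells if second in genotype)
    with_both = sum(1 for genotype in cells if first in genotype and second in genotype)

    if with_both == 0:
        # Never seen together: the mutations mark separate (parallel) clones.
        return "separate clones"
    if with_both == with_second and with_first > with_second:
        # Every `second` cell is nested inside the `first` population.
        return f"{first} before {second}"
    if with_both == with_first and with_second > with_first:
        return f"{second} before {first}"
    return "unresolved"

# Invented toy sample: 20 wild-type cells, 50 cells with RUNX1 only,
# and 30 cells with both RUNX1 and NRAS.
cells = [set()] * 20 + [{"RUNX1"}] * 50 + [{"RUNX1", "NRAS"}] * 30
print(acquisition_order(cells, "RUNX1", "NRAS"))

# Invented toy sample with two RAS pathway mutations in parallel subclones.
parallel = [{"RUNX1", "NRAS"}] * 40 + [{"RUNX1", "KRAS"}] * 25
print(acquisition_order(parallel, "NRAS", "KRAS"))
```

The "separate clones" branch corresponds to the branched or parallel pattern, and the nested branches correspond to sequential acquisition within one lineage.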

Figure 3.

Single-cell DNA sequencing identifies the convergent clonal evolution of signaling gene mutations in secondary AML. Single-cell sequencing of six secondary AML samples containing 17 signaling gene mutations. A–C, Transcription factor gene mutations (blue) are acquired prior to RAS family signaling gene mutations (orange). A, B, D–F, When multiple RAS family signaling gene mutations occur in a sample, signaling gene mutations occurred in parallel subclones (i.e., convergent clonal evolution of signaling gene mutations). C, Only one case had two signaling gene mutations in the same cell. Mutations in genes recurrently mutated in myeloid malignancy (see Fig. 1A) other than signaling or transcription factor genes are colored gray. Percentages are the fraction of cells in a sample with a genotype and the circle areas are proportional to the fraction of cells containing the mutation. Clonal architecture was determined using Tapestri Insights software (MissionBio) and manual review.

Multiple Signaling Gene Mutations Rarely Cooccur in the Same Cell

Sequencing of secondary AML samples revealed that when a signaling gene mutation was detected in the minority of cells at secondary AML (VAF <25%), there was typically a second signaling gene mutation, often at a higher VAF, that cooccurred in the same sample. Across the cohort, standard capture-based sequencing showed that 12 of 43 secondary AML samples had more than one signaling gene mutation, representing 57% of cases with a signaling gene mutation. Eight patients had multiple RAS pathway gene mutations in the same secondary AML sample (i.e., RAS pathway multimutation patients). We hypothesized that multiple signaling gene mutations would be rare within the same cell/subclone, specifically for the RAS pathway. We used single-cell DNA sequencing to determine if mutations in RAS family members cooccurred in the same cell when multiple mutations were present in a sample (Supplementary Table S8).
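The link between a VAF below 25% and a minority of cells follows from the conversion given in the Figure 2F legend, where the percentage of cells is calculated as twice the VAF for heterozygous mutations. A minimal sketch of that arithmetic is shown here; the function names are hypothetical, and the calculation assumes a heterozygous mutation at a diploid locus with no copy-number change.

```python
def cell_fraction_from_vaf(vaf):
    """Estimated fraction of cells carrying a heterozygous mutation.

    Assumes one mutant allele out of two copies per cell, so the cell
    fraction is twice the VAF, capped at 1.0.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("VAF must be between 0 and 1")
    return min(2.0 * vaf, 1.0)

def in_minority_of_cells(vaf):
    """True when the mutation is estimated to be in fewer than half of the cells."""
    return cell_fraction_from_vaf(vaf) < 0.5

for vaf in (0.02, 0.10, 0.25, 0.45):
    print(f"VAF {vaf:.2f} -> {cell_fraction_from_vaf(vaf):.0%} of cells, "
          f"minority: {in_minority_of_cells(vaf)}")
```

Under these assumptions, a VAF of exactly 25% corresponds to half of the cells, which is why the threshold for "minority of cells" sits just below it.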

We identified 6 patients harboring a total of 17 signaling gene mutations at secondary AML. Collectively, the 17 RAS family gene mutations were spread across 17 subclones (Fig. 3A–F). Only one case had two signaling gene mutations cooccur in the same cell (e.g., NRAS and PTPN11 mutations; Fig. 3C). A similar finding was observed by Guess and colleagues, where signaling gene mutations occurred in separate subclones for 3 patients harboring multiple signaling gene mutations (33). This indicates that the cooccurrence of two RAS family gene mutations in the same secondary AML subclone is rare, suggesting that having multiple signaling gene mutations may be functionally redundant or detrimental to leukemia cells. Single-cell sequencing, unlike standard coverage bulk whole-genome sequencing, was able to resolve signaling gene mutations with lower VAFs into separate subclones (i.e., branched or parallel clonal evolution; as seen for UPN990168), providing evidence that subclonal diversity may be more complex than previously appreciated based on bulk sequencing (Supplementary Fig. S3).

Signaling Gene Mutations Are Common and Present Below the Level of Detection of Standard Sequencing in AML Patient Samples

Based on the presence of multiple signaling gene mutations in separate clones in the single-cell DNA sequencing results, we asked whether increased signaling gene clonal complexity exists below the level of standard sequencing detection (e.g., VAF <2%). To address this question, we performed error-corrected sequencing of the secondary AML samples using probes that captured signaling genes and utilized stringent bioinformatic filters to identify hotspot variants (Supplementary Table S9). In addition to the previously detected 34 signaling gene mutations with a VAF ≥2%, 37 signaling gene mutations with a VAF <2% were identified at secondary AML (average VAF: 0.23%; range, 0.026%–0.86%), with NRAS mutations being the most common (4 FLT3-ITD variants were excluded; Fig. 4A and B; Supplementary Table S10). This result shows that signaling gene mutations are acquired at an even higher rate in secondary AML patients than previously appreciated and suggests that the convergent clonal evolution of signaling gene mutations in secondary AML patients is common and likely important for disease pathogenesis.

Figure 4.

Signaling gene mutations are common and often present below the level of detection of standard sequencing in MDS and secondary AML patient samples. Error-corrected and capture sequencing identified 90 mutations in signaling genes in 43 patients (including 4 FLT3-ITD mutations; 1 in sAML only and 3 in MDS and sAML categories shown in E). A, 50 signaling gene mutations that were detected at sAML were detected only in the sAML sample. B, 21 signaling gene mutations that were detected at sAML were also detected in the paired MDS sample. C, 15 signaling gene mutations were detected at MDS but not in the paired secondary AML sample. D, Percentage of MDS or sAML patients with a detected signaling gene mutation segregated by mutation VAF. E, Of the 90 detected signaling gene mutations, 51 were detected only in the sAML sample, 24 were detected in both MDS and sAML paired samples, and 15 were detected only in the MDS sample.

Signaling Gene Mutations Are Common at Low Levels in MDS Patient Samples

Because signaling gene mutations were more common at secondary AML than previously identified with standard sequencing, we asked if the same was true at MDS. Using error-corrected sequencing and the same set of bioinformatics filters, we identified 23 additional signaling gene mutations at MDS with a VAF <2% (average VAF 0.34%; range, 0.04%–1.06%), with NRAS and PTPN11 mutations being most common (3 FLT3-ITD variants were excluded; Fig. 4B and C; Supplementary Table S11). When combined with prior sequencing results (including FLT3-ITD), a total of 39 signaling gene mutations were detected at MDS in 19 (44%) patients, a higher proportion than previously reported (8, 14, 17). Error-corrected sequencing identified a higher proportion of patients with a signaling gene mutation at MDS and secondary AML than standard sequencing with a VAF threshold of ≥2% (42% vs. 19% in MDS, and 56% vs. 42% in sAML, respectively; excluding FLT3-ITD; Fig. 4D). A subset of recurrent hotspot mutations in NRAS was orthogonally validated using ddPCR with 100% specificity (Supplementary Fig. S4A–S4L; Supplementary Table S6). These data indicate that clonal complexity in both MDS and secondary AML samples is greater than previously appreciated using standard sequencing, especially for mutations with VAFs less than 2%, and that error-corrected sequencing allows for their detection in bulk samples.

Signaling Gene Mutations Show Diverse Patterns of Clonal Evolution during Disease Progression

To assess the clonal evolution of signaling gene mutations during disease progression, we compared the mutations detected at MDS and secondary AML by either error-corrected or standard sequencing. Of the mutations present at MDS, 24/39 (62%) were still detectable after progression to secondary AML, whereas the remaining 15 were only detectable at MDS (Fig. 4B, C, and E). Of the 75 signaling gene mutations detected at secondary AML, the remaining 51 (68%) were detectable at secondary AML but not at MDS, even at the level of detection of error-corrected sequencing (Fig. 4E). As mentioned previously, a subset of these secondary AML-specific mutations was confirmed to be absent at MDS by ddPCR (Supplementary Tables S5 and S6). The VAFs of mutations that persisted or contracted from MDS to secondary AML were variable (Fig. 4B and C). In fact, in patients with multiple MDS signaling gene mutations, the signaling gene mutation with the highest VAF at MDS was not always the mutation that persisted at secondary AML (Supplementary Tables S10 and S11).
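The three-way split described above (mutations acquired by secondary AML, mutations that persist, and mutations lost after MDS) amounts to set arithmetic on the paired call sets. The sketch below illustrates it and then checks the reported totals; the patient labels and variant names in the toy example are invented, and only the counts 51, 24, and 15 come from the text.

```python
def classify_patterns(mds_mutations, saml_mutations):
    """Split paired mutation calls into acquired, persisted, and lost sets."""
    mds, saml = set(mds_mutations), set(saml_mutations)
    return {
        "acquired": saml - mds,   # detected only at secondary AML
        "persisted": mds & saml,  # detected at both time points
        "lost": mds - saml,       # detected only at MDS
    }

# Invented toy calls, keyed by (patient, variant).
mds_calls = {("P1", "NRAS G12D"), ("P1", "PTPN11 E76K"), ("P2", "KRAS G13D")}
saml_calls = {("P1", "NRAS G12D"), ("P2", "NRAS Q61K"), ("P2", "CBL C404Y")}
patterns = classify_patterns(mds_calls, saml_calls)
counts = {name: len(calls) for name, calls in patterns.items()}
print(counts)

# Counts reported in the text (including FLT3-ITD variants).
acquired, persisted, lost = 51, 24, 15
at_mds = persisted + lost        # mutations detected at MDS
at_saml = persisted + acquired   # mutations detected at secondary AML
total = acquired + persisted + lost
pct_persisting = round(100 * persisted / at_mds)
pct_acquired = round(100 * acquired / at_saml)
print(at_mds, at_saml, total, pct_persisting, pct_acquired)
```

A single patient can contribute to more than one of the three sets, which is how multiple patterns coexist in the same patient.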

When signaling gene mutations were evaluated at the patient level, we found that the 39 MDS signaling gene mutations were spread across 19 patients, whereas the 75 secondary AML signaling gene mutations were spread across 26 patients (Fig. 5A and B). Additionally, 15 of 19 MDS patients with a signaling gene mutation had a signaling gene mutation at secondary AML, though not always the same one. We observed that 8 (19%) of MDS and 20 (47%) of secondary AML patients harbored more than one signaling gene mutation (i.e., signaling gene multimutation cases), with a subset of cases having more than one signaling gene mutation at both MDS and secondary AML (Fig. 5A–C).

Figure 5.

Complex clonal evolution of signaling gene mutations during disease progression is common and a hallmark of patients with multiple mutations. A, The number of MDS patients who had zero, one, or more than one signaling gene mutation detected. B, The number of sAML patients who had zero, one, or more than one signaling gene mutation detected. C, The number of signaling gene mutations in 22 patients with multiple signaling gene mutations at either MDS or secondary AML. D, Signaling gene mutations could be categorized broadly into three patterns of clonal evolution: (i) those that were acquired during progression from MDS to sAML, (ii) those that persisted or expanded during progression from MDS to sAML, and (iii) those that were present at MDS and contracted prior to sAML progression. Patients exhibited one or more of these clonal evolution patterns, as indicated by the number and percentage of patients at the far right. Mutations were detected by error-corrected and capture sequencing.

When comparing mutations present at MDS and secondary AML, NRAS and KRAS were mutated at similar rates in MDS samples (10 NRAS and 7 KRAS mutations), but NRAS mutations were much more common in secondary AML samples (31 NRAS and 7 KRAS; Fig. 4A–C). The similar rates at MDS, but not secondary AML, suggest that the acquisition rates of NRAS and KRAS mutations may be similar, but the potential to drive leukemic clonal expansion in myeloid cells favors mutant NRAS over KRAS.

The 90 signaling gene mutations detected at MDS or secondary AML showed three patterns of clonal evolution during disease progression: (i) new signaling gene mutations that were acquired during progression (n = 51), (ii) signaling gene mutations that persisted or expanded during disease progression (n = 24), and (iii) signaling gene mutations that were present at MDS and disappeared during disease progression to secondary AML (n = 15; Figs. 4E and 5D). Notably, many mutations in both the acquired and persist categories remained below 5% VAF after progression to secondary AML. Additionally, a similar percentage of persisting and acquired mutations expanded to greater than 5% VAF at the time of disease progression, indicating that early acquisition is not necessarily linked to clonal expansion.
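The three categories follow directly from whether a mutation was detected at each time point. A minimal Python sketch of that bookkeeping is given below. It assumes a hypothetical per-mutation record holding the VAF at MDS and at secondary AML, with None meaning "not detected". The function name, the labels, and the three example records are illustrative inventions and are not cohort data; only the category counts in the final check come from the text.

```python
from collections import Counter

def classify_pattern(vaf_mds, vaf_saml):
    """Assign one of the three clonal evolution patterns to a mutation.

    vaf_mds / vaf_saml: VAF (percent) at each time point, or None if the
    mutation was not detected there (an assumed encoding).
    """
    if vaf_mds is None and vaf_saml is None:
        raise ValueError("mutation must be detected at one or more time points")
    if vaf_mds is None:
        return "acquired"               # pattern (i): only at secondary AML
    if vaf_saml is None:
        return "lost"                   # pattern (iii): only at MDS
    return "persisted_or_expanded"      # pattern (ii): at both time points

# Invented example records: (gene, VAF % at MDS, VAF % at secondary AML)
example = [("NRAS", None, 12.0), ("PTPN11", 0.3, None), ("KRAS", 1.1, 8.5)]
patterns = Counter(classify_pattern(m, s) for _, m, s in example)

# Consistency check on the counts reported in the text
reported = {"acquired": 51, "persisted_or_expanded": 24, "lost": 15}
total = sum(reported.values())
at_mds = reported["persisted_or_expanded"] + reported["lost"]
at_saml = reported["persisted_or_expanded"] + reported["acquired"]
```

With the reported counts, the categories sum to the 90 mutations, the MDS total is 39, and the secondary AML total is 75, matching the figures quoted earlier in this section.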

These three patterns of signaling gene clonal evolution can exist in parallel in the same patient. For example, one MDS patient may have three signaling gene mutations, with one mutation disappearing, whereas two expand. A secondary AML patient might have two signaling gene mutations, one that persisted from MDS whereas another was acquired (Fig. 5D). Among the 70% of patients harboring a signaling gene mutation at any time point, the single most common pattern was to acquire a signaling gene mutation during progression (26%). However, every possible pattern, including all three in the same patient, was represented (Fig. 5D). Collectively, signaling gene mutations were characterized by diverse patterns of expansion and contraction during progression from MDS to secondary AML.

Association of Signaling Gene Mutations with Progression to Secondary AML

The high fraction of MDS patients in our cohort with signaling gene mutations suggests that the presence of a signaling gene mutation at MDS, even at low levels, may be associated with future progression to secondary AML. To determine whether signaling gene mutations were predictive of progression, we identified patients who had banked MDS bone marrow samples (including some in the original cohort), an International Prognostic Scoring System-Revised (IPSS-R) score (based on peripheral blood counts, cytogenetic abnormalities, and bone marrow blast cell counts), and clinical follow-up. IPSS-R scores range from 0 to 10 and define five risk categories (very low, low, intermediate, high, and very high), with higher scores and categories (e.g., high/very high) indicating a worse prognosis and increased risk of progression to secondary AML. We identified 135 MDS patients who met these criteria and performed error-corrected sequencing for 40 recurrently mutated genes in myeloid neoplasms, including signaling genes, on their banked MDS samples (Supplementary Tables S12–S14; Supplementary Fig. S5A).

We next asked whether the detection of a signaling gene mutation (at any VAF) was predictive of progression beyond known clinical and genetic risk factors, including age, sex, IPSS-R score [grouped as lower risk (IPSS-R ≤4.5, very low/low/intermediate categories) or higher risk (IPSS-R >4.5, high/very high-risk categories)], type of MDS (de novo or therapy-related), TP53 mutation, or a mutation in one of five genes associated with poor overall survival (i.e., TP53, EZH2, ETV6, RUNX1, or ASXL1; ref. 34). A Fine–Gray model controlling for the competing risk of death suggested that the variables associated with an increased cumulative incidence of progression to secondary AML were higher-risk MDS (i.e., high/very-high-risk IPSS-R categories; P < 0.001), female sex (P = 0.023), and the presence of a signaling gene mutation (P = 0.007; Supplementary Fig. S5B). The presence of at least one signaling gene mutation with any VAF was associated with a higher risk of progression to secondary AML than the absence of such a mutation, even after adjustment for other covariates [hazard ratio, 3.01; 95% confidence interval (CI), 1.35–6.70; P = 0.007, Fine–Gray; Supplementary Fig. S5B]. Inclusion of a covariate for the interaction between a signaling gene mutation and IPSS-R risk category (i.e., lower-risk and higher-risk groups) revealed that the effect of having a signaling gene mutation was stronger for the very low/low/intermediate-risk group (Supplementary Fig. S5B). Therefore, we examined the effect of a mutation in the lower and higher IPSS-R risk groups separately. Among very low/low/intermediate IPSS-R risk MDS patients, the presence of at least one signaling gene mutation with any VAF was associated with a higher risk of progression to secondary AML than the absence of such a mutation (hazard ratio, 2.94; 95% CI, 1.38–6.27; P = 0.003, Fine–Gray; Fig. 6A).
In contrast, the presence of at least one signaling gene mutation with any VAF in high/very high IPSS-R risk MDS patients was not associated with a higher risk of progression to secondary AML than the absence of such a mutation (hazard ratio, 1.05; 95% CI, 0.52–2.12; P = 0.897, Fine–Gray; Fig. 6B). Variables associated with a lower rate of progression-free survival (i.e., the time from MDS banking to progression to secondary AML or death, whichever occurred first) were a higher-risk IPSS-R score (P = 0.007, Cox proportional hazards), female sex (P = 0.039, Cox proportional hazards), and a TP53 gene mutation (P = 0.010, Cox proportional hazards; Supplementary Fig. S5C). Analysis of lower-risk MDS patients defined as an IPSS-R score of ≤3.5 showed similar results (35). The presence of at least one signaling gene mutation with any VAF was associated with a higher risk of progression to secondary AML (hazard ratio, 4.05; 95% CI, 1.35–12.17; P = 0.013, Fine–Gray) than the absence of such a mutation (Supplementary Fig. S6A), and also a lower rate of progression-free survival (hazard ratio, 2.18; 95% CI, 1.03–4.61; P = 0.042, Cox proportional hazards; Supplementary Fig. S6B). The higher risk of progression to secondary AML in patients with a signaling gene mutation was again restricted to lower-risk MDS patients (i.e., IPSS-R score ≤3.5; hazard ratio, 4.25; 95% CI, 1.41–12.80; P = 0.005, Fine–Gray; Supplementary Fig. S6C) and was not observed in higher-risk MDS patients (i.e., IPSS-R score >3.5; hazard ratio, 1.22; 95% CI, 0.68–2.19; P = 0.506, Fine–Gray; Supplementary Fig. S6D). Although exploratory, the analyses indicate an association between the detection of a signaling gene mutation at MDS and an increased risk of progression to secondary AML, especially in lower-risk MDS patients, potentially nominating the detection of a signaling gene mutation as a possible biomarker of progression, regardless of whether the mutation expands at secondary AML.
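For readers who want to relate a reported hazard ratio, its 95% CI, and its P value, the Python sketch below backs out an approximate standard error and two-sided Wald P value from the interval. It assumes the interval is a Wald interval that is symmetric on the log scale, which is the usual convention for Cox and Fine–Gray output but is not stated in the text. The function name is hypothetical. The reported P values come from the fitted models and need not match this approximation exactly.

```python
import math

def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate SE(log HR), z, and two-sided P from a 95% Wald CI.

    Assumes lower/upper = exp(log(hr) -/+ z_crit * se); this is an
    assumption about how the interval was constructed.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return se, z, p

# Progression-free survival estimate quoted above (Cox model):
# hazard ratio 2.18, 95% CI 1.03-4.61, reported P = 0.042
se, z, p = wald_p_from_ci(2.18, 1.03, 4.61)
```

For this example, the approximation gives a P value slightly above 0.041, in line with the reported 0.042.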

Figure 6.

Association of signaling gene mutations with progression to secondary AML. Signaling gene mutations were determined with the use of error-corrected sequencing at MDS. Patients are grouped according to the presence (yes) or absence (no) of a signaling gene (SG) mutation. The rates of progression to secondary AML in MDS patients with very low/low/intermediate IPSS-R risk MDS (IPSS-R score ≤4.5; A, n = 84) or high/very high IPSS-R risk MDS (IPSS-R score >4.5; B, n = 51) are shown. IPSS-R, International Prognostic Scoring System-Revised. HR, hazard ratio. 95% confidence intervals are shown in brackets.

In this study, we discovered that both transcription factor and signaling gene mutations are typically acquired in subclones, with a preferred order of acquisition (i.e., transcription factor mutations prior to signaling gene mutations). As subclone acquisition and expansion are defining features of clonal evolution during MDS disease progression, the data indicate that these mutations are significant contributors to progression. Using error-corrected sequencing, we determined that signaling gene mutations are much more common in both MDS and secondary AML samples than we observed with standard sequencing approaches. Additionally, multiple signaling gene mutations often coexist in the same patient (i.e., signaling gene multimutation patients), with half of the patients in our cohort with paired MDS and secondary AML samples having two or more signaling gene mutations at some time point. Our whole-genome and single-cell DNA sequencing studies of MDS and secondary AML samples showed that co-occurring signaling gene mutations are rarely in the same cell. The convergent clonal evolution of signaling gene mutations, both within the same patient and across the cohort of MDS patients who develop secondary AML, indicates their importance for disease progression. The diverse patterns of signaling gene mutation clonal evolution during MDS progression also highlight that the therapeutic targeting of these mutated pathways will be challenging given the subclone diversity.

Patients who have multiple signaling gene mutations may represent a distinct subset of MDS patients who acquire these mutations at an elevated rate and are at greater risk of progressing to secondary AML. Multiple signaling gene mutations have also been detected by single-cell/colony sequencing of AML samples (36–38), and the emergence and disappearance of signaling gene mutations were observed at relapse of AML and ALL (39, 40). These findings indicate that not all acquired signaling gene mutations expand; rather, they are repeatedly acquired and likely contribute to MDS disease progression only when they occur in the correct context (e.g., due to cell-intrinsic or extrinsic factors). The preferred order of subclonal mutation acquisition, with transcription factor mutations acquired prior to those in signaling genes, was also observed in a small number of patients in prior studies (11, 23), suggesting that a transcription factor gene mutation may be one factor that makes an MDS cell more likely to expand when a signaling gene mutation is acquired. Analysis using an extended list of manually curated genes for each gene category should also be considered based on available or emerging functional data for a gene. Ultimately, we may be underestimating the true frequency of patients with a specific pathway mutation. The findings regarding the order of mutation acquisition and the convergent clonal evolution of signaling gene mutations in parallel subclones need validation in additional, larger cohorts. The contribution of epigenetic modifier and spliceosome gene mutations, two of the most common categories of genes mutated in our study, to subclone expansion should also be explored in future studies.

The increased subclonal complexity identified in this study likely exists in stem cells (20), and additional cooperating factors are likely involved in disease progression, including copy-number alterations, structural variants, cell-extrinsic factors (e.g., inflammatory cytokines), and gene-expression changes, none of which were explored here (29). The recent finding that mutant ZRSR2-induced alternative splicing of LZTR1, a regulator of RAS-related GTPases, occurs in MDS further supports the importance of convergent RAS pathway dysregulation in disease progression (41). Moving forward, the impact of chemotherapy and emerging RAS inhibitors on the clonal evolution of signaling gene mutations will require further investigation (42).

The sizable fraction of patients in our cohort who have a signaling gene mutation detectable at MDS raises the question of whether low-level signaling gene mutations are enriched in MDS patients who later progress to secondary AML. Although caveats exist, our results uncovered an association between the detection of a signaling gene mutation and an increased risk of disease progression in lower-risk MDS patients. The mechanism by which low-level signaling gene mutations contribute to disease pathogenesis is not known. These mutations may identify patients who have a greater propensity to generate new subclones with or without additional signaling gene mutations. Generating a larger number of subclones could increase the odds that one subclone will eventually expand and contribute to disease progression. Alternatively, signaling gene mutant cells may influence surrounding cells via a paracrine effect. Ultimately, further investigations are needed to understand the cell-intrinsic and cell-extrinsic effects of these mutations on AML progression. If the acquisition of signaling gene mutations is validated as a biomarker of future disease progression, the presence of these mutations, even at very low VAFs, might allow for the identification of MDS patients at high risk of progression months to years prior to the development of secondary AML.

This study also demonstrates the limitations and advantages of the various sequencing approaches. Standard sequencing is often capable of defining the clonal architecture of a bulk tumor sample for clones present in 10% to 20% or more of cells; however, it can be difficult to accurately define smaller clones, especially when mutations in different clones have similar VAFs. Alternatively, single-cell sequencing can accurately segregate low mutation VAFs into separate clones but is limited by the number of cells and mutations assayed (typically less than in bulk sequencing) and the allele dropout rate. Utilizing multiple sequencing methods, based on the sample characteristics and the question being addressed, may ultimately be required for comprehensive clonal analysis.

In summary, our data suggest that signaling gene mutations play a significant role in disease progression and that patients who progress with a signaling gene mutation likely had a diverse pattern of clonal evolution prior to progression. To further clarify this phenomenon, prospective serially banked samples from a larger cohort of MDS patients should be sequenced between diagnosis and progression to secondary AML to monitor the acquisition and contraction of low-level signaling gene mutations, identify and track the characteristics of signaling gene–mutated clones that expand at the time of disease progression, define a signaling gene mutation VAF cutoff associated with progression, and calculate the positive and negative predictive values of detecting a signaling gene mutation. These additional studies may allow for better patient monitoring and early intervention at the first sign of disease progression. Additionally, this study indicates that the detection of signaling gene mutations at MDS, even at very low frequencies, may predict future progression that could ultimately contribute to the reduced overall survival in patients with these mutations (28). Collectively, our data indicate that the relationship between the presence of a signaling gene mutation and progression to secondary AML is more complex than previously appreciated. The paradigm of “acquire then progress” for signaling gene mutations in MDS is not always true. Only by extensively characterizing the diverse patterns of signaling gene mutation acquisition, loss, and clonal expansion can we determine the clinical relevance of signaling gene mutations (even at low levels) in MDS patients.

Patients

Forty-three patients with MDS who progressed to secondary AML were selected for the study. All patients had DNA available at MDS diagnosis and secondary AML progression, and a matched skin biopsy as a source of normal DNA. A subset of patients had serial samples available between MDS and secondary AML or after secondary AML progression. Summary clinical information is available in Table 1, and detailed information on all sequenced time points is provided in Supplementary Table S1 for the 43 patients with paired MDS and secondary AML samples. Summary clinical information is available in Supplementary Table S12 and detailed clinical characteristics in Supplementary Table S13 for 135 MDS patient samples (including some samples from the 43 patient cohort) that were sequenced for Fig. 6. The study was approved by the institutional review board at Washington University in St. Louis (protocol #201011766). All patients provided written informed consent that included explicit permission for genetic studies, including whole-genome sequencing.

Capture Sequencing of Recurrently Mutated Genes

DNA isolated from paired skin (normal control, or sorted T cells from 543465) and bulk bone marrow at the time of secondary AML diagnosis (n = 43, and the MDS sample for 743041) was enriched for all exons of a custom set of 285 RMGs in MDS and AML, which has previously been published (15, 43). Enriched libraries were sequenced on a HiSeq2500 instrument, as previously described (15, 44).

Enhanced Whole-Genome Sequencing

Enhanced whole-genome sequencing (eWGS) was performed on samples harvested from 12 patients. DNA isolated from normal skin (or sorted T cells from 543465), MDS, and secondary AML samples was enriched for the exome using the exome capture sequencing reagent (IDT xGen Lockdown Exome Panel v1 capture set) and then “spiked-in” to whole-genome sequencing libraries. Libraries were prepared from genomic DNA (not subjected to whole-genome amplification). Sequencing data were generated on an Illumina HiSeq 4000 instrument.

Error-Corrected Sequencing

Ultra-deep error-corrected sequencing was performed to (i) orthogonally validate mutations detected by eWGS; (ii) characterize MDS samples (n = 43, not including UPN 743041; n = 135, including some samples from the 43 patient cohort); and (iii) detect low-level variants in AML samples (n = 41, not including UPNs 554562, 741699 and 782328). To validate eWGS results, ligation-based probes were designed against mutations in the genome detected at either MDS or secondary AML time points. MDS, secondary AML, and any available serial samples were then sequenced. For MDS characterization, probes were designed against either all exons or known hotspots of 40 recurrently mutated genes, as previously published (45), plus mutations detected at secondary AML via capture sequencing. For sAML low-level variant detection, probes were designed against either all exons or known hotspots of the same 40 recurrently mutated genes used in MDS characterization.

The HaloplexHS kit (Agilent Technologies) was used for error-corrected sequencing. To create libraries for error-corrected sequencing, 500 ng of DNA was digested with a custom set of restriction enzymes before hybridization to probes against the target regions, plus 10 base pair degenerate oligonucleotides [unique molecular indexes (UMIs)] and sample-specific indexes. Following magnetic bead enrichment, libraries were amplified with PCR using primers with tailed Illumina sequencing motifs to generate libraries. Sequencing was performed on either the Illumina HiSeq 4000 (eWGS validation) or Illumina NovaSeq 6000 (MDS characterization and sAML low-level variant detection).
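The general idea behind UMI-based error correction can be sketched in a few lines of Python: reads that share a UMI and target are grouped into a read family, and a consensus is taken so that errors present in only a minority of the family's reads are discarded. The code below is an illustrative simplification and is not the HaloplexHS or Myeloseq implementation. The tuple layout, function name, and strict-majority rule are assumptions; the default of three reads per family mirrors the read-family requirement described for low-level variant detection later in the methods.

```python
from collections import Counter, defaultdict

UMI_LENGTH = 10
POSSIBLE_UMIS = 4 ** UMI_LENGTH  # distinct 10-bp degenerate sequences

def collapse_read_families(reads, min_family_size=3):
    """Collapse reads into one consensus base per (UMI, amplicon) family.

    reads: iterable of (umi, amplicon, base) tuples for one genomic
    position (a hypothetical, simplified representation).
    Families with fewer than min_family_size reads, or without a strict
    majority base, are dropped.
    """
    families = defaultdict(list)
    for umi, amplicon, base in reads:
        families[(umi, amplicon)].append(base)

    consensus = {}
    for key, bases in families.items():
        if len(bases) < min_family_size:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count > len(bases) / 2:
            consensus[key] = base
    return consensus

# Invented example: one family of four reads (a single discordant read is
# outvoted) and one family of two reads (too small, dropped)
reads = [("ACGTACGTAC", "amp1", "T")] * 3 + [("ACGTACGTAC", "amp1", "C")]
reads += [("TTTTGGGGCC", "amp1", "C")] * 2
families = collapse_read_families(reads)
```

Counting consensus families rather than raw reads is what allows variant and reference support to be expressed in read families.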

Targeted Amplicon Sequencing

For some mutations, tailed primers flanking the target region were designed to generate <250 base pair amplicons by PCR. PCR product from the first PCR reaction was used as input for a second round of PCR, with second-round primers designed against the original primer tails. Second-round primers were tailed with Illumina P5 and P7 adaptors. Second-round primers also included sample identifying indexes. EconoTaq (Lucigen) was used for all PCR reactions. Amplicons from individual reactions were pooled to create libraries that were sequenced on the Illumina MiSeq. Primer sequences are provided in Supplementary Table S15.

Single-Cell DNA Sequencing

Six samples were chosen for single-cell DNA sequencing using the Tapestri platform (MissionBio). In brief, single-cell droplets were generated and barcoded using unique indexes. Targeted sequencing was then performed against 40 genes commonly mutated in myeloid malignancy (myeloid panel, MissionBio). After library preparation, sequencing was performed on Illumina NovaSeq 6000 instruments, and analysis was performed using Tapestri Insights software (MissionBio). Quality metrics for the single-cell sequencing are available in Supplementary Table S8.

Droplet Digital PCR

Droplet digital PCR was performed on a subset of MDS samples using validated kits and consumables purchased from Bio-Rad. Droplets containing DNA, primers, restriction enzymes, and probes against both WT and the target mutation were generated using the QX200 droplet generator. PCR of the target region was performed to attach droplet-specific barcodes and probes to the target DNA. Droplets were then read using the QX200 droplet reader. Three mutant single-positive droplets were required to validate previously identified mutations. Sample-specific limits of detection are listed in Supplementary Table S6.

Somatic Variant Analysis and Validation

Sequencing data were aligned to the human reference sequence build GRCh37 or GRCh38 using bwa(1) version 0.5.9, as indicated. Variants located on contigs that are unplaced on chromosomes were excluded from the analysis.

Capture/Enhanced Whole-Genome Sequencing.

Capture and whole-genome sequencing data were merged and deduplicated using picard version 1.46 (Broad Institute). Variants were called from tumor samples with their matched skin samples, as has been previously described (44). Single-nucleotide variants (SNVs) were detected using the union of four callers: (i) samtools(2) version r982, (ii) VarScan(4) version 2.3.6, (iii) Strelka(5) version 1.0.11, and (iv) MuTect version 1.1.4. Pindel was utilized for the identification of insertions and deletions. SNVs and INDELs were further filtered by removing artifacts found in a panel of 151 normal exomes, removing sites that exceeded 1% frequency in the TCGA exome sequencing cohort, and then using a Bayesian classifier and retaining variants classified as somatic with a binomial log-likelihood of at least 3 and 20× sequencing depth.

Imputed Clonality from Capture Sequencing of Recurrently Mutated Genes.

Based on the VAF distribution of mutations within a clone in our eWGS data, subclone mutations had a VAF at least 2 standard deviations (e.g., 16%) less than the highest detected VAF in the sample. We included variants found in a list of transcription factors (e.g., CEBPA, CUX1, ETV6, GATA1, GATA2, IRF1, NCOR2, RUNX1) and signaling genes (e.g., BRAF, CALR, CBL, CSF1R, CSF3R, FLT3, JAK2, KDR, KIT, KRAS, MPL, NF1, NRAS, PTPN11, PTPRN, and TYK2) for the downstream analysis. Using this threshold, we correctly identified 5 of 6 subclonal transcription factor gene mutations, 11 of 12 subclonal signaling gene mutations, and did not misclassify any mutation as subclonal in our eWGS data.
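The thresholding rule can be written out as a short Python sketch. It assumes that the 16% margin is an absolute difference in VAF percentage points, which the text does not state explicitly, and that VAFs are supplied per gene for a single sample. The function name and the example VAFs are hypothetical; the gene lists are those given above.

```python
TRANSCRIPTION_FACTORS = {
    "CEBPA", "CUX1", "ETV6", "GATA1", "GATA2", "IRF1", "NCOR2", "RUNX1",
}
SIGNALING_GENES = {
    "BRAF", "CALR", "CBL", "CSF1R", "CSF3R", "FLT3", "JAK2", "KDR", "KIT",
    "KRAS", "MPL", "NF1", "NRAS", "PTPN11", "PTPRN", "TYK2",
}

def impute_subclonal(vafs, margin=16.0):
    """Flag transcription factor and signaling gene mutations as subclonal.

    vafs: dict mapping gene -> VAF (percent) for every mutation in one
    sample. A mutation is called subclonal when its VAF is at least
    `margin` percentage points below the highest VAF in the sample
    (assumed interpretation of the 2-standard-deviation rule).
    Returns gene -> (category, is_subclonal) for listed genes only.
    """
    top = max(vafs.values())
    calls = {}
    for gene, vaf in vafs.items():
        if gene in TRANSCRIPTION_FACTORS:
            category = "transcription_factor"
        elif gene in SIGNALING_GENES:
            category = "signaling"
        else:
            continue  # other genes only contribute to the maximum VAF
        calls[gene] = (category, top - vaf >= margin)
    return calls

# Invented example sample
calls = impute_subclonal({"TET2": 45.0, "RUNX1": 20.0, "NRAS": 40.0})
```

In this invented sample, RUNX1 (25 points below the maximum) is called subclonal and NRAS (5 points below) is not.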

Error-Corrected Sequencing.

Ultra-deep error-corrected sequencing was used for the low-level mutation detection and validation of mutations for clonal analysis. Barcoded FASTQ data were demultiplexed using a custom Python script that adds degenerate barcode information to FASTQ files. Data were then aligned to build GRCh37 using bwa mem (version 1.9.a), with default parameters for data presented in Supplementary Tables S3, S10, and S11. Data were aligned to build GRCh38 using bwa mem (version 1.9.a) with default parameters for some variants presented in Supplementary Table S14. For clonal analysis, the variant call output of a modified (to allow deeper sequence input) version of the Myeloseq software package for UMI-based sequencing data (https://github.com/genome/cle-myeloseq) was used to validate mutations. Automated clustering of individual amplicon VAFs was performed to identify and filter outlier amplicons. The clustering and distribution of amplicon VAFs were manually inspected by two independent reviewers. Coding and splice-site variants with VAFs ≥2%, a minimum of 2 supporting amplicons (with 60% or more of amplicons containing variant reads depending on coverage), and present in 10 or more read families were manually reviewed. Variants from prior clinical and research sequencing were reviewed when available. Potential FLT3-ITD and ASXL1 frameshift variants were manually reviewed. For low-level mutations, the Myeloseq package was used up to the point of generating a consensus BAM of collapsed read families. A modified (to require 3 reads per read family) version of the Myeloseq package script “addAmpliconInfoAndCountReads.py” was then used to generate read family counts at the positions of interest for the statistical analysis. We determined whether each detected variant was above the background noise level on a per-position basis as follows.
The total number of variant reads in all samples as a proportion of the total number of reference reads plus variant reads in all samples is used as an estimator of the background rate for a given variant base and position. For each sample, a binomial P value is calculated using the sample reference and variant reads using this proportion. Variants with a P < 0.0001 and a minimum of 4 read families supporting the variant were designated positive. Positive calls were then removed from the background calculation, including all remaining variants at the same nucleotide position in a sample with a positive call, and the process was repeated iteratively until no new variants were identified. For the validation of previously detected mutations, via either capture or eWGS, data were processed using the above script, modified to allow for deeper sequencing, to generate unfiltered variant calls at the regions of interest.
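The iterative background model can be sketched in Python as follows. This is an illustrative reading of the procedure and is not the authors' script. It handles one variant base at one position, assumes a one-sided (upper-tail) binomial test, and takes counts in read families. The function names and example counts are hypothetical, and the step that also removes other variant bases at the same position in a positive sample is omitted for brevity.

```python
import math

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)

def call_low_level_variants(counts, alpha=1e-4, min_families=4):
    """counts: dict sample -> (variant_families, reference_families).

    Returns the set of samples called positive. The background rate is
    re-estimated after each round with positive samples removed, until
    no new positives appear.
    """
    positives = set()
    while True:
        background = [c for s, c in counts.items() if s not in positives]
        depth = sum(v + r for v, r in background)
        if depth == 0:
            break
        rate = sum(v for v, _ in background) / depth
        new = {
            s for s, (v, r) in counts.items()
            if s not in positives
            and v >= min_families
            and binom_upper_tail(v, v + r, rate) < alpha
        }
        if not new:
            break
        positives |= new
    return positives

# Invented example: s5 is masked by s4 in the first round and is only
# called once s4 has been removed from the background estimate
counts = {"s1": (1, 9999), "s2": (0, 10000), "s3": (2, 9998),
          "s4": (400, 9600), "s5": (20, 9980)}
called = call_low_level_variants(counts)
```

The example shows why iteration matters: a high-VAF sample inflates the pooled background rate, so a lower-level variant at the same position can fall below significance until that sample is excluded.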

Targeted Amplicon Sequencing.

FASTQ files were demultiplexed and aligned to GRCh37. Aligned reads were then viewed in IGV and variant and reference reads were quantified. A VAF threshold of 2% was applied unless the sequenced region showed greater than normal variability (the surrounding region showed >2% nonspecific variant base pairs).

Copy-Number Analysis.

Copy-number analysis was performed on eWGS data using the Copy Change Assessment Tool 2 (CopyCAT2) R software package with default parameters (Abel and Duncavage, unpublished data, https://github.com/abelhj/cc2). Coverage was first calculated using bedTools coverageBed with default parameters (46). To call somatic copy-number alterations (CNA), skin data from patients in this study were used as a normal control. To ensure that constitutional CNAs were not called as somatic variants, CNA analysis was also performed on each patient's skin sample using the same series of pooled normal skin controls; CNAs that appeared in both paired skin and paired MDS/secondary AML marrow samples were considered constitutional variants and removed from the analysis. Somatic CNAs were called if determined to be significant by the software using a P value cutoff of 1 × 10⁻⁶ for allele frequency and 1 × 10⁻⁶ for coverage. Normalized log2 coverage data corresponding to those regions were manually reviewed.

Analysis of Clonal Architecture and Evolution

To determine tumor clonality and track the temporal patterns of subclonal evolution during disease progression, we characterized the subclonal composition of each patient using the sciClone package for R (47). This bioinformatics tool infers the clonal architecture by analyzing the VAFs of somatic mutations across multiple time points; in this study, it was used with default parameters. We primarily focused on somatic variants in copy-number neutral loci (i.e., diploid/non-sex chromosomes) without loss of heterozygosity, which allowed for a more accurate determination of VAF for high-confidence inference of tumor clonality. Some copy-number–altered somatic mutations of interest were also included; for those exceptions, the VAF was corrected using the copy-number estimate.

Tumor evolution models were generated using ClonEvol (48) with phylogeny plots and “monoclonal” as the “cancer.initiation.model” parameter (Supplementary Table S16). The evolutionary models of each patient were visualized using the fishplot package for R (49).

Quantification and Statistical Analysis

Statistical testing for mutation calling was performed in R or Python as described above. Statistical comparisons of identified mutations were performed in GraphPad Prism Version 9 software (GraphPad Software).

To test if there was an association between the presence of a signaling gene mutation at MDS and AML progression, the primary outcome was the cumulative incidence of progression to secondary AML. Because death is a competing risk for disease progression, a Fine–Gray regression was fit to assess the interaction of at least one signaling gene mutation with any VAF with being lower risk [very low/low/intermediate IPSS-R risk categories (e.g., IPSS-R score ≤4.5), or an IPSS-R score ≤3.5] on the cumulative incidence of progression to secondary AML while controlling for other covariates, including a TP53 mutation, a mutation in one of five genes associated with poor overall survival (e.g., TP53, EZH2, ETV6, RUNX1, or ASXL1) (34), as well as age, sex and therapy-related MDS (50, 51). In the Fine–Gray subdistribution hazard models, death without progression to secondary AML was considered a competing risk, and data on patients who were alive and did not have disease progression at the end of the study were censored. The association between patient and disease characteristics and an adverse outcome (progression to secondary AML or death) was also assessed with the use of proportional-hazards models of time to progression to secondary AML or death, whichever occurred first (progression-free survival). P values in figures and hazard ratios were calculated with the use of the Cox proportional-hazards or Fine–Gray model. Progression to secondary AML was measured from the time of MDS banking to either death or progression to secondary AML, whichever occurred first, or to the time of censoring.
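The outcome modeled by the Fine–Gray regression, the cumulative incidence of progression in the presence of death as a competing risk, can be illustrated with the standard nonparametric estimator. The Python sketch below does not fit a Fine–Gray model and does not reproduce the authors' analysis. It only shows how the cumulative incidence is accumulated when a competing event removes patients from the risk set. The event coding (0 = censored, 1 = progression, 2 = death without progression), the function name, and the example data are assumptions.

```python
from itertools import groupby

def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence for one cause with competing risks.

    times: follow-up time for each patient.
    events: 0 = censored, 1 = progression, 2 = death without progression
            (an assumed coding).
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    event_free = 1.0   # probability of no event of any type before this time
    cif = 0.0
    curve = []
    for t, group in groupby(data, key=lambda pair: pair[0]):
        codes = [e for _, e in group]
        d_cause = sum(1 for e in codes if e == cause)
        d_any = sum(1 for e in codes if e != 0)
        if d_any:
            cif += event_free * d_cause / at_risk
            event_free *= 1 - d_any / at_risk
            curve.append((t, cif))
        at_risk -= len(codes)
    return curve

# Invented example: five patients followed for 1 to 5 time units
curve = cumulative_incidence([1, 2, 3, 4, 5], [1, 2, 0, 1, 0])
```

In this invented example the cumulative incidence of progression reaches approximately 0.5 at the last progression event. Treating deaths as censored observations and taking one minus the Kaplan–Meier estimate would instead give 0.6, which illustrates why a competing-risk approach is used for this outcome.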

Data Availability

The data generated in this study are publicly available in dbGaP (phs000159.v12.p5).

J.S. Welch reports grants from NIH, Janssen Pharmaceuticals, Evans Foundation, and Children's Discovery Institute during the conduct of the study. D.H. Spencer reports personal fees from Wugen, Inc and nonfinancial support from Illumina outside the submitted work. P. Westervelt reports grants from NCI during the conduct of the study; personal fees from Pfizer outside the submitted work. No disclosures were reported by the other authors.

A.J. Menssen: Conceptualization, data curation, methodology, writing–original draft, writing–review and editing. A. Khanna: Data curation, formal analysis, visualization. C.A. Miller: Conceptualization, data curation, formal analysis. S. Nonavinkere Srivatsan: Data curation, formal analysis, visualization. G. Chang: Data curation, formal analysis. J. Shao: Data curation, formal analysis, project administration. J. Robinson: Data curation, project administration. M. O'Laughlin: Data curation. C.C. Fronick: Data curation. R.S. Fulton: Conceptualization, data curation, formal analysis. K. Brendel: Data curation. S.E. Heath: Data curation. R. Saba: Data curation. J.S. Welch: Conceptualization. D.H. Spencer: Conceptualization. J.E. Payton: Conceptualization. P. Westervelt: Conceptualization, data curation. J.F. DiPersio: Conceptualization. D.C. Link: Conceptualization, funding acquisition. M.J. Schuelke: Formal analysis. M.A. Jacoby: Conceptualization. E.J. Duncavage: Conceptualization. T.J. Ley: Conceptualization, funding acquisition. M.J. Walter: Conceptualization, formal analysis, supervision, funding acquisition, writing–original draft, project administration, writing–review and editing.

We thank Todd Druley for assistance with single-cell DNA sequencing and droplet digital PCR and for helpful scientific discussion along with Andrew Young regarding the statistical approaches to the analysis of the error-corrected sequencing using binomial distributions. This work was supported by the Genomics of AML Program Project of the NCI (P01 CA101937 to Dr. Ley) and support for procurement of human samples and research was provided by the Specialized Program of Research Excellence (SPORE) in AML (P50 CA171963 to Dr. Link). This work was supported by grants from the Siteman Cancer Center, The Foundation for Barnes-Jewish Hospital Cancer Frontier Fund (5109), the Taub Foundation, the Lottie Caroline Hardy Trust, the Edward P. Evans Center for Myelodysplastic Syndromes at Washington University (to Dr. Walter), the Edward P. Evans Foundation (to Drs. Walter and Duncavage), the NCI (R33 CA217700, to Drs. Walter and Duncavage), the SPORE in AML of the NCI Career Enhancement Program (P50 CA171963, to Dr. Duncavage), the Washington University Institute of Clinical and Translational Sciences (UL1 TR002345) from the National Center for Advancing Translational Sciences of the National Institutes of Health (TL1 TR002344, to Dr. Duncavage), and an NCI Research Specialist Award (R50 CA211782, to Dr. Miller). Core services were provided by the Siteman Cancer Center Tissue Procurement Core, the Flow Cytometry Core, and the Genome Technology Access Center supported in part by an NCI Cancer Center Support Grant P30 CA091842.

Note: Supplementary data for this article are available at Blood Cancer Discovery Online (https://bloodcancerdiscov.aacrjournals.org/).

1. Nimer SD. Myelodysplastic syndromes. Blood 2008;111:4841–51.
2. Swerdlow SH. WHO classification of tumours of haematopoietic and lymphoid tissues. WHO Classification of Tumours 2008;22008:439.
3. Tefferi A, Vardiman JW. Myelodysplastic syndromes. N Engl J Med 2009;361:1872–85.
4. Borthakur G, Lin E, Jain N, Estey EE, Cortes JE, O'Brien S, et al. Survival is poorer in patients with secondary core-binding factor acute myelogenous leukemia compared with de novo core-binding factor leukemia. Cancer 2009;115:3217–21.
5. Granfeldt Ostgard LS, Medeiros BC, Sengelov H, Norgaard M, Andersen MK, Dufva IH, et al. Epidemiology and clinical significance of secondary and therapy-related acute myeloid leukemia: a national population-based cohort study. J Clin Oncol 2015;33:3641–9.
6. Hulegardh E, Nilsson C, Lazarevic V, Garelius H, Antunovic P, Rangert Derolf A, et al. Characterization and prognostic features of secondary acute myeloid leukemia in a population-based setting: a report from the Swedish Acute Leukemia Registry. Am J Hematol 2015;90:208–14.
7. Xu XQ, Wang JM, Gao L, Qiu HY, Chen L, Jia L, et al. Characteristics of acute myeloid leukemia with myelodysplasia-related changes: a retrospective analysis in a cohort of Chinese patients. Am J Hematol 2014;89:874–81.
8. Haferlach T, Nagata Y, Grossmann V, Okuno Y, Bacher U, Nagae G, et al. Landscape of genetic lesions in 944 patients with myelodysplastic syndromes. Leukemia 2014;28:241–7.
9. Ogawa S. Genetics of MDS. Blood 2019;133:1049–59.
10. Menssen AJ, Walter MJ. Genetics of progression from MDS to secondary leukemia. Blood 2020;136:50–60.
11. Lindsley RC, Mar BG, Mazzola E, Grauman PV, Shareef S, Allen SL, et al. Acute myeloid leukemia ontogeny is defined by distinct somatic mutations. Blood 2015;125:1367–76.
12. Makishima H, Yoshizato T, Yoshida K, Sekeres MA, Radivoyevitch T, Suzuki H, et al. Dynamics of clonal evolution in myelodysplastic syndromes. Nat Genet 2017;49:204–12.
13. Papaemmanuil E, Gerstung M, Bullinger L, Gaidzik VI, Paschka P, Roberts ND, et al. Genomic classification and prognosis in acute myeloid leukemia. N Engl J Med 2016;374:2209–21.
14. Papaemmanuil E, Gerstung M, Malcovati L, Tauro S, Gundem G, Van Loo P, et al. Clinical and biological implications of driver mutations in myelodysplastic syndromes. Blood 2013;122:3616–27; quiz 99.
15. Uy GL, Duncavage EJ, Chang GS, Jacoby MA, Miller CA, Shao J, et al. Dynamic changes in the clonal structure of MDS and AML in response to epigenetic therapy. Leukemia 2017;31:872–81.
16. Walter MJ, Shen D, Ding L, Shao J, Koboldt DC, Chen K, et al. Clonal architecture of secondary acute myeloid leukemia. N Engl J Med 2012;366:1090–8.
17. Walter MJ, Shen D, Shao J, Ding L, White BS, Kandoth C, et al. Clonal diversity of recurrently mutated genes in myelodysplastic syndromes. Leukemia 2013;27:1275–82.
18. Yoshida K, Sanada M, Shiraishi Y, Nowak D, Nagata Y, Yamamoto R, et al. Frequent pathway mutations of splicing machinery in myelodysplasia. Nature 2011;478:64–9.
19. Xu L, Gu ZH, Li Y, Zhang JL, Chang CK, Pan CM, et al. Genomic landscape of CD34+ hematopoietic cells in myelodysplastic syndrome and gene mutation profiles as prognostic markers. Proc Natl Acad Sci U S A 2014;111:8589–94.
20. Chen J, Kao YR, Sun D, Todorova TI, Reynolds D, Narayanagari SR, et al. Myelodysplastic syndrome progression to acute myeloid leukemia at the stem cell level. Nat Med 2019;25:103–10.
21. Kim T, Tyndel MS, Kim HJ, Ahn JS, Choi SH, Park HJ, et al. The clonal origins of leukemic progression of myelodysplasia. Leukemia 2017;31:1928–35.
22. Ding L, Ley TJ, Larson DE, Miller CA, Koboldt DC, Welch JS, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 2012;481:506–10.
23. da Silva-Coelho P, Kroeze LI, Yoshida K, Koorenhof-Scheele TN, Knops R, van de Locht LT, et al. Clonal evolution in myelodysplastic syndromes. Nat Commun 2017;8:15099.
24. Nangalia J, Nice FL, Wedge DC, Godfrey AL, Grinfeld J, Thakker C, et al. DNMT3A mutations occur early or late in patients with myeloproliferative neoplasms and mutation order influences phenotype. Haematologica 2015;100:e438–42.
25. Ortmann CA, Kent DG, Nangalia J, Silber Y, Wedge DC, Grinfeld J, et al. Effect of mutation order on myeloproliferative neoplasms. N Engl J Med 2015;372:601–12.
26. Corces-Zimmerman MR, Hong WJ, Weissman IL, Medeiros BC, Majeti R. Preleukemic mutations in human acute myeloid leukemia affect epigenetic regulators and persist in remission. Proc Natl Acad Sci U S A 2014;111:2548–53.
27. Martín-Izquierdo M, Abáigar M, Hernández-Sánchez JM, Tamborero D, López-Cadenas F, Ramos F, et al. Co-occurrence of cohesin complex and Ras signaling mutations during progression from myelodysplastic syndromes to secondary acute myeloid leukemia. Haematologica 2021;106:2215–23.
28. Murphy DM, Bejar R, Stevenson K, Neuberg D, Shi Y, Cubrich C, et al. NRAS mutations with low allele burden have independent prognostic significance for patients with lower risk myelodysplastic syndromes. Leukemia 2013;27:2077–81.
29. Shiozawa Y, Malcovati L, Galli A, Pellagatti A, Karimi M, Sato-Otsubo A, et al. Gene expression and risk of leukemic transformation in myelodysplasia. Blood 2017;130:2642–53.
30. Takahashi K, Jabbour E, Wang X, Luthra R, Bueso-Ramos C, Patel K, et al. Dynamic acquisition of FLT3 or RAS alterations drive a subset of patients with lower risk MDS to secondary AML. Leukemia 2013;27:2081–3.
31. Ley TJ, Miller C, Ding L, Raphael BJ, Mungall AJ, Robertson A, et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 2013;368:2059–74.
32. Welch JS, Petti AA, Miller CA, Fronick CC, O'Laughlin M, Fulton RS, et al. TP53 and decitabine in acute myeloid leukemia and myelodysplastic syndromes. N Engl J Med 2016;375:2023–36.
33. Guess T, Potts CR, Bhat P, Cartailler JA, Brooks A, Holt C, et al. Distinct patterns of clonal evolution drive myelodysplastic syndrome progression to secondary acute myeloid leukemia. Blood Cancer Discov 2022;3:316–29.
34. Bejar R, Stevenson K, Abdel-Wahab O, Galili N, Nilsson B, Garcia-Manero G, et al. Clinical effect of point mutations in myelodysplastic syndromes. N Engl J Med 2011;364:2496–506.
35. Pfeilstöcker M, Tuechler H, Sanz G, Schanz J, Garcia-Manero G, Solé F, et al. Time-dependent changes in mortality and transformation risk in MDS. Blood 2016;128:902–10.
36. Morita K, Wang F, Jahn K, Hu T, Tanaka T, Sasaki Y, et al. Clonal evolution of acute myeloid leukemia revealed by high-throughput single-cell genomics. Nat Commun 2020;11:5327.
37. Miles LA, Bowman RL, Merlinsky TR, Csete IS, Ooi AT, Durruthy-Durruthy R, et al. Single-cell mutation analysis of clonal evolution in myeloid malignancies. Nature 2020;587:477–82.
38. Paguirigan AL, Smith J, Meshinchi S, Carroll M, Maley C, Radich JP. Single-cell genotyping demonstrates complex clonal diversity in acute myeloid leukemia. Sci Transl Med 2015;7:281re2.
39. Welch JS. Mutation position within evolutionary subclonal architecture in AML. Semin Hematol 2014;51:273–81.
40. Oshima K, Khiabanian H, da Silva-Almeida AC, Tzoneva G, Abate F, Ambesi-Impiombato A, et al. Mutational landscape, clonal evolution patterns, and role of RAS mutations in relapsed acute lymphoblastic leukemia. Proc Natl Acad Sci U S A 2016;113:11306–11.
41. Inoue D, Polaski JT, Taylor J, Castel P, Chen S, Kobayashi S, et al. Minor intron retention drives clonal hematopoietic disorders and diverse cancer predisposition. Nat Genet 2021;53:707–18.
42. Awad MM, Liu S, Rybkin II, Arbour KC, Dilly J, Zhu VW, et al. Acquired resistance to KRAS(G12C) inhibition in cancer. N Engl J Med 2021;384:2382–93.
43. Jacoby MA, Duncavage EJ, Chang GS, Miller CA, Shao J, Elliott K, et al. Subclones dominate at MDS progression following allogeneic hematopoietic cell transplant. JCI Insight 2018;3:e98962.
44. Klco JM, Miller CA, Griffith M, Petti A, Spencer DH, Ketkar-Kulkarni S, et al. Association between mutation clearance after induction therapy and outcomes in acute myeloid leukemia. JAMA 2015;314:811–22.
45. Duncavage EJ, Jacoby MA, Chang GS, Miller CA, Edwin N, Shao J, et al. Mutation clearance after transplantation for myelodysplastic syndrome. N Engl J Med 2018;379:1028–41.
46. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 2010;26:841–2.
47. Miller CA, White BS, Dees ND, Griffith M, Welch JS, Griffith OL, et al. SciClone: inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution. PLoS Comput Biol 2014;10:e1003665.
48. Dang HX, White BS, Foltz SM, Miller CA, Luo J, Fields RC, et al. ClonEvol: clonal ordering and visualization in cancer sequencing. Ann Oncol 2017;28:3076–82.
49. Miller CA, McMichael J, Dang HX, Maher CA, Ding L, Ley TJ, et al. Visualizing tumor evolution with the fishplot package for R. BMC Genomics 2016;17:880.
50. Geskus RB. Cause-specific cumulative incidence estimation and the fine and gray model under both left truncation and right censoring. Biometrics 2011;67:39–49.
51. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Statist Assoc 1999;94:496–509.
