Clonal hematopoiesis of indeterminate potential (CHIP) is characterized by detectable hematopoietic-associated gene mutations in a person without evidence of hematologic malignancy. We sought to identify additional cancer-presenting mutations usable for CHIP detection by performing a data mining analysis of 48 somatic mutation landscape studies reporting mutations at diagnoses of 7,430 adult and pediatric patients with leukemia or other hematologic malignancy. Following extraction of 20,141 protein-altering mutations, we identified 434 significantly recurrent mutation hotspots, 364 of which occurred at loci confidently assessable for CHIP. We then performed an additional large-scale analysis of whole-exome sequencing data from 4,538 persons belonging to three noncancer cohorts for clonal mutations. We found the combined cohort prevalence of CHIP with mutations identical to those reported at blood cancer mutation hotspots to be 1.8%, and that some of these CHIP mutations occurred in children. Our findings may help to improve CHIP detection and precancer surveillance for both children and adults.

Significance:

This study identifies frequently occurring mutations across several blood cancers that may drive hematologic malignancies and signal increased risk for cancer when detected in healthy persons. We find clonal mutations at these hotspots in a substantial number of individuals from noncancer cohorts, including children, showcasing potential for improved precancer surveillance.

See related commentary by Spitzer and Levine, p. 192.

Introduction

Somatic mutations at hotspots (genetic loci observed to be frequently mutated across patients with cancer) often drive or contribute to cancer pathogenesis (1, 2). Many somatic landscape studies have now been performed with next-generation sequencing (NGS), typically for a single cancer type in small- to medium-sized patient cohorts, identifying mutations that recur frequently. Large single-study and pan-cancer analyses of both pediatric and adult cohorts have identified cancer genes and mutation hotspots having even lower but significant recurrence rates. For example, over 140 driver genes were identified in an analysis of children with six cancer types (3), and more than 450 hotspots were found in an analysis of 41 predominantly nonhematologic cancers (2). However, the number of included leukemic or other hematologic cancer samples in both pan-cancer and single-study analyses reporting hotspots at a codon level has generally been insufficient to identify blood cancer–specific hotspots of low recurrence. Existing databases that accumulate mutations across many individual studies [most notable in size, the Catalogue Of Somatic Mutations In Cancer (COSMIC; https://cancer.sanger.ac.uk/cosmic); ref. 4] are extremely valuable for identifying reported mutations within their included studies; yet, current versions of these databases lack many exome/genome-wide blood cancer studies that could improve detection of mutations recurrent at low frequencies, and some are difficult to filter for specific cancers. Hence, many hematologic malignancy mutation hotspots of lesser recurrence and their constituent mutations that may drive blood cancers are likely still unidentified or unappreciated.

Clonal hematopoiesis of indeterminate potential (CHIP), which is detected as an expansive clonal somatic mutation in a person currently free from hematologic malignancy, carries a greatly increased risk (HR >10) for future blood cancer development (5, 6). CHIP increases sharply in prevalence with advanced age (5–9), and backward extrapolation of trends in adults could suggest that CHIP in childhood is extremely rare. However, no exome-wide CHIP study has included large numbers of healthy children to directly address this issue. The mutations in hematopoiesis-regulating genes used to identify CHIP include hotspot mutations of high recurrence, but also include mutations lacking driver or recurrent evidence at specific amino acid positions. Regardless of neoplastic evidence, CHIP mutations are frequently identified in persons singly (5–10); therefore, most are likely responsible for the clonal expansion that makes their detection possible. Yet CHIP mutations identical to those reported at blood cancer mutation hotspots may be found to carry increased risk for future neoplastic transformation.

With the dual aims of identifying mutational hotspots of hematologic malignancies and using these hotspots to detect CHIP with a potentially greater risk of neoplastic transformation, we performed a recurrent mutation analysis independently from existing databases. We systematically extracted and analyzed reported data from 48 studies meeting inclusion criteria, which detected mutations at diagnoses of patients with acute leukemias, myeloproliferative neoplasms (MPN), and myelodysplastic syndrome (MDS). As the majority of these studies interrogated the entire exome and many are not currently included in other mutation databases, our analysis provides a valuable compiled list of recurrent hematologic cancer mutations at a protein-coding level with smaller genome-wide bias and larger sample size than previously reported. We then performed a large analysis of CHIP, focused on finding clonal mutations identical to those reported at these hotspots in the whole-exome sequencing (WES) data of 4,538 persons from three noncancer cohorts, which included the widely used 1000 Genomes Project (1KG) and more than 400 children. Our findings may improve the prognostic and clinical utility of precancer surveillance in children and adults.

Results

Somatic Mutation Studies

By systematic literature search, we identified 48 somatic mutation landscape studies of blood cancers (3, 11–57) that could be used to determine recurrence of mutations reported at amino acid positions in protein-coding genes (Table 1). Focusing on diseases that may have been preceded by CHIP (58), we limited our investigation to studies that assessed patients by NGS at diagnosis of one of seven hematologic malignancies: acute lymphoid leukemia (ALL), acute myeloid leukemia (AML), chronic myeloid leukemia (CML), chronic myelomonocytic leukemia (CMML), juvenile myelomonocytic leukemia (JMML), MDS, and MPN (Fig. 1). After evaluation, filtering, and harmonization of 58,177 reported mutations assessed in 7,430 diagnostic patients, we determined a total of 20,141 mutations that altered an amino acid or splice site.

Table 1.

The 48 studies included in the hotspot mutation analysis to identify recurrent mutations in hematologic malignancies

| Study | Main hematologic malignancy | Patients assessed (N)^a | Mutations assessed (N)^b | Protein-altering mutations (N)^c |
|---|---|---|---|---|
| Andersson et al. (2015; ref. 11) | ALL | 65 | 250 | 146 |
| Chen et al. (2018; ref. 12) | ALL | 36 | 302 | 294 |
| De Keersmaecker et al. (2013; ref. 13) | ALL | 211 | 538 | 470 |
| Holmfeldt et al. (2013; ref. 14) | ALL | 40 | 706 | 366 |
| Liu et al. (2016; ref. 15) | ALL | 203 | 2,437 | 1,418 |
| Liu et al. (2017; ref. 16) | ALL | 264 | 4,165 | 3,426 |
| Oshima et al. (2016; ref. 17) | ALL | 55 | 1,845 | 397 |
| Papaemmanuil et al. (2014; ref. 18) | ALL | 55 | 795 | 549 |
| Paulsson et al. (2015; ref. 19) | ALL | 51 | 459 | 376 |
| Russell et al. (2017; ref. 20) | ALL | | 218 | 86 |
| Ryan et al. (2016; ref. 21) | ALL | 42 | 45 | 30 |
| Stengel et al. (2014; ref. 22) | ALL | 625 | 110 | 98 |
| Ma et al. (2018; ref. 3) | ALL | 122 | 574 | 254 |
| Huether et al. (2014; ref. 23) | ALL | 525 | 412 | 167 |
| Bolouri et al. (2018; ref. 24) | AML | 684 | 1,983 | 584 |
| de Rooij et al. (2017; ref. 25) | AML | 113 | 268 | 99 |
| Dolnik et al. (2012; ref. 26) | AML | 50 | 171 | 153 |
| Eisfeld et al. (2017; ref. 27) | AML | 177 | 258 | 199 |
| Eisfeld et al. (2017; ref. 28) | AML | 10 | 36 | 36 |
| Eisfeld et al. (2016; ref. 29) | AML | 23 | 68 | 56 |
| Faber et al. (2016; ref. 30) | AML | 165 | 849 | 588 |
| Farrar et al. (2016; ref. 31) | AML | 20 | 208 | 126 |
| Garg et al. (2015; ref. 32) | AML | 67 | 443 | 250 |
| Greif et al. (2018; ref. 33) | AML | 50 | 558 | 440 |
| Hirsch et al. (2016; ref. 34) | AML | 53 | 257 | 246 |
| Lavallée et al. (2015; ref. 35) | AML | 29 | 55 | 53 |
| CGARN et al. (2013; ref. 36) | AML | 200 | 22,633 | 1,574 |
| Madan et al. (2016; ref. 37) | AML | 153 | 210 | 126 |
| Papaemmanuil et al. (2016; ref. 38) | AML | 1,540 | 3,902 | 3,341 |
| Sehgal et al. (2015; ref. 39) | AML | 152 | 52 | 52 |
| Sood et al. (2016; ref. 40) | AML | 13 | 180 | 94 |
| Thol et al. (2017; ref. 41) | AML | 171 | 603 | 415 |
| Kim et al. (2017; ref. 42) | CML | 100 | 68 | 40 |
| Togasaki et al. (2017; ref. 43) | CML | 24 | 191 | 181 |
| Mason et al. (2016; ref. 44) | CMML | 69 | 507 | 479 |
| Merlevede et al. (2016; ref. 45) | CMML | 17 | 8,077 | 143 |
| Palomo et al. (2016; ref. 46) | CMML | 56 | 262 | 255 |
| Patnaik et al. (2017; ref. 47) | CMML | 261 | 16 | 15 |
| Caye et al. (2015; ref. 48) | JMML | 118 | 187 | 107 |
| Stieglitz et al. (2015; ref. 49) | JMML | 71 | 128 | 128 |
| Haferlach et al. (2014; ref. 50) | MDS | 102^d | 300 | 281 |
| Pastor et al. (2017; ref. 51) | MDS | 50 | 25 | 18 |
| Walter et al. (2013; ref. 52) | MDS | 150 | 322 | 254 |
| Yoshida et al. (2011; ref. 53)^e | MDS | 22 | 268 | 185 |
| Churpek et al. (2015; ref. 54)^f | MDS | 16 | 183 | 51 |
| Schwartz et al. (2017; ref. 55)^g | MDS | 54 | 288 | 195 |
| Lundberg et al. (2014; ref. 56) | MPN | 197 | 267 | 236 |
| Nangalia et al. (2013; ref. 57) | MPN | 151 | 1,498 | 1,064 |
| Total | | 7,430 | 58,177 | 20,141 |

NOTE: Studies assessed somatic mutations in diagnostic patient samples with NGS and met additional inclusion criteria.

Abbreviation: CGARN, Cancer Genome Atlas Research Network.

^a Number of unique diagnostic patients assessed in the original study meeting inclusion criteria (hence, the total sample size of the original study may have been larger).

^b Number of mutations from the original study assessed in this hotspot mutation analysis.

^c Number of determined protein-altering or splice-site mutations used in recurrence tallies.

^d Sample size estimated for the random selection of mutations provided.

^e Some samples from patients classified as CMML were included in this study.

^f Some patients were also classified with AML in this study.

^g Some patients were also classified with AML or MPN/JMML in this study.

Figure 1.

Flowchart of mutational hotspot and clonal hematopoiesis of indeterminate potential (CHIP) analyses. Hematologic cancer hotspot mutations identified in the recurrent mutation analysis were assessed for clonal presence in persons from noncancer cohorts as part of the CHIP analysis. pt., patient.

Mutational Hotspots in Hematologic Cancers

We found 434 amino acid or splice-site positions occurring across 85 genes that met our criteria for hotspots (Methods; Supplementary Table S1). The most common hotspots were well-known and observed frequently even within individual studies (e.g., mutations at NRAS p.G12 were reported for 345 persons within 35 studies; Fig. 2A). Yet, 79 hotspots observed in three to eight persons had only a single mutation within any given study, highlighting the benefit of multistudy evaluations. Nearly all of the hotspots were recurrently listed across combined cancers in the most recent COSMIC database, which includes many blood cancer studies, the majority having been published between 1992 and 2010. Yet this version was lacking 32 of the 48 NGS studies (comprising 51% of total patients) used in our assessment. When restricting COSMIC mutation data to primary samples in the seven hematologic malignancies of our focus, the majority (n = 222) of our identified hotspots were observed in fewer than three individuals of this COSMIC subset (Supplementary Table S2).
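The benefit of pooling studies can be illustrated with a minimal Python sketch. The record layout, the toy data (including the placeholder genes `GENE_X` and `GENE_Y`), and the `min_patients=3` cutoff are assumptions made for illustration only; the actual hotspot criteria are those described in Methods and are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy records: (study, patient_id, gene, amino_acid_position).
records = [
    ("study_A", "p1", "NRAS", "G12"),
    ("study_A", "p2", "NRAS", "G12"),
    ("study_B", "p3", "NRAS", "G12"),
    ("study_A", "p4", "GENE_X", "R100"),
    ("study_B", "p5", "GENE_X", "R100"),
    ("study_C", "p6", "GENE_X", "R100"),
    ("study_C", "p7", "GENE_Y", "K5"),
]

def tally_patients(records):
    """Map each (gene, position) locus to the set of (study, patient) pairs reporting it."""
    by_locus = defaultdict(set)
    for study, patient, gene, position in records:
        by_locus[(gene, position)].add((study, patient))
    return by_locus

def recurrent_only_when_pooled(by_locus, min_patients=3):
    """Loci that reach min_patients overall although no single study reports more than one patient."""
    found = []
    for locus, pairs in by_locus.items():
        per_study = defaultdict(int)
        for study, _patient in pairs:
            per_study[study] += 1
        if len(pairs) >= min_patients and max(per_study.values()) == 1:
            found.append(locus)
    return sorted(found)

by_locus = tally_patients(records)
pooled_only = recurrent_only_when_pooled(by_locus)
```

In this toy input, the `GENE_X` locus is recurrent only after pooling, which mirrors the situation of the 79 hotspots that had a single mutation within any given study.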

Figure 2.

Mutation hotspots in hematologic malignancies. A, Most frequently reported mutation hotspots at blood cancer diagnoses. Top axes: total patients assessed for each hematologic malignancy. Numbers within cells: total number of patients having a reported mutation. Color intensity scale: percentage of persons having a reported mutation for each malignancy. As not all patients were assessed for mutations at each locus, these frequencies likely represent lower-bound estimates. Yellow boxes indicate mutations identified in only a single hematologic malignancy. B, Location in coding region for the 434 hotspots occurring in 85 genes. Horizontal axes: normalized location of each mutation locus within the coding sequence. Color of triangles: types of mutations reported at each locus (purple indicates only nonsense or frameshift, blue indicates only point mutations or nonframeshift, orange indicates a mixture of nonsense/frameshift and point/nonframeshift, and pink indicates splice site). Utilized transcripts are provided in Supplementary Table S1.

Some of our identified mutation hotspots were unique to a specific hematologic malignancy (Fig. 2B; Supplementary Table S1). With the proportion of assessed patients being 49.4% AML, 31.0% ALL, and 19.6% other malignancies, we found that mutations at IDH2 p.R172 (n = 59), KIT p.N822 (n = 31), and KIT p.Y418 (n = 18) were reported only in patients with AML, whereas RPL10 p.R98 (n = 25), NOTCH1 p.L1678 (n = 24), FBXW7 p.R479 (n = 23), NOTCH1 p.L1600 (n = 22), and NOTCH1 p.L1585 (n = 19) mutations were observed only in patients with ALL. We also observed that some genes contained mostly point or nonframeshift hotspots (e.g., KRAS, NRAS, PTPN11, SF3B1), contained mostly nonsense or frameshift hotspots (e.g., ASXL1, NF1, STAG2), or had distinct groupings of mutation types by location within the coding region (e.g., CEBPA, NOTCH1).

Hotspot Mutations Evaluable for CHIP

We identified 70 of the 434 hotspots to have heightened potential for false positives if used to infer CHIP from mutations in blood samples alone (Supplementary Table S3). This resulted in 364 hematologic cancer mutation hotspots and 755 constituent mutations that could be confidently used in screening for CHIP with prevalent NGS methods (Supplementary Table S4). Only 45 (12.4%) of these 364 loci were identified as hotspots in a previous pan-cancer analysis (2), and only 35 (9.6%) were observed in three or more patients of a sizable MDS investigation not included in our assessment (59). Likely because many of the exome-wide studies that identified mutations at these hotspots were published around or after a previous landmark CHIP study (10), 350 of the 755 specific mutations identified in our hotspot mutation assessment were not present in the list of queried CHIP variants of that study. Of these, 134 were reported in ≥3 patients at diagnosis of a hematologic malignancy (Supplementary Table S5). Many of these unique mutations occur within NOTCH1, FLT3, CEBPA, KIT, and RUNX1.

CHIP in Noncancer Cohorts Identical to Hotspot Mutations

We next assessed three large noncancer cohorts (60–62) for CHIP at our identified hotspots within their combined 4,538 individuals. The 1KG cohort included 2,503 adults from 26 different populations with sample sizes ranging from N = 61 to N = 113 (median N = 99). The Qatari Genome (QTRG) and Simons Simplex Collection (SSC) cohorts included children and adults. We found the ages, sequencing depths, and methods utilized in each study to vary considerably (Supplementary Fig. S1A), with potential to influence the rate of clonal mutation identification. The means of the average depths of coverage across the 364 confident hotspots were 78.5×, 93.3×, and 53.6×, with averages of 27.5%, 32.7%, and 10.3% of hotspots covered at a depth ≥100× in the 1KG, SSC, and QTRG cohorts, respectively (Supplementary Fig. S1B).

All detected variants were evaluated for reliability (which led to the exclusion of eight outlier samples) and for meeting the criteria for CHIP at hotspots, resulting in the identification of 83 individuals (1.83% of the 4,530 included) each having a single CHIP mutation at one of 62 hotspots across 23 genes (Supplementary Fig. S2; Supplementary Table S6). The prevalence of CHIP at hotspots for each cohort was as follows: QTRG, 0.73% (9/1,231); 1KG, 2.48% (62/2,503); and SSC, 1.51% (12/796), with subgroup rates of 0.77% (3/388) for SSC children and 2.21% (9/408) for SSC adults (Fig. 3A). CHIP mutations at hotspots were most common in DNMT3A, TET2, and TP53 (Fig. 3B). Clonal size was not significantly associated with reported frequency in our recurrent mutation analysis (P = 0.56; Supplementary Fig. S3). However, CHIP detection was associated with frequency of hotspot recurrence [OR, 1.75 per log of reported mutations; 95% confidence interval (CI), 1.33–2.29; P < 0.0001]. Still, while half of the 12 hotspots (including those of lesser confidence for CHIP evaluation) reported in more than 100 patients of the recurrence analysis were observed with identical clonal mutations in the noncancer cohorts, six were not. These included NPM1 p.W288 and FLT3 p.D835, both of which may rapidly accelerate neoplastic onset, making them more difficult to observe in healthy persons (9). Together, these observations suggest that stratifying CHIP mutations by significant recurrence in hematologic malignancies may prove prognostic in studies of blood cancer risk.
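As a simple arithmetic check, the cohort-level prevalence figures above can be recomputed from the reported counts. The sketch below uses only the numbers given in the text; the variable and function names are illustrative.

```python
# Reported counts of persons with CHIP at hotspots: (carriers, persons included).
cohorts = {
    "QTRG": (9, 1231),
    "1KG": (62, 2503),
    "SSC": (12, 796),
}
ssc_subgroups = {
    "SSC children": (3, 388),
    "SSC adults": (9, 408),
}

def prevalence_pct(carriers, persons):
    """Prevalence as a percentage, rounded to two decimals."""
    return round(100 * carriers / persons, 2)

# Combined prevalence is the pooled ratio, not the mean of the cohort rates.
total_carriers = sum(c for c, _ in cohorts.values())
total_persons = sum(n for _, n in cohorts.values())
combined_pct = prevalence_pct(total_carriers, total_persons)

cohort_pct = {name: prevalence_pct(c, n) for name, (c, n) in cohorts.items()}
subgroup_pct = {name: prevalence_pct(c, n) for name, (c, n) in ssc_subgroups.items()}
```

Because the cohorts differ in size, age structure, and sequencing depth, the pooled 1.83% is weighted toward the 1KG cohort.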

Figure 3.

CHIP detection at hotspots in noncancer cohorts. A, Proportion of each cohort with identified CHIP mutations identical to mutations reported at 364 confident mutation hotspots (black colored bars indicate cohort aggregates). ACB, African Caribbean in Barbados; ASW, Americans of African Ancestry in SW, USA; BEB, Bengali in Bangladesh; CDX, Chinese Dai in Xishuangbanna, China; CEU, Utah Residents with Northern and Western European Ancestry; CHB, Han Chinese in Beijing, China; CHS, Han Chinese South; CLM, Colombians from Medellin, Colombia; ESN, Esan in Nigeria; FIN, Finnish in Finland; GBR, British in England and Scotland; GIH, Gujarati Indian from Houston, Texas; GWD, Gambian in Western Divisions in the Gambia; IBS, Iberian Population in Spain; ITU, Indian Telugu in the UK; JPT, Japanese in Tokyo, Japan; KHV, Kinh in Ho Chi Minh City, Vietnam; LWK, Luhya in Webuye, Kenya; MSL, Mende in Sierra Leone; MXL, Mexican Ancestry from Los Angeles, USA; PEL, Peruvians from Lima, Peru; PJL, Punjabi from Lahore, Pakistan; PUR, Puerto Ricans in Puerto Rico; STU, Sri Lankan Tamil from the UK; TSI, Toscani in Italia; YRI, Yoruba in Ibadan, Nigeria. B, Genes with CHIP mutations identical to mutations reported at hotspots across the three cohorts (N = 4,530). Black color indicates mutation identified in a child sample. C, Rate of relaxed CHIP (mutations having ≥2 variant reads) in the Qatari cohort by age groups.

CHIP at Mutation Hotspots by Age Groups

We detected CHIP mutations at the leukemic hotspots in 3 of 388 (0.77%) children of the SSC cohort (two autism spectrum disorder probands and one unaffected sibling). These mutations occurred at hematologic hotspots in DNMT3A [p.R882S, variant allele frequency (VAF) = 2.5%] and RUNX1 [p.G170 splice site (c.509-1G>T), VAF = 1.3%; p.S322X, VAF = 1.1%], with all variant reads passing manual review and being detected on both strands (Supplementary Table S6; Supplementary Fig. S4). We assessed the parent–child trios for each of these children for expected SNP inheritance (to preclude the possibility of sample mix-up of children and adult samples or data) and confirmed that each of the three samples was from the corresponding child. The child with a RUNX1 p.S322X clonal mutation also had an apparent germline SNP at RUNX1 p.L56S (variant present in 5/10 reads) that was inherited from the child's father. This SNP is present in the gnomAD version 2.1.1 database (63) at a frequency of 1.2% yet was also reported in 4/56 (7.1%) patients diagnosed with CMML (46), suggesting a possibility that this variant helped accelerate clonal growth under the two-hit hypothesis (64). Still, the clonal size and number of variant reads at each hotspot in these children were insufficient to completely rule out the possibility of artifacts, and hence these findings are preliminary.

The QTRG cohort, whose participants ranged in age from 0 to 85 years, had a much lower sequencing depth, so we explored the effect of relaxing the requirement for detecting CHIP at hotspots to include mutations with only two variant reads. While such a threshold would result in a high false-positive rate in deeply sequenced data, a higher rate of true positives is likely to be present in lower depth data. This assessment found an additional 41 clonal mutations for this cohort (Supplementary Table S7) and found age to be associated with relaxed CHIP prevalence in this call set (OR, 1.025 per year of age; 95% CI, 1.004–1.048; P = 0.022; Fig. 3C).
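To give a sense of scale for a per-year odds ratio, the short sketch below compounds the reported estimate over a longer age span. It assumes the association is log-linear in age, as in a standard logistic regression; the function name is illustrative, and the compounded values are an interpretation aid rather than results reported by the study.

```python
def odds_ratio_over_years(or_per_year, years):
    """Odds ratio implied over a span of years, assuming a log-linear effect of age."""
    return or_per_year ** years

# Reported per-year estimate and 95% CI bounds for relaxed CHIP in the QTRG cohort.
point, lower, upper = 1.025, 1.004, 1.048

# Implied odds ratio per decade of age under the log-linear assumption.
per_decade = odds_ratio_over_years(point, 10)
per_decade_interval = (
    odds_ratio_over_years(lower, 10),
    odds_ratio_over_years(upper, 10),
)
```

Under this assumption, a 2.5% increase in odds per year corresponds to roughly a 28% increase in odds per decade.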

CHIP Not Exclusive to Hotspots

We also investigated the noncancer cohorts for the prevalence of CHIP that included seldom or never-before reported mutations at diagnosis of blood cancers, but which may still be involved with clonal expansion due to their occurrence in genes with known hematopoietic function. Across the specified mutations and domains of the 74 allowed hematologic genes of Jaiswal and colleagues (10), and using their criteria, we observed 189 mutations not exclusive to hotspots within the 4,530 persons of our analyzed cohorts (Supplementary Table S8), 40 of which (21.2%) were identical to mutations reported at confident hotspots in three or more diagnostic patients in the 48 somatic landscape publications we assessed (i.e., those listed in Supplementary Table S4). For comparison, we also determined that 76 of the 224 (33.9%) and 298 of the 805 (37.0%) reported CHIP mutations in Jaiswal and colleagues (5) and Jaiswal and colleagues (10), respectively, were identical to mutations in our hotspot list.

Our general CHIP analysis identified six persons having two mutations (five of whom had a mutation of DNMT3A), none with more than two, and the remaining 177 persons having only a single CHIP mutation—yielding a general CHIP prevalence rate of 4.04% (183/4,530) across the cohorts. Including identified clonal mutations having a VAF >4% at the additional hotspots not previously utilized by Jaiswal and colleagues (10), 193/4,530 (4.26%) persons had qualifying CHIP. As in previous studies, CHIP not exclusive to hotspots was most frequent for TET2 and DNMT3A (Supplementary Table S8). However, we found mutations in SETD2, EP300, and KMT2A/D to be more common in our cohort, whereas ASXL1 and JAK2 mutations were noticeably less frequent. Variability of gene prevalence for CHIP mutations in DNMT3A, TET2, ASXL1, and, most strikingly, JAK2 between control and cardiovascular disease cohorts was previously observed (10). As the cohorts we assessed likely had fewer underlying cases of cardiovascular disease and younger average age than other CHIP studies using WES data, some of our gene prevalence rates may reflect these differences. Further exploration identified seven ASXL1 variants having a VAF >2%, yet only one reached the threshold of 4% required for this general CHIP analysis. No additional JAK2 variants with three or more supporting reads were observed at any frequency.
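The person-level and mutation-level counts in this analysis can be reconciled with a short calculation. The sketch below uses only the figures reported in the text; the variable names are illustrative.

```python
persons_with_two_mutations = 6
persons_with_one_mutation = 177
persons_analyzed = 4530

# Persons carrying at least one general CHIP mutation.
carriers = persons_with_two_mutations + persons_with_one_mutation

# Total mutations: two per double carrier plus one per single carrier.
mutations = 2 * persons_with_two_mutations + persons_with_one_mutation

general_chip_pct = round(100 * carriers / persons_analyzed, 2)

# Including clonal mutations with VAF >4% at the additional hotspots.
carriers_with_additional_hotspots = 193
extended_chip_pct = round(100 * carriers_with_additional_hotspots / persons_analyzed, 2)
```

The 183 carriers account for the 189 mutations not exclusive to hotspots, since six persons contribute two mutations each.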

Potential CHIP at non-hotspots with a VAF >4% was observed in only four children having variant reads detected on both strands (Supplementary Table S8). In one child, the variant (DNMT3A p.V665L) had been previously identified and reported as a de novo mutation by Lim and colleagues (65), although it could possibly have been a CHIP mutation that had reached near-complete saturation (the VAF was 47.3%; see Methods). This particular variant was not observed in 55 individuals having Tatton-Brown–Rahman syndrome, a congenital condition due to germline DNMT3A mutations (66), one of whom developed AML in childhood. Yet the clinical association of autistic spectrum disorder in 20 (36%) of their study participants could be related to this detection of a DNMT3A mutation in an SSC proband. Also intriguing, another of the four children with CHIP at non-hotspots was one of the three children we initially identified to have a hotspot CHIP mutation, making the child's nonhotspot TET2 p.A1863S variant (VAF = 6.5%) a potentially synergistic CHIP mutation with their previously mentioned RUNX1 p.G170 splice-site hotspot variant (Supplementary Fig. S4). This observation may once again reflect an increased likelihood of observing CHIP in children when two genetic insults are present, as without an additional factor, single clonal mutations in younger persons may have lacked sufficient time to expand to detectable levels.

Discussion

Owing to the work of many researchers who published the findings of their somatic mutation studies (3, 11–57), we were able to compile a novel list of recurrent hematologic cancer mutations. The majority of these NGS studies assessed the entire exome/genome, allowing for a less biased and larger-scale assessment of diagnostic mutation recurrence in blood cancers than has previously been available. While significant recurrence alone does not imply a driver or even contributory role in cancer development, overall, the hotspots we identified will likely be enriched for pathogenic blood cancer mutations. Importantly, these identified hotspots also increase the number of mutations having documented recurrence in primary hematologic malignancies that can now be used to identify CHIP.

Using these recurrent mutation loci, we analyzed three large noncancer cohorts, finding clones with mutations identical to those observed at blood cancer mutation hotspots in 1.83% of persons across the combined cohorts. Without restricting to hotspots, our estimate of CHIP prevalence coincides with that of previous WES studies of CHIP, 4% to 5%, with the proportion of CHIP identical to diagnostic mutations at hotspots ranging from 21% to 37% within these studies. As the vast majority of all persons with identified CHIP are found to have only a single clonal mutation in hematopoiesis-regulating genes, most detected hotspot or nonhotspot mutations have likely driven the clonal expansion. Future studies of large size and long follow-up will be required to determine the degree to which CHIP at hotspots may carry an increased risk for aiding neoplastic transformation, as well as for apparently unrelated nonhematologic cardiovascular complications of CHIP (10).

Our analysis of CHIP included for the first time a large assessment of children free from blood disorders [please note that CHIP has been observed in children having aplastic anemia (67) and that postzygotic mutations of generally higher mutation frequencies in children have also been previously assessed (65)]. Of great interest, we detected CHIP at mutation hotspots in three of the children in our cohorts. This novel, preliminary detection of CHIP in children unselected for blood disorders is important because although only a small number of children were observed to have these clonal mutations, this rate may be higher than anticipated from past analyses of adult cohorts (e.g., only 1/1,039 adults aged 20–39 years was detected with CHIP in a previous analysis of WES data; ref. 5). The well-designed Simons Simplex Collection study sequenced children and parental samples simultaneously at nearly identical depths (60). That CHIP at hotspots in these children occurred at 35% of the overall parental frequency (2.2%) encourages future studies to investigate this phenomenon more thoroughly in children.

Abelson and colleagues (7) found that AML mutations reported with higher frequency in the COSMIC database (4) were also observed as CHIP more frequently. We similarly found an association between the frequency of reported diagnostic mutations across the seven hematologic malignancies and their detection as clonal mutations in the noncancer cohorts, strengthening the conjecture that mutations at hotspot loci may increase the risk of transition from CHIP to hematologic cancer (6). Some of the hotspot mutations we list may have been overlooked in previous targeted designs for CHIP assessment. These mutations may now receive heightened priority when observed clinically, as well as further investigation for functional significance and therapeutic potential.

Much remains to be discovered regarding most individual CHIP mutations, including their prevalence by ethnicity, risk for onset of nonhematologic diseases, and concomitant factors associated with their size and clonal dynamics. Using high-depth WGS and RNA sequencing, future work may expand to include the identification of clonal fusions in CHIP prevalence and risk assessments. While CHIP was previously thought to be pertinent to adults only, our analysis of noncancer cohorts identified clonal mutations at both hotspots and nonhotspots in children, indicating a need to further explore CHIP in younger cohorts. Our hematologic cancer–focused hotspot list will allow for subgroup analyses of CHIP in future studies, with promise to improve its prognostic performance and to support future blood cancer prevention research.

Somatic Mutation Studies

An initial feasibility exercise was performed on 14 somatic mutation landscape studies to discover possible extraction-related problems and the steps necessary to confidently identify and harmonize reported mutations usable for hotspot determination. This preliminary work was supplemented with a formal PubMed search for studies published before July 2018 with the following search criteria: (“genomic landscape,” “somatic landscape,” “genomic profile,” “mutational landscape,” OR “mutation landscape”) AND (“leukemia” OR “leukaemia”). This search returned a total of 172 papers, of which 61 were found to be review articles reporting no new patients. We assessed studies focused on any of seven disorders: ALL, AML, CML, CMML, JMML, MPN, and MDS.

Mutation Evaluation

Each study was assessed for whether hematologic neoplasms had been investigated with NGS and whether it provided information sufficient to identify accurately the genomic position of each alteration and its effect on protein coding. Studies that did not utilize NGS, or that reported mutations that could not be determined without ambiguity, were not included; because many genes have multiple transcripts, mutations from studies providing only an amino acid substitution, without an accompanying transcript identifier or genomic position, were generally deemed ambiguous. For the remaining studies, all substitutions and all deletions were assessed separately against the provided reference alleles to determine the combination of factors used in reporting mutations: cDNA or gDNA, a 0-based or 1-based coordinate system, and human genome reference version. If only a trivial number of reference allele discrepancies were observed, these were discarded and the remaining variants were incorporated; otherwise, the entire group of substitutions, indels, or the entire study was discarded. While translocations and large structural rearrangements are common in many cancers (68), they were not included in this investigation, as our primary purpose was the identification of recurrent events occurring entirely within exonic regions so that these could be queried for CHIP in WES data (in addition, fewer than one third of the studies provided breakpoints of large structural rearrangements at base-pair resolution). Many somatic landscape studies reported having validated their listed mutations; uniformly, we relied on the mutations as reported by the original authors rather than requesting the original NGS data for reanalysis. Overall, our approach was designed to enrich for reported mutations having high confidence for accurate extraction rather than to collect increased numbers of mutations with less overall confidence in their authenticity.
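The reference-allele check described above can be illustrated with a minimal sketch. It assumes a toy in-memory reference sequence and variants given as (position, reference allele) pairs; the function names and data are hypothetical and are not taken from the authors' pipeline. The sketch only tests the 0-based versus 1-based convention; distinguishing cDNA from gDNA or one genome build from another would follow the same idea with different reference sequences.

```python
from collections import Counter


def ref_matches(reference, pos, ref_allele, one_based):
    """Return True if ref_allele is found in reference at pos under the given convention."""
    start = pos - 1 if one_based else pos
    if start < 0:
        return False
    return reference[start:start + len(ref_allele)] == ref_allele


def count_matches(reference, variants):
    """Count how many reported reference alleles agree with each coordinate convention."""
    counts = Counter()
    for pos, ref_allele in variants:
        for label, one_based in (("0-based", False), ("1-based", True)):
            if ref_matches(reference, pos, ref_allele, one_based):
                counts[label] += 1
    return counts


# Toy data (illustrative only): a short reference and four reported variants.
toy_reference = "ACGTTGCAAGCT"
toy_variants = [(1, "A"), (4, "T"), (7, "C"), (10, "G")]

counts = count_matches(toy_reference, toy_variants)
# The convention with the most matching reference alleles is taken as the study's convention.
best = max(("0-based", "1-based"), key=lambda label: counts[label])
```

In this toy case one variant also matches by coincidence under the other convention, which is why the decision rests on the overall count rather than on any single variant.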

Mutation Filtering and Harmonization

In studies that included mutations from relapse or secondary leukemia samples in addition to primary samples, mutations in the primary diagnostic samples were extracted only if they could be clearly distinguished from those in relapse or secondary leukemia samples. If there was no distinction, the entire study was excluded. For a few studies assessing samples that had been investigated in previous studies, mutations from patients reported in the older analysis were not included. Mutations in genes known for having high false-positive rates were frequently filtered by authors, and we likewise removed mutations reported in such genes from studies that had not performed such filtering. Similarly, a few studies reported all observed variants with little or no read/frequency filtering for mutation calling, and variants having minor VAFs were removed. Finally, a single RefSeq transcript was determined for each gene based on its study-utilized frequency and length, and each mutation was assessed for its effect on this transcript (listed in Supplementary Table S1). Only splice-site or protein-altering mutations were retained. A total of 48 studies (3, 11–57) satisfying all of these stringent criteria were included in the hotspot mutation analysis. Comparison with COSMIC data was performed with the most recent COSMIC database (version 92, August 2020).

Hotspot Determination

Similar to Chang and colleagues (2), we assessed mutational recurrence at amino acid and splice-site positions in protein-coding genes, but included indels in addition to substitutions. The number of reported protein-altering and splice-site substitutions and indels across the final set of 48 somatic mutation landscape studies was tallied at each amino acid locus, with indels tallied at the locus at which they began and splice sites tallied separately. As most studies did not provide silent or nonexonic mutations in their lists, a formal driver analysis was not possible, which may be considered a weakness of this work. Hence, we sought to determine minimalist criteria for our designation of a hotspot based on deviation from expected recurrence in binomial distribution modeling that utilized the set of reported protein-altering mutations (Supplementary Methods; Supplementary Fig. S5). Various models were assessed, across which recurrence in at least three patients deviated markedly from expectation. Thus, our minimalist criterion for hotspot designation was defined as three or more patients having such mutations at the same amino acid or splice-site position, with 434 loci meeting that criterion. To determine hotspots that could be confidently used in identifying CHIP, we additionally determined the mappability of the genetic sequence surrounding each locus with multiple approaches, including an extensive repeated-sequence search. This allowed us to exclude from CHIP analyses those hotspots that occurred in difficult-to-map regions. Reported mutations in regions of homopolymer repeats or identical to nonrare germline SNPs were also classified as less confident loci for CHIP calling and excluded. These additional filtering steps resulted in a final total of 364 hotspot loci having increased likelihood for a driver or contributory role in hematologic malignancies, which could be confidently used for detecting CHIP.
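The tallying step and the three-patient threshold can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the record layout, gene names, and patient identifiers are hypothetical, and the binomial modeling and mappability filters described above are not reproduced.

```python
from collections import defaultdict

MIN_PATIENTS = 3  # threshold stated in the text: three or more patients at the same locus


def find_hotspots(mutations, min_patients=MIN_PATIENTS):
    """Return {(gene, locus): n_patients} for loci mutated in at least min_patients patients.

    mutations: iterable of (patient_id, gene, locus) tuples, where locus is
    ("aa", position) for an amino acid position (indels use the position at which
    they begin) or ("splice", label) for a splice site, so that splice sites are
    tallied separately from amino acid positions.
    """
    patients_at_locus = defaultdict(set)
    for patient_id, gene, locus in mutations:
        patients_at_locus[(gene, locus)].add(patient_id)
    return {
        key: len(patients)
        for key, patients in patients_at_locus.items()
        if len(patients) >= min_patients
    }


# Toy records (illustrative only; gene names and patients are placeholders).
toy_mutations = [
    ("P1", "GENE_A", ("aa", 100)),
    ("P2", "GENE_A", ("aa", 100)),
    ("P3", "GENE_A", ("aa", 100)),
    ("P3", "GENE_A", ("aa", 100)),  # a second report for the same patient counts once
    ("P4", "GENE_A", ("aa", 101)),
    ("P5", "GENE_B", ("splice", "intron2_donor")),
    ("P6", "GENE_B", ("splice", "intron2_donor")),
]

hotspots = find_hotspots(toy_mutations)
```

Counting distinct patients rather than raw records is an assumption made here so that a patient reported twice does not inflate recurrence.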

Noncancer Cohorts

Three large cohorts unselected for cancer and having publicly available paired-end WES data were analyzed for CHIP-associated mutations. (i) 1KG: We assessed CHIP from the FASTQ data of 2,535 WES samples coming from the 26 populations of the phase 3 1000 Genomes Project submitted to the European Nucleotide Archive (ENA) as PRJNA262923 by the Wellcome Trust Sanger Institute. Persons in this cohort were generally assumed to be unselected for disease and ≥18 years old. One sample with unavailable paired-end FASTQ data files as well as samples that were poorly sequenced or had other issues previously identified for exclusion (61) were not utilized, resulting in a total of N = 2,503 persons whose samples were deemed satisfactory. DNA had been extracted from lymphoblastoid cell lines (LCL) for most participants (61). Also as per 1000 Genomes Project Consortium and colleagues (61, 69) and The International Genome Sample Resource (www.internationalgenome.org), the populations assessed consist of African Caribbean in Barbados (ACB); Americans of African Ancestry in SW, USA (ASW); Bengali in Bangladesh (BEB); Chinese Dai in Xishuangbanna, China (CDX); Utah Residents (from CEPH families) with Northern and Western European Ancestry (CEU); Han Chinese in Beijing, China (CHB); Han Chinese South (CHS); Colombians from Medellin, Colombia (CLM); Esan in Nigeria (ESN); Finnish in Finland (FIN); British in England and Scotland (GBR); Gujarati Indian from Houston, Texas (GIH); Gambian in Western Divisions in the Gambia (GWD); Iberian Population in Spain (IBS); Indian Telugu in the UK (ITU); Japanese in Tokyo, Japan (JPT); Kinh in Ho Chi Minh City, Vietnam (KHV); Luhya in Webuye, Kenya (LWK); Mende in Sierra Leone (MSL); Mexican Ancestry from Los Angeles, USA (MXL); Peruvians from Lima, Peru (PEL); Punjabi from Lahore, Pakistan (PJL); Puerto Ricans in Puerto Rico (PUR); Sri Lankan Tamil from the UK (STU); Toscani in Italia (TSI); and Yoruba in Ibadan, Nigeria (YRI). 
(ii) SSC: We assessed the FASTQ data of 804 paired-end WES samples from Simons Simplex families deposited in ENA as PRJNA167318. These included 413 parents and 391 children. Of the children, 205 were diagnosed with autism and 186 were unaffected siblings. DNA had been extracted from whole blood as reported in Sanders and colleagues (60). Of note, this cohort has been previously analyzed for postzygotic mutations of substantial frequency (65), and while not specifically looking for CHIP, Lim and colleagues (65) did identify two of the general (and none of the hotspot) CHIP mutations we list in Supplementary Table S8. (iii) QTRG: We assessed paired-end WES FASTQ data from the samples of 1,231 persons of the QTRG for precision medicine deposited in ENA as PRJNA290484. These included 510 persons diagnosed with type II diabetes and 270 persons designated as controls. The age spectrum of this cohort ranged from 0 to 85 years, with 25 persons being <18 years of age. DNA had been extracted from blood as reported in Fakhro and colleagues (62). While DNA for both the QTRG and SSC cohorts was derived from blood, infrequent LCL-specific mutations could be present in the data of the 1KG samples. However, we found the frequency of CHIP mutations in the 1KG samples to be similar to that in the SSC parent cohort, implying that any LCL-specific effect may be small in magnitude.

CHIP Assessment at Hotspots

We aligned the paired-end FASTQ data files for each of the 4,538 samples in the noncancer cohorts to the human reference genome hg19 and performed subsequent deduplication, realignment, recalibration, filtering, and variant calling (additional details provided in Supplementary Methods). Classification as CHIP at hotspots required somatic mutations to have ≥3 variant reads among ≥20 total reads and to occur at one of the 364 confident amino acid hotspots identified in the recurrent mutation analysis (hence at a locus reported in at least three patients and with confident mapping). As the total number of loci at which CHIP could be called was reduced >100-fold from that used in other studies assessing CHIP across large domains (1,092 bases compared with >160,000 bases in Jaiswal and colleagues; ref. 10), we did not incorporate an additional lower-bound cutoff at hotspots. In addition, such mutations were required to yield the identical amino acid substitution or insertion/deletion effect (e.g., frameshift) as had been previously reported in at least one diagnostic tumor sample, without additional filtering for predicted benign/deleterious effect on protein. Only mutations meeting all of these criteria were used to identify CHIP at hotspots. CHIP at the 70 hotspots with greater potential for artifacts was also assessed, and a recovery analysis was performed to arbitrate any excluded mutations for inclusion. While the coverage statistics at the 364 hotspots were plausibly similar to those of other WES studies analyzed for CHIP, coverage was insufficient to detect small- to medium-sized clones at many hotspots. In a separate analysis, a classification of relaxed CHIP at hotspots was used with the sole modification of allowing ≥2 variant reads for identification.
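The classification rule for CHIP at hotspots can be expressed as a small predicate. The sketch below is a minimal illustration under stated assumptions: the `Call` record, the `reported_effects` lookup, and all example values are hypothetical, and upstream steps (alignment, deduplication, recalibration, variant calling, and the recovery analysis) are outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One candidate somatic call in a noncancer sample (hypothetical record layout)."""
    gene: str
    aa_position: int
    protein_effect: str  # e.g., a specific substitution or "frameshift"
    variant_reads: int
    total_reads: int


def is_chip_at_hotspot(call, reported_effects, min_variant_reads=3, min_total_reads=20):
    """Apply the read-count, hotspot-locus, and identical-effect criteria from the text.

    reported_effects maps (gene, aa_position) for each confident hotspot to the set
    of protein effects previously reported there in diagnostic samples.
    """
    effects = reported_effects.get((call.gene, call.aa_position))
    if effects is None:
        return False  # not at a confident hotspot locus
    if call.protein_effect not in effects:
        return False  # effect differs from any reported diagnostic mutation
    return call.variant_reads >= min_variant_reads and call.total_reads >= min_total_reads


# Toy lookup and calls (illustrative only).
toy_reported = {("GENE_A", 100): {"R100H", "frameshift"}}
strict_hit = Call("GENE_A", 100, "R100H", 3, 25)
low_depth = Call("GENE_A", 100, "R100H", 3, 19)
two_reads = Call("GENE_A", 100, "R100H", 2, 40)
other_effect = Call("GENE_A", 100, "R100C", 10, 40)
off_hotspot = Call("GENE_A", 101, "R100H", 10, 40)
```

Passing `min_variant_reads=2` corresponds to the relaxed classification described above; every other criterion is unchanged.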

General CHIP Assessment

We utilized the specified mutations and domains provided in Jaiswal and colleagues (10) to assess general CHIP (clonal mutations not restricted to hotspot mutations previously reported in hematologic cancers, but located in genes or domains that have been associated with hematopoiesis). We performed filtering for mappability, sequencing artifacts, and recurrence as we did for CHIP at hotspots, and additionally imposed a VAF threshold of 4%. We also assessed the sensitivity of CHIP prevalence by gene to lower VAF thresholds. We calculated the proportion of hotspots in our general CHIP calls at the gene level and overall, and similarly computed these rates on the data of previously published cohorts (5, 10).
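How prevalence responds to the VAF threshold can be sketched with a short calculation. The example assumes VAF is computed as variant reads divided by total reads and that a call passes when its VAF is at or above the threshold; the text does not state whether the 4% bound is inclusive, so that choice, the function names, and the toy data are all assumptions made for illustration.

```python
def vaf(variant_reads, total_reads):
    """Variant allele fraction: variant-supporting reads over total reads at the locus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def prevalence_by_threshold(calls_by_person, thresholds):
    """Return {threshold: fraction of persons with at least one call at or above it}.

    calls_by_person maps each person (including persons with no calls) to a list
    of (variant_reads, total_reads) pairs that already passed the other filters.
    """
    n_persons = len(calls_by_person)
    result = {}
    for threshold in thresholds:
        carriers = sum(
            1
            for calls in calls_by_person.values()
            if any(vaf(v, d) >= threshold for v, d in calls)
        )
        result[threshold] = carriers / n_persons
    return result


# Toy cohort of four persons (illustrative only).
toy_calls = {
    "A": [(5, 100)],             # VAF 0.05
    "B": [(3, 100)],             # VAF 0.03
    "C": [],                     # no candidate calls
    "D": [(2, 100), (10, 100)],  # VAF 0.02 and 0.10
}

prevalence = prevalence_by_threshold(toy_calls, thresholds=(0.04, 0.02))
```

Lowering the threshold can only add carriers, so prevalence is non-decreasing as the threshold falls; the same loop could be run per gene to mirror the gene-level sensitivity assessment.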

Statistical Analysis

We used logistic regression analysis to determine the association of year of age with relaxed CHIP mutation occurrence, with age being mean centered. We also used logistic regression to assess the association between the frequency of reported mutations at hotspots and binary detection of CHIP at those loci. As the frequency of reported mutations was right skewed, the natural logarithm of this variable was used. Sensitivity analyses using unlogged data as well as log base 10 showed similar results. Deviation from expected recurrence frequencies for mutations based on binomial distributions was used to determine the effect of various potential hotspot threshold criteria. The association between variant allele fraction and reported recurrence groups was tested with the Wilcoxon–Mann–Whitney test. A two-sided Fisher exact test was used to determine the significance of differences in prevalence for the different variant thresholds and cohort pairings assessed. Additional statistical details are provided in the Supplementary Methods. All statistical computations were performed with SAS software, version 9.4 (SAS Institute).
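For readers who want to see what the two-sided Fisher exact test computes when comparing prevalence between two groups, a minimal pure-Python sketch follows. The analyses reported here were run in SAS; this code is only an illustration using the common definition that sums the probabilities of all tables no more likely than the observed one, and the counts in the example are arbitrary rather than taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows could be two cohorts and columns carriers versus noncarriers. With the
    margins fixed, the top-left cell follows a hypergeometric distribution; the
    P value sums the probabilities of all tables as likely as or less likely
    than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )


# Arbitrary small table (illustrative only): 3 of 4 versus 1 of 4.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

For this table the possible top-left counts 0 through 4 have probabilities 1/70, 16/70, 36/70, 16/70, and 1/70, so the two-sided P value is 34/70.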

J.L. Rodriguez-Flores is a full-time employee of Regeneron Pharmaceuticals Inc. C.C. Mason reports other from Intermountain Healthcare Foundation (funding for the Pediatric Cancer Program) and Primary Children's Hospital Foundation (funding for the Pediatric Cancer Program), and grants from Primary Children's Center for Personalized Medicine during the conduct of the study. No disclosures were reported by the other authors.

The content and expressed viewpoints are those of the authors only.

J.E. Feusier: Data curation, software, validation, investigation, writing–original draft, writing–review and editing. S. Arunachalam: Data curation, software, validation, investigation, writing–review and editing. T. Tashi: Conceptualization, writing–review and editing. M.J. Baker: Data curation, writing–review and editing. C. VanSant-Webb: Data curation. A. Ferdig: Data curation, writing–review and editing. B.E. Welm: Funding acquisition, writing–review and editing. J.L. Rodriguez-Flores: Validation, writing–review and editing. C. Ours: Conceptualization, writing–review and editing. L.B. Jorde: Conceptualization, funding acquisition, writing–review and editing. J.T. Prchal: Conceptualization, writing–review and editing. C.C. Mason: Conceptualization, resources, data curation, software, formal analysis, supervision, funding acquisition, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing.

The authors thank the many researchers who published the data and details of their studies as well as the participants in those studies, and acknowledge their original publications and the European Nucleotide Archive as sources of the primary data. They also thank Allyson Mower and Michele Ballantyne for invaluable advice, and the many publishers who granted text and data mining permission. C.C. Mason acknowledges Pediatric Cancer Program funding, which is supported by the Intermountain Healthcare and Primary Children's Hospital Foundations as well as the Department of Pediatrics and Division of Pediatric Hematology/Oncology at the University of Utah. Some work by J.E. Feusier was supported by National Center for Advancing Translational Sciences (NCATS)/NIH (UL1TR002538/TL1TR002540). Some work by S. Arunachalam was funded by a U.S. Department of Defense grant (W81XWH-14-1-0417; to B.E. Welm). L.B. Jorde acknowledges funding from NIH (GM118335/GM059290). The support and resources from the Center for High Performance Computing at the University of Utah are gratefully acknowledged.

1. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, et al. Comprehensive characterization of cancer driver genes and mutations. Cell 2018;173:371–85.
2. Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotech 2016;34:155–63.
3. Ma X, Liu Y, Liu Y, Alexandrov LB, Edmonson MN, Gawad C, et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature 2018;555:371–6.
4. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res 2019;47:D941–7.
5. Jaiswal S, Fontanillas P, Flannick J, Manning A, Grauman PV, Mar BG, et al. Age-related clonal hematopoiesis associated with adverse outcomes. N Engl J Med 2014;371:2488–98.
6. Genovese G, Kähler AK, Handsaker RE, Lindberg J, Rose SA, Bakhoum SF, et al. Clonal hematopoiesis and blood-cancer risk inferred from blood DNA sequence. N Engl J Med 2014;371:2477–87.
7. Abelson S, Collord G, Ng SWK, Weissbrod O, Mendelson Cohen N, Niemeyer E, et al. Prediction of acute myeloid leukaemia risk in healthy individuals. Nature 2018;559:400–4.
8. Xie M, Lu C, Wang J, McLellan MD, Johnson KJ, Wendl MC, et al. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat Med 2014;20:1472–8.
9. McKerrell T, Park N, Moreno T, Grove CS, Ponstingl H, Stephens J, et al. Leukemia-associated somatic mutations drive distinct patterns of age-related clonal hemopoiesis. Cell Rep 2015;10:1239–45.
10. Jaiswal S, Natarajan P, Silver AJ, Gibson CJ, Bick AG, Shvartz E, et al. Clonal hematopoiesis and risk of atherosclerotic cardiovascular disease. N Engl J Med 2017;377:111–21.
11. Andersson AK, Ma J, Wang J, Chen X, Gedman AL, Dang J, et al. The landscape of somatic mutations in infant MLL-rearranged acute lymphoblastic leukemias. Nat Genet 2015;47:330–7.
12. Chen B, Jiang L, Zhong ML, Li JF, Li BS, Peng LJ, et al. Identification of fusion genes and characterization of transcriptome features in T-cell acute lymphoblastic leukemia. Proc Natl Acad Sci U S A 2018;115:373–8.
13. De Keersmaecker K, Atak ZK, Li N, Vicente C, Patchett S, Girardi T, et al. Exome sequencing identifies mutation in CNOT3 and ribosomal genes RPL5 and RPL10 in T-cell acute lymphoblastic leukemia. Nat Genet 2013;45:186–90.
14. Holmfeldt L, Wei L, Diaz-Flores E, Walsh M, Zhang J, Ding L, et al. The genomic landscape of hypodiploid acute lymphoblastic leukemia. Nat Genet 2013;45:242–52.
15. Liu YF, Wang BY, Zhang WN, Huang JY, Li BS, Zhang M, et al. Genomic profiling of adult and pediatric B-cell acute lymphoblastic leukemia. EBioMedicine 2016;8:173–83.
16. Liu Y, Easton J, Shao Y, Maciaszek J, Wang Z, Wilkinson MR, et al. The genomic landscape of pediatric and young adult T-lineage acute lymphoblastic leukemia. Nat Genet 2017;49:1211–8.
17. Oshima K, Khiabanian H, da Silva-Almeida AC, Tzoneva G, Abate F, Ambesi-Impiombato A, et al. Mutational landscape, clonal evolution patterns, and role of RAS mutations in relapsed acute lymphoblastic leukemia. Proc Natl Acad Sci U S A 2016;113:11306–11.
18. Papaemmanuil E, Rapado I, Li Y, Potter NE, Wedge DC, Tubio J, et al. RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia. Nat Genet 2014;46:116–25.
19. Paulsson K, Lilljebjörn H, Biloglav A, Olsson L, Rissler M, Castor A, et al. The genomic landscape of high hyperdiploid childhood acute lymphoblastic leukemia. Nat Genet 2015;47:672–6.
20. Russell LJ, Jones L, Enshaei A, Tonin S, Ryan SL, Eswaran J, et al. Characterisation of the genomic landscape of CRLF2-rearranged acute lymphoblastic leukemia. Genes Chromosomes Cancer 2017;56:363–72.
21. Ryan SL, Matheson E, Grossmann V, Sinclair P, Bashton M, Schwab C, et al. The role of the RAS pathway in iAMP21-ALL. Leukemia 2016;30:1824–31.
22. Stengel A, Schnittger S, Weissmann S, Kuznia S, Kern W, Kohlmann A, et al. TP53 mutations occur in 15.7% of ALL and are associated with MYC-rearrangement, low hypodiploidy, and a poor prognosis. Blood 2014;124:251–8.
23. Huether R, Dong L, Chen X, Wu G, Parker M, Wei L, et al. The landscape of somatic mutations in epigenetic regulators across 1,000 paediatric cancer genomes. Nat Commun 2014;5:3630.
24. Bolouri H, Farrar JE, Triche T Jr, Ries RE, Lim EL, Alonzo TA, et al. The molecular landscape of pediatric acute myeloid leukemia reveals recurrent structural alterations and age-specific mutational interactions. Nat Med 2018;24:103–12.
25. de Rooij JD, Branstetter C, Ma J, Li Y, Walsh MP, Cheng J, et al. Pediatric non-Down syndrome acute megakaryoblastic leukemia is characterized by distinct genomic subsets with varying outcomes. Nat Genet 2017;49:451–6.
26. Dolnik A, Engelmann JC, Scharfenberger-Schmeer M, Mauch J, Kelkenberg-Schade S, Haldemann B, et al. Commonly altered genomic regions in acute myeloid leukemia are enriched for somatic mutations involved in chromatin remodeling and splicing. Blood 2012;120:e83–92.
27. Eisfeld AK, Kohlschmidt J, Schwind S, Nicolet D, Blachly JS, Orwick S, et al. Mutations in the CCND1 and CCND2 genes are frequent events in adult patients with t(8;21)(q22;q22) acute myeloid leukemia. Leukemia 2017;31:1278–85.
28. Eisfeld AK, Kohlschmidt J, Mrózek K, Volinia S, Blachly JS, Nicolet D, et al. Mutational landscape and gene expression patterns in adult acute myeloid leukemias with monosomy 7 as a sole abnormality. Cancer Res 2017;77:207–18.
29. Eisfeld AK, Kohlschmidt J, Mrózek K, Blachly JS, Nicolet D, Kroll K, et al. Adult acute myeloid leukemia with trisomy 11 as the sole abnormality is characterized by the presence of five distinct gene mutations: MLL-PTD, DNMT3A, U2AF1, FLT3-ITD and IDH2. Leukemia 2016;30:2254–8.
30. Faber ZJ, Chen X, Gedman AL, Boggs K, Cheng J, Ma J, et al. The genomic landscape of core-binding factor acute myeloid leukemias. Nat Genet 2016;48:1551–6.
31. Farrar JE, Schuback HL, Ries RE, Wai D, Hampton OA, Trevino LR, et al. Genomic profiling of pediatric acute myeloid leukemia reveals a changing mutational landscape from disease diagnosis to relapse. Cancer Res 2016;76:2197–205.
32. Garg M, Nagata Y, Kanojia D, Mayakonda A, Yoshida K, Haridas Keloth S, et al. Profiling of somatic mutations in acute myeloid leukemia with FLT3-ITD at diagnosis and relapse. Blood 2015;126:2491–501.
33. Greif PA, Hartmann L, Vosberg S, Stief SM, Mattes R, Hellmann I, et al. Evolution of cytogenetically normal acute myeloid leukemia during therapy and relapse: an exome sequencing study of 50 patients. Clin Cancer Res 2018;24:1716–26.
34. Hirsch P, Zhang Y, Tang R, Joulin V, Boutroux H, Pronier E, et al. Genetic hierarchy and temporal variegation in the clonal history of acute myeloid leukaemia. Nat Commun 2016;7:12475.
35. Lavallée VP, Baccelli I, Krosl J, Wilhelm B, Barabé F, Gendron P, et al. The transcriptomic landscape and directed chemical interrogation of MLL-rearranged acute myeloid leukemias. Nat Genet 2015;47:1030–7.
36. Cancer Genome Atlas Research Network, Ley TJ, Miller C, Ding L, Raphael BJ, Mungall AJ, et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 2013;368:2059–74.
37. Madan V, Shyamsunder P, Han L, Mayakonda A, Nagata Y, Sundaresan J, et al. Comprehensive mutational analysis of primary and relapse acute promyelocytic leukemia. Leukemia 2016;30:1672–81.
38. Papaemmanuil E, Gerstung M, Bullinger L, Gaidzik VI, Paschka P, Roberts ND, et al. Genomic classification and prognosis in acute myeloid leukemia. N Engl J Med 2016;374:2209–21.
39. Sehgal AR, Gimotty PA, Zhao J, Hsu JM, Daber R, Morrissette JD, et al. DNMT3A mutational status affects the results of dose-escalated induction therapy in acute myelogenous leukemia. Clin Cancer Res 2015;21:1614–20.
40. Sood R, Hansen NF, Donovan FX, Carrington B, Bucci D, Maskeri B, et al. Somatic mutational landscape of AML with inv(16) or t(8;21) identifies patterns of clonal evolution in relapse leukemia. Leukemia 2016;30:501–4.
41. Thol F, Klesse S, Köhler L, Gabdoulline R, Kloos A, Liebich A, et al. Acute myeloid leukemia derived from lympho-myeloid clonal hematopoiesis. Leukemia 2017;31:1286–95.
42. Kim T, Tyndel MS, Kim HJ, Ahn JS, Choi SH, Park HJ, et al. Spectrum of somatic mutation dynamics in chronic myeloid leukemia following tyrosine kinase inhibitor therapy. Blood 2017;129:38–47.
43. Togasaki E, Takeda J, Yoshida K, Shiozawa Y, Takeuchi M, Oshima M, et al. Frequent somatic mutations in epigenetic regulators in newly diagnosed chronic myeloid leukemia. Blood Cancer J 2017;7:e559.
44. Mason CC, Khorashad JS, Tantravahi SK, Kelley TW, Zabriskie MS, Yan D, et al. Age-related mutations and chronic myelomonocytic leukemia. Leukemia 2016;30:906–13.
45. Merlevede J, Droin N, Qin T, Meldi K, Yoshida K, Morabito M, et al. Mutation allele burden remains unchanged in chronic myelomonocytic leukaemia responding to hypomethylating agents. Nat Commun 2016;7:10767.
46. Palomo L, Garcia O, Arnan M, Xicoy B, Fuster F, Cabezón M, et al. Targeted deep sequencing improves outcome stratification in chronic myelomonocytic leukemia with low risk cytogenetic features. Oncotarget 2016;7:57021–35.
47. Patnaik MM, Barraco D, Lasho TL, Finke CM, Hanson CA, Ketterling RP, et al. DNMT3A mutations are associated with inferior overall and leukemia-free survival in chronic myelomonocytic leukemia. Am J Hematol 2017;92:56–61.
48. Caye A, Strullu M, Guidez F, Cassinat B, Gazal S, Fenneteau O, et al. Juvenile myelomonocytic leukemia displays mutations in components of the RAS pathway and the PRC2 network. Nat Genet 2015;47:1334–40.
49. Stieglitz E, Taylor-Weiner AN, Chang TY, Gelston LC, Wang YD, Mazor T, et al. The genomic landscape of juvenile myelomonocytic leukemia. Nat Genet 2015;47:1326–33.
50. Haferlach T, Nagata Y, Grossman V, Okuno Y, Bacher U, Nagae G, et al. Landscape of genetic lesions in 944 patients with myelodysplastic syndromes. Leukemia 2014;28:241–7.
51. Pastor V, Hirabayashi S, Karow A, Wehrle J, Kozyra EJ, Nienhold R, et al. Mutational landscape in children with myelodysplastic syndromes is distinct from adults: specific somatic drivers and novel germline variants. Leukemia 2017;31:759–62.
52. Walter MJ, Shen D, Shao J, Ding L, White BS, Kandoth C, et al. Clonal diversity of recurrently mutated genes in myelodysplastic syndromes. Leukemia 2013;27:1275–82.
53. Yoshida K, Sanada M, Shiraishi Y, Nowak D, Nagata Y, Yamamoto R, et al. Frequent pathway mutations of splicing machinery in myelodysplasia. Nature 2011;478:64–9.
54. Churpek JE, Pyrtel K, Kanchi KL, Shao J, Koboldt D, Miller CA, et al. Genomic analysis of germ line and somatic variants in familial myelodysplasia/acute myeloid leukemia. Blood 2015;126:2484–90.
55. Schwartz JR, Ma J, Lamprecht T, Walsh M, Wang S, Bryant V, et al. The genomic landscape of pediatric myelodysplastic syndromes. Nat Commun 2017;8:1557.
56. Lundberg P, Karow A, Nienhold R, Looser R, Hao-Shen H, Nissen I, et al. Clonal evolution and clinical correlates of somatic mutations in myeloproliferative neoplasms. Blood 2014;123:2220–8.
57. Nangalia J, Massie CE, Baxter EJ, Nice FL, Gundem G, Wedge DC, et al. Somatic CALR mutations in myeloproliferative neoplasms with nonmutated JAK2. N Engl J Med 2013;369:2391–405.
58. Steensma DP, Bejar R, Jaiswal S, Lindsley RC, Sekeres MA, Hasserjian RP, et al. Clonal hematopoiesis of indeterminate potential and its distinction from myelodysplastic syndromes. Blood 2015;126:9–16.
59. Papaemmanuil E, Gerstung M, Malcovati L, Tauro S, Gundem G, Van Loo P, et al. Clinical and biological implications of driver mutations in myelodysplastic syndromes. Blood 2013;122:3616–27.
60. Sanders SJ, Murtha MT, Gupta AR, Murdoch JD, Raubeson MJ, Willsey AJ, et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 2012;485:237–41.
61. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, et al. An integrated map of structural variation in 2,504 human genomes. Nature 2015;526:75–81.
62. Fakhro KA, Staudt MR, Ramstetter MD, Robay A, Malek JA, Badii R, et al. The Qatar genome: a population-specific tool for precision medicine in the Middle East. Hum Genome Var 2016;3:16016.
63. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 2020;581:434–43.
64. Knudson AG Jr. Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A 1971;68:820–3.
65. Lim ET, Uddin M, De Rubeis S, Chan Y, Kamumbu AS, Zhang X, et al. Rates, distribution and implications of postzygotic mosaic mutations in autism spectrum disorder. Nat Neurosci 2017;20:1217–24.
66. Tatton-Brown K, Zachariou A, Loveday C, Renwick A, Mahamdallie S, Aksglaede L, et al. The Tatton-Brown-Rahman syndrome: a clinical study of 55 individuals with de novo constitutive DNMT3A variants. Wellcome Open Res 2018;3:46.
67. Yoshizato T, Dumitriu B, Hosokawa K, Makishima H, Yoshida K, Townsley D, et al. Somatic mutations and clonal hematopoiesis in aplastic anemia. N Engl J Med 2015;373:35–47.
68. Li Y, Roberts ND, Wala JA, Shapira O, Schumacher SE, Kumar K, et al. Patterns of somatic structural variation in human cancer genomes. Nature 2020;578:112–21.
69. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature 2015;526:68–74.