Although replication repair deficiency, whether through mismatch repair deficiency (MMRD), loss of DNA polymerase proofreading, or both, can cause hypermutation in cancer, microsatellite instability (MSI) is considered a hallmark of MMRD alone. By genome-wide analysis of tumors with germline and somatic deficiencies in replication repair, we reveal a novel association between loss of polymerase proofreading and MSI, especially when both components are lost. Analysis of indels in microsatellites (MS-indels) identified five distinct signatures (MS-sigs). MMRD MS-sigs are dominated by multibase losses, whereas mutant-polymerase MS-sigs contain primarily single-base gains. MS deletions in MMRD tumors depend on the original size of the MS and converge to a preferred length, providing mechanistic insight. Finally, we demonstrate that MS-sigs can be a powerful clinical tool for managing individuals with germline MMRD and replication repair–deficient cancers, as they can detect the replication repair deficiency in normal cells and predict response to immunotherapy.

Significance:

Exome- and genome-wide MSI analysis reveals novel signatures that are uniquely attributed to mismatch repair and DNA polymerase. This provides new mechanistic insight into MS maintenance and can be applied clinically for diagnosis of replication repair deficiency and immunotherapy response prediction.

This article is highlighted in the In This Issue feature, p. 995

The two major mechanisms that safeguard the fidelity of genome replication are the DNA mismatch repair machinery and DNA polymerase proofreading (ϵ or δ, encoded by POLE and POLD1; refs. 1–4). Deficiencies of mismatch repair (MMRD) or polymerase proofreading can occur independently or simultaneously, leading to combined DNA replication repair deficiency (RRD). Both mechanisms maintain genome stability by correcting misincorporated noncomplementary nucleotides. Thus, when either or both pathways are inactivated, there is an increased rate of replicative mutagenesis that leads to a wide array of hypermutant cancers (1, 2, 4). The MMR machinery also repairs insertions and deletions in stretches of DNA with small repeated motifs [called microsatellites (MS)], which are created as a result of DNA polymerase slippage (4, 5). Nevertheless, the impact of DNA polymerase on the accumulation of these MS insertions and deletions (MS-indels) in human cancer is currently not well described (6–9).

The increased number of MS-indels in MMRD tumors is marked by MS instability (MSI). MSI is used clinically for tumor stratification and for referral to germline testing for cancer predisposition, and has recently been approved as a biomarker for immune checkpoint inhibitors (ICI; refs. 10–13). MSI is commonly detected by comparing MS lengths of a limited panel of MSs (five loci) between a patient's tumor and germline DNA. The Cancer Genome Atlas (TCGA) used this approach to stratify tumors as MS unstable [MSI-high (MSI-H)] or MS stable (MSS) for colorectal, gastric, and endometrial tumors. However, this stratification approach has recently been challenged as being limited in scope, due to the misclassification of some MMRD cancers (6, 14). Importantly, using this assay, polymerase mutant cancers do not exhibit high MSI (2, 15, 16), and therefore, in humans, the role of polymerase mutations in the accumulation of MS-indels in cancer is currently considered largely unknown.

The most common cause of MMRD in human cancers is somatic hypermethylation of the MLH1 promoter, whereas others are due to LOH from the germline allele in any one of the MMR genes, MSH2, MSH6, MLH1, and PMS2 (17, 18). Insight regarding the pathogenesis and clinical behavior of RRD can be gained by studying human syndromes with germline mutations in these genes. Lynch syndrome is caused by germline heterozygous inactivating mutations in one of the MMR genes and is followed by somatic loss of the remaining allele, leading to MMR deficiency. Because it requires loss of only one allele, Lynch syndrome is associated with earlier cancer onset than somatic MMRD (19). However, although less common, the most aggressive form of MMRD stems from germline biallelic inactivating mutations in one of the MMR genes. This condition, termed constitutional MMRD (CMMRD), is one of the most aggressive cancer syndromes in humans, in which virtually all individuals will be affected by a variety of cancers during childhood, and all share hypermutation and specific mutational signatures (1, 2, 4, 20). Similarly, heterozygous germline polymerase mutations in the exonuclease (proofreading) domain of POLE or POLD1 (1) result in a cancer syndrome termed polymerase proofreading deficiency (PPD), which is less well described but also leads to hypermutant cancers (1, 2, 7, 20–22). Importantly, MS-indels in either of these cancer predisposition syndromes have not been previously investigated.

In patients with CMMRD, somatic mutations in the polymerase genes result in combined dysfunction of the replication repair machinery, leading to a dramatic accumulation of single-nucleotide variations (SNV; refs. 1, 2, 7, 20–22). Analysis of SNV-based mutational signatures and their timing also enabled the prediction of early germline events and late treatment-related processes in each cancer that correlate with the type of RRD (2). Moreover, cancers with both MMRD and PPD have unique SNV signatures that are characteristic of the interaction between the two DNA repair mechanisms (7). Nevertheless, SNV-based analyses fail to provide information on several biological and clinical questions related to RRD carcinogenesis. These include insights regarding the mechanisms of mutagenesis during RRD tumor initiation and progression, explaining genotype–phenotype correlations between different mutations in MMR and polymerase genes, and the ability to detect RRD in normal cells. These are highly important for the management of these patients and their tumors, and for implementing surveillance for family members.

Although indel-based signatures were recently reported (23), the accumulation of MS-indels and their potential signatures in MMRD during tumorigenesis are not well described (24–26). Furthermore, the impact of polymerase mutations on the accumulation of MS-indels and their corresponding signatures in cancer is not known (6–9).

To answer the above biological and clinical questions, we used our large cohort of carefully annotated cancers and normal tissues from patients with germline mutations in the MMR and polymerase genes. We analyzed genomic MS-indels from this cohort using whole-exome sequencing (WES) and whole-genome sequencing (WGS), thoroughly investigating the roles of both replication repair machineries in correcting MS alterations, and their impact on cancer progression and response to therapy. Comparison of these tumors with adult RRD cancers and pediatric non-RRD tumors uncovered: (i) important insights into the accumulation of MS-indels in RRD cancers; (ii) a novel and distinct association between polymerase mutations and MS-indel accumulation; and (iii) a potential mechanism of MMRD that produces large deletions in MSs by single events that converge to an MS-locus length of approximately 15 bp. We then considered using the activity levels of different MS-indel signatures for tumor stratification and genetic diagnosis, and as biomarkers of response to immunotherapy.

A Distinct Pattern of MSI in CMMRD Cancers

Because all cells from patients with CMMRD have impaired abilities to repair mismatches and MS-indels (1, 2), MSI should be prevalent in all normal and cancerous tissues. To test this hypothesis, we analyzed the MSI status of a cohort of 96 CMMRD cancers and 8 normal tissues using the current gold-standard method, the Microsatellite Instability Analysis System (MIAS, Promega; refs. 27, 28). Strikingly, only 17 (18%) of the 96 CMMRD cancers and none of the normal tissues were classified as MSI-H (≥2 loci with ≥3 bp indels; Fig. 1A). This observation was not related to the specific MMR-mutated gene (P = 0.95, Kruskal–Wallis, Supplementary Fig. S1A), and similarly, different tumors from the same patient resulted in different MSI classifications (Fig. 1A; Supplementary Fig. S1B; Supplementary Table S1). Interestingly, we observed tissue specificity, with gastrointestinal (GI) cancers having the highest proportion of MSI-H (10/25, 40%) compared with brain tumors (2/51, 4%, P = 2 × 10⁻⁴, χ² test).
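The MSI-H criterion quoted above (≥2 loci with ≥3 bp indels) can be illustrated as a simple rule. The following is a minimal sketch, not the vendor's implementation: it assumes each panel locus is summarized as a tumor-minus-normal length shift in bp, and the function name, locus labels, and example values are hypothetical.

```python
def classify_msi(shifts_bp, min_shift_bp=3, min_unstable_loci=2):
    """Apply the panel rule from the text: MSI-H if at least
    `min_unstable_loci` loci show an indel of >= `min_shift_bp` bp.

    `shifts_bp` maps a (hypothetical) locus label to the signed
    tumor-minus-normal length difference in bp.
    """
    unstable = [locus for locus, shift in shifts_bp.items()
                if abs(shift) >= min_shift_bp]
    return "MSI-H" if len(unstable) >= min_unstable_loci else "not MSI-H"


# Hypothetical tumor with large shifts at two of five loci.
large_shifts = {"locus_1": -4, "locus_2": -3, "locus_3": 0,
                "locus_4": -1, "locus_5": 0}
# Hypothetical tumor with only 1 bp shifts at every locus.
small_shifts = {"locus_1": -1, "locus_2": -1, "locus_3": 1,
                "locus_4": -1, "locus_5": -1}

print(classify_msi(large_shifts))
print(classify_msi(small_shifts))
```

Under this rule, a tumor whose panel loci carry only 1 bp indels is never called MSI-H, however many loci are affected.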

It is unclear whether the lack of indels in the five MIAS MS-loci is a feature unique to these loci, or whether MS-loci in CMMRD tumors universally have a low rate of indel accumulation. To test this, we applied MSMuTect to all of the MS-loci in the 69 tumor/normal whole-exome pairs of pediatric CMMRD cancers (6). We found that pediatric CMMRD cancers had a significantly higher tumor MS-indel burden (TMSIB) than pediatric MMR-proficient (MMRP; n = 239) cancers (P < 2.0 × 10⁻¹⁵, Mann–Whitney U test, Fig. 1B). Importantly, pediatric CMMRD cases had a similar TMSIB to the adult MMRD cases (n = 114, P = 0.37), and no significant difference was found in the TMSIB across pediatric cases from different tissues (Fig. 1B). On the other hand, the number of SNVs in the pediatric tumors was higher due to the somatic POLE mutations commonly found in CMMRD tumors (2). Overall, our method demonstrates that, as predicted by the lack of MMR, pediatric CMMRD cancers indeed have an increased rate of MS-indel accumulation, similar to adult MMRD cancers.

MS-indels are spread equally on different chromosomes in both adult and pediatric cohorts (Fig. 1C; Supplementary Table S2), and increase between initial and relapsed tumors (ref. 29; n = 8; P = 0.014, Paired-samples Wilcoxon test; Fig. 1D). Furthermore, in contrast to adult MMRD tumors which exhibit highly mutated loci (refs. 6, 14; hotspots) like those used in the MIAS assay, pediatric MMRD tumors lacked this characteristic. For example, the MS in the ACVR2A gene (chr2:148683686-148683693) is mutated in approximately 45% of adult MMRD cases, but only 11% in pediatric CMMRD cases (Supplementary Table S3). In addition, although adult tumors have 19 hotspots that are mutated in more than 30% of the cases, the strongest hotspot in pediatric tumors is mutated in approximately 20% (Fig. 1E; Supplementary Table S4). Together, these observations suggest that MS-indels do accumulate in childhood CMMRD cancers and continue to accumulate as tumors progress. Furthermore, the key differences between adult and childhood MMRD cancers, i.e., the different mutated loci and the lack of MS-loci that are mutated very frequently, may explain why the MIAS assay was unable to detect MSI in pediatric tumors.

MMRD and Polymerase Mutant Tumors Have Distinct MS-Indel Signatures

Another striking observation was that childhood tumors had a similar number of MS-indels as adult MMRD cancers, despite being largely MSS in the MIAS analysis (Fig. 1A). This was most clearly observed between childhood CMMRD brain tumors and adult GI cancers (P = 0.27, Mann–Whitney U test; Fig. 1B), as well as between pediatric and adult GI cancers (P = 0.37, Mann–Whitney U test; Fig. 1B). Because CMMRD tumors acquire somatic polymerase mutations, which are known to dramatically increase SNV mutations in cancer and MS-indels in yeast, we hypothesized that mutant polymerases actively contribute to MS-indel accumulation in pediatric CMMRD tumors (3, 30, 31). We therefore compared MS-indels found in WES of 239 pediatric MMRP, 38 MMRD, 4 polymerase mutant, and 31 combined RRD (MMRD and polymerase mutant) tumors with 505 adult endometrial cancers spanning the same four subtypes. Despite proficiency in MMR, polymerase mutant cancers exhibited increased MS-indel burden in both pediatric and adult cancers (P = 0.0071, P = 1.1 × 10⁻⁸, respectively; Fig. 2A). Moreover, combined RRD cancers have an increased MS-indel burden compared with MMRD cancers in the pediatric cohort (P = 1.2 × 10⁻⁶; Fig. 2A, left), further highlighting the contribution of polymerase mutations to the accumulation of MS-indels.

We then asked whether the two components of the replication repair machinery exert distinct alterations in MSs. To investigate this, we characterized each MS-indel based on three features: its motif (A or C), the MS length (5–20 bp), and the indel size (−3 to +3 bp; Methods). We then used SignatureAnalyzer (32, 33) to infer the different mutational signatures based on these 192 (= 2 × 16 × 6) configurations and found 5 distinct MS-indel signatures (MS-sigs; Fig. 2B; Supplementary Fig. S2A). When compared with MMRP cancers (Fig. 2B), MMRD cancers were enriched for two signatures dominated by 1 bp deletions in both A- and C-repeats (MS-sig1 and MS-sig3, Mann–Whitney U test, P < 2.2 × 10⁻¹⁶), whereas polymerase mutations resulted in two signatures dominated by 1 bp insertions (MS-sig2 and MS-sig4, P = 4.4 × 10⁻¹⁴). In addition, we analyzed RRD cancers with driver mutations in POLD1 (encoding the DNA polymerase that synthesizes the lagging strand; ref. 3; Supplementary Fig. S2B and S2C). As in POLEmut cancers, MS-indels in POLD1mut cancers were dominated by insertions (Supplementary Fig. S2B). To our knowledge, this is the first time that the unique mechanism of MS-indel accumulation by DNA polymerase is described in human cancers. Interestingly, our results are in agreement with several yeast studies, which showed a bias toward 1 bp insertions in MSs in mutant POLE strains as the leading strand polymerase (9, 34, 35), but contrast with other yeast reports which suggest that mutant POLD1 creates single-base deletions in MS sequences (8, 36). Together, these findings support that POLE and POLD1 with defective proofreading are associated with increased insertion events (as opposed to the deletion events associated with MMRD) and can contribute to the overall TMSIB in human cancers.
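The 192 configurations follow directly from the three features: 2 motifs × 16 MS lengths (5–20 bp) × 6 nonzero indel sizes (−3 to +3 bp). The sketch below enumerates them and tallies one sample's MS-indels into a count vector of the kind a signature-extraction tool could take as input. The names `CHANNELS` and `count_channels` and the example indels are hypothetical; the exact input format expected by SignatureAnalyzer is not shown here.

```python
from collections import Counter
from itertools import product

MOTIFS = ("A", "C")
MS_LENGTHS = tuple(range(5, 21))       # 5..20 bp inclusive: 16 values
INDEL_SIZES = (-3, -2, -1, 1, 2, 3)    # zero (no change) is not an indel

# One channel per (motif, MS length, indel size) combination.
CHANNELS = list(product(MOTIFS, MS_LENGTHS, INDEL_SIZES))


def count_channels(indels):
    """Tally (motif, ms_length_bp, indel_size_bp) tuples into a vector
    ordered like CHANNELS. Indels outside the defined channels are ignored.
    """
    counts = Counter(indels)
    return [counts.get(channel, 0) for channel in CHANNELS]


# Hypothetical sample: two 1 bp deletions in 10 bp A-repeats and
# one 1 bp insertion in a 5 bp C-repeat.
sample = [("A", 10, -1), ("A", 10, -1), ("C", 5, 1)]
vector = count_channels(sample)
print(len(CHANNELS), sum(vector))
```

Stacking one such vector per tumor gives a channels-by-samples count matrix, which is the usual starting point for factorization-based signature inference.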

Previously (2), we classified hypermutant cancers into three clinically relevant subtypes of RRD (MMRD, PPD, or both MMRD and PPD) based on their single-base substitution (SBS) signatures (COSMIC SBS signatures; Supplementary Fig. S3). We therefore examined the correlation of the MS-sigs to these clusters. MMRD and PPD (SBS-Cluster 1) are mostly seen in children with germline MMRD who later acquire somatic POLE mutations and fit a combination of MS-sig1 and MS-sig2. MS-sig1 is specific for MMRD-only cancers (SBS-Cluster 2) in both children and adults, and PPD (SBS-Cluster 3) cancers are highly driven by the mutant polymerase and have more MS-sig2 MS-indels than MS-sig1 (P = 3.2 × 10⁻⁵, P = 0.0053, and P < 2.2 × 10⁻¹⁶, respectively; Fig. 2C). These data suggest that both SNV-based signatures and MS-sigs provide additional complementary information to standard mutational calls and can be used to clinically screen and stratify tumors in the future.

Accumulation of Large MS Deletions in MMRD Cancers with Preferred Final Length

To gain insight into the mechanism of MS-indel accumulation during RRD tumorigenesis, we analyzed the patterns of MS-indels in whole genomes of 46 pediatric RRD cancers (39 brain, 4 colorectal, 1 osteochondroma, 1 Wilms tumor, and 1 T-cell lymphoma) and 28 MMRD adult cancers (12 colorectal, 5 gastric, and 11 endometrial). The genome provides approximately 50 times more MS-loci compared with the exome, and has disproportionally higher numbers of longer (>20 bp) MS-loci. These two features enable a more refined analysis of the MS-indel distribution. Adult MMRD cancers present (Fig. 3A; Supplementary Figs. S3, S4A, S4B; Supplementary Table S5) bias toward deletions similar to their behavior in WES (Fig. 2B). They also present a new phenomenon of accumulating long deletions (>3 bases), which increase in size with increasing MS-loci length. In contrast, pediatric CMMRD cases have mostly 1 bp deletions, and MMRD and PPD cancers are enriched with 1 bp insertions (Fig. 3A; Supplementary Figs. S3 and S4A–S4C; Supplementary Table S5) with a lack of long deletions. The lack of long MS-indels in pediatric cancers is likely the reason for the inability of the conventional electrophoresis methodologies (e.g., MIAS, Promega; Fig. 1A) to detect MSI in these cancers, as the standard assay cannot robustly identify short indels and considers only indels of ≥3 bases as robust events.

To better understand the kinetics of the accumulation of large deletions in MS-loci, we compared two known models: (i) a two-phase model, in which each mutational event can alter either a single repeat motif or multiple motifs; and (ii) a simple stepwise-mutation model, which assumes that only one repeat motif is altered per mutational event, making large deletions the amalgamation of multiple deletions (37, 38). The stepwise model predicts that the relationship between the MS-indel size and the frequency of the mutational events of that size will be unimodal (i.e., Poisson). On the other hand, the two-phase model enables a bimodal relationship between the MS-indel size and frequency (see Methods).
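The contrast between the two models can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the analysis in the Methods: it considers deletions only, assumes the number of events per locus is Poisson distributed, and models multi-unit events in the two-phase case with a geometric step size. All function names and parameter values (`lam`, `p_single`, `q`) are hypothetical and not fitted to any data.

```python
import math


def poisson_pmf(n, lam):
    return math.exp(-lam) * lam ** n / math.factorial(n)


def stepwise_size_pmf(max_size, lam):
    """Stepwise model: every event removes exactly one repeat unit, so the
    total deletion size equals the (Poisson) number of events."""
    return [poisson_pmf(k, lam) for k in range(max_size + 1)]


def two_phase_event_pmf(max_size, p_single, q):
    """Per-event size: one unit with probability p_single, otherwise
    two or more units with a geometric tail (truncated at max_size)."""
    pmf = [0.0] * (max_size + 1)
    pmf[1] = p_single
    for k in range(2, max_size + 1):
        pmf[k] = (1 - p_single) * q * (1 - q) ** (k - 2)
    return pmf


def two_phase_size_pmf(max_size, lam, p_single, q, max_events=30):
    """Total deletion size after a Poisson number of two-phase events
    (compound Poisson, computed by repeated convolution)."""
    event = two_phase_event_pmf(max_size, p_single, q)
    total = [0.0] * (max_size + 1)
    n_fold = [1.0] + [0.0] * max_size   # size distribution after 0 events
    for n in range(max_events + 1):
        weight = poisson_pmf(n, lam)
        for k in range(max_size + 1):
            total[k] += weight * n_fold[k]
        nxt = [0.0] * (max_size + 1)    # add one more event
        for i, mass in enumerate(n_fold):
            for j in range(1, max_size + 1 - i):
                nxt[i + j] += mass * event[j]
        n_fold = nxt
    return total


stepwise = stepwise_size_pmf(20, lam=1.0)
two_phase = two_phase_size_pmf(20, lam=1.0, p_single=0.8, q=0.3)
print(sum(stepwise[4:]), sum(two_phase[4:]))
```

With the same event rate, the two-phase sketch places more probability on deletions of four or more units than the stepwise sketch, which is the qualitative behavior that distinguishes the models when compared against observed deletion-size frequencies.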

The pediatric tumors follow the predictions of the stepwise model (39) and present a sharp decrease in the frequency of MS-indels as the deletion size increases. In contrast, the adult tumors follow the stepwise model only for short loci (<12 bp), and from 12 bp and onward, they deviate from the stepwise model and resemble the two-phase model (ref. 37; Fig. 3B; Supplementary Fig. S4D and S4E).

Interestingly, the large deletions in adult tumors reduced the length of long (>15 bp) MS-loci to become approximately 15 bp (Fig. 3C; Supplementary Fig. S4F). These findings suggest that there is an underlying mechanism of MS deletions that gives larger loci the tendency to converge to 15 bp in length. A potential model for this phenomenon is that after the DNA polymerase replicates 12 to 15 bp on an MS-locus, it may become more susceptible to slip off from the template strand. The width of the DNA-binding domain of the polymerase is known to be approximately 50 Å, which is approximately the combined length of 15 nucleotides (51 Å) in a DNA double helix (40). We predict that because 15 nucleotides span the entire DNA-binding domain, the weak association of the polymerase to repeated nucleotides makes the polymerase especially prone to detach from the template strand. As the new strand then rebinds to the template strand, the template strand may generate a loop to become fully complementary to the approximately 15 bp of the nascent strand (refs. 41, 42; Fig. 3D).

This process is restricted to deletions, hinting that it may be specific to MMR-deficient cells, whereas insertions in polymerase mutant cancers did not reveal the same large events (Supplementary Fig. S4A).

Clinical Applications of MS-Sigs

To begin assessing the potential diagnostic and predictive role of MS-indels in patients with RRD cancers, we first compared the ability of MS-sigs to diagnose RRD cancers with currently used SNV-based clinically approved tools (2). The MS-indel signatures presented above (MS-sig1 and MS-sig2) are based on detecting MS-indels by comparing pairs of tumor and normal samples. However, the characteristic patterns of MS-indels and the large number of MSs in the genome may enable analyzing tumors without the need for matched normal samples. To test this, we designed scores that reflect the prevalence of MS-sig1 (MMRDness score) and MS-sig2 (POLEness score, Methods) in any given sample. Both scores were calculated by taking the logarithm (base 10) of the ratio of the number of -1 deletions (MMRDness) or +1 insertions (POLEness) divided by the total number of MS-loci within the specified lengths (10–15 bp for MMRDness and 5–6 bp for POLEness, Methods). Higher scores indicate higher prevalence of the MMRDness and POLEness signatures in each sample. Only MS-sig1 and MS-sig2 were used due to the overwhelming dominance of A-repeat MS-loci in the genome compared with C-repeats, which are represented by MS-sig3 and MS-sig4. We calculated the two scores for a well-characterized set of tumors and were able to separate the four subtypes that we analyzed (MMRP, MMRD, PPD, and MMRD and PPD; Fig. 4A).
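The score definition above can be sketched literally as a log10 ratio. This is a minimal illustration that follows only the wording of this paragraph; the exact definition, including any scaling or sign convention, is in the Methods, so absolute values from this sketch need not match the reported score scale. It assumes each MS-locus is summarized as a (motif, length in bp, indel size in bp relative to the reference) tuple, restricts to A-repeats because only MS-sig1 and MS-sig2 are used, and returns negative infinity when no qualifying indel is seen. The function names and the example counts are hypothetical.

```python
import math


def signature_score(loci, indel_size, min_len, max_len, motif="A"):
    """log10 of (loci carrying `indel_size`) / (all loci in the length window).

    `loci` is an iterable of (motif, ms_length_bp, indel_size_bp) tuples,
    with indel_size_bp == 0 for an unaltered locus.
    """
    window = [size for m, length, size in loci
              if m == motif and min_len <= length <= max_len]
    hits = sum(1 for size in window if size == indel_size)
    if not window or hits == 0:
        return float("-inf")   # sketch-only choice for an undefined log
    return math.log10(hits / len(window))


def mmrdness(loci):
    # -1 deletions among A-repeat loci of 10-15 bp
    return signature_score(loci, -1, 10, 15)


def poleness(loci):
    # +1 insertions among A-repeat loci of 5-6 bp
    return signature_score(loci, +1, 5, 6)


# Hypothetical sample: 10 of 1,000 mid-length loci carry a 1 bp deletion,
# and 5 of 500 short loci carry a 1 bp insertion; C-repeats are ignored.
example = ([("A", 12, -1)] * 10 + [("A", 12, 0)] * 990 +
           [("A", 5, 1)] * 5 + [("A", 6, 0)] * 495 +
           [("C", 12, -1)] * 50)
print(mmrdness(example), poleness(example))
```

Because each score depends only on counts within one sample, it can in principle be computed without a matched normal, which is the property the text goes on to test.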

We then validated whether the MMRDness and POLEness scores can be used as a more cost-effective clinical tool for detecting MMRD and PPD. We sequenced 52 MMRP and RRD tumors with a small fraction of the standard genomic coverage (0.5X instead of the typical 30X) and calculated their MMRDness and POLEness. We observed similar clustering of the RRD subtypes (Supplementary Fig. S5A), validating their ability to accurately and cost-effectively classify RRD tumors. Subsequently, we tested whether the MS-sigs can improve the current SNV-based methods used to screen and diagnose RRD tumors (4). We assessed the MMRDness and POLEness scores (Methods) of 72 tumors for which WGS was available through our clinical SickKids Cancer Sequencing (KiCS) program (2). The MMRDness score uncovered four tumors (Fig. 4B; Supplementary Fig. S5B) that were not known to be MMRD by the sequencing center, as they had relatively low tumor mutational burdens and no SNV-based MMRD signatures (3–17 Mut/Mb; Supplementary Table S6). Genetic testing revealed that three of them had canonical Lynch syndrome germline mutations, and the remaining one carried a novel germline MMR mutation (Supplementary Table S6). This suggests that MMRDness and POLEness can be both a more accurate and less expensive test compared with current methods for detecting RRD in tumors.

Analysis of Whole-Genome MSI Signatures Can Diagnose CMMRD in Normal Cells

Having established the ability of MS-sigs and the corresponding MMRDness score to identify MMRD tumors, we tested the ability of the tool to identify germline (CMMRD) mutations in nonmalignant (blood) samples. All nonmalignant samples from patients with CMMRD (n = 34) had an elevated MMRDness score compared with samples from MMRP individuals, individuals with polymerase mutations, and patients with Lynch syndrome (P = 8.7 × 10⁻¹¹, Mann–Whitney U test; Fig. 4C; Supplementary Fig. S5C). Notably, the Promega panel assay was completely unable to detect CMMRD in nonmalignant tissues (Fig. 1A), and, similarly, SNV-based signatures are not able to detect hypermutation in normal cells. The MMRDness score requires neither a patient-matched normal control nor high sequencing depth to detect the MS-sig patterns throughout the genome that correspond to CMMRD. The high specificity and sensitivity of the MMRDness score (Fig. 4D) make it an inexpensive and robust assay for screening and diagnosing CMMRD prior to cancer, which can be used to implement early surveillance protocols to improve the survival of patients with CMMRD. On the other hand, the diagnostic role of POLEness is limited to RRD tumors (Fig. 4A), and further investigation is required to assess its ability to detect PPD in the germline.

Different Alterations of the MMR and Polymerase Genes Have Varying Effects on MMRDness and POLEness

The four genes that lead to MMRD when mutated (MLH1, MSH2, MSH6, and PMS2) play different roles in the MMR mechanism (43). However, thus far, no gene-specific footprint has been found in the mutational landscape. For example, germline mutations in MSH2 and MLH1 have higher penetrance than MSH6 and PMS2, and are found most commonly in adult Lynch syndrome cases. In contrast, for individuals with CMMRD, the reverse is observed. PMS2 is the most commonly mutated gene, followed by MSH6, and biallelic MSH2 germline mutations are extremely rare. These cancers have similar tumor mutation burdens (TMB) irrespective of the mutated MMR gene (1, 2); thus, the different penetrance cannot be simply explained by a higher mutation rate. Other functional assays have also failed to explain this genotype–phenotype relationship despite the different mechanistic roles of the MMR genes. Similarly, PPD is almost strictly observed with POLE but not POLD1 germline mutations, an observation that also cannot be explained by differences in TMB or other functional assays.

We therefore evaluated whether the deficiency in different genes may have distinct effects on the MMRDness score (Fig. 4E). We found that MMRD cancers with biallelic inactivating mutations in MSH2 had the strongest MMRDness score, followed by those with MLH1 mutations, and that the score was much lower in PMS2 mutant cancers (P = 6.7 × 10⁻⁴, Mann–Whitney U test; Fig. 4E). This fits well with the clinical observations of MMRD, as MSH2 and MLH1 mutations have higher cancer penetrance than PMS2 and MSH6. This also correlates with the known mechanism of MMR, as the MSH2 protein initially recognizes mismatches during replication and is crucial for both the mutSα and mutSβ complexes, whereas MLH1 is an important factor in the mutL complex (44–46).

Next, we investigated whether the two polymerase genes (POLE and POLD1) have a distinct effect on the POLEness score. POLD1mut cancers had significantly higher POLEness scores than POLEmut cancers (P = 2.7 × 10⁻⁴, Mann–Whitney U test; Fig. 4E), supporting our previous finding from the exomes (P = 2.3 × 10⁻⁴, Supplementary Fig. S2C), where POLD1mut cancers had a higher TMSIB than POLEmut. This can be explained by the replication of the lagging strand inherently having more dissociation and slippage of the polymerase from the DNA (16, 47).

POLEness Score Predicts Response to Immunotherapy in RRD Cancers

It was recently shown that RRD tumors tend to respond to ICI therapy (48). However, not all patients respond, and currently there is no adequate method to distinguish responders from nonresponders (Supplementary Fig. S6A). As MS-indels are highly immunogenic (10, 49–52), we hypothesized that within the context of RRD cancers, MMRDness and/or the POLEness score could predict response to ICI therapy. To test this, we initially used WGS data from 22 RRD tumors treated with ICI as a part of our consortium registry study (48). We observed that the POLEness score of responders was higher than that of nonresponders to ICI (P = 0.011, Mann–Whitney U test; Fig. 5A), and patients whose tumors had a high POLEness score (≥2.92 POLEness, Methods) had significantly longer survival than those in the low-POLEness group (P = 0.012, Log-rank test; Fig. 5A). We therefore tested the ability of the POLEness score to predict response to ICI using exomes from a larger cohort of patients (n = 28). POLEness scores were higher in responders to ICI therapy than in nonresponders (P = 0.0016, Mann–Whitney U test; Fig. 5B). However, neither exome nor genome MMRDness differed significantly between responders and nonresponders, potentially because all tumors in the cohort are MMRD (Supplementary Fig. S6B). The high-POLEness groups in both genome and exome had significantly longer survival times than the low-POLEness groups (P = 0.012 and 0.016, Log-rank test; Fig. 5A and B). To our knowledge, this is the first time that a mutational signature was suggested as a biomarker to distinguish between responders and nonresponders to ICI in RRD tumors (10, 49–51). This result sheds new light on the immunogenicity of MS-indels and the use of their signatures as novel biomarkers for response to immune checkpoint inhibition in the context of RRD.
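The two comparisons described above, a cutoff-based split into high and low POLEness groups and a rank-based comparison of responders with nonresponders, can be sketched as follows. This is an illustration only: the cutoff of 2.92 is taken from the text, whereas the function names, patient identifiers, and score values are hypothetical, and the sketch computes only the Mann–Whitney U statistic, not its P value or any survival analysis.

```python
POLENESS_CUTOFF = 2.92  # high-POLEness threshold quoted in the text


def stratify(scores, cutoff=POLENESS_CUTOFF):
    """Split patient IDs into (high, low) groups by POLEness score."""
    high = {pid for pid, score in scores.items() if score >= cutoff}
    low = set(scores) - high
    return high, low


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: number of (a, b) pairs with a > b,
    counting ties as one half."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical scores for five patients.
scores = {"pt1": 3.10, "pt2": 3.00, "pt3": 2.50, "pt4": 2.80, "pt5": 3.00}
responders = {"pt1", "pt2"}

high, low = stratify(scores)
u_stat = mann_whitney_u(
    [scores[p] for p in sorted(responders)],
    [scores[p] for p in sorted(set(scores) - responders)],
)
print(sorted(high), sorted(low), u_stat)
```

A U statistic close to the number of responder–nonresponder pairs indicates that responders tend to have the higher scores; in practice a statistics package would supply the accompanying P value.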

This report uncovers roles for both components of the replication repair machinery in MS-indel accumulation in cancers. These observations address important fundamental biological questions regarding accumulation of MS-indels during carcinogenesis, which can be translated to clinical use.

Our data reveal that in addition to defects in MMR, loss of proofreading by replication-specific DNA polymerases is also associated with an increase in MS-indels. MS-indels produced by the deficiencies of the two replicative repair mechanisms have distinct characteristics. Although MMRD generates mainly deletions (MS-sig1 and MS-sig3), PPD results predominantly in insertions (MS-sig2 and MS-sig4), suggesting a strand-bias repair process (Supplementary Fig. S7). The MS-sigs described were comparable to the ID-1 and ID-2 indel signatures reported in the study by Alexandrov and colleagues, which were common in tissues that are associated with RRD (23). However, Alexandrov and colleagues did not analyze MS-indels in loci longer than five repeated units, and therefore were not able to find the unique MS-sigs present in larger MS-loci.

We hypothesize that the proofreading domain of the polymerase repairs extra bases on the nascent strand by changing the conformation of the DNA to push the substrate into the exonuclease domain of the polymerase. This would activate the exonucleolytic mode of DNA polymerase to excise the loop. Because mutations in the proofreading domain of the polymerase would prevent the DNA substrate from shifting into the exonuclease site, these remain unrepaired as replication continues (34, 53), creating a permanent insertion. As larger insertions were not observed in PPD cancers, we postulate that larger indel loops within the nascent strand do not fit into the exonuclease domain of the polymerase (Supplementary Fig. S7). In contrast, the MMR system is more effective in detecting loops on the parental strand (3, 45, 54, 55). These loops would result in deletions if unrepaired, which translated into the surge of MS-sig1 and MS-sig3 in MMRD cancers in our cohort.

Another intriguing observation related to the role of MMR on the template strand is that larger deletions in adult MMRD cancers converged to a common length of MSs of approximately 15 bp (Fig. 3C). We propose that this 15 bp convergence is due to the size of the DNA-binding domain of the polymerase complex being similar to the length of 15 bp, which is about 50 Å (40). Indeed, these large events can explain the ease of detection of MSI-H in adult MMRD cancers by the standard electrophoresis-based assay that specifically looks for long (≥3 bp) MS-indels. Furthermore, pediatric tumors lacked hotspots or commonly mutated MS-loci even within specific tissues (Fig. 1E; Supplementary Table S3). In adults, MMRD cells can acquire MS-indels in the ACVR2A gene, which may lead to a selection advantage in the GI system, but not in other tissue types. Thus, the contrast between pediatric and adult cases may be due to the selective pressures that are present in different tissue types, although comparative analysis between pediatric and adult GI cancers still showed unique MS-indel profiles (Supplementary Fig. S4B). In addition, the distinct MS deletions in adult and pediatric cancers might be due to unknown differences in DNA damage sensing or response mechanisms. Further experimental investigation could clarify these phenomena and can hint toward a better understanding of the kinetics and patterns of MS-indel accumulation.

The biological observations described above can have a major clinical impact on the management of patients with RRD cancers. First, the standard methods for MSI classification (e.g., MIAS Promega) cannot detect MSI in most pediatric MMRD tumors (Fig. 1A) and possibly in tumors in which MMRD occurs late in carcinogenesis due to other mutational processes. The latter includes treatment-related MMRD cancers such as leukemias, gliomas, and POLEmut-driven tumors (2, 56, 57), where MMRD occurs later, and which are therefore falsely termed MSS (6, 15, 16). Both SNV- and MS-indel–based methods may be more sensitive and specific in classifying such tumors correctly for management and potential genetic testing. In addition, MS-sigs were able to differentiate between MMRD and PPD tumor genotypes (Fig. 4E), which has not been previously shown through SNV or the canonical MSI analysis method. The increased MMRDness signature in MSH2 and MLH1 mutant tumors (Fig. 4E) correlates with the more aggressive phenotype seen in Lynch syndrome cases and the biological importance of the two proteins in the repair mechanism of the mutSα and mutSβ MMR complexes (44–46). The ability to use low-pass WGS from tumors, without their matched normal samples, can make MS-sigs the basis for an inexpensive tool for RRD classification. There has been a recent report that used deep sequencing of a panel of 277 MS-loci to diagnose CMMRD using blood DNA (58). However, our method of ultra low-pass WGS (ULP-WGS) at 0.1 to 0.5X coverage is less expensive per sample and covers between 10% and 50% of the approximately 23 million MS-loci, respectively. The ULP-WGS approach can be more sensitive and specific than deep sequencing of specific loci, depending on the depth of sequencing in each of the approaches. MS-sigs can also provide additive information to the clinical diagnosis, such as the POLEness signature that can predict response to immunotherapy (Fig. 5). Ongoing and future investigations will compare the effectiveness of the two methods in detecting CMMRD.

Second, we showed that MS-sigs can be used to detect MMRD in nonmalignant samples of individuals with CMMRD. The potential to detect MMRD from nonmalignant samples, perhaps even before cancer development, has important clinical applications. In many cases, genetic diagnosis of CMMRD is difficult due to the large number of variants of unknown significance (VUS) and pseudogenes in the mismatch repair and DNA polymerase genes. Furthermore, MS-sigs were found to have extremely high sensitivity and specificity (both 100% in our cohort) in detecting MMRD in tumors and CMMRD in normal tissue. MS-sigs can distinguish functionally disruptive mutations from VUSs and pseudogenes in the MMR and polymerase genes, and can be conducted using low quantities (<150 ng) of unmatched DNA from formalin-fixed, paraffin-embedded (FFPE) or frozen tissue (Fig. 4D). We suggest here that MS-sig is an assay that can assess the MMRD phenotype from accessible, nonmalignant tissues, which may become a frontline tool for the clinical diagnosis of RRD tumors and for monitoring patients at high risk. Further validation on clinical cohorts is required.

Finally, our data provide further support for the notion that MS-indels may be strongly immunogenic in RRD hypermutant cancers (59). In our cohort, MS-sig–based POLEness score, but not MMRDness, was a predictive biomarker for response to immune checkpoint inhibition. This adds to the recent report which correlates MSI status to anti–PD-1 immunotherapy response (49). Because our cohort includes responders from brain and other tumor sites, which are not considered MSI-H using conventional methods, the robustness of next-generation sequencing–based MS-sig analysis may be superior to panel-based methods. Our observations suggest that the single repeat insertions represented by the POLEness signature likely yield highly immunogenic neoantigens, which in turn result in a robust immunogenic response against RRD tumor cells (48, 60, 61). Future large studies comparing MSI-based signatures with other genomic features of RRD cancers, as well as investigating differences in immunotherapy response between MMRD, PPD, and MMRD and PPD cancers will be required to validate our findings.

In summary, our data add a new dimension to the role of RRD in human cancer mutagenesis in the form of MS maintenance and instability. Studies of cancers emerging from rare, germline childhood cancer syndromes provide important insights and may be applied to more common adult cancers. Further mechanistic analysis will potentially explain the different types and sizes of MS-indels that are introduced during tumorigenesis. Future studies will determine the impact of both MMRD and PPD on MSs, which can lead to advancements in the classification, diagnosis, and determination of biomarkers in the management of RRD cancers and individuals.

Sample Collection and DNA Extraction for WES and WGS of RRD Tumors

As described in previous reports, patients with RRD have been routinely consented and registered into the International Replication Repair Consortium with written informed consent. The study was conducted in accordance with the Canadian Tri-Council Policy Statement II and was approved by the Institutional Research Ethics Board (approval number 1000048813), and all data were centralized in the Division of Haematology/Oncology at The Hospital for Sick Children (SickKids; Toronto, Ontario, Canada). Tumor and blood samples were collected from the SickKids tumor bank, and diagnosis of RRD was confirmed via sequencing and immunohistochemistry of the four MMR genes and sequencing of the POLE and POLD1 genes by a clinically approved laboratory. DNA was extracted using the QIAGEN DNeasy Blood & Tissue Kit for frozen tissues, and the QIAamp DNA FFPE Tissue Kit for paraffin-embedded tissues.

WES and WGS Data of Pediatric Tumors in the KiCS Program

WES and WGS data of pediatric tumors were obtained through the KiCS program. Detailed information about KiCS can be found at www.kicsprogram.com.

WES Data of Pediatric Brain Tumors

Written informed consent was obtained for patients with brain tumors treated at The Hospital for Sick Children, and the study was approved by the Institutional Review Board. DNA from the tumors and nonmalignant samples was extracted for WES. Further details on the DNA extraction protocol and exome-sequencing pipeline are available in previous reports.

WES and WGS Data and MSI Status Identification of Adult Tumors in TCGA Database

WES and WGS data for colorectal carcinoma, stomach adenocarcinoma, and uterine corpus endometrial carcinoma, generated on the Illumina platform, were downloaded from TCGA. As described in a previous report, their MSI status was annotated by TCGA.

WES Data of Pediatric Neuroblastoma from the TARGET Database

WES data of pediatric neuroblastoma were acquired from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative.

MSI Status of Pediatric Tumors Determined by the MIAS (Promega)

DNA extracted from tumor and matched normal tissues was quantified with Nanodrop (Thermo Fisher Scientific) and amplified with Platinum Multiplex PCR Master Mix (Thermo Fisher Scientific) and MSI 10X Primer Pair Mix (Promega) in a Veriti 96-Well Thermal Cycler, as per the manufacturer's recommendations for PCR cycling conditions. The primer mix targets a panel of five mononucleotide loci, termed BAT-25, BAT-26, NR-21, NR-24, and MONO-27, and following amplification, the products were run in a 3130 Genetic Analyzer for fluorescent capillary electrophoresis. Electrophoretograms were subsequently visualized using the Peak Scanner Software (v1.0, Thermo Fisher Scientific), and the highest peaks that were flanked by lower peaks were selected to be the representative alleles for each of the five loci in the panel. Each tumor–normal pair was compared by their allelic lengths. Tumors were considered MSI-H if two or more loci were unstable (≥3 bp shift from the normal allele), MSI-L if one locus was unstable, and MSS if all five loci were stable (<3 bp shift from the normal allele; refs. 27, 28).
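
The classification rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: the function name, the dictionary input format, and the example allele lengths are assumptions, and only the 3 bp instability cutoff and the MSI-H/MSI-L/MSS rule are taken from the text.

```python
from typing import Dict

# Five mononucleotide loci targeted by the primer mix described above.
PANEL = ("BAT-25", "BAT-26", "NR-21", "NR-24", "MONO-27")
UNSTABLE_SHIFT_BP = 3  # a locus is unstable when |tumor - normal| >= 3 bp


def classify_msi(tumor_bp: Dict[str, int], normal_bp: Dict[str, int]) -> str:
    """Return 'MSI-H', 'MSI-L', or 'MSS' for one tumor-normal pair.

    Inputs map each locus name to its representative allele length in bp
    (hypothetical input format chosen for this sketch).
    """
    unstable = 0
    for locus in PANEL:
        shift = abs(tumor_bp[locus] - normal_bp[locus])
        if shift >= UNSTABLE_SHIFT_BP:
            unstable += 1
    if unstable >= 2:
        return "MSI-H"
    if unstable == 1:
        return "MSI-L"
    return "MSS"


# Made-up allele lengths, for illustration only.
normal = {"BAT-25": 124, "BAT-26": 115, "NR-21": 101, "NR-24": 132, "MONO-27": 152}
tumor = dict(normal, **{"BAT-25": 119, "NR-21": 97})
print(classify_msi(tumor, normal))
```

Here two loci shift by at least 3 bp, so the pair is classified as MSI-H; a tumor whose loci shift by only 1 to 2 bp, as is typical of the small indels discussed for pediatric MMRD tumors, would remain MSS under this rule.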

Exome Sequencing of RRD Pediatric Tumors

All high-throughput sequencing and mutation identification were performed at The Centre for Applied Genomics at The Hospital for Sick Children, as described in previous reports (1, 2). Briefly, tumor and matched nonmalignant tissue DNA was sequenced on an Illumina HiSeq2500 machine using Agilent's exome enrichment kit (SureSelect V4/V5; with >50% of baits above 25X coverage). Processing into FASTQ files was done using CASAVA and/or HAS, and reads were aligned to the UCSC hg19 (GRCh37) reference with BWA. Realignment and recalibration of aligned reads were done using the Short-read micro re-aligner (SRMA) and/or the Genome Analysis Toolkit (GATK), and GATK (v1.1–28), respectively. Whole-exome TMB was calculated from MuTect (v1.1.4) calls that were filtered for common SNPs using dbSNP (v132), the 1000 Genomes Project (February 2012), a 69-sample Complete Genomics dataset, the Exome Sequencing Project (v6500), and the ExAC database.

Genome Sequencing of Pediatric Tumors

WGS was done using the Illumina HiSeq2500 or Illumina HiSeqX at ≥30X coverage, as well as on the Illumina NovaSeq6000 at 0.5X coverage. Read realignment and recalibration were conducted using the same pipeline as exome sequencing.

Exomic and Genomic MS Definition and Identification and Mutation Calling via MSMuTect

The detailed methods of the MSMuTect algorithm were previously reported (6). Briefly, repeats of five or more nucleotides were considered to be MS loci, and using the PHOBOS algorithm and the lobSTR approach, tumor and normal BAM files were aligned with their 5′ and 3′ flanking sequences. Each MS-locus allele was estimated using the empirical noise model, $P_{j,m}^{\mathrm{Noise}}(k,m)$, which is the probability of observing a read with an MS length k and motif m, where the true length of the allele is j with the motif m. This was used to call the MS alleles with the highest likelihood of being the true allele at each MS-locus. The MS alleles of each tumor and matched normal pair were called individually, which were compared to identify the mutations on the tumor MS-loci. The Akaike Information Criterion score was assigned to both the tumor and normal models, and a threshold score that was determined by using simulated data was applied to make the final call of the MS-indel.
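
To make the likelihood and AIC steps concrete, the sketch below shows a heavily simplified, single-allele version of the idea. It is not the MSMuTect implementation (see ref. 6 for the actual algorithm): the toy noise table, the probability floor for unseen lengths, the one-allele-per-locus restriction, the parameter counts given to each model, and the threshold value are all assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

# noise[j][k] = P(observing a read of length k | true allele length j), one motif.
Noise = Dict[int, Dict[int, float]]
Reads = Dict[int, int]  # observed MS length -> number of reads


def log_likelihood(reads: Reads, allele: int, noise: Noise) -> float:
    """Log-likelihood of a read-length histogram given one true allele length."""
    total = 0.0
    for k, count in reads.items():
        p = noise[allele].get(k, 1e-6)  # floor for unseen lengths (assumption)
        total += count * math.log(p)
    return total


def call_allele(reads: Reads, noise: Noise) -> int:
    """Allele length with the highest likelihood under the noise model."""
    return max(noise, key=lambda j: log_likelihood(reads, j, noise))


def aic(log_lik: float, n_params: int) -> float:
    return 2 * n_params - 2 * log_lik


def call_ms_indel(tumor: Reads, normal: Reads, noise: Noise,
                  threshold: float) -> Tuple[int, int, bool]:
    """Compare 'tumor shares the normal allele' with 'tumor has its own allele'."""
    normal_allele = call_allele(normal, noise)
    tumor_allele = call_allele(tumor, noise)
    aic_same = aic(log_likelihood(tumor, normal_allele, noise), 0)
    aic_own = aic(log_likelihood(tumor, tumor_allele, noise), 1)
    mutated = tumor_allele != normal_allele and (aic_same - aic_own) > threshold
    return normal_allele, tumor_allele, mutated


# Toy noise model and read counts, for illustration only.
toy_noise = {10: {10: 0.90, 9: 0.08, 11: 0.02}, 9: {9: 0.90, 8: 0.08, 10: 0.02}}
print(call_ms_indel({9: 18, 10: 1}, {10: 20, 9: 2}, toy_noise, threshold=10.0))
```

In this toy case the normal sample is best explained by a 10-base allele and the tumor by a 9-base allele, and the AIC difference is large, so a one-base deletion is called. In the study the threshold was derived from simulated data, which this sketch does not attempt to reproduce.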

MS-Indel Signatures Discovered by SignatureAnalyzer

For the WES samples noted in Supplementary Table S7, we counted MS-indels in categories defined by the length of the MS-locus in the normal sample, the size of the indel (one-base indel, two-base indel, etc.), and the MS-locus motif (A- and C-repeats; Fig. 2B; Supplementary Fig. S2A). We then used the Bayesian non-negative matrix factorization algorithm (SignatureAnalyzer; refs. 32, 33) with its default parameters to infer the different mutational processes operating in our samples. We did not include other repeat motifs, as they did not have sufficient mutational events.
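
As an illustration of the counting step, the sketch below bins MS-indel calls into (motif, normal locus length, indel size) categories and arranges them as a features-by-samples count matrix, the kind of non-negative matrix that a factorization method takes as input. The tuple layout and function names are assumptions for this sketch, and it does not reproduce the input format or the API of SignatureAnalyzer.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# One MS-indel call: (motif, locus length in the normal sample,
# tumor length minus normal length; negative values are deletions).
Indel = Tuple[str, int, int]
Feature = Tuple[str, int, int]


def count_features(indels: Iterable[Indel],
                   motifs: Tuple[str, ...] = ("A", "C")) -> Counter:
    """Count MS-indels per category, keeping only A- and C-repeats."""
    counts: Counter = Counter()
    for motif, normal_len, change in indels:
        if motif not in motifs or change == 0:
            continue
        counts[(motif, normal_len, change)] += 1
    return counts


def to_matrix(samples: Dict[str, Counter]) -> Tuple[List[Feature], List[List[int]]]:
    """Return sorted feature labels and a features x samples count matrix."""
    features = sorted({f for c in samples.values() for f in c})
    rows = [[samples[s][f] for s in samples] for f in features]
    return features, rows


# Made-up calls for two hypothetical samples.
samples = {
    "T1": count_features([("A", 10, -1), ("A", 10, -1), ("C", 6, 1), ("AC", 8, -2)]),
    "T2": count_features([("A", 10, -1)]),
}
features, matrix = to_matrix(samples)
print(features)
print(matrix)
```

The dinucleotide call in sample T1 is dropped, mirroring the exclusion of other repeat motifs described above.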

Calculation of MMRDness and POLEness Scores

Exome- and genome-wide MS-indels in each sample were quantified and segregated into deletions and insertions of up to three bases, and by locus size from 5 to 40 bases. The number of MS-indels in each sample was normalized to its own sum of total MS-indels. MMRDness scores were calculated by taking log10 of the sum of the average proportions of one base deletions in loci sized from 10 to 15 bases. The POLEness scores were similarly calculated by taking log10 of the sum of the average proportions of one base insertions in loci sized five to six bases. Both scores were normalized to the total number of MS-loci with the respective lengths.
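
A minimal sketch of the score calculation for a single sample is given below. It follows one reading of the description above: each category count is divided by the sample's total MS-indels, the relevant proportions are summed, and log10 is taken. The final normalization to the number of MS-loci of the respective lengths is omitted, and the key layout, function names, and handling of empty categories are assumptions.

```python
import math
from typing import Dict, Iterable, Tuple

# (indel type, indel size in bases, reference locus length) -> number of MS-indels
Counts = Dict[Tuple[str, int, int], int]


def _log_proportion(counts: Counts, kind: str, size: int,
                    lengths: Iterable[int]) -> float:
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no MS-indels")
    hits = sum(counts.get((kind, size, n), 0) for n in lengths)
    if hits == 0:
        return float("-inf")  # log10(0); the source does not state how this is handled
    return math.log10(hits / total)


def mmrdness(counts: Counts) -> float:
    """log10 of the summed proportion of one-base deletions in 10- to 15-base loci."""
    return _log_proportion(counts, "del", 1, range(10, 16))


def poleness(counts: Counts) -> float:
    """log10 of the summed proportion of one-base insertions in 5- to 6-base loci."""
    return _log_proportion(counts, "ins", 1, range(5, 7))


# Made-up counts for one hypothetical sample (100 MS-indels in total).
example = {("del", 1, 12): 30, ("del", 1, 20): 10, ("del", 2, 12): 50,
           ("ins", 1, 5): 5, ("ins", 1, 6): 5}
print(round(mmrdness(example), 3), round(poleness(example), 3))
```

Because both scores are logarithms of proportions, they are negative or zero in this sketch, and values closer to zero indicate a larger share of the characteristic indel type.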

WES MS-Indel Quantification and Comparison

Exome-wide MS-indels for each tumor/normal pair were identified and quantified using MSMuTect (6). We processed the MSMuTect output in RStudio (v1.1.447). The processing included summing the total number of MS-indels and calculating the length change of each MS in each tumor compared with its matched normal sample. The median numbers of MS-indels in the exome of each tumor type and in each MS-indel signature cluster were compared using the Mann–Whitney U test in RStudio, and statistical significance was considered at P < 0.05.
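
For readers unfamiliar with the test, the sketch below implements a two-sided Mann–Whitney U test from first principles using the normal approximation with tie and continuity corrections. The study ran the test in R; this Python version is an illustration only, and its P values can differ slightly from an exact test on small samples. The example counts are made up.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, two-sided P) using the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = max((abs(u - mean_u) - 0.5) / math.sqrt(var_u), 0.0)  # continuity correction
    return u, math.erfc(z / math.sqrt(2))


# Made-up MS-indel counts for two hypothetical tumor groups.
u_stat, p_value = mann_whitney_u([12, 15, 20, 22, 30], [95, 110, 140, 180, 210])
print(u_stat, p_value < 0.05)
```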

WGS MS-Indel Quantification

Genome-wide MS-indels were summed for each sample according to the reference length of each locus and the change in size/length of the indel. Genome-wide deletions were identified and summed for both adult and pediatric samples, and were averaged for each deletion length. Means and SDs were calculated using RStudio (v1.1.447) at each deletion length in adult and pediatric samples independently. Fractions of indels based on initial and mutated loci size in adult cancers were calculated by dividing the number of events pertaining to a specific initial and mutated locus size by the sum of all MS-indels in all adult tumors. The same procedure was followed for pediatric cancers.
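
The fraction calculation described above amounts to a normalized two-way count. A minimal sketch, assuming events are pooled across all tumors of one group and represented as (initial length, mutated length) pairs, a representation chosen here for illustration:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def indel_fractions(events: Iterable[Tuple[int, int]]) -> Dict[Tuple[int, int], float]:
    """Fraction of all MS-indels for each (initial, mutated) locus-size pair.

    `events` holds one (initial length, mutated length) pair per MS-indel,
    pooled over every tumor in the group (for example, all adult tumors).
    """
    counts = Counter((a, b) for a, b in events if a != b)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


# Made-up events for illustration only.
adult_events = [(20, 15), (20, 15), (18, 15), (12, 11)]
print(indel_fractions(adult_events))
```

Plotting these fractions with initial size on one axis and mutated size on the other is one way to visualize whether deletions from different starting lengths land on a common final length.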

MMRDness and POLEness Score Comparison between Deficient MMR and Polymerase Proofreading

MMRDness medians were calculated for pediatric RRD tumors with the same mutated MMR gene, and compared between the four genes using the Mann–Whitney U test. Similarly, POLEness medians were calculated for POLE or POLD1 mutants and compared.

MMRDness and POLEness Score Calculation and Comparison for Ultra Low-Coverage WGS Samples

MMRDness and POLEness scores were calculated as explained above for the ultra low-pass coverage sequencing samples. The tumors were separated according to their RRD type (MMRP, PPD, MMRD, and MMRD and PPD), and medians of their MMRDness were calculated. Comparison of MMRDness between groups was done using the Mann–Whitney U test. Germline samples were also segregated based on their RRD statuses (MMRP, PPD, Lynch, and CMMRD), and their MMRDness scores were compared using the same method as the tumors.

Immunotherapy Response Thresholds for POLEness Scores

POLEness scores were determined from 28 tumors where WES was available, and from 22 of them that had WGS available. These tumors were collected from patients with RRD cancers who underwent immunotherapy. POLEness scores were calculated as described above. For survival analysis, tumors were separated by their respective median POLEness scores (-2.92 in genome and -2.75 in exome). P values were determined using the log-rank test.
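
The median split and log-rank comparison can be sketched as follows. This is an illustrative pure-Python implementation of the standard two-group log-rank test, not the R code used in the study; assigning scores equal to the median to the low group is an assumption, and the scores and follow-up times below are made up.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Obs = Tuple[float, bool]  # (follow-up time, True if the event was observed)


def split_by_median(scores: Sequence[float],
                    obs: Sequence[Obs]) -> Tuple[float, List[Obs], List[Obs]]:
    """Split observations into high (> median) and low (<= median) score groups."""
    cut = median(scores)
    high = [o for s, o in zip(scores, obs) if s > cut]
    low = [o for s, o in zip(scores, obs) if s <= cut]
    return cut, high, low


def logrank_test(a: Sequence[Obs], b: Sequence[Obs]) -> Tuple[float, float]:
    """Return (chi-square with 1 df, P value) for two survival groups."""
    event_times = sorted({t for t, e in list(a) + list(b) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _ in a if u >= t)  # at risk in group a
        n_b = sum(1 for u, _ in b if u >= t)  # at risk in group b
        d_a = sum(1 for u, e in a if e and u == t)
        d_b = sum(1 for u, e in b if e and u == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


# Made-up POLEness scores and follow-up data (months, event observed).
scores = [-3.5, -3.2, -3.0, -2.8, -2.5, -2.2]
follow_up = [(2, True), (3, True), (4, True), (20, False), (22, False), (24, False)]
cut, high, low = split_by_median(scores, follow_up)
chi2, p = logrank_test(high, low)
print(round(cut, 2), round(chi2, 2), p < 0.05)
```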

Data Availability

Existing WES and/or WGS data have previously been deposited in the European Genome Phenome Archive under the study accession numbers EGAS00001000579 and EGAS00001001112. New sequencing data from this report have been deposited in the European Genome Phenome Archive, and the accession number is EGAS00001004816. Public datasets used include the TCGA database (https://portal.gdc.cancer.gov/), the TARGET database (https://ocg.cancer.gov/programs/target), and the KiCS program (https://kicsprogram.com/). Supplementary Table S7 includes all of the raw data used in all analyses within this article. Further information and/or requests for data will be fulfilled by the corresponding author, Uri Tabori (uri.tabori@sickkids.ca).

Code Availability

The software and pipelines used for data collection are the following: MSMuTect (Maruvka and colleagues; https://github.com/getzlab/MSMuTect/; ref. 6), SignatureAnalyzer (https://software.broadinstitute.org/cancer/cga/msp), and MuTect (https://software.broadinstitute.org/cancer/cga/mutect). All data were analyzed using RStudio (v1.1.447). All R packages and other statistical code used for analysis are as follows: scales (https://CRAN.R-project.org/package=scales), ggplot2 (https://CRAN.R-project.org/package=ggplot2), dplyr (https://CRAN.R-project.org/package=dplyr), reshape2 (https://CRAN.R-project.org/package=reshape2), RColorBrewer (https://CRAN.R-project.org/package=RColorBrewer), ggpubr (https://CRAN.R-project.org/package=ggpubr), tibble (https://CRAN.R-project.org/package=tibble), epade (https://CRAN.R-project.org/package=epade), plot3D (https://CRAN.R-project.org/package=plot3D), forcats (https://CRAN.R-project.org/package=forcats), survival (https://cran.r-project.org/web/packages/survival/index.html), and survminer (https://CRAN.R-project.org/package=survminer). Scripts used to generate the figures are accessible upon request from the corresponding author.

Y.E. Maruvka reports MGH startup money grant outside the submitted work. N.J. Haradhvala reports personal fees from Constellation Pharmaceuticals outside the submitted work. M. Osborn reports nonfinancial support from Amgen, Pfizer, and Novartis outside the submitted work. G. Mason reports other support from Janssen Pharmaceuticals outside the submitted work. B. George reports grants and personal fees from Ipsen, BMS, Celgene, Taiho Oncology, and Roche/Genentech; personal fees from Foundation Medicine, Exelixis, Terumo Interventional Systems, and Eisai; and grants from Boehringer Ingelheim, Tolero Pharmaceuticals, Glyconex, and NGM Bio outside the submitted work. D.S. Ziegler reports personal fees from Bayer outside the submitted work. A. Van Damme reports other from BMS outside the submitted work. D.A. Morgenstern reports personal fees from Roche, Boehringer Ingelheim, Bayer, Clarity Pharmaceuticals, Y-mAbs Therapeutics, and EUSA Pharma outside the submitted work. E. Bouffet reports grants from Bristol Myers Squibb during the conduct of the study and grants from Roche outside the submitted work. G. Getz reports a patent for MSMuTect issued and a patent for MSIDetect pending. G. Getz also receives research funds from IBM and Pharmacyclics, and is an inventor on patent applications related to MuTect, ABSOLUTE, MutSig, MSMuTect, MSMutSig, MSIDetect, POLYSOLVER, and TensorQTL. G. Getz is also a founder, consultant, and holds privately held equity in Scorpion Therapeutics. U. Tabori reports grants from AACR/SU2C BMS Catalyst during the conduct of the study. No disclosures were reported by the other authors.

J. Chung: Conceptualization, resources, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing. Y.E. Maruvka: Conceptualization, resources, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing. S. Sudhaman: Resources, data curation, software, formal analysis, investigation, writing–review and editing. J. Kelly: Resources, data curation, formal analysis, investigation, writing–review and editing. N.J. Haradhvala: Writing–review and editing. V. Bianchi: Data curation, writing–review and editing. M. Edwards: Resources, data curation, writing–review and editing. V.J. Forster: Writing–review and editing. N.M. Nunes: Writing–review and editing. M.A. Galati: Writing–review and editing. M. Komosa: Writing–review and editing. S. Deshmukh: Formal analysis, writing–review and editing. V. Cabric: Formal analysis, writing–review and editing. S. Davidson: Resources, data curation, writing–review and editing. M. Zatzman: Data curation, writing–review and editing. N. Light: Data curation, writing–review and editing. R. Hayes: Data curation, writing–review and editing. L. Brunga: Data curation, writing–review and editing. N.D. Anderson: Data curation, writing–review and editing. B. Ho: Data curation, writing–review and editing. K.P. Hodel: Data curation, writing–review and editing. R. Siddaway: Data curation, writing–review and editing. A.S. Morrissy: Data curation, writing–review and editing. D.C. Bowers: Data curation, writing–review and editing. V. Larouche: Data curation, writing–review and editing. A. Bronsema: Data curation, writing–review and editing. M. Osborn: Data curation, writing–review and editing. K.A. Cole: Data curation, writing–review and editing. E. Opocher: Data curation, writing–review and editing. G. Mason: Data curation, writing–review and editing. G.A. Thomas: Data curation, writing–review and editing. B. George: Data curation, writing–review and editing. D.S. Ziegler: Writing–review and editing. S. Lindhorst: Data curation, writing–review and editing. M. Vanan: Data curation, writing–review and editing. M. Yalon-Oren: Data curation, writing–review and editing. A.T. Reddy: Data curation, writing–review and editing. M. Massimino: Data curation, writing–review and editing. P. Tomboc: Data curation, writing–review and editing. A. Van Damme: Data curation, writing–review and editing. A. Lossos: Data curation, writing–review and editing. C. Durno: Data curation, writing–review and editing. M. Aronson: Data curation, writing–review and editing. D.A. Morgenstern: Data curation, writing–review and editing. E. Bouffet: Writing–review and editing. A. Huang: Data curation, writing–review and editing. M.D. Taylor: Data curation, writing–review and editing. A. Villani: Writing–review and editing. D. Malkin: Writing–review and editing. C.E. Hawkins: Data curation, writing–review and editing. Z.F. Pursell: Investigation, writing–review and editing. A. Shlien: Data curation, methodology, writing–review and editing. T.A. Kunkel: Visualization, writing–review and editing. G. Getz: Conceptualization, resources, data curation, software, supervision, validation, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing. U. Tabori: Conceptualization, resources, data curation, supervision, funding acquisition, investigation, visualization, methodology, writing–original draft, project administration, writing–review and editing.

The KiCS program is supported by the Garron Family Cancer Centre with funds from the SickKids Foundation. This research is supported by Meagan's Walk (MW-2014-10), b.r.a.i.n.child Canada, LivWise, an Enabling Studies Program grant from BioCanRx, Canada's Immunotherapy Network (a Network Centre of Excellence), the Canadian Institutes for Health Research (CIHR) grant (PJT-156006), the CIHR Joint Canada-Israel Health Research Program (108188), and a Stand Up To Cancer (SU2C)–Bristol Myers Squibb Catalyst Research (SU2C-AACR-CT07-17) research grant. Stand Up To Cancer is a division of the Entertainment Industry Foundation. Research grants are administered by the American Association for Cancer Research, the scientific partner of SU2C. G. Getz and Y.E. Maruvka were partially funded by G. Getz funds at Massachusetts General Hospital. G. Getz was partially funded by the Paul C. Zamecnik Chair in Oncology at the Massachusetts General Hospital Cancer Center.

1. Shlien A, Campbell BB, de Borja R, Alexandrov LB, Merico D, Wedge D, et al. Combined hereditary and somatic mutations of replication error repair genes result in rapid onset of ultra-hypermutated cancers. Nat Genet 2015;47:257–62.
2. Campbell BB, Light N, Fabrizio D, Zatzman M, Fuligni F, de Borja R, et al. Comprehensive analysis of hypermutation in human cancer. Cell 2017;171:1042–56.
3. Kunkel TA, Erie DA. Eukaryotic mismatch repair in relation to DNA replication. Annu Rev Genet 2015;49:291–313.
4. Tabori U, Hansford JR, Achatz MI, Kratz CP, Plon SE, Frebourg T, et al. Clinical management and tumor surveillance recommendations of inherited mismatch repair deficiency in childhood. Clin Cancer Res 2017;23:e32–e7.
5. Kunkel TA. Evolving views of DNA replication (in)fidelity. Cold Spring Harb Symp Quant Biol 2009;74:91–101.
6. Maruvka YE, Mouw KW, Karlic R, Parasuraman P, Kamburov A, Polak P, et al. Analysis of somatic microsatellite indels identifies driver events in human tumors. Nat Biotechnol 2017;35:951–9.
7. Haradhvala NJ, Kim J, Maruvka YE, Polak P, Rosebrock D, Livitz D, et al. Distinct mutational signatures characterize concurrent loss of polymerase proofreading and mismatch repair. Nat Commun 2018;9:1746.
8. Fortune JM, Pavlov YI, Welch CM, Johansson E, Burgers PM, Kunkel TA. Saccharomyces cerevisiae DNA polymerase delta: high fidelity for base substitutions but lower fidelity for single- and multi-base deletions. J Biol Chem 2005;280:29980–7.
9. Kirchner JM, Tran H, Resnick MA. A DNA polymerase ϵ mutant that specifically causes +1 frameshift mutations within homonucleotide runs in yeast. Genetics 2000;155:1623–32.
10. Le DT, Uram JN, Wang H, Bartlett BR, Kemberling H, Eyring AD, et al. PD-1 blockade in tumors with mismatch-repair deficiency. N Engl J Med 2015;372:2509–20.
11. Gryfe R, Kim H, Hsieh ET, Aronson MD, Holowaty EJ, Bull SB, et al. Tumor microsatellite instability and clinical outcome in young patients with colorectal cancer. N Engl J Med 2000;342:69–77.
12. Yamashita H, Nakayama K, Ishikawa M, Nakamura K, Ishibashi T, Sanuki K, et al. Microsatellite instability is a biomarker for immune checkpoint inhibitors in endometrial cancer. Oncotarget 2018;9:5652–64.
13. Dudley JC, Lin MT, Le DT, Eshleman JR. Microsatellite instability as a biomarker for PD-1 blockade. Clin Cancer Res 2016;22:813–20.
14. Hause RJ, Pritchard CC, Shendure J, Salipante SJ. Classification and characterization of microsatellite instability across 18 cancer types. Nat Med 2016;22:1342–50.
15. Cancer Genome Atlas Research Network, Kandoth C, Schultz N, Cherniack AD, Akbani R, Liu Y, et al. Integrated genomic characterization of endometrial carcinoma. Nature 2013;497:67–73.
16. Prindle MJ, Loeb LA. DNA polymerase delta in DNA replication and genome maintenance. Environ Mol Mutagen 2012;53:666–82.
17. Lynch HT, Snyder CL, Shaw TG, Heinen CD, Hitchins MP. Milestones of Lynch syndrome: 1895–2015. Nat Rev Cancer 2015;15:181–94.
18. Vilar E, Gruber SB. Microsatellite instability in colorectal cancer-the stable evidence. Nat Rev Clin Oncol 2010;7:153–62.
19. Umar A, Boland CR, Terdiman JP, Syngal S, Chapelle Adl, Rüschoff J, et al. Revised Bethesda guidelines for hereditary nonpolyposis colorectal cancer (Lynch syndrome) and microsatellite instability. J Natl Cancer Inst 2004;96:261–8.
20. Wimmer K, Kratz CP, Vasen HF, Caron O, Colas C, Entz-Werle N, et al. Diagnostic criteria for constitutional mismatch repair deficiency syndrome: suggestions of the European consortium ‘care for CMMRD’ (C4CMMRD). J Med Genet 2014;51:355–65.
21. Wimmer K, Kratz CP. Constitutional mismatch repair-deficiency syndrome. Haematologica 2010;95:699–701.
22. Abedalthagafi M. Constitutional mismatch repair-deficiency: current problems and emerging therapeutic strategies. Oncotarget 2018;9:35458–69.
23. Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature 2020;578:94–101.
24. Dietmaier W, Wallinger S, Bocker T, Kullmann F, Fishel R, Ruschoff J. Diagnostic microsatellite instability: definition and correlation with mismatch repair protein expression. Cancer Res 1997;57:4749–56.
25. Thibodeau SN, French AJ, Roche PC, Cunningham JM, Tester DJ, Lindor NM, et al. Altered expression of hMSH2 and hMLH1 in tumors with microsatellite instability and genetic alterations in mismatch repair genes. Cancer Res 1996;56:4836–40.
26. Gologan A, Sepulveda AR. Microsatellite instability and DNA mismatch repair deficiency testing in hereditary and sporadic gastrointestinal cancers. Clin Lab Med 2005;25:179–96.
27. Loukola A, Eklin K, Laiho P, Salovaara R, Kristo P, Jarvinen H, et al. Microsatellite marker analysis in screening for hereditary nonpolyposis colorectal cancer (HNPCC). Cancer Res 2001;61:4545–9.
28. Murphy KM, Zhang S, Geiger T, Hafez MJ, Bacher J, Berg KD, et al. Comparison of the microsatellite instability analysis system and the Bethesda panel for the determination of microsatellite instability in colorectal cancers. J Mol Diagn 2006;8:305–11.
29. von Loga K, Woolston A, Punta M, Barber LJ, Griffiths B, Semiannikova M, et al. Extreme intratumour heterogeneity and driver evolution in mismatch repair deficient gastro-oesophageal cancer. Nat Commun 2020;11:139.
30. Pursell ZF, Kunkel TA. DNA polymerase epsilon: a polymerase of unusual size (and complexity). Prog Nucleic Acid Res Mol Biol 2008;82:101–45.
31. Tran HT, Gordenin DA, Resnick MA. The 3′→5′ exonucleases of DNA polymerases delta and epsilon and the 5′→3′ exonuclease Exo1 have major roles in postreplication mutation avoidance in Saccharomyces cerevisiae. Mol Cell Biol 1999;19:2000–7.
32. Kasar S, Kim J, Improgo R, Tiao G, Polak P, Haradhvala N, et al. Whole-genome sequencing reveals activation-induced cytidine deaminase signatures during indolent chronic lymphocytic leukaemia evolution. Nat Commun 2015;6:8866.
33. Kim J, Mouw KW, Polak P, Braunstein LZ, Kamburov A, Kwiatkowski DJ, et al. Somatic ERCC2 mutations are associated with a distinct genomic signature in urothelial tumors. Nat Genet 2016;48:600–6.
34. Xing X, Kane DP, Bulock CR, Moore EA, Sharma S, Chabes A, et al. A recurrent cancer-associated substitution in DNA polymerase epsilon produces a hyperactive enzyme. Nat Commun 2019;10:374.
35. Aksenova A, Volkov K, Maceluch J, Pursell ZF, Rogozin IB, Kunkel TA, et al. Mismatch repair–independent increase in spontaneous mutagenesis in yeast lacking non-essential subunits of DNA polymerase ϵ. PLoS Genet 2010;6:e1001209.
36. Abdulovic AL, Hile SE, Kunkel TA, Eckert KA. The in vitro fidelity of yeast DNA polymerase delta and polymerase epsilon holoenzymes during dinucleotide microsatellite DNA synthesis. DNA Repair 2011;10:497–505.
37. Sainudiin R, Durrett RT, Aquadro CF, Nielsen R. Microsatellite mutation models: insights from a comparison of humans and chimpanzees. Genetics 2004;168:383–95.
38. Slatkin M. A measure of population subdivision based on microsatellite allele frequencies. Genetics 1995;139:457–62.
39. Dieringer D, Schlötterer C. Two distinct modes of microsatellite mutation processes: evidence from the complete genomic sequences of nine species. Genome Res 2003;13:2242–51.
40. Hogg M, Osterman P, Bylund GO, Ganai RA, Lundstrom EB, Sauer-Eriksson AE, et al. Structural basis for processive DNA synthesis by yeast DNA polymerase ϵ. Nat Struct Mol Biol 2014;21:49–55.
41. Levinson G, Gutman GA. High frequencies of short frameshifts in poly-CA/TG tandem repeats borne by bacteriophage M13 in Escherichia coli K-12. Nucleic Acids Res 1987;15:5323–38.
42. Schlotterer C, Tautz D. Slippage synthesis of simple sequence DNA. Nucleic Acids Res 1992;20:211–5.
43. Martín-López JV, Fishel R. The mechanism of mismatch repair and the functional analysis of mismatch repair defects in Lynch syndrome. Fam Cancer 2013;12:159–68.
44. Zhang Y, Yuan F, Presnell SR, Tian K, Gao Y, Tomkinson AE, et al. Reconstitution of 5′-directed human mismatch repair in a purified system. Cell 2005;122:693–705.
45. Constantin N, Dzantiev L, Kadyrov FA, Modrich P. Human mismatch repair: reconstitution of a nick-directed bidirectional reaction. J Biol Chem 2005;280:39752–61.
46. Dzantiev L, Constantin N, Genschel J, Iyer RR, Burgers PM, Modrich P. A defined human system that supports bidirectional mismatch-provoked excision. Mol Cell 2004;15:31–41.
47. Burgers PM. Polymerase dynamics at the eukaryotic DNA replication fork. J Biol Chem 2009;284:4041–5.
48. Bouffet E, Larouche V, Campbell BB, Merico D, Borja Rd, Aronson M, et al. Immune checkpoint inhibition for hypermutant glioblastoma multiforme resulting from germline biallelic mismatch repair deficiency. J Clin Oncol 2016;34:2206–11.
49. Mandal R, Samstein RM, Lee KW, Havel JJ, Wang H, Krishna C, et al. Genetic diversity of tumors with mismatch repair deficiency influences anti-PD-1 immunotherapy response. Science 2019;364:485–91.
50. Havel JJ, Chowell D, Chan TA. The evolving landscape of biomarkers for checkpoint inhibitor immunotherapy. Nat Rev Cancer 2019;19:133–50.
51. Le DT, Durham JN, Smith KN, Wang H, Bartlett BR, Aulakh LK, et al. Mismatch repair deficiency predicts response of solid tumors to PD-1 blockade. Science 2017;357:409–13.
52. Samstein RM, Lee CH, Shoushtari AN, Hellmann MD, Shen R, Janjigian YY, et al. Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat Genet 2019;51:202–6.
53. Parkash V, Kulkarni Y, Ter Beek J, Shcherbakova PV, Kamerlin SCL, Johansson E. Structural consequence of the most frequently recurring cancer-associated substitution in DNA polymerase epsilon. Nat Commun 2019;10:373.
54. Modrich P. Mechanisms in eukaryotic mismatch repair. J Biol Chem 2006;281:30305–9.
55. Haye JE, Gammie AE. The eukaryotic mismatch recognition complexes track with the replisome during DNA synthesis. PLoS Genet 2015;11:e1005719.
56. van Thuijl HF, Mazor T, Johnson BE, Fouse SD, Aihara K, Hong C, et al. Evolution of DNA repair defects during malignant progression of low-grade gliomas after temozolomide treatment. Acta Neuropathol 2015;129:597–607.
57. Diouf B, Cheng Q, Krynetskaia NF, Yang W, Cheok M, Pei D, et al. Somatic deletions of genes regulating MSH2 protein stability cause DNA mismatch repair deficiency and drug resistance in human leukemia cells. Nat Med 2011;17:1298–303.
58. González-Acosta M, Marín F, Puliafito B, Bonifaci N, Fernández A, Navarro M, et al. High-sensitivity microsatellite instability assessment for the detection of mismatch repair defects in normal tissue of biallelic germline mismatch repair mutation carriers. J Med Genet 2020;57:269–73.
59. Turajlic S, Litchfield K, Xu H, Rosenthal R, McGranahan N, Reading JL, et al. Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis. Lancet Oncol 2017;18:1009–21.
60. Maletzki C, Huehns M, Bauer I, Ripperger T, Mork MM, Vilar E, et al. Frameshift mutational target gene analysis identifies similarities and differences in constitutional mismatch repair-deficiency and Lynch syndrome. Mol Carcinog 2017;56:1753–64.
61. Maby P, Galon J, Latouche JB. Frameshift mutations, neoantigens and tumor-specific CD8(+) T cells in microsatellite unstable colorectal cancers. Oncoimmunology 2016;5:e1115943.
62. Bakry D, Aronson M, Durno C, Rimawi H, Farah R, Alharbi QK, et al. Genetic and clinical determinants of constitutional mismatch repair deficiency syndrome: report from the constitutional mismatch repair deficiency consortium. Eur J Cancer 2014;50:987–96.
63. Shuen AY, Lanni S, Panigrahi GB, Edwards M, Yu L, Campbell BB, et al. Functional repair assay for the diagnosis of constitutional mismatch repair deficiency from non-neoplastic tissue. J Clin Oncol 2019;37:461–70.
64. Bodo S, Colas C, Buhard O, Collura A, Tinat J, Lavoine N, et al. Diagnosis of constitutional mismatch repair-deficiency syndrome based on microsatellite instability and lymphocyte tolerance to methylating agents. Gastroenterology 2015;149:1017–29.
65. Durno C, Boland CR, Cohen S, Dominitz JA, Giardiello FM, Johnson DA, et al. Recommendations on surveillance and management of biallelic mismatch repair deficiency (BMMRD) syndrome: a consensus statement by the US multi-society task force on colorectal cancer. Gastroenterology 2017;152:1605–14.