Human colorectal, endometrial, and gastric cancers with defective DNA mismatch repair (MMR) have microsatellite instability, a unique molecular alteration characterized by widespread frameshift mutations of repetitive DNA sequences. We developed “Kangaroo,” a bioinformatics program for searches in nucleotide and protein sequence databases, and performed an in silico genome scan for DNA coding microsatellites that may have novel mutations in MMR-deficient cancers. Examination of 29 previously untested coding polyadenines revealed widespread mutations in MMR-deficient colorectal cancers, with the highest frequencies in ERCC5, CASP8AP2, p72, RAD50, CDC25C, RECQL1, CBF2, RACK7, GRK4, and DNAPK (range, 10–33%). This algorithm allows comprehensive mutation profiling of MMR-deficient cancers, an important step in understanding the pathogenesis of these neoplasms.

CRC4 is the second leading cause of cancer death in North America, providing the impetus for research aimed at understanding the biology of this disease. Among the important discoveries, in recent years it has become clear that there are at least two major molecular pathogenetic pathways to CRC: (a) MSI, because of defects in DNA MMR; and (b) chromosomal instability, because of defects in mitotic spindle apparatus and other genes (1). Importantly, the pathological and clinical attributes of the cancers arising out of each of these two pathways are different. MSI-H CRCs are more often located in the right colon, are typically polypoid, and have high grade histology with a prominent lymphoid reaction (2). This pathway also underlies most cases of hereditary nonpolyposis colon cancer (3) and leads to cancers that display less aggressive growth characteristics with fewer metastases and better overall survival (4). The fundamental difference between these two cancer pathways lies in the underlying mechanism of genomic instability (1). CRCs with chromosomal instability are characterized by widespread chromosomal deletions and translocations, whereas those with MSI have ubiquitous DNA mutations (3, 5, 6). As predicted by bacterial and yeast models, MMR deficiency leads to instability of short repeated sequences, particularly mononucleotides and dinucleotides (7). This is exemplified by gene inactivating frameshift mutations in coding microsatellites in MSI-H CRCs, most notably transforming growth factor, β receptor II(8). Therefore, an in silico search for genes with coding microsatellites should uncover the novel genetic targets involved in the molecular progression of these neoplasms. Unfortunately, the current query programs of the public sequence databases have two limitations that prohibit such a search. 
First, they do not support searches for low complexity regions, because these regions are filtered out as “background noise,” and second, they do not allow searching solely within human open reading frames. We devised a computer program, “Kangaroo,” that searches for DNA sequences in annotated human GenBank records. Although GenBank is a highly redundant database, we identified many records containing coding microsatellite sequences and demonstrated mutations in a number of novel target genes that may be involved in the pathogenesis of MMR-deficient cancers. This approach unveils the possibility of comprehensive mutation profiling of MMR-deficient cancers and will be integral to uncovering the biologically important molecular alterations of these neoplasms.

Bioinformatics Search Algorithm.

We developed a two-step search algorithm, Kangaroo, written in the C programming language using the NCBI toolkit (J. Ostell, NCBI Software Development Toolkit, 1997)5 and developed on a dual Pentium II processor Linux machine. In the first step, NCBI GenBank records are retrieved, and coding region sequences are parsed out from all of the records. NCBI GenBank records are accessed from our in-house database (SeqHound),6,7 which mirrors the latest NCBI GenBank release (v.123.0, April 2001), the NCBI taxonomy database, and the Brookhaven protein databank (9). Coding region information was derived from the sequence annotations as entered in the GenBank flatfile by the individual record submitters. Although GenBank provides a reliable source of regularly updated records, it is a highly redundant database, and, thus, a single gene may be represented as many as 20 times in our searches. In the second step, Kangaroo searches through coding regions for the DNA pattern submitted by the user. We designed Kangaroo to permit searches of short and/or low complexity DNA sequences and query sequences that contain IUPAC DNA ambiguity codes. The search algorithm is based on Regular Expression functions and is part of the NCBI C toolkit. The strategy described here was extended to search different organism databases and to search general DNA and protein records. To develop this search algorithm into a user-friendly, public bioinformatics tool, we amalgamated the features into a web-based application. Kangaroo, which runs on a four-processor Sun Solaris server,8 can perform searches through amino acids, DNA, and annotated coding regions in 10 different organisms with custom flexibility that is not available in other recent database search tools (10, 11, 12).
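The second step, pattern matching with IUPAC ambiguity codes inside coding regions, can be illustrated with a short sketch. This is not Kangaroo's C source: it is a minimal Python illustration that assumes the coding sequences have already been parsed out of the records. The function names and the toy records are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def iupac_to_regex(query):
    """Translate an IUPAC query string into a regular-expression source string."""
    return "".join(IUPAC[ch] for ch in query.upper())

def search_coding_regions(records, query):
    """Return (record_id, 1-based position, matched text) for every hit.

    `records` maps a record identifier to its coding sequence (hypothetical
    stand-in for the output of the first, record-parsing step).
    """
    # The lookahead lets overlapping matches be reported as separate hits.
    pattern = re.compile("(?=(" + iupac_to_regex(query) + "))")
    hits = []
    for rec_id, cds in records.items():
        for m in pattern.finditer(cds.upper()):
            hits.append((rec_id, m.start() + 1, m.group(1)))
    return hits

# Hypothetical toy records, not real GenBank entries.
toy = {
    "rec1": "ATGGCC" + "A" * 8 + "GCTGA",
    "rec2": "ATGCGTACGTTTTAG",
}
print(search_coding_regions(toy, "A" * 8))   # an (A)8 tract
print(search_coding_regions(toy, "CGTR"))    # R matches A or G
```

Short, low-complexity queries such as (A)8 are handled like any other pattern here, which is the property the text highlights as missing from the public query programs.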

Tissue Samples and MSI Testing.

Patients (<50 years of age) with resected CRCs were identified through the Ontario Cancer Registry in a population-based study (4). Paraffin-embedded tissues were obtained and a histopathological review performed to locate regions of high neoplastic cellularity (>50%). Tissue was microdissected and DNA extracted as described (4). Briefly, tissue was scraped from two to three unstained 10-μm slides into 50–100 μl of lysis buffer [10 mM Tris-Cl (pH 7.0), 100 mM KCl, 2.5 mM MgCl2, and 0.45% Tween 20]. After a 10-min incubation at 95°C, tissue samples were subjected to proteinase K (20 mg/ml, 15–35 μl) digestion overnight at 65°C. A total of 16 human cancer cell lines were obtained from the American Type Culture Collection (Manassas, VA), including 7 MSI-H CRC cell lines (SW48, LS174T, LS411, LoVo, HCT-8, HCT-116, and DLD-1), 1 MSI-H endometrial carcinoma cell line (HEC1A), and 8 microsatellite stable CRC cell lines (HT-29, SW480, SW620, SW837, SW1116, Colo320HSR, LS513, and LS1034; Refs. 13, 14, 15, 16). DNA was extracted from the cell lines using the DNeasy Tissue kit (Qiagen, Mississauga, ON), according to the manufacturer’s instructions. MSI was tested in the primary CRCs by PCR of five reference panel loci outlined in the National Cancer Institute Workshop on Microsatellite Instability, and CRCs were classified as microsatellite stable, low frequency microsatellite instability, or MSI-H as defined (17). The loci used in our study were BAT-25, BAT-26, D2S123, D5S346, D17S250, BAT-40, BAT-RII, D18S58, D18S69, and D17S787, with PCR conditions as described (4, 17). In total, there were 102 MSI-H primary CRCs available for mutation screening (results of the MSI testing have been published previously; Ref. 4). The MSI status of the cell lines was confirmed by PCR analysis of the BAT-26 locus.
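The three-way classification can be sketched as a simple rule on the fraction of unstable loci. The text states only that tumors were classified "as defined (17)", so the 30% cutoff below is an assumption based on a common reading of the workshop criteria for panels larger than five markers; the function and parameter names are hypothetical.

```python
def classify_msi(unstable, informative, high_percent=30):
    """Classify a tumor from its count of unstable microsatellite loci.

    unstable     -- number of loci showing instability
    informative  -- number of loci that were successfully typed
    high_percent -- ASSUMED cutoff (percent of loci) for MSI-H; not a value
                    stated in the text
    """
    if informative <= 0:
        raise ValueError("at least one informative locus is required")
    if not 0 <= unstable <= informative:
        raise ValueError("unstable must be between 0 and informative")
    if unstable == 0:
        return "MSS"      # microsatellite stable
    # Integer arithmetic avoids floating-point edge cases at the cutoff.
    if unstable * 100 >= high_percent * informative:
        return "MSI-H"    # high frequency microsatellite instability
    return "MSI-L"        # low frequency microsatellite instability

# Ten loci, as in the panel listed above.
for n in (0, 1, 3, 6):
    print(n, classify_msi(n, 10))
```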

Mutation Profiling of Coding Microsatellites.

PCR primers corresponding to selected coding regions were designed to amplify a product <150 bp (primer sequences and annealing temperatures available on request). The reverse primer was end-labeled in a final volume of 10 μl; 0.3 μM of the reverse primer was combined with 60 μCi of [γ-33P]ATP (Easytides; NEN-US, Boston, MA) and 5.88 units of FPLCpure Polynucleotide Kinase (Amersham Pharmacia Biotech, Baie d’Urfe, Quebec, Canada). The reaction was incubated at 37°C for 1 h and denatured at 90°C for 2 min. In a 15-μl PCR reaction, 2 μl of genomic DNA from primary CRCs or 1 μl of DNA from the cell lines was combined with 10 × PCR buffer [200 mM Tris-HCl (pH 8.4), 500 mM KCl], 1.5 mM MgCl2, 0.13 mM deoxynucleotide triphosphates, 0.4 μM of each forward and reverse primer, and 1 unit of Platinum Taq polymerase (Life Technologies, Inc., Burlington, Ontario, Canada). PCR cycling conditions were 2 min at 94°C followed by 35 cycles of 15 s at 94°C, 15 s at annealing temperature, and 20 s at 72°C (DNA Engine, model PTC-200; MJ Research, Watertown, MA). After PCR, 7.5 μl of denaturing formamide dye was added, and the samples were denatured at 94°C for 4 min, cooled, and loaded onto a denaturing 6% polyacrylamide gel. The gel was transferred onto 3-mm Whatman paper, dried, and exposed to Kodak Biomax film (Rochester, NY). All of the putative mutations were confirmed by sequencing. Templates were reamplified, PCR products were gel purified using the Concert Rapid PCR Purification System (Life Technologies, Inc.), and the reverse primer was used for sequencing using the Thermo Sequenase radiolabeled terminator cycle sequencing kit (Amersham Pharmacia Biotech, Cleveland, OH) according to the manufacturer’s instructions.

Using Kangaroo, we identified many records with mononucleotide tracts at least six nucleotides in length in human coding sequences. Because GenBank is a highly redundant database, the number of identified records with mononucleotides may be 5–10 times higher than the actual number of coding mononucleotides in human sequences. Six-base mononucleotides were by far the most frequent (80% of the total), and the frequency dropped sharply with increasing tract length (Fig. 1). Polyadenine tracts were present at a much higher frequency than all of the other tracts. In particular, (A)8 tracts were about five times more abundant than any other type of (N)8 repeat. In addition, whereas there were occasional very long (12 and 13 nucleotide) tracts, these were all A/T tracts, and C/G tracts of similar length were not identified. There were a number of mononucleotides >13 bases in length identified in the database, the majority of which were putative genes identified from sequencing projects, and the repeats were most frequently located at the extreme 3′ end of the entry (data not shown). Several entries were isolated mutations that extended open reading frames to include repeats in the 3′ UTR. A single (A)32 was also identified in the original entry for regulator of mitotic spindle assembly 1 (HUMPROTXA/RMSA-1), but this sequence was subsequently recognized to be a cloning artifact. The longest definite coding mononucleotide identified, an (A)14, was in melastatin 1 (MLSN1).
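A tally of the kind summarized in Fig. 1 can be sketched as follows. The sketch assumes the coding sequences are available as plain strings and collapses only exact duplicates, which is a crude stand-in for the redundancy described above (real GenBank redundancy also includes non-identical records for the same gene). The function name and toy sequences are hypothetical.

```python
import re
from collections import Counter

# A base followed by at least five more copies of itself: a run of six or more.
RUN = re.compile(r"([ACGT])\1{5,}")

def tally_tracts(coding_sequences):
    """Count mononucleotide tracts by (base, length) across unique sequences."""
    counts = Counter()
    # Exact-duplicate removal only; an assumption, not the authors' procedure.
    for cds in {s.upper() for s in coding_sequences}:
        for m in RUN.finditer(cds):
            run = m.group(0)
            counts[(run[0], len(run))] += 1
    return counts

# Hypothetical toy sequences; the first one is listed twice on purpose.
seqs = [
    "ATG" + "A" * 8 + "GCC" + "T" * 6 + "GA",
    "ATG" + "A" * 8 + "GCC" + "T" * 6 + "GA",
    "ATG" + "C" * 6 + "TAA",
]
for (base, length), n in sorted(tally_tracts(seqs).items()):
    print(f"({base}){length}: {n}")
```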

To investigate the role of some of these novel candidates, we selected 18 genes containing polyadenine tracts measuring at least eight nucleotides in length and screened for mutations in MMR-deficient CRCs. Genes were selected because of their probable function in cell cycle regulation or transcription activation. Mutation screening revealed frameshift mutation frequencies varying widely among the genes (from 0% to 33%). Half of the genes had mutation frequencies of ≥5% (Table 1). The highest mutation frequencies were found in DNA-activated protein kinase catalytic subunit (DNAPK), G protein-coupled receptor kinase 2 (Drosophila)-like (GRK4), protein kinase C-binding protein 1 (RACK7), and CCAAT box binding protein (CBF2), all of which were >15%. The remaining nine genes had mutation frequencies from 0 to 4% (data not shown). For comparison, we also screened for mutations in seven genes with previously reported coding polyadenine microsatellites, including five genes with mutation frequencies of ≥5% (Table 1). The overall results were similar to those described previously (17). Of note, the overall mutation frequency in the tumors was 61 of 196 (31%) for (A)10 tracts, 77 of 461 (17%) for (A)9 tracts, and 112 of 1560 (7.2%) for (A)8 tracts, suggesting an association between polyadenine length and mutation frequency.
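The pooled frequencies quoted above are simple ratios, and the (A)10 total can be rebuilt from the two (A)10 rows of Table 1 (DNAPK, 33 of 99; ATR, 28 of 97). The short check below uses only numbers stated in the text and table; the variable names are illustrative.

```python
# Pooled counts by tract length, as stated in the text: (mutated, tested).
pooled = {10: (61, 196), 9: (77, 461), 8: (112, 1560)}

def percent(mutated, tested):
    """Mutation frequency as a percentage."""
    return 100.0 * mutated / tested

# The (A)10 total is the sum of the two (A)10 rows in Table 1.
a10_rows = [(33, 99), (28, 97)]   # DNAPK, ATR
a10_total = (sum(m for m, _ in a10_rows), sum(n for _, n in a10_rows))
print("(A)10 from Table 1:", a10_total)

for length in sorted(pooled, reverse=True):
    mutated, tested = pooled[length]
    print(f"(A){length}: {mutated}/{tested} = {percent(mutated, tested):.1f}%")
```

The frequency falls at each step from (A)10 to (A)8, which is the length association noted in the text. The (A)9 and (A)8 totals include genes not shown in Table 1, so they cannot be rebuilt from the table alone.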

To compare the mutation profiles of cultured CRC cells, we tested a subset of the same genes in a panel of MSI-H cell lines. The mutation frequencies ranged from zero (11 of the 24 genes tested) to 50% (MSH3). Comparison with the mutation frequencies in the primary CRCs revealed that the results in the cell lines were similar (Fig. 2). Four of the six genes with mutation frequencies >20% in the cell lines were also >20% in the primary CRCs. Similarly, there were 18 genes with mutation frequencies <20% in the cell lines, and 17 of these were <20% in the primary CRCs. Whereas some low frequency mutations were detected in the primary tumors but not in the cell lines, this probably reflects the statistical variability of a mutation screen in a small panel of lines. The overall mutation frequency in the cell lines for all of the genes tested, 20 of 185 (10.8%), was the same as the overall mutation frequency in the primary CRCs for the corresponding genes, 241 of 2201 (10.9%). These findings suggested that the MSI-H cell lines could be used in a screen to identify targets that were mutated frequently in the primary MSI-H CRCs. We used this strategy to test 11 additional coding polyadenines, and identified 3 genes, RAD50 (Saccharomyces cerevisiae) homologue (RAD50), DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 17 (DDX17/p72), and CASP8-associated protein 2 (CASP8AP2/RIP25), with coding microsatellites mutated in ≥10% of the samples (Table 2). The remaining 8 genes had no coding microsatellite mutations identified in any of the cell lines (data not shown).

DNA MMR deficiency results in widespread frameshift mutations because of a failure to repair one- and two-base slippage events occurring during replication of repetitive DNA sequences (3, 5, 6, 7). The presence of coding microsatellite frameshift mutations in CRCs with defective DNA MMR suggests a molecular mechanism for this distinctive pathogenetic pathway (8, 17). Kangaroo is an important advance for investigating these neoplasms, because it permits a comprehensive in silico genome scan to identify previously unrecognized coding microsatellites. The high frequency of polyadenine tracts identified by Kangaroo was particularly striking, and the relative absence of longer G/C mononucleotides raises the possibility that these homopolymers may be more highly selected against in evolution. This is consistent with the greater susceptibility of G/C homopolymeric tracts to instability in the presence of DNA MMR deficiency (18). The total number of records with coding mononucleotides identified by Kangaroo is similar to that reported by Woerner et al. (11), who also used a highly redundant database (EMBL) for source records. In contrast, the study by Mori et al. (12) identified a much smaller number of coding mononucleotides, most likely reflecting the fact that source records were obtained from the less redundant UniGene database. Wren et al. (10) enumerated the presence of coding microsatellites, predominantly trinucleotide repeats, based on the identification of in-frame repetitive elements only.
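Why a one-base slippage in a coding mononucleotide tract is gene inactivating can be shown with a toy example. The sketch below deletes one adenine from an (A)8 tract in a made-up open reading frame and translates both versions with the standard genetic code. The sequence is hypothetical and is not taken from any gene in this study.

```python
import re

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def delete_one_from_tract(seq, base="A", min_len=8):
    """Remove one base from the first run of `base` at least `min_len` long."""
    m = re.search(base + "{" + str(min_len) + ",}", seq)
    if m is None:
        raise ValueError("no such tract")
    return seq[:m.start()] + seq[m.start() + 1:]

# Hypothetical open reading frame containing an (A)8 tract.
wild_type = "ATGGCC" + "A" * 8 + "GCTGACTGGATAA"
mutant = delete_one_from_tract(wild_type)

print(translate(wild_type))  # full-length toy peptide
print(translate(mutant))     # frameshifted and truncated
```

In this toy case the deletion changes the residues downstream of the tract and brings a stop codon into frame almost immediately, so the peptide is truncated.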

We selected a number of biologically important candidate genes and identified widespread coding microsatellite mutations, similar to other recent mutation surveys of coding sequences (11, 12, 19, 20). This suggests that the coding microsatellites identified by Kangaroo likely represent numerous novel molecular progression targets of MMR-deficient neoplasms. Several of the genes we found to have the highest coding microsatellite mutation frequencies are candidate genes in tumorigenesis. Mutation of DNA-dependent protein kinase catalytic subunit (DNAPK), which is involved in the DNA-damage response and double-strand break repair, causes the severe combined immunodeficiency phenotype and lymphoma predisposition in mice. Although DNA-dependent protein kinase catalytic subunit (DNAPK) inactivation is not reported in human neoplasms, several genes with related functions, including ataxia telangiectasia mutated (ATM), breast cancer 1 early onset (BRCA1), breast cancer 2 early onset (BRCA2), and p53, are directly implicated in tumorigenesis. In addition, we identified mutations in two other genes involved in DNA repair, RecQ (DNA helicase) protein-like (RECQL1) and excision repair cross-complementing complementation group 5 (ERCC5), and confirmed the presence of mutations in MSH3, ataxia telangiectasia and Rad3 related (ATR), Bloom syndrome (BLM), and RAD50 (S. cerevisiae) homologue (RAD50). Thus, there is a growing list of DNA repair genes that have coding microsatellites inactivated in MMR-deficient tumorigenesis [including also mutS (Escherichia coli) homologue 6 and methyl-CpG binding domain protein 4 (MBD4)]. The curious presence of hypermutable repetitive sequences in several of the DNA MMR genes has been noted, raising theories about a role in evolutionary modulation (21).

We also found relatively high mutation frequencies in cell division cycle 25C (CDC25C), which is involved in triggering entry into mitosis, and CASP8 associated protein 2 (CASP8AP2/RIP25), which is involved in FAS-mediated apoptosis. In combination with mutations in the coding 8-base polyguanine tract of BCL2-associated X protein, these findings suggest that there may be a host of coding microsatellite targets contributing to abrogation of cell cycle and apoptosis regulation in MMR-deficient neoplasms. In contrast, several other DNA repair, cell cycle, and apoptosis regulatory genes, including centromere protein F (CENP-F/MITOSIN), ATP-dependent DNA ligase III (DNA ligase III), transcription factor Dp-2 (DP-2), protein tyrosine phosphatase nonreceptor type 13 (FAP-1/PTPN13), cell division cycle 7 (S. cerevisiae homologue)-like 1 (CDC7), REV1 (yeast homologue)-like (REV1), PMS2, apoptotic protease activating factor (APAF1), and checkpoint (Schizosaccharomyces pombe) homologue (CHK1), harbor relatively low mutation frequencies. Finally, the negligible mutation frequencies in two putative apoptosis inhibitors, BCL2-associated athanogene 4/silencer of death domain (BAG4/SODD) and BCL2-associated X protein antagonist selected in Saccharomyces 1 (BASS1), as well as several signaling pathway genes, Vaccinia-related kinase 2 (VRK2), mitogen inducible 2 (MIG-2), PDZ domain-containing guanine nucleotide exchange factor I (GNEF/LOC51735), mitogen activated protein kinase kinase kinase (MAP3K4/MTK1/MEKK4), renal tumor antigen (MOK/RAGE), and Ras-like protein (TC10), also raise the possibility that mutations in some coding microsatellites could be selected against in tumor development.

Although some clues can be obtained from overall mutation frequencies, it is not possible to infer the biological significance of coding microsatellite mutations without functional studies of the cell biological effects of these alterations. It is entirely possible that the frameshift mutations in a given coding microsatellite could be bystanders, even in genes with putative roles in apoptosis and cell cycle regulation. Furthermore, interpretation of mutation frequency requires an understanding of the role that repeat length, nucleotide type, and adjacent sequence context play in determining sequence stability (17). Attempts have been made to compare coding to noncoding mutation frequencies for microsatellites of similar sequence composition (8, 17); however, much larger surveys of intronic sequences are required. Polycytosines and polyguanines may have greater instability than polyadenines and polythymines (18), precluding comparisons of mutation frequencies for different tract compositions. Therefore, additional studies will be required before the functional significance of the mutations that we identified is known.

DNA MMR deficiency underlies a significant proportion of CRCs, endometrial cancers, and gastric cancers. We developed Kangaroo, a powerful bioinformatics search algorithm, to perform an in silico genome scan for coding microsatellites that may be mutated in DNA MMR-deficient tumors. It will now be possible to develop a comprehensive mutation profile across hundreds of coding microsatellites that are putative targets for MMR-deficient tumors. Whereas it is often difficult to understand the biological and functional importance of some of these alterations, the delineation of a comprehensive mutation profile may be analogous to the development of a transcriptional profile of a tumor. This type of systematic approach will be essential to better understand the molecular pathogenesis of MMR-deficient neoplasms.

Fig. 1.

Semilog distribution of mononucleotide repetitive sequences identified in GenBank records of annotated human coding region. Mononucleotides less than six bases in length were too abundant to enumerate using Kangaroo. For the purposes of this figure, searches for coding microsatellites >13 nucleotides in length were truncated because of apparent ambiguities in many of these sequence entries (see text). Because of the redundancy of GenBank, the number of actual coding mononucleotides is likely to be much smaller than that indicated by the number of records identified.

Fig. 2.

Coding mononucleotide mutation frequencies in MMR-deficient primary CRCs and cell lines. The genes are organized into three groups based on the mutation frequency in the cell lines: 0% (no cell lines with instability), 1–20% (one cell line with coding instability), and >20% (two or more cell lines with coding instability). Within each of these groups, the genes are arranged according to the mutation frequency in the primary tumors. For some genes, only seven tumor cell lines amplified successfully.


The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1 Supported in part by the National Cancer Institute of Canada with funds from the Terry Fox Run. M. R. was the recipient of a Research Scientist Award from the National Cancer Institute of Canada supported with funds provided by the Canadian Cancer Society.

4 The abbreviations used are: CRC, colorectal cancer; NCBI, National Center for Biotechnology Information; MMR, mismatch repair; MSH3, mutS (Escherichia coli) homologue 3; MSI, microsatellite instability; MSI-H, high frequency microsatellite instability; PMS2, postmeiotic segregation increased (S. cerevisiae) 2.

5 Internet address: ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools.

6 K. Michalickova, G. D. Bader, R. Isserlin, and C. W. V. Hogue. SeqHound biological sequence database system as a platform for bioinformatics research, manuscript in preparation.

7 Internet address: http://bioinfo.mshri.on.ca.

8 Internet address: http://bioinfo.mshri.on.ca/kangaroo.

Table 1

Mutation frequency of coding polyadenine microsatellites in MSI-H colorectal cancers

Gene symbol(a) | Gene name; function | Accession no. | Location(b) | Coding microsatellite(c) | Mutation frequency (%)
Newly identified coding polyadenines
DNAPK/PRKDC | Protein kinase, DNA-activated, catalytic subunit; DNA double-strand break repair and recombination | NM_006904 | 487 | (A)10 | 33/99 (33)
 |  |  | 10807 | (A)8 | 8/98 (8)
GRK4/GPRK2L | G protein-coupled receptor kinase 2 (Drosophila)-like; desensitize G protein-coupled receptors | NM_005307 | 656 | (A)9 | 19/91 (21)
RACK7/PRKCBP1 | Protein kinase C-binding protein 1; anchors protein kinase C-beta-1 | NM_012408 | 319 | (A)8 | 17/91 (19)
CBF2 | CCAAT box binding protein; transcriptional activation | NM_005760 | 1420 | (A)9 | 14/91 (16)
RECQL1 | RecQ (DNA helicase) protein-like; DNA repair | NM_032941 | 112 | (A)9 | 11/93 (12)
CDC25C | Cell division cycle 25C; triggers entry into mitosis | NM_001790 | 724 | (A)8 | 10/93 (11)
ERCC5 | Excision repair cross-complementing, complementation group 5; nucleotide excision and transcription-coupled repair of DNA damage | NM_000123 | 2743 | (A)9 | 8/80 (10)
TF-34 | Human Krueppel-related DNA binding protein; zinc finger protein | GI: 1124875 | 500 | (A)8 | 6/86 (7)
BLYM | Burkitt’s lymphoma transforming gene; unknown gene product | GI: 179497 NM_005179 | 78 | (A)8 | 5/96 (5)
Previously reported coding polyadenines
MSH3 | MutS (E. coli) homolog 3; DNA mismatch repair | NM_002439 | 1141 | (A)8 | 42/101 (42)
ATR | Ataxia telangiectasia and Rad3 related; DNA recombination, damage checkpoint, double-strand break repair | NM_001184 | 2311 | (A)10 | 28/97 (29)
BLM | Bloom syndrome; ATPase/helicase, may suppress inappropriate recombination | NM_000057 | 1536 | (A)9 | 22/97 (23)
BCL10 | B-cell CLL/lymphoma 10; activates nuclear factor κB, promotes apoptosis, suppresses transformation | NM_003921 | 129 | (A)8 | 7/102 (7)
 |  |  | 493 | (A)7 | 2/102 (2)
PMS2 | Postmeiotic segregation increased (S. cerevisiae) 2; DNA mismatch repair | NM_000535 | 1232 | (A)8 | 5/99 (5)
a Includes only those genes with coding microsatellite mutation frequencies of ≥5%.

b The numerical position of the repeat sequence is denoted as its position within the coding sequence and is retrieved by Kangaroo from the GenBank Flatfile.

c In addition to the identified polyadenine tract, all other coding mononucleotides in the same genes measuring at least seven bases in length were also screened for mutations.

Table 2

Mutation frequency of coding polyadenine microsatellites in MSI-H human cancer cell lines

Gene symbol(a) | Gene name; function | Accession no. | Location(b) | Coding microsatellite(c) | Mutation frequency (%)
RAD50 | RAD50 (S. cerevisiae) homolog; DNA double-strand break repair/recombination | NM_005732 | 2175 | (A)9 | 2/8 (25)
 |  |  | 2812 | (A)8 | 0/8
DDX17/p72 | DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 17; ATPase involved in transcription/translation | NM_006386 | 118 | (A)8 | 1/7 (14)
CASP8AP2/RIP25 | CASP8 associated protein 2; interacts with and activates caspase-8 in FAS mediated apoptosis | NM_02115 | 3700 | (A)8 | 1/8 (12.5)
 |  |  | 3382 | (A)9 | 0/8
 |  |  | 2855 | (A)7 | 0/8
 |  |  | 3794 | (A)7 | 0/8
a Includes only those genes with coding microsatellite mutation frequencies of 10% or greater.

b The numerical position of the repeat sequence is denoted as its position within the coding sequence and is retrieved by Kangaroo from the GenBank Flatfile.

c In addition to the identified polyadenine tract, all other coding mononucleotides in the same genes measuring at least seven bases in length were also screened for mutations.

1. Lengauer C., Kinzler K. W., Vogelstein B. Genetic instabilities in human cancer. Nature (Lond.), 396: 643-649, 1998.
2. Kim H., Jen J., Vogelstein B., Hamilton S. R. Clinical and pathological characteristics of sporadic colorectal carcinomas with DNA replication errors in microsatellite sequences. Am. J. Pathol., 145: 148-156, 1994.
3. Aaltonen L. A., Peltomaki P., Leach F. S., Sistonen P., Pylkkanen L., Mecklin J. P., Jarvinen H., Powell S. M., Jen J., Hamilton S. R., Petersen G. M., Kinzler K. W., Vogelstein B., de la Chapelle A. Clues to the pathogenesis of familial colorectal cancer. Science (Wash. DC), 260: 812-816, 1993.
4. Gryfe R., Kim H., Hsieh E. T., Aronson M. D., Holowaty E. J., Bull S. B., Redston M., Gallinger S. Tumor microsatellite instability and clinical outcome in young patients with colorectal cancer. N. Engl. J. Med., 342: 69-77, 2000.
5. Ionov Y., Peinado M. A., Malkhosyan S., Shibata D., Perucho M. Ubiquitous somatic mutations in simple repeated sequences reveal a new mechanism for colonic carcinogenesis. Nature (Lond.), 363: 558-561, 1993.
6. Thibodeau S. N., Bren G., Schaid D. Microsatellite instability in cancer of the proximal colon. Science (Wash. DC), 260: 816-819, 1993.
7. Sia E. A., Kokoska R. J., Dominska M., Greenwell P., Petes T. D. Microsatellite instability in yeast: dependence on repeat unit size and DNA mismatch repair genes. Mol. Cell Biol., 17: 2851-2858, 1997.
8. Markowitz S., Wang J., Myeroff L., Parsons R., Sun L., Lutterbaugh J., Fan R. S., Zborowska E., Kinzler K. W., Vogelstein B. Inactivation of the type II TGF-β receptor in colon cancer cells with microsatellite instability. Science (Wash. DC), 268: 1336-1338, 1995.
9. Bernstein F. C., Koetzle T. F., Williams G. J., Meyer E. F. J., Brice M. D., Rodgers J. R., Kennard O., Shimanouchi T., Tasumi M. The protein data bank: a computer-based archival file for macromolecular structures. Arch. Biochem. Biophys., 185: 584-591, 1978.
10. Wren J. D., Forgacs E., Fondon J. W., III, Pertsemlidis A., Cheng S. Y., Gallardo T., Williams R. S., Shohet R. V., Minna J. D., Garner H. R. Repeat polymorphisms within gene regions: phenotypic and evolutionary implications. Am. J. Hum. Genet., 67: 345-356, 2000.
11. Woerner S. M., Gebert J., Yuan Y. P., Sutter C., Ridder R., Bork P., von Knebel D. M. Systematic identification of genes with coding microsatellites mutated in DNA mismatch repair-deficient cancer cells. Int. J. Cancer, 93: 12-19, 2001.
12. Mori Y., Yin J., Rashid A., Leggett B. A., Young J., Simms L., Kuehl P. M., Langenberg P., Meltzer S. J., Stine O. C. Instabilotyping. Comprehensive identification of frameshift mutations caused by coding region microsatellite instability. Cancer Res., 61: 6046-6049, 2001.
13. Cottu P. H., Muzeau F., Estreicher A., Flejou J. F., Iggo R., Thomas G., Hamelin R. Inverse correlation between RER+ status and p53 mutation in colorectal cancer cell lines. Oncogene, 13: 2727-2730, 1996.
14. Hoang J. M., Cottu P. H., Thuille B., Salmon R. J., Thomas G., Hamelin R. BAT-26, an indicator of the replication error phenotype in colorectal cancers and cell lines. Cancer Res., 57: 300-303, 1997.
15. Sparks A. B., Morin P. J., Vogelstein B., Kinzler K. W. Mutational analysis of the APC/β-catenin/Tcf pathway in colorectal cancer. Cancer Res., 58: 1130-1134, 1998.
16. Schwartz S., Jr., Yamamoto H., Navarro M., Reventos J., Perucho M. Frameshift mutations at mononucleotide repeats in caspase-5 and other target genes in endometrial and gastrointestinal cancer of the microsatellite mutator phenotype. Cancer Res., 59: 2995-3002, 1999.
17. Boland C. R., Thibodeau S. N., Hamilton S. R., Sidransky D., Eshleman J. R., Burt R. W., Meltzer S. J., Rodriguez-Bigas M. A., Fodde R., Ranzani G. N., Srivastava S. A. National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res., 58: 5248-5257, 1998.
18. Zhang L., Yu J., Willson J. K., Markowitz S. D., Kinzler K. W., Vogelstein B. Short mononucleotide repeat sequence variability in mismatch repair-deficient cancers. Cancer Res., 61: 3801-3805, 2001.
19. Forgacs E., Wren J. D., Kamibayashi C., Kondo M., Xu X. L., Markowitz S., Tomlinson G. E., Muller C. Y., Gazdar A. F., Garner H. R., Minna J. D. Searching for microsatellite mutations in lung, breast, ovarian and colorectal cancer. Oncogene, 20: 1005-1009, 2001.
20. Duval A., Rolland S., Compoint A., Tubacher E., Iacopetta B., Thomas G., Hamelin R. Evolution of instability at coding and non-coding repeat sequences in human MSI-H colorectal cancers. Hum. Mol. Genet., 10: 513-518, 2001.
21. Chang D. K., Metzgar D., Wills C., Boland C. R. Microsatellites in the eukaryotic DNA mismatch repair genes as modulators of evolutionary mutation rate. Genome Res., 11: 1145-1146, 2001.