Abstract
Human colorectal, endometrial, and gastric cancers with defective DNA mismatch repair (MMR) have microsatellite instability, a unique molecular alteration characterized by widespread frameshift mutations of repetitive DNA sequences. We developed “Kangaroo,” a bioinformatics program for searches in nucleotide and protein sequence databases, and performed an in silico genome scan for DNA coding microsatellites that may have novel mutations in MMR-deficient cancers. Examination of 29 previously untested coding polyadenines revealed widespread mutations in MMR-deficient colorectal cancers, with the highest frequencies in ERCC5, CASP8AP2, p72, RAD50, CDC25C, RECQL1, CBF2, RACK7, GRK4, and DNAPK (range, 10–33%). This algorithm allows comprehensive mutation profiling of MMR-deficient cancers, an important step in understanding the pathogenesis of these neoplasms.
Introduction
CRC⁴ is the second leading cause of cancer death in North America, providing the impetus for research aimed at understanding the biology of this disease. Among the important discoveries, in recent years it has become clear that there are at least two major molecular pathogenetic pathways to CRC: (a) MSI, because of defects in DNA MMR; and (b) chromosomal instability, because of defects in mitotic spindle apparatus and other genes (1). Importantly, the pathological and clinical attributes of the cancers arising out of each of these two pathways are different. MSI-H CRCs are more often located in the right colon, are typically polypoid, and have high grade histology with a prominent lymphoid reaction (2). This pathway also underlies most cases of hereditary nonpolyposis colon cancer (3) and leads to cancers that display less aggressive growth characteristics with fewer metastases and better overall survival (4). The fundamental difference between these two cancer pathways lies in the underlying mechanism of genomic instability (1). CRCs with chromosomal instability are characterized by widespread chromosomal deletions and translocations, whereas those with MSI have ubiquitous DNA mutations (3, 5, 6). As predicted by bacterial and yeast models, MMR deficiency leads to instability of short repeated sequences, particularly mononucleotides and dinucleotides (7). This is exemplified by gene inactivating frameshift mutations in coding microsatellites in MSI-H CRCs, most notably transforming growth factor, β receptor II (8). Therefore, an in silico search for genes with coding microsatellites should uncover the novel genetic targets involved in the molecular progression of these neoplasms. Unfortunately, the current query programs of the public sequence databases have two limitations that prohibit such a search.
First, they do not support searches for low complexity regions, because these regions are filtered out as “background noise,” and second, they do not allow searching solely within human open reading frames. We devised a computer program, “Kangaroo,” that searches for DNA sequences in annotated human GenBank records. Although GenBank is a highly redundant database, we identified many records containing coding microsatellite sequences and demonstrated mutations in a number of novel target genes that may be involved in the pathogenesis of MMR-deficient cancers. This approach unveils the possibility of comprehensive mutation profiling of MMR-deficient cancers and will be integral to uncovering the biologically important molecular alterations of these neoplasms.
Materials and Methods
Bioinformatics Search Algorithm.
We developed a two-step search algorithm, Kangaroo, written in the C programming language using the NCBI toolkit (J. Ostell, NCBI Software Development Toolkit, 1997)⁵ and developed on a dual Pentium II processor Linux machine. In the first step, NCBI GenBank records are retrieved, and coding region sequences are parsed out from all of the records. NCBI GenBank records are accessed from our in-house database (SeqHound),⁶,⁷ which mirrors the latest NCBI GenBank release (v.123.0, Apr. 2001), the NCBI taxonomy database, and the Brookhaven protein databank (9). Coding region information was derived from the sequence annotations as entered in the GenBank flatfile by the individual record submitters. Although GenBank provides a reliable source of regularly updated records, it is a highly redundant database, and, thus, a single gene may be represented as many as 20 times in our searches. In the second step, Kangaroo searches through coding regions for the DNA pattern submitted by the user. We designed Kangaroo to permit searches of short and/or low complexity DNA sequences and query sequences that contain IUPAC DNA ambiguity codes. The search algorithm is based on regular expression functions that are part of the NCBI C toolkit. The strategy described here was extended to search different organism databases and to search general DNA and protein records. To develop this search algorithm into a user-friendly, public bioinformatics tool, we amalgamated the features into a web-based application. Kangaroo, which runs on a four processor Sun Solaris server,⁸ can perform searches through amino acids, DNA, and annotated coding regions in 10 different organisms with custom flexibility that is not available in other recent database search tools (10, 11, 12).
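The second step of the search can be illustrated with a minimal sketch. It is written in Python rather than C, it does not use the NCBI toolkit, and it is not the Kangaroo source code. The function names and the `records` input (pairs of record identifier and coding sequence, standing in for coding regions already parsed from GenBank annotations) are assumptions made for illustration only.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC DNA query into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in pattern.upper()))


def search_coding_regions(records, pattern):
    """Yield (record_id, 1-based position, matched text) for each hit.

    `records` is a hypothetical iterable of (record_id, cds_sequence)
    pairs; matches reported within one sequence do not overlap.
    """
    regex = iupac_to_regex(pattern)
    for record_id, cds in records:
        for match in regex.finditer(cds.upper()):
            yield record_id, match.start() + 1, match.group()


# Toy coding sequences (invented, not real GenBank records).
records = [
    ("rec1", "ATGGC" + "A" * 8 + "GCTGA"),
    ("rec2", "ATGCCCGGGTTTTAA"),
]
hits = list(search_coding_regions(records, "A" * 8))
```

Because short, low complexity queries are passed straight to the pattern matcher, no low complexity filtering step intervenes, which is the property the text emphasizes.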
Tissue Samples and MSI Testing.
Patients (<50 years of age) with resected CRCs were identified through the Ontario Cancer Registry in a population-based study (4). Paraffin-embedded tissues were obtained and a histopathological review performed to locate regions of high neoplastic cellularity (>50%). Tissue was microdissected and DNA extracted as described (4). Briefly, tissue was scraped from two to three unstained 10-μm slides into 50–100 μl of lysis buffer [10 mM Tris-Cl (pH 7.0), 100 mM KCl, 2.5 mM MgCl2, and 0.45% Tween 20]. After a 10-min incubation at 95°C, tissue samples were subjected to proteinase K (20 mg/ml, 15–35 μl) digestion overnight at 65°C. A total of 16 human cancer cell lines were obtained from the American Type Culture Collection (Manassas, VA), including 7 MSI-H CRC cell lines (SW48, LS174T, LS411, LoVo, HCT-8, HCT-116, and DLD-1), 1 MSI-H endometrial carcinoma cell line (HEC1A), and 8 microsatellite stable CRC cell lines (HT-29, SW480, SW620, SW837, SW1116, Colo320HSR, LS513, and LS1034; Refs. 13, 14, 15, 16). DNA was extracted from the cell lines using the DNeasy Tissue kit (Qiagen, Mississauga, ON), according to the manufacturer’s instructions. MSI was tested in the primary CRCs by PCR of five reference panel loci outlined in the National Cancer Institute Workshop on Microsatellite Instability, and CRCs were classified as microsatellite stable, low frequency microsatellite instability, or MSI-H as defined (17). The loci used in our study were BAT-25, BAT-26, D2S123, D5S346, D17S250, BAT-40, BAT-RII, D18S58, D18S69, and D17S787, with PCR conditions as described (4, 17). In total, there were 102 MSI-H primary CRCs available for mutation screening (results of the MSI testing have been published previously; Ref. 4). The MSI status of the cell lines was confirmed by PCR analysis of the BAT-26 locus.
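The three-way MSI classification can be sketched as a simple rule on the fraction of unstable loci. The text defers to reference (17) for the definitions and does not state the cutoff, so the 30% threshold below is an assumption (a value commonly cited for the National Cancer Institute workshop criteria), and the function name and parameters are hypothetical.

```python
def classify_msi(unstable_loci, tested_loci, high_fraction=0.30):
    """Classify a tumor as MSS, MSI-L, or MSI-H.

    `high_fraction` is an ASSUMED cutoff, not a value taken from the text:
    tumors with at least this fraction of unstable loci are called MSI-H.
    """
    if tested_loci <= 0:
        raise ValueError("at least one locus must be tested")
    if not 0 <= unstable_loci <= tested_loci:
        raise ValueError("unstable_loci must lie between 0 and tested_loci")
    if unstable_loci == 0:
        return "MSS"      # microsatellite stable
    if unstable_loci / tested_loci >= high_fraction:
        return "MSI-H"    # high frequency microsatellite instability
    return "MSI-L"        # low frequency microsatellite instability


# Example calls for a five-locus panel.
five_locus_calls = [classify_msi(n, 5) for n in range(3)]
```

Expressing the rule as a fraction rather than a fixed count lets the same function serve a five-locus panel or the ten loci listed above.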
Mutation Profiling of Coding Microsatellites.
PCR primers corresponding to selected coding regions were designed to amplify a product <150 bp (primer sequences and annealing temperatures available on request). The reverse primer was end-labeled in a final volume of 10 μl; 0.3 μM reverse primer was combined with 60 μCi of [γ-33P]ATP (Easytides; NEN-US, Boston, MA) and 5.88 units of FPLCpure Polynucleotide Kinase (Amersham Pharmacia Biotech, Baie d’Urfe, Quebec, Canada). The reaction was incubated at 37°C for 1 h and denatured at 90°C for 2 min. In a 15-μl PCR reaction, 2 μl of genomic DNA from primary CRCs or 1 μl of DNA from the cell lines was combined with 10× PCR buffer [200 mM Tris-HCl (pH 8.4), 500 mM KCl], 1.5 mM MgCl2, 0.13 mM deoxynucleotide triphosphates, 0.4 μM of each forward and reverse primer, and 1 unit of Platinum Taq polymerase (Life Technologies, Inc., Burlington, Ontario, Canada). PCR cycling conditions were 2 min at 94°C followed by 35 cycles of 15 s at 94°C, 15 s at annealing temperature, and 20 s at 72°C (DNA Engine, model PTC-200; MJ Research, Watertown, MA). After PCR, 7.5 μl of denaturing formamide dye was added, and the samples were denatured at 94°C for 4 min, cooled, and loaded onto a denaturing 6% polyacrylamide gel. The gel was transferred onto 3-mm Whatman paper, dried, and exposed to Kodak Biomax film (Rochester, NY). All of the putative mutations were confirmed by sequencing. Templates were reamplified, PCR products were gel purified using the Concert Rapid PCR Purification System (Life Technologies, Inc.), and the reverse primer was used for sequencing using the Thermo Sequenase radiolabeled terminator cycle sequencing kit (Amersham Pharmacia Biotech, Cleveland, OH) according to the manufacturer’s instructions.
Results
Using Kangaroo, we identified many records with mononucleotide tracts at least six nucleotides in length in human coding sequences. Because GenBank is a highly redundant database, the number of identified records with mononucleotides may be 5–10 times higher than the actual number of coding mononucleotides in human sequences. Six-base mononucleotides were by far the most frequent (80% of the total), and the frequency dropped sharply with increasing tract length (Fig. 1). Polyadenine tracts were present at a much higher frequency than all of the other tracts. In particular, (A)8 tracts were about five times more abundant than any other type of (N)8 repeat. In addition, whereas there were occasional very long (12 and 13 nucleotide) tracts, these were all A/T tracts, and similar length C/G tracts were not identified. There were a number of mononucleotides >13 bases in length identified in the database, the majority of which were putative genes identified from sequencing projects, and the repeats were most frequently located at the extreme 3′ end of the entry (data not shown). Several entries were isolated mutations that extended open reading frames to include repeats in the 3′ UTR. A single (A)32 was also identified in the original entry for regulator of mitotic spindle assembly 1 (HUMPROTXA/RMSA-1), but this sequence was subsequently recognized to be a cloning artifact. The longest definite coding mononucleotide identified, an (A)14, was in melastatin 1 (MLSN1).
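A tally of mononucleotide tracts by base and length, of the kind summarized above, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure used to generate Fig. 1: the function name is hypothetical, the input sequences are invented, and no correction is made for redundant records.

```python
import re
from collections import Counter

# Matches one maximal run of a single nucleotide; group 1 is the base.
RUN = re.compile(r"([ACGT])\1*")


def count_mononucleotide_tracts(sequences, min_len=6):
    """Count maximal mononucleotide runs, keyed by (base, run length)."""
    counts = Counter()
    for seq in sequences:
        for match in RUN.finditer(seq.upper()):
            length = len(match.group(0))
            if length >= min_len:
                counts[(match.group(1), length)] += 1
    return counts


# Toy coding sequences (invented for illustration).
toy_cds = [
    "ATG" + "A" * 8 + "GC" + "T" * 6 + "GA",
    "CC" + "G" * 7 + "T" + "A" * 8 + "C",
]
tract_counts = count_mononucleotide_tracts(toy_cds)
```

Counting maximal runs, rather than every window of a given length, ensures that a single (A)8 tract is reported once as (A)8 and not additionally as several shorter tracts.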
To investigate the role of some of these novel candidates, we selected 18 genes containing polyadenine tracts measuring at least eight nucleotides in length and screened for mutations in MMR-deficient CRCs. Genes were selected because of their probable function in cell cycle regulation or transcription activation. Mutation screening revealed frameshift mutation frequencies varying widely among the genes (from 0% to 33%). Half of the genes had mutation frequencies of ≥5% (Table 1). The highest mutation frequencies were found in DNA-activated protein kinase catalytic subunit (DNAPK), G protein-coupled receptor kinase 2 (Drosophila)-like (GRK4), protein kinase C-binding protein 1 (RACK7), and CCAAT box binding protein (CBF2), all of which were >15%. The remaining nine genes had mutation frequencies from 0 to 4% (data not shown). For comparison, we also screened for mutations in seven genes with previously reported coding polyadenine microsatellites, including five genes with mutation frequencies of ≥5% (Table 1). The overall results were similar to those described previously (17). Of note, the overall mutation frequency in the tumors was 61 of 196 (31%) for (A)10 tracts, 77 of 461 (17%) for (A)9 tracts, and 112 of 1560 (7.2%) for (A)8 tracts, suggesting an association between polyadenine length and mutation frequency.
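The pooled frequencies quoted above follow directly from the reported counts, as the short calculation below shows. It is a Python sketch for checking the arithmetic, not part of the original analysis; the variable names are hypothetical. The (A)10 total can also be reproduced from the two (A)10 tracts listed in Table 1 (DNAPK, 33 of 99; ATR, 28 of 97).

```python
# Pooled counts reported in the text: tract length -> (mutated, tested).
pooled = {10: (61, 196), 9: (77, 461), 8: (112, 1560)}

# Percentage of tested tumors with a mutation, by tract length.
pooled_percent = {
    length: round(100.0 * mutated / tested, 1)
    for length, (mutated, tested) in pooled.items()
}

# The (A)10 pooled count equals the sum of the two (A)10 rows in Table 1.
a10_rows = {"DNAPK": (33, 99), "ATR": (28, 97)}
a10_total = tuple(sum(values) for values in zip(*a10_rows.values()))

# Frequencies listed from the longest tract to the shortest.
ordered = [pooled_percent[length] for length in sorted(pooled, reverse=True)]
```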
To compare the mutation profiles of cultured CRC cells, we tested a subset of the same genes in a panel of MSI-H cell lines. The mutation frequencies ranged from zero (11 of the 24 genes tested) to 50% (MSH3). Comparison with the mutation frequencies in the primary CRCs revealed that the results in the cell lines were similar (Fig. 2). Four of the six genes with mutation frequencies >20% in the cell lines were also >20% in the primary CRCs. Similarly, there were 18 genes with mutation frequencies <20% in the cell lines, and 17 of these were <20% in the primary CRCs. Whereas some low frequency mutations were detected in the primary tumors but not in the cell lines, this probably reflects the statistical variability of a mutation screen in a small panel of lines. The overall mutation frequency in the cell lines for all of the genes tested, 20 of 185 (10.8%), was nearly identical to the overall mutation frequency in the primary CRCs for the corresponding genes, 241 of 2201 (10.9%). These findings suggested that the MSI-H cell lines could be used in a screen to identify targets that were mutated frequently in the primary MSI-H CRCs. We used this strategy to test 11 additional coding polyadenines, and identified 3 genes, RAD50 (Saccharomyces cerevisiae) homologue (RAD50), DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 17 (DDX17/p72), and CASP8-associated protein 2 (CASP8AP2/RIP25), with coding microsatellites mutated in ≥10% of the samples (Table 2). The remaining 8 genes had no coding microsatellite mutations identified in any of the cell lines (data not shown).
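The agreement between cell lines and primary tumors can be summarized from the counts given above. The Python sketch below arranges them as a 2 × 2 table around the 20% cutoff and computes the fraction of genes falling in the same category in both sample types. The concordance figure is derived here for illustration and is not reported in the original text; all names are hypothetical.

```python
# Genes cross-classified by mutation frequency (>20% vs. <20%),
# keyed by (cell line category, primary tumor category).
genes = {
    ("high", "high"): 4,   # 4 of 6 genes >20% in both
    ("high", "low"): 2,
    ("low", "high"): 1,
    ("low", "low"): 17,    # 17 of 18 genes <20% in both
}

total_genes = sum(genes.values())
concordant = genes[("high", "high")] + genes[("low", "low")]
concordance = concordant / total_genes

# Pooled mutation frequencies (percent) for the same genes.
cell_line_percent = round(100.0 * 20 / 185, 1)
primary_percent = round(100.0 * 241 / 2201, 1)
```

With 21 of 24 genes in matching categories, the cell line panel classifies most genes the same way as the primary tumors, which is the basis for using it as a screening panel.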
Discussion
DNA MMR deficiency results in widespread frameshift mutations because of a failure to repair one and two base slippage events occurring during replication of repetitive DNA sequences (3, 5, 6, 7). The presence of coding microsatellite frameshift mutations in CRCs with defective DNA MMR suggests a molecular mechanism for this distinctive pathogenetic pathway (8, 17). Kangaroo is an important advance for investigating these neoplasms, because it permits a comprehensive in silico genome scan to identify previously unrecognized coding microsatellites. The high frequency of polyadenine tracts identified by Kangaroo was particularly striking, and the relative absence of longer G/C mononucleotides raises the possibility that these homopolymers may be more highly selected against in evolution. This is consistent with the greater susceptibility of G/C homopolymeric tracts to instability in the presence of DNA MMR deficiency (18). The total number of records with coding mononucleotides identified by Kangaroo is similar to that reported by Woerner et al. (11), which also used a highly redundant database (EMBL) for source records. In contrast, the study by Mori et al. (12) identified a much smaller number of coding mononucleotides, most likely reflecting the fact that source records were obtained from the less redundant Unigene database. Wren et al. (10) enumerated the presence of coding microsatellites, predominantly trinucleotide repeats, based on the identification of in-frame repetitive elements only.
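The consequence of an unrepaired one-base slippage event in a coding polyadenine tract can be illustrated with a toy example. The Python sketch below uses an invented coding sequence containing an (A)9 tract, not a sequence from any gene studied here, and the function and variable names are hypothetical. Contraction of the tract to (A)8 shifts the reading frame and introduces a premature stop codon.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented sequences: wild type with (A)9, mutant with one A deleted.
wild_type = "ATGGC" + "A" * 9 + "GCTGACGGTTTAA"
mutant = "ATGGC" + "A" * 8 + "GCTGACGGTTTAA"

wild_type_protein = translate(wild_type)   # MAKKKLTV
mutant_protein = translate(mutant)         # MAKKS
```

In this example the mutant product diverges from the wild type immediately after the tract and terminates early, which is why such mutations are expected to be gene inactivating.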
We selected a number of biologically important candidate genes and identified widespread coding microsatellite mutations, similar to other recent mutation surveys of coding sequences (11, 12, 19, 20). This suggests that the coding microsatellites identified by Kangaroo likely represent numerous novel molecular progression targets of MMR-deficient neoplasms. Several of the genes we found to have the highest coding microsatellite mutation frequencies are candidate genes in tumorigenesis. Mutation of DNA-dependent protein kinase catalytic subunit (DNAPK), which is involved in DNA-damage response double-strand break repair, causes the severe combined immunodeficiency phenotype and lymphoma predisposition in mice. Although DNA-dependent protein kinase catalytic subunit (DNAPK) inactivation is not reported in human neoplasms, several genes with related functions, including ataxia telangiectasia mutated (ATM), breast cancer 1 early onset (BRCA1), breast cancer 2 early onset (BRCA2), and p53, are directly implicated in tumorigenesis. In addition, we identified mutations in two other genes involved in DNA repair, RecQ (DNA helicase) protein-like (RECQL1) and excision repair cross-complementing complementation group 5 (ERCC5), and confirmed the presence of mutations in MSH3, ataxia telangiectasia and Rad3 related (ATR), Bloom syndrome (BLM), and RAD50 (S. cerevisiae) homologue (RAD50). Thus, there is a growing list of DNA repair genes that have coding microsatellites inactivated in MMR-deficient tumorigenesis [including also mutS (Escherichia coli) homologue 6 and methyl-CpG binding domain protein 4 (MBD4)]. The curious presence of hypermutable repetitive sequences in several of the DNA MMR genes has been noted, raising theories about a role in evolutionary modulation (21).
We also found relatively high mutation frequencies in cell division cycle 25C (CDC25C), which is involved in triggering entry into mitosis, and CASP8 associated protein 2 (CASP8AP2/RIP25), which is involved in FAS-mediated apoptosis. In combination with mutations in the coding 8 base polyguanine tract of BCL2-associated X protein, these findings suggest that there may be a host of coding microsatellite targets contributing to abrogation of cell cycle and apoptosis regulation in MMR-deficient neoplasms. In contrast, several other DNA repair, cell cycle, and apoptosis regulatory genes, including centromere protein F (CENP-F/MITOSIN), ATP-dependent DNA ligase III (DNA ligase III), transcription factor Dp-2 (DP-2), protein tyrosine phosphatase nonreceptor type 13 (FAP-1/PTPN13), cell division cycle 7 (S. cerevisiae homologue)-like 1 (CDC7), REV1 (yeast homologue)-like (REV1), PMS2, apoptotic protease activating factor (APAF1), and checkpoint (Schizosaccharomyces pombe) homologue (CHK1), harbor relatively low mutation frequencies. Finally, the negligible mutation frequencies in two putative apoptosis inhibitors, BCL2-associated athanogene 4/silencer of death domain (BAG4/SODD) and BCL2-associated X protein antagonist selected in Saccharomyces 1 (BASS1), as well as several signaling pathway genes, Vaccinia-related kinase 2 (VRK2), mitogen inducible 2 (MIG-2), PDZ domain-containing guanine nucleotide exchange factor I (GNEF/LOC51735), mitogen activated protein kinase kinase kinase (MAP3K4/MTK1/MEKK4), renal tumor antigen (MOK/RAGE), and Ras-like protein (TC10), also raise the possibility that mutations in some coding microsatellites could be selected against in tumor development.
Although some clues can be obtained from overall mutation frequencies, it is not possible to infer the biological significance of coding microsatellite mutations without functional studies of the cell biological effects of these alterations. It is entirely possible that the frameshift mutations in a given coding microsatellite could be bystanders, even in genes with putative roles in apoptosis and cell cycle regulation. Furthermore, interpretation of mutation frequency requires an understanding of the role that repeat length, nucleotide type, and adjacent sequence context play in determining sequence stability (17). Attempts have been made to compare coding to noncoding mutation frequencies for microsatellites of similar sequence composition (8, 17); however, much larger surveys of intronic sequences are required. Polycytosines and polyguanines may have greater instability than polyadenines and polythymines (18), precluding comparisons of mutation frequencies for different tract compositions. Therefore, additional studies will be required before the functional significance of the mutations we identified is known.
DNA MMR deficiency underlies a significant proportion of CRCs, endometrial cancers, and gastric cancers. We developed Kangaroo, a powerful bioinformatics search algorithm, to perform an in silico genome scan for coding microsatellites that may be mutated in DNA MMR-deficient tumors. It will now be possible to develop a comprehensive mutation profile across hundreds of coding microsatellites that are putative targets for MMR-deficient tumors. Whereas it is often difficult to understand the biological and functional importance of some of these alterations, the delineation of a comprehensive mutation profile may be analogous to the development of a transcriptional profile of a tumor. This type of systematic approach will be essential to better understand the molecular pathogenesis of MMR-deficient neoplasms.
Fig. 1. Semilog distribution of mononucleotide repetitive sequences identified in GenBank records of annotated human coding regions. Mononucleotides less than six bases in length were too abundant to enumerate using Kangaroo. For the purposes of this figure, searches for coding microsatellites >13 nucleotides in length were truncated because of apparent ambiguities in many of these sequence entries (see text). Because of the redundancy of GenBank, the number of actual coding mononucleotides is likely to be much smaller than that indicated by the number of records identified.
Fig. 2. Coding mononucleotide mutation frequencies in MMR-deficient primary CRCs and cell lines. The genes are organized into three groups based on the mutation frequency in the cell lines: 0% (no cell lines with instability), 1–20% (one cell line with coding instability), and >20% (two or more cell lines with coding instability). Within each of these groups, the genes are arranged according to the mutation frequency in the primary tumors. For some genes, only seven tumor cell lines amplified successfully.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Supported in part by the National Cancer Institute of Canada with funds from the Terry Fox Run. M. R. was the recipient of a Research Scientist Award from the National Cancer Institute of Canada supported with funds provided by the Canadian Cancer Society.
⁴ The abbreviations used are: CRC, colorectal cancer; NCBI, National Center for Biotechnology Information; MMR, mismatch repair; MSH3, mutS (Escherichia coli) homologue 3; MSI, microsatellite instability; MSI-H, high frequency microsatellite instability; PMS2, postmeiotic segregation increased (S. cerevisiae) 2.
⁵ Internet address: ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools.
⁶ K. Michalickova, G. D. Bader, R. Isserlin, and C. W. V. Hogue. SeqHound biological sequence database system as a platform for bioinformatics research, manuscript in preparation.
⁷ Internet address: http://bioinfo.mshri.on.ca.
⁸ Internet address: http://bioinfo.mshri.on.ca/kangaroo.
Table 1. Mutation frequency of coding polyadenine microsatellites in MSI-H colorectal cancers

Gene symbolᵃ | Gene name; function | Accession no. | Locationᵇ | Coding microsatelliteᶜ | Mutation frequency, mutated/tested (%)
---|---|---|---|---|---
Newly identified coding polyadenines | | | | |
DNAPK/PRKDC | Protein kinase, DNA-activated, catalytic subunit; DNA double-strand break repair and recombination | NM_006904 | 487 | (A)10 | 33/99 (33)
 | | | 10807 | (A)8 | 8/98 (8)
GRK4/GPRK2L | G protein-coupled receptor kinase 2 (Drosophila)-like; desensitize G protein-coupled receptors | NM_005307 | 656 | (A)9 | 19/91 (21)
RACK7/PRKCBP1 | Protein kinase C-binding protein 1; anchors protein kinase C-beta-1 | NM_012408 | 319 | (A)8 | 17/91 (19)
CBF2 | CCAAT box binding protein; transcriptional activation | NM_005760 | 1420 | (A)9 | 14/91 (16)
RECQL1 | RecQ (DNA helicase) protein-like; DNA repair | NM_032941 | 112 | (A)9 | 11/93 (12)
CDC25C | Cell division cycle 25C; triggers entry into mitosis | NM_001790 | 724 | (A)8 | 10/93 (11)
ERCC5 | Excision repair cross-complementing, complementation group 5; nucleotide excision and transcription-coupled repair of DNA damage | NM_000123 | 2743 | (A)9 | 8/80 (10)
TF-34 | Human Krueppel-related DNA binding protein; zinc finger protein | GI: 1124875 | 500 | (A)8 | 6/86 (7)
BLYM | Burkitt’s lymphoma transforming gene; unknown gene product | GI: 179497 NM_005179 | 78 | (A)8 | 5/96 (5)
Previously reported coding polyadenines | | | | |
MSH3 | MutS (E. coli) homolog 3; DNA mismatch repair | NM_002439 | 1141 | (A)8 | 42/101 (42)
ATR | Ataxia telangiectasia and Rad3 related; DNA recombination, damage checkpoint, double-strand break repair | NM_001184 | 2311 | (A)10 | 28/97 (29)
BLM | Bloom syndrome; ATPase/helicase, may suppress inappropriate recombination | NM_000057 | 1536 | (A)9 | 22/97 (23)
BCL10 | B-cell CLL/lymphoma 10; activates nuclear factor κB, promotes apoptosis, suppresses transformation | NM_003921 | 129 | (A)8 | 7/102 (7)
 | | | 493 | (A)7 | 2/102 (2)
PMS2 | Postmeiotic segregation increased (S. cerevisiae) 2; DNA mismatch repair | NM_000535 | 1232 | (A)8 | 5/99 (5)
ᵃ Includes only those genes with coding microsatellite mutation frequencies of ≥5%.
ᵇ The numerical position of the repeat sequence is denoted as its position within the coding sequence and is retrieved by Kangaroo from the GenBank Flatfile.
ᶜ In addition to the identified polyadenine tract, all other coding mononucleotides in the same genes measuring at least seven bases in length were also screened for mutations.
Table 2. Mutation frequency of coding polyadenine microsatellites in MSI-H human cancer cell lines

Gene symbolᵃ | Gene name; function | Accession no. | Locationᵇ | Coding microsatelliteᶜ | Mutation frequency, mutated/tested (%)
---|---|---|---|---|---
RAD50 | RAD50 (S. cerevisiae) homolog; DNA double-strand break repair/recombination | NM_005732 | 2175 | (A)9 | 2/8 (25)
 | | | 2812 | (A)8 | 0/8
DDX17/p72 | DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 17; ATPase involved in transcription/translation | NM_006386 | 118 | (A)8 | 1/7 (14)
CASP8AP2/RIP25 | CASP8 associated protein 2; interacts with and activates caspase-8 in FAS mediated apoptosis | NM_02115 | 3700 | (A)8 | 1/8 (12.5)
 | | | 3382 | (A)9 | 0/8
 | | | 2855 | (A)7 | 0/8
 | | | 3794 | (A)7 | 0/8
ᵃ Includes only those genes with coding microsatellite mutation frequencies of 10% or greater.
ᵇ The numerical position of the repeat sequence is denoted as its position within the coding sequence and is retrieved by Kangaroo from the GenBank Flatfile.
ᶜ In addition to the identified polyadenine tract, all other coding mononucleotides in the same genes measuring at least seven bases in length were also screened for mutations.