Abstract
The repair of damaged DNA requires the function of multiple proteins in generally damage-specific, nonredundant pathways. The relationship of DNA repair to cancer susceptibility is obvious in “cancer families,” in which low frequency, high penetrance, loss-of-function variant alleles of genes with roles in the repair of damaged DNA have been associated with a high risk of disease. More important for the cancer incidence in the general population, many individuals exhibit reduced (60–75% of normal) repair capacity phenotypes that have been associated with several-fold increases in individual cancer risk. In a program to identify the molecular basis for the variation in repair capacity and the elevated cancer susceptibility, we have identified 127 amino acid substitution variants in resequencing 37 DNA repair genes in 36–164 unrelated individuals. Over 50% of the substitutions are exchanges of amino acid residues with dissimilar physical or chemical properties, at sites at which the common residue is identical in the human and mouse proteins. Five additional sequence changes resulting in proteins with altered termination of translation and one amino acid insertion variant were detected. The variant allele frequencies average 0.047, with individual variant allele frequencies ranging from <0.01 to 0.43. Homozygous variant individuals and individuals with multiple amino acid substitutions in a gene were observed. Most individuals exhibited variation in multiple genes in a repair pathway. Ten variant alleles accounted for 52% of the genetic variation among individuals, but a striking 23% of the total variation is associated with 108 variants with allele frequencies of less than 5%. 
Screening generally healthy individuals generates a catalogue of common variants. This catalogue is a resource for molecular epidemiology studies that use a genotype-to-phenotype paradigm to estimate the role of genetic variation and individual susceptibility in disease risk from environmental and lifestyle exposures in the general population of the United States.
Introduction
Exposure of cells to environmental agents as well as the by-products of cellular metabolism results in extensive damage to DNA. The pattern of DNA damage is often complex but has characteristics associated with the damaging agent. Organisms have several, generally nonredundant, pathways for repairing different DNA lesions, e.g., strand breaks, adducts, and oxidized bases, resulting from these exposures. The nucleotide excision repair (NER) pathway removes UV-induced pyrimidine dimers and bulky DNA adducts associated with chemical exposures (1, 2). The double-strand break/recombinational repair (DSB/RR) pathway repairs strand breaks that are often associated with exposure to ionizing radiation and radiomimetic drugs or are the result of incomplete repair of other damage (3, 4, 5). A third pathway, base excision repair (BER), directly processes damaged bases, such as oxidized bases, the most common spontaneous damage (6, 7). These lesions also result from exposure of cells to ionizing or UV radiation via elevations in the intracellular reactive oxygen levels (8). A fourth pathway, mismatch repair (MMR), repairs replication errors (9, 10). As recently summarized, DNA repair in mammalian cells involves more than 80 genes with direct roles in the repair of DNA damage (11, 12). At least 40 additional genes with roles in DNA damage recognition and cell cycle checkpoint processes have important, although more indirect, roles in the repair of DNA damage (13, 14).
The important role of DNA repair in the maintenance of a normal cellular genotype and a cancer-free state is most obvious in “cancer families,” in which the presence of rare but highly penetrant variant alleles at a number of loci is associated with a high risk of cancer. A classic example is xeroderma pigmentosum, a prototype cancer gene syndrome associated with the development of UV-induced skin cancers resulting from the loss of function of a gene of the NER pathway (15). The association of defects in MMR with colon cancer is another example of the critical role of DNA repair in cancer prevention (9, 10). Other genes with direct or indirect roles in DNA repair in which variant alleles are associated with elevated cancer risk include BRCA1, BRCA2, TP53, ATM, and NBS1 (16). Disruption of the function of genes with roles in the repair of damaged DNA is associated with increased sensitivity to DNA damaging agents and cancer proneness (17). These instances of inherited cancer predisposition have provided important models for increasing the understanding of DNA repair pathways and carcinogenic processes and the relationship of DNA repair to cancer risk. In terms of cancer incidence, these generally highly penetrant, low-activity disease-associated alleles are estimated to account for no more than 5% of the cancer cases in the general population. The remaining cases, the sporadic cancers, are suggested to occur in individuals carrying combinations of common polymorphisms, often with low penetrance, only marginally altered function, and weak effects, that increase susceptibility to disease from environmental exposures and lifestyle factors.
Reduced DNA repair capacity is a polymorphic phenotypic trait. In at least 10% of the individuals in the population, the capacity to repair DNA damage after in vitro exposure of lymphocytes to DNA-damaging agents is only 60–75% of the population mean (18). The reduced-repair-capacity phenotypes for damage induced by bleomycin, γ radiation, and benzo[a]pyrene-diolepoxide, classes of damage expected to be repaired by different pathways and, thus, different sets of genes, behave as independent traits (19). The genetic contribution to the interindividual differences in repair capacity ranges from 0.65 to 0.80 for these three traits (20, 21, 22). A link between the reduced-DNA-repair-capacity phenotypes and cancer susceptibility is supported by data from a series of epidemiology studies. These studies have demonstrated that a reduced-repair-capacity phenotype is associated with an increased risk (odds ratios of 2–10) of developing tumors at several sites, including breast, lung, skin, liver, or head/neck [see review of Berwick and Vineis (23) and references cited therein]. A limited number of individuals have reduced capacity to repair damage induced by both bleomycin and benzo[a]pyrene-diolepoxide, damage expected to be repaired by different pathways. The number of such individuals is consistent with the number expected, given the incidence of individuals with reduced capacity phenotypes for each agent. These individuals with reduced capacity in two pathways exhibit a higher risk of developing lung cancer than do individuals with a reduced capacity in only one pathway (24).
Accumulating evidence suggests that many diseases, including most cancer cases, result from low-level exposures and lifestyle factors in genetically susceptible individuals. The results presented here focus on identification of the common or polymorphic amino acid substitution variants existing in DNA repair genes in the general population. This extends the work of Shen et al. (25) to more genes in a larger number of individuals. This catalogue of variant alleles is a resource for molecular epidemiology studies using a genotype to phenotype or health consequence paradigm for associating DNA repair gene variants to repair capacity and cancer susceptibility.
Materials and Methods
The resequencing strategy, which involves sequencing of the same genomic region in multiple individuals to identify DNA sequence variation in DNA repair genes, has been described previously (25). It involves the direct sequencing of PCR products containing the individual exons of a gene plus adjacent intronic and noncoding regions. The PCR products include the splice sites and the 5′ and 3′ regions of the genes.
PCR Amplification of Exons.
The PCR primers are designed using the Oligo Primer Analysis software (National Biosciences, Inc., Plymouth, MN). Appended to the 5′ end of each PCR primer is the primer binding site for the forward or reverse energy transfer (ET) DNA sequencing primer (Amersham Life Science, Inc., Cleveland, OH). PCR primers are matched so that the sense and the antisense PCR primers contain different sequencing primer binding sites. PCR conditions are optimized as necessary by addition of DMSO. Primers were obtained from Sigma Genosys (The Woodlands, TX).
PCR primers are positioned so that amplification of the genomic sequence is initiated at least 75 nucleotides from the intron-exon boundary. This is sufficient distance for high-quality sequence data to be obtained before reaching the intron/exon splice site. The PCR products are ∼400 bp (range, 300–450 bp), and, therefore, the entire fragment can be sequenced in both directions without developing new sequencing primers.
DNA Sequencing.
After PCR amplification, PCR products are diluted and used as substrate in sequencing reactions. Dye primer cycle sequencing reactions are performed according to the manufacturer’s instructions for the DYEnamic Direct cycle sequencing kit with the DYEnamic energy transfer primers (Amersham Life Science, Inc.) and loaded into an ABI Prism 377 stretch DNA sequencer (Applied Biosystems, Foster City, CA). The dye primer sequencing method yields generally uniform peak intensities, which facilitates identification of sequence variation in heterozygous individuals, as the two comigrating peaks are of similar intensity but ∼50% of the intensity of the neighboring peaks in the chromatogram. All of the PCR products are sequenced in both directions, with the identification of the variant nucleotide in the reverse read providing evidence for the authenticity of the initial observation of sequence variation.
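The peak-intensity reasoning above can be illustrated with a toy classifier. This is only a sketch of the idea and is not the algorithm used by PolyPhred or any other package named in this report; the function name `call_genotype`, the input format, and the tolerance value are hypothetical choices made for illustration.

```python
def call_genotype(peaks, neighbor_height, tolerance=0.2):
    """Classify one chromatogram position from its four channel heights.

    peaks: dict mapping each base to its peak height at the position.
    neighbor_height: typical height of neighboring homozygous peaks.
    tolerance: hypothetical allowed deviation from the 50% expectation.
    """
    # Rank the four channels by signal at this position.
    ranked = sorted(peaks.items(), key=lambda item: item[1], reverse=True)
    (base1, height1), (base2, height2) = ranked[0], ranked[1]
    ratio1 = height1 / neighbor_height
    ratio2 = height2 / neighbor_height
    # A heterozygote shows two comigrating peaks, each near half the
    # intensity of the neighboring peaks.
    if abs(ratio1 - 0.5) <= tolerance and abs(ratio2 - 0.5) <= tolerance:
        return "".join(sorted(base1 + base2))
    # Otherwise report the strongest base as a homozygous call.
    return base1 + base1


print(call_genotype({"A": 480, "C": 10, "G": 520, "T": 5}, 1000))
print(call_genotype({"A": 990, "C": 12, "G": 20, "T": 8}, 1000))
```

In practice a call of this kind would also be checked against the reverse read, as described above, before a site is accepted as variant.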
Sequence Analysis.
The initial data analysis (lane tracking and base calling) is performed with the ABI Prism DNA sequence analysis software (version 2.1.2). Chromatograms created by the ABI Prism DNA sequence analysis software are imported into a Sun Microsystems UNIX workstation (Sun Microsystems Inc., Mountain View, CA). The chromatograms are reanalyzed with Phred (bases called and quality of sequence values assigned, version 0.961028), assembled with Phrap (version 0.960213) and the resultant data viewed with Consed (version 4.1). “PolyPhred” (version 2.1), a software package that uses the output from Phred, Phrap, and Consed, is used to identify single nucleotide substitutions in heterozygous individuals (26). All of the sequence variants identified by PolyPhred and the immediately surrounding region were inspected to confirm the existence of high-quality sequence reads in the region, before “marking” the nucleotide substitution or other sequence alteration as a variant. The consensus sequence for each gene is derived from the samples sequenced in this study and may differ from the specific sequence(s) in GenBank. The numbering of the nucleotides in the genomic sequence is consistent with the numbering in GenBank or the public domain draft sequence that is available. The common or wild-type allele is defined as the most common allele in the sample set sequenced rather than by the reference GenBank sequence. The GenBank genomic sequences for ERCC1, LIG3, POLD1, RAD52, and XRCC1 are in descending numbers because the genomic sequence for each of these genes is in the reverse orientation of the cDNA sequence.
Samples for Variation Screening.
Four sets of samples were screened for variation. Table 1 relates the genes to the specific sample set screened and the number of unrelated individuals screened for variation for each gene. Table 1 also includes the GenBank number for the specific cDNA sequence used for assigning the location of the amino acid substitutions identified in these samples and listed in Table 2.
The majority of the genes have been screened for variation in 92 samples from the “DNA Polymorphism Discovery Resource” at the Coriell Institute for Medical Research (Group I). This resource was developed by the NIH to have a common set of samples available to investigators screening for common variants existing in the general population of the United States. The availability of lists of common polymorphisms in large numbers of candidate genes was expected to facilitate subsequent studies to relate genetic variation to disease risk (27). The samples (27) are from United States residents selected to represent the major ethnic groupings of the population, although the ethnic origin of specific individuals is unknown. The individuals in this sample set are from population groups as follows: European-American, 23; African-American, 23; Mexican-American, 11; Native-American, 11; Asian-American, 23. Because the ethnicity of specific individuals is unknown, the estimated allele frequency data will not suggest potential differences among ethnic groups in the distribution of alleles. These samples cannot be associated with specific donors and were deemed to be exempt by the LLNL-IRB for human subjects research.
Because this variation screening effort was initiated before the establishment of the Polymorphism Discovery Resource, two other sample sets were screened for variation in early studies. The initial resequencing screened DNA from 36 unrelated individuals (Group II). Group II included 12 samples from unidentified individuals for whom no characteristics are known, although they are presumed to have been healthy at the time of sample collection in Ann Arbor, MI and are probably Caucasian. These are the same samples as screened by Shen et al. (25) in a preliminary search for variation in DNA repair genes. Because the samples cannot be associated with a donor, they were deemed to be exempt by the LLNL-IRB. Twenty-four additional samples in Group II were from Caucasians enrolled in a lung cancer study conducted at Johns Hopkins University, Baltimore, MD. Informed consent to use these samples for the study of the possible relationship of variation in DNA repair and cancer had been obtained and the study approved by the Johns Hopkins University Institutional Review Board and the LLNL-IRB. Twelve of the samples are from cancer cases and 12 are from controls. No variants were identified in resequencing of lymphocyte DNA from the lung cancer cases that were not identified in other individuals also.
A subsequent sample set, which was also a predecessor to the Polymorphism Discovery Resource, included 72 individuals of African, Asian, or Caucasian origin selected for geographical diversity (Ref. 28; Group III). These samples are available from the Coriell Institute for Medical Research (Camden, NJ) and were also deemed to be exempt by the LLNL-IRB.
A fourth set of 46 samples (Group IV) has been used to screen for variation in genes of the NER pathway that may be associated with risk of melanoma. These individuals are of Caucasian origin and were selected because of the previous diagnosis of recurrent melanoma. The screening of these samples for sequence variation was approved by the Institutional Review Board of the Memorial Sloan-Kettering Cancer Center (New York, NY) and the LLNL-IRB.
All of the repair genes screened for variation are located on autosomes and, therefore, the number of chromosomes screened is twice the number of samples. Thus, in summary, Group I is 184 chromosomes, Group II is 72 chromosomes, Group III is 144 chromosomes, and Group IV is 92 chromosomes. APEX, POLB, and XRCC1 were screened for variation in both the Group I and II sample sets or a total of 256 chromosomes, and LIG3 was screened in the samples of Groups I and III or 328 chromosomes (Table 1).
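The chromosome counts in this summary follow directly from the sample sizes. A minimal sketch of the arithmetic, using only the group sizes given in the text (the names `SAMPLES` and `chromosomes_screened` are illustrative):

```python
# Number of unrelated individuals in each sample set, as stated in the text.
SAMPLES = {"I": 92, "II": 36, "III": 72, "IV": 46}


def chromosomes_screened(groups):
    """Return the number of autosomal chromosomes screened across sample sets."""
    # Every individual carries two copies of each autosomal gene.
    return sum(2 * SAMPLES[group] for group in groups)


for group in SAMPLES:
    print(f"Group {group}: {chromosomes_screened([group])} chromosomes")

print("Groups I + II (APEX, POLB, XRCC1):", chromosomes_screened(["I", "II"]))
print("Groups I + III (LIG3):", chromosomes_screened(["I", "III"]))
```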
Results
Summary of Amino Acid Substitution Variants.
A total of 127 different single nucleotide polymorphisms resulting in amino acid substitutions were identified in the screening of 37 genes (Table 2). The sequence variation data and the individual genotypes for each individual for the genes screened in the Polymorphism Discovery Resource sample set (Group I) can be accessed on the Internet. Similar data for the Group III genes can also be accessed.
A large number of different amino acid substitution variants were identified in several genes, including 10 different variants for LIG1, 9 for XRCC1, and 8 for MLH1. Although none of the 10 LIG1 variants exists at a frequency of over 0.02, the total variant allele frequency for the gene is 0.11. Similarly, the seven low-frequency variants of both XRCC1 and MLH1 exist at total variant allele frequencies of 0.10 for each gene. No amino acid substitution variants were detected in RAD51, PCNA, FEN1, ERCC1, and ERCC3. With the exception of FEN1, a small intronless gene, nucleotide substitutions were identified in both the exons and introns of these genes. In the collection of 127 amino acid substitution variants, 22 variant alleles (17%) exist at estimated frequencies of greater than 0.05. Two variant alleles existing at frequencies of at least 0.10 were identified in five genes (ERCC2, XPC, MSH3, XRCC1, and POLD1). The estimates of allele frequencies must be considered tentative, as relatively small numbers of chromosomes were screened for variation, although the estimated allele frequencies for several of the common polymorphisms are similar to frequencies observed in subsequent molecular epidemiology studies (29, 30, 31). The two nucleotide substitutions in the codon for amino acid residue 618 (Lys) of MLH1 exist in the same individual. From the direct sequencing of the PCR product, it is not possible to differentiate between nucleotide substitutions on the same chromosome and substitutions on opposite chromosomes; thus, three possibilities for the variant amino acid residue exist, Ala if the substitutions are on the same chromosome or Thr and Glu if the substitutions are on different chromosomes.
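The phase ambiguity at MLH1 residue 618 can be made concrete with the standard genetic code. The sketch below assumes, for illustration only, that Lys is encoded by AAG and that the two substitutions are A to G at the first codon position and A to C at the second; the text does not state the nucleotide changes, and these are simply ones that reproduce the three residues it names. Amino acids are printed in one-letter code (A, Ala; E, Glu; T, Thr).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitute(codon, changes):
    """Return the codon after applying (position, base) substitutions."""
    bases = list(codon)
    for position, base in changes:
        bases[position] = base
    return "".join(bases)


def phase_outcomes(codon, change_a, change_b):
    """Variant residues encoded under each possible phase of two changes."""
    # Both changes on one chromosome: a single doubly substituted codon.
    cis = CODON_TABLE[substitute(codon, [change_a, change_b])]
    # One change per chromosome: two singly substituted codons.
    trans = sorted([CODON_TABLE[substitute(codon, [change_a])],
                    CODON_TABLE[substitute(codon, [change_b])]])
    return {"same chromosome": [cis], "different chromosomes": trans}


# Assumed for illustration: Lys codon AAG, A>G at position 1, A>C at position 2.
print(phase_outcomes("AAG", (0, "G"), (1, "C")))
```

Direct sequencing of a diploid PCR product yields the same pair of heterozygous positions in both cases, which is why the two phases cannot be distinguished from these data alone.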
Six additional variants with potential to disrupt protein structure were identified (Table 3). Two of the variants were identified in RAD50. The first is an insertion of three nucleotides resulting in the addition of a Gln residue after amino acid residue 363, with the sequence changing from Gln-Glu-His-Ile to Gln-Glu-(Gln)-His-Ile. The second variant is a nucleotide substitution in the codon for Gln at 826, generating a termination signal and resulting in deletion of ∼30% of the protein. Each of these variants was observed only once, existing at an estimated allele frequency of <0.01. Nucleotide substitutions resulting in the generation of termination codons and synthesis of truncated proteins were identified at amino acid residue 333 of MRE11A (allele frequency of 0.009) and residues 346 (allele frequency of 0.033) and 415 (allele frequency of 0.041) of RAD52. A four-nucleotide duplication at the COOH terminus of MSH6 was identified (allele frequency of 0.02). This insertion changed the amino acid sequence from Thr-Leu-Ile-Lys-Glu-Leu-stop to the variant sequence Thr-Leu-Ile-Asp-stop. No nucleotide substitutions were observed in the critical splice site consensus sequences (splice site plus or minus two nucleotides). In total, 133 different nucleotide sequence alterations with potential to disrupt protein function were identified in the screening of 37 genes.
The variation identified in screening 37 genes is summarized by repair pathway in Table 4. An average of 3.6 different variants were identified per gene. The totals in Table 4 are the number of different genes screened and unique variants observed. They are not the sum of the numbers for the individual pathways because several genes have roles in more than one repair pathway. The average variant allele frequency is 0.047 (Table 4). No major differences in average allele frequency for the genes of the different pathways were noted.
In addition to the 133 nucleotide substitution and insertion/deletion variants described in Tables 2 and 3, 96 nucleotide substitutions that did not result in amino acid substitutions were detected within the exons. Thus, ∼60% of the nucleotide substitutions identified in the exons of these repair genes result in amino acid substitutions. In addition, 608 nucleotide substitutions have been identified in the adjacent intronic and 3′- and 5′-UTRs of these genes. Initial analysis does not provide evidence that any of the nucleotide substitutions in the 5′-UTR disrupt known regulatory sites. These sequence variation data are also available on the Internet.
Characteristics of Variants.
Twenty-eight of the 127 amino acid substitutions (22%) occurred at Arg residues, i.e., Arg is the common allele. Twenty of the nucleotide substitutions in the Arg codon were at the G residue and 18 of the substitutions were G to A (CGX to CAX), resulting in replacement of an Arg with either His (10 variants) or Gln (8 variants), depending on the third position nucleotide. His and Gln are amino acid residues with properties that differ from Arg. Six substitutions occurred at Pro residues, a residue imparting significant constraints on protein structure.
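The dependence of the replacement residue on the third codon position can be checked against the standard genetic code. The short sketch below enumerates the four CGX arginine codons and the residue encoded after a G to A change at the second position; the variable names are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter names for the residues involved here.
NAMES = {"R": "Arg", "H": "His", "Q": "Gln"}

replacements = {}
for third in BASES:
    common = "CG" + third   # Arg codon of the CGX family
    variant = "CA" + third  # same codon after G>A at the second position
    replacements[common] = (variant, NAMES[CODON_TABLE[variant]])
    print(f"{common} ({NAMES[CODON_TABLE[common]]}) -> "
          f"{variant} ({NAMES[CODON_TABLE[variant]]})")
```

A pyrimidine in the third position (CGT, CGC) gives His, and a purine (CGA, CGG) gives Gln.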
The nature of the amino acid residues involved in the interchange can be defined by criteria reflective of the physical and chemical properties of the respective members of the amino acid pair (32). Using the groupings of amino acid residues by similar chemical and physical characteristics used by Smith and Smith (33), 101 of the 127 variants (80%) involve the interchange of amino acid residues with dissimilar physical or chemical properties.
The evolutionary conservation of the amino acid residues at the site of the substitution is another characteristic suggestive of the potential for an amino acid substitution to impact protein function. Ninety-five (75%) of the 127 amino acid substitution variants exist at amino acid residues at which the common allele in the human protein is identical to the amino acid residue in the mouse protein. Seventy-nine or 62% of the substitutions involve the exchange of amino acid residues with dissimilar physical or chemical properties at residues at which the common allele in humans encodes the same amino acid residue as observed in the mouse protein. This number is reduced to 56 or 44% when the properties of the exchanged amino acid residues are scored by the Blosum 62 matrix (34). Over 50% of the 22 variants with allele frequencies of >0.05 involve the exchange of amino acid residues with dissimilar physical and/or chemical properties at residues at which the human and mouse proteins have the same amino acid residue, characteristics often associated with negative impact on protein function.
Contributions of Individual Alleles to Total Variation and the Complexity of Individual Genotypes.
The relative contribution of variants existing at different allele frequencies to the total genetic variation among individuals is presented in Table 5. The seven variants existing at frequencies of >0.30 account for 41% of the total genetic variation among individuals for these genes, whereas the 116 variant alleles existing at individual frequencies of less than 10% account for 32% of the total variation in the population.
Homozygous variant individuals were identified for 14 of the 17 variant alleles existing at variant allele frequencies of 0.10 or greater. Two of the variant alleles not observed in the homozygous state were at loci (XPC and RAD23B) at which only 46 individuals were screened for variation.
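The absence of homozygotes at loci screened in only 46 individuals is not surprising. As a rough sketch, assuming Hardy-Weinberg proportions and an illustrative variant allele frequency of 0.10 (the lower bound of the class discussed; the actual frequencies at XPC and RAD23B are not given in this section), the expected number of homozygous variant individuals and the chance of observing none can be computed as follows. The function names are hypothetical.

```python
def expected_homozygotes(q, n):
    """Expected number of homozygous variant individuals among n sampled."""
    # Under Hardy-Weinberg proportions the homozygote frequency is q squared.
    return n * q * q


def prob_no_homozygote(q, n):
    """Probability that none of n unrelated individuals is homozygous variant."""
    return (1 - q * q) ** n


for n in (46, 92):
    print(f"n = {n}: expected = {expected_homozygotes(0.10, n):.2f}, "
          f"P(none observed) = {prob_no_homozygote(0.10, n):.2f}")
```

Under these assumptions fewer than one homozygote is expected among 46 individuals, so failure to observe one says little about whether the homozygous genotype exists in the population.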
Two or more different variant alleles were identified in 26 genes, and many individuals with two different variant alleles in a gene were observed. These individuals would express two different variant forms of a protein (if the substitutions were on different chromosomes) or a wild-type protein and a variant protein with two amino acid substitutions (if the substitutions were located on the same chromosome). For example, 14 of 36 individuals screened for variation in ERCC2 exhibited variation at more than one residue. One individual was homozygous variant at both of the highly polymorphic sites (312Asn/Asn and 751Gln/Gln) in ERCC2, seven individuals were heterozygous at one site and homozygous variant at the other and six were heterozygous at both sites. Among 128 individuals screened for variation in XRCC1, eight individuals were homozygous variant at residue 399 (399Gln/Gln), and three were homozygous variant at residue 194 (194Trp/Trp). No individuals were homozygous at one of these two sites in XRCC1 and also variant at the second site, which suggested that the variant alleles at XRCC1 194 and 399 were on different chromosomes. Twelve individuals were variant at one of these two sites and also at an additional site within the protein.
Complex genotypes were observed in compiling the variation for the genes of a repair pathway for an individual. The BER pathway, in which 49 different variants were identified in screening 12 of the 30 genes of this pathway, is illustrative. Only 3 of 90 individuals were homozygous wild-type at all 12 of the loci, whereas 5 individuals were identified with 5 variant alleles and 7 individuals had 6 variant alleles among the 12 BER genes screened thus far (Table 6). The average of 2.8 variant alleles per individual would extrapolate to ∼7 variants in the average individual, from a pool estimated to include ∼115 different variants, for the 30 genes of the BER pathway.
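The per-individual extrapolation is a simple proportional scaling from the 12 genes screened to the 30 genes of the pathway. A minimal sketch, assuming that the unscreened genes carry variation at the same average rate as the screened ones (the function name is illustrative):

```python
GENES_SCREENED = 12
GENES_IN_PATHWAY = 30


def extrapolate(observed, screened=GENES_SCREENED, total=GENES_IN_PATHWAY):
    """Scale a count observed in the screened genes to the whole pathway."""
    return observed * total / screened


# Average number of variant alleles per individual in the 12 BER genes.
per_individual = extrapolate(2.8)
print(f"Variant alleles per individual, 30 genes: {per_individual:.1f}")
```

The same scaling applied to the 49 variants observed would give a somewhat larger pool than the ∼115 quoted, so that estimate presumably rests on a different per-gene figure that is not stated in this section.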
Seventy-five different combinations of alleles or pathway genotypes were observed in compiling the variation for these 12 BER genes. In addition to the three individuals who were wild-type at all of the 12 loci screened, five groups of either two or three individuals were heterozygous for the same single variant, whereas two pairs of individuals were heterozygous for the same 2 variant alleles. Each of the 49 individuals with 3 or more variant alleles among these 12 genes had a unique combination of single nucleotide polymorphisms. Examples of the complexity of genotypes observed are presented for the 26 individuals with three variant alleles in Table 7. The data presented include only the 27 variant alleles existing in this specific subset of individuals.
Similar complex pathway genotypes were observed for the other repair pathways, although the data sets were not as complete because not all of the genes screened in these pathways were resequenced in the same sample set. [The individual genotype data (except for the limited data for Group II individuals) are available at http://greengenes.llnl.gov/dpublic/secure/reseq/ or http://manuel.niehs.nih.gov/egsnp/home.htm.]
Discussion
Understanding the molecular and biological basis for the reduced-DNA-repair-capacity phenotype and the associated elevation in individual cancer risk will require that both the genes involved in the repair of the different classes of DNA damage be known and that the common variants in these genes be cataloged. Thus, this study focused on well-characterized genes of the different DNA repair pathways and the associated DNA damage recognition and cell cycle checkpoint processes. The ultimate goal is the identification of the common variants segregating in the United States population for all of the genes with roles in the repair of damaged DNA. This catalogue will serve as a resource for future biochemical and molecular epidemiology studies. The strategy used here, screening a limited number of individuals selected to represent a larger population, will not provide the resources for the study of genetically distinct populations or subpopulations with specific variation and potentially unique risks, e.g., the Ashkenazi Jewish population (35). Molecular epidemiology of genetically isolated populations will require additional screening to ensure ascertainment of variants specific to these populations. Although the ethnic composition of the set of samples screened for variation in this study is known (27), the ethnicity of specific individuals is unknown; thus, the estimated allele frequencies do not address the issue of ethnic differences in allele distribution or frequency. Future molecular epidemiology studies that address questions regarding the role of these variants in disease susceptibility will undoubtedly obtain data regarding potential differences in allele frequencies among different ethnic groups or substructured populations.
Extensive variation, 3.6 different amino acid substitution and protein sequence-altering variants per gene with an average variant allele frequency of 0.047, was identified in the resequencing of the 37 genes in this screening effort. This is consistent with previous, more limited or focused studies of variation in DNA repair genes in “non-cancer family” individuals (36, 37, 38). The common variants of ERCC2 (39), ERCC4 (40), RAD52 (41), XPC (42), MSH3 (43), and RAD54 (41) that were observed in this resequencing effort were identified in other studies also. The RAD51 gene appears to be highly invariant, because no amino acid substitution variants were identified in screening 92 individuals in this study or in 100 additional individuals (38) or in over 60 human tumor samples (44). Kato et al. (45) identified one amino acid-substitution variant in RAD51 in 2 of 45 Japanese breast cancer patients, but this variant was not seen in genotyping 200 additional Japanese breast cancer and 100 colon cancer patients. As in this screen, no amino acid substitution variants were detected in screening another sample set for variants in FEN1 and PCNA (46). Although the present report focuses on nucleotide sequence variation that results in amino acid substitutions or that otherwise disrupts protein structure, extensive additional sequence variation that could potentially impact gene expression was identified in both exons and noncoding regions. The reported association of variation in the 5′-UTR of RAD51 with the risk of breast and ovarian cancer in BRCA1/2 mutation carriers is an example of the potential for variation in noncoding regions of genes to be important (47).
The average of 3.6 different variants per repair gene is slightly higher than the number of different variants observed in the systematic screening of other sets of candidate disease-susceptibility genes. For example, an average of 1.1–2.8 different amino acid substitution variants per gene were observed in sequencing 36 cardiovascular risk genes (48), 75 hypertension risk genes (49), or 106 common disease risk genes (50) in samples from multiethnic populations. In contrast to the extensive variation observed in screening multiethnic sample sets, only 0.4–0.5 different amino acid substitution variants per gene were identified in genes associated with the risk of ischemic heart disease (41 genes; Ref. 51) or rheumatoid arthritis (41 genes; Ref. 52) in the resequencing of DNA from 49 Japanese individuals. This lower level of variation in the Japanese sample could be related to a lower level of heterogeneity in that population. It is also consistent with the observation that the total number of different variants detected is lowest when screening individuals of only one ethnic origin and is highest when individuals of African origin are included in a multiethnic sample set (53, 54). Similar results were observed in the present study, in which two variants of APEX and five variants of XRCC1 that were not observed in the Caucasian samples of Group II were detected in screening the multiethnic Group I sample set. Thus, it is expected that the total number of different variants identified in these genes would have been even higher if the 11 repair genes that were screened for variation only in the Caucasian sample sets (Groups II and IV) had been screened in a multiethnic sample set.
The functional relevance of the extensive variation identified in DNA repair genes remains to be fully addressed, although initial data are becoming available. Seven variants of APEX identified in this study, in other reports (55, 56), or GenBank DNA sequence databases have been characterized as to activity in biochemical assays (57). Four of the variants retained only 10–60% of normal or wild-type activity. Three of the reduced activity variants involved the exchange of amino acid residues with dissimilar properties at residues that were identical in the human and mouse proteins. The fourth variant, which retained 60% of wild-type activity, was a replacement of Glu at residue 126 by Asp, residues with similar properties. Two of the variants retaining wild-type activity (Asp148Glu and Gly306Ala) were exchanges that would not be expected to disrupt structure by these limited criteria. The exchange of Arg at residue 241 for Gly (the mouse residue is Gly), residues with dissimilar characteristics, also retained wild-type activity. Using the characteristics of the amino acid residues, knowledge of evolutionary conservation and localization of the residues within the three-dimensional structure of the APEX protein, Hadi et al. (57) correctly predicted the impact of the substitutions for six of these seven variants. The exception was the Glu to Asp exchange at residue 126. The availability of the protein structure obviously enhances the potential to correctly predict the impact of a substitution on protein activity.
In other studies addressing the question of functional relevance, several variants have been associated with altered DNA repair capacity or an altered level of damage after an exposure. Spitz et al. (58) reported that variant alleles at amino acid residues 312 and 751 of ERCC2 were associated with a reduced capacity to repair damage induced by in vitro exposure of lymphocytes from individuals of a lung cancer cohort to benzo[a]pyrene diol epoxide. Homozygosity for a variant allele in either of two NER genes, XPC or ERCC2, was associated with a reduced capacity to repair UV-induced DNA damage, as measured by the host-cell reactivation assay in lymphocytes from a cohort of healthy subjects (59). Hu et al. (31) reported that the APEX 148Glu allele was associated with prolonged mitotic delay in lymphocytes exposed to ionizing radiation. In addition, women with at least three variant alleles of APEX and XRCC1 were more likely to be sensitive to ionizing radiation. The XRCC1 399Gln variant has been associated with increased aflatoxin adducts (30) and with increased polyphenol adducts in the breast tissue of smokers (29). This variant has also been associated with increased levels of glycophorin A mutations in RBCs (30) and sister chromatid exchanges in lymphocytes from smokers (29). The data suggest that the approximately 40% of the population carrying this variant allele of XRCC1 will have 20–50% more DNA damage than individuals with the common allele after similar exposures.
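The figure of approximately 40% can be roughly reproduced from the XRCC1 399Gln allele frequency of 0.24 listed in the amino acid substitution table, if genotypes are assumed to follow Hardy-Weinberg proportions. That assumption and the function name below are introduced here only for illustration; the source does not state how the 40% estimate was derived.

```python
def carrier_fraction(variant_allele_freq: float) -> float:
    """Fraction of individuals carrying at least one variant allele,
    assuming Hardy-Weinberg proportions (an illustrative assumption)."""
    common = 1.0 - variant_allele_freq
    # Carriers = heterozygotes (2pq) + homozygous variants (q^2) = 1 - p^2
    return 1.0 - common ** 2


xrcc1_399gln = 0.24  # allele frequency listed for XRCC1 Arg399Gln
print(f"Estimated carriers: {carrier_fraction(xrcc1_399gln):.1%}")  # 42.2%
```

Under this sketch, roughly 42% of individuals would carry at least one 399Gln allele, in line with the approximate value quoted above.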
A steadily increasing number of molecular epidemiology studies are reporting on the potential association of polymorphic DNA-repair gene variants with cancer risk. Over 30 studies reporting on the cancer risk associated with one or more of the variants of APEX 148, ERCC2 312 and 751, XRCC1 194 and 399, or XRCC3 241 have been published. These manuscripts can be accessed using the appropriate gene symbol to search the National Center for Biotechnology Information Entrez search and retrieval system (http://www.ncbi.nlm.nih.gov/Entrez/). These genotyping-risk-association studies can be summarized as follows: (a) the sample sizes are generally small, and usually the genotyping included only the variants in a single gene of a repair pathway and sometimes only a single variant in a gene; (b) most of these variant alleles have been reported to be associated with cancer risk in some studies; (c) other studies have reported an absence of association in cohorts with similar cancers or cohorts with other cancers; and (d) in some studies, the common or wild-type allele was associated with elevated risk.
The molecular epidemiology and biochemical characterization of individual variants are beginning to address the relationship of individual polymorphic variants to repair capacity and their relevance as cancer risk factors. The cataloguing of variants in the genes in pathways with roles in ameliorating the negative consequences of exposures is a step in initiating these studies. It is important that a majority of the variants in each gene in a pathway be identified and be used as reagents for molecular epidemiology and biochemical function studies, because it is expected that the impact of most of the variants will be individually small (60, 61). It is usually assumed that selection against the alleles associated with elevated risk of complex diseases is generally small because they are expected to have only mildly deleterious impacts on function (62). As observed in the present study, the collective contribution of the low-frequency variants to the total variation among individuals is large, emphasizing the importance of using the full richness of this genetic variation across all relevant genes and pathways to estimate individual susceptibility. The development of approaches for relating complex genotypes, interacting with exposure (63), to disease risk (64) will be required. Ultimately, the refinement of estimates of cancer risk will require extensive genotyping in studies using large cohorts with well-documented exposure histories and disease status to account for the manifold and subtle effects of gene-gene and gene-gene-environment interactions in determining individual susceptibility.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Work performed under the auspices of the United States Department of Energy by the University of California, Lawrence Livermore National Laboratory (contract W-7405-ENG-48) and supported in part by Interagency Agreement Y1-ES-8054-05 from National Institute of Environmental Health Sciences and National Cancer Institute Grant 1 U-1 CA 83180-03.
The abbreviations used are: NER, nucleotide excision repair; DSB/RR, double-strand break/recombination repair; BER, base excision repair; MMR, mismatch repair; LLNL-IRB, Lawrence Livermore National Laboratory Institutional Review Board; UTR, untranslated region; UV, ultraviolet.
Internet address: http://www.gene.ucl.ac.uk/nomenclature/.
Description and documentation for Phred, Phrap, and Consed may be obtained at http://www.genome.washington.edu.
Internet address: http://greengenes.llnl.gov/dpublic/secure/reseq/.
Internet address: http://manuel.niehs.nih.gov/egsnp/home.htm.
Tong Xi, Johana Vázquez-Matías, and Harvey W. Mohrenweiser. Single nucleotide polymorphisms in the exonic and neighboring intronic and untranslated regions of 40 DNA repair and repair related genes in humans, manuscript in preparation.
Internet address: http://www.ncbi.nlm.nih.gov/Entrez/.
Summary of genes screened for variation
Gene | Alias | Samples screened (a) | Genomic sequence | cDNA sequence |
---|---|---|---|---|
ADPRT | PARP | I | AC04143 | J03473 |
APEX | APE1 | I, II | M92444 | M80261 |
CDK4 | | II | U37022 | M14505 |
CDKN2A | INK4; p16 | II | U12818–U12820 | NM_000077 |
ERCC1 | | II | M63796 | M13194 |
ERCC2 | XPD | II | L47234 | X52221 |
ERCC3 | XPB | IV | AC027142 | M31899 |
ERCC4 | XPF | II | L76568 | NM_005236 |
ERCC5 | XPG | IV | D16305 | NM_000123 |
FANCG | XRCC9 | I | AC004472 | U70310 |
FEN1 | | III | AC004770 | L37374 |
LIG1 | | I | AC011466 | M36067 |
LIG3 | | I, III | AC004223 | U40671, X84740 |
MLH1 | HNPCC | III | U40960TO–U40978 | U07418 |
MRE11A | | I | AP000786 | NM_005590 |
MSH2 | | III | U41206–U41221 | U03911 |
MSH3 | | III | D61397–D61419 | U61981 |
MSH6 | GTBP | III | AC006509 | NM_000179 |
NBS1 | | I | AF069291.1 | NM_002485 |
NTHL1 | NTH1 | I | AC005600 | U79718 |
NUDT1 | MTH1 | I | D38591 | D16581 |
PCNA | | II | J04718 | M15796 |
POLB | | I, II | AH00541, and AF170802 | NM_002690 |
POLD1 | | I | AC020909, and CITB-E1_2545M3 | M80397 |
POLD2 | | I | AC006454 | NM_006230 |
RAD23A | HHR23A | II | AD0000092 | NM_005053 |
RAD23B | HHR23B | IV | AL137852 | D21090 |
RAD50 | | III | AC004042A | U63139 |
RAD51 | | I | AF165088–AF165094 | D14134 |
RAD52 | | I | AC004803 | U27516 |
RAD54L | | I | AL360086 | NM_003579 |
XPA | | II | AL442130, U16815 | NM_000380 |
XPC | | IV | AH009651 | D21089 |
XRCC1 | | I, II | L34079 | M36089 |
XRCC2 | | III | AC003109 | Y08837 |
XRCC3 | | II | AF037222 | NM_005432 |
XRCC4 | | I | AC034211, AC022416 | NM_003401 |
(a) Sample sets are described in “Materials and Methods.”
Amino acid substitution variants identified in DNA repair and repair-related genes
The common nucleotide followed by the variant nucleotide is enclosed in parentheses, and the codon for the amino acid residue is underlined and bold. The nine amino acid substitutions reported previously (25) are indicated with an asterisk.
Gene name | Exon | Codon | Common residue | Variant residue | Allele frequency | Mouse residue | cDNA sequence 5′→3′ |
---|---|---|---|---|---|---|---|
BER | | | | | | | |
ADPRT | 4 | 188 | Ala | Thr | 0.006 | Ser | TCCTT(G/A)CTACA |
ADPRT | 7 | 334 | Val | Ile | 0.011 | Val | AGTGG(G/A)TAACC |
ADPRT | 8 | 383 | Ser | Tyr | 0.014 | Ser | CTCCT(C/A)TGCTT |
ADPRT | 17 | 761 | Val | Ala | 0.18 | Val | CAAGG(T/C)GGAAA |
ADPRT | 21 | 940 | Lys | Arg | 0.011 | Lys | CAGCA(A/G)GTTAC |
APE1 | 3 | 51 | Gln | His | 0.03 | Gln | GATCA(G/C)AAAAC |
APE1 | 3 | 64 | Ile | Val | 0.01 | Ile | TCAAG(A/G)TCTGC |
APE1 | 5 | 148 | Asp | Glu | 0.33 | Glu | GGCGA(T/G)GAGGA |
APE1 | 5 | 241 | Gly | Arg | 0.01 | Gly | GCTTC(G/A)GGGAA |
FEN1 | No variants | | | | | | |
LIG1 | 3 | 24 | Ala | Val | 0.01 | Thr | GGAGG(C/T)ATCCA |
LIG1 | 4 | 62 | Arg | Trp | 0.01 | Gln | CGGCC(C/T)GGGTC |
LIG1 | 9 | 249 | Gly | Glu | 0.01 | Gly | GCCAG(G/A)GGCTC |
LIG1 | 10 | 267 | Asn | Ser | 0.02 | Asn | TTACA(A/G)TCCTG |
LIG1 | 13 | 369 | Val | Ile | 0.01 | Ile | AGTCC(G/A)TCCGG |
LIG1 | 13 | 409 | Arg | His | 0.01 | Cys | GTTCC(G/A)CGACA |
LIG1 | 16 | 480 | Met | Val | 0.01 | Val | CAGCC(A/G)TGGTG |
LIG1 | 20 | 614 | Thr | Ile | 0.01 | Thr | GGTCA(C/T)ATCCT |
LIG1 | 22 | 673 | Glu | Asp | 0.01 | Gln | CGTGA(G/T)CCCCT |
LIG1 | 22 | 677 | Arg | Leu | 0.01 | Arg | TTCCC(G/T)GCGCC |
LIG3 | 18 | 780 | Arg | His | 0.03 | Cys | GTCCC(G/A)CAAGG |
LIG3 | 19 | 811 | Lys | Thr | 0.01 | Lys | TGCAA(A/C)GCCTT |
LIG3 | 21 | 899 | Pro | Ser | 0.01 | Thr | AGAAC(C/T)CTGCG |
NTHL1 | 1 | 21 | Arg | Trp | 0.006 | Arg | GGAGC(C/T)GGAGC |
NTHL1 | 1 | 31 | Gly | Val | 0.006 | Gly | GCGGG(G/T)GTGTA |
NTHL1 | 1 | 33 | Arg | Lys | 0.006 | Arg | GTGTA(G/A)GGAGG |
NTHL1 | 3 | 176 | Ile | Thr | 0.005 | Ile | GCTCA(T/C)CTACC |
NTHL1 | 4 | 234 | Ser | Leu | 0.006 | Ser | TGTGT(C/T)AGGCA |
NUDT1 | 3 | 83 | Val | Met | 0.006 | Val | TGGAC(G/A)TGCAT |
NUDT1 | 4 | 135 | Gly | Trp | 0.006 | Gly | TCCAC(G/T)GGTAC |
PCNA | No variants | | | | | | |
POLB | 1 | 8 | Gln | Arg | 0.01 | Gln | GCCGC(A/G)GGAGA |
POLB | 7 | 137 | Arg | Gln | 0.006 | Arg | TCAGC(G/A)AATTG |
POLB | 12 | 242 | Pro | Arg | 0.005 | Pro | GCTTC(C/G)CAGTA |
POLD1 | 1 | 19 | Arg | His | 0.12 | Arg | GGCCC(G/A)TGGGG |
POLD1 | 1 | 30 | Arg | Trp | 0.006 | Ser | CACCT(C/T)GGCCA |
POLD1 | 3 | 119 | Arg | His | 0.15 | Arg | ATCCC(G/A)CGGCT |
POLD1 | 4 | 173 | Ser | Asn | 0.05 | Ser | CATCA(G/A)CCGGG |
POLD1 | 4 | 177 | Arg | His | 0.003 | Arg | CAGTC(G/A)CGGGG |
POLD1 | 19 | 849 | Arg | His | 0.011 | Arg | ACTGC(G/A)CCGCC |
POLD1 | 26 | 1086 | Arg | Gln | 0.01 | Arg | GGTGC(G/A)GAAGG |
POLD2 | 8 | 303 | Asn | Ser | 0.005 | Asn | CACCA(A/G)TTACA |
XRCC1 | 3 | 72 | Val | Ala | 0.03 | Val | GCTGG(T/C)GGGCA |
XRCC1 | 5 | 161 | Pro | Leu | 0.005 | Thr | GGCCC(C/T)GTCCC |
XRCC1 | 5 | 173 | Phe | Leu | 0.005 | Phe | CAGTT(C/G)CGTGT |
XRCC1 | 6 | 194* | Arg | Trp | 0.13 | Arg | TCAGC(C/T)GGATC |
XRCC1 | 9 | 280* | Arg | His | 0.03 | Arg | AACTC(G/A)TACCC |
XRCC1 | 9 | 309 | Pro | Ser | 0.01 | Ala | GACGA(C/T)CCCGA |
XRCC1 | 10 | 399* | Arg | Gln | 0.24 | Arg | CTCCC(G/A)GAGGT |
XRCC1 | 15 | 560 | Arg | Trp | 0.01 | Arg | AGCGG(C/T)GGAAA |
XRCC1 | 16 | 576 | Tyr | Ser | 0.01 | Tyr | GGACT(A/C)TATGA |
NER | | | | | | | |
---|---|---|---|---|---|---|---|
ERCC1 | No variants | | | | | | |
ERCC2 | 8 | 199* | Ile | Met | 0.01 | Ile | TCAAT(C/G)CTGCA |
ERCC2 | 8 | 201* | His | Tyr | 0.01 | His | TCCTG(C/T)ATGCC |
ERCC2 | 10 | 312* | Asp | Asn | 0.40 | Asp | TGCCC(G/A)ACGAA |
ERCC2 | 20 | 616 | Arg | Pro | 0.01 | Arg | CGGGC(G/C)GGCCG |
ERCC2 | 23 | 751* | Lys | Gln | 0.32 | Gln | CGCTG(A/C)AGAGG |
ERCC3 | No variants | | | | | | |
ERCC4 | 7 | 379* | Pro | Ser | 0.03 | Pro | GCAAC(C/T)CAAAG |
ERCC4 | 8 | 415 | Arg | Gln | 0.06 | Arg | TGACC(G/A)AACAT |
ERCC5 | 4 | 141 | Asn | Asp | 0.02 | Asp | GAGAA(A/G)ACGAC |
ERCC5 | 7 | 254 | Met | Val | 0.012 | Met | AGGAA(A/G)TGAAT |
ERCC5 | 8 | 529 | Cys | Ser | 0.03 | Arg | AACTT(G/C)TACAA |
ERCC5 | 8 | 597 | Val | Leu | 0.011 | Met | AGGCA(G/C)TAGAT |
ERCC5 | 10 | 761 | Ala | Thr | 0.011 | Ala | GGATC(G/A)CTGCT |
ERCC5 | 15 | 1090 | Glu | Asp | 0.011 | Asp | GGGGA(G/C)ACCTG |
ERCC5 | 15 | 1104 | Asp | His | 0.18 | Asp | GTGAA(G/C)ATGCT |
LIG1 | See above | | | | | | |
PCNA | See above | | | | | | |
POLD1 | See above | | | | | | |
POLD2 | See above | | | | | | |
RAD23A | 5 | 200 | Thr | Met | 0.03 | Thr | GCTCA(C/T)GGGAA |
RAD23B | 7 | 249 | Ala | Val | 0.10 | Ala | TGGGG(C/T)TCCTC |
XPA | 6 | 256 | Met | Ile | 0.01 | Met | GACAT(G/C)TACCG |
XPC | 1 | 16 | Leu | Val | 0.04 | none | GCGAA(C/G)TGCGC |
XPC | 2 | 48 | Leu | Phe | 0.04 | Ser | GCCTT(C/T)TCTCC |
XPC | 8 | 492 | Arg | His | 0.04 | Arg | CCATC(G/A)TAAGG |
XPC | 8 | 499 | Ala | Val | 0.24 | Ala | GCCAG(C/T)GGCAT |
XPC | 15 | 939 | Lys | Gln | 0.38 | Lys | TTGAG(A/C)AGCTG |
DSB/RR | | | | | | | |
---|---|---|---|---|---|---|---|
LIG3 | See above | | | | | | |
NBS1 | 4 | 142 | Asn | Ser | 0.006 | Asn | AAACA(A/G)TTGGA |
NBS1 | 5 | 185 | Gln | Glu | 0.34 | Glu | CAGTT(G/C)AGTCC |
NBS1 | 6 | 196 | Phe | Val | 0.005 | Phe | AAAGT(T/G)TTTAC |
NBS1 | 6 | 216 | Gln | Lys | 0.005 | His | GACGG(C/A)AGGAA |
NBS1 | 14 | 716 | Asn | Asp | 0.013 | Asn | GAAAG(A/G)ATACA |
RAD50 | 5 | 191 | Thr | Ile | 0.01 | Thr | AGAAA(C/T)ACTTC |
RAD50 | 16 | 884 | Arg | His | 0.024 | Arg | ACGTC(G/A)TCAGC |
RAD50 | 24 | 1239 | Arg | Gln | 0.01 | Arg | TGACC(G/A)AGAAA |
RAD51 | No variants | | | | | | |
XRCC2 | 2 | 16 | Ala | Ser | 0.01 | Ala | TCCTT(G/T)CCCGA |
XRCC2 | 3 | 188 | Arg | His | 0.05 | Arg | CTATC(G/A)CCTGG |
XRCC3 | 7 | 241* | Thr | Met | 0.43 | Thr | GGCCA(C/T)GCTGC |
XRCC4 | 1 | 12 | Ser | Cys | 0.01 | Ser | TGTTT(C/G)TGAAC |
XRCC4 | 2 | 75 | Leu | Ser | 0.006 | Val | ATTGT(T/C)GTCAG |
XRCC4 | 3 | 134 | Ile | Thr | 0.016 | Ile | CACCA(T/C)TGCAG |
XRCC4 | 3 | 137 | Asn | Cys | 0.005 | Lys | GAAAA(T/G)CAAGC |
XRCC4 | 5 | 247 | Ala | Ser | 0.08 | Ala | GGTTG(G/T)CTTCA |
Damage recognition and cell cycle checkpoints | | | | | | | |
---|---|---|---|---|---|---|---|
CDK4 | 2 | 20 | Val | Leu | 0.03 | Val | GGACA(G/T)TGTAC |
CDKN2A | 2 | 148 | Ala | Thr | 0.05 | Leu | ATGCC(G/A)CGGAA |
FANCG | 7 | 297 | Thr | Ile | 0.01 | Ala | CACAA(C/T)AGCAG |
FANCG | 8 | 330 | Pro | Ser | 0.01 | Pro | TACTG(C/T)CACCA |
FANCG | 9 | 378 | Ser | Leu | 0.02 | Ser | TAGCT(C/T)GGAGC |
FANCG | 10 | 464 | Val | Phe | 0.01 | Ser | CCTGG(G/T)TTCAA |
RAD52 | 4 | 70 | Arg | Trp | 0.005 | Arg | GTCAT(C/T)GGGTA |
RAD52 | 7 | 221 | Gln | Glu | 0.011 | Glu | TGCAG(C/G)AGGTG |
RAD52 | 8 | 287 | Ser | Asn | 0.05 | His | GAAGA(G/A)TGAGG |
RAD54L | 4 | 74 | Ile | Met | 0.006 | Ile | TTTAT(T/G)CGAAG |
RAD54L | 7 | 202 | Arg | Cys | 0.006 | Arg | TTTTA(C/T)GCCAG |
RAD54L | 10 | 380 | Arg | Gln | 0.011 | Arg | GGAGC(G/A)GCTGC |
RAD54L | 16 | 583 | Ile | Thr | 0.01 | Ile | TCTCA(T/C)TGGGG |
MMR | | | | | | | |
---|---|---|---|---|---|---|---|
MLH1 | 8 | 213 | Val | Met | 0.04 | Val | CAACC(G/A)TGGGAC |
MLH1 | 8 | 219 | Ile | Val | 0.12 | Ile | GCTCC(A/G)TCTTT |
MLH1 | 11 | 325 | Arg | Gln | 0.01 | Arg | GGAGC(G/A)GGTGC |
MLH1 | 11 | 326 | Val | Ala | 0.01 | Val | GCGGG(T/C)GCAGC |
MLH1 | 12 | 452 | Thr | Ser | 0.01 | Leu | ATACA(A/T)CAAAG |
MLH1 | 16 | 618 | Lys | Ala | 0.01 | Lys | AGAAG(A/G)&(A/C)GGCTG |
MLH1 | 16 | 618 | Lys | Thr or Glu | 0.01 | Lys | AGAAG(A/G)or(A/C)GGCTG |
MLH1 | 19 | 718 | His | Tyr | 0.01 | His | TGGAA(C/T)ACATTG |
MSH2 | 3 | 127 | Asn | Ser | 0.006 | Asn | TGGCA(A/G)TCTCT |
MSH2 | 3 | 170 | Gln | Glu | 0.006 | Gln | CCATA(C/G)AGAGG |
MSH2 | 6 | 319 | Asp | Val | 0.009 | Asp | TGAAG(A/T)TACCA |
MSH2 | 6 | 322 | Gly | Asp | 0.011 | Gly | CACTG(G/A)CTCTC |
MSH2 | 7 | 390 | Leu | Phe | 0.005 | Leu | ACCGA(C/T)TTGCC |
MSH2 | 13 | 735 | Ile | Val | 0.006 | Ile | CTTCT(A/G)TCCTC |
MSH3 | 8 | 429 | Ala | Val | 0.04 | Met | AGAGG(C/T)GCTCA |
MSH3 | 9 | 456 | Tyr | Cys | 0.006 | Tyr | TGAAT(A/G)CAGCC |
MSH3 | 10 | 514 | Glu | Lys | 0.05 | Glu | AACCT(G/A)AGAAT |
MSH3 | 13 | 597 | Ser | Asn | 0.014 | Ser | ATCTA(G/A)TGTGT |
MSH3 | 15 | 700 | Phe | Leu | 0.005 | Phe | CTGAC(T/C)TCCCT |
MSH3 | 21 | 931 | Gly | Cys | 0.006 | Gly | GGATG(G/T)GTGCT |
MSH3 | 21 | 940 | Arg | Gln | 0.10 | Arg | AGGAC(G/A)GAGTA |
MSH3 | 23 | 1036 | Thr | Ala | 0.30 | Asp | CAGGC(A/G)CAGCA |
MSH6 | 1 | 25 | Ala | Val | 0.003 | Ala | CTCGG(C/T)CAGGG |
MSH6 | 1 | 39 | Gly | Glu | 0.24 | Gly | CCCCG(G/A)GGCCT |
MSH6 | 4 | 396 | Leu | Val | 0.009 | Leu | CTACA(C/G)TCTAT |
MSH6 | 4 | 878 | Val | Ala | 0.016 | Val | AGAAG(T/C)TGCTG |
MSH6 | 6 | 1152 | Val | Ile | 0.009 | Val | TAGCT(G/A)TAATG |
Summary of non-amino acid substitution variants disrupting protein structure
Gene name | Exon | Codon | Common residue | Variant residue | Allele frequency | Mouse residue | cDNA sequence 5′→3′ |
---|---|---|---|---|---|---|---|
RAD52 | 10 | 346 | Ser | Stop | 0.033 | Leu | GCCCT(C/A)GTCTA |
RAD52 | 11 | 415 | Tyr | Stop | 0.041 | Leu | AAATA(T/G)GATCC |
RAD50 | 8 | 363 | QEHI (a) | QE(Q)HI | 0.005 | QEHI | TCAAG(AAC/AACAAC)ATATC |
RAD50 | 15 | 826 | Gln | Stop | 0.005 | Gln | CTGTC(C/T)AACAA |
MRE11A | 8 | 333 | Gln | Stop | 0.009 | Gln | CCATA(C/T)AAAGC |
MSH6 | 16 | 1357 | TLIKEL(STOP) (b) | TLID(STOP) | 0.020 | ALINGL(STOP) | GACT(TTGA/TTGATTGA)TTAAG |
(a) The wild-type human sequence is Gln-Glu-His-Ile, the variant sequence is Gln-Glu-Gln-His-Ile, and the mouse sequence is Gln-Glu-His-Ile.
(b) The human wild-type sequence is Thr-Leu-Ile-Lys-Glu-Leu, the variant sequence is Thr-Leu-Ile-Asp, and the mouse sequence is Ala-Leu-Ile-Asn-Gly-Leu.
Summary of variants identified in genes in different DNA repair pathways
Pathway | No. of genes | No. of variants | Variants/gene (range) | Allele frequency (average) |
---|---|---|---|---|
BER | 12 | 49 | 0–10 | 0.033 |
NER | 13 | 40 | 0–9 | 0.061 |
DSB/RR | 8 | 22 | 0–5 | 0.054 |
MMR | 4 | 28 | 6–8 | 0.038 |
Damage recognition and cell cycle checkpoint | 5 | 15 | 1–5 | 0.02 |
Total (a) | 37 | 133 | | |
Average | | | 3.6 | 0.047 |
(a) LIG1, POLD1, POLD2, and PCNA have functions in both the BER and the NER pathways. LIG3 functions in both BER and DSB/RR. These genes are counted in both pathways in the table. The total is the number of different genes resequenced and the number of unique variants identified. The count includes both the protein truncation and the amino acid substitution variants.
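The relationship between the per-pathway counts and the totals can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the per-gene counts for the shared genes were tallied here from the rows of the variant tables rather than taken from a stated figure in the text.

```python
# Variants per pathway, as listed in the pathway summary table
pathway_variants = {"BER": 49, "NER": 40, "DSB/RR": 22, "MMR": 28, "Checkpoint": 15}

# Variant counts for genes assigned to two pathways, tallied from the variant
# tables (LIG1, POLD1, POLD2, PCNA in BER and NER; LIG3 in BER and DSB/RR)
shared_gene_variants = {"LIG1": 10, "POLD1": 7, "POLD2": 1, "PCNA": 0, "LIG3": 3}

genes_screened = 37

# Each shared gene is counted twice in the pathway sums, so subtract it once
unique_variants = sum(pathway_variants.values()) - sum(shared_gene_variants.values())
variants_per_gene = unique_variants / genes_screened

print(unique_variants)              # 133
print(round(variants_per_gene, 1))  # 3.6
```

Removing the double-counted variants from the pathway sum of 154 recovers the 133 unique variants, and dividing by the 37 genes gives the average of 3.6 variants per gene.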
Contribution of alleles of different frequency to total variation
Allele frequency . | Variant alleles (n) . | Total variation (%) . | Cumulative variation (%) . |
---|---|---|---|
>0.40 | 2 | 14 | 14 |
0.30–0.399 | 5 | 27 | 41 |
0.20–0.299 | 3 | 11 | 52 |
0.10–0.199 | 7 | 16 | 68 |
0.05–0.099 | 8 | 8 | 76 |
0.02–0.049 | 22 | 11 | 88 |
<0.02 | 86 | 12 | 100 |
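The share attributed to the ten most common alleles can be approximated from the reported frequencies. The sketch below assumes that "total variation" is measured as each allele's share of the summed variant allele frequencies, and it approximates that sum as the number of variants multiplied by the reported average frequency (133 × 0.047). Both are assumptions made here for illustration; the source does not define the measure explicitly.

```python
# Allele frequencies of the ten most common variants (frequency >= 0.20),
# read from the variant tables
top_ten = {
    "XRCC3 Thr241Met": 0.43, "ERCC2 Asp312Asn": 0.40,
    "XPC Lys939Gln": 0.38, "NBS1 Gln185Glu": 0.34,
    "APE1 Asp148Glu": 0.33, "ERCC2 Lys751Gln": 0.32,
    "MSH3 Thr1036Ala": 0.30, "XRCC1 Arg399Gln": 0.24,
    "XPC Ala499Val": 0.24, "MSH6 Gly39Glu": 0.24,
}

# Assumed approximation of the summed frequency of all variants:
# number of variants x reported average allele frequency
total_frequency = 133 * 0.047

share = sum(top_ten.values()) / total_frequency
print(f"Share of total variation from the ten most common alleles: {share:.1%}")
```

Under these assumptions the ten alleles account for a little over half of the summed frequency, which is close to the cumulative 52% in the table; small differences in the individual rows are expected from rounding of the published frequencies.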
Distribution of variant alleles per individual identified in screening 12 genes of the BER pathway in 90 individuals
No. of variant alleles/individual | No. of individuals |
---|---|
0 | 3 |
1 | 19 |
2 | 19 |
3 | 26 |
4 | 11 |
5 | 5 |
6 | 7 |
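Two summary figures follow directly from this distribution: the mean number of variant alleles per individual and the fraction of individuals carrying at least one variant allele. The sketch below computes both, treating the counts as exact; the variable names are hypothetical and the summary statistics are derived here, not quoted from the text.

```python
# Number of individuals (values) carrying a given number of variant alleles (keys)
distribution = {0: 3, 1: 19, 2: 19, 3: 26, 4: 11, 5: 5, 6: 7}

n_individuals = sum(distribution.values())
total_variant_alleles = sum(k * n for k, n in distribution.items())

mean_per_individual = total_variant_alleles / n_individuals
fraction_with_variant = 1 - distribution[0] / n_individuals

print(n_individuals)                   # 90
print(round(mean_per_individual, 2))   # 2.73
print(f"{fraction_with_variant:.1%}")  # 96.7%
```

The counts sum to the 90 individuals screened, and all but three of them carry at least one variant allele among the 12 BER genes.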
Examples of genotypes observed in individuals with four variant alleles among 12 genes of the BER pathway
“1” designates the wild-type allele and “2” is the variant allele; heterozygote individuals are shaded light gray with a diagonal line, and homozygous variant individuals are shaded dark gray and boxed. The top line is the wild-type amino acid, line 2 is the variant residue, and the third line indicates the position of the substitution. Additional details regarding the substitutions are in Table 2.

Acknowledgments
We gratefully acknowledge the assistance of Suzanne Duarte, Arlene Gonzales, and Karolyn Burkhart-Schultz in the sequencing and of Linda Ott, Mimi Yeh, and Tom Slezak in the informatics support. We appreciate the many discussions with Dr. David Wilson, III, and the assistance of Dr. Gloria Petersen in supplying the samples from Johns Hopkins University and of Dr. James Selkirk and the Environmental Genome Program of the National Institute of Environmental Health Sciences.