Databases of expressed sequence tags (ESTs) can be used to screen rapidly for potential polymorphisms in candidate proteins. As part of this study, we screened the gene for the enzyme thymidylate synthase (TS). TS is important physiologically because it is essential for the synthesis of deoxythymidylate, a nucleotide required for DNA synthesis and repair. TS is also a major target for cancer chemotherapeutic drugs, especially the widely used 5-fluorouracil. Using sequence alignment of ESTs, we identified a candidate 6-bp variation at bp 1494 in the 3′-untranslated region of the TS mRNA. This sequence variation occurred in 21 of 34 aligned ESTs at this location, including ESTs from various tissue sources. The presence of this polymorphism was confirmed in a Caucasian population (n = 95) by polymerase chain reaction amplification/RFLP analysis. The allele frequency of the 6-bp deletion was found to be 0.29 (wild-type +6 bp/+6 bp, 48%; +6 bp/−6 bp, 44%; −6 bp/−6 bp, 7%). Although the function of this polymorphism has not yet been investigated, the 3′-untranslated region of a gene can play a role in mRNA stability and translation. This study illustrates an approach to polymorphism discovery in candidate enzymes of physiological interest by searches of publicly available sequence data, a rapid and inexpensive method. The potential functional relevance of the common 6-bp deletion in the TS gene needs to be investigated, because this enzyme is plausibly of major importance not only in cancer treatment but also in cancer prevention.
As part of the human genome project, terabytes of sequence information are stored in public databases. ESTs comprise much of the available data; they are partial cDNA sequences that have been generated from many different tissues. ESTs reflect some or all of the transcribed sequence of a gene, which includes the coding sequence as well as 5′- and 3′-UTRs. Currently, the EST database for humans contains more than one million entries and is publicly available. As demonstrated by others (1, 2, 3, 4), this resource is a rich source for identifying genetic variations. Whereas most of these researchers used a broad search pattern, it is possible to direct screening of ESTs to specific proteins of physiological interest. On the basis of findings linking folate availability to carcinogenesis, our interest focused on enzymes in folate metabolism.
TS (EC 2.1.1.45) is an enzyme that catalyzes the conversion of deoxyuridylate to deoxythymidylate with simultaneous conversion of 5,10-methylenetetrahydrofolate to dihydrofolate (Fig. 1). Thus, TS is essential for the provision of a nucleotide required for both DNA synthesis and repair. TS is an essential enzyme in proliferating cells and is also an important target for a variety of chemotherapeutic drugs, including 5-FU. Thus, TS plays a major role in cancer therapy and possibly in cancer prevention.
Genetic polymorphisms in the TS gene may result in altered enzyme function. This could affect cancer susceptibility as well as treatment efficacy and the toxicity of antifolate cancer therapeutics. A common polymorphism in another enzyme in folate metabolism, 5,10-methylenetetrahydrofolate reductase, has been found to be associated with risk of both cancer and preneoplastic lesions; in several studies, this association was modified by folate status, resulting in an increased risk among those with low folate intake (5, 6, 7, 8).
To date, only one functional polymorphism in the TS gene has been reported, and its prevalence in Caucasians is unknown (9). We undertook a search for new genetic polymorphisms in TS by screening public databases of ESTs and identified a common 6-bp deletion in the 3′-UTR of the TS gene.
Materials and Methods
EST Database Search.
The EST database is publicly available on the NCBI web site. A search for matching ESTs was carried out with BLAST, which is also available at the NCBI web site (10). The reference sequence used was the TS cDNA sequence (GenBank accession no. X02308). The “flat master-slave with identities” alignment view option in BLAST was used to present the EST sequences in a multiple-sequence alignment with the reference sequence. This option provides a simultaneous display of the aligned sequences and the reference sequence, which can be scanned for common sequence variations.
Verification of a TS Candidate Polymorphism in a Caucasian Population.
The presence of a TS candidate polymorphism was verified in a Caucasian population (n = 95). These individuals were initially recruited for a case-control study in Minnesota in 1991–1994 that was approved by the University of Minnesota Institutional Review Board. DNA was obtained from buffy coat as part of the study protocol, part of which was specifically focused on the role of genetic variability in the etiology of colorectal neoplasia.
Genomic DNA was extracted from peripheral WBCs using the Puregene kit (Gentra Systems, Minneapolis, MN). An RFLP spanning the 6-bp insertion or deletion was used to verify the existence of the TS 3′-UTR polymorphism at bp 1494. The presence of the 6 bp creates a DraI restriction site. The fragment containing the polymorphism was amplified by PCR using primers 5′-CAAATCTGAGGGAGCTGAGT-3′ and 5′-CAGATAAGTGGCAGTACAGA-3′ in a reaction containing 10 mM Tris (pH 8.3), 50 mM KCl, 2.5 mM MgCl2, 150 μM deoxynucleotide triphosphates, 300 nM each primer, 100 ng of genomic DNA, and 1 unit of AmpliTaq DNA polymerase (PE Biosystems, Foster City, CA). The cycling conditions were: 1 cycle of 94°C for 5 min; 30 cycles of 94°C for 30 s, 58°C for 45 s, and 72°C for 45 s; and 1 cycle of 72°C for 5 min. The amplified fragments were digested with DraI and the products separated on a 3% NuSieve agarose gel. The expected fragment sizes are 70 bp and 88 bp for the wild-type allele and 152 bp for the mutant allele. χ² analysis was performed to test for agreement with Hardy-Weinberg equilibrium.
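The banding logic of this assay can be sketched in Python. The fragment sizes are those given in the text; the function names and the exact-match band calling are illustrative assumptions, not part of the published protocol, and the sketch ignores incomplete digestion.

```python
# Fragment sizes (bp) reported in the text for the DraI digest.
CUT_FRAGMENTS = (70, 88)   # +6-bp allele: DraI site present, amplicon is cut
UNCUT_FRAGMENT = 152       # -6-bp allele: no DraI site, amplicon stays intact

GENOTYPES = ("+6/+6", "+6/-6", "-6/-6")


def expected_bands(genotype):
    """Return the sorted band sizes expected for a genotype such as '+6/-6'."""
    bands = set()
    for allele in genotype.split("/"):
        if allele == "+6":
            bands.update(CUT_FRAGMENTS)
        elif allele == "-6":
            bands.add(UNCUT_FRAGMENT)
        else:
            raise ValueError(f"unknown allele: {allele!r}")
    return sorted(bands)


def call_genotype(observed_bands):
    """Hypothetical helper: map nominal band sizes to a genotype, or None."""
    observed = sorted(set(observed_bands))
    for genotype in GENOTYPES:
        if expected_bands(genotype) == observed:
            return genotype
    return None


for g in GENOTYPES:
    print(g, expected_bands(g))
```

A real gel would need a size tolerance and a rule for partially digested product; both are left out here to keep the mapping from genotype to bands explicit.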
Results
EST Database Searches.
On the basis of the complete coding sequence of TS (GenBank accession number X02308), we identified 99 matches of human ESTs (Fig. 2).
We reviewed the identified ESTs and their alignment with the coding sequence. As part of our quality control measures, we restricted the retrieved ESTs to the 59 matches where the probability that the match between the observed sequence (EST) and the reference sequence occurred by chance was less than 10⁻⁹⁰. On the basis of our review of the specific ESTs, those were all of the relevant ESTs that were successfully aligned over a long stretch of sequence. This quality control measure also excludes sequences with a large number of sequencing errors. As ESTs are usually produced via single-pass automated sequencing, errors are common. The alignment provided several regions of interest, including what appeared as a common 6-bp insertion (compared with the reference coding sequence) in the 3′-UTR at bp 1494 (Fig. 3). This sequence variation occurred in 21 of 34 aligned ESTs in this region. Evaluation of the tissue sources of the aligned ESTs showed that the variant ESTs were derived from several different tissue sources and laboratories.
Verification of the New TS Candidate Polymorphism in a Caucasian Population.
We used an RFLP spanning the 6-bp variation to verify the existence of the TS 3′-UTR polymorphism at bp 1494 among 95 Caucasians. Results of the digestion are shown in Fig. 4. Individuals homozygous for the presence of the 6 bp (+6/+6) are characterized by two fragments of 88 and 70 bp, respectively, whereas individuals homozygous for the absence of the 6 bp (−6/−6) are characterized by a single fragment of 152 bp in length. Heterozygotes show all three bands. The 158-bp band in the heterozygous samples is caused by an undigested wild-type fragment.
Genotype frequencies among 95 Caucasian individuals are shown in Table 1. On the basis of these studies, we established an allele frequency of 0.71 for the +6-bp allele and 0.29 for the −6-bp allele. Given the prevalence of these alleles, we regard the 6-bp deletion as the variant. Seven percent of our study population were homozygous variant for the 6-bp deletion, and the genotype frequencies were in Hardy-Weinberg equilibrium.
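The allele frequencies and the Hardy-Weinberg check can be reproduced from the genotype counts in Table 1. The following is a minimal sketch; the function names are illustrative, and the goodness-of-fit statistic is the standard χ² with one degree of freedom rather than a reconstruction of the exact software the authors used.

```python
def allele_frequencies(n_pp, n_pm, n_mm):
    """Return (freq of +6-bp allele, freq of -6-bp allele) from genotype counts."""
    total_alleles = 2 * (n_pp + n_pm + n_mm)
    p = (2 * n_pp + n_pm) / total_alleles
    return p, 1 - p


def hardy_weinberg_chi2(n_pp, n_pm, n_mm):
    """Chi-square statistic comparing observed counts with HWE expectations."""
    n = n_pp + n_pm + n_mm
    p, q = allele_frequencies(n_pp, n_pm, n_mm)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_pp, n_pm, n_mm)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Genotype counts from Table 1: +6/+6, +6/-6, -6/-6
p, q = allele_frequencies(46, 42, 7)
chi2 = hardy_weinberg_chi2(46, 42, 7)
print(f"+6 bp: {p:.2f}, -6 bp: {q:.2f}, chi-square: {chi2:.2f}")
```

With these counts the statistic is roughly 0.38, well below the 3.84 critical value for one degree of freedom at the 0.05 level, which is consistent with the reported equilibrium.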
Discussion
This study demonstrates that the human EST database can be a valuable tool for identifying new candidate polymorphisms in proteins of specific interest. This approach is cost-effective and requires little time; it has already been used by other investigators, largely in broad-based screening efforts (1, 2, 3). For our directed approach, the search for polymorphisms using EST databases entailed the following steps: (a) identification of a protein/gene of interest; (b) identification of an appropriate reference sequence and its accession number [e.g., via the Online Mendelian Inheritance in Man (OMIM) website]; (c) BLAST search for matching human ESTs at NCBI, and paired alignment with the reference sequence; (d) visual (or automated) search for locations with at least two alternative sequences (nucleotide substitutions, insertions/deletions); and (e) evaluation of the location of the candidate polymorphism (5′-UTR, coding region, or 3′-UTR) and potential effect on amino acid sequence.
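Step (d) lends itself to automation. The sketch below assumes the ESTs have already been exported as equal-width aligned strings, with "-" marking a gap and a space marking positions an EST does not cover; this input format, the toy sequences, and the function name are assumptions made for illustration and do not correspond to BLAST output as such.

```python
from collections import Counter


def candidate_columns(aligned_ests, min_support=2):
    """Return {column: {state: count}} for columns where at least two
    alternative states are each seen in min_support or more ESTs."""
    width = max(len(s) for s in aligned_ests)
    hits = {}
    for col in range(width):
        counts = Counter(
            s[col] for s in aligned_ests if col < len(s) and s[col] != " "
        )
        supported = {state: n for state, n in counts.items() if n >= min_support}
        if len(supported) >= 2:
            hits[col] = supported
    return hits


# Toy alignment (not real EST data): two reads carry a 6-base gap,
# and the last read has a single-base difference seen only once.
toy = [
    "ACGTTTAAAGTC",
    "ACGTTTAAAGTC",
    "ACG------GTC",
    "ACG------GTC",
    "   TTTAAAGTA",
]
hits = candidate_columns(toy)
print(sorted(hits))
```

Requiring each alternative to appear in at least two reads is what keeps the singleton mismatch in the last toy read from being reported, mirroring the idea that isolated differences are more likely sequencing errors.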
ESTs are submitted to the databases as part of the overall genomic analysis and are subject to fewer checks than accepted genomic or coding sequences. Therefore, sequencing errors can be quite common; this requires several quality control measures to increase the likelihood that an observed sequence variation constitutes a true polymorphism rather than a sequencing error. The measures used in this study include (a) the exclusion of sequences with a multitude of differences from the reference sequences (these may be error prone, but may also be expressed from a second related gene with significant sequence similarity); (b) exclusion of regions of an EST with multiple matching errors; (c) restriction to sequences with a very low likelihood of a random match, based on a probability calculated by BLAST; and (d) consideration only of candidate polymorphisms that occur in at least two ESTs from independent tissue sources.
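Measures (c) and (d) can be expressed as a simple filter. In this sketch the `EstHit` record, its field names, and the example accessions are hypothetical; only the 10⁻⁹⁰ probability cutoff and the requirement of at least two independent tissue sources come from the text.

```python
from dataclasses import dataclass


@dataclass
class EstHit:
    accession: str      # hypothetical identifier, not a real GenBank entry
    evalue: float       # chance-match probability reported by BLAST
    tissue: str         # tissue source of the cDNA library
    has_variant: bool   # True if the EST carries the candidate variation


def apply_qc(hits, max_evalue=1e-90, min_sources=2):
    """Keep high-confidence hits and report whether the variant is seen
    in at least min_sources distinct tissue sources among them."""
    kept = [h for h in hits if h.evalue < max_evalue]
    variant_tissues = {h.tissue for h in kept if h.has_variant}
    return kept, len(variant_tissues) >= min_sources


# Toy records for illustration only.
example_hits = [
    EstHit("EST_A", 1e-120, "colon", True),
    EstHit("EST_B", 1e-100, "brain", True),
    EstHit("EST_C", 1e-95, "colon", False),
    EstHit("EST_D", 1e-20, "liver", True),   # fails the probability cutoff
]
kept, supported = apply_qc(example_hits)
print([h.accession for h in kept], supported)
```

Measures (a) and (b) depend on per-base mismatch counts along each alignment and are not modeled here.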
The approach of screening EST databases for polymorphisms is naturally limited by the number of available ESTs and thus is more applicable to widely expressed genes and to common polymorphisms. Similarly, only the transcribed region can be investigated, but this is usually the genomic region of greater interest. A limitation of this approach is that ESTs are usually cloned from the 3′ end [beginning at the poly(A) sequence] and reverse transcription usually does not yield full-length clones. Thus, ESTs containing 3′ sequences are overrepresented. The very 3′ end of an mRNA is noncoding but can contain sequences determining mRNA stability. Although the significance of mutations in the 3′-UTR is not as obvious as that of mutations resulting in amino acid changes, mRNA turnover can be affected. Differences in mRNA turnover alter steady-state levels of a given mRNA, which, in turn, determine protein expression levels.
TS is essential in the regulation of the balanced supply of the four nucleotides required for the normal replication of DNA and plays a central role in folate metabolism. The importance of an adequate folate supply for cancer prevention has been demonstrated in several recent studies, showing an elevated risk of colon cancer and colorectal adenoma (11, 12, 13, 14, 15, 16, 17), pancreatic cancer (18), and possibly breast cancer (19, 20, 21) associated with low folate status, particularly if combined with high alcohol consumption. The observed association with cancer risk may be attributable to a pathway associated with nucleotide synthesis; folate deficiency can result in deoxynucleotide triphosphate pool disturbances in rats (22, 23), in incorporation of uracil instead of thymidine into DNA, and subsequently in chromosome breaks attributable to transient nicks (24, 25). It is very likely that these effects are mediated by decreased TS activity attributable to a lack of substrate.
Inhibition of TS has antiproliferative effects, a mechanism used by several “antifolate” chemotherapeutic drugs, particularly 5-FU and raltitrexed. It has been shown that resistance to 5-FU, as well as patient survival, is often associated with increased TS expression (26, 27, 28, 29, 30). Thus, functional genetic polymorphisms in TS could be relevant for cancer treatment.
In summary, we report here on a new, common polymorphism in the TS gene identified using a public EST database. The presence of this polymorphism has been confirmed in a Caucasian population. Potential effects of the TS 6-bp deletion at bp 1494 on function have not yet been investigated. Although the 3′-UTR of a gene is not translated into protein, it often plays an important role in maintaining mRNA stability. If changes in the secondary mRNA structure (e.g., folding) take place, translation can also be affected. Others have shown that a common point-mutation polymorphism in the 3′-UTR (polyadenylation signal) of the N-acetyltransferase 1 (NAT1) gene is associated with altered enzyme activity in bladder and colon tissues (31). It is possible that a deletion of 6 bp in the 3′-UTR affects mRNA stability or secondary mRNA structure and could thus ultimately affect protein levels of TS or response to up-regulation of this enzyme. If this polymorphism is associated with alterations in enzyme activity, it could be of major importance for cancer chemotherapy and possibly for cancer prevention.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
This work was supported by the National Institute of Environmental Health Sciences Center P30, ES-07033.
The abbreviations used are: EST, expressed sequence tag; UTR, untranslated region; TS, thymidylate synthase; 5-FU, 5-fluorouracil; NCBI, National Center for Biotechnology Information; BLAST, Basic Local Alignment Search Tool.
| Genotype | No. of individuals (n = 95) | Genotype frequency |
|---|---|---|
| +6 bp/+6 bp | 46 | 0.48 |
| +6 bp/−6 bp | 42 | 0.44 |
| −6 bp/−6 bp | 7 | 0.07 |
We would like to thank Angela C. Bush and Sushma S. Thomas for technical assistance with the thymidylate synthase genotyping and Clayton Hibbert and Mari Nakayoshi for assistance with the graphics.