Abstract
Purpose: The glutathione S-transferases (GSTs) catalyze the glutathione conjugation of reactive electrophiles, including carcinogens and many antineoplastic drugs. GSTT1 and GSTM1 are polymorphically deleted, but the full range of genetic variation in these two genes has not yet been explored. We set out to systematically identify common polymorphisms in GSTT1 and GSTM1, followed by functional genomic studies.
Experimental Design: First, multiplex PCR was used to determine GSTT1 and GSTM1 copy number in 400 DNA samples (100 each from 4 ethnic groups). Exons, splice junctions, and 5′-flanking regions (5′-FR) were then resequenced using DNA samples that contained at least one copy of GSTT1 or GSTM1.
Results: Gene deletion frequencies among ethnic groups were from 33.5% to 73.5% for GSTT1 and from 50.5% to 78.0% for GSTM1. GSTT1 deletion data correlated with the results of mRNA microarray expression studies. The 18 single nucleotide polymorphisms (SNP) observed in GSTT1 included three nonsynonymous coding SNPs (cSNPs) and one single-nucleotide deletion, whereas the 51 GSTM1 SNPs included two nonsynonymous cSNPs. Two of the GSTT1 nonsynonymous cSNPs resulted in decreases in levels of immunoreactive protein to 56% and 12% of wild type (WT), whereas those in GSTM1 resulted in modest increases in protein levels. Reporter gene assays showed that one GSTT1 5′-FR haplotype, with a frequency of 32% in African-American subjects, resulted in an increase in transcription in JEG-3 cells to 351% of that for the WT sequence, and one GSTM1 5′-FR haplotype resulted in an increase in transcription in JEG-3 cells to 129% of WT.
Conclusions: These observations suggest that functionally significant pharmacogenomic variation beyond GSTT1 and GSTM1 gene deletion may contribute to carcinogenesis or individual variation in antineoplastic drug therapy response.
The glutathione S-transferases (GSTs) play multiple roles in the pathogenesis and treatment of cancer. They are capable of detoxifying reactive electrophiles and carcinogens (1), but they can also bioactivate xenobiotics to form genotoxic products (2–4). In addition, they can catalyze the conjugation of many antineoplastic drugs (5). The GST-catalyzed bioactivation of xenobiotics has been exploited in cancer chemotherapy to activate prodrugs such as TLK286 and TER286 (6–9), but glutathione conjugation is also a major mechanism for the detoxification of platinum-based antineoplastic agents, resulting in their inactivation and removal from the cell (10–13). The GSTs can also display “ligandin” behavior—the binding of nonsubstrate ligands (14)—and they have recently been found to participate in cell signaling (15–19). They can contribute to drug resistance as a result of interactions with signaling molecules in the mitogen-activated protein kinase (MAPK) and extracellular signal-regulated kinase (ERK) pathways, even if the antineoplastic drugs themselves are not GST substrates (15, 20).
Many cancer molecular epidemiology studies, more than 1,000 cited in PubMed, have focused on two members of the GST superfamily: GSTθ1 (GSTT1) and GSTμ1 (GSTM1). A major reason is that the genes encoding these isoforms are polymorphically deleted (21, 22). Those studies attempted to link gene deletion with risk for a wide array of cancers. However, although many studies have analyzed the possible effect of deletion of these genes, there has not been an attempt to comprehensively characterize the full range of common sequence variation in GSTT1 and GSTM1. A better understanding of common variation in the genes encoding these two GST isoforms, as well as the functional consequences of that variation, would provide a foundation for future studies of the possible role of these genes in individual variation in the pathogenesis of cancer or in response to antineoplastic drug therapy.
In the present study, we have applied a genotype-to-phenotype approach to systematically identify common polymorphisms in GSTT1 and GSTM1, followed by the functional characterization of those polymorphisms, as well as mechanisms by which they might alter function. As a first step, two independent multiplex PCR assays were done to determine GSTT1 and GSTM1 copy number polymorphisms (CNP) in each of 400 DNA samples from four different ethnic groups, African-American (AA), Caucasian-American (CA), Han Chinese-American (HCA), and Mexican-American (MA). Next, GSTT1 and GSTM1 exons, splice junctions, and 5′-flanking regions (5′-FR) were resequenced using these DNA samples. Transcriptional activity for common 5′-FR haplotypes was then assessed by the use of reporter gene constructs. Finally, functional genomic studies were done with all variant allozymes encoded by alleles containing nonsynonymous coding single nucleotide polymorphisms (cSNP). These studies provide extensive data with regard to common sequence variation in GSTT1 and GSTM1, together with information with regard to the functional consequences of that variation. These results will form the basis for future genotype-phenotype association studies directed at both cancer risk and inherited variation in antineoplastic drug response phenotypes.
Materials and Methods
DNA samples. DNA samples from 100 CA, 100 AA, 100 HCA, and 100 MA subjects (sample sets HD100CAU, HD100AA, HD100CHI, and HD100MEX) were obtained from the Coriell Cell Repository. Detailed descriptions of these sample sets can be found at http://ccr.coriell.org/nigms/.
GSTT1 and GSTM1 CNP assays. Two different and independent multiplex PCR assays for both GSTT1 and GSTM1 were done to determine gene copy number in each DNA sample. In the first assay, multiplex PCR with primers located in regions flanking GSTT1 or GSTM1 was used to amplify a long amplicon when the gene was absent, whereas primers located within the gene produced a shorter amplicon in the presence of the gene (23). The amplicons were separated by agarose gel electrophoresis and visualized by ethidium bromide staining to determine whether zero, one, or two copies of the gene were present. Primer sequences and amplification conditions for these experiments are listed in the Supplementary Material. These results were then verified by fluorescence-based fragment analysis, which involved the simultaneous amplification of GSTT1 or GSTM1, with GSTT2 or GSTM4 as reference genes, using fluorescent primers (24). Peak heights were then analyzed using an ABI 3100 sequencer (Applied Biosystems). For GSTM1, two additional sets of fluorescent primers were designed that hybridized near the 5′-ends of GSTM1 and GSTM2 and near the 3′-ends of GSTM1 and GSTM5. The sequences of primers for these reactions, as well as reaction conditions, can also be found in the Supplementary Material. CNP frequencies were then calculated for each ethnic group.
GSTT1 and GSTM1 gene resequencing. PCR was used to amplify all GSTT1 and GSTM1 exons, intron-exon splice junctions, and a portion of the 5′-FR for all samples that contained at least one copy of either gene. Amplification conditions and primer sequences are listed in the Supplementary Material. Amplicons were sequenced on both strands in the Mayo Molecular Biology Core Facility using dye terminator sequencing chemistry. The sequence chromatograms were analyzed using Mutation Surveyor (Softgenetics). Polymorphisms observed only once, as well as any ambiguous sequences, were confirmed by performing independent amplifications, followed by DNA sequencing.
GSTT1 microarray analysis. Lymphoblastoid cell lines from which the DNA samples used for the gene resequencing and copy number studies had been obtained were acquired from the Coriell Cell Repository. Total RNA was extracted from the first 60 cell lines in each of the four populations using the RNeasy kit (Qiagen). RNA quality assessment was done using the Agilent 2100 bioanalyzer before microarray analysis. All RNA samples had Agilent RNA Integrity Number (RIN) values >9.0. The RNA was then reverse-transcribed and biotin labeled for hybridization with Affymetrix U133 Plus 2.0 GeneChips (Affymetrix). The microarray images were analyzed using quality control techniques established in the Mayo Clinic Microarray Core Facility, and the data were normalized using fastlo, a type of cyclic loess normalization (25). Data from probe set 203815_at, corresponding to GSTT1, were used in these analyses.
GSTT1 and GSTM1 reporter gene assays. Luciferase reporter gene constructs were created to study the possible effects of GSTT1 and GSTM1 5′-FR polymorphisms and haplotypes on transcription. For GSTT1, ∼1 kb of the wild-type (WT) sequence was amplified from a DNA sample with known sequence. The primers used to perform this amplification contained MluI and XhoI restriction sites, and amplicons were digested with these enzymes, followed by purification with the QIAquick Gel Purification Kit (Qiagen). The digestion products were cloned into pGL3 Basic (Promega), 5′ of the firefly luciferase open reading frame (ORF). Constructs with variant sequences were created by site-directed mutagenesis done using circular PCR. For GSTM1, all reporter gene constructs were obtained by the amplification of DNA segments from samples with known haplotypes using primers that contained MluI and XhoI restriction sites. Sequences of inserts in the reporter gene constructs were verified by sequencing in both directions. JEG-3 and HEK 293T/17 cells (American Type Culture Collection) were transfected with these constructs. Specifically, 2 μg of purified plasmid DNA and 0.5 μg of pRL-TK (Promega) DNA encoding Renilla luciferase were co-transfected to make it possible to correct for possible variation in transfection efficiency. The cells were also transfected with empty pGL3 Basic vector or with pGL3 control. Transfections were done using the TransFast reagent (Promega). The Dual-Luciferase Reporter Assay Kit (Promega) was used to analyze cell lysates, and results were expressed as the ratio of firefly luciferase light units to Renilla luciferase light units. All transfections were done in triplicate at least twice, and the results reported represent the mean ± SE of those six independent transfections.
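The normalization described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code used in the study: the function names, the choice to express each variant well relative to the mean WT ratio, and the sample readings are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def relative_activity(firefly, renilla):
    """Firefly/Renilla light-unit ratio for each transfection well."""
    return [f / r for f, r in zip(firefly, renilla)]


def percent_of_wt(variant_ratios, wt_ratios):
    """Return (mean, SE) of variant ratios as a percentage of the mean WT ratio.

    Assumption: each variant well is normalized to the mean WT ratio, and
    SE is the sample standard deviation divided by sqrt(n).
    """
    wt_mean = mean(wt_ratios)
    pct = [100.0 * v / wt_mean for v in variant_ratios]
    return mean(pct), stdev(pct) / sqrt(len(pct))


# Hypothetical light-unit readings for six transfections (not study data).
wt = relative_activity([100, 110, 90, 105, 95, 100], [50, 55, 45, 50, 50, 50])
var = relative_activity([300, 330, 310, 290, 320, 305], [50, 52, 49, 48, 51, 50])
avg, se = percent_of_wt(var, wt)
print(f"variant activity: {avg:.0f} ± {se:.0f}% of WT")
```

Dividing by the Renilla signal first means that differences in transfection efficiency between wells cancel out before constructs are compared.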
GSTT1 and GSTM1 expression constructs and transient expression. A WT expression construct for each gene was created by amplifying human liver cDNA and cloning the full-length cDNA ORF into the expression vector pcDNA4/HisMax (Invitrogen). Site-directed mutagenesis by circular PCR was then used to create variant allozyme constructs. Sequences of all inserts were confirmed by sequencing in both directions. Expression constructs for the WT and variant allozyme cDNAs were then transfected into COS-1 cells in serum-free DMEM using the TransFast reagent (Promega) at a charge ratio of 3:1. Specifically, 7 μg of construct DNA was co-transfected with 7 μg of pSV–β-galactosidase DNA (Promega) as a control to correct for possible variation in transfection efficiency. After 48 h, the cells were washed with PBS, resuspended in 1 mL of homogenization buffer, and lysed with a Polytron homogenizer (Brinkmann Instruments). The homogenates were centrifuged at 100,000 × g at 4°C for 1 h. The resulting cytosol preparations were stored at −80°C.
Western blot analysis. Levels of immunoreactive protein were determined for each recombinant GSTT1 and GSTM1 allozyme by performing quantitative Western blot analysis. A mouse monoclonal anti-His antibody (Sigma-Aldrich) was used to visualize the protein bands. Specifically, COS-1 cell cytosol was loaded onto 12% SDS mini-gels (Bio-Rad) on the basis of β-galactosidase activity levels to correct for any variation in transfection efficiency. Electrophoresis was done for 90 min at 120 V, followed by the transfer of proteins to nitrocellulose membranes. After blocking for 2 h with 5% powdered nonfat milk in TBS with Tween 20 (TBST), the membranes were incubated overnight with primary antibody diluted 1:20,000 with 5% powdered milk in TBST. The next morning, after three washes, goat anti-mouse horseradish peroxidase antibody (Bio-Rad) was applied for 2 h at a dilution of 1:10,000, followed by three washes. The ECL Western Blotting System (Amersham Biosciences) was used to detect bound antibody by enhanced chemiluminescence. The Western blot data were analyzed with the AutoChemi System (UVP BioImaging Systems). Multiple blots were done for each allozyme, and the results were expressed as a percentage of the intensity of the WT allozyme on the same gel, reported as mean ± SE.
In vitro translation/degradation assay. Transcription and translation of GSTT1 allozymes were done with the TnT Coupled Rabbit Reticulocyte Lysate (RRL) System (Promega), in the presence of 35S-methionine and cysteine (1,000 Ci/mmol, 2.5 mCi total activity; Amersham Biosciences). This reaction mixture was incubated at 37°C for 90 min, and 5-μL aliquots were used to perform SDS-PAGE. After transcription and translation of the allozymes, protein degradation experiments were done as described previously (26). Specifically, 10 μL of in vitro translated 35S-methionine– and 35S-cysteine–labeled protein was added to 40 μL of an ATP-generating system and 40 μL of untreated RRL. During incubation at 37°C, 10-μL aliquots were removed at 0, 4, 8, and 24 h, followed by SDS-PAGE and autoradiography. Radioactively labeled protein was quantified using the AutoChemi System (UVP BioImaging Systems). The rapidly degraded protein TPMT*3A allozyme (27) was used as a positive control.
Semiquantitative reverse transcription-PCR analysis. The RNeasy Mini Kit (Qiagen) was used to isolate mRNA from COS-1 cells transfected with constructs encoding GSTT1 variant allozymes. The cells were co-transfected with β-galactosidase to correct for transfection efficiency. Reverse transcription-PCR (RT-PCR) was then done with primers for GSTT1 and, as a control, β-galactosidase (see Supplementary Material for primer sequences).
Data analysis. Linkage disequilibrium was determined by calculating D′ values (28, 29), and intragene haplotypes for samples that had two copies of the gene were inferred using the method described by Schaid et al. (30). These data were used, together with information from samples with only a single copy of the gene, to calculate overall haplotype frequencies. Group mean values were compared using Student's t test. Linear regression was done to determine if the GSTT1 mean expression level versus copy number fits a linear model.
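For readers unfamiliar with D′, the calculation for a pair of biallelic sites can be sketched as follows. This is a minimal sketch that assumes the standard normalized (Lewontin) definition and assumes that the two-site haplotype frequency is already known or inferred; the function name and the input values are hypothetical and are not taken from the study.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium D' for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (site 1) and allele B (site 2)
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic site carries no LD information
    return d / d_max


# Hypothetical frequencies: two variants that always occur together.
print(d_prime(p_ab=0.12, p_a=0.12, p_b=0.12))
# Hypothetical frequencies: two independent variants.
print(d_prime(p_ab=0.25, p_a=0.5, p_b=0.5))
```

Values near 1 or −1 indicate that little or no historical recombination is evident between the two sites, whereas values near 0 indicate independent assortment.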
Results
GSTT1 and GSTM1 CNP. Striking variation was observed among ethnic groups in the frequency of GSTT1 and GSTM1 gene deletions (Table 1). The frequency of GSTT1 deletion ranged from a low of 33.5% in the CA population to a high of 73.5% in the HCA population, whereas the frequency of GSTM1 deletion ranged from a low of 50.5% in the AA population to a high of 78.0% in the HCA subjects. These frequencies are reported as the percentage of alleles deleted out of a total of 200 potential alleles studied in each population sample. Duplication of GSTM1, resulting in a total of three copies of the gene, was observed in one sample from an AA subject.
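The allele-based deletion frequency can be illustrated with a short Python sketch. The function name and the copy-number counts below are hypothetical and were chosen only to show the arithmetic; they are not the counts observed in any of the four populations.

```python
def deletion_allele_frequency(n_zero, n_one, n_two):
    """Fraction of deleted alleles among all potential alleles.

    n_zero, n_one, n_two: numbers of subjects carrying 0, 1, or 2 gene copies.
    Each subject contributes two potential alleles; a subject with zero
    copies contributes two deleted alleles, and a subject with one copy
    contributes one.
    """
    n_subjects = n_zero + n_one + n_two
    if n_subjects == 0:
        raise ValueError("at least one subject is required")
    deleted = 2 * n_zero + n_one
    return deleted / (2 * n_subjects)


# Hypothetical group of 100 subjects (200 potential alleles).
print(deletion_allele_frequency(n_zero=25, n_one=50, n_two=25))  # 0.5
```

Note that this counts alleles rather than subjects, so it differs from the proportion of subjects who are homozygous for the deletion.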
| Number | Polymorphism location | Nucleotide sequence change | Amino acid sequence change | Minor allele frequency (of copies present): AA | CA | HCA | MA | Present in databases | rs number |
|---|---|---|---|---|---|---|---|---|---|
| GSTT1 | | | | | | | | | |
| Null allele frequency | | | | 0.485 | 0.335 | 0.735 | 0.385 | | |
| 1 | 5′-FR (−714) | C to T | | 0.010 | 0.015 | 0.000 | 0.000 | Yes | |
| 2 | 5′-FR (−476) | G to A | | 0.000 | 0.008 | 0.000 | 0.000 | No | |
| 3 | 5′-FR (−295) | C to G | | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 4 | 5′-FR (−56) | G to C | | 0.320 | 0.000 | 0.000 | 0.008 | Yes | |
| 5 | Exon 1 (13) | C to T | | 0.010 | 0.023 | 0.000 | 0.008 | Yes | rs1130990 |
| 6 | IVS 1 (−178) | C to A | | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 7 | IVS 1 (−151) | G to A | | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 8 | IVS 1 (−24) | C to T | | 0.029 | 0.000 | 0.000 | 0.000 | No | |
| 9 | IVS 1 (−14) | T to A | | 0.000 | 0.000 | 0.000 | 0.008 | No | |
| 10 | Exon 2 (127) | G to A | Asp43Asn | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 11 | Exon 2 (177) | C to T | | 0.000 | 0.008 | 0.000 | 0.000 | No | |
| 12 | Exon 2 (194) | C to T | Thr65Met | 0.000 | 0.000 | 0.000 | 0.008 | No | |
| 13 | IVS 2 (−13) | C to T | | 0.136 | 0.000 | 0.000 | 0.008 | Yes | rs2234734 |
| 14 | Exon 3 (225) | G to A | | 0.010 | 0.000 | 0.000 | 0.008 | No | |
| 15 | Exon 4 (412) | del G | * | 0.000 | 0.000 | 0.000 | 0.008 | No | |
| 16 | Exon 4 (505) | G to A | Val169Ile | 0.126 | 0.000 | 0.000 | 0.008 | Yes | rs2266637 |
| 17 | IVS 4 (−87) | del CCT | | 0.029 | 0.000 | 0.000 | 0.000 | No | |
| 18 | Exon 5 (573) | G to A | | 0.000 | 0.000 | 0.000 | 0.008 | No | |
| GSTM1 | | | | | | | | | |
| Null allele frequency | | | | 0.505 | 0.710 | 0.780 | 0.660 | | |
| 19 | 5′-FR (−676) | C to T | | 0.121 | 0.000 | 0.116 | 0.029 | No | |
| 20 | 5′-FR (−673) | G to A | | 0.121 | 0.000 | 0.116 | 0.029 | No | |
| 21 | 5′-FR (−671) | G to A | | 0.121 | 0.000 | 0.116 | 0.029 | No | |
| 22 | 5′-FR (−668) | C to T | | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 23 | 5′-FR (−552) | C to G | | 0.303 | 0.119 | 0.163 | 0.132 | Yes | |
| 24 | 5′-FR (−540) | C to G | | 0.172 | 0.085 | 0.116 | 0.088 | No | |
| 25 | 5′-FR (−525) | C to T | | 0.040 | 0.085 | 0.000 | 0.074 | No | |
| 26 | 5′-FR (−480) | G to A | | 0.429 | 0.661 | 0.674 | 0.750 | Yes | |
| 27 | 5′-FR (−398) | C to T | | 0.156 | 0.424 | 0.326 | 0.294 | Yes | |
| 28 | 5′-FR (−397) | A to T | | 0.156 | 0.424 | 0.326 | 0.294 | Yes | |
| 29 | 5′-FR (−395) | C to T | | 0.010 | 0.220 | 0.000 | 0.029 | No | |
| 30 | 5′-FR (−393) | C to T | | 0.469 | 0.339 | 0.674 | 0.544 | Yes | |
| 31 | 5′-FR (−358) | G to A | | 0.177 | 0.186 | 0.302 | 0.250 | No | |
| 32 | 5′-FR (−273) | del T | | 0.000 | 0.017 | 0.000 | 0.000 | No | |
| 33 | 5′-FR (−268) | A to T | | 0.000 | 0.017 | 0.000 | 0.000 | No | |
| 34 | 5′-FR (−267) | ins G | | 0.000 | 0.017 | 0.000 | 0.000 | No | |
| 35 | 5′-FR (−258) | G to T | | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 36 | 5′-FR (−222) | T to A | | 0.042 | 0.000 | 0.000 | 0.044 | No | |
| 37 | 5′-FR (−218) | T to C | | 0.281 | 0.288 | 0.000 | 0.044 | No | |
| 38 | 5′-FR (−190) | G to T | | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 39 | 5′-FR (−165) | A to G | | 0.131 | 0.000 | 0.000 | 0.029 | No | |
| 40 | 5′-FR (−109) | C to T | | 0.000 | 0.017 | 0.000 | 0.000 | No | |
| 41 | 5′-UTR (−54) | C to A | | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 42 | IVS 1 (38) | A to G | | 0.212 | 0.254 | 0.000 | 0.029 | No | |
| 43 | IVS 1 (97) | C to T | | 0.081 | 0.000 | 0.000 | 0.015 | No | |
| 44 | IVS 1 (−117) | G to A | | 0.000 | 0.000 | 0.023 | 0.000 | No | |
| 45 | IVS 1 (−78) | T to A | | 0.051 | 0.000 | 0.000 | 0.015 | No | |
| 46 | IVS 1 (−25) | G to A | | 0.192 | 0.136 | 0.000 | 0.088 | Yes | rs10857795 |
| | | G to C | | 0.010 | 0.000 | 0.000 | 0.015 | No | |
| 47 | IVS 1 (−21) | C to G | | 0.010 | 0.000 | 0.000 | 0.015 | No | |
| 48 | IVS 1 (−19) | T to C | | 0.010 | 0.000 | 0.000 | 0.015 | No | |
| 49 | IVS 3 (21) | G to A | | 0.040 | 0.000 | 0.000 | 0.044 | No | |
| 50 | IVS 3 (48) | C to G | | 0.020 | 0.000 | 0.000 | 0.000 | No | |
| 51 | IVS 3 (−98) | A to C | | 0.051 | 0.000 | 0.000 | 0.015 | No | |
| 52 | IVS 3 (−78) | T to C | | 0.202 | 0.271 | 0.614 | 0.529 | Yes | |
| 53 | IVS 3 (−58) | G to C | | 0.000 | 0.000 | 0.000 | 0.015 | No | |
| 54 | IVS 3 (−21) | G to C | | 0.000 | 0.000 | 0.000 | 0.015 | No | |
| 55 | IVS 3 (−18) | G to A | | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 56 | Exon 4 (222) | C to T | | 0.000 | 0.000 | 0.000 | 0.015 | No | |
| 57 | Exon 4 (254) | A to G | Asn85Ser | 0.000 | 0.068 | 0.000 | 0.000 | No | |
| 58 | IVS 4 (26) | G to A | | 0.020 | 0.237 | 0.500 | 0.206 | Yes | rs4147565 |
| 59 | IVS 4 (53) | G to T | | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 60 | IVS 5 (140) | C to T | | 0.333 | 0.186 | 0.386 | 0.279 | Yes | rs4147566 |
| 61 | IVS 5 (195) | C to T | | 0.010 | 0.000 | 0.000 | 0.015 | No | |
| 62 | IVS 5 (−92) | A to G | | 0.192 | 0.136 | 0.000 | 0.088 | Yes | |
| 63 | IVS 5 (−59) | G to A | | 0.020 | 0.000 | 0.000 | 0.000 | No | |
| 64 | Exon 6 (396) | T to C | | 0.010 | 0.000 | 0.000 | 0.000 | No | |
| 65 | IVS 6 (−19) | T to C | | 0.111 | 0.305 | 0.000 | 0.044 | No | |
| 66 | Exon 7 (519) | G to C | Lys173Asn | 0.071 | 0.254 | 0.614 | 0.485 | Yes | rs1065411 |
| 67 | Exon 7 (528) | C to T | | 0.323 | 0.271 | 0.386 | 0.353 | Yes | rs1056806 |
| 68 | IVS 7 (−56) | C to G | | 0.020 | 0.017 | 0.000 | 0.015 | Yes | |
| 69 | 3′-UTR (888) | C to T | | 0.424 | 0.288 | 0.000 | 0.059 | Yes | rs17672 |
NOTE: Polymorphism locations, nucleotide and amino acid sequence changes, variant allele frequencies (based on the number of gene copies present), and the presence in public databases are listed for each of the four ethnic groups. Polymorphisms in exons and UTRs are numbered with A in the ATG being 1. Numbers 5′ to that A are negative, whereas numbers 3′ are positive. Nucleotides located within introns (IVS) are numbered based on their distance from splice junctions, with distances from 3′ splice junctions assigned positive numbers and distances from 5′ splice junctions assigned negative numbers. Polymorphisms within exons are in bold font.
*Frame shift leading to an early stop codon.
GSTT1 and GSTM1 gene resequencing. Resequencing of the exons, splice junctions, and a portion of the 5′-FRs for both GSTT1 and GSTM1 resulted in the identification of 18 and 51 SNPs, respectively. Polymorphism locations are represented graphically in Fig. 1, and both polymorphism locations and frequencies are listed in Table 1. Three nonsynonymous cSNPs and one single-nucleotide deletion that resulted in an early stop codon were included among the 18 SNPs identified in GSTT1. There were 12 SNPs in samples from AA, 4 in CA, none in HCA, and 9 in MA subjects. The GSTT1*B variant previously identified in the Saami population of Sweden (31), a Thr104Pro change in amino acid sequence resulting from a change at cDNA nucleotide 310 from A to C, was not observed in our samples. We identified 51 polymorphisms—43 in AA, 26 in CA, 16 in HCA, and 37 in MA subjects—in GSTM1, including two nonsynonymous cSNPs (Fig. 1, Table 1). The results of our resequencing effort were then compared with known publicly available SNPs by the use of PolyMAPr (32) and a search of dbSNP. We found that 13 of our 18 GSTT1 SNPs and 38 of the 51 GSTM1 SNPs that we had observed were not present in dbSNP or the databases queried by PolyMAPr. We also compared our data to those in the HapMap, Public Release 21. The HapMap listed only a single SNP in GSTT1 and eight in GSTM1. The relative sparsity of SNPs in these public databases as compared with the number of SNPs that we observed underscores the continuing need to perform this type of study to characterize common sequence variation in important genes such as GSTT1 and GSTM1.
Haplotypes are of increasing importance for use in association studies (33). Therefore, haplotype analysis was done for each gene in each of the four ethnic groups. Supplementary Table S1A shows the nine GSTT1 haplotypes with a frequency of ≥2.5% that were observed or inferred, whereas Supplementary Table S1B shows the 27 GSTM1 haplotypes with a frequency of ≥3%, as well as any haplotype that included a nonsynonymous cSNP. Haplotype frequencies were calculated on the basis of alleles that were present in these samples.
Expression array analysis. Expression microarray studies were done to determine whether there might be a correlation between GSTT1 gene copy number and basal mRNA expression in the lymphoblastoid cells from which the DNA used to determine gene copy number had been obtained. It was not possible to perform similar studies for GSTM1 because the GSTM1 probe sets on the Affymetrix U133 Plus 2.0 chips used to perform these experiments cross-hybridized with transcripts for other GSTM isoforms. There was a striking degree of variation in GSTT1 expression among individual samples (Fig. 2A). Differences in average expression level among copy number groups were highly significant by Student's t test (Fig. 2B, inset). Furthermore, the average level of GSTT1 expression increased approximately linearly with the number of gene copies, although there was a slight departure from a purely linear relationship (P = 0.04): samples with one copy of GSTT1 had a slightly higher mean expression level than would have been expected on the basis of a purely linear relationship.
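The idea of a linear gene-dose relationship, and of a one-copy group that sits slightly above it, can be made concrete with a small Python sketch. It is only illustrative: the expression values are invented, the function names are hypothetical, and the sketch computes a least-squares line and the size of the one-copy deviation without reproducing the significance test reported above.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def one_copy_excess(expr_by_copy):
    """Mean expression at one copy minus the midpoint of the 0- and 2-copy means.

    A purely linear gene-dose effect gives 0; a positive value means the
    one-copy group is expressed more highly than linearity predicts.
    """
    m0, m1, m2 = (mean(expr_by_copy[k]) for k in (0, 1, 2))
    return m1 - (m0 + m2) / 2


# Hypothetical normalized expression values grouped by gene copy number.
expr = {0: [1.0, 1.0], 1: [6.0, 7.0], 2: [10.0, 12.0]}
copies = [k for k, values in expr.items() for _ in values]
levels = [v for values in expr.values() for v in values]
slope, intercept = fit_line(copies, levels)
print(slope, intercept, one_copy_excess(expr))
```

With only three copy-number groups, the deviation of the middle group from the midpoint of the outer two is the single degree of freedom by which the group means can depart from a straight line.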
There was overlap in expression levels between samples with one and two copies of the gene, and several samples with one or two copies of GSTT1 had no detectable mRNA expression, much like samples that lacked this gene. In addition, one sample that lacked GSTT1 on the basis of both of the copy number assays displayed a moderate level of GSTT1 transcripts. PCR amplification of each exon in that sample showed that it had a partial gene deletion (data not shown). The variation that we observed for expression within copy number groups could be due to gene sequence variation in promoter regions, SNPs that influence mRNA stability, or differences in the activity of trans-acting factors, such as proteins or mRNAs that are required for transcription and/or mRNA stability. Therefore, the next series of experiments focused on possible sequence-dependent variation in transcription.
GSTT1 and GSTM1 reporter gene studies. Our resequencing studies had identified four SNPs within a region located in the 5′-FR of GSTT1 (Fig. 1, Table 1). One of those SNPs, G(−56)C, occurred at a frequency of 32% in AA subjects, whereas the others had frequencies of <1%. Because of the low frequency of haplotypes including more than one of these SNPs, reporter gene constructs were created for each SNP to study them separately. Two cell lines, JEG-3 and HEK 293T/17, were selected for use in these experiments because GSTT1 and GSTM1 are expressed in both the kidney and placenta according to the UniGene EST ProfileViewer.6
Results of the transfection of these cells with reporter gene constructs showed cell line–dependent variation in transcription (Fig. 3A and B). The G(−56)C variant resulted in a striking increase in transcription in JEG cells to 351 ± 20% of WT (mean ± SE, P < 0.0001), whereas two other GSTT1 5′-FR SNPs, C(−295)G and G(−476)A, resulted in modest but significant decreases in expression to 82 ± 4% (P < 0.01) and 77 ± 8% (P < 0.05) of WT, respectively. In 293T cells, G(−56)C increased transcription, but not significantly, whereas the C(−295)G, G(−476)A, and C(−714)T variants resulted in modest but significant decreases in expression to 77 ± 1% (P < 0.0001), 76 ± 4% (P < 0.001), and 78 ± 8% (P < 0.01) of WT, respectively. A search of the TransFac database failed to identify any putative transcription factor binding sites that might be altered by the GSTT1 G(−56)C polymorphism.
Our GSTM1 resequencing studies, in contrast to the situation for GSTT1, resulted in the identification of a large number of SNPs in the 5′-FR (Fig. 1B). Many of those SNPs were in linkage disequilibrium. Therefore, GSTM1 5′-FR SNPs were not studied individually. In this case, reporter gene constructs were created for the three haplotypes with the highest frequencies in the AA population and the three haplotypes with the highest frequencies in the CA population (Supplementary Table S1C). Construct 1 was designated the WT sequence because it had the highest haplotype frequency in the AA population. The results for GSTM1 also showed cell line–dependent variation in transcription (Fig. 3C and D). In JEG cells, construct 2 resulted in a significant increase in activity to 129 ± 6% of the WT construct (P < 0.01). In the 293T cells, constructs 2, 3, 4, and 6 all displayed significant alterations in activity compared with WT, resulting in 131 ± 7% (P < 0.01), 75 ± 4% (P < 0.001), 85 ± 2% (P < 0.001), and 73 ± 3% (P < 0.0001) of WT activity, respectively.
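Reporter gene results of this kind are expressed as a percentage of WT activity, reported as mean ± SE. The sketch below shows one way such a summary could be computed from replicate measurements. It is an illustration under stated assumptions rather than the procedure used here: the function name and the replicate values are hypothetical, each variant replicate is scaled by the mean WT activity, and the uncertainty in the WT mean itself is ignored.

```python
from math import sqrt
from statistics import mean, stdev


def percent_of_wt(variant_activities, wt_activities):
    """Express replicate reporter activities as a percentage of mean WT activity.

    Returns (mean_percent, standard_error). Requires at least two variant
    replicates so that a sample standard deviation can be computed.
    """
    wt_mean = mean(wt_activities)
    percents = [100.0 * v / wt_mean for v in variant_activities]
    # Standard error of the mean = sample SD / sqrt(number of replicates).
    se = stdev(percents) / sqrt(len(percents))
    return mean(percents), se


# Hypothetical replicate activities (arbitrary units), for illustration only.
wt = [2.0, 2.0, 2.0]
variant = [6.0, 7.0, 8.0]
pct, se = percent_of_wt(variant, wt)
print(f"{pct:.0f} ± {se:.0f}% of WT")
```

A fuller treatment would propagate the variability of the WT replicates as well, for example by comparing the variant and WT replicate sets directly with a two-sample test.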
A search of the TransFac database identified transcription factor binding sites that might be altered at three of the positions that varied among the haplotypes studied. At position (−393), the change to T created an Sp1 site; the A at position (−480) created an Ik2 site; and the G variant at nucleotide (−358) created a v-Myb binding site. We next turned our attention to the possibility that alterations in the encoded amino acid sequence as a result of nonsynonymous cSNPs might also influence GSTT1 and GSTM1 function.
GSTT1 and GSTM1 Western blot analysis. To explore the possible effects of inherited alterations in amino acid sequence on GSTT1 and GSTM1, COS-1 cells were transfected with expression constructs encoding each of the GSTT1 and GSTM1 allozymes that we had identified. A mammalian cell line was selected for these experiments so the machinery for mammalian post-translational modification and protein degradation would be present. It has been shown repeatedly that alteration in only one or two amino acids as a result of genetic polymorphisms can result in significant changes in protein level, most often as a result of degradation through a ubiquitin-proteasome–mediated process (26, 27). After correction for transfection efficiency, we found that all but one of the GSTT1 variant allozymes displayed a significantly decreased level of immunoreactive protein (Fig. 4A). The Asp43Asn and Thr65Met variants had levels of immunoreactive protein that were only 56 ± 0.7% (P < 1 × 10⁻⁸) and 12 ± 3% (P < 0.0001) of the WT, respectively, whereas the level of the Val169Ile variant allozyme was similar to that of the WT allozyme, at 94 ± 10%. We also studied the GSTT1*B (Thr104Pro) variant that was reported previously to have no detectable enzyme activity or immunoreactive protein in vivo (31). We were able to detect immunoreactive protein, but it was reduced to ∼25% (P < 0.001) of the WT level (Fig. 4A). Finally, not surprisingly, we were unable to detect protein for the construct that included the premature translation termination codon. The two nonsynonymous cSNPs in GSTM1, Asn85Ser and Lys173Asn, displayed modest increases in levels of immunoreactive protein that were not statistically significant, with values of 169 ± 37% and 160 ± 38%, respectively, of the WT level.
GSTT1 in vitro translation/degradation assays. In an attempt to determine the mechanism responsible for the striking decreases observed in the level of immunoreactive protein for GSTT1 variant allozymes, in vitro translation and degradation experiments were done for the GSTT1 variant allozymes using the rabbit reticulocyte lysate system. We were able to synthesize radioactively labeled protein for all variant allozymes except that encoded by the construct with the premature stop codon. However, we obtained no evidence for accelerated degradation of any of the variant allozymes, even though the positive control, the TPMT*3A variant allozyme (26), was rapidly degraded. We then moved on to test the possibility that an alteration in level of mRNA might be associated with the decreases in level of GSTT1 variant allozyme protein expression by performing RT-PCR with mRNA isolated from COS-1 cells transfected with WT and variant allozyme constructs.
GSTT1 RT-PCR. After the transient transfection of COS-1 cells with GSTT1 expression constructs and β-galactosidase, mRNA was isolated, and RT-PCR was done with primers for both GSTT1 and β-galactosidase. After correction for transfection efficiency, altered levels of mRNA for the GSTT1 variants also failed to account for the difference in expression observed during the Western blot analysis, with the exception of the variant resulting in a stop codon, which failed to express detectable mRNA (data not shown). Therefore, the mechanism(s) responsible for decreased protein levels for the remaining GSTT1 variant allozymes remains unclear.
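Correction for transfection efficiency with a cotransfected β-galactosidase control amounts to dividing each target signal by the control signal from the same sample before comparing variants with WT. The sketch below illustrates that arithmetic only. The function names, the dictionary layout, and the signal values are hypothetical, and the sketch makes no claim about how signals were quantified in these experiments.

```python
def normalize_to_control(target_signal, control_signal):
    """Divide a target signal by the cotransfected control signal."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return target_signal / control_signal


def relative_to_wt(samples, wt_key="WT"):
    """Return each sample's control-normalized signal as a percentage of WT.

    samples: dict mapping construct name -> (target_signal, control_signal).
    """
    normalized = {
        name: normalize_to_control(target, control)
        for name, (target, control) in samples.items()
    }
    return {name: 100.0 * value / normalized[wt_key] for name, value in normalized.items()}


# Hypothetical signals (arbitrary units), for illustration only.
signals = {"WT": (10.0, 2.0), "variant": (6.0, 4.0)}
print(relative_to_wt(signals))
```

In this toy example the variant sample has a higher control signal than WT, so its corrected level (30% of WT) is lower than the uncorrected ratio of target signals (60%) would suggest.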
Discussion
GSTT1 and GSTM1 catalyze the detoxification of many carcinogens and antineoplastic drugs, and they can also bioactivate selected xenobiotics to form genotoxic products (1–5). The genes encoding these two enzymes have common CNPs, and those gene deletion polymorphisms have been the focus of literally hundreds of epidemiologic studies attempting to link GSTT1 and GSTM1 variation to carcinogenesis. However, no previous systematic studies of the nature or extent of DNA sequence variation in these important genes have been reported. Therefore, in the present study, we set out to systematically identify common genetic variation in GSTT1 and GSTM1 and to characterize the functional significance of that variation to expand on the commonly used deletion versus presence paradigm for these studies. Specifically, we characterized the CNP status of 400 DNA samples from four ethnic groups and then resequenced all exons, splice junctions, and ∼1 kb of the 5′-FR for each gene in those same samples. We also performed functional genomic studies using reporter genes, expression constructs, and expression array data.
We used two independent gene deletion assays in our studies. Both of the assays that we used were able to differentiate one or two copies of each gene, but the ABI-based assay had the advantage of comparing the GST gene of interest with a reference amplicon, so it could also detect gene duplication. As a result, we identified duplication of GSTM1 in one AA subject, an observation made only once before in a Saudi Arabian population (34). When mRNA microarray analysis was done to evaluate the role of copy number in the individual variation in GSTT1 expression, there was a high degree of correlation between our copy number and GSTT1 mRNA expression assay results (Fig. 2B). A similar study was not performed for GSTM1 because of the high degree of sequence homology within the GSTM1 gene family and, as a result, probe cross-hybridization. In future experiments, the use of GSTM1-specific primers for RT-PCR might make it possible to avoid this difficulty. After eliminating samples in which both copies of these genes were deleted, we resequenced GSTT1 and GSTM1 in the remaining samples, observing many previously unreported SNPs in both genes.
Reporter gene assays were then performed to address the issue of possible sequence-dependent variation in the transcription of GSTT1 and GSTM1 (see Fig. 3). Obviously, the observations shown in Fig. 3 will have to be pursued in the course of future experiments designed to elucidate mechanisms responsible for this sequence-dependent alteration in transcription.
Finally, we identified three nonsynonymous cSNPs in GSTT1, as well as a single nucleotide deletion that resulted in a frame shift and a premature stop codon. Expression constructs were created for the WT and all four of these variants as well as a previously reported GSTT1 nonsynonymous cSNP (31). As we pointed out in Results, we did not observe a previously reported polymorphism that results in a Thr104Pro alteration in encoded amino acid in any of our DNA samples, but that was not surprising because that SNP was previously observed only in samples from Saami subjects (31). Three of the four constructs that included nonsynonymous cSNPs (Asp43Asn, Thr65Met, and Thr104Pro), as well as the single nucleotide deletion, resulted in significant decreases in levels of GSTT1 immunoreactive protein. We were also able to express and detect the previously reported Thr104Pro variant allozyme (31). It displayed <25% of the WT level of protein. A series of previous studies have shown that genetic alteration in a single amino acid often influences function as a result of alterations in protein level (26, 27, 35–38), as was the case for GSTT1. When the mechanism responsible for decreased levels of protein has been studied, most often it has been found to result from accelerated proteasome-mediated degradation (26, 27). Therefore, we used the RRL system to study the possible contribution of degradation to the decreased levels of variant allozymes shown in Fig. 4. However, the use of the RRL to perform in vitro transcription/translation failed to provide evidence that any of the GSTT1 variants were rapidly degraded by the ubiquitin-proteasome system. In addition, with the exception of the single nucleotide deletion construct, RT-PCR studies did not suggest that mRNA stability might contribute to the decrease in expression that we observed. That construct failed to yield detectable protein after transient expression in COS-1 cells or after RRL transcription-translation.
Therefore, the mechanism responsible for variation in the levels of immunoreactive protein for GSTT1 variant allozymes expressed in COS-1 cells remains unclear. Additional research will be required to explain this phenomenon, and future studies will need to address alternative mechanisms, e.g., autophagy-mediated degradation (39, 40).
In summary, we have studied GSTT1 and GSTM1 CNPs in four populations, as well as their effect on GSTT1 mRNA expression levels. We also resequenced the human GSTT1 and GSTM1 genes and observed a large number of unreported SNPs in both genes. Reporter gene assays showed that several of these SNPs and 5′-FR haplotypes significantly altered the ability to drive transcription. Finally, studies of the effect on the level of immunoreactive protein of amino acid sequence alterations as a result of common polymorphisms showed (especially for polymorphisms in GSTT1) significantly altered protein levels after COS-1 cell transfection. These results will make it possible to move beyond merely assaying copy number in future studies of the possible implications of GSTT1 and GSTM1 for carcinogenesis risk and/or variation in response to antineoplastic therapy.
Grant support: NIH grants T32 GM72474, R01 GM28157, R01 GM35720, Prostate Specialized Programs of Research Excellence CA91956, and U01 GM61388 (The Pharmacogenetics Research Network).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
The gene resequencing data included in the report have been deposited in the NIH database PharmGKB with submission identification numbers PS205618 and PS206423 for GSTT1 and GSTM1, respectively.
Acknowledgments
We thank Luanne Wussow for her assistance with the preparation of the manuscript.