The epidermal growth factor receptor (EGFR) plays a prominent role in cell growth and development. Its regulation in humans is complex and incompletely understood. In this study, 12 new polymorphisms were discovered in the 5′-regulatory region of EGFR gene and 2 common single nucleotide polymorphisms (−216G/T and −191C/A) were found in the essential promoter area, one of which is located in a Sp1 recognition site (−216). Transient transfection in human cancer and primary cell lines showed significantly different promoter activity between the two most common haplotypes (−216G-191C and −216T-191C). The replacement of G by T at position −216 increases the promoter activity by 30%. A transient transfection assay in the Sp1-deficient cell line (Schneider cell line 2) showed a strong dependence of EGFR promoter activity on Sp1 and confirmed the effect of the aforementioned polymorphisms. Electrophoretic mobility shift assay also showed a significantly higher binding efficiency of nuclear protein or pure Sp1 protein to the T allele compared with the G allele. We then investigated the allelic imbalance of EGFR transcription in fibroblast cell lines with heterozygous genotype at −216G/T but C/C homozygous genotype at −191C/A. The expression of mRNA carrying T-C haplotype was significantly stronger compared with that of G-C haplotype (P < 0.02). Thus, we successfully showed that a common polymorphism in the EGFR promoter was associated with altered promoter activity and gene expression both in vitro and in vivo. Our findings have implications for cancer etiology and therapy and may also be relevant to the inherited susceptibility of other common diseases.

The human epidermal growth factor receptor (EGFR) gene is located in chromosome 7p12.1-12.3. Its product, a 170-kDa transmembrane glycoprotein, plays a critical role in the signal transduction pathway for cell proliferation, differentiation, and survival. Overexpression of EGFR is found in ∼30% of human primary tumors and has been significantly associated with disease stage, prognosis, survival, and response to chemotherapy (1). EGFR is also a possible genetic risk factor for cancer susceptibility. In transgenic mice, overexpression of the EGFR initiates formation of oligodendroglioma and breast cancer (2, 3). In humans, a (CA)n polymorphism in intron 1 of EGFR was recently associated with breast cancer risk in young women (4). Because of its important function in cancer, EGFR is now an attractive target for treatment and prevention of cancer. Several anticancer agents inhibiting the EGFR phosphorylation or blocking the ligand binding are being tested currently in clinical trials, and two anti-EGFR agents, cetuximab and gefitinib, were recently approved by the U.S. Food and Drug Administration for the treatment of colorectal and lung cancer, respectively.

However, the regulation of EGFR is not yet completely understood. To date, several features of the 5′-regulatory region of EGFR have been described, including a TATA-less, CAAT-less, high GC content promoter with multiple transcriptional start sites (5, 6); an intronic enhancer element in hormone-independent cells and two other cooperative enhancer regions around exon 1 (7, 8); and a variety of trans and cis elements, such as epidermal growth factor–responsive DNA binding protein-1, p53, p63, Sp1, vitamin D–responsible element, and estrogen-responsive element (5, 6, 9–20). Furthermore, multiple Sp1 binding sites have been discovered, and Sp1 is deemed necessary for EGFR promoter activity (5, 9, 21, 22). However, only limited data exist regarding DNA variants in the 5′-regulatory region of EGFR. Recently, a microsatellite sequence ∼1.5 kb downstream of exon 1 has been proposed to regulate EGFR expression (23, 24). Thus far, no regulatory variations in the promoter or enhancer regions of EGFR with functional consequence have been reported. To further understand the regulation of EGFR, we resequenced an ∼4-kb area in the 5′-regulatory region of EGFR in Caucasian, African American, and Asian populations. Furthermore, we characterized the function of two common polymorphisms in the EGFR promoter region.

Discovery of Single Nucleotide Polymorphisms in EGFR Regulatory Region. DNA samples from either the Lymphoblastoid Cell Line Core at the University of Chicago or the Coriell Cell Repository were used for resequencing. The samples included 24 African Americans, 22 Caucasians, and 23 Asians. For single nucleotide polymorphism (SNP) discovery, PCR was used to amplify the ∼4.0-kb fragment containing the upstream and downstream enhancers, promoter, exon 1, and part of intron 1. Primers were designed according to the reference sequence AF288738 from GenBank using Primer Premier version 5.00 (Premier Biosoft International, Palo Alto, CA). Primer sequences were EGFR1F: 5′-GCATGACTTCAACGCACAGT-3′ and EGFR1R: 5′-GAGGCTAAGTGTCCCACTGC-3′; EGFR2F: 5′-TCGGACTTTAGAGCACCACC-3′ and EGFR2R: 5′-GAGGAGGAGAATGCGAGGAG-3′; EGFR3F: 5′-AAATTAACTCCTCAGGGCACC-3′ and EGFR3R: 5′-CGCCCTTACCTTTCTTTTCC-3′; EGFR4F: 5′-CCCTGACTCCGTCCAGTATT-3′ and EGFR4R: 5′-AAGAAAGTTGGGAGCGGTTC-3′; EGFR5F: 5′-CGTCCTTTCCTGTTTCCTTG-3′ and EGFR5R: 5′-AGACGAGTTCTCCCAGCTCC-3′; EGFR6F: 5′-GCGCAGGTCTCAAACTGAAG-3′ and EGFR6R: 5′-GGAGAAGTTTGCTGTGAGCC-3′; EGFR7F: 5′-CCCTCGTCTTGCCTATCCA-3′ and EGFR7R: 5′-AGTGATCCCCAAATCTGGCT-3′; EGFR8F: 5′-GGCATAGAACAGTGGTTCCC-3′ and EGFR8R: 5′-GAACACCAATGGAGGGAGAA-3′; and EGFR9F: 5′-TGAAGGAACTGGTGGAAAGG-3′ and EGFR9R: 5′-CATGTCCCAGAACCAAACAA-3′. After PCR, products were purified and then directly sequenced from both ends.

SNP Location, Allele Frequency, Haplotypes, and Linkage Disequilibrium. Positions of SNPs are reported relative to the initiator ATG (with A as +1). The common and rare alleles are separated by a slash, with the common allele always to the left of the slash (Table 1). The allele frequency of each SNP was calculated by allele counting within each ethnic group. Pairwise linkage disequilibrium (LD) and haplotypes were calculated or estimated using Arlequin 2.000 (http://anthro.unige.ch/arlequin). D′ and r² were generated and plotted with the software LD plotter (http://innateimmunity.net/IIPGA2/Bioinformatics/).
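For readers unfamiliar with the two LD measures, the sketch below shows the textbook definitions of D, |D′|, and r² for a pair of biallelic loci. It is an illustration only: the internal algorithms of Arlequin and LD plotter are not described here, the function name `pairwise_ld` is hypothetical, and the input frequencies are made-up values rather than data from this study.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele a (locus 1) and allele b (locus 2)
    p_a:  frequency of allele a at locus 1
    p_b:  frequency of allele b at locus 2
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies chosen for illustration (not taken from Table 1 or 2).
d, d_prime, r2 = pairwise_ld(p_ab=0.10, p_a=0.30, p_b=0.20)
print(f"D = {d:.3f}, |D'| = {d_prime:.3f}, r^2 = {r2:.3f}")
```

Because r² also depends on how well the allele frequencies match, a pair of SNPs can show a high |D′| and still have a low r², which is one reason both measures are reported.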

Table 1.

Number and frequency of rare alleles of SNPs discovered from EGFR regulatory region

| SNP | Surrounding sequence | African Americans (%) | Caucasians (%) | Asians (%) |
|---|---|---|---|---|
| −1433C/T | ctgcagggtgCgagacccagg | 4.2 | | |
| −1298G/A | agagatcaggGttgttgaacc | 4.2 | | |
| −1247G/A | gagaggagggGtctggctgta | 10.4 | | |
| −1225G/A | catggacctaGaggacatttt | | 2.3 | |
| −759C/A | ctgatccccgAgagggtcccg | 14.6 | 13.6 | |
| −646G/A | tctaaagctgGtacaagtttg | 4.2 | | |
| −540G/A | cgcggggaccGggtccagagg | | | 2.4 |
| −482C/A | tcagggcaccCgctcccctcc | 6.3 | | |
| −216G/T | agcagcctccGccccccgcac | 29.2 | 31.8 | 7.1 |
| −191C/A | tgagcgcccgCcgcggccgag | | 13.6 | |
| 169G/T | gcgcaccggcGcaccggctcg | 12.5 | 11.4 | |
| 2028G/A | ctgttagtcaGgtgtcagccc | 4.2 | 2.3 | 9.5 |

NOTE: SNP is in uppercase. SNPs are named relative to the initiator codon with A of the ATG codon as +1 and followed by common/rare allele.

Genotyping of −216G/T, −191C/A, and Intron 1 (CA)n Polymorphisms. Restriction enzymes were used to genotype the two SNPs in the EGFR promoter. The G-to-T change at position −216 is recognized by BseRI, whereas the C-to-A change at −191 abolishes the cutting site of SacII. Genotyping of these two polymorphisms was done by digesting 10 μL of PCR product with 8 units of BseRI or 10 units of SacII (New England Biolabs, Beverly, MA), respectively, at 37°C for 2.5 hours. The digestion reaction was then inactivated at 65°C for 15 minutes, and the products were separated in a 2% agarose gel and visualized by ethidium bromide staining. The intron 1 (CA)n polymorphism was genotyped as described previously (25).

Construction of EGFR Promoter Reporter Vectors. Previous deletion mapping studies have shown that an ∼500-bp 5′-flanking sequence (−484 to −16) upstream of the initiator ATG has the essential promoter function, whereas the fragment −384 to −16 has full promoter activity (Fig. 1; refs. 6, 9, 21). Therefore, this fragment containing the −216G/T and −191C/A polymorphisms was chosen to construct the luciferase reporter gene vectors. PCR was done to generate a 525-bp amplicon (−418 to +107) using 5′-CCACCGGTACCGGCGGCCGCTGGCCTTG-3′ as the forward primer and 5′-CGGCGAGACACGCCCTTACCTTT-3′ as the reverse primer. This fragment contained a SacI site at position −16 (Fig. 1). To facilitate subcloning, the forward primer was designed to contain a KpnI site (modified nucleotides are indicated by italics in the forward primer sequence listed above). This fragment was amplified by PCR from individuals with specific haplotypes of −216G/T and −191C/A using the Proofstart DNA polymerase (Qiagen, Hilden, Germany), which is modified for high-fidelity DNA amplification. The fragment was then digested with KpnI and SacI, and a 397-bp product (−412 to −16) was cloned into the KpnI/SacI site of the pGL3-basic vector containing the firefly luciferase gene (Promega, Madison, WI) to construct vectors designated pGL3EGFRluc (Fig. 1). Three haplotypes [−216G-191C (G-C, *1), −216G-191A (G-A, *2), and −216T-191C (T-C, *3)] were successfully amplified and cloned. The rare haplotype −216T-191A (T-A, *4) was constructed by ligating the T-fragment and the A-fragment from the DraIII-digested T-C and G-A haplotypes (Fig. 1). Plasmid DNA was prepared, and all clones were sequenced before transfection to exclude any PCR errors and to confirm the orientation of the haplotypes.

Figure 1.

Essential promoter region of human EGFR gene. Open boxes, Sp1 recognition sites; underlined, forward and reverse primers for amplifying the region; big and small arrows, major and minor transcriptional start sites, respectively; italics, positions of restriction enzymes (KpnI, DraIII, and SacI); uppercase, coding region of exon 1; bold and uppercase, two polymorphisms (−216G/T and −191C/A).


Cell Culture. Human breast cancer cell lines MDA-MB-231 and MCF-7 were grown in RPMI 1640 and the human embryonic kidney 293 (HEK-293) cells were grown in DMEM. Both media were supplemented with 2 mmol/L glutamine and 10% fetal bovine serum in a humidified atmosphere (5% CO2, 95% air). Drosophila melanogaster Schneider cell line 2 (SL-2) was obtained from the American Type Culture Collection (Manassas, VA) and maintained in Schneider's Drosophila medium supplemented with 10% fetal bovine serum at 23°C and atmospheric CO2. Human fibroblast cell lines were obtained from the Coriell Cell Repository and grown in Eagle's MEM with Earle's salts supplemented with 2 mmol/L glutamine and 15% fetal bovine serum at 37°C and atmospheric CO2. All media and supplements for cell culture were purchased from Invitrogen (Carlsbad, CA).

Transient Transfection. The pGL3EGFRluc reporter vectors carrying each of the four haplotypes (*1 to *4) were transfected into MDA-MB-231, MCF-7, HEK-293, or SL-2 cells to compare the relative expression of the luciferase gene. Transient transfection of MDA-MB-231 and HEK-293 cells was done with LipofectAMINE 2000 (Invitrogen), whereas transfection of MCF-7 and SL-2 cells was done with FuGene 6 transfection reagent (Roche Molecular Biochemicals, Mannheim, Germany) according to the manufacturer's instructions. Human cells were cotransfected with each pGL3EGFRluc reporter vector together with the pRL-TK reporter vector (Promega). The SL-2 cells were transfected with pGL3EGFRluc alone or together with pPac-Sp1 [the Sp1 expression vector pPac-Sp1 was kindly provided by Drs. Robert Tjian (University of California, Berkeley, CA) and Erin G. Schuetz (St. Jude Children's Research Hospital, Memphis, TN)]. For detection, cells were washed twice with PBS and lysed, and luciferase activity was detected using the dual luciferase reporter assay system (Promega) according to the manufacturer's instructions. Experiments were done in triplicate and were repeated at least twice. To control for transfection efficiency, the activity of firefly luciferase was normalized to either the Renilla luciferase activity of the pRL-TK vector (human cell lines) or the total cellular protein level measured by the Bradford method (SL-2; Bio-Rad, Hercules, CA).
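The normalization described above can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the function names are hypothetical, and the readings are invented numbers chosen only so that the T-C construct comes out about 30% above G-C, in line with the magnitude reported in the abstract.

```python
from statistics import mean


def normalized_activity(wells):
    """Mean firefly/Renilla ratio across replicate wells.

    wells: list of (firefly_reading, renilla_reading) tuples for one construct.
    """
    return mean(firefly / renilla for firefly, renilla in wells)


def relative_to_reference(readings, reference="G-C"):
    """Express each construct's normalized activity relative to a reference haplotype."""
    normalized = {hap: normalized_activity(wells) for hap, wells in readings.items()}
    ref_value = normalized[reference]
    return {hap: value / ref_value for hap, value in normalized.items()}


# Hypothetical triplicate readings (arbitrary light units), not data from the study.
readings = {
    "G-C": [(1000.0, 100.0), (1100.0, 110.0), (900.0, 90.0)],
    "T-C": [(1300.0, 100.0), (1430.0, 110.0), (1170.0, 90.0)],
}

for haplotype, activity in relative_to_reference(readings).items():
    print(f"{haplotype}: {activity:.2f}")
```

For the SL-2 experiments, the same function would apply with the Renilla reading replaced by the total protein measurement, under the assumption that protein was determined for each well.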

Electrophoretic Mobility Shift Assay. Nuclear proteins were extracted from MDA-MB-231 cells using the NE-PER nuclear and cytoplasmic extraction reagents according to the manufacturer's protocol (Pierce, Rockford, IL). Pure human Sp1 protein was purchased from Promega. The probes and competitors for −216G, −216T, and the Sp1 consensus binding sequence (Promega) were 5′-GCAGCCTCCGCCCCCCGCACGGTGT-3′, 5′-GCAGCCTCCTCCCCCCGCACGGTGT-3′, and 5′-ATTCGATCGGGGCGGGGCGAGC-3′, respectively. Probes were synthesized as single strands and end labeled with biotin. Unlabeled oligonucleotides with the same sequences were used as competitors. dsDNA was prepared, and the electrophoretic mobility shift assay was done using the LightShift Chemiluminescent EMSA kit (Pierce) according to the manufacturer's instructions.

Allele-Specific Transcription and Real-time PCR. Total RNA was extracted from MDA-MB-231, MCF-7, HEK-293, and 10 human fibroblast cell lines. Fibroblast cells were selected based on genotypes (G/T heterozygotes at −216 and C/C homozygotes at −191). cDNA of each sample was then obtained through reverse transcription of 1 μg total RNA using SuperScript II reverse transcriptase with random hexamer primers according to the manufacturer's protocol (Invitrogen).

cDNA from MDA-MB-231, MCF-7, and HEK-293 was then used to perform real-time PCR to evaluate the expression of EGFR gene with SYBR Green I method (Bio-Rad). β-actin gene was included as internal control. Primer sequences were EGFR-rF: 5′-GTCTGCCATGCCTTGTGCTC-3′ (forward) and EGFR-rR: 5′-CTTGTCCACGCATTCCCTGC-3′ (reverse) for EGFR gene and ACTB-rF: 5′-ACGTGGACATCCGCAAAGAC-3′ (forward) and ACTB-rR: 5′-CAAGAAAGGGTGTAACGCAACTA-3′ (reverse) for β-actin. Standard curves of both genes were constructed and expression of EGFR was normalized to 1,000 copies of β-actin.
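The normalization to 1,000 copies of β-actin can be illustrated with the sketch below. It assumes the usual log-linear standard curve, Ct = slope × log10(copies) + intercept, which is not spelled out in the text; the slope, intercept, and Ct values are hypothetical, and the function names are not from the original work.

```python
def copies_from_ct(ct, slope, intercept):
    """Invert a log-linear standard curve: Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def egfr_per_1000_actin(ct_egfr, ct_actb, egfr_curve, actb_curve):
    """EGFR copy number expressed per 1,000 copies of beta-actin.

    egfr_curve, actb_curve: (slope, intercept) tuples from each gene's standard curve.
    """
    egfr_copies = copies_from_ct(ct_egfr, *egfr_curve)
    actb_copies = copies_from_ct(ct_actb, *actb_curve)
    return 1000.0 * egfr_copies / actb_copies


# Hypothetical curve parameters and Ct values, for illustration only.
curve = (-3.32, 38.0)  # the same curve is assumed for both genes to keep the example short
value = egfr_per_1000_actin(ct_egfr=24.64, ct_actb=18.0, egfr_curve=curve, actb_curve=curve)
print(f"EGFR copies per 1,000 beta-actin copies: {value:.2f}")
```

In practice each gene would have its own fitted slope and intercept, which is why the two curves are passed separately.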

For the 10 fibroblast cell lines, nested PCR was done to amplify both cDNA and DNA fragments containing −216G/T and −191C/A from the same individual. The primers for the first-round amplification were EGFR-p1 (forward: 5′-TCTGCTCCTCCCGATCCCTCCT-3′) and EGFR-p2 (reverse: 5′-CGGCGAGACACGCCCTTACCTTT-3′) for DNA and EGFR-p1 (forward) and EGFR-ex2p (reverse: 5′-AAAGTGCCCAACTGCGTGAG-3′) for cDNA. The reverse primer for cDNA amplification was deliberately placed in EGFR exon 2 to avoid amplification of contaminating DNA. Nested PCR primers for amplifying both DNA and cDNA were EGFR-p3 (5′-GGCCCGCGCGAGCTAGACGT-3′) and EGFR-p4 (5′-CAGGTGGCCTGTCGTCCGGTCT-3′). Nested PCR products were then purified by treatment with ExoI and shrimp alkaline phosphatase (Roche Molecular Biochemicals). The purified PCR product was then used for single base extension to genotype the −216G/T polymorphism with primer EGFR-SBEp (5′-CTCACACCGTGCGGGGGG-3′) and the SNaPshot kit (Applied Biosystems, Foster City, CA). The ratio of the T allele to the G allele in the cDNA, TcDNA/GcDNA (measured as the ratio of peak heights), was normalized to the TDNA/GDNA ratio in the corresponding DNA. The 95% confidence intervals for the relative ratios R [R = (TcDNA / GcDNA) / (TDNA / GDNA)] from the 10 samples were calculated using GraphPad Prism version 3.00 for Windows (GraphPad Software, San Diego, CA; http://www.graphpad.com).
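The relative ratio R defined above can be computed per cell line and then summarized across cell lines, as in the sketch below. The peak heights are invented values and the function names are hypothetical; the study itself used GraphPad Prism for the summary statistics.

```python
from math import sqrt
from statistics import mean, stdev


def allelic_ratio(t_cdna, g_cdna, t_dna, g_dna):
    """R = (TcDNA / GcDNA) / (TDNA / GDNA), using SNaPshot peak heights."""
    return (t_cdna / g_cdna) / (t_dna / g_dna)


def mean_and_se(ratios):
    """Mean and standard error of the mean of the per-sample ratios."""
    return mean(ratios), stdev(ratios) / sqrt(len(ratios))


# Hypothetical peak heights (T cDNA, G cDNA, T DNA, G DNA) for three cell lines.
peaks = [
    (1400.0, 1000.0, 1000.0, 1000.0),
    (1250.0, 1000.0, 900.0, 1000.0),
    (1500.0, 1200.0, 1000.0, 1100.0),
]

ratios = [allelic_ratio(*p) for p in peaks]
r_mean, r_se = mean_and_se(ratios)
print(f"mean R = {r_mean:.2f}, SE = {r_se:.2f}")
```

Dividing by the genomic DNA ratio corrects for any allele-specific bias of the assay itself, because the two alleles are present at 1:1 in the DNA of a heterozygote.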

Single Nucleotide Polymorphism Discovery. By resequencing the 4-kb EGFR 5′-regulatory region, including the promoter and enhancers, 12 new SNPs were identified (Table 1). Five SNPs showed relatively high frequency (rare allele frequency ≥10%) in at least one of the three populations screened (Table 1), among which −216G/T was the most common polymorphism in the African American (29.2%) and Caucasian (31.8%) populations but not in Asians (7.1%). The African American population had the highest DNA diversity, with one SNP in 449 nucleotides on average compared with one in 674 among Caucasians and one in 1,348 among Asians.

All but 1 (169G/T) of the 12 SNPs were located in the enhancer or promoter region, reflecting a highly polymorphic regulatory region of EGFR. Two common polymorphisms, −216G/T and −191C/A, were located in a region critical for promoter activity, where all transcriptional start sites and multiple nuclear protein affinity sites are located (6, 9). Interestingly, −216G/T was located in an Sp1 binding site (6, 9) and −191C/A was just 4 bp upstream of one of six transcription initiation sites (6). Therefore, these two SNPs may have a significant impact on the regulation of EGFR.

Linkage Disequilibrium and Haplotype Prediction. Pairwise LD across the 4-kb region is shown in Fig. 2. The −216G/T polymorphism showed a very low level of LD with other SNPs in this region by both D′ and r² values. To understand a potential relationship between the newly discovered SNPs and the intron 1 (CA)n polymorphism, which is located between exon 1 and the downstream enhancer region and was previously suggested to have a regulatory function on EGFR expression (23, 24), pairwise LD based on P values of Fisher's exact test was calculated using the Arlequin software (the LD plotter cannot calculate LD for microsatellite data). There was a very low level of LD (P > 0.05, Fisher's exact test) between −216G/T and the (CA)n, whereas the other five relatively common SNPs were in tight LD with the (CA)n (P < 0.01, Fisher's exact test; Fig. 2C). This implies that −216G/T may have an effect on the regulation of the EGFR gene that is independent of the intron 1 (CA)n polymorphism.

Figure 2.

Pairwise LD (D′ in A, r² in B, and Fisher's exact test P in C) across the 5′-regulatory region of EGFR gene. C, intron 1 (CA)n polymorphism was also included. Low level of LD is found between (CA)n polymorphism and −216G/T compared with other five relatively common SNPs (−1247, −759, −191, 169, and 2028).


Haplotypes across all the polymorphisms were predicted, and the six major ones are listed in Table 2. There is an obvious ethnic difference in haplotype distribution among the three populations but relatively low haplotype diversity, as a few common haplotypes accounted for >60% of all haplotypes. The effective number of haplotypes was significantly lower in Asians than in the other two populations, indicating the lowest haplotype diversity in this population (Table 2). With regard to the two potentially functional SNPs in the promoter region (−216G/T and −191C/A), three haplotypes (G-C, G-A, and T-C) were observed in our samples. G-C and T-C were the two common haplotypes in the essential EGFR promoter among all three populations, whereas the G-A haplotype was found only in Caucasians (Table 2).

Table 2.

Distribution of six major haplotypes of 5′-regulatory region and four haplotypes between −216G/T and −191C/A among populations

| Major haplotypes of 5′-regulatory region | African American (%) | Caucasian (%) | Asian (%) |
|---|---|---|---|
| CGGGCGGCGCG20G | 27.1 | 15.0 | 34.8 |
| CGGGCGGCTCG16G | 16.7 | 17.3 | 5.3 |
| CGGGCGGCGCG16G | 12.5 | 16.8 | 29.5 |
| CGGGCGGCGCG21G | 10.4 | 4.5 | 8.7 |
| CGGGAGGCGAT18G | 0.0 | 11.4 | 0.0 |
| CGGGCGGCTCG20G | 0.0 | 10.0 | 0.0 |
| Total frequency | 66.7 | 75.0 | 78.3 |
| Total no. estimated haplotypes | 147 | 66 | 26 |
| No. effective haplotypes | 7.9 | 8.1 | 4.4 |

| ID | Haplotypes between −216G/T and −191C/A | African American (%) | Caucasian (%) | Asian (%) |
|---|---|---|---|---|
| *1 | G-C | 70.8 | 48.2 | 91.3 |
| *2 | G-A | 0.0 | 18.2 | 0.0 |
| *3 | T-C | 29.2 | 34.1 | 8.7 |
| *4 | T-A | 0.0 | 0.0 | 0.0 |
* The effective number of haplotypes, which reflects the frequency of common haplotypes in a population, was calculated as the reciprocal of the sum of the squared haplotype frequencies. By this measure, the Asian population showed the lowest haplotype diversity among the three populations.

NOTE: For the major haplotypes, the order of polymorphisms from left to right are −1433C/T, −1298G/A, −1247G/A, −1225G/A, −759C/A, −646G/A, −540G/A, −484C/A, −216G/T, −191C/A, 169G/T, (CA)n, and 2028G/A.
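The effective number of haplotypes described in the table footnote can be sketched as below. The rescaling of frequencies to sum to 1 is an added convenience (percentages in the table are rounded), and the function name is hypothetical. The example uses the promoter haplotype frequencies (*1 to *4) for African Americans from Table 2; the resulting value is for illustration and is not a figure reported in the study, whose values of 7.9, 8.1, and 4.4 are based on all estimated haplotypes across the region.

```python
def effective_number(frequencies):
    """Reciprocal of the sum of squared haplotype frequencies.

    frequencies: haplotype frequencies as proportions or percentages;
    they are rescaled to sum to 1 before squaring.
    """
    total = sum(frequencies)
    proportions = [f / total for f in frequencies]
    return 1.0 / sum(p * p for p in proportions)


# Promoter haplotypes *1 to *4 (G-C, G-A, T-C, T-A) in African Americans, Table 2.
african_american = [70.8, 0.0, 29.2, 0.0]
print(f"Effective number of promoter haplotypes: {effective_number(african_american):.2f}")
```

When all haplotypes are equally frequent, the measure equals the number of haplotypes; it falls toward 1 as a single haplotype comes to dominate.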

Transient Transfection Assay. Promoter activity of the four haplotypes (*1 to *4) showed similar patterns using either human primary embryonic epithelial cells (HEK-293) or cancer cell lines (MDA-MB-231 and MCF-7) with significantly higher luciferase activity for the T-C haplotype in comparison with the G-C haplotype vector (P < 0.04 for all comparisons; Fig. 3A). This effect was independent of the EGFR expression level of the cells (Fig. 3B). The −216 G/T polymorphism had a greater effect on activity than the −191 C/A polymorphism. On average, the G-to-T substitution showed an ∼30% increase of promoter activity.

Figure 3.

A, transient transfection of pGL3EGFRluc (*1 to *4) in MDA-MB-231, MCF-7, HEK-293, and SL-2 cells. For human cell lines, pGL3EGFRluc (1.6 μg) was cotransfected with pRL-TK vector (160 ng). For SL-2 cells, pGL3EGFRluc (300 ng) was cotransfected with pPac-Sp1 vector (100 ng), and relative expression of 200 light units of luciferase activity/μg total protein/mL was set to 1. Significant difference of promoter activity was observed between G-C and T-C haplotype of −216G/T-191C/A (all Ps < 0.04). B, relative expression of EGFR among MDA-MB-231, MCF-7, and HEK-293 cell lines and corresponding genotypes of −216G/T and −191C/A polymorphisms. EGFR mRNA level was normalized to 1,000 copies of β-actin gene. Experiments were repeated thrice. Columns, mean; bars, SE.


To further confirm the potential cooperative effect of the DNA alteration and Sp1 on promoter activity, transient transfection was also done in D. melanogaster SL-2 cells, which are deficient in Sp1 (26). Cotransfection of pGL3EGFRluc with the Sp1 expression vector resulted in ∼100-fold induction of promoter activity compared with transfection of pGL3EGFRluc alone (data not shown). Cotransfection of pPac-Sp1 with each of the four pGL3EGFRluc constructs showed significantly lower promoter activity driven by the G-C haplotype compared with the T-C haplotype (P < 0.03; Fig. 3A).

Preferential Protein-DNA Interaction between Nuclear Extracts or Sp1 and −216G/T by Electrophoretic Mobility Shift Assay. Electrophoretic mobility shift assay was done to test the binding efficiency of nuclear proteins from MDA-MB-231 cells to each allele-specific probe. Both alleles of −216G/T showed affinity to nuclear proteins at the same position as the Sp1 consensus sequence. The band pattern of the protein-DNA complex could be changed by an excess of competitors. The affinity of nuclear proteins for the T allele was significantly higher than that for the G allele (Fig. 4A). Assays with pure human Sp1 protein produced similar results (Fig. 4B). These data further support our view that the altered EGFR promoter activity associated with −216G/T was a result of differential affinity of the transcription factor for the variant site.

Figure 4.

Electrophoretic mobility shift assay of −216G/T probes and nuclear proteins from MDA-MB-231 cell line (A) or pure Sp1 protein (B). Higher affinity of either nuclear proteins or Sp1 to the T allele probe compared with the G allele probe. The pattern could be altered by 100-fold excess of unlabeled competitor. A, Sp1 consensus probe was also used to be an internal control and show the position of DNA-protein shifting.


Haplotypes of −216G/T-191C/A Were Associated with EGFR mRNA Expression In vivo. We selected human fibroblast cells (which express EGFR) to evaluate the association between −216G/T-191C/A haplotypes and EGFR transcription. According to previous reports, there were multiple transcription initiation sites in the EGFR promoter (6, 9), whereas the major site for in vivo transcription was at position −260 (Fig. 1; ref. 9). Thus, positions −216 and −191 would be present in most EGFR mRNA sequences. We therefore chose 10 cell lines with diplotype G-C/T-C for the two polymorphisms, so that we could potentially detect a difference of expression level between mRNA carrying T-C haplotype and G-C haplotype within the same cell.

As a result, a significant deviation of the average relative ratio from the hypothetical ratio 1:1 was observed (mean R, 1.39 ± 0.12; 95% confidence interval, 1.11-1.67; P < 0.02), demonstrating that EGFR mRNA derived from the T-C haplotype was ∼40% higher than that from the G-C haplotype. This finding indicates that the −216G/T variant also has a strong impact on EGFR transcription in vivo.
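As a rough consistency check, the reported interval and P value can be approximated from the summary statistics alone. The sketch assumes that ±0.12 is the standard error of the mean of the 10 ratios and that a one-sample t test against R = 1 was used; neither is stated explicitly in the text. The critical value 2.262 is the two-sided 95% point of Student's t distribution with 9 degrees of freedom.

```python
T_CRIT_DF9 = 2.262  # two-sided 95% critical value of Student's t, 9 degrees of freedom


def ci_from_mean_se(mean_r, se, t_crit=T_CRIT_DF9):
    """Confidence interval as mean +/- t_crit * SE."""
    half_width = t_crit * se
    return mean_r - half_width, mean_r + half_width


def t_statistic(mean_r, se, null_value=1.0):
    """One-sample t statistic for the null hypothesis R = null_value."""
    return (mean_r - null_value) / se


low, high = ci_from_mean_se(1.39, 0.12)
t_value = t_statistic(1.39, 0.12)
print(f"95% CI: {low:.2f} to {high:.2f}; t = {t_value:.2f} on 9 df")
```

Under these assumptions the interval comes out close to the reported 1.11-1.67, with the small difference presumably due to rounding of the mean and SE, and the t statistic exceeds the critical value, which agrees with the interval excluding 1.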

In addition to the allelic imbalance, we also evaluated the relative expression of EGFR among the above three human cell lines by real-time PCR. Interestingly, the EGFR levels among these cells were in agreement with their diplotypes, with a dramatically high level of EGFR in MDA-MB-231 cells but ∼6-fold less in HEK-293 and the lowest in MCF-7 (Fig. 3B).

In this study, we discovered 12 SNPs in the EGFR 5′-regulatory region. Some of them occurred with a high frequency, implying a highly polymorphic regulatory region of human EGFR. We subsequently showed that a common polymorphism in the EGFR essential promoter region was associated with altered promoter activity and gene expression both in vitro and in vivo. To our knowledge, this is the first report describing SNPs in the EGFR regulatory region, with evidence for a functional cis-acting common polymorphism in the promoter region.

SNPs usually modulate gene transcription by interacting with trans-acting elements (27–31). The EGFR regulatory region, like that of many other housekeeping genes, has a high GC content and multiple transcriptional start sites but no TATA box. For such genes, the promoters are often activated by Sp family proteins (32, 33). The pivotal role of Sp1 in EGFR promoter activity has been well described (5, 9, 21, 22). Of the 12 SNPs we identified, two (−216G/T and −191C/A) are located in an essential region where multiple protein factors and transcriptional start sites have been identified (8, 13, 14). The −216G/T polymorphism is in one of four Sp1 recognition sites in the promoter, whereas −191C/A is 4 bp upstream of one of six transcriptional start sites (6, 9). Therefore, it would not be surprising if these polymorphisms modified promoter activity. On the other hand, EGFR intron 1 is >120 kb, suggesting that the sequences around exon 1 may have a separate effect on gene regulation. In our LD analysis, −216G/T was not linked to other polymorphisms in this region, including the (CA)n polymorphism in intron 1, indicating that the function of −216G/T would be independent of any effect of this intronic microsatellite. Our data confirmed the requirement of Sp1 for EGFR transcription and suggested an effect of −216G/T on EGFR regulation. This effect was also independent of cell type (cancer cell lines or transformed human primary cells) and EGFR expression level. However, it is still possible that other functional cis-acting elements upstream or downstream of the 5′ sequences also cooperate with −216G/T in vivo. More extensive studies of EGFR may provide additional insight into the functionality of the −216 SNP.

Previous studies by Gebhardt et al. (23) have shown that the dinucleotide (CA)n polymorphism in the intron 1 of EGFR (near the downstream enhancer), ranging from 14 to 21 repeats, seemed to regulate EGFR expression. The longer allele with 21 repeats showed an 80% reduction of gene expression compared with the shorter allele with 16 repeats (23, 24). Significant ethnic differences in allelic frequency of this polymorphism were reported recently (25). However, our data suggest that this polymorphism is in relatively high LD with five other common SNPs in the 5′-regulatory region (Fig. 2C) and thus may not be functional by itself. More importantly, the previous articles demonstrating a functional effect of the (CA)n polymorphism were flawed in that the authors used a methodology (heteroduplex analysis of a 4.5-kb fragment) that was unable to detect the variants reported here. Furthermore, although there is no LD between −216G/T and (CA)n, there is a significant enrichment of longer alleles (corresponding to lower EGFR expression) on the chromosomes carrying −216G-191C haplotype in our samples. For example, among African American population, 34.6% longer alleles (n ≥ 20) were clustered with G-C haplotype compared with only 2.9% with T-C haplotype. Among Caucasians and Asians, these numbers were 19.9% versus 10.1% and 43.5% versus 0%, respectively (overall χ2 = 12.3; df = 1; P < 0.001). Although our data support the hypothesis that the −216 SNP has functional importance, studies in intact cellular systems will provide more credible evidence to support this suggestion, as exemplified by a recently presented association study of the (CA)n polymorphism (34). Studies are ongoing with fibroblast cell lines from unrelated individuals, which will permit analysis of the relationship of the polymorphisms of interest to various phenotypes, including susceptibility to EGFR inhibitors.

Our findings have several implications. First, EGFR overexpression has been associated with adverse disease stage, prognosis, survival, and response to chemotherapy in a variety of human tumors. Although the overexpression of EGFR has generally been correlated with gene amplification, the level of EGFR is still primarily regulated by the abundance of its mRNA (35–37). Therefore, variations in both cis- and trans-acting elements would be relevant to the variable expression of EGFR. In our study, the −216T was associated with a 40% increase in EGFR expression in vivo compared with −216G. In addition, the −216T would be expected to result in a stronger correlation between Sp1 and EGFR expression, potentially resulting in a different signaling network. Thus, it is likely that −216G/T may at least partly contribute to the variability of EGFR expression in malignant cells and may potentially influence the cell's dependence on EGFR.

Second, EGFR expression seems to manifest interindividual variability, which was correlated with −216G/T genotype. EGFR is now an attractive target for treatment and prevention of cancer. Studies of EGFR-targeting cancer therapies have shown encouraging results with three EGFR inhibitors: cetuximab, a monoclonal antibody with affinity to the extracellular domain of EGFR, and two small EGFR-specific tyrosine kinase inhibitors, gefitinib and erlotinib (1, 38–42). Two recent studies have shown that somatic mutations in the EGFR tyrosine kinase domain were associated with clinical response of non–small cell lung cancer to gefitinib (43, 44). A more recent study found a similar association with erlotinib as well (45). These gain-of-function mutations were deemed to structurally alter the catalytic pocket of the tyrosine kinase and thus enhance its sensitivity to both ATP and its competitors like gefitinib. Functional assays confirmed that these mutants could activate an antiapoptotic pathway (46).

However, it is likely that other factors also contribute to the variability in response to EGFR inhibitors. The identified somatic mutations were not able to account for all responders with lung cancer. Putting the data from the three studies together, only 81% of lung cancer patients taking gefitinib or erlotinib and experiencing partial responses or marked clinical improvement harbored EGFR somatic mutations (45). Furthermore, not all gefitinib-sensitive patients have mutated EGFR (44). Hence, in spite of the strong association, the somatic mutations within EGFR are neither sufficient nor necessary for the drug response. Meanwhile, it was recently suggested that the mutations are not relevant to other tumor types (47), although there is evidence of activity in other diseases such as colon cancer and brain tumors (48). It is also unclear why some patients are more susceptible to somatic mutations in EGFR, with the interethnic differences being of particular interest, possibly suggesting a genetic basis for this susceptibility, either in EGFR or in other genes. Additionally, the occurrence of skin rash has been associated with antitumor response (49, 50), suggesting that both antitumor response and toxicity to EGFR inhibitors may have a common genetic etiology. This hypothesis is supported by the preliminary results of Perea et al. (34). Additional pharmacogenetic studies are clearly warranted, particularly in conjunction with large phase III studies that include EGFR inhibitors in non–small cell lung cancer.

Grant support: Pharmacogenetics of Anticancer Agents Research Group (http://pharmacogenetics.org), NIH/National Institute of General Medical Sciences grant U01GM61393; William F. O'Connor Foundation; and PharmGKB (http://pharmgkb.org/), NIH/National Institute of General Medical Sciences grant U01GM61374.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

We thank Drs. Robert Tjian and Erin G. Schuetz for generously providing the pPac-Sp1 vector, Donna L. Fackenthal for kind help with the single-base extension experiments, and Deborah L. Stoit for assistance.

1. Grandis JR, Sok JC. Signaling through the epidermal growth factor receptor during the development of malignancy. Pharmacol Ther 2004;102:37–46.
2. Weiss WA, Burns MJ, Hackett C, et al. Genetic determinants of malignancy in a mouse model for oligodendroglioma. Cancer Res 2003;63:1589–95.
3. Brandt R, Eisenbrandt R, Leenders F, et al. Mammary gland specific hEGF receptor transgene expression induces neoplasia and inhibits differentiation. Oncogene 2000;19:2129–37.
4. Brandt B, Hermann S, Straif K, Tidow N, Buerger H, Chang-Claude J. Modification of breast cancer risk in young women by a polymorphic sequence in the EGFR gene. Cancer Res 2004;64:7–12.
5. Ishii S, Xu YH, Stratton RH, Roe BA, Merlino GT, Pastan I. Characterization and sequence of the promoter region of the human epidermal growth factor receptor gene. Proc Natl Acad Sci U S A 1985;82:4920–4.
6. Johnson AC, Ishii S, Jinno Y, Pastan I, Merlino GT. Epidermal growth factor receptor gene promoter. Deletion analysis and identification of nuclear protein binding sites. J Biol Chem 1988;263:5693–9.
7. McInerney JM, Wilson MA, Strand KJ, Chrysogelos SA. A strong intronic enhancer element of the EGFR gene is preferentially active in high EGFR expressing breast cancer cells. J Cell Biochem 2001;80:538–49.
8. Maekawa T, Imamoto F, Merlino GT, Pastan I, Ishii S. Cooperative function of two separate enhancers of the human epidermal growth factor receptor proto-oncogene. J Biol Chem 1989;264:5488–94.
9. Kageyama R, Merlino GT, Pastan I. Epidermal growth factor (EGF) receptor gene transcription: requirement for Sp1 and an EGF receptor-specific factor. J Biol Chem 1988;263:6329–36.
10. Nishi H, Senoo M, Nishi KH, et al. p53 Homologue p63 represses epidermal growth factor receptor expression. J Biol Chem 2001;276:41717–24.
11. Nishi H, Nishi KH, Johnson AC. Early growth response-1 gene mediates up-regulation of epidermal growth factor receptor expression during hypoxia. Cancer Res 2002;62:827–34.
12. Gonzalez EA, Disthabanchong S, Kowalewski R, Martin KJ. Mechanisms of the regulation of EGF receptor gene expression by calcitriol and parathyroid hormone in UMR 106-01 cells. Kidney Int 2002;61:1627–34.
13. Kageyama R, Merlino GT, Pastan I. A transcription factor active on the epidermal growth factor receptor gene. Proc Natl Acad Sci U S A 1988;85:5016–20.
14. Chen LL, Clawson ML, Bilgrami S, Carmichael G. A sequence-specific single-stranded DNA-binding protein that is responsive to epidermal growth factor recognizes an S1 nuclease-sensitive region in the epidermal growth factor receptor promoter. Cell Growth Differ 1993;4:975–83.
15. Hou X, Johnson AC, Rosner MR. Induction of epidermal growth factor receptor gene transcription by transforming growth factor β1: association with loss of protein binding to a negative regulatory element. Cell Growth Differ 1994;5:801–9.
16. Hou X, Johnson AC, Rosner MR. Identification of an epidermal growth factor receptor transcriptional repressor. J Biol Chem 1994;269:4307–12.
17. Kageyama R, Pastan I. Molecular cloning and characterization of a human DNA binding factor that represses transcription. Cell 1989;59:815–25.
18. Kageyama R, Merlino GT, Pastan I. Nuclear factor ETF specifically stimulates transcription from promoters without a TATA box. J Biol Chem 1989;264:15508–14.
19. Reed AL, Yamazaki H, Kaufman JD, Rubinstein Y, Murphy B, Johnson AC. Molecular cloning and characterization of a transcription regulator with homology to GC-binding factor. J Biol Chem 1998;273:21594–602.
20. Wilson MA, Chrysogelos SA. Identification and characterization of a negative regulatory element within the epidermal growth factor receptor gene first intron in hormone-dependent breast cancer cells. J Cell Biochem 2002;85:601–14.
21. Xu J, Thompson KL, Shephard LB, Hudson LG, Gill GN. T3 receptor suppression of Sp1-dependent transcription from the epidermal growth factor receptor promoter via overlapping DNA-binding sites. J Biol Chem 1993;268:16065–73.
22. Grinstein E, Jundt F, Weinert I, Wernet P, Royer HD. Sp1 as G1 cell cycle phase specific transcription factor in epithelial cells. Oncogene 2002;21:1485–92.
23. Gebhardt F, Zanker KS, Brandt B. Modulation of epidermal growth factor receptor gene transcription by a polymorphic dinucleotide repeat in intron 1. J Biol Chem 1999;274:13176–80.
24. Buerger H, Gebhardt F, Schmidt H, et al. Length and loss of heterozygosity of an intron 1 polymorphic sequence of egfr is related to cytogenetic alterations and epithelial growth factor receptor expression. Cancer Res 2000;60:854–7.
25. Liu W, Innocenti F, Chen P, Das S, Cook EH Jr, Ratain MJ. Interethnic difference in the allelic distribution of human epidermal growth factor receptor intron 1 polymorphism. Clin Cancer Res 2003;9:1009–12.
26. Courey AJ, Tjian R. Analysis of Sp1 in vivo reveals multiple transcriptional domains, including a novel glutamine-rich activation motif. Cell 1988;55:887–98.
27. Baseggio L, Bartholin L, Chantome A, Charlot C, Rimokh R, Salles G. Allele-specific binding to the −308 single nucleotide polymorphism site in the tumour necrosis factor-α promoter. Eur J Immunogenet 2004;31:15–9.
28. Gazzoli I, Kolodner RD. Regulation of the human MSH6 gene by the Sp1 transcription factor and alteration of promoter activity and expression by polymorphisms. Mol Cell Biol 2003;23:7992–8007.
29. Harendza S, Lovett DH, Panzer U, Lukacs Z, Kuhnl P, Stahl RA. Linked common polymorphisms in the gelatinase a promoter are associated with diminished transcriptional response to estrogen and genetic fitness. J Biol Chem 2003;278:20490–9.
30. Stevens A, Soden J, Brenchley PE, Ralph S, Ray DW. Haplotype analysis of the polymorphic human vascular endothelial growth factor gene promoter. Cancer Res 2003;63:812–6.
31. Mann V, Hobson EE, Li B, et al. A COL1A1 Sp1 binding site polymorphism predisposes to osteoporotic fracture by affecting bone density and quality. J Clin Invest 2001;107:899–907.
32. Dynan WS, Sazer S, Tjian R, Schimke RT. Transcription factor Sp1 recognizes a DNA sequence in the mouse dihydrofolate reductase promoter. Nature 1986;319:246–8.
33. Vallian S, Chin KV, Chang KS. The promyelocytic leukemia protein interacts with Sp1 and inhibits its transactivation of the epidermal growth factor receptor promoter. Mol Cell Biol 1998;18:7147–56.
34. Perea S, Oppenheimer D, Amador M, et al. Proceedings of American Society of Clinical Oncology, abstract 3005. Genotypic bases of EGFR inhibitors pharmacological actions. J Clin Oncol 2004;22.
35. Haley J, Whittle N, Bennet P, Kinchington D, Ullrich A, Waterfield M. The human EGF receptor gene: structure of the 110 kb locus and identification of sequences regulating its transcription. Oncogene Res 1987;1:375–96.
36. Merlino GT, Ishii S, Whang-Peng J, et al. Structure and localization of genes encoding aberrant and normal epidermal growth factor receptor RNAs from A431 human carcinoma cells. Mol Cell Biol 1985;5:1722–34.
37. Merlino GT, Xu YH, Richert N, et al. Elevated epidermal growth factor receptor gene copy number and expression in a squamous carcinoma cell line. J Clin Invest 1985;75:1077–9.
38. Glover KY, Perez-Soler R, Papadimitrakopoulou VA. A review of small-molecule epidermal growth factor receptor-specific tyrosine kinase inhibitors in development for non-small cell lung cancer. Semin Oncol 2004;31:83–92.
39. Kim ES, Vokes EE, Kies MS. Cetuximab in cancers of the lung and head & neck. Semin Oncol 2004;31:61–7.
40. Laskin JJ, Sandler AB. Epidermal growth factor receptor: a promising target in solid tumours. Cancer Treat Rev 2004;30:1–17.
41. Lu C, Speers C, Zhang Y, et al. Effect of epidermal growth factor receptor inhibitor on development of estrogen receptor-negative mammary tumors. J Natl Cancer Inst 2003;95:1825–33.
42. Desai AA, Innocenti F, Ratain MJ. Pharmacogenomics: road to anticancer therapeutics nirvana? Oncogene 2003;22:6621–8.
43. Paez JG, Janne PA, Lee JC, et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science 2004;304:1497–500.
44. Lynch TJ, Bell DW, Sordella R, et al. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N Engl J Med 2004;350:2129–39.
45. Pao W, Miller V, Zakowski M, et al. EGF receptor gene mutations are common in lung cancers from “never smokers” and are associated with sensitivity of tumors to gefitinib and erlotinib. Proc Natl Acad Sci U S A 2004;101:13306–11.
46. Sordella R, Bell DW, Haber DA, Settleman J. Gefitinib-sensitizing EGFR mutations in lung cancer activate anti-apoptotic pathways. Science 2004;305:1163–7.
47. Lee JW, Soung YH, Kim SY, et al. Absence of EGFR mutation in the kinase domain in common human cancers besides non-small cell lung cancer. Int J Cancer 2004. Epub ahead of print.
48. Arteaga CL. Selecting the right patient for tumor therapy. Nat Med 2004;10:577–8.
49. Cohen EE, Rosen F, Stadler WM, et al. Phase II trial of ZD1839 in recurrent or metastatic squamous cell carcinoma of the head and neck. J Clin Oncol 2003;21:1980–7.
50. Perez-Soler R. Can rash associated with HER1/EGFR inhibition be used as a marker of treatment outcome? Oncology (Huntingt) 2003;17:23–8.