Abstract
Background: Chromosome 13q22.1 has previously been identified to be a susceptibility locus for pancreatic cancer in Chinese and European ancestry populations. This pleiotropy study aimed to identify novel variants in this region associated with susceptibility to different types of human cancer.
Method: To fine-map the 13q22.1 region, imputation analyses were conducted on the basis of the GWAS data of 2,031 esophageal squamous cell cancer (ESCC) cases and 2,044 controls and 5,930 SNPs (625 directly genotyped and 5,305 well imputed). Promising associations were then examined in ESCC (4,146 cases and 4,135 controls), gastric cardia cancer (1,894 cases and 1,912 controls), noncardia gastric cancer (1,007 cases and 2,243 controls), and colorectal cancer (1,111 cases and 1,138 controls). Fine mapping and biochemical analyses were further performed to elucidate the potential function of novel variants.
Results: Two novel variants, rs1924966 and rs115797771, were associated with ESCC risk (P = 1.37 × 10⁻¹⁰ and P = 2.32 × 10⁻¹⁰, respectively) and were also associated with risk of gastric cardia cancer (P = 0.0003 and P = 0.0018, respectively) but not with noncardia gastric cancer or colorectal cancer. Fine-mapping revealed another SNP, rs58090485, in strong linkage disequilibrium with rs115797771 (r² = 0.94). Functional analysis showed that this SNP disturbs the binding of a transcriptional repressor to the promoter region of KLF5, which might result in high constitutive expression of KLF5.
Conclusions: These results demonstrate that variants mapped on 13q22.1 are associated with the risk of different types of cancer.
Impact: 13q22.1 might serve as a biomarker for the identification of individuals at risk for ESCC and gastric cardia cancer. Cancer Epidemiol Biomarkers Prev; 24(11); 1774–80. ©2015 AACR.
Introduction
Genome-wide association studies (GWAS) have been shown to be a powerful and successful tool for identifying genetic variants associated with susceptibility to human diseases or phenotypes. However, the majority of newly identified risk alleles confer a relatively small risk, ranging from 1.1 to 1.5, and the estimates of disease risk based on these established associations are not a substantial improvement compared with previous risk prediction models (1–5). Taking esophageal squamous cell carcinoma (ESCC) as an example, four GWAS have identified 26 novel variants mapped on 14 different chromosomal regions, showing effects on the risk of ESCC in Chinese populations not only by the genetic variants themselves but also by gene-drinking interactions (6–9). The area under the curve (AUC) for a risk model using GWAS-identified SNPs and four nongenetic factors (sex, age, smoking status, and drinking status) was 70.9%, an improvement of only 7.0% over a model using the four nongenetic factors alone (10). It is believed that “missing heritability” exists and that more genetic variants need to be identified through other new strategies.
Different types of human cancer might share genetic susceptibility factors. For example, on chromosome 8q24, a “gene desert” region, more than 20 loci have been identified to be associated with the risk of multiple cancers (11–16). Functional studies have clarified that these variants might affect an enhancer's effect on long-range regulation of MYC expression (17). Therefore, it seems to be a complementary strategy to identify novel variants for a specific cancer through fine-mapping of susceptibility loci that have been identified to be associated with other cancers.
Chromosome 13q22.1, in which KLF5 and KLF12 are located, has been shown to be associated with susceptibility to pancreatic cancer in both Chinese and European ancestry populations and has also been reported to be associated with prostate cancer risk in a Japanese population (18–20). KLF5 is an important gene that is aberrantly expressed in multiple cancers of the digestive tract, such as ESCC, gastric cancer, and colon cancer (21, 22). KLF12 has also been shown to be amplified in esophageal adenocarcinoma and gastric cancer (23, 24) and might play an oncogenic role in the progression of poorly differentiated gastric cancer. Five variants in this region have been shown to be associated with susceptibility to pancreatic cancer or prostate cancer. rs4885093 and rs9573163 have been reported to be associated with the risk of pancreatic cancer in the Chinese population, and two other susceptibility variants for pancreatic cancer, rs9543325 and rs9564966, were identified in populations of European ancestry (18, 19). In the Japanese population, rs9600079 has been found to be a prostate cancer susceptibility variant (20). To identify novel susceptibility variants on 13q22.1, we fine-mapped the region and examined the associations between five tag SNPs and the risk of four different cancers of the digestive tract: ESCC, gastric cardia cancer, noncardia gastric cancer, and colorectal cancer. A series of biochemical assays was further conducted to elucidate the potential function of the newly identified variants.
Materials and Methods
Study subjects and genotyping analysis
A two-stage study was conducted to examine the associations between the variants at 13q22.1 and risk of four different cancers of the digestive tract: ESCC, gastric cardia cancer, noncardia gastric cancer, and colorectal cancer. In the imputation stage, we conducted imputation analyses based on the GWAS data of 2,031 ESCC cases (response rate, 95%) and 2,044 controls (8) to identify potential susceptibility variants. All these samples were genotyped using the Affymetrix GeneChip Human Mapping 6.0 set in our previous GWAS. Five tag SNPs (rs1924966, rs1924956, rs115797771, rs9543527, and rs970040) associated with ESCC risk (P < 0.001) were then tested in three independent case–control sets: 2,248 cases (response rate, 94%) and 2,238 controls recruited in Beijing (Replication I); 935 cases (response rate, 91%) and 959 controls recruited in Hebei province (Replication II); and 963 cases (response rate, 92%) and 938 controls collected in Hubei province (Replication III). To test whether these SNPs were also associated with risk of other cancers of the digestive tract, we investigated 1,894 cases with gastric cardia cancer (response rate, 93%) and 1,912 controls, 1,007 cases with noncardia gastric cancer (response rate, 93%) and 2,243 controls, and 1,111 cases with colorectal cancer (response rate, 95%) and 1,138 controls recruited in Beijing. The cases were recruited from the Han Chinese population through collaboration with multiple hospitals in Beijing, Hebei province, and Wuhan (Hubei province). All the controls were cancer-free individuals selected from a community nutritional survey in the same region during the same period as the cases were collected. All the SNPs in the replication stage were genotyped using the TaqMan assay platform (ABI 7900HT system, Applied Biosystems).
Several genotyping quality controls were implemented for the replication stage: (i) case and control samples were mixed in the plates, and the persons who performed the genotyping assays were not aware of case or control status; (ii) positive and negative (no DNA) control samples were included on every 384-well assay plate; and (iii) direct sequencing of PCR products was used to re-genotype sets of 50 randomly selected, TaqMan-genotyped samples for rs1924966, rs115797771, and rs58090485; the concordance between the two methods was 100%.
Demographic characteristics including age and sex were obtained from the medical records of these individuals. Diagnosis of pancreatic ductal adenocarcinoma, ESCC, gastric cardia cancer, noncardia gastric cancer or colorectal cancer was confirmed histopathologically or cytologically by at least two local pathologists according to the World Health Organization classification. All the cancer cases had no previous diagnosis of another type of cancer. All the controls were selected on the basis of physical examination, health-screening program, or community cancer screening program. Informed consent was obtained from each subject at recruitment and this study was approved by the Institutional Review Board of the Chinese Academy of Medical Sciences Cancer Institute.
Imputation and linkage disequilibrium pattern analysis
To increase the spectrum of variants tested for association with risk of ESCC in the 13q22.1 region, we used MaCH-Admix software to impute ungenotyped SNPs in a 2 Mb region centered on rs4885093, based on linkage disequilibrium (LD) and haplotype information from the 1000 Genomes Project November 2010 ASN (Asian) samples as the reference (25). LD structures and haplotype block plots were generated using the snp.plotter package in R.
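As an illustration of the LD measure used throughout this study, r² between two biallelic variants can be computed from phased haplotype counts. The following is a minimal sketch only; it is not the MaCH-Admix or snp.plotter implementation, and the function name `ld_r2` and its argument names are hypothetical.

```python
def ld_r2(n_ab, n_aB, n_Ab, n_AB):
    """Return (D, r2) for two biallelic loci from phased haplotype counts.

    n_AB counts haplotypes carrying allele A at locus 1 and allele B at
    locus 2; the other arguments follow the same naming pattern.
    """
    total = n_ab + n_aB + n_Ab + n_AB
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at locus 2
    d = p_AB - p_A * p_B          # LD coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("a locus is monomorphic; r2 is undefined")
    return d, d * d / denom
```

When the two minor alleles always occur on the same haplotypes, the function returns an r² of 1, which corresponds to the "perfect LD" situation described for the three promoter SNPs.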
Quantitative real-time PCR
Total RNA was isolated from surgically removed normal esophageal tissues adjacent to tumors in 67 patients with ESCC and then converted to cDNA using an oligo(dT)15 primer and SuperScript II (Invitrogen). KLF5 RNA was measured in triplicate by real-time quantitative reverse transcription-PCR on an ABI 7900HT Real-Time PCR system using the SYBR-Green method. The expression of KLF5 RNA in each individual was determined relative to that of GAPDH using a modification of the method described by Lehmann and Kreipe (26). The primer sequences used for detecting different RNAs are available upon request.
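Expression relative to a reference gene is commonly summarized with the 2^(−ΔCt) calculation. The sketch below illustrates that generic calculation and is not necessarily the modified Lehmann and Kreipe procedure used here; it assumes that the amount of product doubles every cycle, and the function name and Ct values are hypothetical.

```python
from statistics import mean

def relative_expression(target_ct, reference_ct):
    """Relative abundance of a target transcript versus a reference gene.

    Each argument is a list of replicate Ct values (for example, triplicates).
    Assumes the amount of PCR product doubles with every cycle.
    """
    delta_ct = mean(target_ct) - mean(reference_ct)
    return 2.0 ** (-delta_ct)

# Hypothetical triplicate Ct values for KLF5 and GAPDH in one tissue sample.
klf5_ct = [24.1, 24.0, 24.2]
gapdh_ct = [22.0, 22.1, 21.9]
print(round(relative_expression(klf5_ct, gapdh_ct), 3))
```

A target that crosses the threshold two cycles later than the reference has a relative expression of 0.25 under this assumption.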
Luciferase assay
Three SNPs, rs58090485 A/−, rs3812852 A/G, and rs141391427 C/−, are located in the 5′-flanking region of KLF5. A 2,053 bp DNA fragment containing the rs58090485 A allele, the rs3812852 A allele, and the rs141391427 C allele was generated by PCR and subcloned into the pGL4.10[luc2] vector (Promega). The resultant plasmid was designated p-[AAC]. Because rs58090485, rs3812852, and rs141391427 are in perfect LD, the p-[AAC] construct was then site-specifically mutated to create four different constructs, p-[−AC], p-[AGC], p-[AA−], and p-[−G−], which carry the other allele at one SNP or, for p-[−G−], at all three SNPs. All the constructs were restriction mapped and sequenced to confirm their authenticity.
Three human ESCC cell lines, KYSE-30, KYSE-150, and KYSE-510, were gifts from Y. Shimada (First Department of Surgery, Faculty of Medicine, Kyoto University, Japan) and used for luciferase assays. All cell lines used in this study were regularly authenticated by morphologic observation and tested for absence of mycoplasma contamination (MycoAlert, Lonza Rockland). All cell lines were maintained in RPMI1640 medium with 10% FBS at 37°C in 5% CO2. We seeded 5 × 10−5 cells per well in 48-well plates and transfected them with empty pGL4.10[luc2] vector (a promoterless control), p-[AAC], p-[−AC], p-[AGC], p-[AA−], or p-[−G−] construct, respectively. pRL-SV40 plasmid (Promega) was cotransfected as a normalizing control. All transfections were carried out in triplicate and in three different ESCC cell lines. After 48 hours, cells were collected and analyzed for luciferase activity with the Dual-Luciferase Reporter Assay System (Promega).
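Dual-luciferase readings of this kind are usually normalized by dividing the firefly signal by the cotransfected Renilla signal in each well and then comparing constructs. The sketch below shows one way to do this under that assumption; the function names and readings are hypothetical and do not come from this study.

```python
from statistics import mean

def normalized_activity(firefly, renilla):
    """Mean firefly/Renilla ratio across replicate wells."""
    if len(firefly) != len(renilla):
        raise ValueError("replicate lists must have equal length")
    return mean(f / r for f, r in zip(firefly, renilla))

def fold_change(construct, reference):
    """Normalized activity of a construct relative to a reference construct.

    Each argument is a (firefly_readings, renilla_readings) pair.
    """
    return normalized_activity(*construct) / normalized_activity(*reference)

# Hypothetical triplicate readings for two constructs.
p_del = ([5200.0, 5000.0, 5100.0], [100.0, 98.0, 102.0])
p_ref = ([2500.0, 2600.0, 2550.0], [101.0, 99.0, 100.0])
print(round(fold_change(p_del, p_ref), 2))
```

Normalizing within each well before averaging keeps differences in transfection efficiency between wells from being mistaken for differences in promoter activity.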
Electrophoretic mobility shift assay
Synthetic double-stranded and 3′ biotin–labeled oligonucleotides corresponding to the rs58090485[A] and rs58090485[−] sequences and nuclear extracts obtained from KYSE-150 cells were incubated for 20 minutes using the Light Shift Chemiluminescent EMSA kit (Pierce). Reaction mixtures were separated by 8% PAGE, and products were detected by stabilized streptavidin–horseradish peroxidase conjugate (Pierce). For competition assays, unlabeled oligonucleotides at 10- or 100-fold molar excess were added to the reaction mixture before addition of the biotin-labeled probe.
Statistical analysis
Association analyses between 5,930 (625 directly genotyped and 5,305 well-imputed) SNPs and risk of ESCC in the imputation stage were performed using unconditional logistic regression with age, sex, smoking status, drinking status, and the first three principal components from EIGENSTRAT, which were calculated in the previous ESCC GWAS (8), as covariates. Association analyses with risk of the different cancers of the digestive tract were adjusted for age, sex, smoking status, and drinking status but not for the principal components, because population stratification could not be assessed in the replication samples given the small number of SNPs typed. For colorectal cancer, we adjusted only for age and sex because data on smoking status and drinking status were not available. The ORs were calculated for the minor allele of each SNP. Conditional association analyses were performed to identify independent signals. For each locus, the associations between SNPs and risk of ESCC were conditioned on the most significant SNP. Among the five loci identified, we then conducted stepwise conditional analyses to examine the dependence among them.
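The ORs in this study come from covariate-adjusted logistic regression. As a simpler illustration of what an OR for the minor allele measures, the sketch below computes an unadjusted allelic OR from a 2 × 2 table of allele counts with a Woolf (log-scale) 95% CI. This is a swapped-in, simplified technique rather than the model used here, and the function name and counts are hypothetical.

```python
import math

def allelic_odds_ratio(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Unadjusted allelic OR for the minor allele with a Woolf 95% CI."""
    counts = (case_minor, case_major, ctrl_minor, ctrl_major)
    if min(counts) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical allele counts (two alleles per genotyped subject).
print(allelic_odds_ratio(700, 1300, 800, 1200))
```

An OR below 1 with an upper confidence limit below 1 indicates that the minor allele is less frequent in cases than in controls, which is the pattern described as protective in this study.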
In the imputation stage, SNPs with P < 0.001 (109 in total) were considered for further replication. The Student t test was used to examine the differences in luciferase reporter gene expression, and the Mann–Whitney U test was used to assess differences in KLF5 transcript abundance between genotypes. All statistical tests were two-sided; a P value less than 0.05 was considered statistically significant, and P values are reported without correction for multiple testing.
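For readers unfamiliar with the Mann–Whitney U test, the sketch below computes the U statistic and a two-sided P value from the normal approximation, using midranks for ties and no continuity correction. It is an illustrative, standard-library-only version and may differ from the statistical software used in this study; the function name is hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Uses midranks for ties and a tie-corrected variance, without a
    continuity correction. Returns (U for sample x, approximate P value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0   # average of ranks i+1 .. j
        t = j - i                     # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2.0))
```

The normal approximation is reasonable for group sizes such as those compared in the expression analysis, but an exact test is preferable for very small samples.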
Results
Characteristics of study subjects
This study consisted of four ESCC case–control sets including a total of 6,177 cases with ESCC and 6,179 controls recruited from three geographical regions. Selected characteristics of these cases and controls, such as sex, age, smoking status, and drinking status, are shown in Supplementary Table S1. This study also included a gastric cardia cancer case–control set (1,894 cases and 1,912 controls), a noncardia gastric cancer case–control set (1,007 cases and 2,243 controls), and a colorectal cancer case–control set (1,111 cases and 1,138 controls). The distributions of selected characteristics of these study subjects are shown in Supplementary Table S2.
Novel variants associated with risk of ESCC
In our previous GWAS on pancreatic cancer, the SNP rs4885093 was the most significant marker on 13q22.1 (18); however, this SNP was not associated with the risk of ESCC in the current study [OR, 1.05; 95% confidence interval (CI), 0.96–1.15; P = 0.2620]. To identify whether other variants in this chromosomal region are potentially associated with the risk of ESCC, we imputed a 2 Mb region centered on rs4885093 based on the previous GWAS data of 2,031 cases with ESCC and 2,044 controls. After imputation, we were able to test 5,930 (625 directly genotyped and 5,305 well-imputed) SNPs (Supplementary Fig. S1). Association analyses showed that 109 SNPs were potentially associated with risk of ESCC (all P < 0.001; Supplementary Table S3). These SNPs were scattered across five different loci; the most significant SNPs at these loci were rs1924966, rs1924956, rs115797771, rs9543527, and rs970040. These five SNPs were the tag SNPs on 13q22.1, and the other SNPs in each locus were in strong LD with the corresponding tag SNP; the r² values are shown in Supplementary Table S3. We then performed conditional association analyses to assess the dependence of variants in this region. When conditioning on each of the most significant tag SNPs in turn, the association P values for the other SNPs in the same locus increased by at least five orders of magnitude, but the associations of SNPs in the other four loci remained significant. These results suggest that the five loci are likely to be independent signals, each marked by its most significant tag SNP (Supplementary Fig. S2).
We then performed replication of these five potentially associated SNPs, i.e., rs1924966, rs1924956, rs115797771, rs9543527, and rs970040, in three additional ESCC case–control sets. Only two SNPs, rs1924966 and rs115797771, were significantly associated with risk in the fast-track Replication I group (P = 0.0009 and P = 0.0005, respectively); the other three were not (Supplementary Table S4). We then examined these two significant SNPs in the Replication II and Replication III groups, and the results showed consistent, significant associations with risk of ESCC. The P values for rs1924966 and rs115797771 in the combined sample were 1.37 × 10⁻¹⁰ and 2.32 × 10⁻¹⁰, and the minor alleles of both SNPs showed a protective effect, with odds ratios (OR) of 0.84 (95% CI, 0.80–0.89) and 0.69 (95% CI, 0.62–0.78), respectively (Table 1). All Hardy–Weinberg equilibrium P values were >0.05. Stratified analyses showed that the risks associated with rs1924966, rs115797771, and rs58090485 were not significantly different among subgroups defined by smoking status or drinking status (Supplementary Table S5).
SNP | Alleles | Position | Gene | Location | Phase | MAF, cases | MAF, controls | OR (95% CI) | P |
---|---|---|---|---|---|---|---|---|---|
rs1924966 | A > C | 73007053 | KLF5 | Upstream | Imputation | 0.35 | 0.40 | 0.82 (0.75–0.90) | 1.69 × 10⁻⁵ |
rs1924966 | A > C | 73007053 | KLF5 | Upstream | Replication I | 0.36 | 0.40 | 0.86 (0.79–0.94) | 0.0009 |
rs1924966 | A > C | 73007053 | KLF5 | Upstream | Replication II | 0.34 | 0.39 | 0.83 (0.73–0.96) | 0.0091 |
rs1924966 | A > C | 73007053 | KLF5 | Upstream | Replication III | 0.36 | 0.39 | 0.86 (0.75–0.98) | 0.0255 |
rs1924966 | A > C | 73007053 | KLF5 | Upstream | All replication | 0.36 | 0.40 | 0.85 (0.80–0.91) | 1.30 × 10⁻⁶ |
rs1924966 | A > C | 73007053 | KLF5 | Upstream | Combined | 0.35 | 0.40 | 0.84 (0.80–0.89) | 1.37 × 10⁻¹⁰ |
rs115797771 | A > C | 73638643 | KLF5 | Intron | Imputation | 0.04 | 0.06 | 0.66 (0.54–0.82) | 9.17 × 10⁻⁵ |
rs115797771 | A > C | 73638643 | KLF5 | Intron | Replication I | 0.05 | 0.07 | 0.72 (0.60–0.86) | 0.0005 |
rs115797771 | A > C | 73638643 | KLF5 | Intron | Replication II | 0.04 | 0.07 | 0.64 (0.48–0.87) | 0.0040 |
rs115797771 | A > C | 73638643 | KLF5 | Intron | Replication III | 0.05 | 0.06 | 0.70 (0.52–0.93) | 0.0141 |
rs115797771 | A > C | 73638643 | KLF5 | Intron | All replication | 0.05 | 0.07 | 0.69 (0.60–0.80) | 2.17 × 10⁻⁷ |
rs115797771 | A > C | 73638643 | KLF5 | Intron | Combined | 0.05 | 0.06 | 0.69 (0.62–0.78) | 2.32 × 10⁻¹⁰ |
rs58090485 | A > – | 73631518 | KLF5 | Upstream | Imputation | 0.04 | 0.06 | 0.67 (0.55–0.83) | 0.0001 |
rs58090485 | A > – | 73631518 | KLF5 | Upstream | Replication I | 0.05 | 0.07 | 0.71 (0.59–0.85) | 0.0002 |
rs58090485 | A > – | 73631518 | KLF5 | Upstream | Replication II | 0.04 | 0.07 | 0.64 (0.48–0.86) | 0.0030 |
rs58090485 | A > – | 73631518 | KLF5 | Upstream | Replication III | 0.05 | 0.06 | 0.68 (0.51–0.91) | 0.0103 |
rs58090485 | A > – | 73631518 | KLF5 | Upstream | All replication | 0.05 | 0.07 | 0.69 (0.60–0.79) | 7.27 × 10⁻⁸ |
rs58090485 | A > – | 73631518 | KLF5 | Upstream | Combined | 0.05 | 0.07 | 0.69 (0.62–0.77) | 1.23 × 10⁻¹⁰ |
NOTE: P values are two sided and were calculated by an additive model in logistic regression analysis adjusted for sex, age, smoking status, and drinking status.
Abbreviation: MAF, minor allele frequency.
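As a rough consistency check on tabulated results of this kind, a two-sided Wald P value can be back-calculated from an OR and its 95% CI. The sketch below assumes the CI is symmetric on the log scale; because the published OR and CI are rounded to two decimals, the result only approximates the reported P value. The function name is hypothetical.

```python
import math

def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Approximate two-sided P value from an OR and its 95% CI (Wald test)."""
    # Recover the standard error of log(OR) from the width of the CI.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p

# Combined-sample estimate reported for rs1924966: OR 0.84 (95% CI, 0.80-0.89).
z, p = p_from_or_ci(0.84, 0.80, 0.89)
print(f"z = {z:.2f}, P = {p:.2e}")
```

For this input the back-calculated P value falls in the same order of magnitude as the reported combined P value, which is what would be expected if the table is internally consistent.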
Association with risk of other three types of cancer in the digestive tract
We next examined whether these two SNPs were also associated with the risk of gastric cardia, noncardia gastric, and colorectal cancer. The minor alleles of rs1924966 A>C (OR, 0.84; 95% CI, 0.77–0.93; P = 0.0003) and rs115797771 A>C (OR, 0.73; 95% CI, 0.60–0.89; P = 0.0018) showed significant protective effects for gastric cardia cancer, which were similar to those observed for ESCC. However, we did not find any association between these two SNPs and the risk of noncardia gastric and colorectal cancer (all P > 0.05; Supplementary Table S6; Supplementary Fig. S3).
Functional analysis of significant variants
rs1924966 is located 622 kb away from the transcriptional start site of KLF5, and rs115797771 lies in an intron of KLF5. To study the regulatory effects of these two SNPs on KLF5 gene expression, we first examined the association between the genotypes of these two SNPs and the KLF5 RNA level in esophageal tissue samples. Although there was no significant association between the rs1924966 genotype and KLF5 RNA levels (Supplementary Fig. S4), we did find a significant association between KLF5 RNA levels and the rs115797771 genotype. Subjects with the rs115797771 AA genotype had significantly lower KLF5 RNA levels (mean ± SE) than those with the rs115797771 AC genotype [0.2003 ± 0.0156 (n = 50) vs. 0.2722 ± 0.0239 (n = 17), P = 0.0198; Supplementary Fig. S4].
Although rs115797771 is the most significant SNP in this high LD region, it might not necessarily be the functional variant because it is an intronic SNP. We therefore performed functional annotations, using HaploReg v2, for 35 SNPs that were identified in the imputation stage and are in high LD with rs115797771 (Supplementary Table S7). Three SNPs, rs58090485, rs3812852, and rs141391427, in perfect LD (r² = 1.00) with each other and in high LD (r² = 0.94) with rs115797771 (Fig. 1), all showed significant associations with risk of ESCC (OR, 0.69; 95% CI, 0.62–0.77; P = 1.23 × 10⁻¹⁰) in the combined samples and also with risk of gastric cardia cancer (OR, 0.73; 95% CI, 0.60–0.88; P = 0.0012; Table 1 and Supplementary Table S6). Because all three SNPs are located in the promoter region of KLF5, and previous studies have shown that many promoter histone marks (H3K4me3, H3K9ac, and H3K27ac) are located in this DNaseI-hypersensitivity region in multiple cell types, we first examined whether these three SNPs affect KLF5 promoter activity using a set of luciferase reporter gene assays. We found that the p-[AAC] construct, which contains the KLF5 promoter with all three major alleles, drove significantly lower reporter gene expression than the p-[−G−] construct, which contains the KLF5 promoter with the minor alleles of the three SNPs (P < 0.0001). To identify the functional SNP, we then compared three constructs, p-[−AC], p-[AGC], and p-[AA−], each of which contains only one minor allele (of rs58090485, rs3812852, or rs141391427, respectively), and found that the p-[−AC] construct containing the rs58090485 [−] allele drove significantly higher reporter gene expression than the p-[AAC] construct containing the rs58090485 [A] allele (P < 0.0001). However, the p-[AGC] and p-[AA−] constructs, containing rs3812852 [G] and rs141391427 [−], respectively, did not drive significantly different reporter gene expression compared with the p-[AAC] construct (Fig. 2A).
These results were consistent in three ESCC cell lines, and suggest that the rs58090485 SNP may have an impact on the regulation of KLF5 expression and needs to be further investigated.
Regulatory sequences with discrete alleles might influence gene expression upon binding of transcriptional activators or inhibitors that instruct their regulatory control. Therefore, we then examined whether the rs58090485A deletion changes the binding pattern of nuclear proteins using EMSA. As a result, we found that the binding pattern for the rs58090485 [A] allele differed from that for the rs58090485 [−] allele; one DNA–protein complex disappeared when the rs58090485 [−] probe was incubated with nuclear proteins from the KYSE-150 cell line (Fig. 2B, lane 7) compared with the rs58090485 [A] probe under the same experimental conditions (Fig. 2B, lane 2). Competition assays showed that the addition of 100-fold excess unlabeled rs58090485 [A] probe (Fig. 2B, lane 4) but not rs58090485 [−] probe (Fig. 2B, lane 5) to the reaction mixture markedly eliminated the DNA–protein complex formed by the interaction between the rs58090485[A] allele and nuclear proteins, indicating that binding is sequence-specific.
Discussion
In this study, we performed fine-mapping of a pancreatic cancer susceptibility region at 13q22.1 and analyzed the associations between variants located in this region and the risk of four different cancers of the digestive tract. Two novel SNPs, rs1924966 and rs115797771, were found to be associated with the risk of ESCC and gastric cardia cancer, but not noncardia gastric cancer or colorectal cancer. Functional analysis suggests that rs58090485, which is located in the KLF5 promoter region and is in high LD (r² = 0.94) with rs115797771, has an impact on the regulation of KLF5 expression.
Previous GWAS have identified numerous cancer susceptibility loci, but only a small proportion of the heritability can be explained by these discovered risk loci (27). Patterns of pleiotropic association have shown that key loci or shared pathways might affect multiple cancers, such as the 8q24 region (11–16) and the telomerase reverse transcriptase (TERT) gene region on chromosome 5 (14, 19, 28–30). Our previous GWAS identified only one locus on 13q22.1, tagged by rs4885093, that was associated with the risk of pancreatic cancer. In this study, we did not find any association between rs4885093 and the risk of ESCC; however, we found two novel SNPs, rs1924966 and rs115797771, that were significantly associated with the risk of ESCC and gastric cardia cancer but not with the risk of noncardia gastric cancer or colorectal cancer. Epidemiologic studies suggest shared geographic distributions and environmental risk factors for ESCC and gastric cardia cancer but not for noncardia gastric cancer and colorectal cancer in China (31, 32). In addition, a GWAS on ESCC and gastric cancer has shown a shared genetic risk factor, PLCE1 variation, for both ESCC and gastric cardia cancer but not for noncardia gastric cancer (6). These results suggest that some different types of cancer share environmental and genetic susceptibility factors. The analytical strategy used in the current study may not only help to identify new genetic risk loci but also elucidate common etiologies between different cancers.
The most proximate gene on 13q22.1 is KLF5, which has complex roles in digestive tract carcinogenesis. In colon cancer, KLF5 acts as an important mediator of KRAS during the intestinal carcinogenesis process (33–36). A similar oncogenic function of KLF5 has also been observed in the severity of premalignant lesions in human gastric carcinogenesis induced by H. pylori (37–40). In ESCC, however, KLF5 might play a tumor suppressor role. Downregulation of KLF5 might reduce its limitation on NOTCH1 activity and is sufficient on its own to transform primary human keratinocytes to form invasive tumors in the context of P53 mutation or loss of function (41). KLF5 might also activate the JNK pathway and lead to apoptosis and reduced cell survival (42). Our findings in the present study are consistent with the postulation that KLF5 acts as a tumor suppressor in ESCC carcinogenesis. Compared with individuals with the rs115797771 AA genotype, individuals carrying the protective C allele of rs115797771 showed higher expression of KLF5, suggesting that upregulation of KLF5 could protect individuals from ESCC.
The current study has several strengths. First, 13q22.1 was the only region identified in pancreatic cancer in both Chinese and European ancestry populations. The current study is the first association study to perform fine-mapping of this region and to identify novel risk variants for different types of cancer of the digestive tract. Previous GWAS used very stringent P values as the statistical significance level (P < 5 × 10⁻⁸ in most studies) to identify susceptibility loci associated with the risk of disease. Although this strategy might decrease false-positive findings, it might miss some true susceptibility loci (43). By using our strategy, three loci on 13q22.1 were found to be associated with the risk of different cancers, and two novel loci were associated with ESCC and gastric cardia cancer in addition to the locus associated with pancreatic cancer. Another major strength of our study is the two-stage design, with case–control sets for SNP imputation and replication recruited from three different geographical regions, which should largely reduce false-positive findings. We also characterized the function of the rs58090485 SNP, making the association of this SNP with the risk of ESCC biologically plausible.
Despite these strengths, we also acknowledge several limitations of this study. First, because GWAS data were available only for pancreatic cancer and ESCC, we explored susceptibility SNPs in the 13q22.1 region only for the risk of ESCC in the imputation stage. Although we analyzed the two novel variants in gastric cardia cancer, noncardia gastric cancer, and colorectal cancer, it would be interesting to analyze other types of cancer of the digestive tract or of other systems. Second, in the imputation stage, the associations between 5,930 SNPs and the risk of ESCC were analyzed, and the P values of 109 SNPs were less than 0.001. After conditional and LD analyses, we selected five tag SNPs for further replication and functional analyses, but corrections for multiple testing were not performed at this stage, and we cannot rule out the possibility of false-positive results. Further investigations, including resequencing the 13q22.1 region in Asian or Caucasian populations, are still warranted. Third, in vitro functional analysis and expression data can only indirectly support this association. It would be better to include the expression levels of KLF5 in individuals with the rs115797771 CC genotype; however, we did not find any individual with this genotype in our cohort. In addition, although functional analyses showed that rs58090485 A/− disrupts a transcriptional inhibitor binding site, the nuclear protein bound to this site remains to be identified. Finally, because KLF5 expression data were not available for controls, the relationships among rs115797771, KLF5 expression, and ESCC risk are still unknown and need to be further explored.
In conclusion, through fine-mapping of a potential susceptibility region for cancer combined with functional analysis, we have extended our GWAS results to the discovery of two novel susceptibility loci for ESCC. Among them, the rs58090485 A/− change may cause upregulation of the KLF5 gene, which in turn might inhibit carcinogenesis in ESCC. These results support our hypothesis that some types of human cancer might share the same genetic susceptibility factors in terms of gene or chromosomal region, which would extend our understanding of the genetic etiology of disease.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: J. Chang, X. Zhang, C. Wu, D. Lin
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): L. Wei, W. Tan, X. Zhang
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J. Chang, L. Wei, X. Miao, D. Yu, C. Wu
Writing, review, and/or revision of the manuscript: J. Chang, X. Zhang, C. Wu, D. Lin
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): X. Miao, W. Tan, C. Wu
Grant Support
This work was supported by the National High-Tech Research and Development Program of China (2014AA020601; to C. Wu) and National Basic Research Program of China (2013CB910301; to D. Lin).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.