Abstract
Background: The molecular mechanisms underlying the prostate cancer (PCa) risk-associated single-nucleotide polymorphisms (SNP) identified by genome-wide association studies (GWAS) remain largely unexplained. A recent finding that PCa risk SNPs are enriched in genomic regions containing androgen receptor (AR)-binding sites suggests that altered AR signaling may be an important mechanism.
Methods: To search for novel associations by leveraging this knowledge, we used a previously performed meta-analysis of SNPs located in ChIP-on-chip-identified AR-binding genomic regions, based on GWAS data from the Johns Hopkins Hospital (JHH) and the Cancer Genetic Markers of Susceptibility (CGEMS) studies, and then evaluated the top associations in a third population from the CAncer of the Prostate in Sweden (CAPS) study.
Results: One SNP (rs4919743: G>A) was found to be associated with PCa risk. It is located at the KRT8 locus at 12q13.13, which encodes a keratin protein (K8) that has long been used as a marker of prostate epithelial malignancy and has been implicated in the tumorigenesis of several cancer types. The frequency of its minor “A” allele was consistently higher in PCa cases than in controls in all three study populations, with a combined OR of 1.22 (95% CI: 1.13–1.32) and an overall P value of 4.50 × 10−7 (Bonferroni-corrected P = 0.006).
Conclusion: We have identified a novel genetic locus that is associated with PCa risk.
Impact: This study illustrates the potential of prior biological knowledge to facilitate the search for novel disease-associated genetic loci. This finding warrants replication in other studies. Cancer Epidemiol Biomarkers Prev; 20(11); 2396–403. ©2011 AACR.
Introduction
Prostate cancer (PCa) is the most prevalent nonskin malignancy in men in Western countries and one of the leading causes of cancer mortality (1). PCa is among the most heritable cancers, with inherited genetic factors contributing substantially to susceptibility (2). Considerable effort has been invested in identifying these PCa risk-associated genetic factors. To date, more than 30 single-nucleotide polymorphisms (SNPs) have been reproducibly associated with PCa predisposition, thanks primarily to the successful application of genome-wide association studies (GWAS; refs. 3–16).
Notably, most established PCa risk SNPs are located within noncoding genomic regions, leaving open the question of which molecular mechanisms link these gene desert-localized variants to PCa susceptibility. Recently, through a genome-wide survey of the SNPs located in the 22,447 androgen receptor (AR)-binding sites (MIM# 313700) previously identified by chromatin immunoprecipitation combined with tiled oligonucleotide microarrays (ChIP-on-chip; ref. 17), our group suggested that the AR signaling pathway may be an important mechanism through which PCa-associated SNPs influence PCa risk (18, 19). In these reports, we showed that PCa risk-associated SNPs are significantly enriched in AR-binding genomic regions, and that as many as one-third of the consensus PCa risk SNPs (11 of 33, as of December 2009) lie in regions containing AR-binding sites. These include 5 SNPs (rs16901979, rs620861, rs1447295, rs1859962, and rs9623117) located in gene desert regions such as 8q24, 17q24.3, and 22q13.
There is mounting evidence that AR plays a crucial role in PCa initiation and progression (20). A member of the steroid receptor subclass of nuclear receptors, AR functions primarily as a ligand-dependent transcription factor. Upon binding of androgens, which include testosterone and its more potent metabolite 5α-dihydrotestosterone (DHT), AR is activated; the activated AR dimerizes, translocates into the nucleus, binds DNA at specific sequences, and ultimately up- or downregulates transcription of its downstream genes. The critical role of AR signaling in PCa oncogenesis is evidenced by the observation that PCa is generally androgen dependent (21), and is further supported by the ∼25% reduction in PCa risk seen in 2 large clinical trials testing agents that block androgen activation (22, 23), and by the success of androgen ablation therapy as the mainstay of treatment for advanced hormone-dependent disease (24). Although androgen-dependent PCa almost invariably progresses to an androgen-independent late stage, accumulating evidence supports a model in which this process relies on the reactivation of AR activity (24, 25). Combining these findings with ours, we hypothesize that risk SNPs located in putative AR-binding sites might change the affinity of the androgen–AR complex for its binding sequences, which in turn may alter the expression of its downstream genes and ultimately modify PCa risk.
In this study, we report the discovery of a novel PCa susceptibility locus at 12q13.13, identified through further exploration of SNPs located within genomic regions containing AR-binding sites. A meta-analysis of the SNPs harbored in AR-binding regions was performed using the GWAS data from the Johns Hopkins Hospital (JHH) study and the Cancer Genetic Markers of Susceptibility (CGEMS) study. The top associated SNPs were subsequently tested for association with PCa risk in an independent population from the Cancer of the Prostate in Sweden (CAPS) study.
Materials and Methods
Study populations
The 2 primary GWAS populations were from the PCa GWAS carried out by JHH and from Stage 1 of the National Cancer Institute CGEMS study. The JHH population included 1,964 PCa cases, whose clinical characteristics have been described previously (18), and 3,172 control subjects drawn from an independent Illumina iControlDB (iControls) dataset (26). The CGEMS population included 1,172 PCa cases and 1,157 control subjects (5); its genotype and phenotype data are publicly available, and our use of the data was approved by CGEMS. Our confirmation population was from the CAPS study, which included 2,899 PCa cases and 1,722 control subjects and has been described in detail elsewhere (27). The research ethics committees at Wake Forest University School of Medicine and the Karolinska Institute approved the study. All study subjects were of European ancestry.
GWAS genotyping data, imputation, and quality control
SNPs in the JHH cases and in the iControls population were genotyped using the Illumina 610K chip and the Illumina Hap300/Hap550 chips, respectively. For all 3 study populations, all known SNPs catalogued in HapMap Phase II (28) were imputed with the IMPUTE program (29), using a posterior probability of 0.9 as the threshold for calling genotypes. SNPs were excluded if they had a minor allele frequency (MAF) < 0.01, a Hardy–Weinberg equilibrium test P < 0.001, or a call rate < 0.90.
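The genotype-calling threshold and the three QC filters above can be sketched as follows, assuming imputed genotypes arrive, as in IMPUTE output, as three posterior probabilities per individual. The Hardy–Weinberg check is a 1-df χ2 test applied to whatever individuals are passed in; the text does not specify the HWE test or whether it was computed in controls only, so both are assumptions, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Posterior = Tuple[float, float, float]  # P(AA), P(AB), P(BB) for one individual


def call_genotype(probs: Posterior, threshold: float = 0.9) -> Optional[int]:
    """Return the number of B alleles (0, 1, 2), or None if no genotype reaches the threshold."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] >= threshold else None


def hwe_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X >= x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def passes_qc(posteriors: Sequence[Posterior], maf_min: float = 0.01,
              hwe_min: float = 1e-3, call_rate_min: float = 0.90) -> bool:
    """Keep a SNP only if call rate, MAF, and HWE P value all clear their thresholds."""
    calls = [call_genotype(p) for p in posteriors]
    called = [g for g in calls if g is not None]
    if not called or len(called) / len(calls) < call_rate_min:
        return False
    counts = [called.count(g) for g in (0, 1, 2)]
    freq_b = (counts[1] + 2 * counts[2]) / (2 * len(called))
    if min(freq_b, 1.0 - freq_b) < maf_min:
        return False
    return hwe_p(*counts) >= hwe_min
```

Checking the call rate and MAF before the HWE test also guarantees that the expected genotype counts are nonzero when the χ2 statistic is computed.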
Genotyping for the confirmation study
A subset of SNPs was genotyped using the Sequenom MassARRAY system. PCR and extension primers for these SNPs were designed using MassARRAY Assay Design 3.0 software. PCR and extension reactions were performed according to the manufacturer's instructions, and extension product sizes were determined by mass spectrometry using the Sequenom iPLEX system. Duplicate samples and water samples (as PCR negative controls), to which the technician was blinded, were included in each 96-well plate. The genotype call rates of these SNPs were > 98%, and the average concordance rate between duplicate samples was > 99%.
SNPs within ChIP-on-chip detected AR-binding site regions
The data on the 22,447 putative AR-binding genomic regions discovered across the genome by ChIP-on-chip analysis in 2 PCa cell lines are publicly available (17). The experimentally determined AR-binding regions range from 299 to 5,554 base pairs (bp) in length, with a median of 911 bp. All known SNPs within these AR-binding regions were identified from the HapMap database (Build 36).
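Finding the SNPs that fall inside AR-binding regions is an interval-lookup problem. A minimal sketch, assuming the regions on each chromosome do not overlap and that coordinates are 1-based and inclusive; the function names are hypothetical. Regions are sorted by start per chromosome so each SNP is located with one binary search.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Region = Tuple[int, int]  # (start, end), 1-based, inclusive
Index = Dict[str, Tuple[List[int], List[Region]]]


def build_index(regions: Sequence[Tuple[str, int, int]]) -> Index:
    """Group regions by chromosome and sort them by start position."""
    by_chrom: Dict[str, List[Region]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index: Index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], intervals)
    return index


def find_region(index: Index, chrom: str, pos: int) -> Optional[Region]:
    """Return the region containing pos, or None (assumes non-overlapping regions)."""
    if chrom not in index:
        return None
    starts, intervals = index[chrom]
    i = bisect_right(starts, pos) - 1  # last region starting at or before pos
    if i >= 0 and intervals[i][1] >= pos:
        return intervals[i]
    return None


def snps_in_regions(snps: Sequence[Tuple[str, str, int]],
                    index: Index) -> Dict[str, Tuple[str, Region]]:
    """Map each SNP id that lies in some region to (chromosome, region)."""
    hits: Dict[str, Tuple[str, Region]] = {}
    for snp_id, chrom, pos in snps:
        region = find_region(index, chrom, pos)
        if region is not None:
            hits[snp_id] = (chrom, region)
    return hits
```

`len(hits)` gives the number of SNPs located in AR-binding regions and `len(set(hits.values()))` the number of distinct regions they occupy, the two kinds of counts reported in the Results.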
Statistical analysis
Allele frequency differences between cases and controls were tested for each SNP using a χ2 test with 1 degree of freedom. Allelic ORs and 95% CIs were estimated under a multiplicative model. Because of potential population stratification in the JHH study, logistic regression adjusting for the top 5 eigenvectors estimated with EIGENSOFT (30) was used to reduce spurious associations. Results from multiple case–control populations were combined using a Mantel–Haenszel model, in which populations were allowed to have different allele frequencies but were assumed to share a common OR. Homogeneity of ORs across study populations was tested using the Breslow–Day χ2 test. SNP–SNP interactions were tested by including both SNPs and an interaction term (the product of the 2 SNPs) in a logistic regression model. An additive genetic model was used in the analysis.
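A minimal sketch of the per-population allelic test and the Mantel–Haenszel combination described above, assuming each population is summarized as a 2 × 2 table of allele counts (case minor, case major, control minor, control major). The Pearson χ2 without continuity correction and the Woolf (log-scale) 95% CI are our assumptions, since the text does not name them; the eigenvector-adjusted logistic regression used for JHH and the Breslow–Day test are not shown, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# (case minor, case major, control minor, control major) allele counts
Table = Tuple[int, int, int, int]


def allelic_test(t: Table) -> Tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) and its P value."""
    a, b, c, d = t
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)


def allelic_or(t: Table, z: float = 1.959964) -> Tuple[float, float, float]:
    """Allelic OR (multiplicative model) with a Woolf log-scale 95% CI."""
    a, b, c, d = t
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def mantel_haenszel_or(tables: List[Table]) -> float:
    """Common OR across populations (strata) with their own allele frequencies."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den
```

With the hypothetical counts (60, 40, 40, 60), `allelic_test` gives χ2 = 8.0 and `allelic_or` gives OR = 2.25. `mantel_haenszel_or` weights each population's cross-products by its total allele count, so strata with different allele frequencies still contribute to a single common OR.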
Reported PCa risk-associated SNPs by GWAS
From PCa GWAS reported before December 2009, we selected 33 PCa risk-associated SNPs for comparison, requiring that each association had reached genome-wide significance in its initial report (P < 10−7) and had been replicated in independent study populations. LD blocks for these risk SNPs were also inferred to define PCa risk-associated genomic regions, using the CEU genotype data from HapMap release 27 (Phase II + Phase III). An LD block was defined as a set of SNPs within a 1,000-kb genomic region with pairwise r2 ≥ 0.5, and was estimated using the CLUMP function of the PLINK software (31). After excluding one SNP (rs16902094) whose data are unavailable in the HapMap database, 32 PCa susceptibility-associated LD blocks were identified. The SNP list and the pairwise r2 values of the PCa risk-associated SNPs have been described previously (18).
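The LD-block definition above can be approximated by greedy clumping: take the SNP with the smallest P value as an index SNP, absorb every unassigned SNP within the window that has r2 ≥ 0.5 with it, and repeat. A minimal sketch, assuming pairwise r2 values are supplied precomputed (for example from HapMap CEU data) and that the 1,000-kb window is measured from the index SNP; PLINK's own clumping also applies P-value thresholds that this hypothetical `clump` function omits.

```python
from typing import Dict, FrozenSet, List, Tuple

Snp = Tuple[str, str, int, float]  # (snp_id, chromosome, position_bp, p_value)


def clump(snps: List[Snp], r2: Dict[FrozenSet[str], float],
          window_bp: int = 1_000_000, r2_min: float = 0.5) -> List[List[str]]:
    """Greedy LD clumping; each returned block starts with its index SNP.

    r2 is keyed by frozenset({id1, id2}); missing pairs are treated as r2 = 0.
    """
    ordered = sorted(snps, key=lambda s: s[3])  # most significant first
    assigned = set()
    blocks: List[List[str]] = []
    for sid, chrom, pos, _ in ordered:
        if sid in assigned:
            continue
        block = [sid]
        assigned.add(sid)
        for oid, ochrom, opos, _ in ordered:
            if oid in assigned or ochrom != chrom or abs(opos - pos) > window_bp:
                continue
            if r2.get(frozenset((sid, oid)), 0.0) >= r2_min:
                block.append(oid)
                assigned.add(oid)
        blocks.append(block)
    return blocks
```

Processing SNPs in order of significance means each block is anchored by its strongest signal, and a SNP already absorbed into one block cannot seed or join another.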
Results
A genome-wide survey of the 22,447 ChIP-on-chip-detected AR-binding regions revealed 18,401 SNPs located in AR-binding regions, based on the HapMap database (Build 36). Of these, 13,899 SNPs, residing in 8,189 AR-binding regions, were directly genotyped or imputed in both the CGEMS and JHH populations and passed our quality control criteria (Supplementary Table S1). Allele frequency differences between PCa cases and controls were tested for each of these SNPs in each population individually and in the combined population. A total of 46 SNPs in AR-binding regions were associated with PCa risk at P < 0.001 in the combined GWAS data from the JHH and CGEMS studies (Table 1). Because some of these SNPs are in LD, we grouped them into LD blocks (r2 ≥ 0.5). Notably, 6 (20.7%, 6/29) of the 29 LD blocks containing these 46 SNPs, comprising 15 (32.6%, 15/46) of the SNPs, overlap with the 32 PCa risk-associated LD blocks derived from the consensus PCa-associated SNPs (Table 1). These observations are similar to those reported in our previous article (18), although different quality control criteria were used.
To confirm these associations, we genotyped 23 candidate SNPs in the CAPS population, representing tagging SNPs for 23 of the 29 LD blocks described above. The SNPs in the remaining 6 LD blocks, each of which contains at least one established PCa risk-associated SNP, had already been evaluated in the CAPS population in a previous study (27) and were therefore not examined further. One SNP (rs8087095) failed our quality control criteria; for each of the remaining 22 SNPs, an allelic test was performed to examine its association with PCa risk. As shown in Table 2, one SNP (rs4919743: G>A at 12q13) was significantly associated with PCa susceptibility in the CAPS population (P = 7.17 × 10−4), with its minor “A” allele more frequent in PCa cases than in control subjects (OR = 1.22, 95% CI = 1.09–1.37). The association remained significant after Bonferroni correction for multiple comparisons (P = 0.016). The direction of association was consistent with that in the 2 GWAS populations (OR = 1.21, 95% CI = 1.06–1.37 for JHH, and OR = 1.22, 95% CI = 1.02–1.46 for CGEMS; Table 1). Adjustment for age did not appear to alter the significance of the associations in any of the 3 populations (data not shown). When all 3 populations were combined, the association of rs4919743 with PCa risk (P = 4.50 × 10−7, OR = 1.22, 95% CI = 1.13–1.32; Table 3) remained significant after Bonferroni correction (P = 0.006). No heterogeneity was detected across the 3 study populations (Breslow–Day χ2 test, P = 0.9974). None of the other SNPs showed a significant association in the CAPS population.
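The Bonferroni-corrected values quoted above follow from multiplying each raw P value by the number of tests. The test counts used below are our reading of the text (13,899 SNPs in AR-binding regions for the combined analysis, 22 SNPs passing QC in CAPS), and `bonferroni` is a hypothetical helper, not the authors' code.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value: raw P times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


# Raw P values as reported in the text; test counts are our reading of it.
combined = bonferroni(4.50e-7, 13_899)  # SNPs in AR-binding regions, 3 populations
caps_only = bonferroni(7.17e-4, 22)     # SNPs passing QC in CAPS
print(f"combined: {combined:.3f}, CAPS: {caps_only:.3f}")
# prints "combined: 0.006, CAPS: 0.016"
```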
We further evaluated whether rs4919743 interacts with any of the other 21 SNPs. In the CAPS population, the most significant interaction (Pinteraction = 0.008; Table 4) was observed between rs4919743: G>A and rs4741304: G>T, which, interestingly, was also top ranked in the univariate analysis (Table 2). The interaction between these 2 SNPs was also nominally significant in the CGEMS population (Pinteraction = 0.05) but not in the JHH population (Pinteraction = 0.37). In the combined population, the interaction between these 2 SNPs was significant (P = 0.002; Bonferroni-corrected P = 0.042; Table 4). No interaction effect was confirmed for other pairs in either the CGEMS or the JHH population (data not shown).
Discussion
Through a genome-wide survey of SNPs located in AR-binding genomic regions detected by ChIP-on-chip, we identified a novel genetic locus at 12q13.13 (rs4919743) that was strongly associated with PCa risk (P = 4.50 × 10−7) and remained significant after correction for multiple comparisons. The minor “A” allele of this SNP consistently conferred higher PCa risk across 3 independent study populations, with a combined OR of 1.22 (95% CI: 1.13–1.32).
The SNP rs4919743 is located within the keratin gene cluster at 12q13.13, which encodes ∼30 keratins, the cytoskeletal proteins that make up intermediate filaments (IF). This SNP lies in a 60-kb LD block spanning the entire KRT8 gene region (MIM# 148060; Supplementary Fig. S1). KRT8 encodes a keratin protein (K8) that typically dimerizes with K18 (encoded by KRT18; MIM# 148070) to form IFs in simple, single-layered epithelial cells. Because its expression in epithelial cells is specific to cell type and differentiation/functional status, K8 has long been used as a diagnostic marker for a variety of epithelial malignancies, including PCa (reviewed in ref. 32). A functional role of KRT8 in epithelial tumorigenesis is suggested by findings that K8 deficiency and overexpression can induce colorectal hyperplasia (33) and pancreatic neoplasia (34), respectively, and that altered K8 expression or phosphorylation may be causally related to the invasion and metastasis of several cancer types (35–37). Given these findings, it is possible that rs4919743, or another SNP in LD with it, alters the structure, expression, or regulation of K8 in a way that confers susceptibility to PCa. Notably, 2 nonsynonymous SNPs (rs11170164 and rs641615) within KRT5 (MIM# 148040), which encodes another keratin (K5) expressed in basal epithelial cells, were associated with predisposition to basal cell carcinoma in a recent GWAS (38). Together with that report, our findings point to potentially important roles for keratin variants in cancer etiology. Although KRT8 expression has been reported to correlate with increased cancer invasiveness, we found no association between rs4919743 and PCa aggressiveness (Supplementary Table S2).
Beyond this hypothesis, the location of rs4919743 within a ChIP-on-chip-identified AR-binding region suggests that AR signaling may be involved in the risk-conferring mechanism. Recent work has shown that many of the sites through which AR regulates androgen-responsive genes lie outside classic promoter regions, predominantly in distal enhancers (39). It is thus possible that rs4919743, or other SNPs in the AR-binding region identified by Wang and colleagues (17) (chromosome 12: 51,595,612–51,596,236 bp, NCBI Build 36.1), affects AR binding or signaling at this region, which in turn could influence the expression of AR target genes that are causally related to PCa risk but are not necessarily close to the regulatory element. Even under this alternative hypothesis, the causal effect could still be mediated by altered KRT8 expression, as suggested by a report that KRT8 expression in prostate epithelia might be repressed by AR (40). Additional studies are warranted to replicate this association and to investigate the precise molecular mechanism conferring PCa susceptibility.
Statistical evidence of epistasis between rs4919743 and rs4741304 at 9p23 was observed in 2 of the study populations we examined. Because no genes are located within ∼200 kb upstream or downstream of rs4741304, it is difficult to infer the biological mechanism. Given that both SNPs lie in AR-binding regions, the yet-to-be-identified genes regulated by these 2 putative AR enhancers may be involved in a shared biological process. Alternatively, the 2 loci may interact physically, possibly mediated by AR and its transcriptional regulators. Further studies are required to elucidate the underlying mechanism.
This study illustrates the potential of prior biological knowledge to facilitate the search for novel disease-associated genetic variants. Because more than 1 million SNP markers are tested genome-wide, current GWAS usually require stringent multiple-testing adjustment to control for false-positive associations. Such a harsh penalty means that only the most significant associations are established, while most plausible associations remain buried within the statistical “noise” inherent to this approach (41). Here, by incorporating experimentally determined AR-binding information into currently available PCa GWAS data, we reduced the multiple-testing burden and detected a novel PCa risk-associated locus that had not previously been detected and was confirmed in an independent study population. Although this association (P = 4.50 × 10−7) did not reach the most conservative genome-wide significance level (P < 5 × 10−8), it was statistically significant given the much smaller number of SNPs located in AR-binding regions (13,899) tested in this study (Bonferroni-corrected P = 0.006). Notably, the widely applied “pathway analysis,” which aims to identify disease-associated genetic loci more effectively by leveraging available GWAS data, is based on the same concept. However, whereas pathway analysis uses previously curated pathway knowledge (KEGG, NCI, BioCarta, etc.) and focuses on protein-coding genes, the approach used in this study is based on experimentally obtained knowledge of relevant genomic regions. This approach not only widens the scope of pathway analysis by extending it to noncoding genomic regions, but is also more reliable because it is based on carefully controlled experiments, which has the advantage of avoiding the selection biases of pathway analysis.
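The arithmetic behind this argument, assuming a family-wise α of 0.05 (not stated explicitly in the text) and the conventional view that 5 × 10−8 corresponds to a Bonferroni correction for about 1 million independent tests:

```latex
\begin{aligned}
\alpha_{\mathrm{GW}} &= \frac{0.05}{10^{6}} = 5 \times 10^{-8},\\
\alpha_{\mathrm{AR}} &= \frac{0.05}{13{,}899} \approx 3.6 \times 10^{-6},\\
\alpha_{\mathrm{GW}} &< P_{\mathrm{rs4919743}} = 4.50 \times 10^{-7} < \alpha_{\mathrm{AR}},\\
P^{\mathrm{Bonf}}_{\mathrm{rs4919743}} &= 13{,}899 \times 4.50 \times 10^{-7} \approx 0.006 .
\end{aligned}
```

Restricting testing to SNPs in AR-binding regions thus raises the per-test threshold by roughly 70-fold, which is what allows an association of this size to be declared significant.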
We expect this approach to uncover additional associations that remain hidden in existing GWAS data.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant Support
This study was partially supported by the National Cancer Institute (R01CA129684 to J. Xu).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.