Abstract
Background: Only a minority of the genetic components of prostate cancer risk have been explained. Some observed associations of SNPs with prostate cancer might arise from associations of these SNPs with circulating prostate-specific antigen (PSA) because PSA values are used to select controls.
Methods: We undertook a genome-wide association study (GWAS) of screen-detected prostate cancer (ProtecT: 1,146 cases and 1,804 controls); meta-analyzed the results with those from the previously published UK Genetic Prostate Cancer Study (1,854 cases and 1,437 controls); investigated associations of SNPs with prostate cancer using either “low” (PSA < 0.5 ng/mL) or “high” (PSA ≥ 3 ng/mL, biopsy negative) PSA controls; and investigated associations of SNPs with PSA.
Results: The ProtecT GWAS confirmed previously reported associations of prostate cancer at three loci: 10q11.23, 17q24.3, and 19q13.33. The meta-analysis confirmed associations of prostate cancer with SNPs near four previously identified loci (8q24.21, 10q11.23, 17q24.3, and 19q13.33). When comparing prostate cancer cases with low PSA controls, alleles at genetic markers rs1512268, rs445114, rs10788160, rs11199874, rs17632542, rs266849, and rs2735839 were associated with an increased risk of prostate cancer, but the effect-estimates were attenuated to the null when using high PSA controls (Pheterogeneity in effect-estimates < 0.04). We found a novel inverse association of rs9311171-T with circulating PSA.
Conclusions: Differences in effect-estimates for prostate cancer observed when comparing low versus high PSA controls may be explained by associations of these SNPs with PSA.
Impact: These findings highlight the need for inferences from genetic studies of prostate cancer risk to carefully consider the influence of control selection criteria. Cancer Epidemiol Biomarkers Prev; 23(7); 1356–65. ©2014 AACR.
Introduction
Prostate cancer is the most commonly diagnosed cancer in men in developed countries, accounting for 25% of new cancer cases. Prostate-specific antigen (PSA) testing is currently the most widely used screening test for prostate cancer, but it is limited because current thresholds defining raised levels (typically circulating PSA levels >3 or 4 ng/mL) do not distinguish clinically important from indolent cancer and lower levels do not exclude prostate cancer (1).
Epidemiologic studies suggest a significant genetic component in the etiology of the disease, but only 30% of the estimated heritability has so far been explained (2), highlighting the need for more studies to identify the full genetic profile of prostate cancer. The majority of genome-wide association studies (GWAS) of prostate cancer have been based on clinically detected cases (3–11), with some studies specifically focusing on young age at onset and/or familial or aggressive prostate cancer (i.e., pathologic stage T3/T4, N+, M+, Gleason score ≥ 7, grade III, metastases, androgen ablation therapy, or PSA at least 10–50 ng/mL; refs. 4–6, 8–10, 12). Most studies have compared cases to controls that have been selected on the basis of PSA levels under a certain threshold (5, 6, 10, 12–17), including some studies that have used controls with extremely low PSA levels (<0.5 ng/mL; “supernormal” controls; refs. 3, 5, 6, 12, 15). Such control selection strategies can make it difficult to interpret associations between a genetic variant and disease, as such associations may result from a relationship between the variant and PSA levels rather than prostate cancer. A limited number of GWAS have attempted to clarify this relationship by analyzing associations of PSA levels with genetic markers identified using PSA screened controls (2, 5, 6, 15, 18, 19). It has been shown that SNPs at 13 known prostate cancer susceptibility loci (rs6869841, rs1270884, rs17632542, rs2242652, rs6983561, rs620861, rs10090154, rs7837688, rs12500426, rs7127900, rs10993994, rs2659056, rs2735839, rs5945619) are also associated with PSA concentration in blood in controls (2, 5, 6, 15, 19). As controls were defined by a PSA cutoff, however, associations between genotypes and the full distribution of PSA in men without prostate cancer could not be assessed. 
Only one study specifically investigated the relationship between genetic variants and circulating PSA level in men without detected prostate cancer (19). The Prostate Testing for cancer and Treatment study (ProtecT) contributed replication data to this study for the significant SNPs identified in the discovery GWAS. Genome-wide significant associations between PSA levels and markers at 6 different loci were reported (19).
Our investigation had 3 aims. First, to conduct a GWAS in a population-based screen detected prostate cancer population (ProtecT; ref. 20). Second, to conduct a meta-analysis to combine the results of the ProtecT GWAS with the previously published UK Genetic Prostate Cancer Study (UKGPCS; ref. 6). Third, to investigate whether previously identified associations of genetic variants with prostate cancer could be explained by associations with PSA levels, rather than cancer per se. For the third aim, (i) we identified previously published prostate cancer risk SNPs and compared their association with prostate cancer in 2 case–control studies nested within ProtecT; the first compared prostate cancer cases (diagnosed following a PSA ≥ 3 ng/mL and a positive biopsy) with “low” PSA controls (PSA < 0.5 ng/mL) and the second compared prostate cancer cases with “high” PSA controls (PSA ≥ 3 ng/mL and a negative biopsy); our hypothesis was that if the genetic marker is associated with PSA level and not prostate cancer, then the effect-estimates would be greatest when using the low PSA controls and close to the null when using the high PSA controls; and (ii) we examined the relationship of each of the identified genetic markers with PSA level as a continuous variable in controls selected independently of PSA level (“unrestricted” controls), including those with PSA levels ≥ 3 ng/mL who were biopsy negative, as well as those with levels below 3 ng/mL.
Materials and Methods
Samples
Two study populations were used in this analysis. First, participants from the ProtecT study (20), and second the UK Genetic Prostate Cancer Study stage I (UKGPCS), both of European ancestry (6). The ProtecT study recruited all men aged 50 to 69 years between 2001 and 2009 from 337 general practices in 9 UK centers (Birmingham, Bristol, Cambridge, Cardiff, Edinburgh, Leeds, Leicester, Newcastle, and Sheffield). All men were invited to a prostate check clinic and were offered a PSA test: approximately 110,000 attended. For those with a PSA level ≥3 ng/mL, a transrectal ultrasound-guided prostate biopsy was conducted. All detected tumors were histologically confirmed and assigned a Gleason score by a specialist uropathologist following a standard proforma. Tumors were categorized as low (score < 7), mid (score = 7), or high (score > 7) grade. Clinical staging was assigned using the tumor–node–metastasis (TNM) system (21) as either localized (T1–T2) or advanced (T3–T4; although most in the latter category were locally advanced and few tumors had metastasized distally). Approximately 3,000 men were diagnosed with prostate cancer between 2001 and 2009. All men diagnosed with prostate cancer after attending a prostate check clinic before end November 2006 (and their matched controls) were eligible for inclusion in the current ProtecT GWAS (n = 1,215 cases).
Men with no evidence of prostate cancer were eligible for selection as controls (approximate n = 107,000); these were men with a PSA < 3 ng/mL or a PSA ≥ 3 ng/mL with the most recent biopsy being negative. Controls were stratum matched to the genotyped prostate cancer cases by age, GP practice, and calendar time period, in 2 distinct rounds of matching. In the first round, a random sample of 6 controls per case were selected from all eligible controls in each age and GP practice strata (“unrestricted” controls); as the prostate check clinics were completed in one GP practice before moving to the next, matching on GP practice also matched on calendar time period. The second round randomly selected 1 control with a PSA < 0.5 ng/mL (and with whole blood collected) per case from the same age and GP practice strata as the index case (“supernormal controls”). A total of 1,925 matched controls were eligible for inclusion in the current GWAS. The Trent Multicenter Research Ethics Committee (MREC) approved the ProtecT study (MREC/01/4/025) and the associated ProMPT study that collected biologic material (MREC/01/4/061), and written informed consent was obtained.
The UKGPCS (stage I) dataset has been previously described (6). Briefly, the cases (n = 2,017) were clinically detected and selected if the man had a diagnosis at ≤60 years of age or a first-/second-degree family history of prostate cancer. Self-reported “non-white” men were excluded, as were those who were diagnosed through asymptomatic screening. UKGPCS controls (total n = 1,893) were solely selected from men in the ProtecT study who had a PSA < 0.5 ng/mL. In the current meta-analysis, we excluded from the UKGPCS results those men who were in the ProtecT GWAS control series described above (n = 456) so that the populations in the pooled analysis were independent samples (Fig. 1).
Venn Diagram showing the inclusion of controls in the ProtecT and UKGPCS GWAS. *, controls overlapping between control populations were excluded from the UKGPCS GWAS.
Genotyping in ProtecT
The genotyping was performed at the Centre National de Génotypage, Evry, France using the Illumina Human660W-Quad_v1_A array (Illumina, Inc.). Quality control measures included exclusions for sex-mismatches, minimal (<0.325) or excessive heterozygosity (>0.345), cryptic relatedness as estimated by proportion of loci identical by descent (IBD > 0.1), and disproportionate levels of individual missingness (>3%). Individuals were then checked for evidence of population stratification by multidimensional scaling analysis and compared with HapMap II (release 22) European descent (CEU), Han Chinese (CHB), Japanese (JPT), and Yoruba (YRI) reference populations. Individuals showing evidence of non-European ancestry were removed. SNPs with a minor allele frequency of below 1%, a call rate of <95%, or evidence for violation of Hardy–Weinberg equilibrium (P < 5 × 10−7) were discarded.
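The exclusion rules above can be sketched as simple filters. This is a minimal illustration only: the thresholds come from the text, but the `SampleQC` record, its field names, and both function names are hypothetical, and the actual pipeline used for ProtecT is not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Per-individual QC summary (hypothetical record layout)."""
    sample_id: str
    sex_mismatch: bool      # reported sex disagrees with genotype-inferred sex
    heterozygosity: float   # proportion of heterozygous genotype calls
    max_ibd: float          # highest IBD proportion with any other sample
    missingness: float      # proportion of missing genotype calls


def passes_sample_qc(s: SampleQC) -> bool:
    """Apply the individual-level exclusions listed in the text."""
    if s.sex_mismatch:
        return False
    if s.heterozygosity < 0.325 or s.heterozygosity > 0.345:
        return False
    if s.max_ibd > 0.1:          # cryptic relatedness
        return False
    if s.missingness > 0.03:     # more than 3% missing genotypes
        return False
    return True


def passes_snp_qc(maf: float, call_rate: float, hwe_p: float) -> bool:
    """Apply the SNP-level exclusions listed in the text."""
    return maf >= 0.01 and call_rate >= 0.95 and hwe_p >= 5e-7
```

Ancestry checks by multidimensional scaling are not covered by this sketch, because they depend on the reference panel comparison rather than on a fixed threshold.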
Imputation of autosomal genotypic data used Markov Chain Haplotyping software (MACH v.1.0.16; ref. 22) and phased haplotype data from CEU individuals (HapMap release 22, Phase II NCBI B36, dbSNP 126) on 514,432 autosomal SNPs. All SNPs with poor imputation quality (r2 hat < 0.3) were removed. The final working dataset included 2,950 individuals (cases, n = 1,146; controls, n = 1,804). A total of 1,272 controls were included from the first round of “unrestricted” control selection (PSA < 0.5 ng/mL, n = 238; PSA ≥ 0.5 and PSA < 3 ng/mL, n = 941; PSA ≥ 3 ng/mL, n = 93) and a further 532 from the second round (“supernormal” control selection).
UKGPCS genotyping
This has been reported elsewhere (6). Briefly, genotypes were generated using the Illumina Infinium HumanHap550 array, and we only used genotypes with a call rate of >97%. Related samples and those with Asian/African ancestry were excluded. After exclusions, 1,854 cases and 1,437 controls (independent of ProtecT—see Fig. 1) were used in the current meta-analysis.
Statistical analysis
The statistical analysis was undertaken in 4 stages (see Fig. 2). In the first stage, a genome-wide association analysis was performed on the ProtecT samples using the software package MACH2DAT (22, 23), using a logistic regression model based on an expected allelic dosage model for SNPs. Associations between each SNP and disease were assessed by a Wald test and P < 5 × 10−8 was considered genome-wide significant. All estimates are reported in the direction of the risk allele.
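As an illustration of the first stage, the sketch below computes a Wald test and an odds ratio from a per-allele log-odds coefficient and its standard error. It is not the MACH2DAT implementation; the function names are hypothetical, and the P value uses the standard normal approximation.

```python
import math


def wald_test(beta: float, se: float) -> tuple[float, float]:
    """Return the Wald z statistic and its two-sided normal P value."""
    z = beta / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def odds_ratio(beta: float) -> float:
    """Convert a log-odds coefficient to an odds ratio."""
    return math.exp(beta)


# Rounded values reported for rs10993994 in the ProtecT GWAS.
z, p = wald_test(0.33, 0.06)
print(f"z = {z:.2f}, P = {p:.2e}, OR = {odds_ratio(0.33):.2f}")
```

Because the published β and SE are rounded to two decimal places, a P value recomputed this way will differ from the published one, although the odds ratio is reproduced.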
In the second stage, we meta-analyzed the top SNPs identified in ProtecT with the UKGPCS dataset. A total of 381 SNPs reaching a threshold for suggestive significance of 5 × 10−5 in the ProtecT GWAS were included in the meta-analysis. The threshold level of 5 × 10−5 was a pragmatic choice based on similar thresholds for suggestive significance cited in the literature, because of the relatively small sample size in ProtecT. Using a fixed-effect inverse variance model, the β-coefficients and SEs from ProtecT and UKGPCS were combined using the metan command in STATA. A P value of 5 × 10−8 was used to identify genome-wide significant SNPs in the meta-analysis. Heterogeneity between the ProtecT and UKGPCS studies was evaluated using Cochran's Q statistic and the I2 value (24). The 2 studies had the same strand orientation and used the same effect alleles. Associations showing moderate-to-high levels of heterogeneity (I2 > 50%) were investigated further to determine whether the heterogeneity could be explained by the difference in control or case populations used in ProtecT and UKGPCS. We did this in 2 stages: first, we used only the low PSA (<0.5 ng/mL) controls in ProtecT to generate new effect-estimates and re-ran the meta-analysis for those SNPs showing moderate-to-high levels of heterogeneity in the original analysis; second, for those SNPs that still showed moderate-to-high levels of heterogeneity, we restricted the ProtecT cases to include only young cases (≤60 years of age) and/or those with a family history and re-ran the meta-analysis.
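The fixed-effect inverse variance model, Cochran's Q, and I2 described above can be sketched in a few lines. This is a generic textbook implementation, not the STATA metan code used in the study, and the function name is hypothetical.

```python
import math


def fixed_effect_meta(betas: list[float], ses: list[float]):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I2 (%)."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2: share of variation beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Rounded ProtecT and UKGPCS estimates reported for rs1447295.
pooled, pooled_se, q, i2 = fixed_effect_meta([0.35, 0.67], [0.08, 0.08])
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"beta = {pooled:.2f} ({low:.2f} to {high:.2f}), Q = {q:.1f}, I2 = {i2:.1f}%")
```

With these rounded inputs the pooled β matches the reported 0.51, while I2 comes out slightly above the reported 86.1%, as expected when the inputs carry only two decimal places.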
In stage 3, we investigated how the choice of controls in case–control GWAS studies may have impacted on associations of SNPs with prostate cancer, using 2 control populations from the ProtecT study: 770 low PSA controls (PSA < 0.5 ng/mL, from both the “unrestricted” control selection, n = 238, and “supernormal” control selection, n = 532); and 93 high PSA controls (PSA ≥ 3 ng/mL). We also did a sensitivity analysis relaxing the high PSA control threshold to a PSA level ≥ 2 ng/mL, resulting in 250 high PSA controls. We did this to increase power and move our high control population toward a more general population. We generated a list of SNPs to be tested by downloading all published SNPs associated with both prostate cancer risk and PSA level from the catalogue of Published Genome-wide Association Studies (http://www.genome.gov/gwastudies/: downloaded 21/11/2012). We supplemented this list with additional SNPs associated with prostate cancer in a recent meta-analysis (2), resulting in a total of 83 SNPs. Using this list, we extracted the individual genotypic data for all participants in our ProtecT dataset, although 2 SNPs were neither genotyped nor imputed (rs7210100 and rs16902094). Using the genotypic data, we generated effect-estimates for each SNP using multinomial logistic regression, where each control group (low and high) was compared with cases, separately (Fig. 2). To test the null hypothesis, that there was no difference between the 2 control groups when used to estimate the ORs for the associations of SNPs with prostate cancer, we carried out a separate logistic regression model that tested the association of each SNP with control type (low vs. high); this is the same as testing whether the ratio of ORs using the 2 control groups were equivalent (i.e., =1), as the base case group would be cancelled out (Fig. 2).
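The equivalence used in the heterogeneity test, that comparing the two control groups directly is the same as testing whether the ratio of the two case–control ORs equals 1, can be checked numerically. The sketch below uses allele counts rather than the dosage-based logistic regression used in the study, and all counts are hypothetical values chosen only for illustration.

```python
def allelic_or(risk_a: int, other_a: int, risk_b: int, other_b: int) -> float:
    """Odds ratio of carrying the risk allele in group A versus group B."""
    return (risk_a / other_a) / (risk_b / other_b)


# Hypothetical (risk allele, other allele) counts for each group.
cases = (1300, 992)
low_psa_controls = (800, 740)
high_psa_controls = (110, 76)

or_vs_low = allelic_or(*cases, *low_psa_controls)
or_vs_high = allelic_or(*cases, *high_psa_controls)

# The case odds cancel in the ratio, leaving the high-versus-low control OR.
ratio_of_ors = or_vs_low / or_vs_high
or_high_vs_low = allelic_or(*high_psa_controls, *low_psa_controls)

print(f"ratio of ORs = {ratio_of_ors:.4f}, high vs. low OR = {or_high_vs_low:.4f}")
```

Because the cases are shared by both comparisons, the two ORs are not independent, which is why the test is framed as a single regression of control type on genotype rather than a comparison of two separately estimated ORs.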
Finally, we conducted a regression analysis of SNPs on PSA level within the “unrestricted” control population (controls selected independent of PSA level). PSA values were log-transformed to approximate a normal distribution. To account for the multiple comparisons between PSA levels and SNP markers, we only highlight findings that showed evidence of an association in both the stratified analysis and a complementary association in the regression analysis.
The anti-log of β was taken to estimate the percentage change in PSA level by increasing allele dose. To incorporate the uncertainty of the imputation process, expected allelic dosage data were used for the analysis. Genotypic dosages are the expected number of copies of an allele and range from 0 to 2. For genotyped SNPs, the exact number of copies of an allele is known, but for imputed SNPs, the estimate may be less precise.
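The two quantities in this paragraph can be written out as follows. The sketch assumes that PSA was transformed with the natural logarithm, which is consistent with the reported pairs of β and percentage change but is not stated explicitly; the function names are hypothetical.

```python
import math


def percent_change_per_allele(beta: float) -> float:
    """Percentage change in PSA per additional allele, from a log-scale beta."""
    return (math.exp(beta) - 1.0) * 100.0


def expected_dosage(p_heterozygous: float, p_homozygous: float) -> float:
    """Expected allele count (0 to 2) from imputed genotype probabilities."""
    return p_heterozygous + 2.0 * p_homozygous


# Betas reported for rs17632542-T, rs2735839-G, and rs9311171-T.
for beta in (0.28, 0.21, -0.09):
    print(f"beta = {beta:+.2f} -> {percent_change_per_allele(beta):+.1f}% PSA per allele")
```

For a directly genotyped SNP the genotype probabilities are 0 or 1, so the dosage reduces to the observed allele count.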
Results
GWAS in ProtecT (stage 1)
The study characteristics of those individuals (cases: n = 1,146, controls: n = 1,804) included in the ProtecT GWAS are presented in Supplementary Table S1. Supplementary Figure S1 shows the quantile–quantile (Q–Q) plot of the distribution of test statistics for comparison of genotype frequencies in cases versus controls. The lambda inflation factor was 1.025. In GWAS, 381 SNPs were associated with prostate cancer at P < 5 × 10−5 (Supplementary Fig. S2 and Supplementary Table S2). We detected a genome-wide significant association (P < 5 × 10−8) between prostate cancer and SNPs at 3 previously identified loci: 10q11.23 (rs10993994), 17q24.3 (rs7222314), and 19q13.33 (rs1058205; Table 1 and locus zoom plots in Supplementary Figs. S3–S5).
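For readers unfamiliar with the lambda inflation factor, a common definition is the median of the observed 1-degree-of-freedom chi-square test statistics divided by the median expected under the null. The sketch below uses that definition; the text does not state which estimator was used, and the function name is hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats: list[float]) -> float:
    """Genomic inflation factor: observed median chi-square over the null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def wald_chi2(beta: float, se: float) -> float:
    """1-df chi-square statistic corresponding to a Wald z test."""
    return (beta / se) ** 2
```

A value close to 1, such as the 1.025 reported here, suggests little systematic inflation of test statistics from population stratification or other sources of confounding.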
Results of independent SNPs reaching genome-wide significance (P < 5 × 10−8) in the ProtecT GWAS
| Locus | SNP | Chromosome | Position | Alleles | βᵃ | SE | ORᵇ | P | EAF | RSQRᶜ |
|---|---|---|---|---|---|---|---|---|---|---|
| 10q11.23 | rs10993994 | 10 | 51219502 | T/C | 0.33 | 0.06 | 1.39 | 1.58 × 10−09 | 0.5983 | 0.9793 |
| 17q24.3 | rs7222314 | 17 | 66616533 | A/G | 0.32 | 0.06 | 1.38 | 8.59 × 10−09 | 0.4772 | 0.9295 |
| 19q13.33 | rs1058205 | 19 | 56055210 | T/C | 0.40 | 0.07 | 1.49 | 4.63 × 10−08 | 0.1828 | 1 |
Abbreviation: EAF, effect allele frequency.
ᵃ Regression coefficient (log OR per allele).
ᵇ Effect-estimate (OR).
ᶜ Imputation accuracy.
Meta-analysis of ProtecT and UKGPCS studies (stage 2)
All SNPs at P < 5 × 10−5 in ProtecT were meta-analyzed with results from the UKGPCS. The total number of individuals included was 6,241 (cases: ProtecT n = 1,146, UKGPCS n = 1,854; controls: ProtecT n = 1,804, UKGPCS n = 1,437). A total of 175 SNPs reached genome-wide significance (P < 5 × 10−8) for an association with prostate cancer in the meta-analysis and were all located in or near genes across 4 previously identified loci (Table 2), with 6 independent signals. There was evidence of significant heterogeneity between the studies for the rs1447295 (8q24.21) marker (I2 = 86.1%, Pheterogeneity = 0.01). There was also evidence of moderate-to-high levels of heterogeneity for rs6983267, rs10993994, and rs17632542. The heterogeneity of rs6983267, rs10993994, and rs17632542 was greatly attenuated when the analysis was repeated using the ProtecT effect-estimates from a restricted control population (low PSA controls < 0.5 ng/mL; Supplementary Table S3). The heterogeneity observed for rs1447295, however, remained high (I2 = 83.1%, Pheterogeneity = 0.01). When the analysis was repeated using ProtecT effect-estimates based on young and/or familial cases and low PSA controls, the heterogeneity was only slightly reduced and remained statistically significant (I2 = 78%, Pheterogeneity = 0.03).
Summary results of top independent SNPs in known loci associated with prostate cancer identified by meta-analysis of ProtecT with UKGPCS
| SNP | Chromosome | Position | Gene | Effect allele | ProtecT β | ProtecT SE | ProtecT EAF | ProtecT P | UKGPCS β | UKGPCS SE | UKGPCS EAF | UKGPCS P | Combined β (95% CI) | Combined P | I2ᵃ | PHetᵇ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs12682344 | 8 | 128175966 | SRRM1P1 - POU5F1B | G | 0.57 | 0.13 | 0.04 | 1.28 × 10−05 | 0.79 | 0.14 | 0.04 | 9.66 × 10−09 | 0.67 (0.48–0.86) | 4.69 × 10−12 | 21.7% | 0.26 |
| rs6983267 | 8 | 128482487 | SRRM1P1 - POU5F1B | G | 0.22 | 0.05 | 0.52 | 4.25 × 10−05 | 0.36 | 0.05 | 0.53 | 2.30 × 10−12 | 0.29 (0.22–0.36) | 4.04 × 10−15 | 72.0% | 0.06 |
| rs1447295 | 8 | 128554220 | POU5F1B - MYC | A | 0.35 | 0.08 | 0.11 | 2.36 × 10−05 | 0.67 | 0.08 | 0.12 | 1.20 × 10−16 | 0.51 (0.40–0.63) | 5.61 × 10−18 | 86.1% | 0.01 |
| rs10993994 | 10 | 51219502 | MSMB | T | 0.33 | 0.06 | 0.40 | 1.58 × 10−09 | 0.46 | 0.05 | 0.40 | 2.15 × 10−19 | 0.40 (0.33–0.47) | 3.48 × 10−26 | 65.8% | 0.09 |
| rs4793529 | 17 | 66630231 | CALM2P1 - SOX9 | T | 0.33 | 0.06 | 0.47 | 1.88 × 10−09 | 0.23 | 0.05 | 0.49 | 8.11 × 10−06 | 0.28 (0.20–0.35) | 1.78 × 10−13 | 49.8% | 0.16 |
| rs17632542 | 19 | 56053569 | KLK3 | T | 0.60 | 0.11 | 0.91 | 1.28 × 10−08 | 0.82 | 0.08 | 0.89 | 4.74 × 10−23 | 0.73 (0.60–0.86) | 2.34 × 10−28 | 62.7% | 0.10 |
Abbreviation: EAF, effect allele frequency.
ᵃ Percentage of variation between study-specific effect-estimates that is due to heterogeneity.
ᵇ Tests the hypothesis that there is no difference in the study-specific effect-estimates.
Stratified analysis using different control populations in ProtecT and PSA regressed in the unrestricted controls (stages 3 and 4)
Of the 81 SNPs examined, SNPs showing evidence of heterogeneity (P < 0.05) in effect-estimates quantifying the difference in risk between cases and low or high PSA controls and/or SNPs that showed at least nominal association with PSA levels in unrestricted controls (P < 0.05) are shown in Table 3. When comparing cases with low PSA controls, alleles at the genetic markers rs1512268 (8p21.2), rs445114 (8q24.21), rs10788160 (10q26.12), rs11199874 (10q26.12), rs17632542 (19q13.33), rs266849 (19q13.33), and rs2735839 (19q13.33) were strongly associated with an increased risk of prostate cancer (all P ≤ 0.01), with ORs ranging from 1.24 to 2.73. The effect-estimates, however, were attenuated to the null when using high PSA controls (Pheterogeneity between estimates all <0.04). There was also some evidence of associations of these SNPs with PSA levels in the “unrestricted” control population [in particular, rs17632542 (19q13.33) was associated with a 32% increase in PSA per allele], which suggests that the difference observed in the stratified analysis may be explained by an association of these SNPs with PSA level. Of these 7 SNPs, only rs445114, rs17632542, rs2735839, and rs266849 showed statistically significant associations with PSA levels (P ≤ 0.05) in the “unrestricted” control population. A further SNP (rs3850699) showed a similar pattern, with the effect-estimate attenuating toward the null in the high PSA controls, but in the “unrestricted” control population, the allele only increased PSA levels by 1% (P = 0.73).
Comparison of cases with supernormal or high PSA controls and PSA regressed on SNPs in unrestricted controls
| Chromosome | SNP | Putative gene | Risk allele | Cases vs. low PSA controls (n = 770): OR (SE) | Cases vs. low PSA controls: P | Cases vs. high PSA controls (n = 93): OR (SE) | Cases vs. high PSA controls: P | Phetᵃ | PSA regressed on SNP in unrestricted controls (n = 1,272): β | % PSA change | P |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | rs9311171 | CTDSPL | T | 1.07 (0.10) | 0.49 | 1.79 (0.34) | 0.002 | 0.01 | −0.09 | −9% | 0.04 |
| 3 | rs7611694 | SIDT1 | A | 1.09 (0.63) | 0.19 | 0.87 (0.19) | 0.40 | 0.17 | 0.07 | 8% | 0.02 |
| 5 | rs6869841 | FAM44B (BOD1) | T | 1.35 (0.11) | 3.44 × 10−4 | 1.22 (0.24) | 0.29 | 0.63 | 0.08 | 9% | 0.03 |
| 6 | rs1983891 | FOXP4 | T | 1.10 (0.08) | 0.22 | 0.96 (0.16) | 0.80 | 0.43 | 0.08 | 8% | 0.03 |
| 8 | rs1512268 | SLC25A37 - NKX3-1 | T | 1.30 (0.09) | 9.51 × 10−5 | 0.83 (0.13) | 0.24 | 0.004 | 0.06 | 6% | 0.07 |
| 8 | rs445114 | SRRM1P1 - POU5F1B | T | 1.46 (0.10) | 8.15 × 10−8 | 0.92 (0.16) | 0.60 | 0.01 | 0.08 | 8% | 0.03 |
| 8 | rs6983267 | SRRM1P1 - POU5F1B | G | 1.29 (0.05) | 8.51 × 10−5 | 1.15 (0.13) | 0.36 | 0.44 | 0.08 | 9% | 0.01 |
| 10 | rs11199874 | RPL19P16 - FGFR2 | A | 1.24 (0.06) | 0.01 | 0.78 (0.21) | 0.14 | 0.01 | 0.07 | 7% | 0.07 |
| 10 | rs10788160 | RPL19P16 - FGFR2 | A | 1.24 (0.06) | 0.01 | 0.80 (0.21) | 0.18 | 0.01 | 0.06 | 6% | 0.08 |
| 10 | rs3850699 | TRIM8 | A | 1.17 (0.06) | 0.03 | 0.82 (0.22) | 0.28 | 0.048 | 0.01 | 1% | 0.73 |
| 10 | rs10993994 | MSMB | T | 1.61 (0.11) | 4.54 × 10−12 | 1.45 (0.23) | 0.02 | 0.51 | 0.08 | 8% | 0.02 |
| 13 | rs9600079 | FABP5L1 - KLF12 | G | 1.07 (0.06) | 0.33 | 0.99 (0.16) | 0.97 | 0.65 | 0.06 | 6% | 0.048 |
| 19 | rs17632542 | KLK3 | T | 2.73 (0.32) | 1.40 × 10−17 | 0.76 (0.27) | 0.44 | 1.94 × 10−4 | 0.28 | 32% | 5.49 × 10−7 |
| 19 | rs2735839 | KLK3 - KLK2 | G | 2.13 (0.20) | 1.90 × 10−16 | 1.29 (0.28) | 0.26 | 0.02 | 0.21 | 23% | 0.004 |
| 19 | rs266849 | KLK15 - KLK3 | A | 1.76 (0.05) | 8.62 × 10−12 | 1.15 (0.17) | 0.49 | 0.03 | 0.12 | 13% | 2.40 × 10−6 |
NOTE: SNPs showing evidence of heterogeneity (P < 0.05) in effect-estimates quantifying the difference in risk between cases and low or high PSA controls and/or SNPs that showed at least nominal association with PSA levels in unrestricted controls (P < 0.05).
ᵃ Tests the hypothesis that there is no difference between the effect-estimate of the cases versus low PSA controls and cases versus high PSA controls.
There was also evidence that a further 6 SNP alleles (rs7611694, rs6869841, rs1983891, rs6983267, rs10993994, and rs9600079) were associated with increasing circulating PSA level (P < 0.05) and that the use of high PSA controls attenuated effect-estimates of these SNPs with prostate cancer toward the null. The difference in estimates, however, was not statistically significant (all Pheterogeneity ≥ 0.17; Table 3). Relaxing the high PSA threshold to ≥2 ng/mL, we observed a statistically significant difference when comparing the estimates generated using low versus high PSA controls for rs6869841-T (P = 0.01) and rs10993994-T (P = 0.04; Supplementary Table S4).
The T-allele at 3p22.2 (rs9311171) was associated with a more pronounced increase in risk of prostate cancer when cases were compared with high PSA controls (79% increase) as opposed to low PSA controls (7% increase; Pheterogeneity = 0.01). There was also evidence of an association between the rs9311171-T allele and decreased PSA levels (P = 0.035). In the sensitivity analysis (high PSA controls ≥ 2 ng/mL), rs4775302 (15q21.1) was associated with a larger increase in risk of prostate cancer when cases were compared with the high PSA controls (29% increase), as opposed to the low PSA controls (2% increase; Pheterogeneity = 0.03; Supplementary Table S4). This was supported by the regression analysis of PSA in the “unrestricted” control population, which showed an association in the opposite direction (5% decrease; P = 0.09).
Discussion
Our GWAS and meta-analysis replicate previous findings for the following loci: 10q11.23 (rs10993994 - MSMB), 17q24.3 (rs7222314 - CALM2P1 - SOX9), and 19q13.33 (rs1058205 - KLK3) in the ProtecT GWAS; and 8q24.21 (rs12682344, rs6983267, and rs1447295 - intergenic), 10q11.23 (rs10993994 - MSMB), 17q24.3 (rs4793529 - CALM2P1 - SOX9), and 19q13.33 (rs17632542 - KLK3) in the meta-analysis. We also demonstrated a novel association of one SNP with circulating PSA level: rs9311171-T (inversely associated with PSA). The allele at the genetic marker rs9311171 is located near a microRNA (MIR26A1) and a protein linked to oncogenesis (CTDSPL). This SNP (rs9311171) had a larger positive association with prostate cancer when cases were compared with high PSA, as opposed to low PSA controls. This suggests that this marker is directly associated with both PSA level and prostate cancer risk.
We also found evidence that the associations of 7 other SNPs at loci 8p21.2, 8q24.21, 10q26.12, and 19q13.33 with prostate cancer risk could be confounded by the variant-PSA association (rs1512268-T, rs10788160-A, rs445114-T, rs11199874-A, rs17632542-T, rs2735839-G, and rs266849-A). The association between prostate cancer and the respective SNP allele was present in the comparison of cases with low PSA controls, but this association disappeared when cases were compared with high PSA controls. There was also evidence of associations between these SNPs and PSA levels in the “unrestricted” control population. For rs1512268-T, rs10788160-A, and rs11199874-A, the evidence was statistically significant (P < 0.05) only in the stratified analysis. Although the associations of these SNPs with PSA levels were not statistically significant in the regression analysis, the direction of the effect provides some support for the conclusion that these SNP alleles are associated with increased circulating PSA level. Eeles and colleagues (5) and Gudmundsson and colleagues (19) previously observed a positive association of PSA with rs1512268-T (near NKX3.1) and rs10788160-A (FGFR2), respectively. The allele at the genetic marker rs445114 (intergenic) was previously shown to have a positive association with PSA level, albeit non–genome-wide significant (P = 1.27 × 10−2; ref. 19). This is consistent with the findings of Al Olama and colleagues (18), who found that an SNP (rs620861) in strong linkage disequilibrium with rs445114 was positively associated with PSA level (P = 4.8 × 10−8). While rs11199874-A has not previously been tested for a relationship with PSA, it is in strong linkage disequilibrium with rs10788160-A (r2 = 0.94; FGFR2), which has been shown to be associated with increasing PSA levels. Our analysis suggests that rs10788160-A is associated with higher PSA levels, whereas rs10788160-G is associated with lower levels.
A previous study reported a positive association between rs11199874-G and prostate cancer (25). This association could be driven by the large proportion (>50%) of high PSA (4–10 ng/mL) controls used rather than reflecting a true association because this high PSA–enriched control sample is likely to have a higher proportion of individuals with the allele rs11199874-A; this could in turn account for the positive association of prostate cancer with rs11199874-G (the opposite allele to rs11199874-A). Two markers in strong linkage disequilibrium (rs17632542-T and rs2735839-G), located near the kallikrein-related peptidase protein-coding genes, have previously been observed to be positively associated with PSA level (19). SNP rs266849 is also located near, but is independent of, this set of PSA-associated proteins; a previous study reported no association of rs266849 with PSA in controls with a PSA < 10 ng/mL (6). When we reduced our threshold for defining high PSA controls (PSA ≥ 2 ng/mL) in the stratified analysis, we observed findings in the same direction as those that were significant using the higher PSA threshold (PSA ≥ 3 ng/mL). In addition, in this sensitivity analysis, 3 other SNPs were shown to be associated with PSA.
Impact of control selection
Our analysis suggests that the selection of controls for GWAS below a certain PSA threshold results in associations with prostate cancer, when the actual association is with PSA level (e.g., rs6869841). In addition, the selection of these controls can lead to the masking of a prostate cancer association (e.g., rs9311171). To avoid these difficulties, an optimal design for a GWAS would be to select controls who have PSA levels at or above the threshold for biopsy and who are biopsy negative. These criteria ensure that the cases and controls are comparable in terms of PSA level and therefore associations of SNP markers with prostate cancer are not confounded by associations with PSA.
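The selection mechanism described above can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the analysis performed in this study: the allele frequency, the per-allele effect on log PSA, the cancer risk, and the function name simulate_allele_frequencies are all hypothetical. The simulated SNP raises PSA but has no effect on cancer, and men are classified as cases only if their PSA reaches the biopsy threshold.

```python
import math
import random


def simulate_allele_frequencies(n=100_000, allele_freq=0.3, beta=0.3,
                                cancer_risk=0.2, seed=1):
    """Return the PSA-raising allele frequency in cases and in two control groups.

    All parameter values are hypothetical. The SNP affects PSA only;
    cancer status is drawn independently of genotype.
    """
    rng = random.Random(seed)
    # For each group: [number of PSA-raising alleles, number of men]
    groups = {"cases": [0, 0],
              "high_psa_controls": [0, 0],
              "low_psa_controls": [0, 0]}
    for _ in range(n):
        # Genotype: count of PSA-raising alleles (0, 1, or 2)
        g = int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
        # Log PSA rises with each allele (assumed additive effect plus noise)
        psa = math.exp(beta * g + rng.gauss(0.0, 1.0))
        has_cancer = rng.random() < cancer_risk  # independent of genotype
        if psa >= 3.0:
            # Biopsy threshold reached: positive biopsy = case,
            # negative biopsy = high PSA control
            key = "cases" if has_cancer else "high_psa_controls"
        elif psa < 0.5:
            # Not biopsied, so cancer status is unknown; all are low PSA controls
            key = "low_psa_controls"
        else:
            continue
        groups[key][0] += g
        groups[key][1] += 1
    return {k: alleles / (2 * men) for k, (alleles, men) in groups.items()}


freqs = simulate_allele_frequencies()
for group, freq in freqs.items():
    print(f"{group}: PSA-raising allele frequency = {freq:.3f}")
```

Under these assumptions, the allele is more frequent in cases than in low PSA controls even though it has no effect on cancer, whereas its frequency in cases and in high PSA, biopsy-negative controls is similar.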
Strengths and limitations
This study was based on a population sample of men who underwent PSA testing, providing findings relevant to screen-detected cancer, and controls that were not selected according to PSA level. We also conducted a meta-analysis using 6,241 individuals (3,000 cases and 3,241 controls), but we were only able to confirm previous GWAS findings rather than show novel associations. There was significant heterogeneity between the ProtecT and UKGPCS studies for associations of markers at 8q24.21, 10q11.23, and 19q13.33 with prostate cancer. These differences were partly explained by the difference in control ascertainment between studies (Supplementary Table S3). We were, however, unable to explain the substantial heterogeneity (I² = 78%) for rs1447295. We had a relatively small number of high PSA controls (n = 93), so our stratified analysis will have been underpowered, which may explain why we were unable to detect some previously published associations of SNPs with PSA. To increase power, we carried out a sensitivity analysis by relaxing the high PSA control threshold to ≥2 ng/mL (n = 250), which resulted in 3 additional SNPs being associated with PSA level (rs6869841, rs10993994, and rs4775302).
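For readers unfamiliar with the I² statistic, the sketch below shows how it is conventionally derived from Cochran's Q in a fixed-effect, inverse-variance meta-analysis of two studies. The effect estimates and standard errors are invented for illustration and are not the ProtecT or UKGPCS estimates; the function names are likewise hypothetical, and this may not match the software used in the study.

```python
def cochran_q(effects, standard_errors):
    """Return the inverse-variance pooled effect and Cochran's Q."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, effects)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    return pooled, q


def i_squared(q, df):
    """I² as a percentage: the share of variation in Q beyond chance."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical per-allele log odds ratios and standard errors for two studies
effects = [0.10, 0.40]
standard_errors = [0.10, 0.10]

pooled, q = cochran_q(effects, standard_errors)
i2 = i_squared(q, df=len(effects) - 1)
print(f"pooled log OR = {pooled:.2f}, Q = {q:.2f}, I2 = {i2:.1f}%")
```

With only two studies there is a single degree of freedom, so I² is large whenever the two estimates differ by much more than their standard errors.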
To avoid potential false-positive results arising from multiple comparisons, we only highlight findings that showed evidence of an association in both the stratified and regression analyses. A Bonferroni-corrected threshold significance level is P ≤ 0.0006 (0.05/81 tests). However, this correction assumes that the tests were independent of each other, which in this case may not be true because some SNPs were correlated. Hence, a rigid application of the Bonferroni-corrected P value could lead to an erroneous rejection of genuine loci (false-negatives). Nevertheless, we have presented exact P values, allowing comparison with the Bonferroni-corrected P values. While all controls with an elevated PSA level were biopsy negative, there is the possibility that some controls with a PSA level < 3 ng/mL had undetected prostate cancer, as evidenced by a study (1) showing that 14% of men with a PSA level ≤ 3 ng/mL had prostate cancer. The majority of controls in our study had a PSA level ≤ 1 ng/mL, which was associated with less than a 9% chance of undetected prostate cancer (1). It is also possible that up to a quarter of men with an elevated PSA level (≥3 ng/mL) and a negative biopsy had undetected prostate cancer (26), which could explain why we did not observe associations of SNPs with prostate cancer when using the high PSA control group.
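The Bonferroni threshold quoted above follows from dividing the nominal significance level by the number of tests. A minimal sketch of that arithmetic is given below; the function name bonferroni_threshold is illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Values taken from the text: nominal alpha of 0.05 and 81 tests
threshold = bonferroni_threshold(0.05, 81)
print(f"Bonferroni-corrected threshold: P <= {threshold:.4f}")
```

Because the correction treats all 81 tests as independent, it is conservative when SNPs are correlated, as they are here.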
Conclusion
We have confirmed associations of prostate cancer risk with 4 previously identified loci (8q24.21, 10q11.23, 17q24.3, and 19q13.33) and associations of 7 markers with PSA level. We also found new evidence of an association of genetic variation at 3p22.2 with circulating PSA levels. We have highlighted that the method of selecting controls in case–control studies of associations of genetic variation with prostate cancer can influence the results, suggesting that inferences made in these studies should carefully consider control selection.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: D.W. Knipe, J.P. Kemp, J.L. Donovan, G. Davey Smith, R.M. Martin
Development of methodology: J.P. Kemp, M. Lathrop
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): R. Eeles, D.F. Easton, Z. Kote-Jarai, S. Benlloch, J.L. Donovan, F.C. Hamdy, D.E. Neal
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D.W. Knipe, D.M. Evans, J.P. Kemp, D.F. Easton, A.A. Al Olama, F.C. Hamdy, M. Lathrop, R.M. Martin
Writing, review, and/or revision of the manuscript: D.W. Knipe, D.M. Evans, J.P. Kemp, R. Eeles, D.F. Easton, A.A. Al Olama, J.L. Donovan, F.C. Hamdy, G. Davey Smith, M. Lathrop, R.M. Martin
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): Z. Kote-Jarai, S. Benlloch
Study supervision: J.L. Donovan, R.M. Martin
Acknowledgments
The authors acknowledge the tremendous contribution of all members of the ProtecT study research group, and especially the following, who were involved in this research: Athene Lane, Prasad Bollina, Sue Bonnington, Lynn Bradshaw, James Catto, Debbie Cooper, Michael Davis, Liz Down, Andrew Doble, Alan Doherty, Garrett Durkan, Emma Elliott, David Gillatt, Pippa Herbert, Peter Holding, Joanne Howson, Mandy Jones, Roger Kockelbergh, Howard Kynaston, Teresa Lennon, Norma Lyons, Hing Leung, Malcolm Mason, Hilary Moody, Philip Powell, Alan Paul, Stephen Prescott, Derek Rosario, Patricia O'Sullivan, Pauline Thompson, and Sarah Tidball. They also thank Gemma Marsden, who processed the blood samples at the biorepository; Dr. Chris Metcalfe for providing statistical advice; and the Centre National de Génotypage for genotyping the ProtecT samples.
Grant Support
The ProtecT study is funded by the U.K. Health Technology Assessment (HTA) Programme of the National Institute for Health Research (NIHR; HTA 96/20/99; ISRCTN20141297). D.W. Knipe is funded by a Wellcome Trust 4-year studentship (WT099874MA). D.M. Evans, J.P. Kemp, G. Davey Smith, and R.M. Martin work in an MRC Unit that is supported by the UK Medical Research Council (MC_UU_12013/1-9) and the University of Bristol. D.F. Easton is supported by a Cancer Research UK Grant (C1287/A10118). R.A. Eeles and Z. Kote-Jarai are supported by a Cancer Research UK Grant (C5047/A7357) and by support from the NIHR to the Biomedical Research Centre at The Institute of Cancer Research and Royal Marsden NHS Foundation Trust. J.F.C. Hamdy, D.E. Neal, and J.L. Donovan are NIHR Senior Investigators. The authors acknowledge the provision of additional epidemiological data by the NHS R&D Directorate supported Prodigal study and the ProMPT (Prostate Mechanisms of Progression and Treatment) collaboration, which is supported by the National Cancer Research Institute (NCRI) formed by the Department of Health, the Medical Research Council, and Cancer Research UK (G0500966/75466).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

