Background: Only a minority of the genetic components of prostate cancer risk have been explained. Some observed associations of SNPs with prostate cancer might arise from associations of these SNPs with circulating prostate-specific antigen (PSA) because PSA values are used to select controls.

Methods: We undertook a genome-wide association study (GWAS) of screen-detected prostate cancer (ProtecT: 1,146 cases and 1,804 controls); meta-analyzed the results with those from the previously published UK Genetic Prostate Cancer Study (1,854 cases and 1,437 controls); investigated associations of SNPs with prostate cancer using either “low” (PSA < 0.5 ng/mL) or “high” (PSA ≥ 3 ng/mL, biopsy negative) PSA controls; and investigated associations of SNPs with PSA.

Results: The ProtecT GWAS confirmed previously reported associations of prostate cancer at three loci: 10q11.23, 17q24.3, and 19q13.33. The meta-analysis confirmed associations of prostate cancer with SNPs near four previously identified loci (8q24.21,10q11.23, 17q24.3, and 19q13.33). When comparing prostate cancer cases with low PSA controls, alleles at genetic markers rs1512268, rs445114, rs10788160, rs11199874, rs17632542, rs266849, and rs2735839 were associated with an increased risk of prostate cancer, but the effect-estimates were attenuated to the null when using high PSA controls (Pheterogeneity in effect-estimates < 0.04). We found a novel inverse association of rs9311171-T with circulating PSA.

Conclusions: Differences in effect-estimates for prostate cancer observed when comparing low versus high PSA controls may be explained by associations of these SNPs with PSA.

Impact: These findings highlight the need for inferences from genetic studies of prostate cancer risk to carefully consider the influence of control selection criteria. Cancer Epidemiol Biomarkers Prev; 23(7); 1356–65. ©2014 AACR.

Prostate cancer is the leading cause of cancer in men in developed countries, accounting for 25% of new cancer cases. Prostate-specific antigen (PSA) testing is currently the most widely used screening test for prostate cancer, but it is limited because current thresholds defining raised levels (typically circulating PSA levels >3 or 4 ng/mL) do not distinguish clinically important from indolent cancer and lower levels do not exclude prostate cancer (1).

Epidemiologic studies suggest a significant genetic component in the etiology of the disease, but only 30% of the estimated heritability has so far been explained (2), highlighting the need for more studies to identify the full genetic profile of prostate cancer. The majority of genome-wide association studies (GWAS) of prostate cancer have been based on clinically detected cases (3–11), with some studies specifically focusing on young age at onset and/or familial or aggressive prostate cancer (i.e., pathologic stage T3/T4, N+, M+, Gleason score ≥ 7, grade III, metastases, androgen ablation therapy, or PSA at least 10–50 ng/mL; refs. 4–6, 8–10, 12). Most studies have compared cases to controls that have been selected on the basis of PSA levels under a certain threshold (5, 6, 10, 12–17), including some studies that have used controls with extremely low PSA levels (<0.5 ng/mL; “supernormal” controls; refs. 3, 5, 6, 12, 15). Such control selection strategies can make it difficult to interpret associations between a genetic variant and disease, as such associations may result from a relationship between the variant and PSA levels rather than prostate cancer. A limited number of GWAS have attempted to clarify this relationship by analyzing associations of PSA levels with genetic markers identified using PSA screened controls (2, 5, 6, 15, 18, 19). It has been shown that SNPs at 13 known prostate cancer susceptibility loci (rs6869841, rs1270884, rs17632542, rs2242652, rs6983561, rs620861, rs10090154, rs7837688, rs12500426, rs7127900, rs10993994, rs2659056, rs2735839, rs5945619) are also associated with PSA concentration in blood in controls (2, 5, 6, 15, 19). As controls were defined by a PSA cutoff, however, associations between genotypes and the full distribution of PSA in men without prostate cancer could not be assessed. Only one study specifically investigated the relationship between genetic variants and circulating PSA level in men without detected prostate cancer (19). The Prostate Testing for cancer and Treatment study (ProtecT) contributed replication data to this study for the significant SNPs identified in the discovery GWAS. Genome-wide significant associations between PSA levels and markers at 6 different loci were reported (19).

Our investigation had 3 aims. First, to conduct a GWAS in a population-based screen detected prostate cancer population (ProtecT; ref. 20). Second, to conduct a meta-analysis to combine the results of the ProtecT GWAS with the previously published UK Genetic Prostate Cancer Study (UKGPCS; ref. 6). Third, to investigate whether previously identified associations of genetic variants with prostate cancer could be explained by associations with PSA levels, rather than cancer per se. For the third aim, (i) we identified previously published prostate cancer risk SNPs and compared their association with prostate cancer in 2 case–control studies nested within ProtecT; the first compared prostate cancer cases (diagnosed following a PSA ≥ 3 ng/mL and a positive biopsy) with “low” PSA controls (PSA < 0.5 ng/mL) and the second compared prostate cancer cases with “high” PSA controls (PSA ≥ 3 ng/mL and a negative biopsy); our hypothesis was that if the genetic marker is associated with PSA level and not prostate cancer, then the effect-estimates would be greatest when using the low PSA controls and close to the null when using the high PSA controls; and (ii) we examined the relationship of each of the identified genetic markers with PSA level as a continuous variable in controls selected independently of PSA level (“unrestricted” controls), including those with PSA levels ≥ 3 ng/mL who were biopsy negative, as well as those with levels below 3 ng/mL.

Samples

Two study populations were used in this analysis. First, participants from the ProtecT study (20), and second the UK Genetic Prostate Cancer Study stage I (UKGPCS), both of European ancestry (6). The ProtecT study recruited all men aged 50 to 69 years between 2001 and 2009 from 337 general practices in 9 UK centers (Birmingham, Bristol, Cambridge, Cardiff, Edinburgh, Leeds, Leicester, Newcastle, and Sheffield). All men were invited to a prostate check clinic and were offered a PSA test: approximately 110,000 attended. For those with a PSA level ≥3 ng/mL, a transrectal ultrasound-guided prostate biopsy was conducted. All detected tumors were histologically confirmed and assigned a Gleason score by a specialist uropathologist following a standard proforma. Tumors were categorized as low (score < 7), mid (score = 7), or high (score > 7) grade. Clinical staging was assigned using the tumor–node–metastasis (TNM) system (21) as either localized (T1–T2) or advanced (T3–T4; although most in the latter category were locally advanced and few tumors had metastasized distally). Approximately 3,000 men were diagnosed with prostate cancer between 2001 and 2009. All men diagnosed with prostate cancer after attending a prostate check clinic before end November 2006 (and their matched controls) were eligible for inclusion in the current ProtecT GWAS (n = 1,215 cases).

Men with no evidence of prostate cancer were eligible for selection as controls (approximate n = 107,000); these were men with a PSA < 3 ng/mL or a PSA ≥ 3 ng/mL with the most recent biopsy being negative. Controls were stratum matched to the genotyped prostate cancer cases by age, GP practice, and calendar time period, in 2 distinct rounds of matching. In the first round, a random sample of 6 controls per case were selected from all eligible controls in each age and GP practice strata (“unrestricted” controls); as the prostate check clinics were completed in one GP practice before moving to the next, matching on GP practice also matched on calendar time period. The second round randomly selected 1 control with a PSA < 0.5 ng/mL (and with whole blood collected) per case from the same age and GP practice strata as the index case (“supernormal controls”). A total of 1,925 matched controls were eligible for inclusion in the current GWAS. The Trent Multicenter Research Ethics Committee (MREC) approved the ProtecT study (MREC/01/4/025) and the associated ProMPT study that collected biologic material (MREC/01/4/061), and written informed consent was obtained.

The UKGPCS (stage I) dataset has been previously described (6). Briefly the cases (n = 2,017) were clinically detected and selected if the man had a diagnosis at ≤60 years of age or a first-/second-degree family history of prostate cancer. Self-reported “non-white” men were excluded, as were those who were diagnosed through asymptomatic screening. UKGPCS controls (total n = 1,893) were solely selected from men in the ProtecT study who had a PSA < 0.5 ng/mL. In the current meta-analysis, we excluded from the UKGPCS results those men who were in the ProtecT GWAS control series described above (n = 456) so that the populations in the pooled analysis were independent samples (Fig. 1).

Figure 1.

Venn Diagram showing the inclusion of controls in the ProtecT and UKGPCS GWAS. *, controls overlapping between control populations were excluded from the UKGPCS GWAS.

Figure 1.

Venn Diagram showing the inclusion of controls in the ProtecT and UKGPCS GWAS. *, controls overlapping between control populations were excluded from the UKGPCS GWAS.

Close modal

Genotyping in ProtecT

The genotyping was performed at the Center National de Genotypage, Evry, France using the Illumina Human660W-Quad_v1_A array (Illumina, Inc.). Quality control measures included exclusions for sex-mismatches, minimal (<0.325) or excessive heterozygosity (>0.345), cryptic relatedness as estimated by proportion of loci identical by descent (IBD > 0.1), and disproportionate levels of individual missingness (>3%). Individuals were then checked for evidence of population stratification by multidimensional scaling analysis and compared with HapMap II (release 22) European descent (CEU), Han Chinese (CHB), Japanese (JPT), and Yoruba (YRI) reference populations. Individuals showing evidence of non-European ancestry were removed. SNPs with a minor allele frequency of below 1%, a call rate of <95%, or evidence for violation of Hardy–Weinberg equilibrium (P < 5 × 10−7) were discarded.

Imputation of autosomal genotypic data used Markov Chain Haplotyping software (MACH v.1.0.16; ref. 22) and phased haplotype data from CEU individuals (HapMap release 22, Phase II NCBI B36, dbSNP 126) on 514,432 autosomal SNPs. All SNPs with poor imputation quality (r2 hat < 0.3) were removed. The final working dataset included 2,950 individuals (cases, n = 1,146; controls, n = 1,804). A total of 1,272 controls were included from the first round of “unrestricted” control selection (PSA < 0.5 ng/mL, n = 238; PSA ≥ 0.5 and PSA < 3 ng/mL, n = 941; PSA ≥ 3 ng/mL, n = 93) and a further 532 from the second round (“supernormal” control selection).

UKGPCS genotyping

This has been reported elsewhere (6). Briefly, genotypes were generated using the Illumina Infinium HumanHap550 array, and we only used genotypes with a call rate of >97%. Related samples and those with Asian/African ancestry were excluded. After exclusions, 1,854 cases and 1,437 controls (independent of ProtecT—see Fig. 1) were used in the current meta-analysis.

Statistical analysis

The statistical analysis was undertaken in 4 stages (see Fig. 2). In the first stage, a genome-wide association analysis was performed on the ProtecT samples using the software package MACH2DAT (22, 23), using a logistic regression model based on an expected allelic dosage model for SNPs. Associations between each SNP and disease were assessed by a Wald test and P < 5 × 10−8 was considered genome-wide significant. All estimates are reported in the direction of the risk allele.

Figure 2.

Analysis strategy. *, population; **, ratio of ORs.

Figure 2.

Analysis strategy. *, population; **, ratio of ORs.

Close modal

In the second stage, we meta-analyzed the top SNPs identified in ProtecT with the UKGPCS dataset. A total of 381 SNPs reaching a threshold for suggestive significance of 5 × 10−5 in the ProtecT GWAS were included in the meta-analysis. The threshold level of 5 × 10−5 was a pragmatic choice based on similar thresholds for suggestive significance cited in the literature, because of the relatively small sample size in ProtecT. Using a fixed-effect inverse variance model, the β-coefficients and SEs from ProtecT and UKGPCS were combined using the metan command in STATA. A P value of 5 × 10−8 was used to identify genome-wide significant SNPs in the meta-analysis. Heterogeneity between the ProtecT and UKGPCS studies was evaluated using Cochran's Q statistic and the I2 value (24). The 2 studies had the same strand orientation and used the same effect alleles. Associations showing moderate-to-high levels of heterogeneity (I2 > 50) were investigated further to determine whether the heterogeneity could be explained by the difference in control or case populations used in ProtecT and UKGPCS. We did this in 2 stages: first, we used only the low PSA (<0.5 ng/mL) controls in ProtecT to generate new effect-estimates and re-ran the meta-analysis for those SNPs showing moderate-to-high levels of heterogeneity in the original analysis; second, for those SNPs that still showed moderate-to-high levels of heterogeneity, we restricted the ProtecT cases to include only young cases (≤60 years of age) and/or those with a family history and re-ran the meta-analysis.

In stage 3, we investigated how the choice of controls in case–control GWAS studies may have impacted on associations of SNPs with prostate cancer, using 2 control populations from the ProtecT study: 770 low PSA controls (PSA < 0.5 ng/mL, from both the “unrestricted” control selection, n = 238, and “supernormal” control selection, n = 532); and 93 high PSA controls (PSA ≥ 3 ng/mL). We also did a sensitivity analysis relaxing the high PSA control threshold to a PSA level ≥ 2 ng/mL, resulting in 250 high PSA controls. We did this to increase power and move our high control population toward a more general population. We generated a list of SNPs to be tested by downloading all published SNPs associated with both prostate cancer risk and PSA level from the catalogue of Published Genome-wide Association Studies (http://www.genome.gov/gwastudies/: downloaded 21/11/2012). We supplemented this list with additional SNPs associated with prostate cancer in a recent meta-analysis (2), resulting in a total of 83 SNPs. Using this list, we extracted the individual genotypic data for all participants in our ProtecT dataset, although 2 SNPs were neither genotyped nor imputed (rs7210100 and rs16902094). Using the genotypic data, we generated effect-estimates for each SNP using multinomial logistic regression, where each control group (low and high) was compared with cases, separately (Fig. 2). To test the null hypothesis, that there was no difference between the 2 control groups when used to estimate the ORs for the associations of SNPs with prostate cancer, we carried out a separate logistic regression model that tested the association of each SNP with control type (low vs. high); this is the same as testing whether the ratio of ORs using the 2 control groups were equivalent (i.e., =1), as the base case group would be cancelled out (Fig. 2).

Finally, we conducted a regression analysis of SNPs on PSA level within the “unrestricted” control population (controls selected independent of PSA level). PSA values were log-transformed to approximate a normal distribution. To account for the multiple comparisons between PSA levels and SNP markers, we only highlight findings that showed evidence of an association in both the stratified analysis and a complementary association in the regression analysis.

The anti-log of β was taken to estimate the percentage change in PSA level by increasing allele dose. To incorporate the uncertainty of the imputation process, expected allelic dosage data were used for the analysis. Genotypic dosages are the expected number of copies of an allele that ranges from 0 to 2. For genotyped SNPs, the exact number of copies of an allele is known, but for imputed SNPs, the estimate maybe less precise.

GWAS in ProtecT (stage 1)

The study characteristics of those individuals (cases: n = 1,146, controls: n = 1,804) included in the ProtecT GWAS are presented in Supplementary Table S1. Supplementary Figure S1 shows the quantile–quantile (Q–Q) plot of the distribution of test statistics for comparison of genotype frequencies in cases versus controls. The lambda inflation factor was 1.025. In GWAS, 381 SNPs were associated with prostate cancer at P < 5 × 10−5 (Supplementary Fig. S2 and Supplementary Table S2). We detected a genome-wide significant association (P < 5 × 10−8) between prostate cancer and SNPs at 3 previously identified loci: 10q11.23 (rs10993994), 17q24.3 (rs7222314), and 19q13.33 (rs1058205; Table 1 and locus zoom plots in Supplementary Figs. S3–S5).

Table 1.

Results of independent SNPs reaching genome-wide significance (P < 5 × 10−8) in the ProtecT GWAS

LociSNPChromosomePositionAllelesβaSEORbPEAFRSQRc
10q11.23 rs10993994 10 51219502 T/C 0.33 0.06 1.39 1.58 × 10−09 0.5983 0.9793 
17q24.3 rs7222314 17 66616533 A/G 0.32 0.06 1.38 8.59 × 10−09 0.4772 0.9295 
19q13.33 rs1058205 19 56055210 T/C 0.40 0.07 1.49 4.63 × 10−08 0.1828 
LociSNPChromosomePositionAllelesβaSEORbPEAFRSQRc
10q11.23 rs10993994 10 51219502 T/C 0.33 0.06 1.39 1.58 × 10−09 0.5983 0.9793 
17q24.3 rs7222314 17 66616533 A/G 0.32 0.06 1.38 8.59 × 10−09 0.4772 0.9295 
19q13.33 rs1058205 19 56055210 T/C 0.40 0.07 1.49 4.63 × 10−08 0.1828 

Abbreviation: EAF, effect allele frequency.

aCorrelation coefficient.

bEffect-estimate (OR).

cImputation accuracy.

Meta-analysis of ProtecT and UKGPCS studies (stage 2)

All SNPs at P < 5 × 10−5 in ProtecT were meta-analyzed with results from the UKGPCS. The total number of individuals included was 6,241 (cases: ProtecT n = 1,146, UKGPCS n = 1,854; controls: ProtecT n = 1,804, UKGPCS n = 1,437). A total of 175 SNPs reached genome-wide significance (P < 5 × 10−8) for an association with prostate cancer in the meta-analysis and were all located in or near genes across 4 previously identified loci (Table 2), with 6 independent signals. There was evidence of significant heterogeneity between the studies for the rs1447295 (8q24.21) marker (I2 = 86.1%, Pheterogeneity = 0.01). There was also evidence of moderate-to-high levels of heterogeneity for rs6983267, rs10993994, and rs17632542. The heterogeneity of rs6983267, rs10993994, and rs17632542 was greatly attenuated when the analysis was repeated using the ProtecT effect-estimates from a restricted control population (low PSA controls < 0.5 ng/mL; Supplementary Table S3). The heterogeneity observed for rs1447295, however, remained high (I2 = 83.1%, Pheterogeneity = 0.01). When the analysis was repeated using ProtecT effect-estimates based on young and/or familial cases and low PSA controls, the heterogeneity was only slightly reduced and remained statistically significant (I2 = 78%, Pheterogeneity = 0.03).

Table 2.

Summary results of top independent SNPs in known loci associated with prostate cancer identified by meta-analysis of ProtecT with UKGPCS

ProtecTUKGPCSCombined (meta-results)
SNPChromosomePositionGeneEAFβSEEAFPβSEEAFPβ (95% CI)PI2aPHetb
rs12682344 128175966 SRRM1P1 - POU5F1B 0.57 0.13 0.04 1.28 × 10−05 0.79 0.14 0.04 9.66 × 10−09 0.67 (0.48–0.86) 4.69 × 10−12 21.7% 0.26 
rs6983267 128482487 SRRM1P1 - POU5F1B 0.22 0.05 0.52 4.25 × 10−05 0.36 0.05 0.53 2.30 × 10−12 0.29 (0.22–0.36) 4.04 × 10−15 72.0% 0.06 
rs1447295 128554220 POU5F1B - MYC 0.35 0.08 0.11 2.36 × 10−05 0.67 0.08 0.12 1.20 × 10−16 0.51 (0.40–0.63) 5.61 × 10−18 86.1% 0.01 
rs10993994 10 51219502 MSMB 0.33 0.06 0.40 1.58 × 10−09 0.46 0.05 0.40 2.15 × 10−19 0.40 (0.33–0.47) 3.48 × 10−26 65.8% 0.09 
rs4793529 17 66630231 CALM2P1 - SOX9 0.33 0.06 0.47 1.88 × 10−09 0.23 0.05 0.49 8.11 × 10−06 0.28 (0.20–0.35) 1.78 × 10−13 49.8% 0.16 
rs17632542 19 56053569 KLK3 0.60 0.11 0.91 1.28 × 10−08 0.82 0.08 0.89 4.74 × 10−23 0.73 (0.6–0.86) 2.34 × 10−28 62.7% 0.10 
ProtecTUKGPCSCombined (meta-results)
SNPChromosomePositionGeneEAFβSEEAFPβSEEAFPβ (95% CI)PI2aPHetb
rs12682344 128175966 SRRM1P1 - POU5F1B 0.57 0.13 0.04 1.28 × 10−05 0.79 0.14 0.04 9.66 × 10−09 0.67 (0.48–0.86) 4.69 × 10−12 21.7% 0.26 
rs6983267 128482487 SRRM1P1 - POU5F1B 0.22 0.05 0.52 4.25 × 10−05 0.36 0.05 0.53 2.30 × 10−12 0.29 (0.22–0.36) 4.04 × 10−15 72.0% 0.06 
rs1447295 128554220 POU5F1B - MYC 0.35 0.08 0.11 2.36 × 10−05 0.67 0.08 0.12 1.20 × 10−16 0.51 (0.40–0.63) 5.61 × 10−18 86.1% 0.01 
rs10993994 10 51219502 MSMB 0.33 0.06 0.40 1.58 × 10−09 0.46 0.05 0.40 2.15 × 10−19 0.40 (0.33–0.47) 3.48 × 10−26 65.8% 0.09 
rs4793529 17 66630231 CALM2P1 - SOX9 0.33 0.06 0.47 1.88 × 10−09 0.23 0.05 0.49 8.11 × 10−06 0.28 (0.20–0.35) 1.78 × 10−13 49.8% 0.16 
rs17632542 19 56053569 KLK3 0.60 0.11 0.91 1.28 × 10−08 0.82 0.08 0.89 4.74 × 10−23 0.73 (0.6–0.86) 2.34 × 10−28 62.7% 0.10 

Abbreviation: EAF, effect allele frequency.

aPercentage of variation between study-specific effect-estimates that is due to heterogeneity.

bTests the hypothesis that there is no difference in the study-specific effect-estimates.

Stratified analysis using different control populations in ProtecT and PSA regressed in the unrestricted controls (stages 3 and 4)

Of the 81 SNPs examined, SNPs showing evidence of heterogeneity (P < 0.05) in effect-estimates quantifying the difference in risk between cases and low or high PSA controls and/or SNPs that showed at least nominal association with PSA levels in unrestricted controls (P < 0.05) are shown in Table 3. When comparing cases with low PSA controls, alleles at the genetic markers rs1512268 (8p21.2), rs445114 (8q24.21), rs10788160 (10q26.12), rs11199874 (10q26.12), rs17632542 (19q13.33), rs266849 (19q13.33), and rs2735839 (19q13.33) were strongly associated with an increased risk of prostate cancer (all P ≤ 0.01), with ORs ranging from 1.24 to 2.73. The effect-estimates, however, were attenuated to the null when using high PSA controls (Pheterogeneity between estimates all <0.04). There was also some evidence of associations of these SNPs with PSA levels in the “unrestricted” control population [in particular, rs17632542 (19q19.33) associated with a 32% increase in PSA per allele], which suggests that the difference observed in the stratified analysis maybe explained by an association of these SNPs with PSA level. Only associations of the following SNPs with PSA levels were statistically significant (P ≤ 0.05) in the “unrestricted” control population: rs445114, rs17632542, rs2735839, and rs266849. A further SNP (rs3850699) showed a similar association with PSA, with the effect-estimate attenuating toward the null in the high PSA controls, but in the “unrestricted” control population, the allele only increased PSA levels by 1% (P = 0.73).

Table 3.

Comparison of cases with supernormal or high PSA controls and PSA regressed on SNPs in unrestricted controls

Stratified analysis comparing cases with 2 different control groupsPSA regressed on SNPs in unrestricted controls (n = 1,272)
Low PSA controls (n = 770)High PSA controls (n = 93)
ChromosomeSNPPutative geneRisk alleleOR (SE)POR (SE)PPhetaβ% PSA changeP
rs9311171 CTDSPL 1.07 (0.10) 0.49 1.79 (0.34) 0.002 0.01 −0.09 −9% 0.04 
rs7611694 SIDT1 1.09 (0.63) 0.19 0.87 (0.19) 0.40 0.17 0.07 8% 0.02 
rs6869841 FAM44B (BOD1) 1.35 (0.11) 3.44 × 10−4 1.22 (0.24) 0.29 0.63 0.08 9% 0.03 
rs1983891 FOXP4 1.10 (0.08) 0.22 0.96 (0.16) 0.80 0.43 0.08 8% 0.03 
rs1512268 SLC25A37 - NKX3-1 1.30 (0.09) 9.51 × 10−5 0.83 (0.13) 0.24 0.004 0.06 6% 0.07 
rs445114 SRRM1P1 - POU5F1B 1.46 (0.10) 8.15 × 10−8 0.92 (0.16) 0.60 0.01 0.08 8% 0.03 
rs6983267 SRRM1P1 - POU5F1B 1.29 (0.05) 8.51 × 10−5 1.15 (0.13) 0.36 0.44 0.08 9% 0.01 
10 rs11199874 RPL19P16 - FGFR2 1.24 (0.06) 0.01 0.78 (0.21) 0.14 0.01 0.07 7% 0.07 
10 rs10788160 RPL19P16 - FGFR2 1.24 (0.06) 0.01 0.80 (0.21) 0.18 0.01 0.06 6% 0.08 
10 rs3850699 TRIM8 1.17 (0.06) 0.03 0.82 (0.22) 0.28 0.048 0.01 1% 0.73 
10 rs10993994 MSMB 1.61 (0.11) 4.54 × 10−12 1.45 (0.23) 0.02 0.51 0.08 8% 0.02 
13 rs9600079 FABP5L1 - KLF12 1.07 (0.06) 0.33 0.99 (0.16) 0.97 0.65 0.06 6% 0.048 
19 rs17632542 KLK3 2.73 (0.32) 1.40 × 10−17 0.76 (0.27) 0.44 1.94 × 10−4 0.28 32% 5.49 × 10−7 
19 rs2735839 KLK3 - KLK2 2.13 (0.20) 1.90 × 10−16 1.29 (0.28) 0.26 0.02 0.21 23% 0.004 
19 rs266849 KLK15 - KLK3 1.76 (0.05) 8.62 × 10−12 1.15 (0.17) 0.49 0.03 0.12 13% 2.40 × 10−6 
Stratified analysis comparing cases with 2 different control groupsPSA regressed on SNPs in unrestricted controls (n = 1,272)
Low PSA controls (n = 770)High PSA controls (n = 93)
ChromosomeSNPPutative geneRisk alleleOR (SE)POR (SE)PPhetaβ% PSA changeP
rs9311171 CTDSPL 1.07 (0.10) 0.49 1.79 (0.34) 0.002 0.01 −0.09 −9% 0.04 
rs7611694 SIDT1 1.09 (0.63) 0.19 0.87 (0.19) 0.40 0.17 0.07 8% 0.02 
rs6869841 FAM44B (BOD1) 1.35 (0.11) 3.44 × 10−4 1.22 (0.24) 0.29 0.63 0.08 9% 0.03 
rs1983891 FOXP4 1.10 (0.08) 0.22 0.96 (0.16) 0.80 0.43 0.08 8% 0.03 
rs1512268 SLC25A37 - NKX3-1 1.30 (0.09) 9.51 × 10−5 0.83 (0.13) 0.24 0.004 0.06 6% 0.07 
rs445114 SRRM1P1 - POU5F1B 1.46 (0.10) 8.15 × 10−8 0.92 (0.16) 0.60 0.01 0.08 8% 0.03 
rs6983267 SRRM1P1 - POU5F1B 1.29 (0.05) 8.51 × 10−5 1.15 (0.13) 0.36 0.44 0.08 9% 0.01 
10 rs11199874 RPL19P16 - FGFR2 1.24 (0.06) 0.01 0.78 (0.21) 0.14 0.01 0.07 7% 0.07 
10 rs10788160 RPL19P16 - FGFR2 1.24 (0.06) 0.01 0.80 (0.21) 0.18 0.01 0.06 6% 0.08 
10 rs3850699 TRIM8 1.17 (0.06) 0.03 0.82 (0.22) 0.28 0.048 0.01 1% 0.73 
10 rs10993994 MSMB 1.61 (0.11) 4.54 × 10−12 1.45 (0.23) 0.02 0.51 0.08 8% 0.02 
13 rs9600079 FABP5L1 - KLF12 1.07 (0.06) 0.33 0.99 (0.16) 0.97 0.65 0.06 6% 0.048 
19 rs17632542 KLK3 2.73 (0.32) 1.40 × 10−17 0.76 (0.27) 0.44 1.94 × 10−4 0.28 32% 5.49 × 10−7 
19 rs2735839 KLK3 - KLK2 2.13 (0.20) 1.90 × 10−16 1.29 (0.28) 0.26 0.02 0.21 23% 0.004 
19 rs266849 KLK15 - KLK3 1.76 (0.05) 8.62 × 10−12 1.15 (0.17) 0.49 0.03 0.12 13% 2.40 × 10−6 

NOTE: SNPs showing evidence of heterogeneity (P < 0.05) in effect-estimates quantifying the difference in risk between cases and low or high PSA controls and/or SNPs that showed at least nominal association with PSA levels in unrestricted controls (P < 0.05).

aTests the hypothesis that there is no difference between the effect-estimate of the cases versus low PSA controls and cases versus high PSA controls.

There was also evidence that a further 6 SNP alleles (rs7611694, rs6869841, rs1983891, rs6983267, rs10993994, and rs9600079) were associated with increasing circulating PSA level (P < 0.05) and that the use of high PSA controls attenuated effect-estimates of these SNPs with prostate cancer toward the null. The difference in estimates, however, was not significant (all P > 0.4). Relaxing the high PSA threshold to ≥2 ng/mL, we observed a statistically significant difference when comparing the estimates generated using low versus high PSA controls for rs6869841-T (P = 0.01) and rs10993994-T (P = 0.04; Supplementary Table S4).

The T-allele at 3p22.2 (rs9311171) was associated with a more pronounced increase in risk of prostate cancer when cases were compared with high PSA controls (79% increase) as opposed to low PSA controls (7% increase; Pheterogeneity = 0.01). There was also evidence of an association between the rs9311171-T allele and decreased PSA levels (P = 0.035). In the sensitivity analysis (high PSA controls ≥ 2 ng/mL), rs4775302 (15q21.1) was associated with a larger increase in risk of prostate cancer when cases were compared with the high PSA controls (29% increase), as opposed to the low PSA controls (2% increase; Pheterogeneity = 0.03; Supplementary Table S4). This was supported by the regression analysis of PSA in the “unrestricted” control population, which showed an association in the opposite direction (5% decrease; P = 0.09).

Our GWAS and meta-analysis replicate previous findings for the following loci: 10q11.23 (rs10993994 - MSMB), 17q24.3 (rs7222314 - CALM2P1 - SOX9), and 19q13.33 (rs1058205 - KLK3) in the ProtecT GWAS; and 8q24.21 (rs12682344, rs6983267, and rs1447295 - intergenic), 10q11.23 (rs10993994 - MSMB), 17q24.3 (rs4793529 - CALM2P1 - SOX9), and 19q13.33 (rs17632542 - KLK3) in the meta-analysis. We also demonstrated a novel association of one SNP with circulating PSA level: rs9311171-T (inversely associated with PSA). The allele at the genetic marker rs9311171 is located near a microRNA (MIR26A1) and a protein linked to oncogenesis (CTDSPL). This SNP (rs9311171) had a larger positive association with prostate cancer when cases were compared with high PSA, as opposed to low PSA controls. This suggests that this marker is directly associated with both PSA level and prostate cancer risk.

We also found evidence that the association of prostate cancer risk of 7 other SNPs at loci 8p21.2, 10q26.12, and 19q13.33 could be confounded by the variant-PSA association (rs1512268-T, rs10788160-A, rs445114-T, rs11199874-A, rs17632542-T, rs2735839-G, and rs266849-A). The association between prostate cancer and the respective SNP allele was present in the comparison of cases with low PSA controls, but this association disappeared when compared with high PSA controls. There was also evidence of associations between these SNPs and PSA levels in the “unrestricted” control population. Associations of rs1512268-T, rs10788160-A, and rs11199874-A with PSA levels were only statistically significant (P < 0.05) in the stratified analysis. While associations of SNPs with PSA levels were not statistically significant in the regression analysis, the direction of the effect provides some support for the conclusion that these SNP alleles are associated with increased circulating PSA level. Eeles and colleagues (5) and Gudmundsson and colleagues (19) previously observed a positive association of PSA with rs1512268-T (near NKX3.1) and rs10788160-A (FGFR2), respectively. The allele at the genetic marker rs445114 (intergenic) was previously shown to have a positive association with PSA level, albeit non–genome-wide significant (P = 1.27 × 10−2; ref. 19). This is consistent with the findings of Al Olama and colleagues (18), who found that an SNP (rs620861) in strong linkage disequilibrium with rs445114 was positively associated with PSA level (P = 4.8 × 10−8). While rs11199874-A has not previously been tested for a relationship with PSA, it is in strong linkage disequilibrium with rs10788160-A (r2 = 0.94; FGFR2), which has been shown to be associated with increasing PSA levels. Our analysis suggests that rs10788160-A is associated with higher PSA levels, whereas rs10788160-G is associated with lower levels. A previous study reported a positive association between rs11199874-G and prostate cancer (25). This association could be driven by the large proportion (>50%) of high PSA (4–10 ng/mL) controls used rather than reflecting a true association because this high PSA–enriched control sample is likely to have a higher proportion of individuals with the allele rs11199874-A; this could in turn account for the positive association of prostate cancer with rs11199874-G (the opposite allele to rs11199874-A). Two markers, in strong linkage disequilibrium (rs17632542-T and rs2735839-G), located near the kallikrein-related peptidase protein-coding genes, have previously been observed to be positively associated with PSA level (19). SNP rs266849 is also located near but is independent of this set of PSA- associated proteins; a previous study reported no association of rs266849 with PSA in controls with a PSA < 10 ng/mL (6). When we reduced our threshold for defining high PSA controls (PSA ≥ 2 ng/mL) in the stratified analysis, we observed similar findings in the same direction to those that were found to be significant using the higher PSA threshold (PSA ≥ 3 ng/mL). In addition, in this sensitivity analysis, 3 other SNPs were shown to be associated with PSA.

Impact of control selection

Our analysis suggests that the selection of controls for GWAS below a certain PSA threshold results in associations with prostate cancer, when the actual association is with PSA level (e.g., rs6869841). In addition, the selection of these controls can lead to the masking of a prostate cancer association (e.g., rs9311171). To avoid these difficulties, an optimal design for a GWAS would be to select controls who have PSA levels at or above the threshold for biopsy and who are biopsy negative. These criteria ensure that the cases and controls are comparable in terms of PSA level and therefore associations of SNP markers with prostate cancer are not confounded by associations with PSA.

Strengths and limitations

This study was based on a population sample of men who underwent PSA testing, providing findings relevant to screen-detected cancer, and controls that were not selected according to PSA level. We also conducted a meta-analysis using 6,241 individuals (3,000 cases and 3,241 controls), but we were only able to confirm previous GWAS findings rather than show novel associations. There was significant heterogeneity between the ProtecT and UKGPCS studies for associations of markers at 8q24.21, 10q11.23, and 19q13.33 with prostate cancer. These differences were partly explained by the difference in control ascertainment between studies (Supplementary Table S3). We were, however, unable to explain the substantial heterogeneity (I2 = 78%) for rs1447295. We had a relatively small number of high PSA controls (n = 93) so our stratified analysis will have been underpowered, explaining why we were unable to detect some previously published associations of SNPs with PSA. To increase power, we carried out a sensitivity analysis by relaxing the high PSA control threshold to ≥2 ng/mL (n = 250), which resulted in 3 additional SNPs being associated with PSA level (rs6869841, rs10993994, and rs4775302).

To avoid potential false-positive results arising from multiple comparisons, we only highlight findings that showed evidence of an association in both the stratified and regression analyses. A Bonferroni-corrected threshold significance level is P ≤ 0.0006 (0.05/81 tests). However, this correction assumes that the tests were independent of each other, which in this case may not be true because some SNPs were correlated. Hence, a rigid application of the Bonferroni-corrected P value could lead to an erroneous rejection of genuine loci (false-negatives). Nevertheless, we have presented exact P values, allowing comparison with the Bonferroni-corrected P values. While all controls with an elevated PSA level were biopsy negative, there is the possibility that some controls with a PSA level < 3 ng/mL had undetected prostate cancer, as evidenced by a study (1) showing that 14% men with a PSA level ≤ 3 ng/mL had prostate cancer. The majority of controls in our study had a PSA level ≤ 1 ng/mL, which was associated with less than a 9% chance of undetected prostate cancer prostate cancer (1). It is also possible that up to a quarter of men with an elevated PSA level (≥3 ng/mL) and a negative biopsy had undetected prostate cancer (26), which could explain why we did not observe associations of SNPs with prostate cancer when using the high PSA control group.

We have confirmed associations of prostate cancer risk with 4 previously identified loci (8q24.21,10q11.23, 17q24.3, and 19q13.33) and associations of 7 markers with PSA level. We also found new evidence of an association of genetic variation at 3p22.2 with circulating PSA levels. We have highlighted that the method of selecting controls in case–control studies of associations of genetic variation with prostate cancer can influence the results, suggesting that inferences made in these studies should carefully consider control selection.

No potential conflicts of interest were disclosed.

Conception and design: D.W. Knipe, J.P. Kemp, J.L. Donovan, G. Davey Smith, R.M. Martin

Development of methodology: J.P. Kemp, M. Lathrop

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): R. Eeles, D.F. Easton, Z. Kote-Jarai, S. Benlloch, J.L. Donovan, F.C. Hamdy, D.E. Neal

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D.W. Knipe, D.M. Evans, J.P. Kemp, D.F. Easton, A.A. Al Olama, F.C. Hamdy, M. Lathrop, R.M. Martin

Writing, review, and/or revision of the manuscript: D.W. Knipe, D.M. Evans, J.P. Kemp, R. Eeles, D.F. Easton, A.A. Al Olama, J.L. Donovan, F.C. Hamdy, G. Davey Smith, M. Lathrop, R.M. Martin

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): Z. Kote-Jarai, S. Benlloch

Study supervision: J.L. Donovan, R.M. Martin

The authors thank the tremendous contribution of all members of the ProtecT study research group, and especially the following who were involved in this research (Athene Lane, Prasad Bollina, Sue Bonnington, Lynn Bradshaw, James Catto, Debbie Cooper, Michael Davis, Liz Down, Andrew Doble, Alan Doherty, Garrett Durkan, Emma Elliott, David Gillatt, Pippa Herbert, Peter Holding, Joanne Howson, Mandy Jones, Roger Kockelbergh, Howard Kynaston, Teresa Lennon, Norma Lyons, Hing Leung, Malcolm Mason, Hilary Moody, Philip Powell, Alan Paul, Stephen Prescott, Derek Rosario, Patricia O'Sullivan, Pauline Thompson, Sarah Tidball). They also thank Gemma Marsden, who processed the blood samples at the biorepository, and Dr. Chris Metcalfe for providing statistical advice and the Center National de Genotypage for genotyping the ProtecT samples.

The ProtecT study is funded by the U.K. Health Technology Assessment (HTA) Programme of the NIH Research (HTA 96/20/99; ISRCTN20141297). D.W. Knipe is funded by the Wellcome Trust 4-year studentship (WT099874MA). D.M. Evans, J.P. Kemp, G. Davey-Smith, and R.M. Martin work in an MRC Unit that is supported by the UK Medical Research Council (MC_UU_12013/1-9) and the University of Bristol. D.F. Easton is supported by a Cancer Research UK Grant (C1287/A10118). R.A. Eeles and Z. Kote-Jarai are supported by Cancer Research UK Grant (C5047/A7357) and support from the NIHR to the Biomedical Research Centre at The Institute of Cancer Research and Royal Marsden NHS Foundation Trust. J.F.C. Hamdy, D.E. Neal, and J.L. Donovan are NIHR Senior Investigators. The authors thank the provision of the additional epidemiological data by the NHS R&D Directorate supported Prodigal study and the ProMPT (Prostate Mechanisms of Progression and Treatment) collaboration which is supported by the National Cancer Research Institute (NCRI) formed by the Department of Health, the Medical Research Council, and Cancer Research UK (G0500966/75466).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Thompson
IM
,
Pauler
DK
,
Goodman
PJ
,
Tangen
CM
,
Lucia
MS
,
Parnes
HL
, et al
Prevalence of prostate cancer among men with a prostate-specific antigen level < or = 4.0 ng per milliliter
.
N Engl J Med
2004
;
350
:
2239
46
.
2.
Eeles
RA
,
Al Olama
AA
,
Benlloch
S
,
Saunders
EJ
,
Leongamornlert
DA
,
Tymrakiewicz
M
, et al
Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array
.
Nat Genet
2013
;
45
:
385
91
.
3.
Cheng
I
,
Chen
GK
,
Nakagawa
H
,
He
J
,
Wan
P
,
Laurie
CC
, et al
Evaluating genetic risk for prostate cancer among Japanese and Latinos
.
Cancer Epidemiol Biomarkers Prev
2012
;
21
:
2048
58
.
4.
Duggan
D
,
Zheng
SL
,
Knowlton
M
,
Benitez
D
,
Dimitrov
L
,
Wiklund
F
, et al
Two genome-wide association studies of aggressive prostate cancer implicate putative prostate tumor suppressor gene DAB2IP
.
J Natl Cancer Inst
2007
;
99
:
1836
44
.
5.
Eeles
RA
,
Kote-Jarai
Z
,
Al Olama
AA
,
Giles
GG
,
Guy
M
,
Severi
G
, et al
Identification of seven new prostate cancer susceptibility loci through a genome-wide association study
.
Nat Genet
2009
;
41
:
1116
21
.
6.
Eeles
RA
,
Kote-Jarai
Z
,
Giles
GG
,
Olama
AA
,
Guy
M
,
Jugurnauth
SK
, et al
Multiple newly identified loci associated with prostate cancer susceptibility
.
Nat Genet
2008
;
40
:
316
21
.
7.
Gudmundsson
J
,
Sulem
P
,
Gudbjartsson
DF
,
Blondal
T
,
Gylfason
A
,
Agnarsson
BA
, et al
Genome-wide association and replication studies identify four variants associated with prostate cancer susceptibility
.
Nat Genet
2009
;
41
:
1122
6
.
8.
Gudmundsson
J
,
Sulem
P
,
Manolescu
A
,
Amundadottir
LT
,
Gudbjartsson
D
,
Helgason
A
, et al
Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24
.
Nat Genet
2007
;
39
:
631
7
.
9.
Gudmundsson
J
,
Sulem
P
,
Steinthorsdottir
V
,
Bergthorsson
JT
,
Thorleifsson
G
,
Manolescu
A
, et al
Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes
.
Nat Genet
2007
;
39
:
977
83
.
10.
Sun
J
,
Zheng
SL
,
Wiklund
F
,
Isaacs
SD
,
Li
G
,
Wiley
KE
, et al
Sequence variants at 22q13 are associated with prostate cancer risk
.
Cancer Res
2009
;
69
:
10
5
.
11.
Takata
R
,
Akamatsu
S
,
Kubo
M
,
Takahashi
A
,
Hosono
N
,
Kawaguchi
T
, et al
Genome-wide association study identifies five new susceptibility loci for prostate cancer in the Japanese population
.
Nat Genet
2010
;
42
:
751
4
.
12.
Schumacher
FR
,
Berndt
SI
,
Siddiq
A
,
Jacobs
KB
,
Wang
Z
,
Lindstrom
S
, et al
Genome-wide association study identifies new prostate cancer susceptibility loci
.
Hum Mol Genet
2011
;
20
:
3867
75
.
13.
FitzGerald
LM
,
Kwon
EM
,
Conomos
MP
,
Kolb
S
,
Holt
SK
,
Levine
D
, et al
Genome-wide association study identifies a genetic variant associated with risk for more aggressive prostate cancer
.
Cancer Epidemiol Biomarkers Prev
2011
;
20
:
1196
203
.
14.
Haiman
CA
,
Chen
GK
,
Blot
WJ
,
Strom
SS
,
Berndt
SI
,
Kittles
RA
, et al
Genome-wide association study of prostate cancer in men of African ancestry identifies a susceptibility locus at 17q21
.
Nat Genet
2011
;
43
:
570
3
.
15.
Kote-Jarai
Z
,
Olama
AA
,
Giles
GG
,
Severi
G
,
Schleutker
J
,
Weischer
M
, et al
Seven prostate cancer susceptibility loci identified by a multi-stage genome-wide association study
.
Nat Genet
2011
;
43
:
785
91
.
16.
Nam
RK
,
Zhang
WW
,
Trachtenberg
J
,
Seth
A
,
Klotz
LH
,
Stanimirovic
A
, et al
Utility of incorporating genetic variants for the early detection of prostate cancer
.
Clin Cancer Res
2009
;
15
:
1787
93
.
17.
Thomas
G
,
Jacobs
KB
,
Yeager
M
,
Kraft
P
,
Wacholder
S
,
Orr
N
, et al
Multiple loci identified in a genome-wide association study of prostate cancer
.
Nat Genet
2008
;
40
:
310
5
.
18.
Al Olama
AA
,
Kote-Jarai
Z
,
Giles
GG
,
Guy
M
,
Morrison
J
,
Severi
G
, et al
Multiple loci on 8q24 associated with prostate cancer susceptibility
.
Nat Genet
2009
;
41
:
1058
60
.
19.
Gudmundsson
J
,
Besenbacher
S
,
Sulem
P
,
Gudbjartsson
DF
,
Olafsson
I
,
Arinbjarnarson
S
, et al
Genetic correction of PSA values using sequence variants associated with PSA levels
.
Sci Transl Med
2010
;
2
:
62ra92
.
20.
Lane
JA
,
Hamdy
FC
,
Martin
RM
,
Turner
EL
,
Neal
DE
,
Donovan
JL
. 
Latest results from the UK trials evaluating prostate cancer screening and treatment: the CAP and ProtecT studies
.
Eur J Cancer
2010
;
46
:
3095
101
.
21.
Ohori
M
,
Wheeler
TM
,
Scardino
PT
. 
The New American Joint Committee on Cancer and International Union Against Cancer TNM classification of prostate cancer. Clinicopathologic correlations
.
Cancer
1994
;
74
:
104
14
.
22.
Li
Y
,
Willer
CJ
,
Ding
J
,
Scheet
P
,
Abecasis
GR
. 
MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes
.
Genet Epidemiol
2010
;
34
:
816
34
.
23.
Li
Y
,
Willer
C
,
Sanna
S
,
Abecasis
G
. 
Genotype imputation
.
Annu Rev Genomics Hum Genet
2009
;
10
:
387
406
.
24.
Higgins
JP
,
Thompson
SG
,
Deeks
JJ
,
Altman
DG
. 
Measuring inconsistency in meta-analyses
.
BMJ
2003
;
327
:
557
60
.
25.
Nam
RK
,
Zhang
W
,
Siminovitch
K
,
Shlien
A
,
Kattan
MW
,
Klotz
LH
, et al
New variants at 10q26 and 15q21 are associated with aggressive prostate cancer in a genome-wide association study from a prostate biopsy screening cohort
.
Cancer Biol Ther
2011
;
12
:
997
1004
.
26.
Zackrisson
B
,
Aus
G
,
Lilja
H
,
Lodding
P
,
Pihl
CG
,
Hugosson
J
. 
Follow-up of men with elevated prostate-specific antigen and one set of benign biopsies at prostate cancer screening
.
Eur Urol
2003
;
43
:
327
32
.