Abstract
Background: Genome-wide association studies (GWAS) have produced weak (OR = 1.1–1.5) but significant associations between single nucleotide polymorphisms (SNPs) and prostate cancer. However, these associations may be explained by detection bias caused by SNPs influencing PSA concentration. Thus, in a simulation study, we quantified the extent of bias in the association between a SNP and prostate cancer when the SNP influences PSA concentration.
Methods: We generated 2,000 replicate cohorts of 20,000 men using real-world estimates of prostate cancer risk, prevalence of carrying ≥1 minor allele, PSA concentration, and the influence of a SNP on PSA concentration. We modeled risk ratios (RR) of 1.00, 1.25, and 1.50 for the association between carrying ≥1 minor allele and prostate cancer. We calculated mean betas from the replicate cohorts and quantified bias under each scenario.
Results: Assuming no association between a SNP and prostate cancer, the estimated mean bias in betas ranged from 0.02 to 0.10 for ln PSA being 0.05 to 0.20 ng/mL higher in minor allele carriers; the mean biased RRs ranged from 1.03 to 1.11. Assuming true RRs = 1.25 and 1.50, the biased RRs were as large as 1.39 and 1.67, respectively.
Conclusion: Estimates of the association between SNPs and prostate cancer can be biased to the magnitude observed in published GWAS, possibly resulting in type I error. However, large associations (RR > 1.10) may not fully be explained by this bias.
Impact: The influence of SNPs on PSA concentration should be considered when interpreting results from GWAS on prostate cancer. Cancer Epidemiol Biomarkers Prev; 24(1); 88–93. ©2014 AACR.
Introduction
Genome-wide association studies (GWAS) have produced weak but significant associations between SNPs and prostate cancer incidence, with ORs ranging from approximately 1.10 to greater than 1.50 (1, 2). As SNPs have been shown to be correlated with serum concentration of PSA, a common tool for prostate cancer screening, among men without prostate cancer (3–8), these associations, in part or whole, may be due to detection bias. Most of the 70+ SNPs identified from prostate cancer GWAS have not been located in regions containing genes or are intronic (9), and thus, there was no a priori expectation about links between these SNPs and PSA concentration (10). Parsing out true associations between SNPs and prostate cancer risk from those associations caused by detection bias is necessary given the call for increasing the specificity of prostate cancer screening through the addition of genetic risk profiles (1, 11, 12). Thus, a determination of the likelihood and extent of possible bias in associations between SNPs and prostate cancer in GWAS is needed.
Given the infeasibility of collecting data in the current era on large populations where PSA is never utilized as a screening tool for prostate cancer, we simulated cohorts of men at risk for prostate cancer to estimate the bias resulting from the influence of a SNP on PSA concentration in GWAS on prostate cancer.
Materials and Methods
We simulated cohorts of men at risk for prostate cancer consisting of three age strata, 50 to 59 years old, 60 to 69 years old, and 70 to 79 years old, weighted using 2009 estimates from the United States Census Bureau (13). We set the 10-year age-specific prostate cancer risk to 2%, 6%, and 8% for men 50 to 59, 60 to 69, and 70 to 79 years old, respectively, based on Surveillance, Epidemiology, and End Results Program (SEER) data for non-Hispanic white men, yielding an overall 10-year observed cumulative risk of 4.4%. However, we additionally assumed this risk represented only 85% of men who truly have prostate cancer, based on estimates that 15% of men with prostate cancer have undetectable disease (14). We made this assumption across all three age strata and for both the major and minor alleles. We incorporated this real-world lack of sensitivity by modeling the simulated risk of prostate cancer as a function of the 10-year age-specific risk and the detection rate (i.e., divided the risk by 85%). Because the SNPs identified from GWAS have been associated similarly with more and less aggressive disease, we did not specify stage or Gleason sum in the simulations.
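The detection adjustment described above amounts to a single division. The following is a minimal sketch in Python, applied to the overall observed 10-year risk of 4.4%; the function name `true_risk` is hypothetical and is not taken from the original analysis.

```python
def true_risk(observed_risk, detection_rate=0.85):
    """Risk of truly having prostate cancer, given the observed (detected) risk.

    Assumes, as in the text, that observed risk = true risk * detection rate.
    """
    if not 0.0 < detection_rate <= 1.0:
        raise ValueError("detection_rate must be in (0, 1]")
    return observed_risk / detection_rate


# Overall observed 10-year cumulative risk of 4.4% with an 85% detection rate
overall_true_risk = true_risk(0.044)
print(f"{overall_true_risk:.3f}")  # about 0.052
```

The same function could be applied to each age-specific risk in turn, because the 85% detection rate was assumed to be identical across strata.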
We assumed X was a SNP, and its effect on prostate cancer incidence (path “a” in Fig. 1) and PSA (path “b” in Fig. 1) was independent of all other known and unknown risk factors. The independent effect of X on both prostate cancer and PSA was modeled under a dominant model such that the value 1 indicated that a man carried ≥1 minor allele and 0 indicated that a man was homozygous for the major allele; carrier status was drawn from a random Bernoulli distribution. We natural log-transformed serum PSA concentration (ln PSA) for all simulations, but report back-transformed values. Age-specific ln PSA distributions were based on the joint distributions of prostate cancer status and SNP X. Under the scenario where the SNP has no independent effect on PSA concentration in men without prostate cancer, we selected values for ln PSA from a random normal distribution with a mean and SD based on National Health and Nutrition Examination Survey 2001–2008 data for non-Hispanic white men without a diagnosis of prostate cancer (15). Under the same SNP–PSA association scenario for men with prostate cancer, we selected values for ln PSA from a random normal distribution with a mean and SD based on data from white patients with prostate cancer in the Johns Hopkins Prostate Cancer Recurrence Cohort (16). For scenarios where the SNP had an effect on PSA concentration, we simulated the values for PSA concentration for men carrying ≥1 minor allele based on data from Wiklund and colleagues (3). We assumed the prevalences of all other causes of increased or decreased PSA, such as obesity, were uniform across the three age strata and between carriers and noncarriers of the minor allele.
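The sampling steps in this paragraph can be sketched as follows. This is an illustrative sketch only: the function name, the seed handling, and in particular the SD of 0.7 are assumptions made for the example (the source does not report the SD it used), and the baseline mean is taken as the natural logarithm of a median PSA of 1.09 ng/mL.

```python
import math
import random


def simulate_ln_psa(n, prevalence, delta, mean_ln_psa, sd_ln_psa, seed=0):
    """Draw (carrier, ln_psa) pairs under a dominant model.

    carrier ~ Bernoulli(prevalence); ln PSA ~ Normal(mean + delta * carrier, sd).
    """
    rng = random.Random(seed)
    men = []
    for _ in range(n):
        carrier = 1 if rng.random() < prevalence else 0
        ln_psa = rng.gauss(mean_ln_psa + delta * carrier, sd_ln_psa)
        men.append((carrier, ln_psa))
    return men


# Hypothetical inputs: SD of 0.7 is an assumption, not a value from the study.
cohort = simulate_ln_psa(
    n=20_000, prevalence=0.30, delta=0.20,
    mean_ln_psa=math.log(1.09), sd_ln_psa=0.7, seed=1,
)
carriers = [x for c, x in cohort if c == 1]
noncarriers = [x for c, x in cohort if c == 0]
shift = sum(carriers) / len(carriers) - sum(noncarriers) / len(noncarriers)
print(f"simulated difference in mean ln PSA: {shift:.3f}")
```

With a cohort of this size the simulated difference in mean ln PSA should fall close to the specified Δ, which is a quick check that the carrier shift was applied as intended.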
Forty-five scenarios were generated with the prevalence of carrying ≥1 minor allele set to 0.10, 0.30, and 0.50, and risk ratios (RR, path “a” in Fig. 1) equal to 1.00, 1.25, and 1.50. For each prevalence and RR combination, we assumed that ln PSA (Δ, path “b” in Fig. 1) was 0, 0.05, 0.10, 0.15, and 0.20 ng/mL higher in carriers of the minor allele than in noncarriers, based on the range of effects reported by Wiklund and colleagues among controls (3). Replicate cohorts (2,000) were created for each scenario. The total sample size for each replicate cohort was set to 20,000 with 49% (n = 9,800) of the men in the 50- to 59-year-old age stratum, 33% (n = 6,600) in the 60- to 69-year-old age stratum, and 18% (n = 3,600) in the 70- to 79-year-old age stratum. We assumed there were no losses to follow-up, and the SNP was in Hardy–Weinberg equilibrium. For all scenarios, we assumed that if a man had prostate cancer and a PSA concentration ≥4 ng/mL, he would be biopsied, and that biopsy would show prostate cancer (i.e., no sampling error).
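The scenario grid and stratum sizes described above can be enumerated directly. The sketch below uses only values stated in the text; the variable and function names are hypothetical.

```python
from itertools import product

PREVALENCES = (0.10, 0.30, 0.50)          # prevalence of carrying >=1 minor allele
TRUE_RRS = (1.00, 1.25, 1.50)             # path "a"
DELTAS = (0.0, 0.05, 0.10, 0.15, 0.20)    # shift in ln PSA for carriers, path "b"

# Every combination of prevalence, true RR, and delta is one scenario.
scenarios = list(product(PREVALENCES, TRUE_RRS, DELTAS))

COHORT_SIZE = 20_000
AGE_WEIGHTS = {"50-59": 0.49, "60-69": 0.33, "70-79": 0.18}
stratum_sizes = {age: round(COHORT_SIZE * w) for age, w in AGE_WEIGHTS.items()}

PSA_CUTOFF = 4.0  # ng/mL


def is_detected(has_cancer, psa):
    """Detection rule from the text: cancer present and PSA at or above the cutoff."""
    return has_cancer and psa >= PSA_CUTOFF


print(len(scenarios), stratum_sizes)
```

Each of the 45 tuples would then be run for 2,000 replicate cohorts, as stated in the text.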
The βestimate and RR for the association between the SNP and prostate cancer were estimated for the 2,000 replicate cohorts under each of the 45 scenarios using PROC GLM (SAS v 9.1, Cary, NC). We report the estimate of the bias, defined as the mean difference between the natural logarithm of each estimated RR (βestimate) and the natural logarithm of the true RR (βtruth). We also report the mean 10-year cumulative risk of prostate cancer and the mean ln PSA for each combination of minor allele carrier/noncarrier and case/control status and the effect of the SNP on PSA.
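The bias definition above can be written out as a short calculation. This sketch computes a crude RR from cohort counts rather than reproducing the SAS model fit reported in the text, so it is a stand-in for illustration; the function names and the example numbers are hypothetical.

```python
import math


def risk_ratio(cases_carriers, n_carriers, cases_noncarriers, n_noncarriers):
    """Crude risk ratio comparing minor allele carriers with noncarriers."""
    return (cases_carriers / n_carriers) / (cases_noncarriers / n_noncarriers)


def mean_bias(estimated_rrs, true_rr):
    """Mean of ln(estimated RR) over replicates minus ln(true RR)."""
    betas = [math.log(rr) for rr in estimated_rrs]
    return sum(betas) / len(betas) - math.log(true_rr)


# Hypothetical replicate results for a scenario with a true RR of 1.00
replicate_rrs = [
    risk_ratio(290, 6_000, 620, 14_000),
    risk_ratio(301, 6_000, 615, 14_000),
    risk_ratio(284, 6_000, 630, 14_000),
]
print(f"mean bias on the ln scale: {mean_bias(replicate_rrs, 1.00):.3f}")
```

Averaging on the ln scale, as the text describes, keeps the bias symmetric around zero for over- and underestimates.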
Results
When carrying ≥1 minor allele of the SNP was simulated to have no effect on the PSA distribution (Δ = 0), as expected there was no bias in the estimates of the RR (1.00, 1.25, 1.50) at each minor allele carrier prevalence (0.10, 0.30, 0.50; Table 1). The PSA distribution and the 10-year cumulative risk of prostate cancer (Table 2) were also unchanged from the true values, and the overall risk matched the risk observed in SEER data for non-Hispanic white men, 4.4%. In contrast, when the SNP influenced ln PSA (Δ = 0.05, 0.10, 0.15, 0.20; Table 1), the estimated RRs were biased for all minor allele prevalences. In addition, the 10-year cumulative risk of prostate cancer was slightly increased when the SNP was simulated to influence ln PSA (Table 2).
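The mechanism behind this bias can be illustrated analytically for the null case. The sketch below is not the simulation used in the study: it assumes ln PSA in men with prostate cancer is normally distributed around the natural logarithm of 6.31 ng/mL (the median reported in the Results for noncarrier cases), that carriers with cancer are shifted by the same Δ, and that the SD is 0.44. That SD is an assumption chosen only so that roughly 85% of noncarrier cases exceed the 4 ng/mL cutoff; the source does not report it.

```python
import math

PSA_CUTOFF = 4.0          # ng/mL, biopsy threshold from the text
MEDIAN_PSA_CASES = 6.31   # ng/mL, noncarriers with prostate cancer (from the text)
SD_LN_PSA_CASES = 0.44    # ASSUMPTION: not reported in the source


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def detection_probability(delta):
    """P(PSA >= cutoff) among men with cancer whose mean ln PSA is shifted by delta."""
    mean_ln_psa = math.log(MEDIAN_PSA_CASES) + delta
    z = (mean_ln_psa - math.log(PSA_CUTOFF)) / SD_LN_PSA_CASES
    return normal_cdf(z)


def observed_rr_under_null(delta):
    """With equal true risk, the observed RR is the ratio of detection probabilities."""
    return detection_probability(delta) / detection_probability(0.0)


for delta in (0.0, 0.05, 0.10, 0.15, 0.20):
    rr = observed_rr_under_null(delta)
    print(f"delta={delta:.2f}  observed RR={rr:.3f}  bias={math.log(rr):.3f}")
```

Under these assumptions the observed RR is exactly 1 when Δ = 0 and rises with Δ, which mirrors the qualitative pattern described above; the exact values depend on the assumed SD.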
Assuming no association between the SNP and prostate cancer, we observed a minor increase from 4.4% to 4.5% in the mean 10-year cumulative risk of prostate cancer when the prevalence of carrying ≥1 minor allele was 0.10 and the SNP was assumed to have the greatest effect on ln PSA, Δ = 0.20 (Table 2). There was a slight increase in prostate cancer risk with an increase in the prevalence of carrying ≥1 minor allele; at a prevalence of 0.50, the 10-year cumulative risk of prostate cancer was observed to be 4.6% (Table 2). The 10-year cumulative risk of prostate cancer depended on the prevalence of carrying ≥1 minor allele and the true effect of the SNP on PSA, but did not depend on the true association between the SNP and prostate cancer risk. For all prevalences of carrying ≥1 minor allele, the bias in association between the SNP and prostate cancer risk increased as the SNP's effect on PSA distribution increased, with the maximum bias equal to 0.10 and an observed mean RR equal to 1.11 (Table 1).
Under the scenario where the true RR equaled 1.25 and prevalence of carrying ≥1 minor allele equaled 0.10, the mean bias was 0.10 when the effect of the SNP on PSA was Δ = 0.20 (Table 1). Holding Δ constant, this bias increased as the prevalence of carrying ≥1 minor allele increased. When the true RR equaled 1.25, the largest bias (0.10) occurred when the effect of the SNP on ln PSA was Δ = 0.20 for all prevalences of ≥1 minor allele.
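Because bias is reported on the ln scale, it converts to an observed RR by exponentiation. The following sketch shows the conversion for the bias of 0.10 reported above; the function name is hypothetical, and the results only approximate the reported mean RRs because the mean of exponentiated betas is not identical to the exponential of the mean beta.

```python
import math


def biased_rr(true_rr, bias):
    """Observed RR implied by a true RR and an additive bias on the ln scale."""
    return math.exp(math.log(true_rr) + bias)


for true_rr in (1.00, 1.25):
    print(f"true RR={true_rr:.2f}  bias=0.10  observed RR={biased_rr(true_rr, 0.10):.2f}")
```

The same additive bias therefore inflates a larger true RR by a larger absolute amount, even though the multiplicative inflation is unchanged.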
Bias similar to the other scenarios occurred when the true RR equaled 1.50 for all prevalences of carrying ≥1 minor allele. Holding Δ constant, bias increased as the simulated prevalence of carrying ≥1 minor allele increased. Although not the most biased estimate, the largest observed RR occurred under the scenario of a true RR = 1.50, prevalence of ≥1 minor allele = 0.50, and Δ = 0.20 (Table 3).
In men without prostate cancer who carried ≥1 minor allele, the mean of the median PSA concentration increased from 1.09 to 1.32 ng/mL as Δ in PSA between carriers and noncarriers of a risk allele increased from 0 to 0.20 for all combinations of the prevalence of carrying ≥1 minor allele (0.10, 0.30, and 0.50) and true RR (1.00, 1.25, and 1.50; Table 4). For men with prostate cancer who did not carry a minor allele, the mean of the median PSA concentration under all scenarios was equal to 6.31 or 6.32 ng/mL (Table 4). Of note, the mean of the median PSA and mean number of men detected with prostate cancer increased only slightly as the effect of the SNP on PSA increased (data not shown).
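Since Δ is applied to ln PSA, it acts as a multiplicative factor on the back-transformed median. A minimal sketch of that back-transformation, with a hypothetical function name, is given below; it is a back-of-the-envelope check rather than a reproduction of the simulated medians, so small differences from the reported values are expected.

```python
import math


def shifted_median_psa(baseline_median, delta):
    """Median PSA (ng/mL) after adding delta to ln PSA."""
    return baseline_median * math.exp(delta)


for delta in (0.0, 0.05, 0.10, 0.15, 0.20):
    print(f"delta={delta:.2f}  median PSA={shifted_median_psa(1.09, delta):.2f} ng/mL")
```

For Δ = 0.20 this gives a value close to the 1.32 ng/mL reported above, corresponding to roughly a 22% increase in median PSA.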
Discussion
In this simulation, the magnitude of detection bias in the association between a SNP and prostate cancer risk was related to the extent of that SNP's influence on PSA concentration and the size of the true association between the SNP and prostate cancer. The largest departure from the true RR was observed when the true SNP–prostate cancer association was null and the SNP had the greatest effect on PSA (Δ = 0.20), irrespective of the prevalence of carrying ≥1 minor allele. The bias in the observed RR decreased as the true association between the SNP and prostate cancer risk increased, irrespective of the prevalence of carrying ≥1 minor allele.
We observed biased RRs when we simulated true associations of 1.00 and larger, similar to and slightly greater in magnitude than the range of observed RRs for 70+ prostate cancer risk alleles identified in GWAS (1, 2, 9, 10). These results indicate that RRs observed in GWAS could be due partially or completely to PSA's role in the detection of asymptomatic prostate cancer. It is estimated that at least 40% of the variability in total PSA is due to heritable factors, perhaps a greater effect than age (17). It has been shown that the frequency of genetic variants elevating PSA is greater among men with normal prostates who undergo biopsy than among population controls, providing evidence that PSA elevation caused by genetic variation increases the probability of being biopsied (4). Our results are in line with these observations, and together they indicate that a genetically driven increase in PSA can lead to an increased frequency of biopsies and a resulting increase in detection of prostate cancer. We incorporated imperfect sensitivity of PSA screening for the detection of prostate cancer into this simulation and assumed the same sensitivity in all scenarios. We did not incorporate imperfect sensitivity of biopsy. However, we acknowledge that biopsy sensitivity may differ by tumor size, prostate volume, and genetic variant.
Of the previously identified risk SNPs, several have been found to be associated with elevated PSA concentration (3, 5–8, 18–23). It is likely that any increase in prostate cancer risk observed for these SNPs is driven partially or completely by the SNP's effect on PSA concentration (4); our simulation results show that even minor changes in PSA distribution could result in RRs of the magnitude seen for these risk loci in the prostate cancer GWAS.
The benefit of screening for elevated PSA on prostate cancer mortality is still controversial (24) given differing results of the U.S. and European randomized trials (25, 26). In addition, the sensitivity and specificity of PSA screening are imperfect; men below the traditional cutoff of 4.0 ng/mL have been shown to have a high prevalence of prostate cancer, including high-grade disease, when biopsied without indication, and men well above the cutoff have been found not to have cancer on biopsy (14). Thus, risk prediction using polymorphisms along with PSA concentration may be an appealing solution.
For genetic profiles to have utility in prostate cancer risk prediction, it is important not only to understand the extent of the bias created by PSA screening, but also to explore how this bias may arise. One of the hypothesized pathways we simulated was the SNP as a cause of prostate cancer (RR ≠ 1.00; path “a” on Fig. 1). That some of the prostate cancer risk SNPs identified in GWAS are independent of PSA concentration among controls suggests some SNPs may be causal. However, our simulation results suggest some SNPs identified in GWAS that are associated with PSA concentration may appear to be associated with prostate cancer risk solely because they influence PSA concentration (Δ ≠ 0, path “b” on Fig. 1). In the simulation scenario where the true association is null (RR = 1.00; path “a” on Fig. 1), the observed association between the SNP and prostate cancer was completely due to detection bias.
A combination of cause and detection bias forms another plausible hypothesis, where the SNP is itself a direct cause of prostate cancer and is also a direct cause of PSA concentration (RR ≠ 1.00 and Δ ≠ 0; paths “a” and “b” on Fig. 1). Similar to the second hypothesis, this observed association is being driven by an increased propensity of men with elevated PSA to be biopsied and thus have a greater opportunity to have occult prostate cancer diagnosed, leading to inflation in the observed association between the SNP and prostate cancer. Our simulation results indicate that although the absolute bias is reduced in this scenario, measures of association are being overestimated when the association between the SNP and PSA is not considered.
A fourth emerging hypothesis places the SNP as a direct cause of PSA elevation, and the resulting elevated PSA concentration as a direct cause of prostate cancer; the SNP becomes an indirect cause of prostate cancer. In this model (Fig. 2), the SNP also produces an increase in biopsies and higher likelihood of being diagnosed with prostate cancer. This hypothesis represents a shift from the traditional status of PSA as a biomarker to a more involved component of either the initiation or promotion of prostate cancer (27). However, we were unable to explore this relationship in this simulation study due to the lack of knowledge about this causal hypothesis and our inability to parameterize this scenario.
Given that prostate cancer is likely overdetected (25, 26), that is, many of the cancers detected by PSA screening never would have caused morbidity and mortality, and is thus overtreated, we must attempt to increase the specificity of the PSA test. That SNPs previously identified as prostate cancer risk factors have also been shown to be independently associated with PSA concentration suggests that their associations may be due to detection bias. In this simulation, we showed that a SNP that influences PSA concentration, but is not truly associated with prostate cancer, could appear to be associated with as much as a 10% increase in the risk of prostate cancer; in this scenario, 100% of the association is due to bias. We focused on the role type I error plays in GWAS, but we acknowledge that not observing true associations could be a result of type II error. For this analysis, we did not set out to explore the impact of type II error and therefore did not address the so-called winner's curse of replication studies (28–30).
In conclusion, the potential for detection bias due to the influence of SNPs on PSA concentration should be considered before attempting to increase the specificity of PSA screening by including genetic risk profiles from GWAS.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Disclaimer
The content of this work is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Authors' Contributions
Conception and design: P.J. Dluzniewski, E.A. Platz
Development of methodology: P.J. Dluzniewski, I. Ruczinski, E.A. Platz
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): P.J. Dluzniewski, J. Xu, I. Ruczinski, W.B. Isaacs, E.A. Platz
Writing, review, and/or revision of the manuscript: P.J. Dluzniewski, J. Xu, I. Ruczinski, W.B. Isaacs, E.A. Platz
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): P.J. Dluzniewski, E.A. Platz
Study supervision: E.A. Platz
Grant Support
This study was supported by NIH grant R01 CA140262 (to Drs. J. Xu and E.A. Platz). P.J. Dluzniewski was supported by NIH grant T32 CA009314 (to Dr. E.A. Platz).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.