Abstract
Several blood protein biomarkers have been associated with prostate cancer risk. However, most studies assessed only a small number of biomarkers and/or included a small sample size. To identify novel protein biomarkers of prostate cancer risk, we studied 79,194 cases and 61,112 controls of European ancestry, included in the PRACTICAL/ELLIPSE consortia, using genetic instruments of protein quantitative trait loci for 1,478 plasma proteins. A total of 31 proteins were associated with prostate cancer risk including proteins encoded by GSTP1, whose methylation level was shown previously to be associated with prostate cancer risk, and MSMB, SPINT2, IGF2R, and CTSS, which were previously implicated as potential target genes of prostate cancer risk variants identified in genome-wide association studies. A total of 18 proteins inversely correlated and 13 positively correlated with prostate cancer risk. For 28 of the identified proteins, gene somatic changes of short indels, splice site, nonsense, or missense mutations were detected in patients with prostate cancer in The Cancer Genome Atlas. Pathway enrichment analysis showed that relevant genes were significantly enriched in cancer-related pathways. In conclusion, this study identifies 31 candidates of protein biomarkers for prostate cancer risk and provides new insights into the biology and genetics of prostate tumorigenesis.
Integration of genomics and proteomics data identifies biomarkers associated with prostate cancer risk.
Introduction
Prostate cancer is the second most frequently diagnosed malignancy and the fifth leading cause of cancer-related mortality among males worldwide (1). In the United States, there were 164,690 estimated new prostate cancer cases and 29,430 estimated deaths due to prostate cancer in 2018, making it a malignancy with the highest incidence and second highest mortality in males (2). The survival rate is higher when cancer is diagnosed at a localized stage while it drops substantially when prostate cancer is diagnosed at a metastatic stage (3). Biomarkers are needed for screening and the early detection of prostate cancer. PSA has been used widely for prostate cancer screening (4, 5); however, there are controversies in using PSA screening due to the lack of a clear cut-off point for high sensitivity and specificity (6–8), unclear benefit in reducing mortality in some populations (9–11), and overdiagnosis of prostate cancer (12). Thus, there is a critical need to identify additional screening biomarkers aiming to reduce the mortality of prostate cancer.
Several other protein biomarkers measured in blood have been reported to be potentially associated with prostate cancer risk, such as IGF-1, IGFBP1/2, and IL6 (13–16). However, findings have been inconsistent from previous studies. Most existing studies have assessed only a small number of candidates. With the recent development of proteomics technology, there have been several studies searching the whole proteome to identify novel biomarkers for prostate cancer early detection and diagnosis (17–20). These studies have generated some promising findings. However, these have only included a relatively small number of subjects as it is expensive to profile the proteome in a large population-based study. More importantly, there are multiple limitations that are commonly encountered in conventional epidemiologic studies, including selection bias, potential confounding, and reverse causation. These limitations may explain some of the inconsistent results from previous studies.
To reduce these biases, we used genetic variants associated with blood protein levels as the instruments to assess the associations between genetically predicted protein levels and prostate cancer risk. Because of the random assortment of alleles transferred from parents to offspring during gamete formation, this approach should be less susceptible to selection bias, reverse causation, and confounding effects. Over the past few years, genome-wide association studies (GWAS) have identified hundreds of protein quantitative trait loci (pQTL; refs. 21, 22). With a large sample size, many of these genetic variants can serve as strong instrumental variables for evaluating the associations of genetically predicted protein levels with prostate cancer risk. Herein, we report results from the first large study investigating the associations between genetically predicted blood protein levels and prostate cancer risk using genetic instruments. We used the data from 79,194 cases and 61,112 controls of European descent included in GWAS consortia PRACTICAL, CRUK, CAPS, BPC3, and PEGASUS, as described previously (23).
Materials and Methods
A literature search was performed to identify the GWAS that uncovered genetic variants that were significantly associated with protein levels. After careful evaluation, we selected the study conducted by Sun and colleagues, which represents the largest and most comprehensive study to date (24). By using the data from two subcohorts of 2,731 and 831 healthy European ancestry participants from the INTERVAL study, Sun and colleagues identified 1,927 genetic associations with 1,478 proteins at a stringent significance level (24). The detailed information of this study has been described elsewhere (24). In brief, an aptamer-based multiplex protein assay (SOMAscan) was used to quantify 3,620 plasma proteins. The robustness of the protein measurements was verified using several methods (24). Genotypes were measured using the Affymetrix Axiom UK Biobank array and were further imputed using a combined reference panel from 1000 Genomes and UK10K. pQTL analyses were performed within each subcohort, with adjustments for age, sex, duration between blood draw and processing, and the first three principal components. After combining the association results from the two subcohorts via fixed-effects inverse-variance meta-analysis using METAL, the genetic associations between 1,927 variants and 1,478 proteins showed a meta-analysis P < 1.5 × 10−11, a consistent direction of effect, and nominal significance (P < 0.05). These pQTLs were used to construct the instrumental variables for assessing associations between protein levels and the risk of developing prostate cancer. When two or more variants located on the same chromosome were identified to be associated with a particular protein, we assessed the correlations of the SNPs using the Pairwise LD function of SNiPA (http://snipa.helmholtz-muenchen.de/snipa/index.php?task=pairwise_ld). For each protein, only SNPs independent of each other, as defined by r2 < 0.1 (based on 1000 Genomes Project phase III version 5 data focusing on European populations), were used to construct the instruments.
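The independence filter described above can be illustrated with a minimal sketch. The text does not state how one SNP was chosen among correlated SNPs, so the greedy, priority-ordered selection below is an assumption, as are the function name `prune_by_ld`, the SNP IDs, and the r2 values, which are hypothetical. Pairs missing from the lookup (for example, SNPs on different chromosomes) are treated as uncorrelated.

```python
def prune_by_ld(snps, r2, threshold=0.1):
    """Greedily keep SNPs whose pairwise r^2 with every kept SNP is below threshold.

    snps: SNP IDs in priority order (assumption: e.g., strongest pQTL first).
    r2:   dict mapping frozenset({snp_a, snp_b}) -> pairwise r^2;
          pairs absent from the dict are treated as r^2 = 0.
    """
    kept = []
    for snp in snps:
        # Keep the SNP only if it is independent of all SNPs already kept.
        if all(r2.get(frozenset((snp, k)), 0.0) < threshold for k in kept):
            kept.append(snp)
    return kept


# Hypothetical SNP IDs and r^2 values, for illustration only.
snps = ["rsA", "rsB", "rsC"]
r2 = {
    frozenset(("rsA", "rsB")): 0.85,  # strongly correlated pair
    frozenset(("rsA", "rsC")): 0.02,
    frozenset(("rsB", "rsC")): 0.03,
}
independent = prune_by_ld(snps, r2)
print(independent)
```

Here `rsB` is dropped because its r2 with the already-kept `rsA` exceeds the 0.1 threshold, while `rsC` is retained.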
We used the summary statistics data for the association of genetic variants with prostate cancer risk that were generated from 79,194 prostate cancer cases and 61,112 controls of European ancestry in the consortia PRACTICAL, CRUK, CAPS, BPC3, and PEGASUS (23, 25). In brief, 46,939 prostate cancer cases and 27,910 controls were genotyped using OncoArray, which included 570,000 SNPs (http://epi.grants.cancer.gov/oncoarray/). Also included were data from several previous prostate cancer GWAS of European ancestry: UK stage 1 and stage 2; CaPS 1 and CaPS 2; BPC3; NCI PEGASUS; and iCOGS. These genotype data were imputed using the June 2014 release of the 1000 Genomes Project data as a reference. Logistic regression summary statistics were then meta-analyzed using an inverse variance fixed effect approach.
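As a rough illustration of the inverse variance fixed-effect meta-analysis mentioned above, the sketch below pools per-study log-OR estimates for a single SNP. It is not the consortium's pipeline; the function name `fixed_effect_meta` and the input numbers are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-study effect estimates with inverse-variance weights.

    betas: per-study beta coefficients (log ORs) for one SNP.
    ses:   matching standard errors.
    Returns the pooled beta and its standard error.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se


# Hypothetical estimates from two studies with equal precision.
pooled_beta, pooled_se = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
print(pooled_beta, pooled_se)
```

With equal standard errors, the pooled beta is the simple average of the two study estimates, and the pooled SE is smaller than either study's SE.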
For estimating the association between genetically predicted circulating protein levels and prostate cancer risk, the inverse variance weighted (IVW) method, using summary statistics results, was used (26). The beta coefficient of the association between genetically predicted protein levels and prostate cancer risk was estimated using $\sum_i \beta_{i,GX}\,\beta_{i,GY}\,\sigma_{i,GY}^{-2} \big/ \sum_i \beta_{i,GX}^{2}\,\sigma_{i,GY}^{-2}$, and the corresponding SE was estimated using $1 \big/ \sqrt{\sum_i \beta_{i,GX}^{2}\,\sigma_{i,GY}^{-2}}$. Here, $\beta_{i,GX}$ represents the beta coefficient of the association between the ith SNP and the protein of interest generated from the pQTL study by Sun and colleagues; $\beta_{i,GY}$ and $\sigma_{i,GY}$ represent the beta coefficient and SE, respectively, for the association between the ith SNP and prostate cancer risk in the prostate cancer GWAS. The association OR, confidence interval (CI), and P value were then estimated on the basis of the calculated beta coefficient and SE. A Benjamini–Hochberg FDR of <0.05 was used to adjust for multiple comparisons. Furthermore, to evaluate whether the identified associations between genetically predicted circulating protein levels and prostate cancer risk were independent of association signals identified in GWAS, we performed conditional analyses, adjusting for the closest risk SNPs identified in previous GWAS or fine-mapping studies. For this analysis, we performed GCTA-COJO analyses (version 1.26.0; refs. 27, 30) to calculate associations of SNPs with prostate cancer risk, after adjusting for the risk SNP of interest. We then reran the IVW analyses using the association estimates generated from conditional analyses.
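The IVW calculation and the Benjamini–Hochberg adjustment described above can be sketched as follows. This is a minimal illustration that assumes the standard summary-statistics IVW estimator and a two-sided Wald test for the P value, not the authors' actual code. All function names and input values are hypothetical.

```python
import math


def ivw_estimate(beta_gx, beta_gy, se_gy):
    """Summary-statistics IVW estimate of the protein-outcome association.

    beta_gx: SNP-protein betas; beta_gy, se_gy: SNP-outcome betas and SEs.
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_gx, beta_gy, se_gy))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_gx, se_gy))
    return num / den, 1.0 / math.sqrt(den)


def or_ci_p(beta, se, z_crit=1.959964):
    """Convert a log-OR and SE to OR, 95% CI, and two-sided Wald P value."""
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return (math.exp(beta), math.exp(beta - z_crit * se),
            math.exp(beta + z_crit * se), p)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical two-SNP instrument for one protein.
beta, se = ivw_estimate([0.5, 0.4], [-0.05, -0.04], [0.01, 0.01])
odds_ratio, ci_low, ci_high, p_value = or_ci_p(beta, se)
print(odds_ratio, ci_low, ci_high, p_value)

# Hypothetical P values for four proteins.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

For a protein with a single pQTL, the estimator reduces to the ratio of the SNP-outcome beta to the SNP-protein beta.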
For each gene encoding a protein identified in our study as associated with prostate cancer risk, we evaluated genetic variants/mutations/indels in prostate tumor tissues from patients with prostate cancer included in The Cancer Genome Atlas (TCGA). The somatic level genetic changes were analyzed using MuTect (31) and deposited to the TCGA data portal. Data were retrieved in April 2016, through the data portal. We assessed whether the proportion of assessed genes containing such somatic level genetic events was enriched when compared with the proportion of all protein-coding genes across the genome. This analysis was performed using MedCalc online software.
To further assess whether our identified prostate cancer–associated proteins are enriched in specific pathways, molecular and cellular functions, and networks, we performed an enrichment analysis of the genes encoding identified proteins using Ingenuity Pathway Analysis (IPA) software (32). The detailed methodology of this tool has been described elsewhere (32). In brief, an “enrichment” score (Fisher exact test P value) that measures overlap of observed and predicted regulated gene sets was generated for each of the tested gene sets. The most significant pathways and functions with an enrichment P value less than 0.05 were reported.
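To make the enrichment score concrete, the sketch below computes a one-sided (over-representation) Fisher exact test P value from the hypergeometric distribution. It assumes a right-tailed test and a simple gene-count background; IPA's own background set and implementation are not described here, and the function name and counts are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher exact P value, P(X >= k), for over-representation.

    N: genes in the background; K: background genes in the pathway;
    n: genes in the query list; k: query genes that fall in the pathway.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 2 of 4 query genes hit a 5-gene pathway
# drawn from a 20-gene background.
p_enrich = fisher_enrichment_p(k=2, n=4, K=5, N=20)
print(p_enrich)
```

Under this sketch, a pathway would be reported when its P value falls below 0.05, matching the reporting threshold stated above.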
Data availability
The OncoArray genotype data and relevant covariate information (i.e., ethnicity, country, principal components, etc.) for prostate cancer study are available in dbGAP (Accession no.: phs001391.v1.p1). In total, 47 of the 52 OncoArray studies, encompassing nearly 90% of the individual samples, are available. The previous meta-analysis summary results and genotype data are currently available in dbGAP (Accession no.: phs001081.v1.p1).
Results
Of the pQTLs for 1,478 proteins assessed in this study, association results for prostate cancer risk were available for pQTLs of 1,469 proteins in the prostate cancer GWAS. For 1,106 of these proteins, only a single pQTL was identified. Two pQTLs were identified for 302 proteins and three or more pQTLs were identified for 71 proteins. Using the IVW method, we identified 31 proteins for which genetically predicted levels were associated with prostate cancer risk at an FDR < 0.05 (Tables 1 and 2), including 22 encoded by genes located more than 500 kb away from any reported prostate cancer risk variants identified in GWAS or fine-mapping studies (Table 1). The other nine associated proteins are encoded by genes located at previously reported prostate cancer risk loci (Table 2), including MSMB, SPINT2, IGF2R, and CTSS, which were previously implicated as candidate target genes of prostate cancer risk variants identified in GWAS (33–35). Interestingly, we also observed a significant association for glutathione S-transferase Pi, encoded by GSTP1 (Table 2), whose methylation has been identified as a potential biomarker for prostate cancer (36). In our study, an inverse association between protein level and prostate cancer risk was detected for PSP-94, DcR3, IGF-II receptor, KDEL2, Cathepsin S, ZHX3, ZN175, GPC6, RM33, PIM1, WISP-3, NCF-2, ATF6A, laminin, glutathione S-transferase Pi, GNMT, LRRN1, and SNAB (ORs ranging from 0.69 to 0.97). Conversely, an association between a higher protein level and increased prostate cancer risk was identified for TACT, GRIA4, PDE4D, TIP39, SPINT2, MICB, IL21, ARFP2, RF1ML, TPST1, KLRF1, TM149, and NKp46 (ORs ranging from 1.11 to 1.23).
Table 1. Twenty-two novel protein–prostate cancer associations for proteins whose encoding genes are located at genomic loci at least 500 kb away from any GWAS-identified prostate cancer risk variants
Protein | Protein full name | Protein-encoding gene | Region | Index SNP(s)a | Distance of gene to the index SNP (kb) | Instrument variants | Type of pQTL | OR (95% CI)b | P | FDR P valuec | P value after adjusting for risk SNPd |
---|---|---|---|---|---|---|---|---|---|---|---|
ATF6A | Cyclic AMP–dependent transcription factor ATF-6 alpha | ATF6 | 1q23.3 | rs4845695 | 6,824 | rs8111, rs61738953 | trans, trans | 0.90 (0.86–0.95) | 1.31 × 10−4 | 9.18 × 10−3 | 1.31 × 10−4 |
NCF-2 | Neutrophil cytosol factor 2 | NCF2 | 1q25.3 | rs199774366 | 20,932 | rs4632248, rs28929474 | trans, trans | 0.95 (0.92–0.97) | 9.93 × 10−5 | 7.29 × 10−3 | NA* |
Laminin | Laminin | LAMC1 | 1q25.3 | rs199774366 | 21,377 | rs62199218, rs4129858 | trans, cis | 0.93 (0.89–0.97) | 4.16 × 10−4 | 0.03 | NA* |
RM33 | 39S ribosomal protein L33_mitochondrial | MRPL33 | 2p23.2 | rs13385191 | 7,106 | rs28929474 | trans | 0.93 (0.90–0.96) | 9.61 × 10−6 | 7.43 × 10−4 | 9.42 × 10−6 |
LRRN1 | Leucine-rich repeat neuronal protein 1 | LRRN1 | 3p26.2 | rs2660753 | 83,221 | rs429358, rs6801789 | trans, cis | 0.97 (0.95–0.99) | 7.21 × 10−4 | 0.04 | 7.21 × 10−4 |
TACT | T-cell surface protein tactile | CD96 | 3q13.13–3q13.2 | rs7611694 | 1,891 | rs3132451 | trans | 1.22 (1.16–1.29) | 1.02 × 10−12 | 3.75 × 10−10 | 1.02 × 10−12 |
IL21 | IL21 | IL21 | 4q27 | rs34480284 | 17,469 | rs12368181, rs3129897 | trans, trans | 1.11 (1.06–1.16) | 7.77 × 10−6 | 7.43 × 10−4 | NA* |
PDE4D | cAMP-specific 3_5-cyclic phosphodiesterase 4D | PDE4D | 5q11.2–5q12.1 | rs1482679 | 13,879 | rs3132451 | trans | 1.17 (1.12–1.22) | 1.02 × 10−12 | 3.75 × 10−10 | 1.02 × 10−12 |
GNMT | Glycine N-methyltransferase | GNMT | 6p21.1 | rs4711748 | 763 | rs57736976 | cis | 0.93 (0.89–0.97) | 6.80 × 10−4 | 0.04 | 2.78 × 10−4 |
PIM1 | Serine/threonine-protein kinase pim-1 | PIM1 | 6p21.2 | rs9469899 | 2,345 | rs28929474 | trans | 0.88 (0.83–0.93) | 9.61 × 10−6 | 7.43 × 10−4 | 9.42 × 10−6 |
WISP-3 | WNT1-inducible signaling pathway protein 3 | WISP3 | 6q21 | rs2273669 | 3,090 | rs28929474 | trans | 0.83 (0.77–0.90) | 9.61 × 10−6 | 7.43 × 10−4 | 9.42 × 10−6 |
TPST1 | Protein-tyrosine sulfotransferase 1 | TPST1 | 7q11.21 | rs56232506 | 18,233 | rs313829 | cis | 1.14 (1.06–1.22) | 5.23 × 10−4 | 0.03 | 5.43 × 10−4 |
ARFP2 | Arfaptin-2 | ARFIP2 | 11p15.4 | rs61890184 | 1,045 | rs28929474 | trans | 1.23 (1.12–1.35) | 9.61 × 10−6 | 7.43 × 10−4 | 9.42 × 10−6 |
GRIA4 | Glutamate receptor 4 | GRIA4 | 11q22.3 | rs1800057 | 2,291 | rs3132451 | trans | 1.17 (1.12–1.22) | 1.02 × 10−12 | 3.75 × 10−10 | 1.02 × 10−12 |
KLRF1 | Killer cell lectin-like receptor subfamily F member 1 | KLRF1 | 12p13.31 | rs2066827 | 2,873 | rs11708955, rs62143194 | trans, trans | 1.13 (1.05–1.20) | 5.74 × 10−4 | 0.03 | 5.74 × 10−4 |
GPC6 | Glypican-6 | GPC6 | 13q31.3–13q32.1 | rs9600079 | 20,151 | rs28929474 | trans | 0.81 (0.73–0.89) | 9.61 × 10−6 | 7.43 × 10−4 | 9.42 × 10−6 |
TM149 | IGF-like family receptor 1 | IGFLR1 | 19q13.12 | rs8102476 | 2,502 | rs12459634 | cis | 1.06 (1.02–1.09) | 7.31 × 10−4 | 0.04 | 4.68 × 10−3 |
TIP39 | Tuberoinfundibular peptide of 39 residues | PTH2 | 19q13.33 | rs2659124 | 1,428 | rs375375234 | trans | 1.22 (1.13–1.32) | 3.06 × 10−7 | 4.99 × 10−5 | 2.96 × 10−7 |
ZN175 | Zinc finger protein 175 | ZNF175 | 19q13.41 | rs2735839 | 710 | rs28929474 | trans | 0.91 (0.87–0.95) | 9.61 × 10−6 | 7.43 × 10−4 | 9.42 × 10−6 |
NKp46 | Natural cytotoxicity triggering receptor 1 | NCR1 | 19q13.42 | rs103294 | 620 | rs2278428 | cis | 1.16 (1.06–1.26) | 9.91 × 10−4 | 0.05 | 9.65 × 10−4 |
SNAB | Beta-soluble NSF attachment protein | NAPB | 20p11.21 | rs11480453 | 7,945 | rs429358, rs7658970 | trans, trans | 0.91 (0.86–0.96) | 9.77 × 10−4 | 0.05 | 9.77 × 10−4 |
ZHX3 | Zinc fingers and homeoboxes protein 3 | ZHX3 | 20q12 | rs11480453 | 8,460 | rs1694123 | trans | 0.79 (0.71–0.88) | 9.38 × 10−6 | 7.43 × 10−4 | 9.67 × 10−6 |
NOTE: NA*, the adjacent risk variant is not available in the 1000 Genomes Project data.
aClosest risk variant identified in previous GWAS or fine-mapping studies for prostate cancer risk.
bOR and CI per one SD increase in genetically predicted protein.
cFDR P value, FDR-adjusted P value; associations with an FDR P ≤ 0.05 were considered statistically significant.
dUsing COJO method (27).
Table 2. Nine novel protein–prostate cancer associations for proteins whose encoding genes are located at genomic loci within 500 kb of previous GWAS-identified prostate cancer risk variants
Protein | Protein name | Protein-encoding gene | Region | Index SNP(s)a | Distance of gene to the index SNP (kb) | Instrument variants | Type of pQTL | OR (95% CI)b | P | FDR P valuec | P value after adjusting for risk SNPsd |
---|---|---|---|---|---|---|---|---|---|---|---|
Cathepsin S | Cathepsin S | CTSS | 1q21.3 | rs17599629 | 44 | rs41271951 | cis | 0.91 (0.88–0.95) | 2.73 × 10−7 | 4.99 × 10−5 | 0.16 |
MICB | MHC class I polypeptide-related sequence B | MICB | 6p21.33 | rs2596546 | 133 | rs3134900 | cis | 1.09 (1.05–1.12) | 2.07 × 10−6 | 2.76 × 10−4 | 0.03 |
RF1ML | Peptide chain release factor 1-like_ mitochondrial | MTRF1L | 6q25.2 | rs3968480 | 109 | rs503366 | cis | 1.18 (1.08–1.29) | 4.67 × 10−4 | 0.03 | 0.21 |
IGF-II receptor | Cation-independent mannose-6-phosphate receptor | IGF2R | 6q25.3 | rs651164 | 47 | rs629849 | cis | 0.92 (0.90–0.94) | 3.98 × 10−10 | 9.73 × 10−8 | 9.95 × 10−11 |
PSP-94 | Beta-microseminoprotein | MSMB | 10q11.22 | rs10993994 | 0.002 | rs541781976, rs10993994 | trans, cis | 0.81 (0.80–0.82) | 3.60 × 10−155 | 5.29 × 10−152 | NA* |
Glutathione S-transferase Pi | Glutathione S-transferase P | GSTP1 | 11q13.2 | rs12785905 | 399 | rs1695, rs62143206 | cis, trans | 0.94 (0.91–0.97) | 5.91 × 10−4 | 0.03 | 3.12 × 10−3 |
KDEL2 | KDEL motif-containing protein 2 | KDELC2 | 11q22.3 | rs1800057 | 199 | rs74911261 | cis | 0.89 (0.86–0.93) | 1.83 × 10−8 | 3.85 × 10−6 | 0.42 |
SPINT2 | Kunitz-type protease inhibitor 2 | SPINT2 | 19q13.2 | rs8102476 rs12610267 | 0 | rs71354995 | cis | 1.05 (1.03–1.06) | 1.31 × 10−6 | 1.92 × 10−4 | 0.07 |
DcR3 | Tumor necrosis factor receptor superfamily member 6B | TNFRSF6B | 20q13.33 | rs6062509 | 33 | rs62217798 | cis | 0.69 (0.62–0.77) | 1.98 × 10−11 | 5.81 × 10−9 | 0.05 |
NOTE: NA*, the adjacent risk variant is the corresponding pQTL.
aClosest risk variant(s) identified in previous GWAS or fine-mapping studies for prostate cancer risk.
bOR and CI per one SD increase in genetically predicted protein.
cFDR P value, FDR-adjusted P value; associations with an FDR P ≤ 0.05 were considered statistically significant.
dUsing COJO method (27).
To determine whether the identified significant associations between genetically predicted protein levels and prostate cancer risk were independent of GWAS-identified association signals, we performed conditional analyses adjusting for the GWAS-identified risk SNPs closest to the genes encoding our identified proteins (Tables 1 and 2; ref. 27). For proteins listed in Table 1, the analysis could not be performed for three proteins due to lack of data, and for all other proteins, the associations remained essentially unchanged in the conditional analysis, suggesting these associations may be independent of GWAS-identified association signals. On the other hand, for proteins whose encoding genes are located at known prostate cancer risk loci, except for IGF2R, all other associations were no longer statistically significant when conditioning on GWAS-identified risk SNPs, suggesting these associations may be influenced by GWAS-identified association signals (Table 2).
By analyzing exome-sequencing data of prostate tumor–adjacent normal tissue and tumor tissue obtained from 498 patients with prostate cancer of TCGA, we observed somatic level changes of indels, nonsense mutations, splice site variations, or missense mutations in at least 1 patient for 28 of the 31 genes encoding identified associated proteins (enrichment P < 0.0001 compared with the proportion of all protein-coding genes across the genome; Supplementary Table S1). In addition to the somatic missense mutations detected in 24 genes, indels were detected in four genes (ARFIP2, LRRN1, ZNF175, and PDE4DIP), splice site variations were detected in four genes (IGF2R, IL21, MICB, and PTH2R), and a nonsense mutation was detected in KLRF1 (Supplementary Table S1). Although the majority of these somatic changes occurred in only 1 patient, a missense mutation in PTH2 occurred in 9 patients (1.8%; Supplementary Table S1).
On the basis of the IPA analysis, several cancer-related functions were enriched for the genes encoding the associated proteins identified in this study (Supplementary Table S2). The top canonical pathways identified included STAT3 pathway (P = 4.54 × 10−3), glutathione redox reactions I (P = 0.027), glutathione-mediated detoxification (P = 0.030), endoplasmic reticulum stress pathway (P = 0.031), and tRNA splicing (P = 0.044).
Discussion
This is the first large-scale study to evaluate the associations of genetically predicted protein levels with prostate cancer risk using GWAS-identified pQTLs as instruments. We identified 31 proteins that demonstrated a statistically significant association with prostate cancer risk after FDR correction, including 22 whose encoding genes were located more than 500 Kb away from any reported prostate cancer risk variants. Our study provides novel information to improve the understanding of genetics and etiology for prostate cancer, and generates a list of promising proteins as potential biomarkers for early detection of prostate cancer, the most common malignancy among men in most countries around the world.
In this article, we used data from large GWASs involving 79,194 prostate cancer cases and 61,112 controls. The purpose and approach of this analysis are different from those of the study of Schumacher and colleagues (23). In the GWAS, investigators evaluated each genetic variant across the genome one at a time, aiming to identify novel susceptibility variants showing an association with prostate cancer risk (23). This work aimed to use genetically predicted protein expression levels as the testing unit to identify prostate cancer–associated proteins. We used a protein-based approach that aggregates the effects of several SNPs into one testing unit whenever possible. The analysis unit for our study is proteins, while the analysis unit in GWAS by Schumacher and colleagues (23) is genetic variants.
Previous research suggests that PSA, IGF-1, IGFBP1/2, and IL6 measured in blood may be associated with prostate cancer risk. For PSA, IGFBP1/2, and IL6, there was no corresponding pQTL identified in the study conducted by Sun and colleagues (24); thus, they were not investigated in this study. For IGF-1, by using its pQTL rs74480769 as the instrument, we did not observe a significant association with prostate cancer risk (OR = 0.98; 95% CI, 0.90–1.07; P = 0.70). The inconsistency of this IGF-1 finding with previous studies could be due to either a weak instrument used in this study or potentially confounded estimates of associations in previous studies using a conventional epidemiologic design. Indeed, a significant positive association of IGF-1 was observed in the Health Professionals Follow-up Study (15), but not in the Prostate Cancer Prevention Trial (14). Further research would be needed to better understand the relationship between these proteins and prostate cancer.
In this large study, we identified 22 associated proteins whose encoding genes are located at genomic loci not mapped by any of the previous GWAS. The statistical power in our study is greater than that of GWAS because (i) the number of comparisons is smaller in our study than in GWAS, and thus we could use a less stringent statistical significance threshold than the 5 × 10−8 used in GWAS, and (ii) the predicted protein levels are continuous variables, which improves statistical power. It is worth noting that nine of the proteins identified in this study are encoded by genes located at the GWAS-identified loci. For many of the identified proteins, the genetic instrument includes trans pQTL(s) beyond only cis pQTL(s) (Tables 1 and 2), thus explaining why the corresponding protein-coding genes are not always at known susceptibility loci. In vitro/in vivo studies and human studies have suggested that some of these novel genes may play an important role in prostate tumorigenesis. For example, an interchromosomal interaction between a known prostate cancer risk locus, 8q24, and CD96 was observed by the use of a chromosome conformation capture-based multi-target sequencing technology (37). GPC6 was found to be recurrently altered across tumors of patients with advanced and lethal prostate cancer (38). PDE4D was shown to function as a proliferation-promoting factor in prostate cancer and was overexpressed in human prostate carcinoma (39); its inhibition has been shown to decrease prostate cancer cell growth (40). ATF6, which is related to the unfolded protein response, was observed to be downregulated in high-grade prostatic intraepithelial neoplasia compared with normal prostate samples (41).
Of the nine associated proteins whose encoding genes are located at GWAS-identified prostate cancer risk loci, several have also been found to potentially play functional roles in prostate cancer development. For example, decreased GSTP1 expression was observed to accompany human prostatic carcinogenesis (42). GSTP1 is highly expressed in benign prostate glands but tends not to be expressed in prostate cancer glands (43). MSMB encodes beta-microseminoprotein (MSP; also known as prostatic secretory protein of 94 amino acids, PSP-94), which is secreted by the prostate and functions as a suppressor of tumor growth and metastasis (44). Besides the study of Sun and colleagues (24), several other studies also support the potential of MSP as a serum marker for the early detection of high-grade prostate cancer (45, 46). The decreased expression of IGF2R was thought to be partly responsible for the increased growth of LNCaP human prostate cancer cells (47). In a mouse model, the mRNA of IGF2R was significantly decreased in metastatic prostate lesions and androgen-independent prostate cancer (48). Analysis of patient samples identified loss of heterozygosity of IGF2R as an early event in the development of prostate cancer (49). In vivo and human studies suggested that the shedding of MICB might contribute to the impairment of natural killer cell antitumor immunity in prostate cancer formation (50, 51). These previous studies provide support for a potential role of these genes in prostate carcinogenesis.
The sample size for the main association analysis of our study was large, providing high statistical power to detect protein–prostate cancer associations. In addition, the use of genetic instruments reduces biases, such as selection bias and potential confounding, and eliminates the potential influence of reverse causation. On the other hand, our study has several potential limitations. First, the possibility of pleiotropic effects cannot be excluded. For example, rs28929474, which was the instrument for proteins ZN175, ARFP2, GPC6, RM33, PIM1, and WISP-3, as well as one of the two variants constituting the instrument for NCF-2, was also reported to be associated with several other traits, including glycoprotein acetyls (52–54). Similarly, rs429358, which was included in the instruments for LRRN1 and SNAB, was associated with cerebral amyloid deposition and red cell distribution width (55, 56); rs62143206, which was included in the instrument for glutathione S-transferase Pi, was also associated with the monocyte percentage of white cells and the granulocyte percentage of myeloid white cells (55). Further studies will be needed to validate the protein–prostate cancer associations we identified. Second, our analysis was constrained by the pQTLs identified in previous GWAS of circulating protein levels, and thus we were unable to evaluate some important protein biomarkers for prostate cancer, as discussed previously. We anticipate that additional protein biomarkers could be identified using newly identified pQTLs in the future. Furthermore, this work generates a list of promising protein candidates that show an association with prostate cancer, which can be investigated further in future studies that directly measure levels of these proteins. Identification of circulating protein biomarkers should be useful for prostate cancer risk assessment.
In conclusion, in a large-scale study assessing associations between genetically predicted circulating protein levels and prostate cancer, we identified multiple novel proteins showing a significant association. Further investigation of these proteins will provide additional insight into the biology and genetics of prostate cancer and facilitate the development of appropriate biomarker panels for the early detection of prostate cancer.
Disclosure of Potential Conflicts of Interest
R.A. Eeles has received speakers bureau honoraria from GUASCO to San Francisco in Jan 2016, Janssen Nov 2017, and ASCO, Chicago, June 2018. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: L. Wu, Z. Kote-Jarai, W. Zheng
Development of methodology: L. Wu, X. Guo, W. Zheng
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): the PRACTICAL CRUK, BPC3, CAPS, PEGASUS consortia, Z. Kote-Jarai, C.A. Haiman
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): L. Wu, J. Bao, X. Guo, the PRACTICAL CRUK, BPC3, CAPS, PEGASUS consortia, W. Zheng
Writing, review, and/or revision of the manuscript: L. Wu, X. Shu, the PRACTICAL CRUK, BPC3, CAPS, PEGASUS consortia, Z. Kote-Jarai, R.A. Eeles, W. Zheng
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): the PRACTICAL CRUK, BPC3, CAPS, PEGASUS consortia, W. Zheng
Study supervision: W. Zheng
Acknowledgments
The authors thank Jirong Long and Wanqing Wen of the Vanderbilt University School of Medicine for their help with this study. The authors would also like to thank all of the individuals for their participation in the parent studies and all the researchers, clinicians, technicians, and administrative staff for their contributions to the studies. The data analyses were conducted using the Advanced Computing Center for Research and Education at Vanderbilt University. This project at Vanderbilt University Medical Center was supported in part by funds from the Anne Potter Wilson endowment. L. Wu was supported by NCI K99 CA218892 and the Vanderbilt Molecular and Genetic Epidemiology of Cancer training program (US NCI grant R25 CA160056 to X. Shu).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.