Abstract
Background: Numerous germline genetic variants are associated with prostate cancer risk, but their biologic role is not well understood. One possibility is that these variants influence gene expression in prostate tissue. We therefore examined the association of prostate cancer risk variants with the expression of genes nearby and genome-wide.
Methods: We generated mRNA expression data for 20,254 genes with the Affymetrix GeneChip Human Gene 1.0 ST microarray from normal prostate (N = 160) and prostate tumor (N = 264) tissue from participants of the Physicians' Health Study and Health Professionals Follow-up Study. With linear models, we tested the association of 39 risk variants with nearby genes and all genes, and the association of each variant with canonical pathways using a global test.
Results: In addition to confirming previously reported associations, we detected several new significant (P < 0.05) associations of variants with the expression of nearby genes including C2orf43, ITGA6, MLPH, CHMP2B, BMPR1B, and MTL5. Genome-wide, five genes (MSMB, NUDT11, RBPMS2, NEFM, and KLHL33) were significantly associated after accounting for multiple comparisons for each SNP (P < 2.5 × 10−6). Many more genes had an FDR <10%, including SRD5A1 and PSCA, and we observed significant associations with pathways in tumor tissue.
Conclusions: The risk variants were associated with several genes, including promising prostate cancer candidates and lipid metabolism pathways, suggesting mechanisms for their impact on disease. These genes should be further explored in biologic and epidemiologic studies.
Impact: Determining the biologic role of these variants can lead to improved understanding of prostate cancer etiology and identify new targets for chemoprevention. Cancer Epidemiol Biomarkers Prev; 24(1); 255–60. ©2014 AACR.
Introduction
Numerous germline genetic risk variants have been linked to prostate cancer risk from genome-wide association studies (1–14). With the report from the PRACTICAL consortium in 2013, the number of prostate cancer risk variants is now >70 (15), a major step toward uncovering the genetic etiology of prostate cancer. Family and twin studies demonstrate that prostate cancer is highly heritable (16); these SNPs explain an ever increasing portion of this underlying heritability, currently about 30% (15). However, the biologic function of these risk SNPs remains largely unknown because most are located outside of protein coding regions. A critical next step in translating knowledge of identified SNPs to the prevention or treatment of prostate cancer is determining their biologic mechanisms.
One possibility is that the risk variants are expression quantitative trait loci (eQTL), genetic loci that are associated with mRNA transcript levels. Few large studies have both SNP data and prostate tissue for gene expression studies. We recently showed that a prostate cancer risk SNP on chromosome 10q11, rs10993994, was significantly associated with mRNA expression of two nearby genes (17) in prostate tissue. Men with the risk allele had decreased expression of MSMB in both normal prostate and tumor tissue, and increased expression of NCOA4 in normal prostate tissue only. A similar result for MSMB was observed by Lou and colleagues (18). A study of 12 prostate cancer risk loci found that four acted as eQTLs. In addition to confirming the MSMB and NCOA4 results, NUDT11 was associated with rs5945619 and SLC22A3 was borderline significantly associated with rs9364554 in both tumor and normal tissue, and HNF1B was associated with rs4430796 in normal tissue only (19). Harries and colleagues observed an association between rs6465657 and expression of nearby LMTK2 (20), whereas Xu and colleagues found a proxy for rs12653946 to be strongly associated with the expression of nearby IRX4 (21).
Published work has primarily focused on the expression of genes near the risk variants. Although variants may have larger effects on nearby genes, and there is therefore more power to identify these effects, genetic polymorphisms can influence the expression of genes anywhere in the genome. This can happen either directly or downstream in a pathway, such as through a transcription factor (reviewed in ref. 22). We therefore examined the association of the risk variants with transcriptome-wide expression data in tumor and normal prostate tissue, performing a cis analysis (examining the association of the variants with nearby genes), a trans-analysis (determining the association of the variants with all genes), and a pathway analysis.
Materials and Methods
Study participants
Physicians' Health Study and Health Professionals Follow-up Study.
The men in the study are participants in two prospective studies ongoing for more than 25 years: the Physicians' Health Study (PHS) and Health Professionals Follow-up Study (HPFS). PHS began in 1982 as a randomized, double-blind trial of aspirin and β-carotene in the prevention of cardiovascular disease and cancer among 22,071 initially healthy U.S. physicians (23). The HPFS, an ongoing prospective cohort study on the causes of cancer and heart disease in men, consists of 51,529 U.S. health professionals who were ages 40 to 75 years in 1986 (24). In both studies, men were excluded if they had any serious medical conditions at baseline including all cancers (except nonmelanoma skin cancer).
The men in this study were diagnosed with incident, histologically confirmed prostate cancer between 1982 and 2004. Participants are followed through regular questionnaires to collect self-reported data on diet, lifestyle behaviors, medical history, and health outcomes, including prostate cancer. All prostate cancer cases in this study were verified through medical record and pathology review. Through this systematic medical record review, we also abstracted clinical information, including clinical stage and PSA at diagnosis.
The Human Subjects Committee at Partners Healthcare and the Harvard School of Public Health (Boston, MA) approved these studies.
mRNA expression profiling
In both cohorts, we sought to retrieve archival formalin-fixed paraffin embedded (FFPE) specimens. The PHS and HPFS Tumor Cohort includes 2,200 men with prostate cancer for whom we have collected archival radical prostatectomy (RP; 95%) and trans-urethral resection of the prostate (TURP; 5%) specimens.
For a subset of the tumor cohort, we undertook whole-genome gene expression profiling as part of a study designed to identify expression signatures that can differentiate lethal from indolent prostate cancer. We sampled men from the Tumor Cohort using an extreme case design, which includes 116 men who died of their cancer or developed bony or distant metastases and 292 men who lived at least 8 years after prostate cancer diagnosis and were not diagnosed with metastases through 2012. For a subset of these men, we also profiled adjacent normal tissue. To conduct this profiling in FFPE tissues, whole transcriptome amplification was paired with microarray technologies. Briefly, RNA was extracted using the Biomek FxP automated platform with the Agencourt FormaPure FFPE kit (Beckman Coulter). The mRNA was amplified using the WT-Ovation FFPE System V2 (NuGEN), a whole transcriptome amplification system that allows for complete gene expression analysis from archives of FFPE samples known to harbor small and degraded RNA. Using a combination 5′ and random primer, reverse transcription created a cDNA/mRNA hybrid. The mRNA was subsequently fragmented, creating binding sites for DNA polymerase. Isothermal strand-displacement, using a proprietary DNA/RNA chimeric SPIA primer, amplified the cDNA. The cDNA was then fragmented and labeled with a terminal deoxynucleotidyl transferase covalently linked to biotin to prepare for microarray hybridization. The labeled cDNA was then hybridized to a GeneChip Human Gene 1.0 ST microarray (Affymetrix).
For the expression profiles generated, we regressed out technical variables including mRNA concentration, age of the block, batch (96-well plate), percentage of the probes on the array detectable above the background, log-transformed average background signal, and the median of the perfect match probes for each probe intensity of the raw data. The residuals were then shifted to have the original mean expression values and normalized using the RMA method (25, 26). We mapped gene names to Affymetrix transcript cluster IDs using the NetAffx annotations as implemented in Bioconductor annotation package pd.hugene.1.0.st.v1; this resulted in 20,254 unique named genes. Gene expression data are available through Gene Expression Omnibus accession number GSE62872.
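The adjustment for technical variables described above can be illustrated with a minimal sketch in Python. It is a simplification under stated assumptions: a single technical covariate (age of the block) is used, whereas the study adjusted for several variables jointly, and the function name `residualize` and all values are hypothetical.

```python
from statistics import mean


def residualize(expression, covariate):
    """Remove the linear effect of one covariate while keeping the original mean.

    Hypothetical helper: fits expression ~ covariate by ordinary least squares,
    takes the residuals, and shifts them back to the original mean expression.
    """
    x_bar = mean(covariate)
    y_bar = mean(expression)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, expression))
    slope = sxy / sxx
    # residual + original mean = y - slope * (x - x_bar)
    return [y - slope * (x - x_bar) for x, y in zip(covariate, expression)]


# Made-up example: expression of one probe drifting downward with block age (years).
block_age = [5, 10, 15, 20, 25]
raw_expression = [8.1, 7.9, 7.4, 7.2, 6.9]
adjusted_expression = residualize(raw_expression, block_age)
```

After adjustment, the values keep the original mean but are no longer linearly related to the covariate; in the study, this step was followed by RMA normalization.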
Risk variant genotypes
The SNPs were genotyped on DNA extracted from whole blood as part of the National Cancer Institute funded Breast and Prostate Cancer Cohort Consortium (BPC3) using the TaqMan assay (Applied Biosystems) at the Harvard School of Public Health (Boston, MA); details on the SNP selection and genotyping are provided in ref. 27. Prostate cancer participants included in the current study are all of European ancestry. To reduce missing data, we combined data for SNPs in very high linkage disequilibrium. For rs12418451, we used genotypes from either rs12418451 or rs10896438 (r² = 0.96 in HapMap CEU population); for rs2928679, we used genotypes from either rs2928679 or rs13264338 (r² = 0.97); for rs1983891, we used genotypes from either rs1983891 or rs9381080 (r² = 1.00); and for rs11672691, we used genotypes from either rs11672691 or rs11673591 (r² = 1.00). In addition, eight SNPs from ref. 27 were not genotyped in PHS and were therefore excluded from this study to avoid a reduction in sample size. The average SNP and individual call rates were 95.2% and 95.3%, respectively.
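Combining genotypes from two SNPs in very high linkage disequilibrium can be sketched as a simple fallback rule. The snippet below is illustrative only: it assumes both SNPs are already coded as counts (0, 1, or 2) of the correlated risk-associated allele, that `None` marks a missing call, and that the primary SNP is preferred when both are available. The function name and the genotype values are hypothetical.

```python
from typing import List, Optional, Sequence


def combine_proxy_genotypes(
    primary: Sequence[Optional[int]],
    proxy: Sequence[Optional[int]],
) -> List[Optional[int]]:
    """Use the primary SNP genotype when present, otherwise fall back to the proxy."""
    if len(primary) != len(proxy):
        raise ValueError("genotype vectors must cover the same individuals")
    return [p if p is not None else q for p, q in zip(primary, proxy)]


# Made-up calls for five men; None marks a failed genotype call.
rs12418451 = [0, None, 2, None, 1]
rs10896438 = [0, 1, None, None, 1]
combined = combine_proxy_genotypes(rs12418451, rs10896438)
```

An individual remains missing only when both SNPs failed to genotype, which is how the combination reduces missing data.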
Statistical analysis
There were 264 participants with SNP and tumor tissue expression data; 160 of these cases also had normal prostate tissue expression data. Each SNP (three genotype categories: 0, 1, or 2 copies of the risk allele) was compared with each gene (continuous expression) with a linear model test for trend. This analysis was performed separately for gene expression from prostate tumor and normal prostate tissue. First, a cis analysis was performed, examining the association between each SNP and “nearby” (500 kb up- and downstream) genes. For this more focused analysis, P < 0.05 was considered statistically significant; despite the possibility for false positives using this liberal approach, these are reported to provide candidates for future studies. Next, a genome-wide analysis was performed, examining the association of each SNP with all genes. For this analysis, a Bonferroni-corrected P value (P < 2.5 × 10−6, i.e., 0.05/20,254 tests) was considered statistically significant, considering each SNP and tissue type as an independent hypothesis. In addition, all associations with an FDR less than 10% were reported as potentially interesting. Finally, a pathway analysis was performed using a global test model (R package globaltest; ref. 28). For this analysis, individuals were classified as either not carrying the risk allele or carrying one or two copies of the risk allele to create a binary outcome variable. The Broad Institute (Cambridge, MA) MSigDB KEGG pathway classifications (v4.0) were used (29).
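The core steps of the cis and genome-wide analyses can be sketched in a few lines of Python. This is a minimal illustration, not the study's code (the analysis was performed in R): the function names, the gene coordinates, and the expression values are hypothetical, and the cis rule shown (any overlap between a gene and the ±500 kb window) is an assumption about how "nearby" might be operationalized. The sketch returns the slope and t statistic; a P value would be obtained from the t distribution with n − 2 degrees of freedom.

```python
import math
from statistics import mean


def trend_test(genotypes, expression):
    """Regress expression on risk-allele count (0/1/2); return (slope, t statistic)."""
    n = len(genotypes)
    g_bar, e_bar = mean(genotypes), mean(expression)
    sxx = sum((g - g_bar) ** 2 for g in genotypes)
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    rss = sum(
        (e - e_bar - slope * (g - g_bar)) ** 2 for g, e in zip(genotypes, expression)
    )
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


def cis_genes(snp_position, gene_coordinates, window=500_000):
    """Return genes overlapping the window of +/- `window` bp around the SNP."""
    lower, upper = snp_position - window, snp_position + window
    return [
        gene
        for gene, (start, end) in gene_coordinates.items()
        if start <= upper and end >= lower
    ]


# Made-up data: expression decreases with each additional copy of the risk allele.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [5.0, 5.2, 4.6, 4.4, 4.0, 3.8]
slope, t_stat = trend_test(genotypes, expression)

# Hypothetical gene coordinates (start, end) on the same chromosome as the SNP.
genes = {
    "GENE_A": (900_000, 950_000),
    "GENE_B": (1_600_000, 1_650_000),
    "GENE_C": (1_450_000, 1_700_000),
}
nearby = cis_genes(1_000_000, genes)

# Bonferroni threshold for the genome-wide analysis: 0.05 / 20,254 tests per SNP.
bonferroni_threshold = 0.05 / 20254
```

A negative slope corresponds to "lower" expression in risk-allele carriers and a positive slope to "higher" expression.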
Analysis was performed with R version 2.15.0. All P values reported are two sided and unadjusted for multiple comparisons. FDR q-values were generated using the R package qvalue [Alan Dabney and John D. Storey, with assistance from Gregory R. Warnes (2011). qvalue: Q-value estimation for FDR control].
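To give a sense of how FDR-adjusted values relate to raw P values, the sketch below implements the Benjamini-Hochberg adjustment in Python. This is a stand-in, not the method used in the study: the qvalue package additionally estimates the proportion of true null hypotheses, so its q-values are no larger than the Benjamini-Hochberg values computed here. The function name and the P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up P values for one SNP tested against four genes.
raw_p = [0.001, 0.02, 0.03, 0.5]
adjusted_p = benjamini_hochberg(raw_p)
flagged = [p < 0.10 for p in adjusted_p]  # FDR < 10% reporting rule
```

With these made-up values, the first three tests would be reported at an FDR below 10% and the fourth would not.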
Results
A description of the study participants is provided in Table 1. Information on the 39 risk SNPs and their frequencies in this population is in Supplementary Table S1. To examine cis relationships, where a larger effect size and therefore more statistical power are expected, we specifically looked at the association of the risk SNPs with the expression of genes within a 1 Mb window (500 kb up- and downstream) around the SNP. In this focused analysis, we considered P < 0.05 to be statistically significant. Several SNPs were associated with the expression of nearby genes in normal and tumor tissue (Table 2). We observed new associations of SNPs with the genes in which they are located: rs13385191 was significantly associated with C2orf43 in both tumor and normal tissue, rs12621278 with ITGA6 in tumor, and rs2292884 with MLPH in tumor (and borderline significant in normal). Additional novel associations with genes very nearby, where the SNP could be in a regulatory region or in linkage disequilibrium with another SNP within the gene, were also observed: rs7629490 with CHMP2B in tumor, rs17021918 with BMPR1B in tumor (and borderline in normal), rs4242382 with POU5F1B in tumor, rs7127900 with ASCL2 in tumor, rs902774 with KRT79 in normal, rs10896449 with MTL5 in tumor and normal, and several others. Some associations that were significant in tumor but not in normal tissue could reflect the larger number of tumor samples, so we also performed the cis analysis on the subset of tumors that had paired normal tissue data available. For the vast majority, the associations remained significant or close to significant even with the smaller sample size (Table 2). Results for all genes within the 1 Mb windows are reported in Supplementary Table S2, which shows additional associations that narrowly missed the significance threshold but involve clear prostate cancer candidate genes [e.g., KLK2 (P = 0.07) and KLK3 (P = 0.07) with rs2735839 on chromosome 19 in tumor only].
Box plots showing the gene expression levels by genotype for significant associations are presented in Supplementary Data.
Table 1. Description of the study participants

| | Tumor (n = 264) | Normal (n = 160) |
|---|---|---|
| Lethal, n (%) | 41 (15.5) | 24 (15.0) |
| Gleason scoreᵃ, n (%) | | |
| 5–6 | 39 (14.8) | 28 (17.5) |
| 7 | 167 (63.3) | 100 (62.5) |
| 8–10 | 58 (22.0) | 32 (20.0) |
| Pathologic stageᵇ, n (%) | | |
| T2 | 164 (62.1) | 107 (66.9) |
| T3 | 81 (30.7) | 43 (26.9) |
| T4/N1/M1 | 17 (6.4) | 9 (5.6) |
| Missing | 2 (0.8) | 1 (0.6) |
| PSA at diagnosis (ng/mL), n (%) | | |
| 0–4 | 28 (10.6) | 18 (11.3) |
| 4–10 | 136 (51.5) | 83 (51.9) |
| 10–20 | 49 (18.6) | 28 (17.5) |
| >20 | 24 (9.1) | 15 (9.4) |
| Pre-PSA era (before 1992) | 27 (10.2) | 16 (10.0) |
ᵃGleason score is from RP for 256 tumor and 154 normal, and from TURP for the remaining cases.
ᵇRP pathologic stage for most, but clinical stage at diagnosis for TURP cases.
Table 2. Risk SNPs associated (P < 0.05) with the expression of nearby genes in tumor or normal prostate tissue

| SNP | Chromosome | Position | Gene | Distance from SNP (kb) | P (tumor)ᵃ | P (normal)ᵃ | P (tumor-restricted)ᵇ |
|---|---|---|---|---|---|---|---|
| rs13385191 | 2p24 | 20888264 | C2orf43 | 0 | 0.029 (lower) | 0.016 (lower) | 0.029 |
| rs12621278 | 2q31 | 173311552 | DLX2 | −344.1 | 0.035 (higher) | 0.439 | 0.006 |
| | | | ITGA6 | 0 | 0.022 (higher) | 0.295 | 0.111 |
| rs2292884 | 2q37 | 238443225 | MLPH | 0 | 0.020 (lower) | 0.051 (lower) | 0.009 |
| | | | LRRFIP1 | 93 | 0.475 | 0.030 (higher) | 0.537 |
| rs7629490 | 3p11 | 87241496 | CHMP2B | 34.9 | 0.01 (higher) | 0.958 | 0.008 |
| rs17021918 | 4q22 | 95562876 | BMPR1B | 116.3 | 0.034 (lower) | 0.056 | 0.012 |
| rs12653946 | 5p15 | 1895828 | IRX4 | −8.5 | 0.001 (lower) | 0.019 (lower) | 0.009 |
| rs1983891 | 6p21 | 41536426 | TREM2 | −405.5 | 0.738 | 0.012 (higher) | 0.750 |
| | | | MED20 | 336.7 | 0.016 (higher) | 0.636 | 0.046 |
| rs339331 | 6q22 | 117210051 | RSPH4A | −255.9 | 0.029 (lower) | 0.976 | 0.029 |
| rs9364554 | 6q25 | 160833663 | IGF2R | −306.1 | 0.048 (lower) | 0.501 | 0.041 |
| rs10486567 | 7p15 | 27976562 | HIBADH | −273.9 | 0.028 (lower) | 0.130 | 0.172 |
| | | | TAX1BP1 | −107.2 | 0.443 | 0.037 (lower) | 0.933 |
| rs6465657 | 7q21 | 97816326 | LMTK2 | 0 | 0.021 (higher) | 0.607 | 0.132 |
| rs4242382 | 8q24 | 128517572 | POU5F1B | −88.1 | 0.010 (higher) | 0.083 | 0.048 |
| rs1571801 | 9q33 | 124427372 | RAB14 | −463 | 0.510 | 0.023 (lower) | 0.704 |
| | | | TTLL11 | 156.8 | 0.889 | 0.046 (higher) | 0.744 |
| | | | NDUFA8 | 479 | 0.407 | 0.045 (lower) | 0.088 |
| rs10993994 | 10q11 | 51549495 | MSMB | 0.1 | 3.94E−07 (lower) | 0.002 (lower) | 4.29E−05 |
| | | | NCOA4 | 15.6 | 0.640 | 5.70E−05 (higher) | 0.649 |
| | | | ASAH2 | 397.5 | 0.046 (lower) | 0.205 | 0.398 |
| rs7127900 | 11p15 | 2233573 | ASCL2 | 56.1 | 2.84E−04 (higher) | 0.793 | 0.003 |
| | | | KCNQ1 | 232.6 | 0.025 (higher) | 0.559 | 0.225 |
| rs12418451 | 11q13 | 68935418 | IGHMBP2 | −227.3 | 0.779 | 0.006 (higher) | 0.849 |
| rs10896449 | 11q13 | 68994666 | MTL5 | −475.7 | 0.026 (lower) | 0.037 (lower) | 0.116 |
| rs902774 | 12q13 | 53273903 | KRT6B | −428 | 0.034 (higher) | 0.744 | 0.124 |
| | | | KRT79 | −45.8 | 0.780 | 0.019 (lower) | 0.641 |
| | | | ESPL1 | 388.2 | 0.473 | 0.043 (lower) | 0.763 |
| | | | SP7 | 446.5 | 0.017 (higher) | 0.027 (lower) | 0.145 |
| rs8102476 | 19q13 | 38735612 | PPP1R14A | 6.3 | 0.004 (higher) | 0.038 (higher) | 0.014 |
| | | | SPINT2 | 19.5 | 0.012 (lower) | 0.344 | 0.014 |
| | | | GGN | 139.3 | 0.237 | 0.012 (higher) | 0.309 |
| rs11672691 | 19q13 | 41985586 | CYP2F1 | −351.3 | 0.177 | 0.040 (higher) | 0.560 |
| | | | AXL | −217.9 | 0.004 (lower) | 0.915 | 0.004 |
| | | | DMRTC2 | 363.4 | 0.044 (lower) | 0.565 | 0.023 |
| rs2735839 | 19q13 | 51364622 | C19orf63 | −378 | 0.016 (higher) | 0.386 | 0.069 |
| rs5945619 | Xp11 | 51241671 | NUDT10 | −161.3 | 0.040 (lower) | 0.331 | 0.076 |
| | | | NUDT11 | −2.2 | 4.60E−11 (higher) | 3.98E−04 (higher) | 3.24E−09 |
ᵃFor significant (P < 0.05) associations, "lower" or "higher" in parentheses indicates the expression level in those carrying two copies of the risk allele compared with those carrying zero copies.
ᵇTumor-restricted refers to using only the tumors that have paired normal samples (N = 159).
We additionally replicated previously reported associations. In tumor tissue, rs10993994 was associated with MSMB (P = 3.9 × 10−7), both located on chromosome 10q11, and rs5945619 with NUDT11 (P = 4.6 × 10−11), both on chromosome Xp11. These associations were also nominally significant in normal tissue (P = 0.0017 and 0.0004, respectively), consistent with previous findings (19). rs10993994 was associated with NCOA4 in normal tissue (P = 5.7 × 10−5) but not in tumor (P = 0.64), consistent with a previous finding (19). We also confirmed other previous findings (20, 21): rs12653946 with IRX4 (in tumor and normal), rs8102476 with PPP1R14A (in tumor and normal), rs6465657 with LMTK2 (in tumor only), and rs5945619 with NUDT10 (in tumor only). However, we did not confirm an association of rs4430796 with HNF1B in normal tissue (P = 0.66; ref. 19) or associations with FAM83F (P = 0.29), YIF1B (P = 0.37), FAM98C (P = 0.17), FOXP4 (P = 0.19), or TFEB (P = 0.78) in tumor tissue (21).
We performed a genome-wide (“trans”) analysis of the risk SNPs across all transcripts. After Bonferroni correction for the 20,254 tests performed for each SNP, only five results remained statistically significant (P < 2.5 × 10−6). These include the MSMB and NUDT11 associations in tumor, mentioned above, as well as RBPMS2 (15q22) with rs11672691 (19q13; P = 2.22 × 10−6 in tumor). There were two associations in normal tissue: rs1859962 (17q24) with NEFM (8p21; P = 2.22 × 10−6) and rs1571801 (9q33) with KLHL33 (14q11; P = 1.97 × 10−6). Associations using a more liberal threshold (FDR q < 0.1) are presented in Supplementary Table S3. Of note, rs339331 and rs11672691 were each associated with tumor expression of dozens of genes at FDR q < 0.1.
Although a risk SNP may not be strongly associated with individual genes, it may influence an entire pathway indirectly by impacting a transcription factor or other regulator of gene processes. After accounting for the 186 pathways tested, nine pathways were significantly (P < 2.7 × 10−4, i.e., 0.05/186) associated with rs1512268 in tumor, most likely due to the overlap of several lipid-related genes within these pathways. These results are presented in Table 3, with all pathways significant at a less stringent threshold (P < 0.001) in Supplementary Table S4.
Table 3. Pathways significantly associated with rs1512268 in tumor tissue

| SNP | Pathway | P |
|---|---|---|
| rs1512268 | KEGG_VEGF_SIGNALING_PATHWAY | 5.27E−05 |
| | KEGG_LINOLEIC_ACID_METABOLISM | 5.42E−05 |
| | KEGG_ALPHA_LINOLENIC_ACID_METABOLISM | 6.09E−05 |
| | KEGG_ARACHIDONIC_ACID_METABOLISM | 6.42E−05 |
| | KEGG_ETHER_LIPID_METABOLISM | 7.71E−05 |
| | KEGG_FC_EPSILON_RI_SIGNALING_PATHWAY | 1.13E−04 |
| | KEGG_LONG_TERM_DEPRESSION | 1.40E−04 |
| | KEGG_GNRH_SIGNALING_PATHWAY | 1.59E−04 |
| | KEGG_GLYCEROPHOSPHOLIPID_METABOLISM | 1.72E−04 |
Discussion
Prostate cancer is one of the most heritable malignancies. Numerous germline genetic risk variants have been linked to risk; however, their function is often difficult to discern because many lie in intergenic and intronic regions. eQTL studies of these risk loci have been performed, but have primarily focused on genes close to the SNP. We evaluated the association of the risk loci with genes and pathways across the genome in tumor and normal tissue.
For the cis-based analysis, in addition to previously reported associations, several novel associations were observed. Many of the associated genes are interesting candidates, including several that are transcription factors (DLX2, IRX4, ASCL2, SP7, DMRTC2). Other novel associations include rs17021918 with BMPR1B, a bone morphogenetic protein receptor. Because the primary metastatic site of prostate cancer is bone, this gene could be relevant to progression as well. Though their expression did not vary in LNCaP cell lines following the addition of androgens, Hazelett and colleagues note that BMPR1B, as well as IGF2R and CHMP2B (both associated with nearby risk SNPs in this analysis in tumor), are androgen-regulated genes (30). The association of rs13385191 with the expression of C2orf43 was previously reported in liver cancer. This gene is associated with defective apolipoprotein B-100, which leads to hypercholesterolemia (31). This may suggest a plausible mechanism for further investigation because statin use has been inversely associated with prostate cancer (32). Although rs1512268 was not associated with any single gene, it was the only SNP significantly associated with pathways in tumor tissue. Many of these pathways were related to lipid metabolism, which may lend additional support to the hypothesis that lipids are involved in cancer development (33). POU5F1B has previously been associated with gonadoblastoma. Although we observed rs4242382 to be significantly associated with POU5F1B in tumor and borderline significant in normal tissue, Breyer and colleagues recently found other risk SNPs in the 8q24 region to be associated with POU5F1B expression only in normal tissue (34). MTL5, whose expression was reduced in those with the risk allele in both tumor and normal tissue, is a metallothionein-like protein that may be involved in cell growth and differentiation, as well as spermatogenesis.
Several other genes, although not quite reaching the P < 0.05 threshold, are also interesting candidates. These include KLK2 and KLK3, mentioned above (KLK3 encodes PSA), and NKX3.1, which was associated with rs2928679 in normal tissue only with borderline significance (P = 0.06).
When examining all genes, we observed several highly significant associations. Some of these associations suggest excellent candidate genes for prostate cancer, particularly SRD5A1 (steroid 5-α-reductase-1), the target of 5-α-reductase inhibitors such as finasteride; SRD5A1 was associated with rs339331 in normal tissue (FDR q-value = 0.045). PSCA (prostate stem cell antigen) was associated with rs1859962 in normal tissue (FDR q-value = 0.099); this gene is a cell surface marker that has been found to be upregulated in prostate and other cancers.
We observe different associations for some SNPs in normal and tumor tissue, as others have reported previously. Results in normal may suggest earlier effects of the SNPs and involvement with tumor initiation; these associations could be lost in tumor due to the development of mutations, or the dysregulation of another gene, miRNA, or lncRNA, which then has a larger influence on the expression of these genes. Significant results in tumor only may point toward genes driving continued carcinogenesis; these associations could be masked in normal tissue because the gene expression is tightly regulated by other mechanisms that are lost during tumorigenesis.
Confirming many previously reported associations gives us confidence in our data and our findings. The lack of replication of some previously reported associations could be for several reasons, including limited power, expression technology with differing probe location or splice variants measured, or that the original report was a false positive. Our study does have some limitations that could lead to false negatives. Although this study is large for a gene expression profiling study, the statistical power to detect the small effects anticipated is relatively low. Also, the risk SNPs could affect mRNA expression levels in a transitory way, and we are only able to capture one time point after cancer has already developed. In addition, using a less stringent P value cutoff of 0.05 in the cis analysis may have led to the reporting of some false-positive associations; however, this provides a list of candidates to be confirmed in additional studies. Also, approximately 100 prostate cancer risk SNPs now have been identified; future analysis should not only attempt to confirm the results reported here, but also expand to include the more recently identified SNPs.
The genes and pathways we identified that are associated with the risk SNPs can improve the biologic understanding of prostate cancer development. These genes may additionally help explain the mechanism of epidemiologic results and provide candidates for new treatment or prevention strategies.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: K.L. Penney, L.A. Mucci
Development of methodology: M. Loda, M.J. Stampfer
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): H.D. Sesso, L.A. Mucci
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K.L. Penney, J.A. Sinnott, S. Tyekucheva, T. Gerke, I.M. Shui, P. Kraft, M.L. Freedman
Writing, review, and/or revision of the manuscript: K.L. Penney, J.A. Sinnott, S. Tyekucheva, I.M. Shui, P. Kraft, H.D. Sesso, M.L. Freedman, M. Loda, L.A. Mucci, M.J. Stampfer
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): T. Gerke, H.D. Sesso
Study supervision: K.L. Penney, L.A. Mucci
Acknowledgments
The authors thank the participants and staff of the PHS and HPFS for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. In addition, this study was approved by the Connecticut Department of Public Health (DPH) Human Investigations Committee. Certain data used in this publication were obtained from the DPH. The authors assume full responsibility for analyses and interpretation of these data. The authors also thank Natalie Dupre, Elizabeth Nuttall, Michael Pitt, Sam Peisch, and Rosina Lis for their significant involvement with study design and implementation and Edward Fox (in memoriam) at the DFCI Microarray Core.
Grant Support
The PHS was supported by grants CA34944, CA40360, CA097193, HL26490, and HL34595. The HPFS was supported by grants CA133891, CA141298, and P01CA055075. This study and the following authors were supported by 5U01CA098233 (P. Kraft), CA136578 (L.A. Mucci), CA141298 (M.J. Stampfer), CA131945 (M. Loda), and P50CA090381 (L.A. Mucci and S. Tyekucheva). J.A. Sinnott received support from T32CA09001. K.L. Penney and L.A. Mucci were supported by Prostate Cancer Foundation Young Investigator Awards. I.M. Shui was supported by a Department of Defense Prostate Cancer Research Fellowship. K.L. Penney was supported by the A. David Mazzone Research Awards Program.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.