Abstract
African American (AA) men have a higher risk of developing prostate cancer than white men. SNPs are known to play an important role in prostate cancer development. The impact of PVT1 and its neighboring genes (CASC11 and MYC) on prostate cancer risk has received increasing attention. However, interactions among these three genes in relation to prostate cancer risk are understudied, especially in AA men. The objective of this study is to investigate SNP–SNP interactions in the CASC11–MYC–PVT1 region associated with prostate cancer risk in AA men.
We evaluated 205 SNPs in 2,253 prostate cancer patients and 2,423 controls using a multiphase (discovery-validation) design. In addition to SNP individual effects, SNP–SNP interactions were evaluated using the SNP Interaction Pattern Identifier, which assesses 45 patterns.
Three SNPs (rs9642880, rs16902359, and rs12680047) and 79 SNP–SNP pairs were significantly associated with prostate cancer risk. Two CASC11 SNPs (rs16902359 and rs9642880) interacted frequently with other SNPs, in 56 and 9 pairs, respectively. We identified a novel CASC11–PVT1 interaction, which was the most common gene–gene interaction (70%) among the top 79 pairs. Several top SNP interactions have a moderate to large effect size (OR, 0.27–0.68) and higher prediction power for prostate cancer risk than SNP individual effects.
Novel SNP–SNP interactions in the CASC11–MYC–PVT1 region have a larger impact than SNP individual effects on prostate cancer risk in AA men.
This gene–gene interaction between CASC11 and PVT1 can provide valuable information to reveal potential biological mechanisms of prostate cancer development.
Introduction
Prostate cancer has the highest cancer incidence among men; it was estimated that there would be 164,690 new cases in the United States in 2018 (1). The prostate cancer incidence in African American (AA) men is higher compared with European American (EA) men regardless of access to care and socioeconomic status (2–5). Prostate cancer is the most commonly diagnosed cancer, accounting for 31% of all cancers in AA men. Although the causes of prostate cancer are not yet fully understood, genetic variation has an impact on prostate cancer development. During the past decade, genome-wide association studies (GWAS) identified approximately 160 SNPs to be associated with prostate cancer (6). However, the majority of genetic association studies of prostate cancer risk were focused on EA men. Genetic variation on prostate cancer in AA men is still understudied (7).
Increasing evidence suggests that PVT1 may play an important role in prostate cancer risk. PVT1 is overexpressed in prostate cancer tumor tissues and is associated with regulating tumor growth (8, 9). It has been shown that PVT1 exon 9 is significantly overexpressed in aggressive prostate cancer cell lines derived from AA men (10). PVT1 is a long noncoding RNA and is the host to a cluster of miRNAs (such as miR-1204, miR-1205, and miR-1206; refs. 8, 11). Studies have shown that PVT1 expression can interact with several miRNAs to influence prostate cancer development and progression (12, 13). Neighboring genes of PVT1 also show some biological evidence of influencing prostate cancer. PVT1 is associated with colorectal, lung, and renal cell cancer in GWAS (14, 15).
MYC, a protein-coding gene and an upstream gene of PVT1 (see Fig. 1), is overexpressed in prostate tumor tissues. DNA methylation in MYC is associated with prostate cancer aggressiveness (16, 17). The MYC oncogene works as a transcriptional activator that is involved in cell growth, apoptosis, and differentiation (8, 18–22). CASC11, another neighboring gene of PVT1, is also reported to be differentially upregulated in prostate cancer compared with normal prostate (23). In addition, SNPs in the CASC11–MYC–PVT1 region in the chromosome 8q24.21 (Fig. 1) are associated with various cancers (24), such as breast, colorectal, lung, and kidney cancer in the GWAS. CASC11 is also a GWAS-identified gene associated with bladder (25, 26), colorectal, pancreatic (27), and breast cancer (28).
These three genes (CASC11, MYC, and PVT1) are physically close to each other, but their interactions in relation to prostate cancer risk remain unknown, especially for AA men. It has been suggested that SNP–SNP interactions have a large impact on revealing the mechanism of complex diseases (29–34). Individual SNP effects are well known to have low prediction power despite the fact that large numbers of GWAS SNPs have been identified. For existing SNP risk scores, the weighted sum of multiple SNP individual effects has been shown to increase prediction power and improve risk classification (6, 35), but a large portion of the genetic susceptibility to prostate cancer is still missing. SNP–SNP interactions may help fill this gap. It has been shown that the additive-additive full interaction model (AA_Full), a hierarchical interaction model (with two SNP main effects and their interaction), is not sufficient to detect SNP–SNP interactions associated with complex diseases (36–38). Applying advanced statistical methods to thoroughly search for SNP–SNP interactions is key to identifying useful biomarkers associated with prostate cancer risk. The SNP interaction pattern identifier (SIPI), a novel statistical method recently developed by Lin and colleagues (2017), has been shown to be a powerful tool. SIPI tests 45 biologically meaningful interaction patterns based on nonhierarchical models, inheritance mode, and mode coding direction (37). The objective of this study is to investigate SNP–SNP/gene–gene interactions in the CASC11–MYC–PVT1 region associated with prostate cancer risk using the SIPI approach.
Materials and Methods
Study population
The 4,676 men in this study were from the Multiethnic Cohort (MEC), a large prospective cohort established between 1993 and 1996. Blood samples for the cohort were collected during the first two phases, between 1996 and 2001. Specimens were obtained retrospectively from incident prostate cancer cases in conjunction with a random sample of the cohort to serve as controls. This cohort comprises AAs, along with other races, living in Hawaii and California (39, 40). For this study, only AAs were included. The study population includes 2,253 cases and 2,423 controls with AA ancestry from five study sites. For each study site, half of the sample was randomly assigned to the discovery set, and the other half became the validation set. The discovery and validation sets each contain 2,338 men. The combined set is the union of the discovery and validation sets. In this study, we used the dbGaP version of the MEC data (dbGaP phs000306.v3.p1; ref. 41), which contains a portion of the large MEC cohort.
SNP selection and quality control
We included 205 SNPs in the CASC11–MYC–PVT1 region on chromosome 8q24.21 (413 kb, chr8: 127,688,099–128,101,210, GRCh38.p7). The genotype data were generated using the Illumina Human1M-Duo array, which provides approximately 1 million genotypes across the whole genome. A discrepancy of <0.1% was used for duplicate genotypes obtained by using the same genotyping assay (39, 40). To account for population stratification among AAs, principal component analyses were performed using all available SNPs. The first four principal components, selected based on the scree plot method, were adjusted for in modeling.
For quality control, SNPs were removed from the candidate list if their Hardy–Weinberg equilibrium (HWE) tests had a P < 2 × 10−4 (Bonferroni correction) in the controls. A total of 194 SNPs, which followed the HWE, were included for SNP main effect analyses. For SNP interaction analyses, SNPs with a minor allele frequency (MAF) ≥ 0.05 were included as candidates, and only one SNP was retained from each pair in strong LD (r2 > 0.8). Based on these criteria, 40 SNPs were excluded, and a total of 154 SNPs were included for the SNP–SNP interaction analyses. Other quality control details were reported previously (39, 40).
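The HWE filter described above can be illustrated with a one-degree-of-freedom chi-square test on genotype counts in controls. This is only a minimal sketch under stated assumptions: the function name `hwe_chisq_p` and the genotype counts are hypothetical, and the study may have used a different (for example, exact) HWE test.

```python
import math

def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df
    return math.erfc(math.sqrt(chi2 / 2.0))

# Bonferroni-style cutoff for 205 candidate SNPs (about 2.4e-4,
# reported in the text as 2 x 10^-4)
alpha_hwe = 0.05 / 205

# Hypothetical control genotype counts, for illustration only
p_balanced = hwe_chisq_p(100, 200, 100)   # counts exactly at HWE proportions
p_skewed = hwe_chisq_p(200, 0, 200)       # no heterozygotes at all
keep_balanced = p_balanced >= alpha_hwe
keep_skewed = p_skewed >= alpha_hwe
```

Under this rule a SNP is retained when its control-group P value is not below the cutoff, so the first hypothetical SNP is kept and the second is removed.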
Individual SNP effects
The individual SNP effects were tested using logistic regression. All models were adjusted for study site and the four principal components. We tested the 194 candidate SNPs for main effects by considering three different inheritance modes: additive, dominant, and recessive (see Fig. 2). These modes were defined based on the minor allele. For each SNP, the best inheritance mode with the lowest P value was selected. First, we performed SNP main effect analyses associated with prostate cancer risk in the discovery set. For SNPs with a P < 0.05 in the discovery set, the main effects were tested using the same approach in the validation set. Only SNPs with a P < 0.05 in both discovery and validation sets were assessed in the combined set. An SNP main effect was defined as significant if it has P < 0.05 in both discovery and validation sets and P < 0.01 in the combined set. For the multi-SNP main effect model, the SNPs with significant main effects were treated as candidates, and the stepwise selection procedure in logistic regression with a significance criterion of 0.05 was applied. The R package “SNPassoc” was applied for SNP main effect analyses (42).
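The three inheritance modes and the multiphase decision rule can be sketched as follows. This is an illustrative sketch, not the SNPassoc implementation: the function names are hypothetical, genotypes are assumed to be stored as the number of minor alleles (0, 1, or 2), and the P values in the usage lines are made up.

```python
def code_genotype(minor_count: int, mode: str) -> int:
    """Code a genotype (0, 1, or 2 copies of the minor allele) under an inheritance mode."""
    if minor_count not in (0, 1, 2):
        raise ValueError("minor_count must be 0, 1, or 2")
    if mode == "additive":
        return minor_count               # per-allele effect
    if mode == "dominant":
        return int(minor_count >= 1)     # at least one minor allele
    if mode == "recessive":
        return int(minor_count == 2)     # two minor alleles
    raise ValueError(f"unknown mode: {mode}")

def is_significant_main_effect(p_discovery: float, p_validation: float,
                               p_combined: float) -> bool:
    """Multiphase rule for a SNP main effect as defined in the text."""
    return p_discovery < 0.05 and p_validation < 0.05 and p_combined < 0.01

# Hypothetical P values, for illustration only
example_pass = is_significant_main_effect(0.03, 0.04, 0.004)
example_fail = is_significant_main_effect(0.03, 0.06, 0.004)
```

In practice the coded genotype would enter a logistic regression together with study site and the four principal components; the model fitting itself is omitted here.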
SNP–SNP interaction analyses
A total of 11,781 pairwise SNP pairs based on 154 SNPs in the target region were included for SNP–SNP interaction analyses. We applied the novel SIPI approach to evaluate SNP–SNP interactions associated with prostate cancer risk. As shown in Supplementary Table S1, SIPI tests 45 biologically meaningful interaction patterns for SNP–SNP interactions by considering the three key features: nonhierarchical models, inheritance modes, and mode coding direction (Supplementary Table S2; ref. 43). Among the 45 patterns for each SNP pair, the best interaction pattern is selected based on the lowest value of the Bayesian information criterion. The R package “SIPI” was used to detect SNP–SNP interactions (https://linhuiyi.github.io/LinHY_Software/).
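To make the pattern labels concrete, the sketch below builds an interaction-only term from an inheritance mode and a coding direction for each SNP, and then picks the pattern with the lowest BIC. It is not the SIPI source code. It assumes that "original" coding counts the minor allele under the chosen mode, that "reverse" coding is the complement (1 − x for dominant or recessive, 2 − x for additive), and that the interaction-only term is the product of the two coded SNPs. These assumptions are inferred from the pattern labels and genotype combinations reported in Table 2; all function names and BIC values are hypothetical.

```python
def code_snp(minor_count: int, mode: str, direction: str) -> int:
    """mode: 'A' additive, 'D' dominant, 'R' recessive; direction: 'o' original, 'r' reverse."""
    base = {"A": minor_count,
            "D": int(minor_count >= 1),
            "R": int(minor_count == 2)}[mode]
    top = 2 if mode == "A" else 1        # largest possible coded value
    return base if direction == "o" else top - base

def interaction_term(snp1_minor: int, snp2_minor: int, pattern: str) -> int:
    """Interaction-only term for a label such as 'RR_int_ro'."""
    modes, model, directions = pattern.split("_")
    if model != "int":
        raise ValueError("this sketch handles interaction-only patterns")
    x1 = code_snp(snp1_minor, modes[0], directions[0])
    x2 = code_snp(snp2_minor, modes[1], directions[1])
    return x1 * x2

def best_pattern(bic_by_pattern: dict) -> str:
    """Select the pattern with the lowest BIC."""
    return min(bic_by_pattern, key=bic_by_pattern.get)

# 'RR_int_ro': SNP1 reverse-recessive (0 or 1 minor alleles), SNP2 original-recessive (2 minor alleles)
in_subgroup = interaction_term(1, 2, "RR_int_ro")      # 1: belongs to the flagged subgroup
out_subgroup = interaction_term(2, 2, "RR_int_ro")     # 0: SNP1 is minor-homozygous
chosen = best_pattern({"RR_int_ro": 3200.1, "DD_int_oo": 3204.7, "AA_int_oo": 3210.5})
```

Under these assumptions, the "RR_int_ro" pattern flags carriers of at most one minor allele at SNP1 who are minor-allele homozygotes at SNP2, which matches the "GG/AG + CC vs. others" style of combination reported for that pattern.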
SIPI was applied separately in the discovery and validation sets. For unipair analyses, only significant SNP pairs in the discovery set were further tested in the validation set (Fig. 2). Significant SNP–SNP interactions were defined as SNP pairs with P < 0.01 in both discovery and validation sets and P < 0.001 in the combined set. These significance levels were selected based on the distribution of the empirical P values (Supplementary Fig. S1).
In addition to unipair analyses, a multipair prediction model was also conducted. We defined “super-SNPs” as those that occurred ≥4 times in the top SNP pairs (P < 0.01 in both discovery/validation sets and P < 0.001 in the combined set). When building the multipair model, a two-step variable selection approach was used. The pairs containing one same SNP were grouped as a cluster. In the first step, candidate SNP pairs were selected for each super-SNP cluster using the stepwise selection in logistic regression. Within each super-SNP cluster, associations among SNPs in this cluster were assessed using the LD r2 coefficient; the correlations of the pairs were tested using Pearson correlation. In the second step, the selected pairs in each cluster plus the SNP pairs not in the clusters and significant individual SNP effects were then treated as candidates to build the final multipair model using logistic regression with stepwise selection with a criterion of α = 0.05. To demonstrate the performance of SIPI, we compared the top SIPI-identified pairs with the conventional AA_Full approach.
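The super-SNP definition and the clustering step can be sketched with a simple frequency count. This is a hedged illustration: the function names and the SNP identifiers `s1` to `s5` are hypothetical placeholders, and the stepwise logistic regression applied within and across clusters is not reproduced.

```python
from collections import Counter, defaultdict

def find_super_snps(pairs, min_pairs=4):
    """Return SNPs that occur in at least `min_pairs` of the top SNP pairs."""
    counts = Counter(snp for pair in pairs for snp in pair)
    return {snp: n for snp, n in counts.items() if n >= min_pairs}

def cluster_pairs(pairs, super_snps):
    """Group pairs by the super-SNP they contain; return (clusters, unclustered pairs)."""
    clusters = defaultdict(list)
    unclustered = []
    for pair in pairs:
        members = [snp for snp in pair if snp in super_snps]
        if members:
            for snp in members:
                clusters[snp].append(pair)
        else:
            unclustered.append(pair)
    return dict(clusters), unclustered

# Hypothetical top pairs, for illustration only
top_pairs = [("s1", "s2"), ("s1", "s3"), ("s1", "s4"), ("s1", "s5"), ("s2", "s3")]
super_snps = find_super_snps(top_pairs)
clusters, unclustered = cluster_pairs(top_pairs, super_snps)
```

In the first selection step, candidate pairs would be chosen within each cluster; in the second step, those pairs, the unclustered pairs, and the significant individual SNP effects would be offered together to the final stepwise model.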
Results
SNP individual effects
Among the 194 eligible SNPs in the discovery set (Fig. 2), 22 had a P < 0.05, and 4 of these 22 SNPs had a P < 0.05 in the validation set. These four SNPs were further tested in the combined set. Among them, three SNPs had P < 0.01 in the combined set (Supplementary Table S3), although none of them reached the stringent Bonferroni correction criterion of P < 2.6 × 10−4. Two of the significant SNPs (rs9642880 and rs16902359) are in CASC11, whereas the third one, rs12680047, is in MYC. AA men with the CC genotype of rs16902359 tend to have a lower risk of prostate cancer (recessive, OR = 0.76 for CC vs. CT/TT, P = 2.8 × 10−4). AA men with the TT or TC genotype of rs12680047 tend to have a lower risk of prostate cancer (dominant, OR = 0.84 for TT/TC vs. CC, P = 3.5 × 10−3). The G allele of rs9642880 had a protective effect on prostate cancer risk (additive mode, OR per minor G allele = 0.87, P = 3.8 × 10−3) for AA men. Table 1 displays the uni-SNP and multi-SNP model results based on the SNP individual effects. In the multi-SNP model, two SNPs (rs12680047 and rs16902359) were selected using the stepwise selection in logistic regression. The effect sizes of these SNPs in the multi-SNP model were very similar to those in the uni-SNP models.
SNP | Gene | Min < Maj (MAF)b | Mode | Uni-SNP OR (95% CI)b | Uni-SNP P value | Multi-SNPa OR (95% CI)b | Multi-SNPa P value |
---|---|---|---|---|---|---|---|
rs16902359 | CASC11 | C<T (0.46) | Rec | 0.76 (0.66–0.88) | 2.8 × 10−4 | 0.79 (0.68–0.92) | 0.002 |
rs12680047 | MYC:PVT1 | T<C (0.24) | Dom | 0.84 (0.74–0.94) | 3.5 × 10−3 | 0.87 (0.77–0.99) | 0.029 |
rs9642880 | CASC11 | G<T (0.25) | Add | 0.87 (0.79–0.96) | 3.8 × 10−3 | – | – |
Abbreviations: Add, additive; CI, confidence interval; Dom, dominant; Rec, recessive.
aAll models adjusted for study site and first four principal components. Model only considered SNP individual effects.
bMin/Maj: minor and major allele.
Results of SNP–SNP interactions
There are 11,781 SNP–SNP pairs based on the 154 SNPs. Among 1,162 SNP–SNP pairs with a P < 0.01 in the discovery set, 223 pairs had a P < 0.01 in the validation set (Fig. 2). In the combined set, none of the SNP–SNP interaction results reached the stringent Bonferroni correction criterion (P < 4.2 × 10−6), but the P values of several top SNP pairs were close to this cutoff. Specifically, the top SNP pair (rs2720659 and rs9642880) had a P value of 1.4 × 10−5, and there were 79 pairs with a P < 0.001 in the combined set.
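The pair count and the Bonferroni cutoff quoted above follow from simple arithmetic, which the short sketch below reproduces together with the multiphase rule for SNP pairs. The function name `passes_multiphase` is hypothetical; the P values for the top pair are those reported in Table 2.

```python
import math

n_snps = 154
n_pairs = math.comb(n_snps, 2)          # number of unordered SNP pairs
bonferroni_cutoff = 0.05 / n_pairs      # about 4.2e-6

def passes_multiphase(p_discovery: float, p_validation: float,
                      p_combined: float) -> bool:
    """P < 0.01 in discovery and validation sets, and P < 0.001 in the combined set."""
    return p_discovery < 0.01 and p_validation < 0.01 and p_combined < 0.001

# Top pair (rs2720659 and rs9642880), P values as reported in Table 2
top = dict(p_discovery=6.6e-4, p_validation=4.3e-3, p_combined=1.4e-5)
top_passes = passes_multiphase(**top)
top_reaches_bonferroni = top["p_combined"] < bonferroni_cutoff
```

The top pair satisfies the multiphase criteria but does not reach the Bonferroni cutoff, consistent with the description above.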
Our results indicate that the CASC11–PVT1 interaction plays an important role in prostate cancer development. The full list of top 79 SNP–SNP pairs is displayed in Supplementary Table S4A–S4C, and the top 10 SNP–SNP pairs are shown in Table 2. Eight of the top 10 pairs were for interactions of PVT1–CASC11. Among these 79 top SNP pairs, the most common gene–gene interaction (55 of 79 pairs) was also the interaction of PVT1 and CASC11. There were eight pairs for the interaction of MYC–PVT1. Interestingly, there were 65 pairs that contained one of the three SNPs (i.e., rs9642880, rs16902359, and rs12680047) with a significant individual effect. The remaining 14 pairs with SNPs without a significant individual effect suggest that some pure SNP–SNP interactions were associated with prostate cancer risk in AA men.
SNP1 | SNP2 | Patternb | Combination (SNP1+SNP2)b | Combined P value | Combined OR (95% CI) | Discovery P value | Validation P value | Gene1 | Gene2 |
---|---|---|---|---|---|---|---|---|---|
rs2720659* | rs9642880* | AA_int_ro | G and G allele | 1.4 × 10−5 | 0.86 (0.81–0.92) | 6.6 × 10−4 | 4.3 × 10−3 | MIR-1207: PVT1 | CASC11 |
rs2648902* | rs16902359* | RR_int_ro | GG/AG + CC vs. others | 1.9 × 10−5 | 0.72 (0.62–0.84) | 4.2 × 10−4 | 9.3 × 10−4 | PVT1 | CASC11 |
rs2720709 | rs9642880 | AA_int_ro | G and G allele | 2.0 × 10−5 | 0.87 (0.82–0.93) | 1.1 × 10−3 | 2.2 × 10−3 | PVT1 | CASC11 |
rs2720667* | rs16902359* | RR_int_rr | AA/GA + TT/CT vs. others | 3.5 × 10−5 | 1.33 (1.16–1.52) | 1.2 × 10−4 | 1.7 × 10−3 | PVT1 | CASC11 |
rs12547643 | rs16902359 | RR_int_rr | GG/AG + TT/CT vs. others | 3.7 × 10−5 | 1.35 (1.17–1.56) | 2.6 × 10−3 | 3.7 × 10−3 | CASC11 | CASC11 |
rs4733828* | rs16902359* | DR_int_rr | AA+ TT/CT vs. others | 3.7 × 10−5 | 1.31 (1.15–1.48) | 1.9 × 10−3 | 1.3 × 10−3 | PVT1 | CASC11 |
rs16902359 | rs16902510 | RR_int_rr | TT/CT + GG/TG vs. others | 3.9 × 10−5 | 1.33 (1.16–1.53) | 1.9 × 10−3 | 8.8 × 10−3 | CASC11 | PVT1 |
rs4476972 | rs16902359 | DR_int_rr | GG+ TT/CT vs. others | 4.6 × 10−5 | 1.30 (1.15–1.48) | 5.8 × 10−3 | 6.4 × 10−4 | PVT1 | CASC11 |
rs4326353 | rs16902359 | RR_int_rr | AA/GA+ TT/CT vs. others | 5.2 × 10−5 | 1.32 (1.16–1.52) | 8.2 × 10−3 | 2.4 × 10−3 | MYC:PVT1 | CASC11 |
rs3901778 | rs4733828 | RD_int_ro | GG/AG+ GA/GG vs. others | 5.3 × 10−5 | 0.71 (0.60–0.84) | 7.5 × 10−3 | 2.9 × 10−3 | PVT1 | PVT1 |
Abbreviation: CI, confidence interval.
aBased on P < 0.01 in the discovery and validation sets, and P < 0.001 in the combined set; all models adjusted for study site and first four principal components.
bAA, additive-additive; DD, dominant-dominant; DR, dominant-recessive; RD, recessive-dominant; RR, recessive-recessive.
_int, interaction-only; _oo, _or, _ro, _rr, original-original, original-reverse, reverse-original, and reverse-reverse coding for SNP1 and SNP2. “AA_int” pattern means a monotonic risk (OR > 1) or protective (OR < 1) effect based on the selected alleles. See Supplementary Tables S2 and S3 for details.
These results show that rs16902359 in CASC11 plays a major role in SNP–SNP interactions for AA men. Over two thirds of the interaction pairs contain rs16902359 (70.9%). Some SNPs appeared several times in the top identified pairs. We defined the SNPs frequently involved in significant SNP–SNP interactions (with ≥4 pairs) as “super-SNPs.” There are five super-SNPs, which are not in strong LD (all pairwise r2 < 0.3, Supplementary Fig. S2). In addition to rs16902359, the other four super-SNPs were CASC11 rs9642880 (9 pairs), PVT1 rs4476972 (5 pairs), PVT1 rs4733828 (5 pairs), and rs12680047 located between MYC and PVT1 (4 pairs). Within the super-SNP clusters, some pairs involving rs16902359 and rs9642880 were highly correlated [Pearson correlation (r) ≥ 0.8, see Supplementary Table S5]. Approximately 44.5% of the pairwise tests of the SNP pairs involving rs16902359 were highly correlated. This high correlation among SNP pairs was also observed for the pairs involving rs9642880 (30.6% of tests with r ≥ 0.8). However, the LD for these SNPs in the super-SNP pairs was weak (mean of LD r2 close to 0, Supplementary Table S5).
Examples of SNP–SNP interaction pairs with a large effect size
Among the top 79 SNP interactions, three had a larger impact on prostate cancer risk, with an effect size close to medium (0.5 ≤ OR < 0.67 or 1.5 ≤ OR < 2) or large (OR ≤ 0.50 or OR ≥ 2.00). As shown in the heatmap plots in Fig. 3A–C, these three SNP pairs are rs963475-rs1863563 (PVT1 and miR-1206:PVT1), rs2720685-rs10087240 (both in PVT1), and rs9642880-rs2173537 (CASC11–PVT1). The interaction between PVT1 rs963475 and miR-1206:PVT1 rs1863563 had the pattern of RD_int_oo, an original-recessive and original-dominant interaction-only model, identified by SIPI (Fig. 3A). This pattern indicated that AA men with an “AA + GA/AA” genotype had a lower risk of developing prostate cancer (OR = 0.27, P = 9.6 × 10−4) compared with the group with the other seven genotype combinations in this SNP pair. The observed prostate cancer prevalence for AA men with the genotype combination of AA + GA/AA for this SNP pair was 22% (=8/37), compared with 45%–53% for the other genotype combinations. In addition, neither SNP had a significant individual effect (P = 0.409 for rs963475, and P = 0.079 for rs1863563). Prostate cancer prevalence for AA men was very similar across the three genotypes of PVT1 rs963475 (46%–49%); the same observation was obtained for rs1863563 (45%–49%).
For the pair of rs2720685-rs10087240 (both in PVT1) with the RR_int_oo pattern, an original-recessive and original-recessive interaction-only model, AA men with the “AA + TT” genotype had a lower risk of developing prostate cancer (OR = 0.54, P = 9.2 × 10−5) compared with the group with the other eight genotype combinations. As shown in Fig. 3B, prostate cancer prevalence for AA men with the “AA + TT” genotype was 34%, which was significantly lower than that for other genotype combinations in this SNP pair (47%–53%). Similarly, the individual effects of the two SNPs were not significant. Prostate cancer prevalence for AA men was very similar across the three genotypes of PVT1 rs2720685 (46%–49%), and the same observation was obtained for rs10087240 (47%–50%).
For the pair of rs9642880-rs2173537 (CASC11–PVT1) shown in Fig. 3C, the identified best interaction pattern is the “RD_int_or” pattern, an original-recessive and reverse-dominant interaction-only model. This means that AA men with the GG+ GG genotype had a lower prostate cancer prevalence (35%) compared with other genotypes (43%–55%), except the GG+ AA genotype. SIPI did not consider GG+ AA as a distinct risk group because this estimate was not stable due to the small sample size in this group. The individual SNP effect of rs2173537 was not significantly associated with prostate cancer risk.
Among the top 79 SNP pairs, the pair with the largest effect size among those with the “AA_int” (additive-additive interaction-only) pattern is rs4476972 and rs4733789 in PVT1 (Fig. 3D; Table 3). The “AA_int_or” pattern for this pair means that men with the minor allele T in rs4476972 and the major allele C in rs4733789 tend to have a lower risk of developing prostate cancer (OR = 0.77). This SNP pair was also significant in the final multipair model. These observations demonstrate that the SIPI-identified interactions could have a larger effect on prostate cancer risk prediction than SNP individual effects.
SNP pairb | Modelc | Combination (SNP1+SNP2)c | Unipaira OR (95% CI) | Unipaira P value | Multipaira OR (95% CI) | Multipaira P value | Gene1 | Gene2 |
---|---|---|---|---|---|---|---|---|
rs2720659 and rs9642880*& | AA_int_ro | G and G allele | 0.86 (0.81–0.92) | 1.4 × 10−5 | 0.91 (0.85–0.98) | 0.014 | MIR-1207: PVT1 | CASC11 |
rs2720667 and rs16902359*& | RR_int_rr | AA/GA + TT/CT vs. others | 1.33 (1.16–1.52) | 3.5 × 10−5 | 1.17 (1.01–1.35) | 0.032 | PVT1 | CASC11 |
rs3901778 and rs4733828& | RD_int_ro | AA+GG vs. others | 0.71 (0.60–0.84) | 5.4 × 10−5 | 0.78 (0.66–0.93) | 0.005 | PVT1 | PVT1 |
rs2720685 and rs10087240 | RR_int_oo | AA+ TT vs. others | 0.54 (0.40–0.74) | 9.3 × 10−5 | 0.55 (0.40–0.76) | 2.1 × 10−4 | PVT1 | PVT1 |
rs4733809 and rs12680047*& | AA_int_oo | T and T allele | 0.85 (0.79–0.93) | 1.4 × 10−4 | 0.88 (0.81–0.96) | 0.005 | PVT1 | MYC:PVT1 |
rs4476972& and rs4733789 | AA_int_or | T and C allele | 0.77 (0.68–0.89) | 1.8 × 10−4 | 0.79 (0.69–0.90) | 6.8 × 10−4 | PVT1 | PVT1 |
rs963475 and rs1863563 | RD_int_oo | AA+ GA/AA vs. others | 0.27 (0.12–0.58) | 9.7 × 10−4 | 0.25 (0.12–0.56) | 6.4 × 10−4 | PVT1 | MIR-1206:PVT1 |
Abbreviation: CI, confidence interval.
aUnipair: model with only 1 SNP pair; multipair: model with multiple SNP pairs; all models adjusted for study site and first four principal components.
b&, super-SNP, which occurred ≥4 times in the top 79 SNP pairs; *, SNP with a significant individual effect.
cAA, additive-additive; DD, dominant-dominant; DR, dominant-recessive; RD, recessive-dominant; RR, recessive-recessive.
_int, Interaction-only; _oo, _or, _ro, _rr, original-original, original-reverse, reverse-original, and reverse-reverse coding for SNP1 and SNP2. “AA_int” pattern means a monotonic risk (OR > 1) or protective (OR < 1) effect based on the selected alleles. See Supplementary Tables S2 and S3 for details.
A multipair prediction model
The final multipair model is displayed in Table 3. A total of seven SNP pairs were selected for the final multivariable model. To avoid multicollinearity, the two-step variable selection approach was used to build the multivariable model. For each super-SNP cluster, only one SNP pair was selected. Of the seven pairs, five contained a super-SNP, and three contained an SNP with a significant individual effect. Two SNP pairs contained neither a super-SNP nor an SNP with a significant individual effect: rs2720685-rs10087240 and rs963475-rs1863563. This multipair model includes several SNP pairs with a moderate to large effect size.
Comparison between SIPI and AA_Full approach
To demonstrate the power of SIPI, the top 79 SNP–SNP pairs identified by SIPI were evaluated in the combined set using the AA_Full approach, the most commonly used approach for analyzing SNP–SNP interactions. Using the AA_Full approach, only two of the 79 top pairs (2.5%) had P < 0.001 in the combined set. As shown in Supplementary Table S6, these two interaction pairs were rs2720659-rs9642880 (P = 2.6 × 10−5) and rs2720709-rs9642880 (P = 2.0 × 10−5). The minor/major alleles and locations of the SNPs involved in these 79 pairs are listed in Supplementary Table S7A and S7B.
Discussion
We have shown that three individual SNPs and 79 SNP–SNP interaction pairs are significantly associated with prostate cancer risk. The three SNPs with a significant individual effect are rs9642880 (CASC11), rs12680047 (MYC), and rs16902359 (CASC11). These three SNPs also frequently interact with other SNPs in the CASC11–MYC–PVT1 region associated with prostate cancer risk in AA men. In a previous study of AA men (44), the CASC11 SNP rs9642880 was not associated with prostate cancer risk (P = 0.13). In a neighboring region (chromosome 8: 126.8–127.8 Mb, GRCh38) of our target region, several SNPs in PRNCR1, PCAT1, and PCAT2 were identified as associated with prostate cancer risk in AA men (45). For EA men, the two CASC11 SNPs (rs9642880 and rs16902359) do not show a significant impact on prostate cancer risk (46). However, these two CASC11 SNPs have been linked to other cancers. CASC11 rs9642880 is a GWAS SNP associated with bladder cancer (25, 26, 47), and rs16902359 has a reported association with colorectal cancer (48). Among the top 79 SNP pairs, CASC11 rs16902359 was involved in 56 pairs (70.9% of the top 79 pairs), CASC11 rs9642880 was involved in 9 pairs, and MYC:PVT1 rs12680047 was involved in 4 pairs. In addition to these three SNPs, two other super-SNPs, rs4476972 and rs4733828 in PVT1, also frequently interacted with other SNPs associated with prostate cancer risk.
All top 79 SNP pairs associated with prostate cancer risk had an interaction-only pattern among the 45 SIPI interaction patterns. The conventional statistical approach (AA_Full) identified less than 3% of the top 79 pairs, which demonstrates the limitation of the conventional full interaction approach. When the three SNP individual effects and the 79 SNP pairs were considered for building a prediction model, none of the individual effects were selected; only pure SNP–SNP interactions were selected. These results indicate that SNP–SNP interactions are more powerful than SNP individual effects in terms of predicting prostate cancer risk in AA men.
Several SNP interaction pairs (Fig. 3) have a moderate to large effect size associated with prostate cancer risk. As shown in Fig. 3, the ORs range from 0.27 to 0.68 for the SIPI-detected subgroups. Converted to a risk effect, these ORs correspond to 1.5 to 3.7. Two of these pairs with a large effect size are also included in the multipair model. By comparison, GWAS-identified SNPs have a small effect size on cancer risk, with a median per-allele OR of 1.2 (49). Several polygenic risk scores that sum SNP individual effects have been proposed for both AA and EA men (6, 35, 50). Our identified SNP interaction pairs could be integrated into the existing polygenic risk scores to increase prediction power for prostate cancer risk in AA men.
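The conversion of protective ORs to a risk scale is a reciprocal, and the effect-size categories follow the medium and large cutoffs given in the Results. The sketch below reproduces that arithmetic; the function names are hypothetical, and the "small" label for everything below the medium cutoff is an assumption of this sketch.

```python
def to_risk_or(odds_ratio: float) -> float:
    """Express an OR on the risk scale (>= 1) by inverting protective ORs."""
    return 1.0 / odds_ratio if odds_ratio < 1.0 else odds_ratio

def effect_size_label(odds_ratio: float) -> str:
    """Classify an OR using the cutoffs 1.5 (medium) and 2.0 (large) on the risk scale."""
    r = to_risk_or(odds_ratio)
    if r >= 2.0:
        return "large"
    if r >= 1.5:
        return "medium"
    return "small"

# ORs reported for SIPI-detected subgroups
risk_low = round(to_risk_or(0.68), 1)    # 1.5
risk_high = round(to_risk_or(0.27), 1)   # 3.7
labels = {or_: effect_size_label(or_) for or_ in (0.27, 0.54, 0.77)}
```

With these cutoffs, OR = 0.27 is classified as large and OR = 0.54 as medium, whereas an OR of 0.68 (1.47 before rounding) falls just below the medium threshold, which is why it is described as close to medium.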
MYC is a protein-coding gene, and PVT1 and CASC11 are noncoding genes. In our target region, the most common gene–gene interaction (55 of 79 pairs) is the interaction of PVT1 and CASC11. There are eight pairs for the interaction of MYC–PVT1. To our knowledge, there is no literature supporting a direct link between PVT1 and CASC11, but previous studies show a linkage between PVT1 and CASC11 through MYC (8, 20–22, 51). The link between PVT1 and MYC has been reported. PVT1 is a downstream gene of MYC. Increased copies of MYC are often accompanied by coamplification of PVT1, and PVT1 expression has been shown to be influenced by 8q24 genetic variation in relation to cancer (8, 21). Guan and colleagues have suggested that PVT1 acts as an MYC activator. Previous studies have shown that PVT1 interacts with MYC, is significantly associated with MYC in colorectal cancers, and is significantly correlated with MYC expression (8, 51, 52). In addition to prostate cancer, PVT1 expression has been linked to end-stage renal disease attributed to type I diabetes and to poor prognosis of gastric cancer, ovarian cancer, breast cancer, and colorectal cancer (20, 53–55). In addition to coexpression with MYC, PVT1 has been shown to act independently to inhibit apoptosis when amplified and overexpressed (20, 21). The miRNAs have been implicated in regulating MYC expression by regulating factors that activate MYC, but it is not known whether this effect is direct or indirect (8). Although a clear functional role for these transcripts is not known, the miRNAs that are located within the PVT1 region have also been shown to play important roles in disease risk (21). miR-1207-3p has been linked to prostate cancer. An increase in miR-1207-3p expression in prostate cancer tissue has been significantly associated with more aggressive prostate cancer features.
This overexpression of miR-1207-3p has the potential to serve as a prognostic biomarker for prostate cancer. Men of African ancestry have significantly lower expression of miR-1207-3p in prostate cancer tissues than Caucasian men. This difference in expression could help explain why prostate cancer is more aggressive in AA men (56). Das and Ogunwobi suggest that miR-1207 underexpression may be associated with the onset of progression of prostate cancer and correlates with tumor aggressiveness (57). The impact of CASC11 on prostate cancer has not been reported. However, CASC11 has been found to be overexpressed in colorectal tumors, and its expression correlates with large primary colorectal tumors. MYC has been linked to CASC11 by binding to and promoting the CASC11 gene, activating its transcription (51), and by enhancing promoter histone acetylation to increase CASC11 expression in colorectal cancer (48).
The findings of this study provide a new understanding of risk loci associated with prostate cancer in the 8q24.21 region in AA men. Although no direct link has been reported to support our major finding of CASC11–PVT1 SNP–SNP interactions in either AA or EA men, the indirect link of these two genes through MYC has been shown. The biological functions of interactions among the MYC, PVT1, and CASC11 genes are still unclear. The numerous SNP–SNP interactions identified between PVT1 and CASC11 open a new area of research: investigating their biological functions in prostate cancer development by performing two-way expression quantitative trait loci (eQTL) analyses in future studies. Conventional one-way eQTL analyses evaluate the one-to-one relationship between one SNP and the expression of one gene. Two-way eQTL analyses evaluate associations between one SNP interaction pair (two SNPs) and the expression of one gene. Further cis- and trans-eQTL analyses of these two genes can evaluate whether the combination of two specific SNPs influences the expression of PVT1, CASC11, or other distant genes. In addition, the miRNAs found in PVT1 may play a role in these gene–gene interactions. To our knowledge, this is the first report of gene/SNP interactions among MYC, PVT1, and CASC11 linked to prostate cancer risk in AA men. The SIPI method proved to be a powerful tool for uncovering significant gene–gene interactions in relation to prostate cancer risk. Although none of the SNP individual effects reached the Bonferroni correction criteria, our findings are supported by the discovery–validation design. This study has several limitations. Behavioral and environmental factors were not taken into consideration because of substantial missing data (∼55% missing) for these factors. 
The SNP–SNP interactions associated with prostate cancer progression were not investigated because of the limited number of aggressive prostate cancer cases (680 with aggressive tumors). Further research can examine the biological basis of these SNP–SNP interactions associated with prostate cancer risk and progression, and compare the results among different race groups. Larger studies are warranted to further validate the identified SNP–SNP interactions and evaluate their potential biological functions.
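The distinction drawn above between one-way and two-way eQTL analyses can be sketched as a pair of regression models. This is an illustration under assumptions that the text does not state: a linear model for expression, additive genotype coding (0, 1, or 2 copies of the minor allele), and a single product term for the interaction. The symbols $E$ (expression of one gene), $G_1$ and $G_2$ (genotypes of the two SNPs), and the $\beta$ coefficients are introduced here for illustration only; SIPI itself assesses 45 interaction patterns rather than this single form.

```latex
\begin{align}
\text{one-way eQTL:}\quad E &= \beta_0 + \beta_1 G_1 + \varepsilon \\
\text{two-way eQTL:}\quad E &= \beta_0 + \beta_1 G_1 + \beta_2 G_2 + \beta_{12}\, G_1 G_2 + \varepsilon
\end{align}
```

Under this formulation, evidence of a two-way eQTL effect would correspond to $\beta_{12} \neq 0$, that is, the two SNPs jointly influencing expression beyond the sum of their individual effects.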
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: H.-Y. Lin, C.Y. Callan, J.Y. Park
Development of methodology: H.-Y. Lin
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): H.-Y. Lin, H.-Y. Tung
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): H.-Y. Lin, C.Y. Callan, H.-Y. Tung, J.Y. Park
Writing, review, and/or revision of the manuscript: H.-Y. Lin, Z. Fang, J.Y. Park
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): H.-Y. Lin, C.Y. Callan
Study supervision: H.-Y. Lin, Z. Fang
Acknowledgments
We thank our anonymous reviewers for their valuable comments, which have led to many improvements to this article. This study was supported by the NCI (R21CA202417, principal investigator: H.-Y. Lin).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.