It is widely hypothesized that the interactions of multiple genes influence individual risk to prostate cancer. However, current efforts at identifying prostate cancer risk genes primarily rely on single-gene approaches. In an attempt to fill this gap, we carried out a study to explore the joint effect of multiple genes in the inflammation pathway on prostate cancer risk. We studied 20 genes in the Toll-like receptor signaling pathway as well as several cytokines. For each of these genes, we selected and genotyped haplotype-tagging single nucleotide polymorphisms (SNP) among 1,383 cases and 780 controls from the CAPS (CAncer Prostate in Sweden) study population. A total of 57 SNPs were included in the final analysis. A data mining method, multifactor dimensionality reduction, was used to explore the interaction effects of SNPs on prostate cancer risk. Interaction effects were assessed for all possible n SNP combinations, where n = 2, 3, or 4. For each n SNP combination, the model providing lowest prediction error among 100 cross-validations was chosen. The statistical significance levels of the best models in each n SNP combination were determined using permutation tests. A four-SNP interaction (one SNP each from IL-10, IL-1RN, TIRAP, and TLR5) had the lowest prediction error (43.28%, P = 0.019). Our ability to analyze a large number of SNPs in a large sample size is one of the first efforts in exploring the effect of high-order gene-gene interactions on prostate cancer risk, and this is an important contribution to this new and quickly evolving field.

Genetic susceptibility to prostate cancer is consistently observed from a large number of case-control studies, twin studies, and segregation analyses (1). Inference from tumorigenesis and results from genetic modeling studies suggest that several major susceptibility genes and many modifier genes underlie this genetic susceptibility. Although it is widely hypothesized that the interactions of these genes, either additively or epistatically, determine the individual risk to prostate cancer, current efforts in identifying prostate cancer risk genes rely on single-gene approaches. This gap is largely attributed to a combination of factors, including difficulties in genotyping a large number of variants in many genes, inadequate analytic approaches and computing power to model gene-gene interaction, and a small sample size to achieve reasonable power to detect interaction.

In an attempt to fill this gap and to explore the joint effect of multiple sequence variants on prostate cancer risk, we designed a study to systematically evaluate a large number of sequence variants among multiple genes in the inflammation pathway in a large prostate cancer case-control study population. In addition to assessing a main effect on prostate cancer risk for each sequence variant, we explored the joint effects of multiple sequence variants using a data mining method, multifactor dimensionality reduction (MDR). We systematically evaluated the ability of this approach to classify and predict which individuals were affected with prostate cancer based on any combination of two, three, or four variants from all the genotyped variants. We found that the interaction of four inflammation pathway genes significantly predicts prostate cancer risk.

Study Population

The study design has been described in detail elsewhere (2). Briefly, this is a large-scale population-based case-control study in Sweden, named CAPS (CAncer Prostate in Sweden). Prostate cancer patients were identified and recruited from four of the six regional cancer registries in Sweden. The inclusion criterion for cases was pathologically or cytologically verified adenocarcinoma of the prostate, diagnosed between July 1, 2001 and September 30, 2002. Control subjects were randomly selected from the continuously updated Swedish Population Register and frequency-matched according to age (within 5 years) and geographic origin of the cases. In total, 1,444 cases and 866 controls were recruited. Among them, DNA samples and questionnaires were available for 1,383 cases and 780 controls. The clinical characteristics of the study subjects are presented in Table 1. The cases were further classified as advanced (prone to progressive disease) if they met any of the following criteria: T3/4, N+, M+, grade 3, Gleason score sum 8 to 10, or PSA >50; otherwise, they were classified as localized. All subjects who participated in this study gave full informed consent.

Table 1.

SNPs included in the MDR analysis

| Gene | SNP | Position | Variation | Minor allele frequency in cases | Minor allele frequency in controls |
| --- | --- | --- | --- | --- | --- |
| TLR6-1-10 | rs5743604 | −833 | T/C | 0.24 | 0.20 |
| TLR2 | rs3804100 | 1,349 | T/C | 0.07 | 0.07 |
| TLR3 | rs5743305 | −8,411 | T/A | 0.34 | 0.36 |
| | rs3775296 | −7 | C/A | 0.18 | 0.20 |
| | rs5743313 | 2,593 | C/T | 0.18 | 0.19 |
| TLR4 | rs1927914 | −2,026 | A/G | 0.33 | 0.33 |
| | IIPGA-TLR4-2856 | −1,607 | T/C | 0.15 | 0.14 |
| | IIPGA-TLR4-18208 | 3,747 | A/C | 0.02 | 0.02 |
| | rs4986790 | 8,551 | A/G | 0.05 | 0.06 |
| | IIPGA-TLR4-14078 | 9,614 | G/A | 0.05 | 0.06 |
| | rs7873784 | 12,185 | G/C | 0.14 | 0.14 |
| | IIPGA-TLR4-15844 | 11,380 | G/C | 0.13 | 0.11 |
| TLR5 | IIPGA-TLR5-5187 | −27,694 | C/T | 0.49 | 0.47 |
| | rs2072493 | 1,774 | A/G | 0.14 | 0.16 |
| | rs5744174 | 1,845 | T/C | 0.48 | 0.47 |
| | rs1053954 | 2,522 | A/G | 0.08 | 0.08 |
| TLR7 | rs2302267 | −120 | T/G | 0.06 | 0.05 |
| | rs179019 | 4,271 | C/A | 0.23 | 0.23 |
| | rs179008 | 17,961 | A/T | 0.13 | 0.13 |
| TLR8 | rs1548731 | −558 | C/T | 0.26 | 0.27 |
| | rs4830806 | 3,467 | C/T | 0.39 | 0.38 |
| | rs5744068 | 6,553 | C/T | 0.16 | 0.17 |
| TLR9 | rs187084 | −1,486 | T/C | 0.44 | 0.44 |
| IL-1RN | rs878972 | 2,117 | A/C | 0.27 | 0.28 |
| | rs315934 | 8,110 | T/C | 0.19 | 0.20 |
| | rs315951 | 14,991 | C/G | 0.32 | 0.34 |
| | rs3087263 | 19,172 | A/G | 0.09 | 0.10 |
| TIRAP | rs4251431 | 9,537 | T/C | 0.37 | 0.37 |
| | TIRAP_14115 | 14,115 | T/G | 0.24 | 0.21 |
| | TIRAP_17678 | 17,678 | G/A | 0.05 | 0.05 |
| MyD88 | rs4988453 | −938 | C/A | 0.06 | 0.07 |
| IRAK1 | rs1059703 | 6,434 | C/T | 0.13 | 0.14 |
| | rs30278898 | 9,373 | T/G | 0.18 | 0.19 |
| IRAK4 | rs4251571 | −2,001 | A/G | 0.02 | 0.02 |
| | rs4251487 | 7,987 | G/C | 0.02 | 0.01 |
| | rs4251545 | 18,380 | G/A | 0.10 | 0.10 |
| TNF | rs2799724 | −1,037 | C/T | 0.07 | 0.07 |
| | rs3093662 | 670 | A/G | 0.03 | 0.03 |
| | rs3093664 | 1,123 | A/G | 0.07 | 0.08 |
| | rs3093665 | 1,872 | A/C | 0.01 | 0.02 |
| IL-6 | rs1800797 | −661 | G/A | 0.51 | 0.54 |
| | rs1800796 | −636 | G/C | 0.04 | 0.05 |
| | rs1800795 | −237 | G/C | 0.49 | 0.48 |
| | rs1474348 | 1,027 | G/C | 0.51 | 0.52 |
| | rs2069845 | 3,268 | G/A | 0.52 | 0.51 |
| | rs2069860 | 4,157 | A/T | 0.01 | 0.01 |
| IL-10 | rs1800896 | −1,117 | A/G | 0.47 | 0.49 |
| | rs1800872 | −627 | C/A | 0.26 | 0.25 |
| | rs1554286 | 1,547 | C/T | 0.21 | 0.19 |
| | rs3024509 | 2,483 | CT/C | 0.06 | 0.08 |
| | rs3024505 | 5,876 | C/T | 0.13 | 0.13 |
| COX2 | rs2745557 | 201 | T/C | 0.16 | 0.15 |
| | rs20432 | 3,099 | T/G | 0.87 | 0.84 |
| | rs4648276 | 3,934 | T/C | 0.10 | 0.12 |
| | rs5275 | 6,364 | T/C | 0.35 | 0.36 |
| | rs689470 | 8,364 | C/T | 0.02 | 0.04 |
| MIC1 | rs1058587 | 2,433 | G/C | 0.27 | 0.29 |

NOTE: For SNPs which have not been assigned an rs identifier, we used either an IIPGA identifier if available or the relative position of the SNP in the gene (from the transcription site). The relative position of TLR4 is based on isoform A. The relative position of IL-1RN is based on isoform 3.

Selection of Genes and Single Nucleotide Polymorphisms

We aimed to systematically evaluate the association between sequence variants of genes in the inflammation pathway and prostate cancer risk. Because the exact mechanism by which inflammation might act in tumor development and progression is largely unknown, the selection of genes for association studies is difficult. As an initial step, we began with genes that play an important role in the Toll-like receptor (TLR) signaling pathway. The majority of the 20 genes code for proteins that are major players in this signaling pathway, including receptors (10 TLRs and IL-1RN), adaptors (MyD88 and TIRAP), kinases [IL-1R-associated kinase (IRAK)-1 and IRAK4], and cytokines and other effector genes [tumor necrosis factor, interleukin (IL)-6, IL-10, and cyclooxygenase-2]. For each of these genes, we identified haplotype-tagging single nucleotide polymorphisms (SNP) that can represent at least 95% of the haplotypes of these genes in our study population. We first selected validated SNPs from a publicly available database from the Innate Immunity Program for Genetic Application (http://innateimmunity.net) using two criteria: (a) a minor allele frequency of at least 5%, at a resolution of 1 SNP per kb across the genomic region and 2.5 kb of the upstream promoter of each gene, and (b) all SNPs that lead to an amino acid substitution. We then genotyped these selected SNPs in 96 CAPS control subjects. The haplotypes of these SNPs were then estimated using a computer program, PHASE 2.0 (http://www.stats.ox.ac.uk/mathgen/software.html; ref. 3). Haplotype blocks in this region were constructed and haplotype-tagging SNPs were selected using HaploBlockFinder, a web-based program (http://cgi.uc.edu/cgi-bin/kzhang/haploBlockFinder.cgi/). A threshold of minimal pair-wise D′ = 0.8 was used to define a block. In total, 104 SNPs from 20 genes were selected for genotyping in the entire collection of 1,383 cases and 780 controls.
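As an illustration of the pair-wise D′ threshold mentioned above, the following minimal sketch computes the standard normalized disequilibrium coefficient from haplotype and allele frequencies. It is not the HaploBlockFinder implementation; the function name `d_prime` and its arguments are hypothetical, and the haplotype frequency is assumed to be already estimated (for example, by a phasing program).

```python
def d_prime(p_ab, p_a, p_b):
    """Absolute value of Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    (all names are illustrative, not taken from the study software)
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:  # a monomorphic locus carries no LD information
        return 0.0
    return abs(d) / d_max


def in_same_block(p_ab, p_a, p_b, threshold=0.8):
    """Apply the D' >= 0.8 block criterion quoted in the text."""
    return d_prime(p_ab, p_a, p_b) >= threshold
```

Under this definition, two SNPs whose alleles always travel together give D′ = 1, and two independent SNPs give D′ = 0.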

Genotyping Methods

Genotyping was done using the MassARRAY system (Sequenom, Inc., San Diego, CA). For the MassARRAY assay, PCR and extension primers for sequence variants were designed using SpectroDesigner software (Sequenom). The primer information is available at the corresponding author's web site (http://www.wfubmc.edu/genomics). PCR and extension reactions were done according to the manufacturer's instructions, and extension product sizes were determined by mass spectrometry.

Statistical Analysis

A Hardy-Weinberg Equilibrium test was done for each SNP using the Fisher probability test statistic (4), as implemented in the software package Genetic Data Analysis. Empirical P values for the Hardy-Weinberg equilibrium test were based on 10,000 permutation tests.
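The permutation-based Hardy-Weinberg test can be sketched as follows. This is one plausible reading of the procedure, not the code of the Genetic Data Analysis package: alleles are shuffled among individuals, and the exact probability of each permuted genotype configuration is compared with that of the observed one. The function names and the default seed are assumptions made for illustration.

```python
import math
import random


def hwe_log_prob(n_aa, n_ab, n_bb):
    """Log probability of the genotype counts given the allele counts under HWE."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab
    lg = math.lgamma
    return (lg(n + 1) - lg(n_aa + 1) - lg(n_ab + 1) - lg(n_bb + 1)
            + n_ab * math.log(2)
            + lg(n_a + 1) + lg(n_b + 1) - lg(2 * n + 1))


def hwe_permutation_p(n_aa, n_ab, n_bb, n_perm=10000, seed=0):
    """Empirical P value: share of permuted samples no more probable than the observed one."""
    rng = random.Random(seed)
    alleles = [0] * (2 * n_aa + n_ab) + [1] * (2 * n_bb + n_ab)
    observed = hwe_log_prob(n_aa, n_ab, n_bb)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(alleles)
        counts = [0, 0, 0]  # index = number of B alleles in the genotype
        for i in range(0, len(alleles), 2):
            counts[alleles[i] + alleles[i + 1]] += 1
        if hwe_log_prob(counts[0], counts[1], counts[2]) <= observed + 1e-9:
            hits += 1
    return hits / n_perm
```

A sample with no heterozygotes at a common SNP yields a very small P value under this sketch, whereas a sample close to the expected 1:2:1 proportions yields a P value near 1.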

The MDR method was first described by Moore and colleagues (5-8). Briefly, this method is designed to improve the identification of factors associated with disease risk by reducing the dimensionality of multifactor information. The method involves several steps: in the first step, the data were divided into a training set (consisting of 9/10 of the data) and an independent testing set (consisting of the remaining 1/10 of the data) as part of cross-validation. In the second step, a set of n factors (in this case, SNPs) were selected, where n = 1, 2, 3, and 4. In steps 3 and 4, the n SNPs and their possible multifactor classes are represented in n dimensional space, e.g., for two SNPs with three genotypes each, there are nine possible two–locus-genotype combinations. The ratio for the number of cases to the number of controls was calculated within each multifactor class. Each multifactor class in n dimensional space was then labeled as “high risk” if the case to control ratio met or exceeded a threshold (for example, 1.0), or as “low risk” if that threshold was not exceeded, thus reducing the n dimensional space to one dimension with two levels (low risk and high risk). In the fifth step, the model that gave the lowest misclassification error (error in classifying cases and controls based on high risk or low risk in the training set) was selected for each set of n SNPs. In step six, a prediction error (error in classifying disease status in the testing set) was estimated for each model selected in step 5, as a cross-validation procedure. Steps 1 to 6 were repeated 10 times using a random seed number. We did this entire 10-fold cross-validation procedure 10 times, using different random seed numbers, to reduce the chance of observing spurious results due to chance divisions of the data. 
In addition to the misclassification error and prediction error, we also estimated a cross-validation consistency, defined as a percentage of the same combination of SNPs selected as the best model among different cross-validation data sets, for each set of n SNPs.
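The steps above can be sketched in a few functions. This is a simplified illustration, not the MDR software of Moore and colleagues: genotypes are assumed to be coded as tuples of 0, 1, or 2 per subject, disease status as 1 (case) or 0 (control), and all function names are hypothetical. It runs a single pass of 10-fold cross-validation; repeating it with different seeds would mimic the repeated procedure described in the text.

```python
import random
from collections import Counter
from itertools import combinations


def high_risk_cells(genotypes, status, snps, threshold=1.0):
    """Multilocus genotype cells whose case:control ratio meets the threshold."""
    cases, controls = Counter(), Counter()
    for row, affected in zip(genotypes, status):
        cell = tuple(row[s] for s in snps)
        (cases if affected else controls)[cell] += 1
    return {cell for cell in set(cases) | set(controls)
            if cases[cell] >= threshold * controls[cell]}


def error_rate(genotypes, status, snps, high):
    """Fraction of subjects misclassified by the high-risk/low-risk labels."""
    wrong = sum((tuple(row[s] for s in snps) in high) != bool(affected)
                for row, affected in zip(genotypes, status))
    return wrong / len(status)


def best_model(genotypes, status, order):
    """SNP combination of the given order with the lowest classification error."""
    best = None
    for snps in combinations(range(len(genotypes[0])), order):
        high = high_risk_cells(genotypes, status, snps)
        err = error_rate(genotypes, status, snps, high)
        if best is None or err < best[0]:
            best = (err, snps, high)
    return best


def mdr_cross_validate(genotypes, status, order, folds=10, seed=0):
    """Return (most frequently chosen model, its consistency, mean prediction error)."""
    rng = random.Random(seed)
    idx = list(range(len(status)))
    rng.shuffle(idx)
    chosen, pred_errors = Counter(), []
    for k in range(folds):
        test = set(idx[k::folds])  # roughly 1/10 of the data held out
        train = [i for i in idx if i not in test]
        _, snps, high = best_model([genotypes[i] for i in train],
                                   [status[i] for i in train], order)
        chosen[snps] += 1
        pred_errors.append(error_rate([genotypes[i] for i in test],
                                      [status[i] for i in test], snps, high))
    top, count = chosen.most_common(1)[0]
    return top, count / folds, sum(pred_errors) / folds
```

The exhaustive search over all combinations is what makes the four-SNP analysis computationally demanding: the number of models grows combinatorially with the number of SNPs considered.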

We determined the statistical significance of the observed prediction error of the best model for each set of n SNPs by empirical simulations as described below. We first generated a data set with no association between prostate cancer and SNPs by randomly permuting case and control status among the CAPS subjects. We then did the above 10-fold cross-validation MDR analysis for each generated data. We repeated these two steps 1,000 times for each set of n SNPs. Empirical P values were based on the number of prediction errors estimated among the 1,000 simulations that were as small as or smaller than the observed prediction errors. The simulations were done using a 1,024-CPU IBM supercomputer cluster.
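The empirical P value described above can be sketched as a generic permutation routine. The function below is an illustration under stated assumptions: `statistic` is a hypothetical callable that returns the prediction error for a given labeling (in the study this would be a full cross-validated MDR run), and smaller values are treated as more extreme.

```python
import random


def permutation_p_value(observed, statistic, labels, n_perm=1000, seed=0):
    """Share of label permutations whose statistic is as small as or smaller than observed."""
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association between labels and genotypes
        if statistic(shuffled) <= observed:
            hits += 1
    return hits / n_perm
```

Under this definition, an empirical P value of 0.019 from 1,000 permutations would correspond to 19 permuted data sets with a prediction error at or below the observed one.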

To decrease the effect of missing data on the results, we removed SNPs with ≥5% missing data. We also removed the subjects with missing data on 10 or more SNPs (28 cases and 10 controls). Furthermore, to decrease the effect of strong linkage disequilibrium (LD) between SNPs in the same gene on the MDR analysis, when SNPs were in strong pair-wise LD, defined as D′ > 0.8, one of the pair was randomly dropped. Inclusion of SNPs that are highly correlated may lead to unstable results because MDR analyses report only the best predictor (SNP) and these highly correlated SNPs may compete for the best predictor. However, removing highly correlated SNPs may decrease the chance of detecting a possible haplotype effect, or a cis effect of two or more functional SNPs in a single gene. Among the 104 SNPs of 20 genes genotyped in this CAPS population, 57 SNPs were included in the MDR analysis. Finally, we are aware that an unbalanced number of cases and controls may affect the results of MDR analyses. Therefore, to decrease the effect of an unbalanced number of cases and controls on the MDR results, we randomly selected 585 men from the control pool of 770 subjects, with replacement, to obtain a balanced number of cases and controls for the MDR analysis. This approach, although fully utilizing the genotype information of cases, may introduce extra correlation among the controls. However, the use of a cross-validation procedure to estimate prediction error and the use of a permutation procedure to determine significance levels in our analyses may relieve this concern to some degree.
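The balancing step can be checked with simple arithmetic: 1,383 − 28 = 1,355 cases and 780 − 10 = 770 controls remain after exclusions, so 1,355 − 770 = 585 additional controls are needed. The sketch below reads the 585 resampled men as additions to the 770 original controls, which is our interpretation of the text; the function name and seed are illustrative.

```python
import random


def balance_controls(n_cases, control_ids, seed=0):
    """Pad the control group to n_cases by resampling controls with replacement."""
    rng = random.Random(seed)
    extra = n_cases - len(control_ids)
    if extra <= 0:  # already balanced, or more controls than cases
        return list(control_ids)
    return list(control_ids) + rng.choices(control_ids, k=extra)
```

Because some controls appear more than once after resampling, duplicated subjects can fall into both the training and testing sets of a cross-validation split, which is the extra correlation noted in the text.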

The position and minor allele frequency in cases and controls for each of the 57 SNPs are presented in Table 1. All SNPs were in Hardy-Weinberg equilibrium (P > 0.05) in cases and controls. The results of the MDR analysis for each n SNP combination are presented in Table 2. The model with the lowest prediction error and highest cross-validation consistency for each n SNP combination is presented.

Table 2.

Results from MDR analysis

| Number of factors considered | Best candidate model | Average cross-validation consistency (%) | Average classification error (%) | Average prediction error (%) |
| --- | --- | --- | --- | --- |
| 1 | TLR1* | 75 | 46.19 | 47.41 |
| 2 | MIC1, TLR5 | 51 | 44.16 | 46.21 |
| 3 | MIC1, TLR3§, TLR5 | 48 | 41.78 | 45.46 |
| 4 | IL-10, IL1RN, TIRAP**, TLR5 | 56 | 37.67 | 43.28†† |
NOTE: SNPs in the best candidate models: *TLR1 (rs5743604); MIC1 (rs1058587); TLR5 (IIPGA-5187); §TLR3 (rs3775296); IL10 (rs1800896); IL1RN (rs878972); **TIRAP (14115). ††P = 0.019.

When SNPs were considered one at a time, the TLR1 SNP (rs5743604) had the highest cross-validation consistency (75%) and the lowest classification error (46.19%) and prediction error (47.41%) among all 57 SNPs. Subjects with the “CT” or “CC” genotype had a high risk for prostate cancer, and therefore were classified as affected by this model. The prediction error, however, was not statistically significant. The empirical P value for this prediction error was 0.19, based on 1,000 permutations. It is interesting to note that this result is similar to, albeit weaker than, the results of a single SNP χ2 test for allele frequency difference between cases and controls. The allele frequency for the minor allele “C” of this SNP was significantly higher in cases (0.24) than in controls (0.20), P = 0.002, and was the most significant among these 57 SNPs.

When SNPs were considered two at a time, the SNPs from MIC1 (rs1058587) and TLR5 (IIPGA-5187) had the highest cross-validation consistency (51%) and the lowest classification error (44.16%) and prediction error (46.21%) among all the possible combinations of two SNPs. As presented in Table 3, subjects with five combinations of genotypes had a high risk of prostate cancer. These five risk genotypes do not follow simple dominant, recessive, or additive models for any alleles of the two SNPs. The prediction error was not statistically significant, with an empirical P = 0.12 based on 1,000 permutations. Evidently, the ability to predict prostate cancer status using this two-SNP model was improved over the one-SNP model described above.

Table 3.

The model with the lowest misclassification and prediction error among all the two-SNP combinations

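Cells show number of cases / number of controls = ratio.

| MIC1 (rs1058587) | TLR5 (IIPGA-5187) CC | TLR5 (IIPGA-5187) TC | TLR5 (IIPGA-5187) TT |
| --- | --- | --- | --- |
| CC | 166 / 195 = 0.85 | 354 / 329 = 1.08 | 171 / 117 = 1.46 |
| GC | 159 / 138 = 1.15 | 222 / 282 = 0.79 | 117 / 155 = 0.75 |
| GG | 17 / 23 = 0.74 | 66 / 35 = 1.88 | 28 / 23 = 1.2 |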
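The labeling of high-risk genotype combinations can be reproduced from the counts in Table 3. The sketch below assumes the threshold of 1.0 given as an example in the Statistical Analysis section; the variable names are illustrative, and the error it computes is for the pooled counts in the table, not the cross-validated averages reported in Table 2.

```python
# (MIC1 genotype, TLR5 genotype) -> (number of cases, number of controls), from Table 3
TABLE3 = {
    ("CC", "CC"): (166, 195), ("CC", "TC"): (354, 329), ("CC", "TT"): (171, 117),
    ("GC", "CC"): (159, 138), ("GC", "TC"): (222, 282), ("GC", "TT"): (117, 155),
    ("GG", "CC"): (17, 23),   ("GG", "TC"): (66, 35),   ("GG", "TT"): (28, 23),
}

THRESHOLD = 1.0  # assumed case:control ratio cutoff

# A cell is high risk when cases / controls meets or exceeds the threshold.
high_risk = {cell for cell, (cases, controls) in TABLE3.items()
             if cases >= THRESHOLD * controls}

# Misclassified subjects: cases in low-risk cells plus controls in high-risk cells.
wrong = sum(controls if cell in high_risk else cases
            for cell, (cases, controls) in TABLE3.items())
total = sum(cases + controls for cases, controls in TABLE3.values())
pooled_error = wrong / total
```

Five of the nine cells come out as high risk, and they are scattered across the table in a way that no single-allele dominant, recessive, or additive pattern would produce.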

When SNPs were considered three at a time, the SNPs from MIC1 (rs1058587), TLR3 (rs3775296), and TLR5 (IIPGA-5187) had the highest cross-validation consistency (48%) and the lowest classification error (41.78%) and prediction error (45.46%) among all the possible combinations of three SNPs. The addition of the TLR3 SNP to the best two-SNP interaction (MIC1 and TLR5) described above improved the ability to predict prostate cancer risk. Subjects with 10 combinations of genotypes had a high risk for prostate cancer (data not shown). These 10 risk genotypes do not follow simple dominant, recessive, or additive models for any alleles of the three SNPs. Again, the prediction error was not statistically significant, with an empirical P = 0.06 based on 1,000 permutations.

When SNPs were considered four at a time, the SNPs from IL-10 (rs1800896), IL-1RN (rs878972), TIRAP (14115), and TLR5 (IIPGA-5187) had the highest cross-validation consistency (56%) and the lowest prediction error (43.28%) among all the possible combinations of four SNPs. The prediction error was statistically significant, with an empirical P = 0.019 based on 1,000 permutations. Although this prediction error is far from a perfect 0%, it is an important improvement from the a priori 50% chance in predicting prostate cancer status. Forty-three combinations of genotypes of these four SNPs had a high risk for prostate cancer (data not shown). These 43 combinations again did not follow simple dominant, recessive, or additive models for any alleles of the four SNPs. When these four SNPs were examined one at a time using a χ2 test for allele frequency difference, only the SNP in TIRAP (14115) had a significantly different allele frequency between cases and controls (P = 0.04), whereas no significant differences in the allele frequencies between cases and controls were observed for the SNPs in IL-10 (P = 0.28), IL-1RN (P = 0.35), and TLR5 (P = 0.28).

The hypothesis that multiple genes are involved in the predisposition to prostate cancer is well supported by our understanding of the biology of prostate cancer development and by observational data from epidemiologic and genetic epidemiologic studies. With our ability to systematically genotype haplotype-tagging SNPs in 20 genes among thousands of subjects, and the availability of the MDR method and the computing power required to model high-order interactions, our study represents the first major attempt to explore the effects of gene-gene interaction on prostate cancer risk. The large, homogeneous, and epidemiologically sound study population increases the likelihood that our findings represent a true interaction effect between these four genes on prostate cancer risk.

Several bodies of evidence suggest that a gene-gene interaction plays a role in susceptibility to common human diseases (9). First, the idea of gene-gene interactions has been around for nearly 100 years. The observed deviations from Mendelian ratios suggested interactions between genes. Second, the ubiquity of biomolecular interactions in gene regulation and biochemical and metabolic systems suggests that relationships between DNA sequence variants and clinical end points are likely to involve gene-gene interactions. Third, positive results from studies of single polymorphisms typically do not replicate across independent samples. This is true for both linkage and association studies. Fourth, gene-gene interactions are commonly found when properly investigated. For example, Nelson and colleagues (10) simultaneously considered multiple polymorphic loci to identify combinations of genotypes that are most strongly associated with variation in triglycerides using a combinatorial partitioning method. They identified nonadditive epistatic interactions between multiple loci in the absence of independent main effects. If gene-gene interactions play roles in the risk for common diseases, it suggests that we need a research strategy for identifying common disease susceptibility genes that embraces, rather than ignores, the complexity of the genotype to phenotype relationship (11).

Moore and colleagues introduced the MDR method as a way to reduce the dimensionality of multilocus information, in order to improve the identification of polymorphism combinations associated with disease risk. The MDR method is model-free (i.e., it assumes no particular inheritance model), and is directly applicable to case-control and discordant-sib-pair studies. Using simulated case-control data, they showed that MDR has reasonable power to identify interactions among two or more loci in relatively small samples. When this was applied to a sporadic breast cancer case-control data set, in the absence of any statistically significant independent main effects, MDR identified a statistically significant high-order interaction among four polymorphisms from three different estrogen-metabolism genes, COMT, CYP1B1, and CYP1A1 (5). Similar results have been observed for other common diseases such as atrial fibrillation (12), type II diabetes (13), and essential hypertension (14). The MDR method is an example of the type of analytic retooling that is needed for common disease research (11).

Exploring the effect of gene-gene interaction on prostate cancer risk among genes in the inflammation pathway is relevant. Chronic or recurrent inflammation has been implicated in the initiation and development of multiple human cancers, including those affecting the stomach, liver, colon, and urinary bladder (15, 16), and a role for chronic inflammation in the etiology of prostate cancer has been proposed (17-20). The fact that two of the three prostate cancer susceptibility genes (MSR1 and RNASEL) identified through positional cloning approaches are involved in innate immunity and inflammation has suggested a further link between inflammation and prostate cancer (21, 22). Sequence variants of genes in the inflammation pathway may affect the host's ability to regulate inflammation responses and may ultimately modify prostate cancer risk. If a sequence variant itself is sufficient to confer an increased risk to prostate cancer, it can be detected by comparing the frequency of the variant in cases and controls, assuming that there is a sufficient number of subjects. This may be one of the explanations for our observations of prostate cancer association with sequence variants in the TLR4 gene (2), the TLR6-TLR1-TLR10 gene cluster (23), and MIC1 (24) in the CAPS population. On the other hand, if a sequence variant confers an increased risk to prostate cancer only in the presence of other risk variants, it can only be detected when these variants are studied simultaneously by modeling gene-gene interactions. The four-gene interaction identified from this study is consistent with this scenario. Among the four implicated SNPs of four genes, only the SNP in TIRAP (14115) had a significantly different allele frequency between cases and controls, whereas no significant differences in the allele frequencies between cases and controls were observed for the other three SNPs.

An interaction between TLR5, IL-1RN, TIRAP, and IL-10 is biologically plausible. TLR5 and IL-1 receptors recognize and bind bacteria, viruses, and other ligands. IL-1RN is a protein that binds to IL-1 receptors and inhibits the binding of IL-1α and IL-1β. The engagement of ligands on these receptors initiates a series of downstream signaling cascades, including adaptor proteins such as TIRAP. The union of adaptor molecules with receptors leads to the activation of IL-1R-associated kinase (IRAK), and results in the production of various pro- or antiinflammatory cytokines such as IL-10. Therefore, sequence variants in these genes may interact, in a complex fashion, to regulate physiologic and pathophysiologic immune and inflammatory responses and modify prostate cancer risk.

Examining the 43 SNP combinations of the four genes that increased prostate cancer risk, no simple pattern of dominant, recessive, or additive effects of any alleles can be inferred. The complexity of the interactions between these genes makes it difficult to detect these interactions through modeling interaction terms in conventional logistic regression analyses, for several reasons. First, any SNPs that do not impart a main effect are likely to be missed in the logistic regression. For example, the SNPs in IL-10, IL-1RN, and TLR5 would not typically be included in most logistic regression analyses because they did not show a main effect. Second, with four SNPs, there will be many contingency table cells that have few or no data points. This will lead to variable estimates that have very large SEs resulting in an increased type I error (25). Third, the lack of a simple pattern of dominant, recessive, or additive action of alleles makes it nearly impossible to model the interaction terms. With these caveats, we retrospectively modeled the main and interaction effects using a logistic regression for these four SNPs. Four main effects (additive model), and six pair-wise interactions between the four SNPs were modeled. An interaction between IL-10 (rs1800896) and TIRAP (14115) was statistically significant (P = 0.016). No main effect or other interaction was statistically significant. The advantage of the MDR approach is that we can effectively classify individuals into high- or low-risk groups based on the genotype at each SNP without knowing the mechanisms of the interaction. The power and advantages of the MDR approach in identifying risk genes and in predicting prostate cancer risk will be more prominent as the number of genes increases.
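The structure of the retrospective logistic regression (four additive main effects plus six pair-wise interactions) can be illustrated by building its design row. This is only a sketch of the term layout: coding each interaction as the product of two minor allele counts is an assumption, the names are hypothetical, and no model fitting is shown.

```python
from itertools import combinations

SNPS = ["IL10", "IL1RN", "TIRAP", "TLR5"]  # illustrative labels for the four SNPs


def term_names():
    """Names of the 4 main-effect and 6 pair-wise interaction terms."""
    return SNPS + [f"{a}x{b}" for a, b in combinations(SNPS, 2)]


def design_row(genotype):
    """genotype maps each SNP label to a minor allele count (0, 1, or 2)."""
    main = [genotype[s] for s in SNPS]  # additive coding
    pairs = [genotype[a] * genotype[b] for a, b in combinations(SNPS, 2)]
    return main + pairs
```

Even this modest model has 10 predictors besides the intercept; adding three-way and four-way terms would bring the total to 15, which illustrates why sparse cells become a concern as the interaction order grows.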

It is worth noting that the best four-SNP interaction model did not include the genes that were previously identified to be associated with prostate cancer risk using single SNP analysis (2, 23, 24). This is not surprising because the four-SNP model considerably improved the ability to predict the disease risk (43.28% prediction error) compared with a single SNP (47.41% prediction error). The results from the MDR analysis were not in conflict with our previous single-gene analysis. When SNPs were considered one at a time, we found a SNP in TLR1 to be the best predictor for prostate cancer, similar to the results of our single-gene analysis (23). A comparison of the results from our current study and our previous single-gene studies shows the advantage of considering multiple SNPs in genetic association studies.

Because MDR is a data mining approach, it is important to note that the results are suggestive and should be subjected to confirmation. The 1,000-permutation test suggested that the identified four-gene interaction is unlikely to be due to chance; however, confidence in this result will be increased if it can be confirmed. A confirmatory test is planned for the second phase of the CAPS study, in which an additional set of >1,000 cases and 1,000 controls is being recruited using the same protocol and criteria as in the first phase of CAPS.

In summary, using the MDR method to explore the effect of gene-gene interactions among many genes in the inflammation pathway on prostate cancer risk in a large case-control population, we have identified a four-gene interaction that significantly predicts prostate cancer risk. Whereas the ability to predict prostate cancer status is limited with a 43.28% prediction error, our ability to analyze a large number of SNPs in a large sample size is one of the first efforts in exploring the effect of high-order gene-gene interactions on prostate cancer risk, and this is an important contribution to this new and quickly evolving field. Future studies that include additional genes and environmental factors in a systematic assessment using methods such as MDR will likely improve upon this prediction error.

Grant support: Swedish Cancer Foundation and Spear grant from the Umeå University Hospital, Umeå, Sweden. This study was also partially funded by National Cancer Institute grants to J. Xu (CA95052, CA105055, and CA106523).

The authors thank all study participants in the CAPS1 study. We thank Ulrika Lund for coordinating the study at Karolinska Institute, and all urologists who recruited their patients to this study and provided clinical data to the national registry of prostate cancer. We also thank Karin Andersson, Susan Okhravi-Lindh, Gabriella Thorén-Berglund, and Margareta Åswärd at the Regional Cancer Registries in Umeå, Uppsala, Stockholm-Gotland, and Linköping. In addition, we thank Sören Holmgren and the personnel at the Medical Biobank in Umeå for skillfully handling the blood samples.

1. Schaid DJ. The complex genetic epidemiology of prostate cancer. Hum Mol Genet 2004;13:R103–21.
2. Zheng SL, Augustsson-Balter K, Chang B, et al. Sequence variants of toll-like receptor 4 are associated with prostate cancer risk: results from the CAncer Prostate in Sweden Study. Cancer Res 2004;64:2918–22.
3. Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 2001;68:978–89.
4. Weir BS. Genetic data analysis II: methods for discrete population genetic data. Sunderland (MA): Sinauer Associates; 1996.
5. Ritchie MD, Hahn LW, Roodi N, et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 2001;69:138–47.
6. Hahn LW, Ritchie MD, Moore JH. Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions. Bioinformatics 2003;19:376–82.
7. Ritchie MD, Hahn LW, Moore JH. Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Genet Epidemiol 2003;24:150–7.
8. Moore JH. Computational analysis of gene-gene interactions using multifactor dimensionality reduction. Expert Rev Mol Diagn 2004;4:795–803.
9. Moore JH. The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Hum Hered 2003;56:73–82.
10. Nelson MR, Kardia SL, Ferrell RE, Sing CF. A combinatorial partitioning method to identify multilocus genotypic partitions that predict quantitative trait variation. Genome Res 2001;11:458–70.
11. Thornton-Wells TA, Moore JH, Haines JL. Genetics, statistics and human disease: analytical retooling for complexity. Trends Genet 2004;20:640–7.
12. Tsai CT, Lai LP, Lin JL, et al. Renin-angiotensin system gene polymorphisms and atrial fibrillation. Circulation 2004;109:1640–6.
13. Cho YM, Ritchie MD, Moore JH, et al. Multifactor-dimensionality reduction shows a two-locus interaction associated with Type 2 diabetes mellitus. Diabetologia 2004;47:549–54.
14. Williams FM, Cherkas LF, Spector TD, MacGregor AJ. A common genetic factor underlies hypertension and other cardiovascular disorders. BMC Cardiovasc Disord 2004;4:20.
15. Balkwill F, Mantovani A. Inflammation and cancer: back to Virchow? Lancet 2001;357:539–45.
16. Coussens LM, Werb Z. Inflammation and cancer. Nature 2002;420:860–7.
17. Nelson WG, De Marzo AM, Isaacs WB. Prostate cancer. N Engl J Med 2003;349:366–81.
18. Nelson WG, DeWeese TL, DeMarzo AM. The diet, prostate inflammation, and the development of prostate cancer. Cancer Metastasis Rev 2002;21:3–16.
19. De Marzo AM, Marchi VL, Epstein JI, Nelson WG. Proliferative inflammatory atrophy of the prostate: implications for prostatic carcinogenesis. Am J Pathol 1999;155:1985–92.
20. Shah R, Mucci NR, Amin A, Macoska JA, Rubin MA. Postatrophic hyperplasia of the prostate gland: neoplastic precursor or innocent bystander? Am J Pathol 2001;158:1767–73.
21. Carpten J, Nupponen N, Isaacs S, et al. Germline mutations in the ribonuclease L gene in families showing linkage with HPC1. Nat Genet 2002;30:181–4.
22. Xu J, Zheng SL, Komiya A, et al. Germline mutations and sequence variants of the macrophage scavenger receptor 1 gene are associated with prostate cancer risk. Nat Genet 2002;32:321–5.
23. Sun J, Wiklund F, Zheng SL, et al. Sequence variants in Toll-like receptor gene cluster (TLR6-TLR1-TLR10) and prostate cancer risk. J Natl Cancer Inst 2005;97:525–32.
24. Lindmark F, Zheng SL, Wiklund F, et al. H6D polymorphism in macrophage-inhibitory cytokine-1 gene associated with prostate cancer. J Natl Cancer Inst 2004;96:1248–54.
25. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Ann Intern Med 1993;118:201–10.