Malignant glioma is a complex disease involving many genes from different biological pathways; therefore, a study with only a few single nucleotide polymorphisms (SNPs) is insufficient to delineate the genetic etiology of malignant glioma. Analysis of multiple SNPs using parametric methods such as logistic regression can be challenging owing to increasing dimensionality and a lack of statistical power to detect interactions among SNPs. Random forests (RF) analysis is a non-parametric method that can analyze a large number of SNPs and account for interactions among them without the need for model specification. In addition, RF ranks SNPs by importance and can therefore reduce a large set of SNPs to a small subset for further investigation. The current analysis included 112 Caucasian patients with glioblastoma multiforme and 112 healthy Caucasian controls frequency-matched to cases by age and gender. The 224 subjects were genotyped using a commercially available 10,000 non-synonymous SNP assay panel. Of the 10,000 SNPs genotyped, 1,190 SNPs with a strong a priori hypothesis of association with glioma and a minor allele frequency greater than 5% were selected for RF analysis. The top 30 SNPs ranked by the first RF analysis were carried forward into further iterations of RF analysis. With each iteration, the lowest-ranked SNP was excluded, and the remaining SNPs were used to build another RF. Using this algorithm, we were able to narrow 1,190 SNPs down to a parsimonious set of 17 SNPs that correctly classified 76.8% of the subjects by case-control status (out-of-bag error rate = 23.2%). These 17 SNPs belong to 14 genes (EXO1, LHCGR, CSMD2, WRN, BIRC8, ATM, ZNF228, MKI67, FGF23, FCGBP, SLC16A1, GCAT, MTRF, and ABCC4) that represent important biological pathways or chromosomal locations potentially associated with glioma. For example, EXO1, WRN, and ATM are important genes involved in DNA repair.
Furthermore, a study has shown that WRN stimulates the exonucleolytic and endonucleolytic cleavage activity of EXO1. BIRC8 is an inhibitor of apoptosis and maps to chromosomal region 19q13.11-q13.43 along with ZNF228 and FCGBP. CSMD2 maps to chromosomal region 1p35.1-p34.3, a region that may contain a suppressor of oligodendroglioma. MKI67 is a nuclear antigen expressed only in proliferating cells. Our analysis showed that RF is a powerful data mining tool capable of screening a large number of SNPs and identifying a small subset of potentially important SNPs. Further studies are warranted to replicate the association of these SNPs with glioblastoma in independent series and to investigate the functional roles of these genes in malignant glioma.
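
The iterative elimination described above (fit an RF, drop the lowest-ranked SNP, refit on the remainder) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The RF fit itself is abstracted as a hypothetical callable, fit_fn, which returns an out-of-bag error and a per-SNP importance score. The abstract does not state how the final 17-SNP set was chosen; selecting the subset with the lowest out-of-bag error, and preferring the smaller subset on ties, is an assumption made here. The SNP names and the toy_fit function are invented purely for demonstration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for one random forest fit: given SNP names, it returns
# (out-of-bag error, {snp: importance}). In practice this could wrap any RF
# implementation that reports both quantities.
FitFn = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def backward_eliminate(
    snps: Sequence[str], fit_fn: FitFn, min_size: int = 1
) -> Tuple[List[str], float]:
    """Refit repeatedly, dropping the lowest-ranked SNP each round.

    Returns the subset with the lowest out-of-bag error seen (an assumed
    selection rule) together with that error.
    """
    current = list(snps)
    best_subset: List[str] = []
    best_error = float("inf")
    while len(current) >= min_size:
        error, importance = fit_fn(current)
        # "<=" prefers the smaller subset when errors tie (parsimony).
        if error <= best_error:
            best_subset, best_error = list(current), error
        if len(current) == min_size:
            break
        # Exclude the SNP ranked lowest by importance in this round.
        weakest = min(current, key=lambda s: importance[s])
        current.remove(weakest)
    return best_subset, best_error


def toy_fit(subset: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Invented scoring rule: two 'signal' SNPs, every other SNP adds noise."""
    signal = {"snp_a", "snp_b"}
    importance = {s: (1.0 if s in signal else 0.1) for s in subset}
    noise = sum(1 for s in subset if s not in signal)
    missing = len(signal - set(subset))
    return 0.20 + 0.02 * noise + 0.15 * missing, importance


subset, err = backward_eliminate(["snp_a", "n1", "snp_b", "n2"], toy_fit)
print(subset, err)
```

In the setting of this abstract, the starting list would be the top 30 SNPs from the first RF analysis, and fit_fn would build a forest on the genotypes of the 224 subjects restricted to the current SNPs.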

98th AACR Annual Meeting; Apr 14-18, 2007; Los Angeles, CA