Background: In the post-GWAS era, investigating gene-environment interactions for disease in large-scale observational studies poses new challenges. Existing single studies were not originally powered to investigate gene-environment interactions when genome-wide variation is taken into account. Pooling studies may still not provide enough power, and it introduces heterogeneity in study design, specifically in the measurement of environmental characteristics. Current approaches are therefore multistage: the first stage prioritizes SNPs, reducing their number before interactions with environmental factors are tested.

Methods: We describe and discuss current strategies for prioritizing SNPs in gene-environment interaction studies (1) with readily available genome-wide data on genetic variation and (2) without such data, as in the Netherlands Cohort Study on diet and cancer.

Results: In current gene-environment interaction studies with genome-wide data on genetic variation, several approaches are used to prioritize SNPs for subsequent interaction testing. Two types of screening test statistic have been proposed: one based on the marginal genetic association and one based on correlation. The underlying idea is that SNPs likely to interact with the environment will show at least a slight association with the outcome or a correlation with the environmental factor. Gene-environment interactions are then usually first tested SNP by SNP.
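The screening stage described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular study: it assumes genotypes coded as 0/1/2 allele counts, uses a plain Pearson correlation as a stand-in for both the marginal (SNP-outcome) and the correlation-based (SNP-environment) screening statistic, and the function names, toy data, and threshold are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence has no variance (e.g. a monomorphic SNP).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_snps(genotypes, outcome, exposure, method="marginal", threshold=0.1):
    """Stage 1: keep SNPs whose screening statistic reaches the threshold.

    method="marginal"    screens on the SNP-outcome association.
    method="correlation" screens on the SNP-environment correlation.
    Returns (snp_id, statistic) pairs, strongest absolute statistic first.
    The threshold is an illustrative assumption, not a recommended value.
    """
    if method == "marginal":
        target = outcome
    elif method == "correlation":
        target = exposure
    else:
        raise ValueError("method must be 'marginal' or 'correlation'")

    stats = {snp: pearson(g, target) for snp, g in genotypes.items()}
    kept = [(snp, r) for snp, r in stats.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda item: (-abs(item[1]), item[0]))


# Toy data (hypothetical): three SNPs genotyped in six subjects.
toy_genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [2, 0, 1, 1, 0, 2],
    "snp3": [1, 1, 1, 1, 1, 1],  # monomorphic, never passes screening
}
toy_outcome = [0, 0, 1, 0, 0, 1]               # case/control status
toy_exposure = [1.0, 0.0, 0.5, 0.5, 0.0, 1.0]  # environmental factor

marginal_hits = screen_snps(toy_genotypes, toy_outcome, toy_exposure,
                            method="marginal", threshold=0.5)
correlation_hits = screen_snps(toy_genotypes, toy_outcome, toy_exposure,
                               method="correlation", threshold=0.5)
```

Only the SNPs returned by the screening step would then be carried forward to SNP-by-SNP interaction tests, which is what reduces the multiple-testing burden in the second stage.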

In the Netherlands Cohort Study on diet and cancer (NLCS), genome-wide data on genetic variants are not readily available. For ongoing projects within the NLCS, the strategy for reducing SNPs to a priority list of up to 100 SNPs depends on the hypothesis to be tested. First, genes are selected from pathways thought to be involved in the association between the environmental factor and the outcome. After gene selection using the KEGG database and the literature, SNPs are further prioritized using literature retrieved from MEDLINE and the HuGE Navigator with a pre-defined search strategy focusing on a particular pathway. The search strategy includes the SNP and either the outcome, related outcomes (that share a similar etiology), or environmental factors. Inclusion criteria for SNPs further include a minor allele frequency over 10% in North Europeans or Caucasians, a positive SNP validation status according to dbSNP, and a position between 10 kb upstream of the 5' untranslated region and 10 kb downstream of the 3' untranslated region of the gene. SNPs are then ranked according to a pre-defined scoring strategy based on the number of significant associations with outcome and exposure reported in the literature. Gene-environment interactions can be explored using several complementary approaches, both exploratory methods (random forests and multifactor dimensionality reduction techniques) and parametric methods.
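The inclusion criteria and ranking step described above could be expressed along the following lines. This is a sketch under stated assumptions: the abstract does not specify the scoring formula, so the score here is simply the sum of reported significant associations with outcome and exposure, and the record fields, coordinates, and function names are hypothetical rather than taken from the NLCS projects.

```python
from dataclasses import dataclass

FLANK_BP = 10_000   # 10 kb flank on either side of the gene
MIN_MAF = 0.10      # minor allele frequency must exceed 10%
MAX_SNPS = 100      # size limit of the priority list


@dataclass(frozen=True)
class CandidateSNP:
    rsid: str
    maf: float            # minor allele frequency in the reference population
    validated: bool       # positive validation status according to dbSNP
    position: int         # chromosomal position (bp)
    gene_start: int       # lower coordinate of the gene's UTR-to-UTR span (assumed)
    gene_end: int         # upper coordinate of the gene's UTR-to-UTR span (assumed)
    outcome_hits: int     # significant SNP-outcome associations in the literature
    exposure_hits: int    # significant SNP-exposure associations in the literature


def is_eligible(snp):
    """Apply the three inclusion criteria: MAF, validation status, position."""
    in_window = (snp.gene_start - FLANK_BP
                 <= snp.position
                 <= snp.gene_end + FLANK_BP)
    return snp.maf > MIN_MAF and snp.validated and in_window


def literature_score(snp):
    """Assumed score: total number of significant literature associations."""
    return snp.outcome_hits + snp.exposure_hits


def prioritize(candidates, max_snps=MAX_SNPS):
    """Filter candidates, rank by score (ties broken by rsid), cap the list."""
    eligible = [s for s in candidates if is_eligible(s)]
    ranked = sorted(eligible, key=lambda s: (-literature_score(s), s.rsid))
    return ranked[:max_snps]


# Hypothetical candidates for a gene spanning 100,000-120,000 bp.
example_candidates = [
    CandidateSNP("rs_a", 0.25, True, 105_000, 100_000, 120_000, 3, 1),
    CandidateSNP("rs_b", 0.05, True, 110_000, 100_000, 120_000, 8, 4),   # MAF too low
    CandidateSNP("rs_c", 0.30, True, 131_000, 100_000, 120_000, 9, 9),   # outside window
    CandidateSNP("rs_d", 0.40, True, 91_000, 100_000, 120_000, 5, 2),
    CandidateSNP("rs_e", 0.35, False, 101_000, 100_000, 120_000, 6, 6),  # not validated
]

priority_list = prioritize(example_candidates)
```

Because the flank is 10 kb on both sides, the position check does not depend on strand as long as `gene_start` and `gene_end` hold the lower and upper coordinates of the span. A weighted score (for example, weighting outcome and exposure associations differently) would only require changing `literature_score`.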

Conclusions: Different strategies exist for prioritizing SNPs for gene-environment interaction studies. They are similar in that prioritization is based on the association between the SNP and the outcome and/or between the SNP and the environment, assessed either in the study's own data (when tag SNPs are available) or in the literature (e.g. the NLCS pathway-based approach). There is a need for a formal discussion and a more uniform strategy for prioritizing SNPs in gene-environment interaction studies, especially when focusing on specific pathways.

This proffered talk is also presented as Poster 53.

Citation Format: Matty P. Weijenberg, Colinda C.J.M. Simons, Leo J. Schouten, Milan S. Geybels, Bas A. Verhage, Janneke G. Hogervorst, Andras P. Keszei, Roger W. Godschalk, Frederik-Jan van Schooten, Piet A. van den Brandt. Prioritizing SNPs for gene-environment interaction studies. [abstract]. In: Proceedings of the AACR Special Conference on Post-GWAS Horizons in Molecular Epidemiology: Digging Deeper into the Environment; 2012 Nov 11-14; Hollywood, FL. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2012;21(11 Suppl):Abstract nr PR4.