Genome-wide association studies (GWAS) of colorectal cancer (CRC) have successfully replicated several susceptibility loci in populations of European ancestry. Most of the statistically associated tag SNPs do not correlate with any coding variants, suggesting that the underlying functional variants responsible for the associations likely affect regulatory regions. Identifying causal variants and their associated biological mechanisms will advance basic research and help translate GWAS findings into clinical benefit.

We developed a ‘post-GWAS’ strategy to functionally dissect susceptibility regions that combines in silico prediction of functional alleles with a novel allelic reporter assay to quantify effects on gene expression. Functional predictions were made by compiling a list of correlated SNPs for each published CRC susceptibility locus and annotating each list for regulatory evidence consistent with enhancers, promoters, insulators and silencers. Using the UCSC Genome Browser, we aligned each variant with a combined browser view of several ENCODE tracks and compared allelic regions in JASPAR and ConSite to identify variants that potentially alter transcription factor binding sites (TFBS). A transparent overlay of various cell lines in the Histone Modifications track was used to characterize functional regulation at low resolution. The DNase hypersensitivity track provided a more precise demarcation of the region for plasmid design. Evidence of altered binding sites from ENCODE's ChIP-Seq TFBS track, JASPAR and ConSite helped prioritize variants for further functional analysis. The 46-way PhastCons track in the Genome Browser was used as secondary evidence for a regulatory region, but lack of conservation did not rule out a candidate. Priority for in vitro analysis was based on the strength of the functional annotation and the ability to identify CRC-associated genes that could reasonably be regulated by these predicted regulatory elements. Our in silico analysis identified 2 strong candidate SNPs falling in the promoter regions of GREM1 and CDH1, as well as 4 probable candidates in the putative enhancer regions of CDH1 and BMP2.
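The prioritization logic described above can be illustrated with a minimal Python sketch. It is not the scoring scheme used in the study, which is not specified here: the field names, the equal weighting of evidence types, and the SNP identifiers in the usage note are all hypothetical. The only behavior taken from the text is that conservation acts as secondary evidence, so it breaks ties but its absence never excludes a candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpAnnotation:
    """Annotation evidence for one correlated SNP (all field names are hypothetical)."""
    snp_id: str
    histone_marks: bool    # regulatory histone modification signal
    dnase_hs: bool         # falls in a DNase hypersensitive site
    tfbs_altered: bool     # alleles predicted to alter a TF binding site
    conserved: bool        # PhastCons conservation (secondary evidence only)
    candidate_gene: bool   # a CRC-associated gene is a plausible target


def priority_key(a: SnpAnnotation):
    # Primary evidence is counted with equal weights (an assumption).
    primary = sum([a.histone_marks, a.dnase_hs, a.tfbs_altered, a.candidate_gene])
    # Conservation only breaks ties; unconserved SNPs are never dropped.
    return (primary, a.conserved)


def rank_candidates(annotations):
    """Return annotations ordered from highest to lowest priority."""
    return sorted(annotations, key=priority_key, reverse=True)
```

For example, ranking three placeholder SNPs, `snp_A` with four lines of primary evidence and `snp_B` and `snp_C` with two each, places `snp_A` first and puts `snp_B` ahead of `snp_C` only if `snp_B` is the conserved one.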

To test our in silico functional candidates, we designed allelic promoter and enhancer constructs for each candidate variant and inserted them into custom pCI-neo vectors to drive expression of the ZsGreen reporter gene. These plasmids were semi-stably co-transfected into a colon cancer cell line, and after 14 days of growth, cells with integrated plasmid were sorted by fluorescence-activated cell sorting (FACS) into 4 quartiles of ZsGreen expression. To quantify the relative activity of our allelic plasmids, each pool was sequenced with Illumina high-throughput sequencing technology. We tested for statistically significant log fold changes between co-transfected plasmids. Because signal is interdependent between alleles, we tested differences using a Poisson generalized linear model in a step-wise manner. Blinded modeling of this method revealed that the approach could be used to test at least 24 allelic plasmids simultaneously. Although this study focused on susceptibility loci for CRC, this framework could be extended to rapidly test many functional hypotheses in other complex genetic diseases, providing a high-throughput approach to gain substantial new insight into the functional effects of GWAS findings.
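As a rough illustration of the counting analysis, the Python sketch below compares read counts for two alleles across the sorted expression bins. It is a simplified pairwise stand-in for the step-wise Poisson model described above, not the authors' procedure: the function names, the pseudocount, and any counts supplied to it are assumptions. It relies on the standard result that the deviance of an additive (allele + bin) Poisson log-linear model on a 2 x K table equals the likelihood-ratio G statistic, with fitted values given by row total times column total divided by the grand total.

```python
import math


def independence_deviance(counts_a, counts_b):
    """Deviance of the additive Poisson log-linear model (allele + bin).

    counts_a and counts_b are read counts per expression bin, ordered from
    lowest to highest expression. A large deviance suggests the two alleles
    are distributed differently across the bins.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both alleles need one count per bin")
    rows = [counts_a, counts_b]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]
    total = sum(row_totals)
    if total == 0:
        raise ValueError("no reads observed")
    deviance = 0.0
    for r, row in enumerate(rows):
        for k, observed in enumerate(row):
            if observed > 0:  # zero cells contribute nothing to the sum
                expected = row_totals[r] * col_totals[k] / total
                deviance += 2.0 * observed * math.log(observed / expected)
    df = len(counts_a) - 1  # (2 - 1) * (K - 1) degrees of freedom
    return deviance, df


def log2_fold_change(counts_a, counts_b, pseudocount=0.5):
    """log2 of the A:B read ratio in the highest bin relative to the lowest."""
    top = (counts_a[-1] + pseudocount) / (counts_b[-1] + pseudocount)
    bottom = (counts_a[0] + pseudocount) / (counts_b[0] + pseudocount)
    return math.log2(top / bottom)
```

With four bins the deviance has 3 degrees of freedom and would be compared against a chi-square distribution (5% critical value about 7.815). A positive log2 fold change means allele A is enriched in the highest-expression bin relative to allele B.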

This proffered talk is also presented as Poster 63.

Citation Format: Stephanie A. Rosse, Christina T. Chen, Orsalem J. Kahsai, Ahmad S. Zebari, Paul W. Livermore Auer, Graham Casey, Ulrike Peters, Christopher S. Carlson. Framework for post-GWAS functional annotation of regulatory regions associated with susceptibility loci for colorectal cancer. [abstract]. In: Proceedings of the AACR Special Conference on Post-GWAS Horizons in Molecular Epidemiology: Digging Deeper into the Environment; 2012 Nov 11-14; Hollywood, FL. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2012;21(11 Suppl):Abstract nr PR3.