Background:

Currently known associations between common genetic variants and colorectal cancer explain less than half of its heritability of 25%. As alcohol consumption has a J-shape association with colorectal cancer risk, nondrinking and heavy drinking are both risk factors for colorectal cancer.

Methods:

Individual-level data were pooled from the Colon Cancer Family Registry, Colorectal Transdisciplinary Study, and Genetics and Epidemiology of Colorectal Cancer Consortium to compare nondrinkers (≤1 g/day) and heavy drinkers (>28 g/day) with light-to-moderate drinkers (1–28 g/day) in GxE analyses. To improve power, we implemented joint 2df and 3df tests and a novel two-step method that modifies the weighted hypothesis testing framework. We prioritized putative causal variants by predicting allelic effects using support vector machine models.

Results:

For nondrinking as compared with light-to-moderate drinking, the hybrid two-step approach identified 13 significant SNPs with pairwise r² > 0.9 in the 10q24.2/COX15 region. When stratified by alcohol intake, the A allele of lead SNP rs2300985 has a dose–response increase in risk of colorectal cancer as compared with the G allele in light-to-moderate drinkers [OR for GA genotype = 1.11; 95% confidence interval (CI), 1.06–1.17; OR for AA genotype = 1.22; 95% CI, 1.14–1.31], but not in nondrinkers or heavy drinkers. Among the correlated candidate SNPs in the 10q24.2/COX15 region, rs1318920 was predicted to disrupt an HNF4 transcription factor binding motif.

Conclusions:

Our study suggests that the association with colorectal cancer in 10q24.2/COX15 observed in genome-wide association study is strongest in nondrinkers. We also identified rs1318920 as the putative causal regulatory variant for the region.

Impact:

The study identifies multifaceted evidence of a possible functional effect for rs1318920.

This article is featured in Highlights of This Issue, p. 917

Though alcohol consumption is considered a risk factor for colorectal cancer, meta-analyses across our large consortia have revealed a J-shape relationship with alcohol consumption. Light-to-moderate drinking is the group at the lowest risk of colorectal cancer, while the risk of colorectal cancer increases slightly in nondrinkers and substantially in very heavy drinkers (1). Many mechanisms have been proposed to explain the relationship between alcohol consumption and colon carcinogenesis (2), but the lower risk of colorectal cancer observed among light-to-moderate drinkers relative to nondrinkers and heavy drinkers has only recently been described and is poorly understood. As a possible explanation, the increased risk of colorectal cancer in nondrinkers may be due to residual confounding because some of these individuals may abstain from or stop drinking for reasons related to colorectal cancer risk factors or health status, including alcoholism. In fact, the McNabb and colleagues manuscript describing the J-shape explicitly states that the observed inverse association could be explained by residual confounding or chance (1). Another possibility is that light-to-moderate drinking has a protective effect on risk of colorectal cancer, even though heavier consumption is detrimental. However, this hypothesis is only supported by very preliminary evidence of an anti-inflammatory effect of light-to-moderate drinking on the colon in rats (3, 4) and of low levels of ethanol exposure upregulating liver detoxification enzymes (5, 6), so future research is needed to explore any possible protective effects.

Given this complex relationship, it is possible that there are SNPs that affect only nondrinkers or heavy drinkers or that known loci have unknown interactions with alcohol consumption that would be difficult to detect in genome-wide association studies (GWAS) of colorectal cancer. In fact, common SNPs identified through GWAS and hereditary syndromes explain less than half of the roughly 25% of colorectal cancers that aggregate in families (7). Because alcohol consumption is widespread in the US population and there are known variants in genes like ADH and ALDH that have strong effects on alcohol metabolism (8), SNPs that have important interactions with alcohol may help fill in this missing heritability (9). In addition, variant effects in noncoding regions of the genome may play an important role, through interactions with mechanisms like alcohol-induced epigenetic changes in cancer (10). To search for important relationships with this established risk factor, we conducted genome-wide interaction analyses to test for SNPs that modify the effects of alcohol consumption on risk of colorectal cancer, including a novel hybrid two-step approach that aims to improve statistical power.

Study population

We pooled individual-level genomic and epidemiological data from studies participating in the Colon Cancer Family Registry (CCFR), the Colorectal Transdisciplinary Study, and the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO). Study details have been previously published (11–13) and can be found in Supplementary Table S1. For cohort studies, nested case–control sets were assembled using risk-set sampling. Controls were matched on factors such as age, sex, race, and enrollment date or trial group, when applicable. Colorectal adenocarcinoma cases were confirmed by medical records, pathologic reports, or death certificate information. For the small subset of advanced adenoma cases, matched controls displayed polyp-free sigmoidoscopy or colonoscopy at the time of adenoma selection. All participants gave written informed consent and studies were approved by their respective Institutional Review Boards.

Analyses were limited to individuals of European ancestry based on self-reported race and clustering of principal components with the 1000 Genomes EUR superpopulation, yielding an initial sample size of 96,735. Approximately 3,660 participants were excluded from our main analysis based on ancestry. We excluded studies that lacked alcohol consumption information, whose populations lacked sufficient variability in alcohol intake levels, that were matched on smoking status, or in which participants had a history of adenomas at baseline (n = 19,259). We further excluded samples that showed cryptic relatedness, were duplicates with lower genotyping quality, had genotyping or imputation errors, or were age outliers (n = 3,377), creating a final sample size of 74,099.
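The exclusion counts above can be checked with simple arithmetic. The short sketch below only restates numbers reported in the text and in Table 1; the variable names are illustrative.

```python
# Sample-size bookkeeping using the counts reported in the text.
initial_sample = 96_735      # European-ancestry participants before exclusions
excluded_studies = 19_259    # participants in excluded studies
excluded_samples = 3_377     # relatedness, duplicates, genotyping errors, age outliers

final_sample = initial_sample - excluded_studies - excluded_samples

# Case and control totals reported in Table 1.
cases, controls = 31_874, 42_225

print(final_sample, cases + controls)
```

Both quantities should agree with the final sample size stated in the text.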

Exposure definition

Demographic and environmental risk factor information was self-reported either at in-person interviews or via structured questionnaires. Harmonization of alcohol intake information consisted of a multi-step procedure performed at Fred Hutchinson Cancer Research Center, which is the GECCO coordinating center (14). Briefly, common data elements (CDE) were defined a priori. Study questionnaires and data dictionaries were examined and, through an iterative process of communication with data contributors, elements were mapped to these CDEs. Definitions, permissible values, and standardized coding were implemented into a single database via SAS and T-SQL. Resulting data were checked for errors and outlying values within and between studies.

Food frequency questionnaires and diet histories were used to ascertain alcohol intake and other risk factors at the reference time, typically ranging from three months to two years prior to diagnosis for case–control studies and at enrollment for cohort studies (Supplementary Text 1). The harmonized alcohol intake variable is expressed as grams per day, and is categorized into three groups: nondrinkers (≤1 g/day; we did not set this to 0 as some studies included small amounts of alcohol intake from fermented foods), light-to-moderate drinkers (>1 to ≤28 g/day), and heavy drinkers (>28 g/day; ref. 15). To account for the potentially disparate biological mechanisms driving the J-shaped association between alcohol use and colorectal cancer, we conducted separate genome-wide interaction scans: nondrinkers versus light-to-moderate drinkers and heavy drinkers versus light-to-moderate drinkers. Light-to-moderate drinkers serve as the reference group for both scans as they have the lowest risk of colorectal cancer.
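The three-level exposure and the two scans described above can be sketched as follows. This is a minimal illustration of the stated cut points, not the consortium's harmonization code; the function names and the 0/1 coding are assumptions.

```python
from typing import Optional


def alcohol_category(grams_per_day: float) -> str:
    """Map harmonized intake (g/day) to the three categories used in the text."""
    if grams_per_day < 0:
        raise ValueError("intake cannot be negative")
    if grams_per_day <= 1.0:
        return "nondrinker"
    if grams_per_day <= 28.0:
        return "light-to-moderate"
    return "heavy"


def scan_exposure(category: str, scan: str) -> Optional[int]:
    """Binary exposure for one scan; light-to-moderate drinkers are the reference (0).

    Participants in the category not under comparison are returned as None,
    that is, treated as missing for that scan (hypothetical coding).
    """
    if scan not in ("nondrinker", "heavy"):
        raise ValueError("scan must be 'nondrinker' or 'heavy'")
    if category == "light-to-moderate":
        return 0
    return 1 if category == scan else None


for intake in (0.5, 1.0, 12.0, 28.0, 40.0):
    cat = alcohol_category(intake)
    print(intake, cat, scan_exposure(cat, "nondrinker"), scan_exposure(cat, "heavy"))
```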

Genotyping and imputation

Details on genotyping and quality control have been previously published (11); genotyping platforms used are summarized in Supplementary Table S1. Briefly, genotyped SNPs were excluded based on call rate less than 95% to 98%, lack of Hardy–Weinberg equilibrium with P < 1 × 10⁻⁴, discrepancies between reported and genotypic sex, and discordant calls between duplicates. Autosomal SNPs of all studies were imputed to the Haplotype Reference Consortium r1.1 (2016) reference panel via the Michigan Imputation Server (16) and converted into a binary format for data management and analyses using R package BinaryDosage (Morrison 2019). We filtered imputed SNPs based on a pooled MAF greater than or equal to 1% and imputation accuracy of r² > 0.80. After imputation and quality control, a total of over 7.2 million common SNPs were used.
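The post-imputation filter can be expressed as a simple predicate. The sketch below assumes a hypothetical record type with pooled MAF and imputation r² fields and uses made-up example variants; only the two thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class ImputedSnp:
    """Hypothetical summary record for one imputed variant."""
    rsid: str
    pooled_maf: float      # minor allele frequency pooled across studies
    imputation_r2: float   # imputation accuracy


def passes_imputation_qc(snp: ImputedSnp, min_maf: float = 0.01, min_r2: float = 0.80) -> bool:
    """Keep SNPs with pooled MAF >= 1% and imputation r2 > 0.80."""
    return snp.pooled_maf >= min_maf and snp.imputation_r2 > min_r2


# Illustrative values only, not study data.
examples = [
    ImputedSnp("snp_common_well_imputed", 0.35, 0.98),
    ImputedSnp("snp_rare", 0.004, 0.95),
    ImputedSnp("snp_poorly_imputed", 0.20, 0.80),
]
kept = [s.rsid for s in examples if passes_imputation_qc(s)]
print(kept)
```

Note the asymmetry in the stated thresholds: the MAF bound is inclusive, while the r² bound is strict.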

Statistical analysis

Interaction tests

To evaluate main effects, we used logistic regression models adjusted for age at the reference time, sex, and total energy consumption (kcal/day) and stratified by study. Study-specific results were combined in random-effects meta-analysis models with the Hartung-Knapp method to obtain summary ORs and 95% confidence intervals (CI) across studies (17). Random effects were used given the large number of studies and possible heterogeneity of associations (17). We calculated the heterogeneity P values using Cochran's Q statistic (18), and funnel plots were used to identify studies with outlying ORs. These analyses were performed using R package meta (ref. 19).
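To make the heterogeneity quantities concrete, the sketch below computes Cochran's Q, I², and a random-effects summary from study-specific log ORs and standard errors. It is not the meta package: the DerSimonian-Laird estimator of the between-study variance is an assumption made for simplicity, and the Hartung-Knapp step is limited to the adjusted standard error (the confidence interval would additionally need a t quantile with k − 1 degrees of freedom).

```python
import math


def random_effects_summary(log_ors, ses):
    """Cochran's Q, I2, DerSimonian-Laird tau2, and a pooled log OR (sketch)."""
    k = len(log_ors)
    w = [1.0 / se ** 2 for se in ses]                        # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = k - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0            # share of variation due to heterogeneity
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                            # between-study variance (DL, assumed)
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    # Hartung-Knapp variance: weighted residual variance scaled by (k - 1).
    hk_var = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w_re, log_ors)) / (df * sum(w_re))
    return {"Q": q, "I2": i2, "tau2": tau2, "pooled_log_or": pooled,
            "pooled_or": math.exp(pooled), "hk_se": math.sqrt(hk_var)}


# Two hypothetical studies with equal precision and different estimates.
result = random_effects_summary([0.0, 0.5], [0.1, 0.1])
print(result)
```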

We performed genome-wide interaction scans using the R package GxEScanR, which implements several interaction testing methods (20), including traditional logistic regression GxE and joint tests of association, as described below. Imputed SNP dosages were modeled as continuous variables (21). For the purposes of this study, $E$ refers to alcohol exposure, $G$ refers to a SNP included in the genome-wide tests, $D$ refers to colorectal cancer disease status, and $C$ refers to a set of adjustment covariates. To test for multiplicative scale interaction, we fit conventional logistic regression models augmented with an interaction term of the form $\mathrm{logit}(\Pr(D = 1 \mid G)) = \beta_0 + \beta_G G + \beta_E E + \beta_{GxE}(G \times E) + \beta_C C$ and tested $H_0: \beta_{GxE} = 0$. The quantity $\exp(\beta_{GxE}) = OR_{GxE}$ captures departure from multiplicative associations of $E$ and $G$ on $D$. The models were adjusted for age at the reference time, sex, study, total energy consumption (kcal/day), and the first three principal components from EIGENSTRAT to account for potential population substructure. For any significant findings, we conducted sensitivity analyses stratified by sex and tumor site and adjusted for body mass index (BMI), diabetes, education level, ever smoking, and study and sex-specific quartiles of red meat, fruit, and vegetable consumption. Age at reference time was missing for a single participant and was median imputed by study for cases and controls separately. For the remaining variables, we fit models using only subsets of individuals with available covariate information (complete case analysis). Total energy consumption was imputed for 25,247 individuals using study and sex-specific means; missingness was 18% across all studies excluding UK Biobank. For UK Biobank, imputation was not feasible because energy intake information was missing for all individuals in this study. Consequently, to retain UK Biobank in the analysis, total energy intake was set to 0.

For the 2df joint test, we used likelihood ratio tests to jointly test $H_0: \beta_G = \beta_{GxE} = 0$ (df = 2). To accommodate E|G associations, we also extended this to a 3df likelihood ratio test to jointly test $H_0: \beta_G = \beta_{GxE} = \delta_G = 0$ (df = 3), where $\delta_G$ represents the association between $G$ and $E$ in a combined case–control sample (22, 23). We report two-sided P values calculated from these likelihood-ratio tests, and consider P < 5 × 10⁻⁸ significant and P < 5 × 10⁻⁶ suggestive.
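The interaction model and the 2df joint test can be illustrated on simulated data. The sketch below is not GxEScanR: it fits unadjusted logistic models by Newton-Raphson with NumPy, uses a likelihood ratio test for the interaction term as well (a choice made for this sketch), and omits the covariates $C$ and the 3df extension. All simulation parameters are arbitrary assumptions.

```python
import math
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    return beta, float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def chi2_sf(x, df):
    """Upper-tail chi-square probability (closed forms for df = 1 and df = 2)."""
    x = max(x, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports df = 1 or 2 only")


# Simulated data: G is a dosage, E is 1 for nondrinkers and 0 for the reference group.
rng = np.random.default_rng(0)
n = 20_000
g = rng.binomial(2, 0.4, n).astype(float)
e = rng.binomial(1, 0.45, n).astype(float)
eta = -0.5 + 0.10 * g + 0.12 * e - 0.12 * g * e
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)

ones = np.ones(n)
beta_full, ll_full = fit_logistic(np.column_stack([ones, g, e, g * e]), d)
_, ll_no_gxe = fit_logistic(np.column_stack([ones, g, e]), d)   # drops beta_GxE
_, ll_no_g = fit_logistic(np.column_stack([ones, e]), d)        # drops beta_G and beta_GxE

lrt_1df = 2.0 * (ll_full - ll_no_gxe)   # H0: beta_GxE = 0
lrt_2df = 2.0 * (ll_full - ll_no_g)     # H0: beta_G = beta_GxE = 0
p_1df, p_2df = chi2_sf(lrt_1df, 1), chi2_sf(lrt_2df, 2)
or_gxe = math.exp(beta_full[3])         # interaction OR on the multiplicative scale
print(or_gxe, p_1df, p_2df)
```

Because the models are nested, the 2df statistic is never smaller than the 1df statistic; the joint test gains power when a SNP has a main effect, an interaction, or both.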

We also implemented a hybrid two-step method that prioritizes potential interaction loci by weighting GxE tests (step 2) based on the ranks of an independent test statistic, in this case the genetic main effects on colorectal cancer (step 1). Our approach modifies the original weighted hypothesis testing framework, which uses step 1 ranks to prioritize and partition SNPs into exponentially larger bins of fixed sizes (based on an initial bin size of 5 and an overall significance level of 0.05) and increasingly more stringent step 2 significance thresholds (24, 25). A limitation of the original approach is that the top bins are often filled with correlated markers from the same loci in analyses of imputed SNPs. To address this issue, our approach accommodates bins of varying sizes while properly controlling for type I error. Specifically, SNPs are partitioned into bins based on step 1 P value thresholds in expectation, which were calculated using the original predetermined bin sizes and assumed uniform distribution of 1 million independent tests. For step 2 GxE testing, we accounted for the influx of correlated markers into each bin by correcting for the effective number of tests, estimated using principal component analysis performed on bin-specific genotype correlation matrices (26). This modification alleviates multiple testing burden and improves statistical power, while maintaining an overall type I error rate of 0.05. We also estimated stratified ORs by modeling interactions between alcohol intake and posterior genotype probabilities.
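The binning logic can be sketched as follows. The text specifies an initial bin size of 5, an overall significance level of 0.05, 1 million assumed independent tests, and a PCA-based effective number of tests; the doubling of bin sizes, the halving of the per-bin significance level, and the 99.5% variance-explained rule for the effective number of tests are assumptions of this sketch, not details confirmed by the text.

```python
import numpy as np


def step1_cutoffs(n_bins, initial_bin_size=5, n_independent=1_000_000):
    """Step 1 P value upper bound for each bin, in expectation under uniform P values."""
    cutoffs, cumulative = [], 0
    for b in range(n_bins):
        cumulative += initial_bin_size * 2 ** b      # assumed doubling of bin sizes
        cutoffs.append(cumulative / n_independent)
    return cutoffs


def assign_bin(p_step1, cutoffs):
    """Index of the first bin whose cutoff is at least p_step1, or None if none is."""
    for b, cutoff in enumerate(cutoffs):
        if p_step1 <= cutoff:
            return b
    return None


def effective_tests(corr, var_explained=0.995):
    """Number of principal components needed to reach the variance threshold (assumed rule)."""
    eig = np.clip(np.sort(np.linalg.eigvalsh(np.asarray(corr, dtype=float)))[::-1], 0.0, None)
    cumulative = np.cumsum(eig) / eig.sum()
    return int(min(np.searchsorted(cumulative, var_explained) + 1, len(eig)))


def step2_threshold(bin_index, meff, alpha=0.05):
    """Per-test step 2 threshold: the bin's share of alpha divided by its effective tests."""
    return alpha / 2 ** (bin_index + 1) / meff


cutoffs = step1_cutoffs(3)
print(cutoffs)
print(assign_bin(1e-5, cutoffs), step2_threshold(0, meff=10))
```

Under these assumptions the per-bin significance levels sum to less than the overall level, so admitting more (correlated) SNPs into a bin changes only the effective number of tests used in that bin's threshold.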

Relevant regional plots were generated using the command line version (Standalone) of LocusZoom v1.3 (27). Measures of linkage disequilibrium (LD) were estimated using study population controls. Possible eQTL relationships were explored using the Genotype-Tissue Expression (GTEx V8) and the University of Barcelona and University of Virginia genotyping and RNA sequencing project (BarcUVa-Seq) (28) datasets. The data used for the analyses described in this manuscript were obtained from: the GTEx Portal on April 14, 2020 and dbGaP accession number phs000424.vN.pN on April 15, 2020. The most promising eQTL-gene association was tested in a subset of 35 human normal colon 3D organoid lines from an ongoing study in which lines were grown and expression was measured as described for the control condition in Devall and colleagues (29); lines were genotyped on the OncoArray beadchip, and the variant of interest was imputed with an r² of 0.98 using the TOPMed reference panel (30). We then tested predicted expression of the eQTL-associated gene of interest for an interaction with alcohol consumption in data from the three consortia involved in this study (Supplementary Text 2).

Prediction of regulatory impact of candidate noncoding variants

We used ATAC-seq, DNASE-seq, H3K27ac histone chromatin immunoprecipitation sequencing (ChIP-seq), and H3K4me1 histone ChIP-seq datasets of primary tissue from healthy colon and tumor primary tissue samples from Scacheri and colleagues (31), as well as from three colorectal cancer cell lines (SW480, HCT116, COLO205). These datasets were processed through ENCODE ATAC-seq/DNASE-seq (32) and histone ChIP-seq pipelines (33) to perform alignment and peak calling. Dataset sources are indicated in Supplementary Table S2. –log10(P value) tracks were extracted from the MACS2 step of the pipeline for visualization in genome browsers. Irreproducible Discovery Rate (IDR; ref. 34) peak calls for ATAC-seq and DNASE-seq datasets, as well as naive overlap peak calls for histone ChIP-seq datasets, were determined from the ENCODE pipelines. The pyGenomeTracks (35) software package was used to visualize chromatin accessibility across the functional datasets and to plot –log10(P value) signal tracks. Peaks across samples from the same assay were concatenated across datasets, cropped to within 200 bp centered on the peak summit, and merged using bedtools (36) merge.

Gapped k-mer support vector machine models (LS-GKM; v0.1.0) with a center-weighted GKM kernel were trained to classify chromatin accessible regions against genomic background regions as a function of their underlying DNA sequences (37). Default parameters were used. Support vector machines (SVM) were trained via 10-fold cross-validation, where groups of chromosomes were split into folds (Supplementary Table S3). Separate SVM models were trained on DNase-seq data from Supplementary Table S2 with samples pooled across assays as described above (31). For each biosample, the SVMs were trained on 120,000 genomic regions. The positive set for training was generated by selecting the 60,000 most IDR significant MACS2 peak calls and generating 1-kb sequences centered on the summits of these peaks. The negative set was generated by selecting 60,000 1-kb regions from the genome at random, such that these regions were GC-matched to the positive set and did not overlap any DNase peaks.

The resulting trained models for each of the five DNase-seq datasets were used to score all variants on the Haplotype Reference Consortium (HRC) imputed panel (n = 39,117,106). For each SNP along the HRC panel, we centered a 1 kb sequence interval and obtained SVM model predictions for the reference and alternate alleles. The difference in model predictions of accessibility (prediction for alternate allele − prediction for reference allele) is the in-silico mutagenesis (ISM) score, or SNP effect score. We confirmed that the ISM scores for the HRC panel were normally distributed using the Kolmogorov–Smirnov and Shapiro–Wilk tests (P values > 0.10) and derived Z-scores. Variants with ISM Z-scores > 1.65 or < –1.65, representing a 90% CI, were determined to have significant effects. A single score was obtained for each HRC SNP by taking the maximum of the absolute values of the GKMexplain delta scores across the five models.
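The scoring steps in this paragraph reduce to a few array operations. The sketch below assumes model predictions are already available as NumPy arrays (it does not call LS-GKM), and the function names and example values are hypothetical.

```python
import numpy as np


def ism_scores(pred_ref, pred_alt):
    """ISM score per SNP: prediction for alternate allele minus prediction for reference allele."""
    return np.asarray(pred_alt, dtype=float) - np.asarray(pred_ref, dtype=float)


def flag_significant(ism, z_cutoff=1.65):
    """Standardize ISM scores across the panel and flag |Z| above the cutoff."""
    z = (ism - ism.mean()) / ism.std(ddof=1)
    return z, np.abs(z) > z_cutoff


def max_abs_across_models(score_matrix):
    """One summary score per SNP: maximum absolute score across models (rows = models)."""
    return np.max(np.abs(np.asarray(score_matrix, dtype=float)), axis=0)


# Illustrative values only: 50 variants with no predicted effect and one with a large effect.
ism = ism_scores(np.zeros(51), np.array([0.0] * 50 + [10.0]))
z, flagged = flag_significant(ism)
summary = max_abs_across_models([[1.0, -3.0], [-2.0, 2.0]])
print(int(flagged.sum()), summary)
```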

The lead GWAS SNP rs11190164 was LD-expanded (500 kb window, r² thresholded at 0.20) using PLINK (1.9; ref. 38) based on the 1000 Genomes phase 3 fileset from the cog-genomics site (https://www.cog-genomics.org/plink/2.0/resources#1kg_phase3; ref. 39), which was filtered to select individuals of CEU ancestry. Using the SVM models trained on each of the five DNase-seq datasets, we scored the LD-expanded rs11190164 locus and predicted ISM effects on chromatin accessibility. We further inferred the contribution scores of each nucleotide in the input sequences to the output prediction of the SVM models using the GKMexplain algorithm (40). For each sequence containing a candidate variant, we computed GKMexplain scores for the sequence containing the reference allele and the sequence containing the alternate allele. For each candidate variant, a deltaGKMexplain score was computed by subtracting the 1 kb vector of GKMexplain scores of the sequence with the reference allele from the 1 kb vector of GKMexplain scores of the sequence with the alternate allele. The TomTom algorithm (41) was used to identify likely motif matches for subsequences with high deltaGKMexplain scores. The support vector machine LS-GKM + GKMexplain workflow source code is available on GitHub: https://github.com/kundajelab/SVM_pipelines.
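The deltaGKMexplain calculation is a position-wise subtraction of two per-base score vectors. The sketch below assumes the reference and alternate score vectors are already computed; the sliding-window helper for locating a high-scoring subsequence, and its 20 bp default width, are hypothetical additions for illustration and are not described in the text.

```python
import numpy as np


def delta_gkmexplain(ref_scores, alt_scores):
    """Per-base delta: alternate-allele scores minus reference-allele scores."""
    ref = np.asarray(ref_scores, dtype=float)
    alt = np.asarray(alt_scores, dtype=float)
    if ref.shape != alt.shape:
        raise ValueError("reference and alternate score vectors must have equal length")
    return alt - ref


def top_window(delta, width=20):
    """(start, end) of the window with the largest summed absolute delta (hypothetical helper)."""
    sums = np.convolve(np.abs(delta), np.ones(width), mode="valid")
    start = int(np.argmax(sums))
    return start, start + width


# Illustrative 1 kb vectors: the alternate allele changes scores at positions 500-504.
ref = np.zeros(1000)
alt = np.zeros(1000)
alt[500:505] = 1.0
delta = delta_gkmexplain(ref, alt)
window = top_window(delta, width=5)
print(window)
```

A subsequence selected this way could then be passed to a motif-matching tool such as TomTom.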

Candidate functional variants were annotated with the 18-state ChromHMM annotations (42) across 218 cell types from the Roadmap Atlas (43; https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/). Bedtools intersect was used to identify overlaps between candidate functional SNPs and regions of enhancer activity in cell types associated with colorectal cancer.

As initial steps, we examined the characteristics of participants included in our interaction tests. Cases were older, had higher BMI and lower energy intake, more frequently had a family history of colorectal cancer, had lower levels of education, and were more likely to have ever smoked cigarettes (Table 1). We then confirmed the previously reported main effect relationships between alcohol consumption and colorectal cancer, where nondrinking (OR = 1.13; 95% CI, 1.05–1.21) and heavy drinking (OR = 1.34; 95% CI, 1.23–1.45) were associated with increased risk as compared with light-to-moderate drinking (Supplementary Fig. S1). The association between nondrinking and colorectal cancer risk was similar across tumor sites, and the heavy drinking association was weakest for proximal colon cancer (OR = 1.26) and strongest for distal colon cancer (OR = 1.39) and rectal cancer (OR = 1.44). We observed substantial heterogeneity in the association between nondrinking and colorectal cancer across studies (I² = 66%, P value < 0.001). This observation is consistent with the fact that the reason for abstaining from alcohol and the composition of never, former, and occasional drinkers in the nondrinking group both affect risk of colorectal cancer and vary across study populations. This heterogeneity was not observed for the association between heavy drinking and colorectal cancer.

Table 1.

Characteristics of all study participants by case–control status.

| Characteristic | Cases (N = 31,874) | Controls (N = 42,225) | P |
| --- | --- | --- | --- |
| Alcohol consumptionᵃ | | | |
| Light-to-moderate drinkers (>1–28 g/day) | 13,979 (44%) | 21,658 (51%) | <0.001 |
| Nondrinkers (≤1 g/day) | 13,754 (43%) | 15,546 (37%) | |
| Heavy drinkers (>28 g/day) | 4,141 (13%) | 5,021 (12%) | |
| Age (median imputed) | | | |
| Mean (SD) | 64.0 (±10.4) | 63.1 (±9.44) | <0.001 |
| Sex | | | |
| Female | 15,531 (49%) | 21,046 (50%) | 0.00269 |
| Male | 16,343 (51%) | 21,179 (50%) | |
| Total energy intake (mean imputed)ᵇ | | | |
| Mean (SD) | 1,910 (±708) | 1,970 (±736) | <0.001 |
| Family history of colorectal cancer | | | |
| No | 22,482 (71%) | 27,925 (66%) | <0.001 |
| Yes | 4,371 (14%) | 4,481 (11%) | |
| Missing | 5,021 (15.8%) | 9,819 (23.3%) | |
| BMI | | | |
| Mean (SD) | 27.4 (±4.89) | 27.0 (±4.62) | <0.001 |
| Missing | 697 (2.2%) | 604 (1.4%) | |
| Education level (highest completed) | | | |
| Less than high school | 7,759 (24%) | 8,313 (20%) | <0.001 |
| High school/GED | 6,391 (20%) | 6,420 (15%) | |
| Some college | 7,651 (24%) | 10,780 (26%) | |
| College/graduate school | 9,011 (28%) | 13,587 (32%) | |
| Missing | 1,062 (3.3%) | 3,125 (7.4%) | |
| Ever smoking | | | |
| No | 14,284 (45%) | 20,496 (49%) | <0.001 |
| Yes | 17,093 (54%) | 21,089 (50%) | |
| Missing | 497 (1.6%) | 640 (1.5%) | |
| Type 2 diabetes (ever diagnosed) | | | |
| No | 26,725 (84%) | 37,268 (88%) | <0.001 |
| Yes | 3,837 (12%) | 3,627 (9%) | |
| Missing | 1,312 (4.1%) | 1,330 (3.1%) | |
| Total dietary red meat intakeᶜ | | | |
| Q1 | 7,108 (22%) | 10,764 (25%) | <0.001 |
| Q2 | 8,320 (26%) | 11,986 (28%) | |
| Q3 | 8,088 (25%) | 10,910 (26%) | |
| Q4 | 7,398 (23%) | 7,717 (18%) | |
| Missing | 960 (3.0%) | 848 (2.0%) | |
| Total dietary fruit intakeᶜ | | | |
| Q1 | 8,406 (26%) | 10,215 (24%) | <0.001 |
| Q2 | 9,749 (31%) | 11,841 (28%) | |
| Q3 | 6,832 (21%) | 9,923 (24%) | |
| Q4 | 5,868 (18%) | 9,261 (22%) | |
| Missing | 1,019 (3.2%) | 985 (2.3%) | |
| Total dietary vegetable intakeᶜ | | | |
| Q1 | 7,124 (22%) | 9,896 (23%) | <0.001 |
| Q2 | 10,091 (32%) | 11,515 (27%) | |
| Q3 | 7,459 (23%) | 10,561 (25%) | |
| Q4 | 6,248 (20%) | 9,326 (22%) | |
| Missing | 952 (3.0%) | 927 (2.2%) | |
| Physical activity (MET-hr/week)ᵈ | | | |
| Mean (SD) | 44.8 (±64.9) | 48.0 (±70.6) | <0.001 |
| Missing | 14,547 (45.6%) | 16,449 (39.0%) | |
| Postmenopausal hormone replacement therapy use | | | |
| No | 7,510 (24%) | 10,605 (25%) | <0.001 |
| Yes | 3,827 (12%) | 6,032 (14%) | |
| Missing | 20,537 (64.4%) | 25,588 (60.6%) | |
| Tumor site | | | |
| Distal | 8,445 (26%) | 0 (0%) | NA |
| Proximal | 10,035 (31%) | 0 (0%) | |
| Rectal | 8,167 (26%) | 0 (0%) | |
| Missing | 5,227 (16.4%) | 42,225 (100%) | |

ᵃNondrinking is treated as missing for the heavy vs. light-to-moderate comparison, and heavy drinking is treated as missing for the nondrinking vs. light-to-moderate comparison. MECC_1 is also excluded from the heavy vs. light-to-moderate comparison, so the heavy drinking interaction analyses involved 247 fewer light-to-moderate drinkers than shown in the table.

ᵇCalculations exclude individuals with missing total energy intake information.

ᶜStudy- and sex-specific quartiles of serving size.

ᵈMET defined as 1 kcal/kg/hour. Calculated as the mean ± 3 × (study- and sex-specific mean absolute deviation).

Using the traditional genome-wide GxE tests of the interaction, we did not identify a significant interaction between any SNP and alcohol consumption. For the nondrinking as compared with light-to-moderate drinking GxE, there was a suggestive interaction in the 10q24.2/COX15 region previously associated with colorectal cancer (Fig. 1A; refs. 12, 13). There were also suggestive interactions from the heavy drinking as compared with light-to-moderate GxE, but none in regions previously identified by GWAS of colorectal cancer (Fig. 1C). The joint 2df tests identified SNPs with known colorectal cancer associations, and the joint 3df tests additionally identified SNPs with known alcohol consumption associations; however, no novel GxE interaction was discovered.

Figure 1.

All analyses are adjusted for age, sex, study site, total energy consumption, and the first three principal components. A and B, Manhattan plots of interaction between genome-wide genetic variants and nondrinking (A) or heavy drinking (B) as compared with light-to-moderate drinking. The blue horizontal line indicates the threshold for suggestive hits (P value < 5 × 10⁻⁶), and SNPs plotted in orange have previously reported associations with colorectal cancer. C and D, Plots of expectation-based partitions adjusted by the number of effective tests in each bin. The gray line indicates the threshold for significance based on the bin-specific alpha-threshold (Meff). C shows 13 significant SNPs, which are all located in the 10q24.2/COX15 region. Point colors alternate blue and green for visibility; red points denote statistically significant findings. SNPs = number of markers included in each bin; Meff = the number of effective tests in each bin after accounting for correlation between SNPs.

We also conducted a hybrid two-step method to test for interactions, which yielded a statistically significant finding in the same 10q24.2/COX15 locus that had a suggestive GxE interaction. For nondrinkers as compared with light-to-moderate drinkers, there were 13 SNPs with pairwise r² > 0.90 in the 10q24.2/COX15 region that showed a statistically significant interaction on risk of colorectal cancer (Fig. 1B). This procedure was null for heavy as compared with light-to-moderate drinking (Fig. 1D). As shown in the regional association plot, the lead SNP with the most significant interaction P value in the region was rs2300985 (Fig. 2). A stratified analysis of the lead SNP illustrates the observed interaction, showing that the A allele of rs2300985 was associated with a higher risk of colorectal cancer compared with the G reference allele only in light-to-moderate drinkers; the association was null in nondrinkers and in heavy drinkers (Table 2). We observed a dose–response relationship in light-to-moderate drinkers, where the OR for one copy of the rs2300985 A allele was 1.11 (95% CI, 1.06–1.17) and was 1.22 (95% CI, 1.14–1.31) for two copies of the A allele (Table 2). Because light-to-moderate drinkers are the reference group, the OR for the interaction term in the pooled GxE was inverse (OR = 0.89; 95% CI, 0.84–0.94; P value = 1.16 × 10⁻⁶). The forest plot illustrates an acceptable level of heterogeneity for the interaction OR across studies and no substantial difference between cohort and case–control studies (Supplementary Fig. S2). The interaction term was similar in analyses stratified by sex and tumor site, though it was weakest in proximal colon cases (OR = 0.92) and strongest in distal colon cases (OR = 0.87). This result withstood a sensitivity analysis additionally adjusted for BMI, diabetes, education level, ever smoking, as well as study and sex-specific quartiles of red meat, fruit, and vegetable consumption (OR = 0.89; 95% CI, 0.84–0.94). Additional adjustment for physical activity and postmenopausal hormone replacement therapy use restricted our sample size to 14,948 women and produced an OR of 0.95.
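As a rough consistency check on the reported estimates, the interaction OR should approximate the ratio of the genotype OR in nondrinkers to the genotype OR in light-to-moderate drinkers, and each joint OR in Table 2 should approximate the product of the alcohol-category OR in GG carriers and the stratified genotype OR. The sketch below uses only rounded values reported in the text and Table 2; because those values are rounded and come from models with different genotype codings, only approximate agreement is expected.

```python
import math

# Stratified ORs for rs2300985 from Table 2 (GG is the reference genotype).
or_nondrinkers = {"GA": 0.96, "AA": 0.98}
or_light_moderate = {"GA": 1.11, "AA": 1.22}

# Ratio of stratum-specific ORs: nondrinkers relative to light-to-moderate drinkers.
ratio_one_copy = or_nondrinkers["GA"] / or_light_moderate["GA"]
ratio_two_copies = or_nondrinkers["AA"] / or_light_moderate["AA"]
per_allele_from_two_copies = math.sqrt(ratio_two_copies)   # per-allele scale

# Joint ORs: OR for nondrinking among GG carriers times the stratified genotype OR.
joint_gg_nondrinkers = 1.28
implied_joint_ga = joint_gg_nondrinkers * or_nondrinkers["GA"]
implied_joint_aa = joint_gg_nondrinkers * or_nondrinkers["AA"]

print(round(ratio_one_copy, 3), round(per_allele_from_two_copies, 3))
print(round(implied_joint_ga, 3), round(implied_joint_aa, 3))
```

Both ratios fall close to the pooled interaction OR of 0.89, and the implied joint ORs are close to the reported 1.23 and 1.26.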

Figure 2.

Regional association plot of –log10(P value) for the interaction between SNPs and nondrinking versus light-to-moderate drinking. Result from hybrid two-step analysis of colorectal cancer risk at 10q24.2/COX15. rs2300985 is the index SNP as indicated by the purple diamond (GRCh37 coordinates).

Table 2.

Colorectal cancer associations stratified by genotypes of rs2300985 in the 10q24.2/COX15 region and by alcohol consumption.

Stratified associations for rs2300985 genotypes with colorectal cancer within alcohol consumption categories

| Genotype at rs2300985 | Nondrinkersᵃ: No. of casesᵈ | Nondrinkersᵃ: No. of controlsᵈ | Nondrinkersᵃ: OR (95% CI)ᵉ | Light-to-moderate drinkersᵇ: No. of casesᶠ | Light-to-moderate drinkersᵇ: No. of controlsᶠ | Light-to-moderate drinkersᵇ: OR (95% CI)ᵉ | Heavy drinkersᶜ: No. of casesᵍ | Heavy drinkersᶜ: No. of controlsᵍ | Heavy drinkersᶜ: OR (95% CI)ᵉ |
|---|---|---|---|---|---|---|---|---|---|
| GG | 5,366 | 5,747 | 1 (ref) | 4,806 | 7,804 | 1 (ref) | 1,349 | 1,639 | 1 (ref) |
| GA | 6,324 | 7,369 | 0.96 (0.91–1.01) | 6,678 | 10,266 | 1.11 (1.06–1.17) | 2,057 | 2,496 | 1.05 (0.95–1.16) |
| AA | 2,064 | 2,430 | 0.98 (0.91–1.06) | 2,495 | 3,588 | 1.22 (1.14–1.31) | 735 | 886 | 1.06 (0.93–1.21) |

Joint associations for rs2300985 genotypes and alcohol consumption with colorectal cancer, compared with light-to-moderate drinkers with the GG genotype

| Genotype at rs2300985 | Nondrinkersᵃ: No. of casesᵈ | Nondrinkersᵃ: No. of controlsᵈ | Nondrinkersᵃ: OR (95% CI)ᵉ | Light-to-moderate drinkersᵇ: No. of casesᶠ | Light-to-moderate drinkersᵇ: No. of controlsᶠ | Light-to-moderate drinkersᵇ: OR (95% CI)ᵉ | Heavy drinkersᶜ: No. of casesᵍ | Heavy drinkersᶜ: No. of controlsᵍ | Heavy drinkersᶜ: OR (95% CI)ᵉ |
|---|---|---|---|---|---|---|---|---|---|
| GG | 5,366 | 5,747 | 1.28 (1.21–1.35) | 4,806 | 7,804 | 1 (ref) | 1,349 | 1,639 | 1.45 (1.33–1.58) |
| GA | 6,324 | 7,369 | 1.23 (1.17–1.30) | 6,678 | 10,266 | 1.11 (1.06–1.17) | 2,057 | 2,496 | 1.51 (1.41–1.63) |
| AA | 2,064 | 2,430 | 1.26 (1.17–1.35) | 2,495 | 3,588 | 1.22 (1.14–1.31) | 735 | 886 | 1.54 (1.37–1.72) |

ᵃ Nondrinkers consume less than 1 gram of alcohol per day.

ᵇ Light-to-moderate drinkers consume 1 to 28 grams of alcohol per day.

ᶜ Heavy drinkers consume more than 28 grams of alcohol per day.

ᵈ Nondrinking cases: GG (39%), GA (46%), AA (15%); Nondrinking controls: GG (37%), GA (47%), AA (16%).

ᵉ Adjusted for age, sex, study site, total energy intake, and the first three principal components.

ᶠ Light-to-moderate drinking cases: GG (34%), GA (48%), AA (18%); Light-to-moderate drinking controls: GG (36%), GA (47%), AA (17%).

ᵍ Heavy drinking cases: GG (32%), GA (50%), AA (18%); Heavy drinking controls: GG (32%), GA (50%), AA (18%).
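
The counts in Table 2 allow a quick consistency check. The sketch below computes crude (unadjusted) odds ratios and genotype percentages from the tabulated case and control counts. The helper names are illustrative, and the crude ORs are not expected to equal the published ORs, which are adjusted for age, sex, study site, total energy intake, and principal components.

```python
# Case and control counts from Table 2: stratum -> genotype -> (cases, controls).
counts = {
    "nondrinkers": {"GG": (5366, 5747), "GA": (6324, 7369), "AA": (2064, 2430)},
    "light_to_moderate": {"GG": (4806, 7804), "GA": (6678, 10266), "AA": (2495, 3588)},
    "heavy": {"GG": (1349, 1639), "GA": (2057, 2496), "AA": (735, 886)},
}

def crude_or(stratum, genotype, ref="GG"):
    """Unadjusted odds ratio for `genotype` versus `ref` within one stratum."""
    cases, controls = counts[stratum][genotype]
    ref_cases, ref_controls = counts[stratum][ref]
    return (cases * ref_controls) / (controls * ref_cases)

def genotype_percentages(stratum, column):
    """Rounded genotype percentages among cases (column=0) or controls (column=1)."""
    total = sum(pair[column] for pair in counts[stratum].values())
    return {g: round(100 * pair[column] / total) for g, pair in counts[stratum].items()}

# Totals across all strata and genotypes.
total_cases = sum(cases for stratum in counts.values() for cases, _ in stratum.values())
total_controls = sum(ctrls for stratum in counts.values() for _, ctrls in stratum.values())

for stratum in counts:
    print(stratum, {g: round(crude_or(stratum, g), 2) for g in ("GA", "AA")})
```

The totals recovered this way can be compared with the overall sample size reported for the study, and the percentages with the table footnotes; small differences of a percentage point can arise from rounding.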

The initial GWAS that discovered the 10q24.2/COX15 locus identified rs11190164 as the variant most significantly associated with colorectal cancer risk (13). The lead SNP from our interaction analyses (rs2300985) was moderately correlated with rs11190164 (r² = 0.59). We LD-expanded our candidate set of variants to include 158 SNPs, including rs2300985, in a 500 kb window around the lead GWAS SNP rs11190164 based on an r² > 0.20 in the 1000 Genomes Phase III EUR population. We integrated functional chromatin profiling data in healthy colon, colorectal cancer tumor tissue, and three cell lines (SW480, HCT116, COLO205) with machine learning models of regulatory DNA sequence to prioritize putative causal regulatory variants in this locus. For each candidate variant, we used gapped k-mer support vector machine (gkmSVM) models trained on DNase-seq data from the five colorectal cancer–relevant biosamples to predict its allelic effect on chromatin accessibility in each biosample (Fig. 3). As expected, most of the candidate variants were predicted to have no significant allelic effects on chromatin accessibility. However, the models predicted the rs1318920 variant as a putative causal variant based on a significant difference in predicted chromatin accessibility between the reference C and alternate T allele (ISM score = −1.86, P value = 0.02 in healthy colon; ISM score = −2.22, P value = 0.007 in colorectal cancer tumor; ISM score = −1.79, P value = 0.02 in COLO205). The rs1318920 SNP had an association P value of 6.9 × 10⁻⁵ in our GWAS of colorectal cancer and an r² of 0.60 with both the lead GWAS SNP rs11190164 and the lead interaction SNP rs2300985. The characteristics of the three SNPs of interest in the 10q24.2/COX15 region are described in Supplementary Table S4, which also verifies that their allele frequencies did not differ substantially by category of alcohol intake.
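
The idea behind an in silico mutagenesis (ISM) score can be sketched with a toy model. The example below is not gkmSVM: it substitutes a hypothetical k-mer weight table for the trained classifier, and it assumes the score is the alternate-minus-reference difference in model output. The sequence, weights, and function names are invented for illustration only.

```python
def kmer_score(sequence, weights, k=4):
    """Toy stand-in for a trained sequence model: sum of k-mer weights."""
    return sum(weights.get(sequence[i:i + k], 0.0)
               for i in range(len(sequence) - k + 1))

def ism_score(ref_sequence, position, alt_base, weights, k=4):
    """Alternate-minus-reference difference in model score for one substitution."""
    alt_sequence = ref_sequence[:position] + alt_base + ref_sequence[position + 1:]
    return kmer_score(alt_sequence, weights, k) - kmer_score(ref_sequence, weights, k)

# Hypothetical weights and sequence, chosen only to make the example concrete.
toy_weights = {"GACC": 1.5, "TGAC": 0.5}
toy_ref = "AATGACCAA"

# Substituting the C at index 5 with T removes both weighted k-mers.
score = ism_score(toy_ref, position=5, alt_base="T", weights=toy_weights)
print(score)
```

Under this sign convention, a negative score means the substitution lowers the model's output for the sequence, which for an accessibility model would correspond to lower predicted chromatin accessibility for the alternate allele.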

Figure 3.

SVM learning model pipeline to predict functional effects of linked SNPs within the 10q24.2/COX15 region. A, Analysis pipeline for SVM classifier development and linked SNP scoring. B, SVM test set predictions for reference and alternate alleles for 158 variants with r² > 0.2 within 500 kb of the COX15 tag SNP rs11190164. Bottom panel highlights reference and alternate predictions for rs11190164, rs1318920, and rs2300985.


To further explore the regulatory sequence features disrupted by each of the candidate variants, we used the GkmExplain method to infer the contribution of each nucleotide in the 1,000 bp sequences containing the reference and alternate allele to the predicted chromatin accessibility from the gkmSVM models. GkmExplain analysis of rs1318920, rs11190164 (the lead GWAS SNP), and rs2300985 (the lead interaction SNP) supported the prediction of a strong allelic effect specifically for rs1318920 in healthy tissue, tumor tissue, and the COLO205 cell line (Fig. 4A–C). The C allele of the rs1318920 variant was predicted to significantly (P value = 4.2 × 10⁻⁵) amplify the contribution scores of an overlapping subsequence (TTTGGACTTTGACC) relative to the T allele. This subsequence is a strong match to the known binding motif of the Hepatocyte Nuclear Factor 4α (HNF4α) transcription factor. The rs1318920 variant was also found to lie within 50 bp of the overlapping DNase-seq peak summits, which are the locations with maximal signal; this additionally supports its strong effect size via motif disruption (Fig. 5). Integrative chromatin state annotations from ChromHMM (42) across 218 biosamples revealed that rs1318920 falls within a putative regulatory element that is in an active enhancer state marked by enhancer-associated H3K27ac and H3K4me1 specifically in colorectal tissues (Supplementary Fig. S3). In contrast to rs1318920, the lead GWAS SNP rs11190164 and the lead interaction SNP rs2300985 did not have any supporting evidence for functional effects.
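
Because motif collections may list a binding site on either strand, it can help to inspect both orientations of the highlighted subsequence when comparing it with a reference motif. A minimal sketch follows; the helper name is illustrative, and no motif database is queried.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(sequence):
    """Return the reverse complement of an uppercase DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]

# Subsequence reported as overlapping rs1318920.
subsequence = "TTTGGACTTTGACC"
opposite_strand = reverse_complement(subsequence)
print(subsequence, opposite_strand)
```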

Figure 4.

GkmExplain sequence importance scores within ± 50 bp of the variants of interest in the 10q24.2/COX15 region. Scores are derived from SVM models in healthy and tumor primary tissue samples as well as SVM models in cell lines SW480, HCT116, and COLO205. A, rs1318920 reference allele scores. B, rs1318920 alternate allele scores. C, rs1318920 alternate allele scores minus reference allele scores. D, rs2300985 reference allele scores. E, rs2300985 alternate allele scores. F, rs2300985 alternate allele scores minus reference allele scores. G, Tag SNP rs11190164 reference allele scores. H, Tag SNP rs11190164 alternate allele scores. I, Tag SNP rs11190164 alternate allele scores minus reference allele scores.

Figure 5.

Chromatin accessibility assays highlighting peaks within the 10q24.2/COX15 region. Top panel indicates GENCODE reference genes (GRCh37). Variants with r² > 0.2 within 500 kb of tag SNP rs11190164 are color-coded by r² value. LD was calculated for the EUR and EAS populations within phase III of the 1000 Genomes Project (panels 2 and 3 from the top). Healthy-tissue ATAC-seq, DNase-seq, H3K27ac histone ChIP-seq, and H3K4me1 histone ChIP-seq P value bigWigs are indicated in green. The same set of assays for tumor samples is indicated in blue. The same set of assays for cell lines SW480, HCT116, and COLO205 is overlaid and indicated in red.


Two independent sources of expression quantitative trait loci (eQTL) expand on the regulatory role of rs1318920. rs1318920 is an eQTL in the GTEx v8 compendium that influences the expression of EBAG9P1, ENTPD7, and RP11-85A1.3 in brain, cultured fibroblast, esophageal, or nerve tissues. rs1318920 is also a suggestive eQTL in normal colon tissue from the BarcUVa-Seq study (28) that regulates expression of ENTPD7, where the T alternate allele is associated with increased expression (β = 0.11, P value = 3.5 × 10⁻³). On the basis of the BarcUVa-Seq result, we checked for and similarly observed a positive association between the T allele of rs1318920 and ENTPD7 expression in human normal colon 3D organoids (β = 0.59, P value = 0.004). Exploring these results further, we detected a statistically significant interaction between standardized predicted expression of ENTPD7 and nondrinking (P value = 0.007) in our data from the involved consortia; the interaction was positive but nonsignificant for heavy drinking (P value = 0.42; Supplementary Table S5).

We conducted a genome-wide interaction study (GWIS) of colorectal cancer and discovered a possible interaction between alcohol consumption and genetic variants in the 10q24.2/COX15 region. Specifically, for the lead interaction SNP, we found that the A allele of rs2300985 was associated with an increase in risk of colorectal cancer in light-to-moderate drinkers, but was not associated with colorectal cancer risk in nondrinkers.

If nondrinking partially captures other risk factors or health status, then our result suggests that those characteristics overwhelm or counteract the effects associated with rs2300985 in nondrinkers. If light-to-moderate drinking is in fact protective, a possible mechanism might be that low levels of ethanol exposure are anti-inflammatory (3, 4) or upregulate liver detoxification enzymes that then mitigate other risk factors, while the adverse consequences of alcohol predominate over any hypothesized benefits at high levels of ethanol exposure (5, 6). Based on our results, the effects associated with rs2300985 may be related to carcinogenesis only when combined with other changes due to light-to-moderate drinking.

Our integrative analysis also suggests rs1318920 as a potentially causal variant in the 10q24.2/COX15 region that is in LD with both the lead GWAS SNP rs11190164 and the lead interaction SNP rs2300985. rs1318920 is predicted to have a significant allelic effect on chromatin accessibility of a colorectal tissue-specific active enhancer by restoring the combined binding motif of HNF4α and HNF4γ, and it may have a regulatory effect on expression of ENTPD7 in colon tissue. As a potential connection to alcohol consumption, there is evidence from mouse and cell-line experiments that HNF4α DNA binding inhibits alcoholic steatosis (44) and may prevent alcoholic liver disease, which is a possible risk factor for colorectal cancer (45). As a result, we hypothesize that rs1318920 is the causal variant driving the observed increased risk of colorectal cancer and that its effects on colorectal cancer may be modified by alcohol consumption. This finding warrants future work to confirm the functional relevance of rs1318920 in cancer cell lines, including results from a luciferase assay demonstrating the allele-specific enhancer activity that was predicted in the COLO205 cell line.

If confirmed, the possible causal variant suggests a plausible biological mechanism, where the T alternate allele completes the motif bound by the HNF4α transcription factor. HNF4A regulates genes involved in glucose, cholesterol, fatty acid, and amino acid metabolism (46); it has also been linked to colorectal cancer and identified as a potential drug target by Sladek and colleagues (47). In this case, the HNF4α binding site is within COX15 and close to ENTPD7. While speculative, the T allele of rs1318920 in the 10q24.2/COX15 region may restore HNF4α binding and promote colorectal cancer, possibly through increased expression of ENTPD7 in the colon. Interestingly, SNPs located in the 10q24.2/COX15 region are also cis-eQTLs for EBAG9P1 in naive CD4+ T cells, CD8+ T cells, and TREG immune cells (48), suggesting a potential impact on colorectal cancer risk via an immunomodulating effect. Follow-up analyses are warranted to assess the functional support for our hypotheses and to explore additional plausible mechanisms, including possible pathways through folate deficiency.

In a prior GWIS of alcohol in this consortium (49), we highlighted an interaction between SNPs in the 9q22.32/HIATL1 region and light-to-moderate drinking as compared with nondrinking. The tag SNP rs9409565 met the P value threshold of 0.05 for validation, but was not statistically significant in our larger study after adjustment for multiple testing (OR = 0.94, P value = 0.01).

Though GxE interactions are difficult to detect, our GWIS benefits from a substantial sample size of 31,874 cases and 42,225 controls, which was only possible through the inclusion of numerous epidemiological studies with detailed risk assessment and dedicated data harmonization efforts over many years. This study is currently among the largest of its kind and, combined with cutting-edge statistical methods, allowed us to detect a possible interaction between a known GWAS hit and alcohol consumption. Our detailed evaluation of the J-shaped relationship between alcohol consumption and risk of colorectal cancer (1) also ensured that we appropriately modeled alcohol consumption during interaction testing.

Our study also has several limitations. Our study population, though large, was drawn from consortia involving individuals solely of European ancestry, which limits the generalizability of the findings to individuals of non-European ancestry. Our categories of alcohol consumption were not sex-specific; however, we observed similar main effect associations between alcohol consumption and colorectal cancer in males and females, and we adjusted our interaction tests for sex. Alcohol consumption was based on intake at the reference time, so our nondrinking category includes former drinkers, and this approach may contribute to residual confounding in the nondrinking group. Given the complexity of the harmonization process and the inconsistent information about past drinking across the large number of diverse studies involved, a sensitivity analysis excluding former drinkers was not feasible. However, to explain our main finding, the presence of former heavy drinkers in the nondrinking group would need to attenuate the association between the rs2300985 A allele and colorectal cancer risk more than observed in the heavy drinking group itself and would also need to outweigh the bias away from the null introduced by the presence of former light-to-moderate drinkers. The interaction also survived a sensitivity analysis adjusted for a comprehensive set of potential confounders, though we were limited to covariates that were harmonized across the consortia and do not have access to all measured variables in each participating study. There were more nondrinkers than heavy drinkers, so we were not as well powered to detect interactions with heavy alcohol consumption.

For the reported interaction, the result from the traditional GxE was only suggestive, especially considering we conducted two GWIS. The modified two-step method that reported 10q24.2/COX15 as statistically significant is a newer approach that controls for multiple testing by estimating the bin-specific effective number of tests. While this approach is computationally more expensive, it addresses an important limitation of prior two-step methods, which do not currently account for correlated markers. Multiple methods exist to calculate the effective number of tests, and the Gao and colleagues method used in our approach is a comparatively stringent option (26, 50). Finally, we were unable to establish causality with the data available, so the proposed relationships need to be validated experimentally using methods like Perturb-seq coupled with differing alcohol treatment conditions (51, 52).
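
The bin-wise control of multiple testing described here can be sketched in general terms. The example follows the usual weighted hypothesis testing scheme, in which SNPs ranked by a step-1 screening statistic are placed in bins of doubling size and each successive bin receives half of the remaining significance budget; within a bin, the threshold is divided by an effective number of tests rather than the raw bin size. The initial bin size of 5, the overall alpha of 0.05, and the per-bin effective numbers of tests are illustrative assumptions, not values taken from this study.

```python
def bin_thresholds(n_snps, effective_tests, alpha=0.05, initial_bin_size=5):
    """Return (bin_size, per-SNP significance threshold) for each bin.

    effective_tests[k] is the assumed effective number of independent tests
    in bin k; it replaces the raw bin size as the Bonferroni divisor.
    """
    thresholds = []
    remaining, size, k = n_snps, initial_bin_size, 0
    while remaining > 0:
        this_size = min(size, remaining)
        bin_alpha = alpha / 2 ** (k + 1)  # bin k receives alpha / 2^(k+1)
        thresholds.append((this_size, bin_alpha / effective_tests[k]))
        remaining -= this_size
        size *= 2  # each bin holds twice as many SNPs as the previous one
        k += 1
    return thresholds

# Hypothetical example: 35 ranked SNPs fall into bins of 5, 10, and 20,
# with fewer effective tests than SNPs because of LD within each bin.
example = bin_thresholds(35, effective_tests=[4, 6, 9])
for bin_size, threshold in example:
    print(bin_size, threshold)
```

Using an effective number of tests smaller than the bin size gives a less stringent per-SNP threshold when the SNPs in a bin are correlated, while the total significance budget across bins stays below alpha.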

In summary, our results suggest that the association at the known 10q24.2/COX15 colorectal cancer locus is driven by light-to-moderate drinkers. Further, we have identified a putative causal variant in the region with strong evidence for a functional effect, which provides interesting directions for future research involving the link between rs1318920 and colorectal cancer and the possible role of alcohol consumption in this mechanism. Though we hope these findings inform future research involving the 10q24.2/COX15 region and colorectal cancer, they should not be used to guide public health recommendations without further validation, functional work, and research in the context of other colorectal cancer–associated variants and risk factors.

K.M. Jordahl reports grants from NCI (T32-CA094880) during the conduct of the study. A. Shcherbina reports other support from Insitro, Inc and personal fees from Bristol-Myers Squibb, Inc outside the submitted work. Y.-R. Su reports grants from NIH during the conduct of the study. S.A. Bien is an employee of and holds stock in Adaptive Biotechnologies. G. Casey reports grants from NIH during the conduct of the study. A.T. Chan reports personal fees from Bayer Pharma AG and Boehringer Ingelheim and grants and personal fees from Pfizer Inc. outside the submitted work. C.H. Dampier reports grants from NIH during the conduct of the study. D.A. Drew reports grants from NIH during the conduct of the study. G.G. Giles reports grants from National Health and Medical Research Council (Australia) during the conduct of the study. S.B. Gruber reports other support from Brogent International LLC outside the submitted work. H. Hampel reports other support from Myriad Genetics, Inc during the conduct of the study as well as other support from Invitae, Genome Medical, Promega, and GI OnDEMAND outside the submitted work. M.A. Jenkins reports grants from NIH and National Health and Medical Research Council, Australia during the conduct of the study. L. Le Marchand reports grants from NCI during the conduct of the study. V. Moreno reports grants from Agency for Management of University and Research Grants (AGAUR) of the Catalan Government and Instituto de Salud Carlos III during the conduct of the study. S. Ogino reports grants from NIH during the conduct of the study. R.K. Pai reports personal fees from Alimentiv Inc, PathAI, Allergan, AbbVie, and Eli Lilly outside the submitted work. J.R. Palmer reports grants from NIH during the conduct of the study. P.D.P. Pharoah reports grants from Cancer Research UK during the conduct of the study. E.A. 
Platz reports grants from NCI and AICR during the conduct of the study as well as personal fees from American Association for Cancer Research outside the submitted work. R.L. Prentice reports grants from NHLBI/NIH during the conduct of the study. L.C. Sakoda reports grants from NCI during the conduct of the study as well as grants from NCI and personal fees from NIH outside the submitted work. P.C. Scacheri reports grants and personal fees from Kronos Bio outside the submitted work. S.L. Schmit reports grants from NIEHS during the conduct of the study. R.E. Schoen reports grants from Freenome, Immunovia, and Exact outside the submitted work. M.L. Slattery reports grants from NCI during the conduct of the study. M.C. Stern reports grants from University of Southern California during the conduct of the study. C.M. Ulrich reports, as cancer center director, having oversight over research funded by several pharmaceutical companies, but has not received direct funding. B. Van Guelpen reports grants from Swedish Research Council, through Biobank Sweden, during the conduct of the study. A. Wolk reports grants from The Swedish Cancer Foundation and The Swedish Research Council/SIMPLER during the conduct of the study. A. Kundaje reports personal fees from Illumina Inc. and Open Targets (GSK) outside the submitted work. No disclosures were reported by the other authors.

The results reported here and the conclusions derived are the sole responsibility of the authors. Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer/World Health Organization.

K.M. Jordahl: Conceptualization, supervision, investigation, methodology, writing–original draft, writing–review and editing. A. Shcherbina: Software, formal analysis, visualization, methodology, writing–original draft, writing–review and editing. A.E. Kim: Data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. Y.-R. Su: Data curation, formal analysis, investigation. Y. Lin: Formal analysis, validation, investigation, visualization, methodology. J. Wang: Formal analysis. C. Qu: Conceptualization, resources, data curation. D. Albanes: Conceptualization, resources, data curation. V. Arndt: Conceptualization, resources, data curation. J.W. Baurley: Conceptualization, resources, data curation. S.I. Berndt: Conceptualization, resources, data curation. S.A. Bien: Conceptualization, resources, data curation. D.T. Bishop: Conceptualization, resources, data curation. E. Bouras: Conceptualization, resources, data curation. H. Brenner: Conceptualization, resources, data curation. D.D. Buchanan: Conceptualization, resources, data curation. A. Budiarto: Conceptualization, resources, data curation. P.T. Campbell: Conceptualization, resources, data curation. R. Carreras-Torres: Conceptualization, resources, data curation. G. Casey: Conceptualization, resources, data curation, investigation. T.W. Cenggoro: Conceptualization, resources, data curation. A.T. Chan: Conceptualization, resources, data curation. D.V. Conti: Conceptualization, resources, data curation. C.H. Dampier: Conceptualization, resources, data curation, writing–review and editing. M.A. Devall: Conceptualization, resources, data curation. V. Díez‐Obrero: Conceptualization, resources, data curation, writing–review and editing. N. Dimou: Conceptualization, resources, data curation. D.A. Drew: Conceptualization, resources, data curation. J.C. Figueiredo: Conceptualization, resources, data curation. S. 
Gallinger: Conceptualization, resources, data curation. G.G. Giles: Conceptualization, resources, data curation. S.B. Gruber: Conceptualization, resources, data curation. A. Gsur: Conceptualization, resources, data curation. M.J. Gunter: Conceptualization, resources, data curation. H. Hampel: Conceptualization, resources, data curation. S. Harlid: Conceptualization, resources, data curation. T.A. Harrison: Conceptualization, resources, data curation, supervision, funding acquisition. A. Hidaka: Conceptualization, resources, data curation. M. Hoffmeister: Conceptualization, resources, data curation. J.R. Huyghe: Conceptualization, resources, data curation. M.A. Jenkins: Conceptualization, resources, data curation. A.D. Joshi: Conceptualization, resources, data curation. T.O. Keku: Conceptualization, resources, data curation. S.C. Larsson: Conceptualization, resources, data curation. L. Le Marchand: Conceptualization, resources, data curation. J.P. Lewinger: Conceptualization, resources, data curation. L. Li: Conceptualization, resources, data curation. B. Mahesworo: Conceptualization, resources, data curation. V. Moreno: Conceptualization, resources, data curation. J.L. Morrison: Conceptualization, resources, data curation. N. Murphy: Conceptualization, resources, data curation. H. Nan: Conceptualization, resources, data curation. R. Nassir: Conceptualization, resources, data curation. P.A. Newcomb: Conceptualization, resources, data curation. M. Obón-Santacana: Conceptualization, resources, data curation. S. Ogino: Conceptualization, resources, data curation. J. Ose: Conceptualization, resources, data curation. R.K. Pai: Conceptualization, resources, data curation. J.R. Palmer: Conceptualization, resources, data curation. N. Papadimitriou: Conceptualization, resources, data curation. B. Pardamean: Conceptualization, resources, data curation. A.R. Peoples: Conceptualization, resources, data curation. P.D.P. Pharoah: Conceptualization, resources, data curation. E.A. 
Platz: Conceptualization, resources, data curation. J.D. Potter: Conceptualization, resources, data curation. R.L. Prentice: Conceptualization, resources, data curation. G. Rennert: Conceptualization, resources, data curation. E. Ruiz-Narvaez: Conceptualization, resources, data curation. L.C. Sakoda: Conceptualization, resources, data curation. P.C. Scacheri: Conceptualization, resources, data curation. S.L. Schmit: Conceptualization, resources, data curation. R.E. Schoen: Conceptualization, resources, data curation. M.L. Slattery: Conceptualization, resources, data curation. M.C. Stern: Conceptualization, resources, data curation. C.M. Tangen: Conceptualization, resources, data curation. S.N. Thibodeau: Conceptualization, resources, data curation. D.C. Thomas: Conceptualization, resources, data curation. Y. Tian: Conceptualization, resources, data curation. K.K. Tsilidis: Conceptualization, resources, data curation. C.M. Ulrich: Conceptualization, resources, data curation. F.J.B. van Duijnhoven: Conceptualization, resources, data curation. B. Van Guelpen: Conceptualization, resources, data curation. K. Visvanathan: Conceptualization, resources, data curation. P. Vodicka: Conceptualization, resources, data curation. E. White: Conceptualization, resources, data curation. A. Wolk: Conceptualization, resources, data curation. M.O. Woods: Conceptualization, resources, data curation. A.H. Wu: Conceptualization, resources, data curation. N. Zemlianskaia: Conceptualization, resources, data curation. J. Chang-Claude: Conceptualization, resources, data curation. W.J. Gauderman: Conceptualization, resources, data curation. L. Hsu: Conceptualization, data curation, formal analysis, writing–review and editing. A. Kundaje: Supervision, methodology, writing–review and editing. U. Peters: Conceptualization, resources, data curation, supervision, funding acquisition, project administration, writing–review and editing.

Genotyping services were provided by the Center for Inherited Disease Research (CIDR). Cancer data were provided by the Maryland Cancer Registry, Center for Cancer Prevention and Control, Maryland Department of Health. This research has been conducted using the UK Biobank Resource under Application Number 8614. We would also like to acknowledge data from VITAL, WHI. We thank all participants and cooperating clinicians, and everyone who provided excellent technical assistance from the following organizations, registries, and consortia: Colon CFR; Seattle CCFR; Hormones and Colon Cancer study (CORE Studies); CLUE II; CPS-II; Czech Republic CCS; DACHS; EPIC; Harvard cohorts (HPFS, NHS); Channing Division of Network Medicine; Department of Medicine, Brigham and Women's Hospital and Harvard T.H. Chan School of Public Health; Kentucky Cancer Registry; LCCS: We acknowledge the contributions of Jennifer Barrett, Robin Waxman, Gillian Smith and Emma Northwood in conducting this study; PLCO Cancer Screening Trial; District of Columbia Cancer Registry, Georgia Cancer Registry, Hawaii Cancer Registry, Minnesota Cancer Surveillance System, Missouri Cancer Registry, Nevada Central Cancer Registry, Pennsylvania Cancer Registry, Texas Cancer Registry, Virginia Cancer Registry, and Wisconsin Cancer Reporting System. All are supported in part by funds from the Center for Disease Control and Prevention, National Program for Central Registries, local states or by the National Cancer Institute, Surveillance, Epidemiology, and End Results program; SELECT; WHI. GECCO: National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (U01 CA137088, R01 CA059045, U01 CA164930, R01201407). The Editor-in-Chief of Cancer Epidemiology, Biomarkers & Prevention is an author on this article. 
In keeping with AACR editorial policy, a senior member of the Cancer Epidemiology, Biomarkers & Prevention editorial team managed the consideration process for this submission and independently rendered the final decision concerning acceptability.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
McNabb
S
,
Harrison
TA
,
Albanes
D
,
Berndt
SI
,
Brenner
H
,
Caan
BJ
, et al
.
Meta-analysis of 16 studies of the association of alcohol with colorectal cancer
.
Int J Cancer
2020
;
146
:
861
73
.
2.
Rossi
M
,
Anwar
MJ
,
Usman
A
,
Keshavarzian
A
,
Bishehsari
F
.
Colorectal cancer and alcohol consumption—populations to molecules
.
Cancers
2018
;
10
:
38
.
3.
Pai
JK
,
Hankinson
SE
,
Thadhani
R
,
Rifai
N
,
Pischon
T
,
Rimm
EB
.
Moderate alcohol consumption and lower levels of inflammatory markers in US men and women
.
Atherosclerosis
2006
;
186
:
113
20
.
4.
Klarich
DS
,
Penprase
J
,
Cintora
P
,
Medrano
O
,
Erwin
D
,
Brasser
SM
, et al
.
Effects of moderate alcohol consumption on gene expression related to colonic inflammation and antioxidant enzymes in rats
.
Alcohol
2017
;
61
:
25
31
.
5. Gunji T, Sato H, Iijima K, Fujibayashi K, Okumura M, Sasabe N, et al. Modest alcohol consumption has an inverse association with liver fat content. Hepatogastroenterology 2012;59:2552–6.
6. Alatalo PI, Koivisto HM, Hietala JP, Puukka KS, Bloigu R, Niemelä OJ. Effect of moderate alcohol consumption on liver enzymes increases with increasing body mass index. Am J Clin Nutr 2008;88:1097–103.
7. Schubert SA, Morreau H, de Miranda NFCC, van Wezel T. The missing heritability of familial colorectal cancer. Mutagenesis 2020;35:221–31.
8. Morozova TV, Mackay TFC, Anholt RRH. Genetics and genomics of alcohol sensitivity. Mol Genet Genomics 2014;289:253–69.
9. van IJzendoorn MH, Bakermans-Kranenburg MJ, Belsky J, Beach S, Brody G, Dodge KA, et al. Gene-by-environment experiments: a new approach to finding the missing heritability. Nat Rev Genet 2011;12:881; author reply 881.
10. Dumitrescu RG. Alcohol-induced epigenetic changes in cancer. Methods Mol Biol 2018;1856:157–72.
11. Huyghe JR, Bien SA, Harrison TA, Kang HM, Chen S, Schmit SL, et al. Discovery of common and rare genetic risk variants for colorectal cancer. Nat Genet 2019;51:76–87.
12. Schmit SL, Edlund CK, Schumacher FR, Gong J, Harrison TA, Huyghe JR, et al. Novel common genetic susceptibility loci for colorectal cancer. J Natl Cancer Inst 2018;111:146–57.
13. Schumacher FR, Schmit SL, Jiao S, Edlund CK, Wang H, Zhang B, et al. Genome-wide association study of colorectal cancer identifies six new susceptibility loci. Nat Commun 2015;6:7138.
14. Hutter CM, Chang-Claude J, Slattery ML, Pflugeisen BM, Lin Y, Duggan D, et al. Characterization of gene-environment interactions for colorectal cancer susceptibility loci. Cancer Res 2012;72:2036–44.
15. Beuth J, Moss RW. Complementary oncology: adjunctive methods in the treatment of cancer. Thieme: 2011.
16. Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet 2016;48:1284–7.
17. Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Stat Med 2001;20:3875–89.
18. Cochran WG. The combination of estimates from different experiments. Biometrics 1954;10:101.
19. Schwarzer G, Carpenter JR, Rücker G. Meta-Analysis with R. Springer: 2015.
20. Morrison J. GxEScanR: Run GWAS/GWEIS Scans Using Binary Dosage Files. 2020.
21. Zheng J, Li Y, Abecasis GR, Scheet P. A comparison of approaches to account for uncertainty in analysis of imputed genotypes. Genet Epidemiol 2011;35:102–10.
22. Murcray CE, Lewinger JP, Gauderman WJ. Gene-environment interaction in genome-wide association studies. Am J Epidemiol 2009;169:219–26.
23. Gauderman WJ, Kim A, Conti DV, Morrison J, Thomas DC, Vora H, et al. A unified model for the analysis of gene-environment interaction. Am J Epidemiol 2019;188:760–7.
24. Kooperberg C, LeBlanc M. Increasing the power of identifying gene × gene interactions in genome-wide association studies. Genet Epidemiol 2008;32:255–63.
25. Ionita-Laza I, McQueen MB, Laird NM, Lange C. Genome-wide weighted hypothesis testing in family-based association studies, with an application to a 100K scan. Am J Hum Genet 2007;81:607–14.
26. Gao X, Starmer J, Martin ER. A multiple testing correction method for genetic association studies using correlated single-nucleotide polymorphisms. Genet Epidemiol 2008;32:361–9.
27. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 2010;26:2336–7.
28. Díez-Obrero V, Dampier CH, Moratalla-Navarro F, Devall M, Plummer SJ, Díez-Villanueva A, et al. Genetic effects on transcriptome profiles in colon epithelium provide functional insights for genetic risk loci. Cell Mol Gastroenterol Hepatol 2021;12:181–97.
29. Devall M, Plummer SJ, Bryant J, Jennelle LT, Eaton S, Dampier CH, et al. Ethanol exposure drives colon location specific cell composition changes in a normal colon crypt 3D organoid model. Sci Rep 2021;11:432.
30. Taliun D, Harris DN, Kessler MD, Carlson J, Szpiech ZA, Torres R, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 2021;590:290–9.
31. Cohen AJ, Saiakhova A, Corradin O, Luppino JM, Lovrenert K, Bartels CF, et al. Hotspots of aberrant enhancer activity punctuate the colorectal cancer epigenome. Nat Commun 2017;8:1–13.
32. Lee J, Ottojolanki, Kim D, Strattan JS, Kundaje A, Nordström K, et al. ENCODE-DCC/atac-seq-pipeline: v1.9.1. 2020.
33. Lee J, Seth Strattan J, annashcherbina, Kagda M, Maurizio PL. ENCODE-DCC/chip-seq-pipeline2: v1.6.1. 2020.
34. Li Q, Brown JB, Huang H, Bickel PJ. Measuring reproducibility of high-throughput experiments. Ann Appl Stat 2011;5.
35. Lopez-Delisle L, Rabbani L, Wolff J, Bhardwaj V, Backofen R, Grüning B, et al. pyGenomeTracks: reproducible plots for multivariate genomic data sets. Bioinformatics 2021;37:422–3.
36. Quinlan AR. BEDTools: the Swiss-army tool for genome feature analysis. Curr Protoc Bioinformatics 2014;47:11.12.1–34.
37. Lee D. LS-GKM: a new gkm-SVM for large-scale datasets. Bioinformatics 2016;32:2196–8.
38. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 2015;4:7.
39. Katsnelson A. 1000 Genomes Project reveals human variation. Nature 2010.
40. Shrikumar A, Prakash E, Kundaje A. Gkmexplain: fast and accurate interpretation of nonlinear gapped k-mer SVMs using integrated gradients.
41. Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble W. Quantifying similarity between motifs. Genome Biol 2007;8:R24.
42. Ernst J, Kellis M. Chromatin-state discovery and genome annotation with ChromHMM. Nat Protoc 2017;12:2478–92.
43. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, et al. Integrative analysis of 111 reference human epigenomes. Nature 2015;518:317–30.
44. Kang X, Zhong W, Liu J, Song Z, McClain CJ, Kang YJ, et al. Zinc supplementation reverses alcohol-induced steatosis in mice through reactivating hepatocyte nuclear factor-4alpha and peroxisome proliferator-activated receptor-alpha. Hepatology 2009;50:1241–50.
45. Komaki Y, Komaki F, Micic D, Ido A, Sakuraba A. Risk of colorectal cancer in chronic liver diseases: a systematic review and meta-analysis. Gastrointest Endosc 2017;86:93–104.
46. Stoffel M, Duncan SA. The maturity-onset diabetes of the young (MODY1) transcription factor HNF4alpha regulates expression of genes required for glucose transport and metabolism. Proc Natl Acad Sci U S A 1997;94:13209–14.
47. Chellappa K, Robertson GR, Sladek FM. HNF4α: a new biomarker in colon cancer? Biomark Med 2012;6:297.
48. Schmiedel BJ, Singh D, Madrigal A, Valdovino-Gonzalez AG, White BM, Zapardiel-Gonzalo J, et al. Impact of genetic polymorphisms on human immune cell gene expression. Cell 2018;175:1701–15.
49. Gong J, Hutter CM, Newcomb PA, Ulrich CM, Bien SA, Campbell PT, et al. Genome-wide interaction analyses between genetic variants and alcohol consumption and smoking for risk of colorectal cancer. PLoS Genet 2016;12:e1006296.
50. Li M-X, Yeung JMY, Cherny SS, Sham PC. Evaluating the effective numbers of independent tests and significant P value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum Genet 2012;131:747–56.
51. Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, et al. Perturb-seq: Dissecting molecular circuits with scalable single cell RNA profiling of pooled genetic screens. Cell 2016;167:1853.
52. Schraivogel D, Gschwind AR, Milbank JH, Leonce DR, Jakob P, Mathur L, et al. Targeted Perturb-seq enables genome-scale genetic screens in single cells. Nat Methods 2020;17:629–35.
