Abstract
Background: Colorectal cancer is the second leading cause of cancer-related death, and most colorectal cancers arise from colorectal adenomas. Removal of polyps reduces mortality from colorectal cancer. Colorectal adenomas are known to aggregate in families; however, the genetic determinants for risk of polyps are largely unknown.
Methods: In this study, we used data from the Tennessee Colorectal Polyp Study and the Tennessee-Indiana Adenoma Recurrence Study to conduct a genome-wide association study (GWAS) of adenoma cases and controls. Our design consisted of discovery and replication phases for a total of 2,551 Caucasian adenoma cases and 3,285 Caucasian controls. We carried out logistic regression to test for association in both the discovery and replication phases and further examined the results with meta-analysis.
Results: No single nucleotide polymorphism (SNP) achieved a genome-wide significant P value; however, the most significantly associated SNPs were either previously associated with colorectal cancer in GWAS, such as rs10505477 near the gene POU5F1B [odds ratio (OR) = 0.87; 95% confidence interval (CI), 0.81–0.94; P = 4.4 × 10⁻⁴], or have been biologically linked to benign growths in other tissues, such as rs1919314 in the gene histone deacetylase 9 (OR = 1.32; 95% CI, 1.18–1.47; P = 1.1 × 10⁻⁶).
Conclusions: This study suggests that several SNPs may be related to adenoma risk and provides clues for future studies.
Impact: These results suggest that some known genetic risk factors of colorectal cancer are necessary but not sufficient for carcinogenesis. Cancer Epidemiol Biomarkers Prev; 22(7); 1219–26. ©2013 AACR.
Introduction
Colorectal cancer is the second most common cause of cancer-related death in North America and the fourth most diagnosed cancer (1). The vast majority of colorectal cancers are derived from colorectal adenomas (2, 3), which are well-recognized colorectal cancer precursors (2). Colorectal cancer risk has been shown to be modulated by environmental and genetic factors, in addition to epigenetic phenomena that associate with tumors. In colorectal cancer pathways, normal colonic epithelium is transformed as the result of the progressive accumulation of genetic and epigenetic alterations such as somatic mutations through gain-of-function, loss-of-function, and subsequent genomic instability (4–6). Although this process is understood to some extent with regard to colorectal cancer endpoints, less is known about the genetic determinants for the intermediate steps of pathogenesis through the adenoma–carcinoma progression before the onset of colorectal cancer.
Many successful investigations have been conducted to discover germ-line genetic risk factors for colorectal cancer (7–12), and these associations generalize for the most part to diverse human populations (12, 13). These studies have provided important insight into the biology and pathogenesis of colorectal cancer; however, they have not explored earlier stages of colorectal cancer pathogenesis. Studies of colorectal adenoma have the potential to characterize the role of known colorectal cancer risk factors at benign stages of aberrant tissue growth, to discover novel factors that are specific to risk of adenoma, and to identify the etiologic factors related to the initiation of colorectal cancer. A small number of studies have investigated adenoma risk as a secondary objective, with a recent notable, although not statistically significant, finding in the TP53 polyadenylation signal (14). Other studies have investigated a small number of candidate single nucleotide polymorphisms (SNP) for adenoma risk, usually on the basis of evidence from colorectal cancer studies (15). Here, we present the first genome-wide association study (GWAS) to investigate the genetic determinants of adenoma in a population of patients with European ancestry from the United States.
In this study, we investigated whether common interindividual genetic variation is a determinant for adenoma formation in participants from the Tennessee Colorectal Polyp Study (TCPS) and the Tennessee–Indiana Adenoma Recurrence Study (TIARS). These studies were designed and implemented to investigate lifestyle, genetic, and other environmental factors that influence the risk of adenoma.
Materials and Methods
Study population and data collection
TCPS was a colonoscopy-based case-control study conducted in Nashville, Tennessee, from 2003 to 2010. Eligible participants, between 45 and 70 years of age, were identified from patients scheduled for colonoscopy at the Vanderbilt Gastroenterology Clinic and the Veterans' Affairs Tennessee Valley Health System Nashville Campus. Demographic characteristics of all participants are described in Table 1. For the purposes of the association analyses, we included only participants of Caucasian race, although original recruitment for TCPS was from a multiethnic population.
Table 1. Characteristics of study participants by phase, the Tennessee Colorectal Polyp Study (2003–2010) and Tennessee–Indiana Adenoma Recurrence Study (1996–2006)
Characteristic | Discovery: total (N = 1,867) | Discovery: cases (N = 958) | Discovery: controls (N = 909) | Discovery: P valueᵃ,ᵈ | Replication: total (N = 3,969) | Replication: cases (N = 1,593) | Replication: controls (N = 2,376) | Replication: P valueᵃ,ᵈ |
---|---|---|---|---|---|---|---|---|
Study population (%) | | | | | | | | |
TCPS | 100 | 100 | 100 | NA | 87.5 | 68.8 | 100 | <0.001 |
TIARS | 0 | 0 | 0 | | 12.5 | 31.2 | 0 | |
Study site (%) | | | | | | | | |
Tennessee | 100 | 100 | 100 | NA | 94.0 | 85 | 100 | <0.001 |
Indiana | 0 | 0 | 0 | | 6.0 | 15 | 0 | |
Adenoma type (%) | | | | | | | | |
Hyperplastic | 0 | 0 | 0 | NA | 13.4 | 33.5 | 0 | NA |
Multiple | 15.9 | 31.1 | 0 | NA | 13.0 | 32.4 | 0 | NA |
Advanced | 14.9 | 29.1 | 0 | NA | 8.5 | 21.2 | 0 | NA |
Multiple and advanced | 5.6 | 10.9 | 0 | NA | 5.1 | 12.7 | 0 | NA |
Age [years, mean (SD)] | 58.5 (7.4) | 59.0 (7.3) | 58.1 (7.5) | <0.001 | 57.1 (7.5) | 58.2 (7.1) | 56.6 (7.6) | <0.001 |
Sex (female, %) | 26.5 | 26.3 | 26.6 | 0.89 | 40.6 | 27.9 | 49.2 | <0.001 |
Indications for colonoscopy (%)ᵇ | | | | 0.04 | | | | <0.001 |
Screening | 56.7 | 56.6 | 60.3 | | 56.7 | 52.8 | 61.0 | |
Other | 43.3 | 43.5 | 36.8 | | 43.3 | 47.2 | 39.0 | |
Educational attainment (%)ᵇ | | | | 0.021 | | | | <0.001 |
High school or less | 25.9 | 31.7 | 25.6 | | 24.3 | 27.0 | 19.8 | |
Some college | 25.1 | 25.7 | 26.6 | | 26.5 | 28.4 | 23.7 | |
College graduate | 20.0 | 18.8 | 20.8 | | 19.3 | 18.6 | 18.9 | |
Graduate or professional education | 22.8 | 20.9 | 24.1 | | 23.7 | 15.7 | 27.2 | |
Race (white, %) | 100 | 100 | 100 | NA | 100 | 100 | 100 | NA |
Colorectal cancer family history (%)ᵇ | 8.4 | 9.6 | 7.7 | <0.001 | 8.0 | 8.0 | 7.9 | 0.790 |
Regular cigarette smoking (%)ᵇ | 58.4 | 64.9 | 54.1 | <0.001 | 57.7 | 70.3 | 52.0 | <0.001 |
Regular alcohol consumption (%)ᵇ | 50.2 | 54.7 | 48.4 | <0.001 | 49.3 | 53.7 | 48.5 | <0.001 |
Body mass index (kg/m², mean)ᶜ | 28.3 | 28.5 | 28.1 | 0.073 | 28.0 | 28.9 | 27.7 | <0.001 |
Regularly exercised (%)ᵇ | 51.4 | 51.6 | 52.3 | 0.83 | 52.9 | 45.8 | 54.2 | <0.001 |
NSAID use (%)ᵇ | | | | 0.007 | | | | 0.343 |
Current | 39.1 | 40.0 | 38.7 | | 45.1 | 34.3 | 50.0 | |
Former | 10.9 | 5.0 | 13.6 | | 6.2 | 5.7 | 6.4 | |
Never | 50.0 | 55.0 | 47.7 | | 48.7 | 60.0 | 43.6 | |
Total energy intake (kcal/day, mean)ᵇ | 2,330 | 2,333.7 | 2,292.4 | 0.89 | 2,301 | 2,376.9 | 2,190.3 | 0.064 |
ᵃDerived from ANOVA for continuous variables and χ² test for categorical variables.
ᵇStandardized by age (40–49, 50–59, 60–64, and ≥65 years old) and sex distribution of all study participants.
ᶜStandardized by age distribution (40–49, 50–59, 60–64, and ≥65 years old) of all study participants.
ᵈP value for case-control comparison.
Participants were excluded if they had hereditary colorectal cancer syndromes, a prior history of inflammatory bowel disease, previous adenomatous polyps, or any cancer other than nonmelanoma skin cancer. Among eligible participants, 65% provided informed consent, and subsequently 84% completed telephonic interviews and 75% completed a food frequency questionnaire (FFQ) specifically designed for the southern United States (16). Participants provided DNA either before or after colonoscopy. Participants recruited before colonoscopy were asked to donate a 15 mL blood sample. Blood samples were provided by 5,504 participants. Buccal cell or Oragene kit samples were collected from 1,079 participants who chose not to provide a blood sample or were recruited after colonoscopy. DNA was obtained from blood for 82.9% of participants, and from mouthwash buccal samples or Oragene samples for 16.3% of participants. The study was approved by the Vanderbilt University Institutional Review Board, the Veterans' Affairs Tennessee Valley Health System Institutional Review Board, and the Veterans' Affairs Tennessee Valley Health System Research and Development Committee.
Participants were also included as adenoma cases from TIARS, a retrospective cohort study conducted in Nashville, Tennessee, and Indianapolis, Indiana. Eligible participants, between 40 and 75 years of age, were identified from patients diagnosed during colonoscopy with advanced or multiple adenomas between January 1996 and December 2002 at the Vanderbilt Gastroenterology Clinic, Veterans' Affairs Tennessee Valley Health System Nashville campus, Indiana University Hospital, the Richard L. Roudebush Veterans Administration Medical Center, and Wishard Memorial Hospital. Patients were excluded from TIARS if they could not speak or understand English; had genetic colorectal cancer syndromes (e.g., hereditary nonpolyposis colorectal cancer or familial adenomatous polyposis), as ascertained by self-report and review of medical records; were participating in an intervention trial to prevent adenoma recurrence; had a previous history of colon resection, inflammatory bowel disease, adenomas, or any cancer other than nonmelanoma skin cancer; or were current residents of a correctional facility. Overall, 1,643 eligible individuals were identified. Potential participants who were not known to be deceased were contacted first by letter and then by telephone. Six hundred and seventy participants provided written informed consent. The overall participation rate was 62.1%. A standardized telephonic interview was conducted by trained interviewers to obtain information on follow-up examinations, medication use since baseline, demographics, medical history, family history, reproductive history, anthropometry, and lifestyle. Among participants, 706 (63.7%) completed the telephonic interview. Beginning in May 2004, buccal cell samples were collected from participants or a saliva sample was collected using an Oragene kit; 532 participants (48.0%) provided a buccal and/or Oragene sample, of whom the 497 of European ancestry were included in genotyping experiments.
This study was approved by the institutional review boards for TCPS and by the Indiana University Institutional Review Board Development Committee.
In both study populations, colonoscopic procedures were carried out and reported using standard clinical protocols by the patient's gastroenterologist. Any identified polyps were removed using biopsy forceps or snare techniques. All pathology diagnoses were determined by hospital pathologists and reported as part of routine care. Data were abstracted from these reports to classify study participants into the following groups: adenomas only, hyperplastic polyps only, presence of both adenomas and hyperplastic polyps, and polyp-free controls. To be classified as polyp free, a participant had to have a complete colonoscopy reaching the cecum with no polyps observed. Participants with at least 2 adenomas were further classified as having multiple adenomas. An advanced adenoma was defined as meeting 1 of the following criteria: (i) size of 1 centimeter or more; (ii) tubulovillous or villous histology; or (iii) high-grade dysplasia.
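The classification rules in this paragraph can be sketched in a few lines of Python. This is only an illustration of the stated criteria: the `Adenoma` record, its field names, the `classify_participant` function, and the "unclassified" fallback for an incomplete colonoscopy without polyps are all hypothetical and are not taken from the study's data dictionary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Adenoma:
    """Hypothetical record abstracted from a pathology report."""
    size_cm: float
    histology: str  # e.g. "tubular", "tubulovillous", "villous"
    high_grade_dysplasia: bool


def is_advanced(adenoma: Adenoma) -> bool:
    # Advanced if ANY one of the three stated criteria is met.
    return (
        adenoma.size_cm >= 1.0
        or adenoma.histology in {"tubulovillous", "villous"}
        or adenoma.high_grade_dysplasia
    )


def classify_participant(adenomas: List[Adenoma], n_hyperplastic: int,
                         reached_cecum: bool) -> dict:
    if adenomas and n_hyperplastic > 0:
        group = "adenoma and hyperplastic polyp"
    elif adenomas:
        group = "adenoma only"
    elif n_hyperplastic > 0:
        group = "hyperplastic polyp only"
    elif reached_cecum:
        # Polyp free requires a complete colonoscopy reaching the cecum.
        group = "polyp-free control"
    else:
        # Assumption: incomplete examinations without polyps are left unclassified.
        group = "unclassified"
    return {
        "group": group,
        "multiple": len(adenomas) >= 2,
        "advanced": any(is_advanced(a) for a in adenomas),
    }


print(classify_participant([Adenoma(1.2, "tubular", False)], 0, True))
```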
Genotyping
Initial genotyping was done using the Affymetrix Genome-Wide Human SNP Array 5.0 (Affymetrix, Inc.) to agnostically detect associations with adenoma risk throughout the genome. TCPS patients were selected for genotyping with preference for advanced and multiple adenoma cases; the remaining discovery-phase cases were participants with other adenomas. Imputation was carried out for the entire genome using IMPUTEv2.2 (17) with reference panels of densely genotyped SNPs from the International HapMap Project Phase 3 data and from the 1000 Genomes Project. Follow-up genotyping of candidate SNPs where association signals were observed was done using Sequenom iPLEX Gold genotyping (Sequenom, Inc.).
Quality control
Quality control (QC) procedures were carried out on CEL files using the Dynamic Model (DM) algorithm in the Affymetrix Power Tools software package. Genotypes were called in the remaining samples using the BRLMM-P algorithm. The PLINK sex check (--check-sex option) did not identify any participants whose X-chromosome heterozygosity was inconsistent with their reported sex. Sixteen participants who were first- or second-degree relatives of other study participants were removed from further analysis. One hundred sixty-five participants who were missing more than 5% of their autosomal genotypes were removed from further analysis (Supplementary Fig. S1). The average concordance of genotypes within duplicate QC participants, assessed using the PLINK software package (18), was 99.9% (Supplementary Fig. S2). For SNP QC, SNPs were removed if they were missing in more than 5% of participants or if the minor allele frequency (MAF) in the samples that passed sample QC was less than 1%. After related and admixed participants were removed, SNPs were removed for major deviations from Hardy–Weinberg equilibrium (HWE; P < 1 × 10⁻⁶). In addition, we attempted to re-genotype 121 case samples for which the GWAS assay failed. Batch effects were observed in association results, with a genomic inflation factor of 1.2, when these cases were included, and more stringent QC did not alleviate the issue; therefore, we removed them from further analysis. After sample and SNP QC procedures, 402,326 SNPs remained in 958 adenoma cases and 909 adenoma-free controls (Supplementary Fig. S3). Population stratification was assessed by comparing the study participants to reference panels from the HapMap Phase 3 participants using EIGENSTRAT (19), resulting in the removal of 22 participants with apparent ancestral differences from the rest of the sample (Supplementary Fig. S4).
In the replication phase of the study, SNPs were required to have an imputation quality information score from SNPTEST of at least 0.8, an allele frequency of at least 10%, and an association with adenoma risk with a P value less than 10⁻⁴. Among SNPs passing these criteria, 27 candidate SNPs were selected on the basis of statistical significance. These SNPs were genotyped using the Sequenom genotyping system in 2,028 adenoma and/or polyp cases and 3,087 controls. Samples were checked for duplications, and 41 pairs were removed. In addition, participants who did not self-report as Caucasian were removed from this analysis (394 cases, 618 controls). These SNPs were evaluated for concordance among replicate QC participants (99%), missing data greater than 5%, HWE P less than 0.001, and minor allele frequency agreement with the discovery phase. All SNPs passed QC checks. The final data for association analysis consisted of 1,593 cases with adenoma and/or polyps and 2,376 controls, including 1,059 adenoma-only cases.
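As an illustration of the SNP-level filters described above (missingness above 5%, MAF below 1%, HWE P below 1 × 10⁻⁶), the following Python sketch applies them to genotype counts for a single SNP. It is not the study's pipeline: the function names are hypothetical, and the HWE P value comes from a 1-degree-of-freedom chi-square goodness-of-fit test, which is a simplification assumed here (software such as PLINK typically uses an exact test).

```python
import math


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def snp_qc_failures(n_aa, n_ab, n_bb, n_missing,
                    max_missing=0.05, min_maf=0.01, hwe_p=1e-6):
    """Return the list of QC filters that a SNP fails (empty list = passes)."""
    failures = []
    called = n_aa + n_ab + n_bb
    if called == 0:
        return ["no genotype calls"]
    if n_missing / (called + n_missing) > max_missing:
        failures.append("missingness")
    freq = (2 * n_aa + n_ab) / (2 * called)
    if min(freq, 1.0 - freq) < min_maf:
        failures.append("maf")
    if hwe_chi2_pvalue(n_aa, n_ab, n_bb) < hwe_p:
        failures.append("hwe")
    return failures


print(snp_qc_failures(490, 420, 90, 0))   # counts in exact HWE proportions
print(snp_qc_failures(500, 0, 500, 0))    # no heterozygotes at all
```

The thresholds are keyword arguments so that the stricter replication-phase HWE cutoff (P less than 0.001) could be applied with the same function.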
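The candidate-selection criteria can be written as a simple filter. The sketch below is a minimal illustration: the dictionary keys, the SNP labels, and the toy values are hypothetical, the allele frequency criterion is interpreted as a minor allele frequency, and ranking eligible SNPs by P value before taking the top candidates is an assumption about how "selected on the basis of statistical significance" was applied.

```python
def select_replication_snps(results, max_snps=27, min_info=0.8,
                            min_maf=0.10, max_p=1e-4):
    """Filter discovery results and keep the most significant eligible SNPs."""
    eligible = [
        r for r in results
        if r["info"] >= min_info and r["maf"] >= min_maf and r["p"] < max_p
    ]
    return sorted(eligible, key=lambda r: r["p"])[:max_snps]


# Toy discovery results with made-up labels and values.
toy_results = [
    {"snp": "snp_a", "info": 0.95, "maf": 0.25, "p": 2e-5},
    {"snp": "snp_b", "info": 0.70, "maf": 0.30, "p": 1e-6},  # low imputation quality
    {"snp": "snp_c", "info": 0.90, "maf": 0.05, "p": 3e-5},  # too rare
    {"snp": "snp_d", "info": 0.85, "maf": 0.40, "p": 5e-6},
    {"snp": "snp_e", "info": 0.99, "maf": 0.30, "p": 1e-2},  # not significant enough
]
print([r["snp"] for r in select_replication_snps(toy_results)])
```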
Statistical analysis
Two independent samples of participants from TCPS and TIARS were evaluated for associations between genetic variation and adenoma risk in a 2-stage design. In the discovery phase, genotypes from a GWAS were imputed to the 1000 Genomes and HapMap reference panels. In the replication phase, selected SNPs were genotyped in an independent sample of participants, and results from both phases were combined using meta-analysis.
In data from the GWAS, we assessed the relationship between genetic variation in candidate genes and the risk of colorectal adenoma using the software package SNPTESTv2.2.0 with the “-method score” option, using logistic regression with frequentist tests and assuming an additive effect of SNP alleles on risk, adjusted for age and sex (20). In genes where there were multiple nominally significant SNPs, we conducted conditional tests of association for the remaining SNPs, adjusting for the most significant SNP, age, and sex. This procedure mitigates the effect of LD-induced significance and helps identify associations at SNPs that are potentially due to LD with independent mutations on distinct haplotypic backgrounds. The SNPs genotyped for replication were evaluated for association with adenoma and/or polyp risk using PLINK with logistic regression, adjusting for age and sex. We also conducted a secondary analysis by removing all cases with only hyperplastic polyps and conducting the replication analysis using the 1,059 adenoma-only cases.
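To make the additive model concrete, the sketch below fits a logistic regression of case status on allele dosage (0, 1, or 2 copies), age, and sex using Newton-Raphson in plain Python, and reports a per-allele OR with a Wald test. It is an illustration only: the study used SNPTEST (score test) and PLINK, whereas this sketch uses a Wald test, and the function names and the toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(covariates, y, iterations=25):
    """Newton-Raphson fit. Returns (coefficients, standard errors), intercept first."""
    x = [[1.0] + [float(v) for v in row] for row in covariates]
    k = len(x[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(x, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    info[j][l] += p * (1.0 - p) * xi[j] * xi[l]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    # Standard errors: square roots of the diagonal of the inverse information
    # matrix from the final iteration.
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(solve(info, unit)[j]))
    return beta, se


# Toy data (made up): one row per participant = [dosage, centered age, sex].
covariates, status = [], []
for dosage in (0, 1, 2):
    for sex in (0, 1):
        for age in (50, 60, 70):
            n_cases = 4 + 2 * dosage + (age - 50) // 10
            n_controls = 8 - dosage
            row = [dosage, (age - 60) / 10, sex]
            covariates += [row] * (n_cases + n_controls)
            status += [1] * n_cases + [0] * n_controls

beta, se = fit_logistic(covariates, status)
z = beta[1] / se[1]
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald P value
print(f"per-allele OR = {math.exp(beta[1]):.3f}, SE = {se[1]:.3f}, P = {p_value:.2e}")
```

A conditional test of the kind described above would add the dosage of the most significant SNP in the gene as one more covariate column and read off the coefficient of the SNP being tested.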
A meta-analysis was conducted combining the results from both phases of the investigation using the software METAL for the combined sample size of 2,551 cases and 3,285 controls (21). In addition, we conducted a secondary meta-analysis using the 2,017 adenoma-only cases by excluding all hyperplastic polyp cases. All reported P values are 2-sided.
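The combination of the two phases can be illustrated with a fixed-effects, inverse-variance weighted meta-analysis. This is a sketch under an assumption: the text does not state which METAL weighting scheme was used, and the standard-error-based scheme is assumed here because Table 2 reports ORs and SEs. The function name and the `flip` argument (used when the reference and effect alleles are swapped between phases) are hypothetical.

```python
import math


def inverse_variance_meta(estimates):
    """Fixed-effects meta-analysis.

    estimates: iterable of (odds_ratio, se_of_log_or, flip); flip=True inverts
    the OR so that every phase refers to the same effect allele.
    """
    num = den = 0.0
    for odds_ratio, se, flip in estimates:
        beta = -math.log(odds_ratio) if flip else math.log(odds_ratio)
        weight = 1.0 / se ** 2
        num += weight * beta
        den += weight
    beta = num / den
    se = math.sqrt(1.0 / den)
    z = beta / se
    return {
        "or": math.exp(beta),
        "se": se,
        "ci95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided
    }


# rs1919314 from Table 2: discovery A/T OR 1.551 (SE 0.088); replication is
# reported for the opposite allele coding (T/A, OR 0.850, SE 0.073), so it is flipped.
combined = inverse_variance_meta([(1.551, 0.088, False), (0.850, 0.073, True)])
print(combined)
```

With these rounded inputs the combined estimate should land close to the tabulated discovery + replication values for rs1919314 (OR 1.315, SE 0.056) and to the 95% CI of 1.18–1.47 quoted in the abstract; small differences are expected from rounding.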
Copy number variation analysis
For associated genes with previous evidence of copy number variants (CNV) in tumor tissue [the gene for histone deacetylase 9 (HDAC9) and ERICH1], we checked a previous file for the Birdsuite program to determine whether common CNVs were present and could be detected in these regions, and thereby to evaluate whether such CNVs might also be common standing germ-line variation in human populations. In addition, we downloaded genotypes for 1,092 samples in 2 Mb regions (1 Mb flanking peak SNPs in the regions) from the 1000 Genomes database, extracted any observed CNVs, labeled structural variants in variant call format files, and used PLINK to evaluate LD between peak SNPs and CNVs. We did this for the 4 major world populations (African, American, Asian, and European).
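The LD statistic evaluated between a peak SNP and a CNV can be illustrated with the squared correlation r² computed from phased haplotypes. The sketch below is a simplification assumed for clarity: PLINK works from genotype data, whereas this example takes phased 0/1 alleles directly, and the function name and toy haplotypes are hypothetical.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic variants.

    haplotypes: list of (snp_allele, cnv_allele) pairs coded 0/1, one pair per
    phased chromosome. Returns None if either variant is monomorphic.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


# Toy example: the CNV allele is carried mostly, but not only, on SNP allele 1.
toy = [(1, 1)] * 25 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 65
print(r_squared(toy))
```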
Results
Demographic data
In the discovery phase, cases were significantly older than controls, less educated, more likely to have a family history of colorectal cancer, more likely to regularly use tobacco and alcohol, and more likely to be current users of nonsteroidal anti-inflammatory drugs (NSAID; Table 1). In the replication phase, cases were older, less likely to be female, less educated, more likely to use tobacco and alcohol, and less likely to exercise, and they had significantly higher body mass index (BMI). Differences in these associations between phases are mostly attributable to sample size and the presence of more veterans in the discovery phase, as the direction of effects for family history, exercise, and BMI are consistent between phases.
Genetic effects
Results for the most statistically significant regions, identified by SNP P values after meta-analysis of both the discovery and replication phases, are presented in Table 2. No P value from these analyses was small enough to conclusively reject the null hypothesis at the multiple-testing corrected threshold for significance. Effect sizes were in the expected range for a GWAS, with meta-estimated ORs of 1.1 to 1.3. The genomic inflation factor for all GWAS results with an information score greater than 0.4 and MAF greater than 0.01 was 1.017, indicating that there were no strong confounders for genetic associations.
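For readers unfamiliar with the genomic inflation factor, the sketch below shows one common way to compute it: the median of the observed 1-degree-of-freedom chi-square statistics divided by the expected median under the null (about 0.455). The text does not say how the reported value of 1.017 was calculated, so this definition is an assumption, and the function name and toy P values are hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Lambda = median observed chi-square (1 df) / expected null median.

    P values must lie in (0, 1]; each two-sided P is converted back to a
    chi-square statistic through the squared normal quantile.
    """
    norm = NormalDist()
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = norm.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median


# Evenly spaced P values mimic the null; squaring them mimics inflation.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(genomic_inflation(null_p))
print(genomic_inflation([p * p for p in null_p]))
```

A value near 1, as reported here, indicates that the bulk of the test statistics follow the null distribution; values well above 1 suggest confounding such as population stratification or batch effects.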
Table 2. Results from tests of association at selected SNPs in the discovery and replication phases, and combined evidence from both phases using meta-analysis, for SNPs with meta-analysis P values less than 0.001
Chr | SNP | Position | Gene | Discovery RA/EAᵇ | Discovery OR | Discovery SE | Discovery P | Replication RA/EAᵇ | Replication OR | Replication SE | Replication P | Discovery + replication RA/EAᵇ | Discovery + replication OR | Discovery + replication SE | Discovery + replication P | Directionᶜ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | rs1919314 | 18106886 | HDAC9 | A/T | 1.551 | 0.088 | 1.19 × 10⁻⁵ | T/A | 0.850 | 0.073 | 0.025 | A/T | 1.315 | 0.056 | 1.05 × 10⁻⁶ | ++ |
15 | rs1528827ᵃ | 68974838 | LRRC49 | G/A | 1.352 | 0.059 | 1.21 × 10⁻⁵ | G/A | 1.084 | 0.048 | 0.095 | A/G | 0.845 | 0.037 | 5.71 × 10⁻⁶ | -- |
1 | rs478859 | 100568708 | LOC646970 | C/T | 1.307 | 0.062 | 4.45 × 10⁻⁵ | C/T | 1.119 | 0.047 | 0.018 | T/C | 0.844 | 0.038 | 5.73 × 10⁻⁶ | -- |
14 | rs61980016ᵃ | 97619168 | N/A | G/T | 0.669 | 0.078 | 4.17 × 10⁻⁵ | G/T | 0.915 | 0.067 | 0.185 | T/G | 1.247 | 0.051 | 1.36 × 10⁻⁵ | ++ |
15 | rs2955036ᵃ | 68972847 | LRRC49 | T/C | 1.347 | 0.063 | 1.44 × 10⁻⁵ | T/C | 1.079 | 0.048 | 0.112 | T/C | 1.172 | 0.038 | 3.32 × 10⁻⁵ | ++ |
8 | rs1669625ᵃ | 647581 | ERICH1 | T/C | 0.710 | 0.081 | 2.06 × 10⁻⁵ | T/C | 0.914 | 0.055 | 0.102 | T/C | 0.843 | 0.046 | 1.72 × 10⁻⁴ | -- |
8 | rs10505477 | 128476625 | POU5F1P1 | A/G | 0.764 | 0.066 | 4.42 × 10⁻⁵ | A/G | 0.936 | 0.047 | 0.163 | A/G | 0.873 | 0.039 | 4.36 × 10⁻⁴ | -- |
ᵃIndicates that the SNP was imputed in the discovery stage.
ᵇRA, reference allele; EA, effect allele.
ᶜDirection: whether the effects from the discovery study are consistent with regard to the RA/EA in the meta-analysis.
Referent alleles were assigned at random for analysis of SNP data, because there were no a priori reasons for specifying a particular allele at a SNP as referent in GWAS. As a result, effect sizes may be presented as protective, but we know only the magnitude of the association, and not the true direction with regard to population prevalence, without making risk estimates from prospective data. We observed nominally significant associations in the discovery GWAS and, upon replication genotyping and analysis in an independent sample of cases and controls, the direction of effect for the featured results in Table 2 was consistent. A complete list of SNPs genotyped in the replication sample and summaries for tests of association are presented in Supplementary Table S1. Results for the secondary analysis of adenoma-only cases and controls are presented in Supplementary Table S2.
Discussion
In this investigation, we have conducted a GWAS of candidate SNPs, nominated on the basis of statistical significance, minor allele frequency, and imputation quality. Although none of our results for particular SNPs are statistically significant at the canonical GWAS threshold for significance, the most significant results involve loci that are related to, or are already known as, risk factors in cancer pathways. Here, we present the first GWAS of colorectal adenoma in the literature, and propose that many of the risk factors detected thus far in colorectal cancer genetics research are also risk factors for benign aberrant tissue growth, which is then at increased risk for colorectal cancer. In addition, in secondary analyses, we observed very similar effects and evidence for association when excluding the hyperplastic polyp-only cases, suggesting that the genetic etiologies may be similar for these classes of aberrant colorectal growths.
The SNP rs1919314 near the gene HDAC9 was associated with adenoma risk. HDAC inhibitors have shown promise in the treatment of carcinomas (22, 23). This gene has been previously noted for gain of copy number in uterine leiomyosarcoma using array-based comparative genomic hybridization (24). Leiomyosarcomas are believed to arise from uterine leiomyomas, or fibroids, just as colorectal cancer is believed to arise from adenoma. Exploring CNVs from the 1000 Genomes data within 1 Mb of this SNP indicated that there are no known CNVs in significant LD with the target SNP, although rare copy number gains are difficult to detect through sequencing. Elevated HDAC enzyme activity is associated with cancer risk, including leukemias and solid tumors (25). High HDAC expression may indicate poor prognosis in chronic lymphocytic leukemia (CLL) patients (26). CLL is also often diagnosed in asymptomatic patients and follows a prolonged, benign latency course.
SNPs rs1528827 and rs2955036 in the gene leucine-rich repeat containing 49 (LRRC49) were associated with adenoma risk. Silencing of LRRC49 by promoter hypermethylation has been observed to be common in breast cancer (27). Family-based studies have also found genome-wide significant evidence for linkage of this region with glioma risk (28). This gene lies near the SNP rs1549318, close to the gene LARP6, which has been significantly associated with fasting proinsulin levels and sits in a region of linkage disequilibrium in European HapMap participants that includes LRRC49 and LARP6 (data not shown; ref. 29). The proinsulin-raising allele in LARP6 was associated with lower expression of the gene in adipose tissue. Thus, this association with adenoma risk may indicate a pathogenic mechanism related to energy availability. In vivo and in vitro models of cancer suggest that insulin resistance resulting from obesity is a significant risk factor for tumor growth. Obesity leads to increases in IGF-1, insulin, and leptin (involved in appetite suppression) and decreases in adiponectin (regulates sugar uptake and fat breakdown; refs. 30, 31). This leads to decreased action by insulin on glucose transport and metabolism and, eventually, insulin resistance. The ratio of leptin to adiponectin is linked to tumor growth, with the highest growth coming from high leptin to adiponectin ratios (32). Insulin, leptin, and adiponectin are directly involved in AKT–mTOR signaling pathways. AKT–mTOR is activated in obese animals and increases risk for and progression of cancer (30, 31).
The SNP rs478859, in the region near the gene cell division cycle 14A (CDC14A), was associated with risk of adenoma. This gene is essential for cell-cycle progression and may play a role in preparation for DNA replication during the subsequent cell cycle. It shares 64% sequence identity with the yeast gene cdc14, which is also a homolog of phosphatase and tensin homolog (PTEN; ref. 33). PTEN negatively regulates the MAPK pathway and has been shown to have tumor suppressor activity (34). In addition, CDC14A is an essential component in a positive feedback loop in which the anaphase-promoting complex (APC) initiates progression into anaphase, by CDC14A dephosphorylation of securin, which allows securin ubiquitination and sister chromatid separation (35). APC function has been implicated in studies of RAS oncogene virulence and it was further observed that RAS mutant cancer cells were sensitive to treatment with APC knockdown snRNAs (36). This relationship suggests that CDC14A could be an upstream regulator of the APC mechanism, which is an important factor for the pathogenesis of Ras-positive cancers.
The SNP rs1669625 in the gene glutamate-rich 1 (ERICH1) was associated with adenoma risk. ERICH1 was observed to have copy number and expression changes in a comparative genomic hybridization study of pancreatic cancer tumors (37). Again, CNV data from the 1000 Genomes Project do not indicate that there are any common CNVs in LD with this SNP.
The SNP rs10505477 near the gene putative POU domain, class 5, transcription factor 1B (POU5F1B) was also associated with adenoma risk. In GWAS, this region at 8q24 has been associated with prostate cancer (38–41) and breast cancer (42–44), and this and other SNPs in the region have been associated with colorectal cancer (7–11). Previous reports have also shown nominal associations of rs10505477 and the nearby SNP rs6983267 with adenoma risk (9) and, furthermore, that rs6983267 disrupts a binding site for transcription factor 7-like 2 (TCF7L2) in an enhancer of the oncogene MYC (45, 46). Subsequent experiments in mice showed that deletion of the MYC enhancer element conferred protection from intestinal tumors in APCmin mouse lines (47), suggesting that this sequence is critical for tumorigenesis. Linkage disequilibrium between rs10505477 and rs6983267 is strong (r² = 0.93), and rs10505477 was also significantly associated with colorectal cancer (9). TCF7L2 is also a known risk factor for type 2 diabetes (29, 48–51).
Our study had at least 80% statistical power to detect effects in the range between an additive OR of 1.7 for a SNP with a MAF of 0.1 and an OR of 1.37 for a SNP with a MAF of 0.5 in the discovery phase, assuming an effective threshold for significance of 3 × 10⁻⁵ (27/1,000,000). For the combined samples in the meta-analysis, we had at least 80% power to detect effects in the range between an OR of 1.47 for a SNP with a MAF of 0.1 and an OR of 1.27 for a SNP with a MAF of 0.5 with a threshold for significance of 5 × 10⁻⁸. It is, therefore, unlikely that adenoma risk variants with effects much stronger than these exist, as we would most likely have observed them unless they were not adequately covered by the GWAS and imputation. Our strongest OR was 1.3 for the most significant SNP, rs1919314, with a MAF of 0.14, which is barely within our detectable range. The other notable effects were in the OR range from 1.1 to 1.2, which are difficult to conclusively detect with the sample size we employed. To our knowledge, TCPS/TIARS is the largest colonoscopy-based case-control study conducted to study adenomas.
This investigation suggests that several genes are related to adenoma risk. Notable in these results is the implication of genes that are closely related to diabetes risk. The relationship between diabetes and cancer is well known and has a strong biological basis and, here, we note genetic evidence that supports this relationship between diabetes and colorectal adenoma. We also investigated the TP53 region for association with adenoma and found little evidence among our genotyped and imputed SNPs (data not shown), although this does not cast much doubt on those results because of the differences in fidelity between genotyping and sequencing and the rare frequency of the causal SNP in that study (14). Furthermore, these findings show that many colorectal cancer susceptibility loci influence the risk of benign colorectal adenoma, which suggests that they function in early stages of transformation and may not be sufficient for cells to become tumors. Larger studies of adenoma genetic risk are needed to improve the quality of inferences with regard to the putative risk factors we describe here, and to discover other potentially rare risk factors using next-generation sequencing.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: T.L. Edwards, M.J. Shrubsole, R.M. Ness, W. Zheng
Development of methodology: M.J. Shrubsole, G. Li, R.M. Ness, W. Zheng
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): M.J. Shrubsole, Q. Cai, G. Li, D.K. Rex, T.M. Ulbright, H.J. Murff, W.E. Smalley, R.M. Ness, W. Zheng
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): T.L. Edwards, M.J. Shrubsole, Q. Dai, Z. Fu, R. Delahanty, W. Zheng
Writing, review, and/or revision of the manuscript: T.L. Edwards, M.J. Shrubsole, Q. Cai, Q. Dai, D.K. Rex, H.J. Murff, W. Zheng
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): M.J. Shrubsole, R.M. Ness, W. Zheng
Study supervision: M.J. Shrubsole, W. Zheng
Grant Support
This study was financially supported by the National Cancer Institute (grant nos. P50CA95103, R01AT004660, and R01CA121060). Dr. T.L. Edwards is supported by a Vanderbilt Clinical and Translational Research Scholar Award 5KL2RR024977. The TCPS was conducted by the Survey and Biospecimen Shared Resource supported in part by the Vanderbilt–Ingram Cancer Center (P30 CA 68485).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.