Abstract
The role of methylation in pancreatic cancer risk remains unclear. We integrated genome and methylome data to identify CpG sites (CpG) whose genetically predicted methylation is associated with pancreatic cancer risk. We also studied gene expression to help interpret the identified associations.
Using genetic data and white blood cell methylation data from 1,595 subjects of European descent, we built genetic models to predict DNA methylation levels. After internal and external validation, we applied prediction models with satisfactory performance to the genetic data of 8,280 pancreatic cancer cases and 6,728 controls of European ancestry to investigate the associations of predicted methylation with pancreatic cancer risk. For associated CpGs, we compared their measured levels in pancreatic tumor versus benign tissue.
We identified 45 CpGs at nine loci showing an association with pancreatic cancer risk, including 15 CpGs showing an association independent of previously identified risk variants. We observed significant correlations between predicted methylation of 16 of the 45 CpGs and predicted expression of eight adjacent genes, of which six genes showed associations with pancreatic cancer risk. Of the 45 CpGs, we were able to compare measured methylation of 16 in pancreatic tumor versus benign pancreatic tissue. Of these, six showed differential methylation.
We identified methylation biomarker candidates associated with pancreatic cancer using genetic instruments and provided additional insights into the role of methylation in regulating gene expression in pancreatic cancer development.
A comprehensive study using genetic instruments identifies 45 CpG sites at nine genomic loci for pancreatic cancer risk.
This article is featured in Highlights of This Issue, p. 1979
Introduction
As the most fatal malignancy of all major cancers, pancreatic cancer is the third leading cause of cancer death in the United States, with an overall 5-year survival rate of only 9% (1). Furthermore, unlike other common cancers, mortality from pancreatic cancer is expected to continue to increase, and the disease may become the second leading cause of cancer death before 2030 (2). One of the major reasons for the lethality of this disease is that most patients with pancreatic cancer are diagnosed late because symptoms in earlier stages are nonspecific. Unfortunately, to date, no effective screening tests are available for pancreatic cancer. Serum CA 19–9 is the only validated biomarker that is clinically used for pancreatic cancer diagnosis in symptomatic patients or for prognostic surveillance in predicting tumor stage or overall survival. However, this biomarker alone cannot serve as an effective screening tool given its unsatisfactory sensitivity (75.5%) and specificity (77.6%), as well as its low positive predictive value (0.5%–0.9%; ref. 3). There is an urgent need to identify additional biomarkers for improved risk assessment of pancreatic cancer.
DNA methylation, an important epigenetic modification that regulates gene expression, has been shown to be potentially related to pancreatic cancer. A number of studies evaluating DNA methylation levels in blood or pancreas tissue have identified multiple candidate DNA methylation markers for pancreatic cancer, including methylation at VHL, MYF3, TMS, GPC3, SRBC, HYAL2, ADAMTS1, BNC1, SERPINB5, and B3GALT5 (4–8). However, many of these earlier studies had small sample sizes and investigated only a few CpG sites (CpG), resulting in insufficient statistical power and limited scope for identifying discriminant DNA methylation markers. More importantly, it is difficult to establish causality from previous studies that used a conventional study design.
It has been increasingly recognized that one potential strategy for reducing several of these limitations is to evaluate the associations of interest using genetic instruments. The genetically determined proportion of DNA methylation levels should be less susceptible to the biases that affect conventional study designs, given the random assortment of alleles from parents to offspring during the production of gametes. Studies have suggested that a large portion of CpGs are highly heritable, and multiple associations have been identified between genetic variants and DNA methylation levels of CpGs (9–12). In a large study with sufficient power, many of the DNA methylation-associated genetic variants are likely to serve as strong instrumental variables for assessing the association between DNA methylation and pancreatic cancer risk. In this study, we employed this strategy to identify DNA methylation biomarker candidates associated with pancreatic cancer risk.
Besides identifying promising biomarkers, the findings of such a study may also help to better understand the etiology of pancreatic cancer. So far, genome-wide association studies (GWAS) have identified 20 independent common susceptibility loci for pancreatic cancer in individuals of European ancestry; however, together these variants explain only a small proportion of the total risk (13–18). Recent work estimated the heritability of pancreatic cancer to be 21.2%, and a large proportion of this heritability remains unexplained (19). Recently, two large transcriptome-wide association studies (TWAS) of pancreatic cancer were conducted, identifying 31 candidate susceptibility genes whose genetically predicted expression was associated with pancreatic cancer risk (20). The present study is a further effort in this direction, focusing on DNA methylation, and its findings may contribute additional understanding of pancreatic cancer genetics. Risk-associated CpGs may influence pancreatic cancer risk either through regulating expression of pancreatic cancer susceptibility genes or through other mechanisms. In this work, we also studied gene expression to characterize whether some of the identified CpGs may influence pancreatic cancer risk through regulating expression of their target genes.
As far as we know, this is the first large study to evaluate the association between genetically predicted DNA methylation and pancreatic cancer risk, using data from 8,280 cases and 6,728 controls of European descent from the Pancreatic Cancer Cohort Consortium (PanScan) and the Pancreatic Cancer Case-Control Consortium (PanC4). For the identified DNA methylation biomarker candidates, we further compared their directly measured levels in pancreatic tumor tissue specimens (n = 18) versus benign pancreatic tissue specimens (n = 18).
Materials and Methods
The overall study design is shown in Fig. 1. First, we developed genetic prediction models for DNA methylation levels by leveraging data from the Framingham Heart Study (FHS). After external validation, we selected DNA methylation models with satisfactory prediction performance and used them to assess associations of genetically predicted methylation levels with pancreatic cancer risk in data from the PanScan/PanC4 consortia, which involve 8,280 cases and 6,728 controls. For CpGs showing an association with pancreatic cancer risk, we assessed correlations between their predicted methylation and predicted expression of adjacent genes (PanScan/PanC4) to identify potential target genes of these CpGs. For the identified candidate target genes, we further evaluated associations of their genetically predicted expression with pancreatic cancer risk. For the associated CpGs, we also compared their directly measured levels in pancreatic tumor tissue versus benign pancreatic tissue. Additional description of relevant studies is included in the Supplementary Materials and Methods.
DNA methylation prediction models
Genetic data and white blood cell DNA methylation data from 1,595 unrelated subjects in the FHS Offspring Cohort were used to build genetic prediction models for methylation. Detailed information on the datasets and data quality control (QC) has been described elsewhere (21–23). The genetic data were imputed to the Haplotype Reference Consortium reference panel. We retained SNPs that had high imputation quality (R2 ≥ 0.8) and minor allele frequency (MAF) ≥5%, were included in HapMap Phase 2, and were not strand ambiguous. The R package “minfi” was used for QC and normalization of the DNA methylation data (24). For the methylation level at each CpG site, a prediction model was built with the elastic net method (α = 0.50) using cis-SNPs (within a 2 Mb flanking window), with adjustment for age, sex, six cell type composition variables, and the top 10 genetic principal components (PC). Ten-fold cross-validation was used to choose the penalty parameter lambda and to validate the models internally (25). The performance of the established prediction models was also examined externally using data from the Women's Health Initiative (WHI; N = 883), which were downloaded from dbGaP (accession nos. phs001335, phs000675, and phs000315). Imputation and QC followed the same methods as described for the FHS data, and DNA methylation data were processed following a similar procedure. We calculated the predicted DNA methylation for each CpG site using the models established with FHS data, and then compared the predicted methylation with the measured levels using Spearman correlation. DNA methylation prediction models with both internal and external performance R2 ≥ 0.01 (correlation between predicted and measured DNA methylation level ≥ 0.1) were used for downstream association analyses.
This is one of the standard criteria used in TWAS for gene expression (26–28), heritability of which is in similar range to that of DNA methylation in blood (29, 30). Importantly, in our work we aimed to capture the genetically regulated component of DNA methylation levels, and thus it is expected that the model performance R2 will not necessarily always be high for different CpGs. Indeed, the upper limit for such R2 should be the heritability of each CpG. We further excluded CpGs with SNPs within their probes in the Illumina 450K Beadchip because of potential bias for the measurement of DNA methylation levels of such CpGs (31).
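To illustrate the model-fitting step, the sketch below implements elastic net regression by coordinate descent in plain Python, using the glmnet-style objective with mixing parameter α = 0.5 as stated in the text. The function names, the toy genotype matrix, and the fixed λ values are illustrative assumptions. The study chose λ by ten-fold cross-validation and adjusted for covariates, neither of which is shown, and the source does not state which software fitted the models.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the lasso proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_cd(X, y, lam, alpha=0.5, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*||b||^2).

    X is a list of rows (subjects) of SNP dosages, y the methylation level.
    Inputs are assumed to be centred; no intercept is fitted.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP carries no information
            # Covariance of SNP j with the partial residual (SNP j's own fit added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy data (hypothetical): methylation depends on the first SNP only.
X = [[1, 0], [0, 1], [1, 1], [-1, 0], [0, -1], [-1, -1]]
y = [2.0 * row[0] for row in X]

weights_unpenalised = elastic_net_cd(X, y, lam=0.0)
weights_shrunk = elastic_net_cd(X, y, lam=0.2)
weights_null = elastic_net_cd(X, y, lam=100.0)
```

With no penalty the fit recovers the generating weight, a moderate penalty shrinks it, and a very large penalty sets every weight to zero. This is the behaviour that cross-validation trades off when selecting λ.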
Evaluation of the association between genetically predicted DNA methylation levels and pancreatic cancer risk
For evaluating associations of predicted DNA methylation levels with pancreatic cancer risk, we used data from GWAS conducted in PanScan and PanC4. Detailed information on these consortia has been described elsewhere (13–18). For the current analyses, the genetic and covariate data were accessed from dbGaP (dbGaP Study Accession: phs000206.v5.p3 and phs000648.v1.p1). We performed subject- and SNP-level QC based on guidelines recommended by the consortia (17). Briefly, in the PanC4 dataset, we excluded subjects who were related to each other, had a missing call rate ≥2%, or had missing information on the covariates age and sex; we excluded SNPs with a missing call rate ≥2%, positional duplicates, more than two discordant calls in study duplicates, more than one Mendelian error in HapMap control trios, Hardy–Weinberg equilibrium (HWE) P < 1 × 10−4, a sex difference in allele frequency > 0.2 for autosomes/XY in subjects of European ancestry, a sex difference in heterozygosity > 0.3 for autosomes/XY in subjects of European ancestry, or MAF < 0.005. In the PanScan datasets, we excluded subjects who had sex discordance, were related to each other, or had a call rate <94%; we further excluded SNPs with a call rate <94% or HWE P < 1 × 10−7. In our analyses, we retained only subjects of European genetic ancestry, as evaluated using principal component analysis. The genotype data from all sources were imputed together to the Haplotype Reference Consortium reference panel (r1.1 2016; ref. 32) using Minimac3 for imputation and SHAPEIT for prephasing (33, 34), through the Michigan Imputation Server (https://imputationserver.sph.umich.edu). Only imputed data with an imputation quality of at least 0.3 were retained in the association analyses. The final dataset included 8,280 cases and 6,728 controls.
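Three of the SNP-level filters listed for PanC4 (missing call rate, HWE P value, and MAF) can be sketched as a small function. This is a hedged illustration only: the 1-df chi-square test for Hardy–Weinberg equilibrium is an assumption, because the source does not say whether an exact test was used, and `hwe_chisq_p` and `snp_passes_qc` are hypothetical names. The remaining criteria (duplicates, Mendelian errors, sex differences) are not covered.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square HWE test from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0.0:
        return 1.0  # monomorphic SNP: the test is not informative
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  max_missing=0.02, min_maf=0.005, min_hwe_p=1e-4):
    """Apply the PanC4 thresholds quoted in the text to one SNP."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    missing_rate = n_missing / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (missing_rate < max_missing
            and maf >= min_maf
            and hwe_chisq_p(n_aa, n_ab, n_bb) >= min_hwe_p)


# Hypothetical genotype counts for three SNPs
in_equilibrium = snp_passes_qc(360, 480, 160, n_missing=0)
no_heterozygotes = snp_passes_qc(500, 0, 500, n_missing=0)
poorly_called = snp_passes_qc(360, 480, 160, n_missing=50)
```

The PanScan thresholds quoted in the text (call rate <94%, HWE P < 1 × 10−7) could be applied with the same function by passing `max_missing=0.06` and `min_hwe_p=1e-7`.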
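The S-PrediXcan method (35) was used to evaluate the associations between genetically predicted DNA methylation levels and pancreatic cancer risk, using summary statistics of SNP-pancreatic cancer associations generated with adjustment for age, sex, and top PCs. The Z-score for the association between the predicted DNA methylation level at each CpG and pancreatic cancer risk was estimated on the basis of the following formula:
$$Z_m \approx \sum_{s \in \mathrm{Model}_m} w_{sm}\,\frac{\hat{\sigma}_s}{\hat{\sigma}_m}\,\frac{\hat{\beta}_s}{\mathrm{se}(\hat{\beta}_s)}$$
Here, $w_{sm}$ is the weight of SNP $s$ in the prediction model for CpG site $m$; $\hat{\sigma}_s$ and $\hat{\sigma}_m$ are the estimated standard deviations of SNP $s$ and of the predicted DNA methylation level at CpG site $m$, respectively; and $\hat{\beta}_s$ and $\mathrm{se}(\hat{\beta}_s)$ are the GWAS effect size estimate of SNP $s$ on pancreatic cancer risk and its standard error.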
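A minimal sketch of how an S-PrediXcan-style Z-score can be computed from a prediction model and GWAS summary statistics is given below. It follows the published form of the statistic, in which each SNP's GWAS z-score is weighted by its model weight and by the ratio of the SNP standard deviation to the standard deviation of the predicted trait. The function name, argument layout, and example numbers are assumptions for illustration and are not taken from the S-PrediXcan software.

```python
import math


def spredixcan_z(weights, snp_cov, gwas_beta, gwas_se):
    """Z ~ sum_s w_s * (sd_s / sd_pred) * (beta_s / se_s).

    weights   : prediction-model weight of each SNP
    snp_cov   : SNP covariance matrix from a reference panel (list of lists)
    gwas_beta : GWAS effect size of each SNP on disease risk
    gwas_se   : standard error of each GWAS effect size
    """
    k = len(weights)
    # Variance of the predicted trait: w' * Cov * w
    var_pred = sum(weights[a] * snp_cov[a][b] * weights[b]
                   for a in range(k) for b in range(k))
    if var_pred <= 0.0:
        raise ValueError("predicted trait has zero variance")
    sd_pred = math.sqrt(var_pred)
    return sum(weights[s] * math.sqrt(snp_cov[s][s]) / sd_pred
               * gwas_beta[s] / gwas_se[s]
               for s in range(k))


def two_sided_p(z):
    """Two-sided normal P value for a Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# One-SNP model: the statistic reduces to the GWAS z-score (0.1 / 0.02).
z_single = spredixcan_z([0.5], [[0.4]], [0.1], [0.02])

# Two uncorrelated SNPs with equal weights and GWAS z-scores of 3 and 4.
z_pair = spredixcan_z([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]],
                      [0.3, 0.4], [0.1, 0.1])
```

For a single-SNP model the statistic equals the SNP's own GWAS z-score, with its sign flipped when the model weight is negative. With several SNPs, the covariance matrix accounts for linkage disequilibrium among them.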
Potential target genes of associated CpGs
The identified CpGs associated with pancreatic cancer risk were annotated with ANNOVAR (29). To determine potential target genes of these CpGs, we assessed whether genetically predicted DNA methylation levels of these CpGs were significantly correlated with genetically predicted expression of their adjacent genes in 8,280 cases and 6,728 controls of European ancestry included in PanScan I–III and PanC4. We estimated genetically predicted gene expression using prediction models built with data from the Genotype-Tissue Expression (GTEx) project focusing on blood tissue (N = 338). Only gene expression prediction models with R2 ≥ 0.01 were used for the analyses. For genes showing a correlation (P < 0.05), we further assessed whether their genetically predicted expression was significantly associated with pancreatic cancer risk. Finally, we assessed the consistency of the direction of identified associations in the DNA methylation-gene expression-pancreatic cancer risk pathway.
Directly measured levels of associated CpGs in pancreatic tumor tissue specimens versus benign pancreatic tissue specimens
Reduced representation bisulfite sequencing (RRBS) was performed on DNA extracted from 18 pancreatic tumor tissue specimens and 18 benign pancreatic tissue specimens, as described previously (37). Sequencing was performed using the Illumina HiSeq 2000 in the Mayo Clinic Medical Genome Facility. SAAP-RRBS was used for sequence alignment and methylation extraction (38). We compared the DNA methylation levels of the identified CpGs in pancreatic tumor tissue specimens versus benign pancreatic tissue specimens. For this exploratory analysis, P < 0.05 was used to determine significant differences.
Data Availability
The datasets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap through dbGaP accession phs000206.v5.p3 and phs000648.v1.p1 for PanScan/PanC4 data, phs000342 and phs000724 for FHS, phs000315, phs000675, and phs001335 for WHI, and phs000424.v6.p1 for GTEx.
Results
DNA methylation prediction models
Using data from the FHS, we were able to establish DNA methylation prediction models for a total of 223,959 CpGs, of which 70,269 showed a prediction performance (R2) ≥ 0.01 in both internal and external validation. Among them, 62,994 CpGs have no SNPs within their probes. The prediction models for these 62,994 CpGs showed similar performance in external and internal validation (Supplementary Fig. S1). The correlation coefficient between R2 in FHS and WHI was 0.95.
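The model-selection rule (R2 ≥ 0.01 in both internal and external validation) can be sketched as follows. The external check uses Spearman correlation between predicted and measured methylation, as described in the Methods. Requiring the correlation to be positive is our reading of the stated threshold, and `spearman` and `keep_model` are hypothetical helper names.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spearman(predicted, measured):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))


def keep_model(r_internal, r_external, min_r2=0.01):
    """Keep a CpG model only if both correlations are positive with R2 >= min_r2."""
    return (r_internal > 0 and r_external > 0
            and r_internal ** 2 >= min_r2 and r_external ** 2 >= min_r2)


# Hypothetical predicted vs. measured methylation in an external cohort
r_ext = spearman([0.10, 0.25, 0.40, 0.55], [0.2, 0.3, 0.5, 0.9])
selected = keep_model(r_internal=0.3, r_external=r_ext)
```

Under this rule a model with a correlation of 0.05 in either dataset (R2 = 0.0025) would be dropped, as would a model whose predictions correlate negatively with the measured values.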
Associations between genetically predicted DNA methylation and pancreatic cancer risk
Of the 62,994 CpGs examined, 45 at nine genomic loci showed significant associations with pancreatic cancer risk for their genetically predicted methylation levels after FDR adjustment (Supplementary Fig. S2). Fifteen of the 45 CpGs were located >500 kb away from any risk variant reported in previous GWAS of pancreatic cancer. Positive associations between predicted DNA methylation level and pancreatic cancer risk were observed for cg02871659, cg18279742, cg01554064, cg04520704, cg19586165, cg16557858, cg02944084, and cg20930114; in contrast, inverse associations were identified for cg24483576, cg24520381, cg22833065, cg17288560, cg19439043, cg03013999, and cg15445000. After conditioning on previously identified pancreatic cancer risk variants, associations for all of these 15 CpGs at five novel loci remained largely unchanged (Table 1), suggesting that the identified associations represent novel associations independent of previously identified risk SNPs. On the other hand, for the other 30 identified CpGs located at four known pancreatic cancer risk loci, their associations with pancreatic cancer risk were all significantly attenuated after conditioning on adjacent risk SNPs (Table 2), suggesting that the identified associations may be influenced by the risk SNPs. On the basis of subgroup analyses, the associations of the identified 45 CpGs tended to be robust across different subsets (PanScan I, II, and III; PanScan I and II; PanC4 and PanScan I, II; and PanC4; Supplementary Table S1).
CpG site | Chr | Position (build37) | Number of SNPs used for prediction | Classification | R2 (a) | OR (95% CI) (b) | P value (c) | P value after FDR | Risk SNP adjusted for | P value after adjusting for risk SNP |
---|---|---|---|---|---|---|---|---|---|---|
cg20930114 | 2 | 110372285 | 15 | Exonic | 0.02 | 1.94 (1.44–2.61) | 1.28 × 10−5 | 0.019 | rs1486134 | 1.28 × 10−5 |
cg01554064 | 9 | 106855171 | 27 | Upstream | 0.20 | 1.22 (1.12–1.32) | 1.75 × 10−6 | 0.003 | rs505922 | 1.77 × 10−6 |
cg02871659 | 16 | 2014063 | 7 | Intronic | 0.32 | 1.18 (1.09–1.28) | 3.34 × 10−5 | 0.045 | rs7190458 | 3.41 × 10−5 |
cg18279742 | 16 | 2015703 | 46 | Upstream/downstream | 0.21 | 1.20 (1.10–1.30) | 2.89 × 10−5 | 0.040 | rs7190458 | 2.94 × 10−5 |
cg15445000 | 17 | 37608096 | 50 | Upstream | 0.28 | 0.85 (0.80–0.91) | 2.42 × 10−6 | 0.005 | rs4795218 | 1.16 × 10−6 |
cg03013999 | 17 | 37608204 | 21 | Upstream | 0.18 | 0.81 (0.74–0.89) | 4.02 × 10−6 | 0.007 | rs4795218 | 1.63 × 10−6 |
cg19439043 | 17 | 37719913 | 27 | Intergenic | 0.04 | 0.64 (0.53–0.76) | 6.76 × 10−7 | 0.002 | rs4795218 | 2.51 × 10−7 |
cg17288560 | 17 | 37720009 | 18 | Intergenic | 0.05 | 0.62 (0.52–0.75) | 3.41 × 10−7 | 0.001 | rs4795218 | 1.35 × 10−7 |
cg24520381 | 17 | 37784694 | 20 | Intronic | 0.02 | 0.54 (0.43–0.69) | 3.71 × 10−7 | 0.001 | rs4795218 | 1.10 × 10−7 |
cg24483576 | 17 | 37792770 | 13 | UTR3 | 0.03 | 0.51 (0.38–0.68) | 7.31 × 10−6 | 0.012 | rs4795218 | 4.23 × 10−6 |
cg19586165 | 17 | 37814072 | 10 | Exonic | 0.08 | 1.38 (1.19–1.59) | 1.26 × 10−5 | 0.019 | rs4795218 | 2.86 × 10−6 |
cg02944084 | 17 | 37827057 | 22 | Downstream | 0.03 | 1.81 (1.44–2.29) | 5.82 × 10−7 | 0.001 | rs4795218 | 1.47 × 10−7 |
cg16557858 | 17 | 37879740 | 23 | Intronic | 0.06 | 1.47 (1.25–1.74) | 4.98 × 10−6 | 0.009 | rs4795218 | 1.23 × 10−6 |
cg22833065 | 17 | 38095691 | 14 | Intergenic | 0.03 | 0.59 (0.46–0.76) | 3.14 × 10−5 | 0.043 | rs4795218 | 1.86 × 10−5 |
cg04520704 | 22 | 18325160 | 18 | Intronic | 0.08 | 1.36 (1.18–1.57) | 2.63 × 10−5 | 0.038 | rs16986825 | 2.65 × 10−5 |
(a) R2: model prediction performance (R2) derived using FHS data.
(b) OR and CI per one standard deviation increase in genetically predicted DNA methylation.
(c) P value: derived from association analyses of 8,280 cases and 6,728 controls; FDR-adjusted P ≤ 0.05 considered statistically significant.
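As a worked check on how the columns of the table relate to one another, the sketch below recovers an approximate Z-score and two-sided P value from a published OR and its 95% CI, assuming a normal approximation on the log-odds scale. The function name is hypothetical, and because the tabulated OR and CI are rounded, the recovered P value can only approximate the tabulated one.

```python
import math


def z_and_p_from_or(or_est, ci_low, ci_high, z_crit=1.959964):
    """Approximate Z and two-sided P from an OR and its 95% CI.

    The standard error of log(OR) is taken as the CI width on the
    log scale divided by 2 * 1.96.
    """
    log_or = math.log(or_est)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# cg01554064 row: OR 1.22 (1.12-1.32), tabulated P = 1.75e-6
z, p = z_and_p_from_or(1.22, 1.12, 1.32)
```

For this row the recovered P value is of the same order of magnitude as the tabulated value. The remaining difference is expected from rounding of the OR and CI to two decimal places.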
CpG site | Chr | Position (build37) | Number of SNPs used for prediction | Classification | R2 (a) | OR (95% CI) (b) | P value (c) | P value after FDR | Risk SNP adjusted for | P value after adjusting for risk SNP |
---|---|---|---|---|---|---|---|---|---|---|
cg10015974 | 1 | 199827580 | 87 | Intergenic | 0.13 | 0.80 (0.73–0.87) | 1.28 × 10−7 | 3.84 × 10−4 | rs16986825; rs3790844 | 0.02 |
cg10098523 | 1 | 200002343 | 40 | Intronic | 0.22 | 0.83 (0.78–0.90) | 1.29 × 10−6 | 2.73 × 10−3 | rs16986825; rs3790844 | 0.52 |
cg07926895 | 1 | 200005833 | 24 | Intronic | 0.03 | 0.61 (0.49–0.77) | 1.89 × 10−5 | 2.77 × 10−2 | rs16986825; rs3790844 | 0.32 |
cg17804356 | 1 | 200009927 | 3 | Intronic | 0.01 | 3.38 (2.12–5.39) | 2.81 × 10−7 | 8.05 × 10−4 | rs16986825; rs3790844 | 0.32 |
cg07507801 | 5 | 1291235 | 5 | Intronic | 0.03 | 2.29 (1.66–3.16) | 5.14 × 10−7 | 1.30 × 10−3 | rs2736098; rs35226131; rs401681 | 0.13 |
cg07380026 | 5 | 1296007 | 14 | Upstream | 0.01 | 4.52 (2.97–6.90) | 2.39 × 10−12 | 1.67 × 10−8 | rs2736098; rs35226131; rs401681 | 4.55 × 10−3 |
cg26603275 | 5 | 1298965 | 10 | Intergenic | 0.04 | 2.24 (1.75–2.87) | 1.11 × 10−10 | 6.36 × 10−7 | rs2736098; rs35226131; rs401681 | 0.05 |
cg11624060 | 5 | 1316038 | 25 | Intergenic | 0.18 | 1.28 (1.17–1.40) | 2.49 × 10−8 | 9.23 × 10−5 | rs2736098; rs35226131; rs401681 | 0.93 |
cg26209169 | 5 | 1316264 | 22 | Intergenic | 0.24 | 1.24 (1.15–1.34) | 2.19 × 10−8 | 8.62 × 10−5 | rs2736098; rs35226131; rs401681 | 0.83 |
cg10441424 | 5 | 1316636 | 16 | Intergenic | 0.01 | 2.08 (1.52–2.86) | 5.82 × 10−6 | 1.02 × 10−2 | rs2736098; rs35226131; rs401681 | 0.65 |
cg07493874 | 5 | 1342172 | 11 | Intronic | 0.15 | 0.69 (0.61–0.77) | 8.91 × 10−11 | 5.61 × 10−7 | rs2736098; rs35226131; rs401681 | 0.93 |
cg19915256 | 5 | 1345677 | 11 | Upstream | 0.02 | 2.85 (2.00–4.04) | 5.16 × 10−9 | 2.32 × 10−5 | rs2736098; rs35226131; rs401681 | 0.52 |
cg27028750 | 5 | 1349422 | 20 | Intergenic | 0.25 | 0.79 (0.74–0.85) | 6.59 × 10−10 | 3.46 × 10−6 | rs2736098; rs35226131; rs401681 | 0.43 |
cg03474926 | 9 | 136023407 | 24 | Intronic | 0.01 | 2.72 (1.90–3.89) | 5.18 × 10−8 | 1.72 × 10−4 | rs505922 | 0.36 |
cg01169778 | 9 | 136038690 | 14 | Intronic | 0.04 | 1.98 (1.46–2.68) | 1.04 × 10−5 | 1.62 × 10−2 | rs505922 | 0.13 |
cg14653977 | 9 | 136038692 | 20 | Intronic | 0.03 | 4.27 (3.09–5.89) | 1.12 × 10−18 | 3.53 × 10−14 | rs505922 | 0.08 |
cg13531387 | 9 | 136078657 | 13 | Intergenic | 0.11 | 0.34 (0.25–0.45) | 3.16 × 10−13 | 2.49 × 10−9 | rs505922 | 0.75 |
cg00878953 | 9 | 136129875 | 36 | Downstream | 0.15 | 0.65 (0.54–0.79) | 6.83 × 10−6 | 1.16 × 10−2 | rs505922 | 0.42 |
cg11879188 | 9 | 136149908 | 36 | Intronic | 0.5 | 2.28 (1.84–2.83) | 4.84 × 10−14 | 4.36 × 10−10 | rs505922 | 0.89 |
cg21160290 | 9 | 136149941 | 43 | Intronic | 0.71 | 1.99 (1.69–2.34) | 8.87 × 10−17 | 1.12 × 10−12 | rs505922 | 0.76 |
cg22535403 | 9 | 136150032 | 44 | Intronic | 0.69 | 2.29 (1.89–2.77) | 4.63 × 10−17 | 7.29 × 10−13 | rs505922 | 0.59 |
cg24267699 | 9 | 136151359 | 13 | Upstream | 0.59 | 2.50 (2.07–3.02) | 1.33 × 10−21 | 8.38 × 10−17 | rs505922 | 0.01 |
cg06818865 | 9 | 136151958 | 10 | Intergenic | 0.3 | 1.84 (1.52–2.24) | 8.47 × 10−10 | 4.10 × 10−6 | rs505922 | 0.16 |
cg13660174 | 9 | 136238392 | 19 | Intronic | 0.07 | 1.64 (1.34–2.00) | 1.30 × 10−6 | 2.73 × 10−3 | rs505922 | 0.29 |
cg13568213 | 9 | 136387235 | 16 | Intronic | 0.03 | 7.05 (3.43–14.48) | 1.08 × 10−7 | 3.40 × 10−4 | rs505922 | 0.17 |
cg21101465 | 13 | 28493404 | 20 | Upstream | 0.04 | 0.61 (0.49–0.76) | 9.94 × 10−6 | 1.61 × 10−2 | rs9581943 | 0.06 |
cg11853320 | 13 | 28493913 | 52 | Upstream | 0.08 | 0.69 (0.61–0.79) | 3.88 × 10−8 | 1.36 × 10−4 | rs9581943 | 0.46 |
cg26793256 | 13 | 28494004 | 55 | Upstream | 0.06 | 0.72 (0.62–0.82) | 1.56 × 10−6 | 3.17 × 10−3 | rs9581943 | 0.16 |
cg04633225 | 13 | 28494161 | 22 | Upstream | 0.02 | 0.45 (0.34–0.59) | 1.09 × 10−8 | 4.58 × 10−5 | rs9581943 | 0.06 |
cg11213248 | 13 | 28534648 | 7 | Intergenic | 0.22 | 0.81 (0.75–0.88) | 1.16 × 10−6 | 2.61 × 10−3 | rs9581943 | 2.00 × 10−4 |
(a) R2: model prediction performance (R2) derived using FHS data.
(b) OR and CI per one standard deviation increase in genetically predicted DNA methylation.
(c) P value: derived from association analyses of 8,280 cases and 6,728 controls; FDR-adjusted P ≤ 0.05 considered statistically significant.
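The text reports P values "after FDR" without naming the procedure. The sketch below assumes the Benjamini–Hochberg step-up adjustment, which is a common choice but is not confirmed by the source. As a consistency check only, the two smallest P values in the table above (cg24267699 and cg14653977) reproduce their listed FDR-adjusted values under this procedure when the number of tests is set to the 62,994 CpGs examined. The function name and the optional `m` argument are illustrative.

```python
def bh_adjust(pvalues, m=None):
    """Benjamini-Hochberg adjusted P values.

    m is the total number of tests. It may exceed len(pvalues) when only
    the smallest P values are supplied; in that case the results are upper
    bounds, because the monotonicity step cannot see the omitted tests.
    """
    n = len(pvalues)
    m = n if m is None else m
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest P value, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Standard use on a complete (hypothetical) set of four tests
small_example = bh_adjust([0.01, 0.04, 0.03, 0.005])

# Two smallest P values from the table, with m = 62,994 CpGs tested
top_two = bh_adjust([1.33e-21, 1.12e-18], m=62994)
```

Passing only the top-ranked P values together with the full test count is a shortcut for illustration. A complete analysis would adjust all 62,994 P values at once.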
Candidate target genes of associated CpGs
For the 45 CpGs associated with pancreatic cancer risk, ANNOVAR annotation suggested 32 adjacent genes. For nine of them (RPS2, STARD3, GBGT1, ABO, SURF6, ERBB2, ORMDL3, SNHG9, and SOWAHC), we were able to build blood tissue gene expression prediction models with R2 ≥ 0.01. We then assessed Spearman rank correlations between genetically predicted DNA methylation and genetically predicted gene expression for 17 CpG site-gene pairs (Supplementary Table S2). For all genes except STARD3, we observed significant (P < 0.001) correlations (Supplementary Table S2).
Associations of predicted expression of candidate target genes with pancreatic cancer risk
Of these eight genes showing significant correlations, six further showed a significant association with pancreatic cancer risk for their genetically predicted expression levels, namely, ABO (P = 6.72 × 10−12), RPS2 (P = 3.48 × 10−5), SURF6 (P = 8.47 × 10−3), ORMDL3 (P = 2.58 × 10−4), SNHG9 (P = 1.15 × 10−2), and SOWAHC (P = 8.30 × 10−4). Overall, a total of 12 CpGs with six genes showed significant associations in each pair of the relationships in the DNA methylation-gene expression-pancreatic cancer risk pathway. Encouragingly, all these associations showed consistent directions. Taking the CpG site cg24267699, located upstream of ABO, as an example, its genetically predicted DNA methylation showed a positive association with pancreatic cancer risk (OR = 2.50; P = 1.33 × 10−21). Meanwhile, we observed an inverse correlation between the genetically predicted DNA methylation level of cg24267699 and predicted expression of ABO (correlation coefficient = −0.62; P < 0.001), as well as an inverse association between predicted expression of ABO and pancreatic cancer risk (OR = 0.89, P = 6.72 × 10−12; Table 3; Supplementary Tables S2 and S3; Supplementary Fig. S3). Consistent three-way associations were also observed for CpGs and five other genes (RPS2, SURF6, ORMDL3, SNHG9, and SOWAHC), which have not been previously reported as pancreatic cancer susceptibility genes in GWAS or TWAS.
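The notion of "consistent directions" across the three relationships can be made explicit with a small check: the sign of the methylation-risk association should equal the product of the signs of the methylation-expression correlation and the expression-risk association. The sketch below applies that rule to one CpG-gene pair per gene, using values copied from Table 3. The function name and data layout are hypothetical, and this is our reading of the consistency criterion rather than the authors' stated procedure.

```python
import math


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def directions_consistent(or_meth_risk, corr_meth_expr, or_expr_risk):
    """True if the direct methylation-risk direction matches the direction
    implied by going through gene expression."""
    direct = sign(math.log(or_meth_risk))
    via_expression = sign(corr_meth_expr) * sign(math.log(or_expr_risk))
    return direct != 0 and direct == via_expression


# (CpG, gene, methylation-risk OR, methylation-expression correlation,
#  expression-risk OR), one explicit row per gene from Table 3
TABLE3_ROWS = [
    ("cg20930114", "SOWAHC", 1.94, -0.516, 0.64),
    ("cg00878953", "ABO", 0.65, 0.420, 0.49),
    ("cg06818865", "SURF6", 1.84, -0.323, 0.91),
    ("cg02871659", "RPS2", 1.18, -0.742, 0.64),
    ("cg18279742", "SNHG9", 1.20, 0.305, 1.10),
    ("cg22833065", "ORMDL3", 0.59, -0.831, 1.15),
]

all_consistent = all(
    directions_consistent(or_m, corr, or_e)
    for _cpg, _gene, or_m, corr, or_e in TABLE3_ROWS
)
```

A hypothetical pair with higher methylation raising risk, a positive methylation-expression correlation, and higher expression lowering risk would fail this check, since the two routes point in opposite directions.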
CpG site | Chr | Position | Associated gene | Classification | OR (DNA methylation and pancreatic cancer risk) | P value (DNA methylation and pancreatic cancer risk) | Correlation coefficient (DNA methylation and gene expression) | Correlation P value (DNA methylation and gene expression) | OR (gene expression and pancreatic cancer risk) | P value (gene expression and pancreatic cancer risk) |
---|---|---|---|---|---|---|---|---|---|---|
cg20930114 | 2 | 110372285 | SOWAHC | Exonic | 1.94 | 1.28 × 10−5 | −0.516 | <0.001 | 0.64 | 8.30 × 10−4 |
cg00878953 | 9 | 136129875 | ABO | Downstream | 0.65 | 6.83 × 10−6 | 0.420 | <0.001 | 0.49 | 6.72 × 10−12 |
cg11879188 | 9 | 136149908 | ABO | Intronic | 2.28 | 4.84 × 10−14 | −0.350 | <0.001 | 0.49 | 6.72 × 10−12 |
cg21160290 | 9 | 136149941 | ABO | Intronic | 1.99 | 8.87 × 10−17 | −0.344 | <0.001 | 0.49 | 6.72 × 10−12 |
cg22535403 | 9 | 136150032 | ABO | Intronic | 2.29 | 4.63 × 10−17 | −0.369 | <0.001 | 0.49 | 6.72 × 10−12 |
cg24267699 | 9 | 136151359 | ABO | Upstream | 2.50 | 1.33 × 10−21 | −0.620 | <0.001 | 0.49 | 6.72 × 10−12 |
cg06818865 (a) | 9 | 136151958 | ABO | Intergenic | 1.84 | 8.47 × 10−10 | −0.423 | <0.001 | 0.49 | 6.72 × 10−12 |
cg06818865 (a) | 9 | 136151958 | SURF6 | Intergenic | 1.84 | 8.47 × 10−10 | −0.323 | <0.001 | 0.91 | 8.47 × 10−3 |
cg02871659 | 16 | 2014063 | RPS2 | Intronic | 1.18 | 3.34 × 10−5 | −0.742 | <0.001 | 0.64 | 3.48 × 10−5 |
cg18279742 | 16 | 2015703 | RPS2 | Upstream | 1.20 | 2.89 × 10−5 | −0.739 | <0.001 | 0.64 | 3.48 × 10−5 |
cg18279742 | 16 | 2015703 | SNHG9 | Downstream | 1.20 | 2.89 × 10−5 | 0.305 | <0.001 | 1.10 | 1.15 × 10−2 |
cg22833065 | 17 | 38095691 | ORMDL3 | Downstream | 0.59 | 3.14 × 10−5 | −0.831 | <0.001 | 1.15 | 2.58 × 10−4 |
aThe same CpG site was annotated to two different genes.
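The notion of "consistent directions" across the DNA methylation-gene expression-pancreatic cancer risk pathway can be made concrete with a small check: the sign of the methylation-risk association should equal the sign of the methylation-expression correlation multiplied by the sign of the expression-risk association. The sketch below applies that rule to one row per gene, using values transcribed from the table above; the function names and the rule's formulation as code are illustrative assumptions, not the authors' procedure.

```python
import math

# One row per gene, transcribed from the table above:
# (CpG site, gene, OR methylation-risk, methylation-expression correlation,
#  OR expression-risk)
ROWS = [
    ("cg20930114", "SOWAHC", 1.94, -0.516, 0.64),
    ("cg00878953", "ABO", 0.65, 0.420, 0.49),
    ("cg06818865", "SURF6", 1.84, -0.323, 0.91),
    ("cg02871659", "RPS2", 1.18, -0.742, 0.64),
    ("cg18279742", "SNHG9", 1.20, 0.305, 1.10),
    ("cg22833065", "ORMDL3", 0.59, -0.831, 1.15),
]


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def is_consistent(or_methylation, correlation, or_expression):
    """Check that the direct effect matches the implied indirect effect.

    An OR above 1 has a positive log and an OR below 1 a negative log,
    so log(OR) gives the direction of each risk association.
    """
    direct = sign(math.log(or_methylation))
    implied = sign(correlation) * sign(math.log(or_expression))
    return direct == implied


results = {
    (cpg, gene): is_consistent(or_m, corr, or_e)
    for cpg, gene, or_m, corr, or_e in ROWS
}

for (cpg, gene), ok in results.items():
    print(f"{cpg} / {gene}: {'consistent' if ok else 'inconsistent'}")
```

For example, for cg22833065 and ORMDL3, higher predicted methylation goes with lower risk (OR = 0.59), lower predicted expression (correlation = −0.831), and expression itself goes with higher risk (OR = 1.15), so the two routes agree.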
Directly measured levels of associated CpGs in pancreatic tumor tissue versus benign pancreatic tissue
Of the 45 CpGs, 16 were directly captured by reduced representation bisulfite sequencing (RRBS) of 18 pancreatic tumor tissue specimens and 18 benign pancreatic tissue specimens. For two of them (cg04520704 and cg04633225), the significance of the difference in levels between tumor and benign tissue could not be determined. Of the remaining 14, six showed significantly different levels in pancreatic tumor tissue versus benign pancreatic tissue (Table 4). Encouragingly, the effect directions for all six were consistent with the findings from analyses using genetic instruments (Table 4).
CpG site | Chr | Position | Direction of association between genetically predicted levels and pancreatic cancer risk | Average levels in benign pancreatic tissue | Standard deviation of levels in benign pancreatic tissue | Average levels in pancreatic tumor tissue | Standard deviation of levels in pancreatic tumor tissue | P value comparing levels in pancreatic tumor versus benign tissue |
---|---|---|---|---|---|---|---|---|
cg17804356 | 1 | 200009927 | + | 0.02 | 0.04 | 0.12 | 0.15 | <0.0004 |
cg20930114 | 2 | 110372285 | + | 0.005 | 0.02 | 0.04 | 0.05 | 0.0004 |
cg07380026 | 5 | 1296007 | + | 0.24 | 0.18 | 0.54 | 0.20 | <0.0004 |
cg01169778 | 9 | 136038690 | + | 0.23 | 0.11 | 0.46 | 0.29 | 0.01 |
cg22535403 | 9 | 136150032 | + | 0.35 | 0.21 | 0.48 | 0.28 | 0.05 |
cg21101465 | 13 | 28493404 | − | 0.36 | 0.19 | 0.27 | 0.22 | 0.02 |
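The agreement between the tissue comparison and the genetic analyses can be checked mechanically: a CpG with a positive genetically predicted association should show higher average measured levels in tumor than in benign tissue, and vice versa. The sketch below uses the average levels transcribed from the table above; the function name and the ASCII "+"/"-" encoding of the direction column are assumptions made for illustration, and the code does not reproduce the statistical test behind the reported P values.

```python
# (CpG site, direction from genetic analysis, benign mean, tumor mean),
# transcribed from the table above. "-" is an ASCII stand-in for the
# minus sign used in the direction column.
ROWS = [
    ("cg17804356", "+", 0.02, 0.12),
    ("cg20930114", "+", 0.005, 0.04),
    ("cg07380026", "+", 0.24, 0.54),
    ("cg01169778", "+", 0.23, 0.46),
    ("cg22535403", "+", 0.35, 0.48),
    ("cg21101465", "-", 0.36, 0.27),
]


def observed_direction(benign_mean, tumor_mean):
    """Return '+' if tumor levels are higher, '-' if lower, '0' if equal."""
    if tumor_mean > benign_mean:
        return "+"
    if tumor_mean < benign_mean:
        return "-"
    return "0"


concordant = {
    cpg: observed_direction(benign, tumor) == genetic_direction
    for cpg, genetic_direction, benign, tumor in ROWS
}

for cpg, genetic_direction, benign, tumor in ROWS:
    diff = tumor - benign
    status = "concordant" if concordant[cpg] else "discordant"
    print(f"{cpg}: tumor - benign = {diff:+.3f} ({status})")
```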
Discussion
To our knowledge, this is the first large-scale study to evaluate the relationship between genetically predicted DNA methylation levels and pancreatic cancer risk. We identified 45 CpGs whose predicted DNA methylation levels showed significant associations with pancreatic cancer risk at FDR < 0.05, including 15 CpGs located at five novel loci that have not been reported in previous GWAS. For the remaining 30 CpGs located at four known pancreatic cancer risk loci, the observed associations were substantially attenuated after adjusting for GWAS-identified risk SNPs, implying that the associations may be at least partly due to the reported risk SNPs. We found consistent directions of association in the DNA methylation-gene expression-pancreatic cancer risk pathway for 12 CpGs with six genes. Our findings were further supported by differential DNA methylation at six CpGs, based on their directly measured levels in pancreatic tumor versus benign tissue. Our study identified novel methylation biomarker candidates for pancreatic cancer and provided new information for understanding the etiology of this highly lethal malignancy.
Of the 45 identified associated CpGs, we were able to assess correlations between genetically predicted DNA methylation and gene expression levels for 17 CpGs with nine adjacent genes. Among the examined correlations, all were statistically significant except the one between cg19586165 and STARD3. One possible explanation for this nonsignificant correlation is that STARD3, the most proximal gene to cg19586165, might not be its actual target gene. Strategies beyond simple statistical correlation are needed to verify the actual target gene. Of the eight linked genes correlated with predicted DNA methylation of the identified CpGs, six (ABO, RPS2, SURF6, ORMDL3, SNHG9, and SOWAHC) demonstrated significant associations with pancreatic cancer risk for their predicted expression. Among them, the ABO blood group gene located at 9q34.2 has already been implicated as a potential target gene of pancreatic cancer risk SNPs in previous GWAS and TWAS (17, 20). Genotype-inferred non-O blood type has consistently been associated with an increased risk of pancreatic cancer compared with other blood types, which may be partly explained by differential expression of blood group antigens or alterations in the systemic inflammatory state (39). SURF6 has previously been suggested as a potential pancreatic cancer biomarker, based on a study comparing its expression level in malignant pancreatic cells with that in normal pancreatic duct cells or human papillomavirus-immortalized pancreatic duct epithelial cells (40). A higher expression level of SNHG9, a noncoding RNA, has been identified as a novel prognostic marker for pancreatic cancer (41). To the best of our knowledge, our study is the first to implicate a potential link between this gene and pancreatic cancer risk. Further functional studies are needed to better understand the potential regulatory effects of the identified CpGs on expression of these genes, and the link between expression of these genes and pancreatic cancer.
In this study, we systematically assessed relationships between genetically predicted DNA methylation in blood, genetically predicted expression of putative target genes in blood, and pancreatic cancer risk. For our analyses using genetic instruments, we used data generated from white blood cells rather than pancreatic tissue for several reasons. First, it is very challenging to acquire a large sample of pancreatic tissue from healthy subjects without pancreatic cancer. Information from pancreatic tumor-adjacent normal tissue would be less desirable because of the potential influence of somatic alterations on DNA methylation. Furthermore, biomarkers identified using white blood cell samples may offer greater translational and practical utility for future risk assessment of pancreatic cancer than biomarkers in pancreas tissue, as it is impractical to obtain pancreas tissue from healthy subjects. We also acknowledge that, compared with pancreas specimens, blood samples may not be ideal for pinpointing the underlying etiology of pancreatic cancer development, given possible tissue-specific DNA methylation patterns. However, it is also worth noting that high concordance of the genetically regulated component of DNA methylation across several tissue types has been reported for a large number of CpGs (10, 42). In this study, we compared the directly measured levels of a proportion of the identified associated CpGs in pancreatic tumor tissue versus benign pancreatic tissue. For this comparison, the overall DNA methylation levels, influenced by both genetic and nongenetic factors, were assessed; this differs from the analyses using genetic instruments, in which only the genetically regulated components of DNA methylation levels were evaluated. Although the sample size was relatively small (18 vs. 18), we were still able to observe significant differences for six of the limited number of associated CpGs that were captured in our RRBS measurement. Unlike The Cancer Genome Atlas (TCGA) study, in which methylation data are available only for pancreatic tumor and tumor-adjacent normal tissue from patients with pancreatic cancer, the control group in our comparison consisted of histologically normal pancreas tissue from subjects without pancreatic cancer, which represents a better design than that of datasets such as TCGA.
Our study has several strengths. First, we used datasets with relatively large sample sizes for both methylation prediction model building (N = 1,595) and the main association analyses for pancreatic cancer risk (8,280 cases and 6,728 controls), which enabled us to conduct a well-powered assessment of the DNA methylation-pancreatic cancer risk associations. Second, our study design of using genetic instruments to predict DNA methylation reduced several biases that are commonly embedded in traditional epidemiological studies, such as residual confounding and reverse causality. In addition, by integrating multi-omics data on DNA methylation and gene expression from various resources, we were able to further verify our findings by examining the consistency of the associations in the DNA methylation-gene expression-pancreatic cancer risk pathway for the identified significant CpGs, which may further contribute to the etiologic understanding of pancreatic cancer. The performance of our models was externally validated in an independent WHI dataset, which used a different genotyping platform (Illumina vs. Affymetrix in the FHS dataset), supporting the utility of our prediction models across platforms. Finally, besides evidence from analyses using genetic instruments, we found additional evidence for some of the identified CpGs using their directly measured levels in pancreatic tissue, further supporting the relevance of the identified CpGs to pancreatic cancer. Although the sample size for this analysis was relatively small, our comparison of tissue samples from pancreatic cancer cases with those from controls without pancreatic cancer overcomes a potential limitation of many other studies (e.g., The Cancer Genome Atlas), which compare tumor samples with tumor-adjacent normal tissue samples from the same cases.
Several potential limitations need to be acknowledged for appropriate interpretation of our findings. First, the associated CpGs identified in this study do not necessarily play a causal role in pancreatic cancer. As with TWAS, although our findings will be useful for prioritizing candidate DNA methylation biomarkers, some of the identified associations could be false positives (43). There are several potential reasons for this, such as correlated DNA methylation across individuals, correlated predicted DNA methylation, and shared variants (43). In our study, multiple identified CpGs are located at the same loci. Future functional investigation will better characterize whether the identified CpGs play a causal role in pancreatic tumorigenesis. Second, during the building of the DNA methylation genetic prediction models, owing to a lack of data, we were not able to adjust for additional variables, including established pancreatic cancer risk factors such as smoking, alcohol drinking, body mass index, and diabetes status. Future work to develop DNA methylation genetic prediction models that adjust for these additional variables is warranted to validate our findings. Third, although we were able to show that a proportion of the pancreatic cancer-associated CpGs we identified demonstrated differential levels in pancreatic tumor versus benign tissue, further work directly comparing DNA methylation levels of these CpGs in prediagnostic blood samples of pancreatic cancer cases and controls is warranted to further validate our findings. Fourth, it is worth noting that the PanScan III data on dbGaP contained data for cases only, not for controls. To improve statistical power, we included the PanScan III cases in our analyses. Previous work suggested that imputing datasets genotyped on different platforms before merging could generate slightly more SNPs than imputing after combining the datasets (44). In this study, we merged genotyped data across cases and controls of PanScan I, II, and III along with PanC4 and then imputed the data together. Although incorporating case-only data from PanScan III could be a potential concern, we carefully compared the association results in different subgroups (Supplementary Table S1), and the estimates were quite robust, suggesting that this is a minor issue and that our design is appropriate. Finally, in this study, we evaluated ANNOVAR-annotated genes as candidate target genes of associated CpG sites for correlation analysis. Given the recognized role of chromatin interaction and long-range regulation of gene expression in the human genome, it is possible that for some CpGs the target genes are not the nearest genes. Further work is warranted to better characterize potential target genes of our identified CpGs using approaches beyond simple statistical correlation analyses.
In summary, in a large-scale study, we identified 45 CpGs showing significant associations with pancreatic cancer risk for their genetically predicted DNA methylation, including 15 at five novel loci showing an association independent of known risk variants. We further observed consistent directions of association in the DNA methylation-gene expression-pancreatic cancer risk pathway. We found differential DNA methylation at six of the identified CpGs for their measured levels in pancreatic tumor versus benign tissue. The pancreatic cancer risk-associated CpGs identified in this study could be investigated in future studies with direct measurement of circulating DNA methylation levels to examine their potential utility in pancreatic cancer risk assessment.
Disclaimer
The funding organization had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Authors' Disclosures
J.B. Kisiel reports grants from Exact Sciences during the conduct of the study; also has a patent 10704107 issued and licensed to Exact Sciences. D.W. Mahoney reports a patent for 10704107 Detecting Gastrointestinal Neoplasms with royalties paid from Exact Sciences; and Mayo Clinic and Exact Sciences own intellectual property under which D.W. Mahoney is listed as an inventor and may receive royalties through a contracted services agreement between Mayo Clinic and Exact Sciences. W.R. Taylor reports other support from Exact Sciences during the conduct of the study; other support from Exact Sciences outside the submitted work; also has a patent 10704107 issued, licensed, and with royalties paid from Exact Sciences. X.-O. Shu reports grants from NCI during the conduct of the study. L. Wu reports grants from NIH during the conduct of the study. No disclosures were reported by the other authors.
Authors' Contributions
J. Zhu: Data curation, formal analysis, investigation, methodology, writing–original draft. Y. Yang: Data curation, formal analysis, methodology, writing–original draft. J.B. Kisiel: Data curation, funding acquisition, validation, writing–review and editing. D.W. Mahoney: Validation, investigation, writing–review and editing. D.S. Michaud: Investigation, writing–review and editing. X. Guo: Data curation, writing–review and editing. W.R. Taylor: Formal analysis, writing–review and editing. X.-O. Shu: Writing–review and editing. X. Shu: Data curation, writing–review and editing. D. Liu: Writing–review and editing. B. Li: Writing–review and editing. R. Tao: Writing–review and editing. Q. Cai: Writing–review and editing. W. Zheng: Writing–review and editing. J. Long: Conceptualization, resources, supervision, investigation, methodology, writing–review and editing. L. Wu: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, validation, investigation, methodology, writing–original draft, project administration.
Acknowledgments
The authors would like to thank all of the individuals for their participation in the parent studies of PanScan/PanC4 consortia and all the researchers, clinicians, technicians, and administrative staff for their contribution to the studies. L. Wu is supported by the University of Hawaii Cancer Center, and NCI R00 CA218892. D. Liu is partially supported by the Harbin Medical University Cancer Hospital. Data on CpG positions in the independent case and control tissues was funded in part by Exact Sciences (Madison, WI). The PanScan study was funded in whole or in part with federal funds from the NCI, US NIH under contract number HHSN261200800001E. Additional support was received from NIH/NCI K07 CA140790, the American Society of Clinical Oncology Conquer Cancer Foundation, the Howard Hughes Medical Institute, the Lustgarten Foundation, the Robert T. and Judith B. Hale Fund for Pancreatic Cancer Research and Promises for Purple. A full list of acknowledgments for each participating study is provided in the Supplementary Note of the manuscript with PubMed ID: 25086665. For the PanC4 GWAS study, the patients and controls were derived from the following PANC4 studies: Johns Hopkins National Familial Pancreas Tumor Registry, Mayo Clinic Biospecimen Resource for Pancreas Research, Ontario Pancreas Cancer Study (OPCS), Yale University, MD Anderson Case Control Study, Queensland Pancreatic Cancer Study, University of California San Francisco Molecular Epidemiology of Pancreatic Cancer Study, International Agency of Cancer Research and Memorial Sloan Kettering Cancer Center. This work is supported by NCI R01CA154823. Genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from the NIH to Johns Hopkins University, contract number HHSN2682011000111.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.