Abstract
Age-related epigenetic dysregulations are associated with several diseases, including cancer. The number of stochastic epigenetic mutations (SEM) has been suggested as a biomarker of life-course accumulation of exposure-related DNA damage; however, the predictive role of SEMs in cancer has seldom been investigated.
A SEM, at a given CpG site, was defined as an extreme outlier of DNA methylation value distribution across individuals. We investigated the association of the total number of SEMs with the risk of eight cancers in 4,497 case–control pairs nested in three prospective cohorts. Furthermore, we investigated whether SEMs were randomly distributed across the genome or enriched in functional genomic regions.
In the three-study meta-analysis, the estimated ORs per one-unit increase in log(SEM) from logistic regression models adjusted for age and cancer risk factors were 1.25; 95% confidence interval (CI), 1.11–1.41 for breast cancer, and 1.23; 95% CI, 1.07–1.42 for lung cancer. In the Melbourne Collaborative Cohort Study, the OR for mature B-cell neoplasm was 1.46; 95% CI, 1.25–1.71. Enrichment analyses indicated that SEMs frequently occur in silenced genomic regions and in transcription factor binding sites regulated by EZH2 and SUZ12 (P < 0.0001 and P = 0.0005, respectively), two components of the polycomb repressive complex 2 (PRC2). Finally, we showed that PRC2-specific SEMs are generally more stable over time than SEMs occurring in the whole genome.
The number of SEMs is associated with a higher risk of different cancers in prediagnostic blood samples.
We identified a candidate biomarker for early cancer detection, and we described a carcinogenesis mechanism involving PRC2 proteins that is worthy of further investigation.
This article is featured in Highlights of This Issue, p. 1841
Introduction
The concept of “life-course accumulation of exposures” and related damage has been proposed to explain the decline of physiologic functioning and the consequent increased disease morbidity and mortality during aging (1). The accumulation of environmental, socioeconomic, and behavioral exposures may cause long-term damage, which may be amplified by a decreased ability to repair damage as the body ages (1). For most diseases, including cancer, age is an important risk factor and the incidence of most malignancies increases exponentially with age (2).
Basic research, combined with the increasing capacity of large-scale technologies including “omics” measurements, has led to the formulation of exposure-driven models of carcinogenesis (3) in which functional changes in gene regulation and genomic mutations reflect the life-course accumulation of exposure-related DNA damage. The accumulation over time of somatic mutations in specific genes may lead to cancer development, but recent studies demonstrated that this molecular mechanism alone is not sufficient (4, 5).
Two mechanisms contribute to age-related DNA methylation changes: the “epigenetic drift” (6) and the “epigenetic clock” (7). Although both are related to aging, the “epigenetic clock” refers to specific CpG sites at which DNA methylation levels steadily increase or decrease with age and thus can be used to predict chronologic age with high accuracy (7). The concept of epigenetic age acceleration (AA) has been introduced as the difference between predicted DNA methylation age and the chronologic age (7, 8). Epigenetic AA may be a good biomarker of biological aging as it has been associated with longevity (9–12); several pathologic conditions (13, 14); noncommunicable disease risk factors such as obesity (15), low physical activity (16), and low socioeconomic status (17); cancer risk; and shorter cancer survival (10). Recent literature distinguishes Horvath's (7) and Hannum and colleagues' (8) “first-generation clocks” from DNAmPhenoAge (18) and DNAmGrimAge (19), called the “next-generation clocks,” with the latter being trained not only on age but also on a complex set of biomarkers that in turn are associated with individual health status and mortality. Early findings indicate that the next-generation clocks may be capturing important aspects of accelerated biological aging. In a recent critique of the epigenetic clocks, Dugué and colleagues cautioned that early studies generally report stronger associations than later studies and are more likely to be affected by publication bias (20). In contrast, the “epigenetic drift” is a mechanism that involves the whole genome, suggesting a global dysregulation of DNA methylation patterns with age (21).
Two critical aspects of the epigenetic drift are genomic instability and chromatin deterioration during aging, which lead to an accumulation of epigenetic mutations (also known as “epimutations,” i.e., changes in gene activity not involving DNA mutations but rather gain or loss of DNA methyl groups, which are conserved in cells during mitosis; ref. 22). A higher number of stochastic epigenetic mutations (SEM) across the genome has been associated with risk factors such as cigarette smoking, alcohol intake (22), and exposure to toxicants (23). We recently reported several associations between lifestyle-related variables and the number of SEMs (24). Moreover, a higher number of SEMs may be associated with skewed X chromosome inactivation in women and with hepatocellular carcinoma tumor stage (25), suggesting a possible role of SEMs in other age-related diseases. The term “epimutation” requires a precise definition. Some authors have used it in a broader sense (26) that includes epigenetic changes driven by DNA mutations; here, we use “epimutation” to mean a switch of the “epigenetic state” that is due not to underlying DNA sequence variation but to gain or loss of DNA methylation.
In this study, we investigated the associations between the number of SEMs across the genome and the risk of eight malignancies [breast, colorectal, lung, gastric, prostate, and kidney cancers, as well as urothelial cell carcinoma (UCC) and mature B-cell neoplasms (MBCN)] in 4,497 case–control pairs, matched on age and other relevant variables, nested within three large prospective cohorts from Italy [the Italian part of the European Prospective Investigation into Cancer and Nutrition (EPIC) Study], Australia [the Melbourne Collaborative Cohort Study (MCCS)], and Norway [the Norwegian Women and Cancer Study (NOWAC)]. This is the first prospective case–control study to assess the association between SEMs and cancer in blood-derived DNA. The evaluation of whole-blood DNA methylation as a cancer risk marker is of interest because blood is a convenient tissue to assay for constitutional methylation and its collection is noninvasive. Before this study, Wang and colleagues investigated this relationship in blood samples from 375 elderly individuals (27), whereas Teschendorff and colleagues (28) investigated it in cancer cells. We also investigated the biomolecular mechanisms linking aging, DNA methylation patterns, and the risk of different cancers by analyzing the genome-wide distribution of epimutations to identify functional genomic regions enriched in SEMs and to describe a possible biomolecular mechanism of carcinogenesis.
Materials and Methods
Study sample
EPIC Italy, MCCS, and NOWAC are prospective cohort studies with demographic and lifestyle variables and blood samples collected from participants at recruitment. For each cohort, blood samples from a subset of subjects were previously selected for DNA methylation analyses in nested case–control studies, using the incidence density sampling method for case–control matching (10, 29–31). A total of 4,497 case–control matched pairs were included in this study (Table 1). Details of participant recruitment, relevant covariate acquisition, matching parameters, average absolute differences in age between cases and controls, and the time from recruitment to cancer diagnosis [named time to disease (TTD)] by study are reported in the Supplementary Materials and Methods.
Characteristic | EPIC Italyᵃ (n = 1,112), controls | EPIC Italy, cases | EPIC Italy, Pᵈ | MCCSᵇ (n = 7,250), controls | MCCS, cases | MCCS, Pᵈ | NOWACᶜ (n = 632), controls | NOWAC, cases | NOWAC, Pᵈ |
---|---|---|---|---|---|---|---|---|---|
N | 556 | 556 | | 3,482 | 3,482 | | 316 | 316 | |
Ageᵉ | 53.5 (6.9) | 53.7 (7.0) | | 58.9 (7.6) | 59.1 (7.6) | | 55.9 (4.3) | 55.9 (4.3) | |
Sex (% females) | 385 (69%) | 385 (69%) | | 1,356 (39%) | 1,356 (39%) | | 316 (100%) | 316 (100%) | |
BMI | | | | | | | | | |
Normal weight | 259 (47%) | 232 (42%) | | 2,647 (76%) | 2,583 (74%) | | 155 (49%) | 190 (61%) | |
Overweight | 216 (39%) | 238 (43%) | | 659 (19%) | 662 (19%) | | 126 (40%) | 90 (29%) | |
Obese | 79 (14%) | 84 (15%) | 0.25 | 176 (5%) | 237 (7%) | 0.01 | 33 (11%) | 34 (11%) | 0.01 |
Smoking | | | | | | | | | |
Never | 268 (49%) | 234 (43%) | | 1,671 (48%) | 1,613 (46%) | | 120 (38%) | 83 (26%) | |
Former | 155 (28%) | 141 (26%) | | 1,335 (38%) | 1,379 (40%) | | 92 (29%) | 93 (29%) | |
Current | 128 (23%) | 171 (31%) | 0.01 | 476 (14%) | 490 (14%) | 0.38 | 101 (32%) | 141 (44%) | 0.001 |
Physical activity | | | | | | | | | |
High | 209 (38%) | 183 (34%) | | 714 (21%) | 719 (21%) | | 60 (20%) | 78 (27%) | |
Medium | 179 (32%) | 190 (35%) | | 2,021 (58%) | 2,010 (58%) | | 191 (64%) | 169 (57%) | |
Low | 163 (30%) | 173 (32%) | 0.31 | 747 (22%) | 753 (22%) | 0.96 | 47 (16%) | 47 (16%) | 0.16 |
Alcohol | | | | | | | | | |
Nondrinkers | 70 (13%) | 72 (13%) | | 984 (28%) | 986 (28%) | | 49 (16%) | 53 (17%) | |
Occasional drinkers | 346 (63%) | 319 (58%) | | 2,000 (57%) | 1,970 (56%) | | 251 (81%) | 246 (78%) | |
Habitual drinkers | 134 (24%) | 155 (28%) | 0.26 | 498 (14%) | 526 (15%) | 0.61 | 11 (4%) | 16 (5%) | 0.57 |
Diet | | | | | | | | | |
High quality | 52 (9%) | 43 (8%) | | 528 (15%) | 526 (15%) | | 98 (32%) | 104 (34%) | |
Medium quality | 272 (49%) | 282 (52%) | | 2,673 (77%) | 2,644 (76%) | | 97 (31%) | 113 (37%) | |
Low quality | 226 (41%) | 221 (40%) | 0.58 | 281 (8%) | 312 (9%) | 0.41 | 114 (37%) | 92 (30%) | 0.16 |
Education | | | | | | | | | |
High | 63 (11%) | 58 (11%) | | 684 (20%) | 674 (19%) | | 73 (24%) | 82 (27%) | |
Medium | 323 (59%) | 322 (59%) | | 2,100 (60%) | 2,071 (59%) | | 113 (37%) | 104 (34%) | |
Low | 165 (30%) | 166 (30%) | 0.91 | 698 (20%) | 737 (21%) | 0.51 | 117 (39%) | 122 (40%) | 0.62 |
Note: Bold type indicates statistical significance (P < 0.05).
ᵃ Proportion of cases: breast cancer (45%), lung cancer (30%), and colorectal cancer (25%).
ᵇ Proportion of cases: breast cancer (11%), lung cancer (9%), colorectal cancer (23%), gastric cancer (5%), kidney cancer (4%), prostate cancer (24%), UCC (12%), and MBCNs (12%).
ᶜ Proportion of cases: breast cancer (59%) and lung cancer (41%).
ᵈ P values for the χ² test.
ᵉ Average absolute age difference between cases and matched controls (±SD): 0.25 years (±0.26) in EPIC Italy, 1.00 years (±1.01) in MCCS, and 0.13 years (±0.33) in NOWAC.
This study was conducted following the principles of the Declaration of Helsinki and its subsequent revisions. EPIC was reviewed and approved by the HuGeF (currently Italian Institute for Genomic Medicine, Candiolo, Torino, Italy) Ethics Committee. The MCCS protocol was approved by the Cancer Council Victoria's Human Research Ethics Committee. NOWAC was approved by the Regional Committee for Medical and Health Research Ethics in North Norway. All study participants signed an informed consent.
DNA methylation analyses
Whole-genome DNA methylation was quantified using the Illumina Infinium HumanMethylation450 BeadChip. Detailed methods and data preprocessing procedures can be found in the Supplementary Materials and Methods. To account for the possible bias introduced by the interindividual variability in the proportion of white blood cells (WBC) in peripheral blood, we estimated the percentage of WBC fractions according to the Houseman algorithm (32), which performs inference using a quadratic programming technique known as linear constrained projection, where nonnegativity and normalization constraints on cellular proportions are imposed during inference (33). We excluded from the analysis bimodal and trimodal CpGs using the function findpeaks in the R package pracma, thus focusing on rare, stochastic events.
Statistical analyses
Identification of SEMs
We computed the total number of SEMs as the sum of extreme DNA methylation values (outliers) per individual, on the basis of a modified version of the procedure described by Gentilini and colleagues (34), which takes into account differential WBC proportions among individuals, as described in Supplementary Materials and Methods. Because the number of SEMs increased exponentially with age, we used a logarithmic transformation of the total number of SEMs [hereafter referred to as log(SEM)] for all association analyses.
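As a rough illustration of the counting step, the sketch below flags, at each CpG site, the methylation values that lie more than k interquartile ranges below the first quartile or above the third quartile across individuals, and then sums the flags per individual. The fence multiplier k = 3, the function names `count_sems` and `log_sem`, and the toy methylation values are assumptions made for illustration only; the adjustment for WBC proportions described in the Supplementary Materials and Methods is not reproduced here.

```python
import math
from statistics import quantiles


def count_sems(beta_values, k=3.0):
    """Count extreme-outlier methylation values (SEMs) per individual.

    beta_values: list of rows, one row per CpG site; each row holds the
        methylation values of all individuals at that site, in the same
        individual order for every row.
    k: IQR multiplier for the outlier fences (k = 3 is an assumption).
    """
    n_individuals = len(beta_values[0])
    counts = [0] * n_individuals
    for site in beta_values:
        # Quartiles of the methylation distribution across individuals
        q1, _, q3 = quantiles(site, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        for i, value in enumerate(site):
            if value < lower or value > upper:
                counts[i] += 1
    return counts


def log_sem(count):
    """Natural log of the SEM count; undefined for a count of zero."""
    return math.log(count)


# Hypothetical toy data: two CpG sites measured in nine individuals.
site_a = [0.10, 0.11, 0.12, 0.10, 0.11, 0.12, 0.11, 0.10, 0.90]
site_b = [0.80, 0.82, 0.81, 0.79, 0.80, 0.81, 0.82, 0.80, 0.05]
counts = count_sems([site_a, site_b])
```

In this toy input only the last individual carries outlying values, one at each site. Real array data would have hundreds of thousands of sites, so a vectorized implementation would be preferable in practice.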
Computation of epigenetic clock measures
Association of SEMs with cancer risk
We investigated the association between SEMs and the risk of eight types of cancer separately using log(SEM) as the predictor and case–control status as the outcome. ORs and confidence intervals (CI) were calculated using logistic regression models for a one-unit increase in log(SEM). For each cancer and cohort, we ran four regression models: model 1 included age, sex, and study-specific technical variables; model 2 included additional adjustment for smoking, body mass index (BMI), physical activity, alcohol intake, dietary quality, and education (as a proxy for socioeconomic status); model 3 included additional adjustment for Horvath epigenetic AA; and finally, model 4 included additional adjustment for DNAmGrimAge acceleration, which was found to be more strongly associated with cancer risk than Horvath epigenetic AA in the MCCS (35). All covariates were treated as categorical variables with three categories to harmonize sources of information across the three studies (see Supplementary Materials and Methods for more details on harmonization of covariates).
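To make the reported quantities concrete, the sketch below shows how an OR per one-unit increase in log(SEM) and its 95% CI relate to a logistic regression coefficient and its standard error, assuming a Wald-type interval on the log-odds scale. The helper names `odds_ratio_ci` and `beta_se_from_or_ci` are hypothetical, and the Wald assumption is ours; the source does not state how the intervals were computed.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a one-unit increase in the predictor.

    beta: logistic regression coefficient (log-odds per unit of log(SEM))
    se:   standard error of beta
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_or_ci(odds_ratio, lower, upper, z=Z_95):
    """Approximate the coefficient and its SE from a reported OR and 95% CI.

    Assumes the interval is symmetric on the log scale (Wald interval).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Breast cancer estimate quoted in the abstract: OR 1.25 (95% CI, 1.11-1.41)
beta, se = beta_se_from_or_ci(1.25, 1.11, 1.41)
```

For the breast cancer estimate this gives a coefficient of roughly 0.223 with a standard error of roughly 0.061 on the log-odds scale. Because the published bounds are rounded, feeding these values back through `odds_ratio_ci` recovers the interval only approximately.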
For associations with breast, lung, and colorectal cancer, which were investigated in more than one study, the overall OR estimates for the association between log(SEM) and cancer risk were calculated by random-effects meta-analysis with restricted maximum likelihood (REML) estimation (36) using the R package metafor (37). Heterogeneity in the associations among studies was evaluated using the I² statistic. Further sensitivity analyses were performed by stratifying case–control pairs by TTD; ORs and CIs were calculated for TTD >10 years, TTD between 5 and 10 years, and TTD ≤5 years. The Cochran–Armitage test for trend was used to assess whether ORs followed a trend by TTD.
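The pooling step can be sketched as follows. Note that this is not the procedure used in the study: the REML estimator of the between-study variance in metafor is iterative, so the sketch substitutes the closed-form DerSimonian and Laird estimator, which plays the same role. The function name `random_effects_meta` and the input values are hypothetical; study-specific estimates are not given in this section.

```python
import math


def random_effects_meta(log_ors, ses):
    """Pool study-specific log-ORs with a DerSimonian-Laird random-effects model.

    log_ors: list of study-specific log odds ratios
    ses:     list of their standard errors
    Returns the pooled OR, its 95% CI, tau^2, and I^2 (in percent).
    """
    if len(log_ors) < 2 or len(log_ors) != len(ses):
        raise ValueError("need at least two studies with matching inputs")
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    z = 1.959964
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled)),
        "log_or": pooled,
        "se": se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical inputs for three studies (not taken from the paper)
result = random_effects_meta([0.20, 0.25, 0.22], [0.10, 0.08, 0.15])
```

When the study estimates agree closely, as in this made-up input, Q falls below its degrees of freedom, so both the between-study variance and I² are truncated at zero and the random-effects estimate coincides with the fixed-effect one.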
Further details about enrichment analyses, SEMs' stability over time, and comparison of SEMs in tumor versus normal adjacent tissues [data from The Cancer Genome Atlas (TCGA) project, https://portal.gdc.cancer.gov] are provided in the Supplementary Materials and Methods.
Data availability
The data generated and/or analyzed in this study can be accessed upon reasonable request to the originating cohorts. Access will be conditional on adherence to local ethical and security policy. R codes used for the analyses presented in the article are available upon request. EPIC DNAm partial data can be accessed through Gene Expression Omnibus accession number GSE51057.
Ethics reporting
All participants provided written informed consent, and all contributing cohorts confirmed compliance with their local research ethics committees or institutional review boards.
Results
Association of cancer risk factors with SEMs
In the three cohorts, we observed an exponential increase in the number of SEMs with age both in the complete study sample (Fig. 1, top; Pearson R = 0.17, P = 5 × 10⁻⁹; R = 0.04, P = 6 × 10⁻⁵; and R = 0.23, P = 2 × 10⁻⁹ in EPIC, MCCS, and NOWAC, respectively) and in controls only (Fig. 1, bottom; Pearson R = 0.15, P = 2 × 10⁻⁵; R = 0.04, P = 0.01; and R = 0.23, P = 1 × 10⁻⁸ in EPIC, MCCS, and NOWAC, respectively). Table 2 reports the cross-sectional associations of cancer risk factors with log(SEM) in both the complete study sample and controls only (more details in Supplementary Materials and Methods).
Distribution of log(SEM) | EPIC Italy, all samples: Nᵃ | EPIC Italy, all samples: mean (SD) | EPIC Italy, all samples: P (R²)ᵇ | EPIC Italy, controls only: mean (SD) | EPIC Italy, controls only: P (R²)ᵇ | MCCS, all samples: Nᵃ | MCCS, all samples: mean (SD) | MCCS, all samples: P (R²)ᵇ | MCCS, controls only: mean (SD) | MCCS, controls only: P (R²)ᵇ | NOWAC, all samples: Nᵃ | NOWAC, all samples: mean (SD) | NOWAC, all samples: P (R²)ᵇ | NOWAC, controls only: mean (SD) | NOWAC, controls only: P (R²)ᵇ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Age | | | | | | | | | | | | | | | |
<50 | 328 (29%) | 5.79 (0.98) | | 5.79 (0.98) | | 1,029 (15%) | 6.55 (0.98) | | 6.54 (0.98) | | 61 (10%) | 5.04 (0.65) | | 5.16 (0.67) | |
50–55 | 257 (23%) | 6.00 (0.98) | | 6.01 (0.91) | | 925 (13%) | 6.59 (1.03) | | 6.58 (1.02) | | 180 (28%) | 5.32 (0.81) | | 5.44 (0.83) | |
55–60 | 335 (30%) | 6.11 (1.02) | | 6.04 (0.93) | | 1,289 (19%) | 6.66 (1.09) | | 6.59 (1.06) | | 244 (39%) | 5.46 (0.81) | | 5.46 (0.92) | |
>60 | 192 (17%) | 6.28 (1.06) | 4 × 10⁻⁷ (2.86%) | 6.10 (0.84) | 0.02 (1.70%) | 3,721 (53%) | 6.69 (1.07) | 9 × 10⁻⁵ (0.26%) | 6.65 (1.05) | 0.05 (0.24%) | 147 (23%) | 5.74 (0.94) | 1 × 10⁻⁷ (5.42%) | 5.7 (0.9) | 0.03 (2.75%) |
Sex | | | | | | | | | | | | | | | |
Males | 342 (31%) | 6.08 (0.99) | | 6.04 (0.89) | | 4,252 (61%) | 6.60 (1.02) | | 6.58 (1.00) | | 0 (0%) | 0 (0) | | 0 (0) | |
Females | 770 (69%) | 5.99 (1.03) | 0.16 (0.17%) | 5.94 (0.95) | 0.24 (0.24%) | 2,712 (39%) | 6.73 (1.11) | 1 × 10⁻⁷ (0.35%) | 6.67 (1.09) | 0.008 (0.20%) | 632 (100%) | 5.44 (0.87) | – | 5.48 (0.88) | |
Smoking | | | | | | | | | | | | | | | |
Never | 502 (45%) | 5.98 (1.00) | | 5.99 (0.96) | | 3,284 (47%) | 6.63 (1.07) | | 6.56 (1.04) | | 203 (32%) | 5.43 (0.86) | | 5.4 (0.84) | |
Former | 296 (27%) | 5.97 (0.97) | | 5.93 (0.88) | | 2,714 (39%) | 6.68 (1.06) | | 6.67 (1.05) | | 185 (29%) | 5.44 (0.87) | | 5.49 (0.9) | |
Current | 299 (27%) | 6.16 (1.11) | 0.03 (0.66%) | 5.97 (0.96) | 0.84 (0.06%) | 966 (14%) | 6.64 (1.01) | 0.37 (0.04%) | 6.63 (1.01) | 0.02 (0.17%) | 242 (38%) | 5.46 (0.9) | 0.94 (0.02%) | 5.53 (0.89) | 0.55 (0.38%) |
BMIᶜ | | | | | | | | | | | | | | | |
Normal weight | 491 (44%) | 5.95 (1.01) | | 5.88 (0.92) | | 5,230 (75%) | 6.64 (1.05) | | 6.60 (1.04) | | 345 (55%) | 5.45 (0.88) | | 5.46 (0.86) | |
Overweight | 454 (41%) | 6.04 (0.99) | | 5.99 (0.89) | | 1,321 (19%) | 6.66 (1.10) | | 6.58 (1.03) | | 216 (34%) | 5.41 (0.87) | | 5.52 (0.93) | |
Obese | 163 (15%) | 6.17 (1.12) | 0.04 (0.59%) | 6.18 (1.05) | 0.04 (1.14%) | 413 (6%) | 6.79 (1.05) | 0.01 (0.15%) | 6.68 (1.05) | 0.03 (0.19%) | 67 (11%) | 5.52 (0.87) | 0.65 (0.14%) | 5.46 (0.82) | 0.88 (0.08%) |
Physical activity | | | | | | | | | | | | | | | |
High | 392 (35%) | 5.99 (1.03) | | 5.97 (0.99) | | 1,433 (21%) | 6.70 (1.05) | | 6.67 (1.01) | | 94 (16%) | 5.44 (0.92) | | 5.27 (0.89) | |
Medium | 369 (33%) | 6.00 (1.02) | | 5.93 (0.91) | | 4,031 (58%) | 6.66 (1.06) | | 6.63 (1.04) | | 360 (61%) | 5.44 (0.87) | | 5.54 (0.91) | |
Low | 336 (30%) | 6.10 (1.03) | 0.31 (0.22%) | 6.02 (0.89) | 0.68 (0.14%) | 1,500 (22%) | 6.58 (1.05) | 0.01 (0.12%) | 6.52 (1.05) | 0.008 (0.17%) | 138 (23%) | 5.43 (0.87) | 0.98 (0.01%) | 5.42 (0.78) | 0.16 (1.24%) |
Alcohol | | | | | | | | | | | | | | | |
No | 142 (13%) | 6.03 (1.02) | | 5.96 (0.95) | | 1,970 (28%) | 6.66 (1.09) | | 6.62 (1.06) | | 102 (16%) | 5.43 (0.75) | | 5.39 (0.81) | |
Occasional | 665 (60%) | 6.00 (1.01) | | 5.94 (0.91) | | 3,970 (57%) | 6.65 (1.05) | | 6.61 (1.02) | | 497 (79%) | 5.44 (0.88) | | 5.48 (0.88) | |
Habitual | 289 (26%) | 6.08 (1.05) | 0.49 (0.31%) | 6.04 (0.98) | 0.55 (0.20%) | 1,024 (15%) | 6.63 (1.04) | 0.74 (0.00%) | 6.60 (1.04) | 0.67 (0.01%) | 27 (4%) | 5.64 (1.1) | 0.49 (0.23%) | 5.82 (1.02) | 0.24 (0.92%) |
Diet | | | | | | | | | | | | | | | |
High | 95 (9%) | 5.90 (0.84) | | 6.00 (0.95) | | 1,054 (15%) | 6.71 (1.06) | | 6.56 (0.96) | | 206 (33%) | 5.48 (0.88) | | 5.51 (0.89) | |
Medium | 554 (50%) | 6.06 (1.02) | | 6.01 (0.96) | | 5,317 (76%) | 6.64 (1.06) | | 6.60 (1.04) | | 210 (34%) | 5.42 (0.82) | | 5.41 (0.82) | |
Low | 447 (40%) | 6.01 (1.06) | 0.32 (0.21%) | 5.81 (0.83) | 0.16 (0.67%) | 593 (9%) | 6.65 (1.04) | 0.22 (0.04%) | 6.71 (1.04) | 0.02 (0.13%) | 202 (33%) | 5.44 (0.92) | 0.74 (0.1%) | 5.54 (0.94) | 0.51 (0.44%) |
Education | | | | | | | | | | | | | | | |
High | 121 (11%) | 5.94 (0.86) | | 5.94 (0.91) | | 1,358 (19%) | 6.65 (1.01) | | 6.54 (1.04) | | 239 (39%) | 5.33 (0.86) | | 5.36 (0.85) | |
Medium | 645 (58%) | 5.96 (1.02) | | 5.95 (0.95) | | 4,171 (60%) | 6.68 (1.06) | | 6.65 (1.05) | | 217 (36%) | 5.45 (0.87) | | 5.45 (0.88) | |
Low | 331 (30%) | 6.18 (1.07) | 0.003 (1.01%) | 6.02 (0.93) | 0.71 (0.12%) | 1,435 (21%) | 6.59 (1.07) | 0.01 (0.04%) | 6.57 (0.99) | 0.62 (0.007%) | 155 (25%) | 5.56 (0.86) | 0.04 (1.1%) | 5.71 (0.89) | 0.02 (2.68%) |
ᵃ Missing data are not reported in the table.
ᵇ P values were derived from ANOVA models using the logarithm of the total number of SEMs as the outcome and each lifestyle variable as the independent predictor; R² indicates the proportion of variance explained by each predictor considered in this study.
ᶜ Normal weight, BMI <25; overweight, BMI 25 to <30; and obese, BMI ≥30.
Association of SEMs with the risk of cancers
The estimated ORs and CIs derived from the four logistic regression models described in Materials and Methods are reported in Table 3. The ORs from model 1 did not deviate significantly from those estimated in model 2 (additional adjustment for cancer risk factors), and models 3 and 4 (additional adjustment for epigenetic AA).
Cancer | Model 1ᵃ, OR (95% CI)ᵉ | Model 1ᵃ, P (FDR) | Model 2ᵇ, OR (95% CI)ᵉ | Model 2ᵇ, P (FDR) | Model 3ᶜ, OR (95% CI)ᵉ | Model 3ᶜ, P (FDR) | Model 4ᵈ, OR (95% CI)ᵉ | Model 4ᵈ, P (FDR) | N | Median TTD in years (IQR) |
---|---|---|---|---|---|---|---|---|---|---|
Breast cancer | 1.25 (1.10–1.40) | 0.0002 (0.001) | 1.25 (1.11–1.41) | 0.0003 (0.002) | 1.24 (1.10–1.40) | 0.0004 (0.002) | 1.24 (1.10–1.40) | 0.0004 (0.002) | EPIC Italy, MCCS, and NOWAC studies meta-analysis; I² = 0; N = 250, 409, 186 pairs, respectively | EPIC Italy, MCCS, and NOWAC studies; 7.0 yrs (7.1), 7.7 yrs (6.1), 2.1 yrs (2.1), respectively |
Colorectal cancer | 1.01 (0.91–1.13) | 0.67 (1.00) | 1.02 (0.91–1.14) | 0.68 (1.00) | 1.01 (0.89–1.15) | 0.85 (1.00) | 1.00 (0.89–1.25) | 0.86 (1.00) | EPIC Italy and MCCS studies meta-analysis; I² = 0; N = 139, 835 pairs, respectively | EPIC Italy and MCCS studies; 6.3 yrs (5.0), 9.3 yrs (8.0), respectively |
Gastric cancer | 0.93 (0.72–1.20) | 0.57 (1.00) | 0.88 (0.65–1.16) | 0.35 (1.00) | 0.91 (0.68–1.22) | 0.54 (1.00) | 0.91 (0.68–1.22) | 0.54 (1.00) | MCCS; N = 170 pairs. | 11.4 yrs (10.3) |
Kidney cancer | 1.19 (0.90–1.57) | 0.23 (1.00) | 1.20 (0.87–1.66) | 0.25 (1.00) | 1.21 (0.87–1.67) | 0.25 (1.00) | 1.24 (0.89–1.72) | 0.21 (1.00) | MCCS; N = 143 pairs. | 11.2 yrs (8.5) |
Lung cancer | 1.25 (1.10–1.44) | 0.003 (0.02) | 1.23 (1.07–1.42) | 0.004 (0.02) | 1.22 (1.06–1.41) | 0.004 (0.02) | 1.18 (1.02–1.21) | 0.006 (0.03) | EPIC Italy, MCCS, and NOWAC studies meta-analysis; I² = 0; N = 167, 332, 130 pairs, respectively | EPIC Italy, MCCS, and NOWAC studies; 7.4 yrs (5.7), 10.1 yrs (7.5), 4.1 yrs (3.2), respectively |
MBCN | 1.42 (1.22–1.67) | 5 × 10⁻⁶ (4 × 10⁻⁵) | 1.43 (1.22–1.67) | 5 × 10⁻⁶ (4 × 10⁻⁵) | 1.43 (1.21–1.68) | 2 × 10⁻⁵ (2 × 10⁻⁴) | 1.43 (1.22–1.69) | 1 × 10⁻⁵ (8 × 10⁻⁵) | MCCS; N = 439. | 10.5 yrs (8.1) |
Prostate cancer | 1.01 (0.89–1.15) | 0.88 (1.00) | 1.01 (0.89–1.16) | 0.84 (1.00) | 1.01 (0.89–1.15) | 0.83 (1.00) | 1.02 (0.90–1.16) | 0.76 (1.00) | MCCS; N = 869. | 10.5 yrs (7.9) |
UCC | 0.96 (0.83–1.12) | 0.61 (1.00) | 0.92 (0.79–1.08) | 0.33 (1.00) | 0.91 (0.78–1.07) | 0.27 (1.00) | 0.87 (0.74–1.03) | 0.1 (1.00) | MCCS; N = 428. | 6.3 yrs (6.8) |
Note: Bold type indicates statistical significance (P < 0.05).
Abbreviations: IQR, interquartile range; yrs, years.
aModel 1 is adjusted for matching variables and study-specific variables.
bModel 2 includes additional adjustment for smoking, BMI, dietary quality, alcohol intake, physical activity, and education.
cModel 3 includes additional adjustment for Horvath DNAmAge epigenetic AA.
dModel 4 includes additional adjustment for Horvath DNAmAge and DNAmGrimAge epigenetic AA.
eORs per one-unit increase in log(SEM).
A higher number of SEMs was associated with an increased risk of breast cancer (three-study meta-analysis: OR = 1.25; 95% CI, 1.11–1.41; P = 0.0003; I2 = 0%; Fig. 2A) and lung cancer (three-study meta-analysis: OR = 1.23; 95% CI, 1.07–1.42; P = 0.004; I2 = 0%; Fig. 2B). In MCCS, log(SEM) was associated with MBCN (OR = 1.43; 95% CI, 1.22–1.67; P = 5 × 10−6; Table 3). ORs greater than one were also observed for colorectal, kidney, and prostate cancers, although the associations were not statistically significant (Table 3). In the analysis stratified by TTD, ORs increased significantly as the TTD decreased for breast, lung, and colorectal cancers (all Ptrend < 0.001) and for MBCN and prostate cancer (both Ptrend < 0.05; Fig. 3).
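As an illustration of how study-level ORs of this kind can be combined, the following minimal sketch pools log(OR) estimates by inverse-variance weighting and computes I2 from Cochran's Q. It assumes a fixed-effect model, which this section does not confirm as the authors' choice, and the three input estimates are hypothetical values chosen for illustration, not the cohort-specific results.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% CI


def log_or_and_se(or_, lower, upper):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(or_), se


def fixed_effect_pool(estimates):
    """Inverse-variance fixed-effect pooling of (OR, lower, upper) triples."""
    logs, weights = [], []
    for or_, lower, upper in estimates:
        beta, se = log_or_and_se(or_, lower, upper)
        logs.append(beta)
        weights.append(1.0 / se ** 2)
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, logs)) / w_sum
    se_pooled = math.sqrt(1.0 / w_sum)
    # Cochran's Q and the I2 heterogeneity statistic (truncated at 0)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, logs))
    df = len(logs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - Z95 * se_pooled),
               math.exp(pooled + Z95 * se_pooled)),
        "i2": i2,
    }


# Hypothetical study-level ORs per one-unit increase in log(SEM)
hypothetical = [(1.30, 1.05, 1.61), (1.22, 1.04, 1.43), (1.28, 0.95, 1.72)]
result = fixed_effect_pool(hypothetical)
print(result)
```

When the study-specific estimates are as close to each other as in this toy input, Q falls below its degrees of freedom and I2 is truncated to 0%, which is how an I2 of 0% arises.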
Association of number of SEMs with epigenetic clocks
As shown in Supplementary Figs. S1 and S2 (top), the number of SEMs was positively correlated with Horvath DNAmAge (R = 0.25, P < 0.0001; R = 0.03, P = 0.001; and R = 0.20, P = 0.04 in EPIC, MCCS, and NOWAC, respectively) and with DNAmGrimAge (R = 0.25, P = 0.0005; R = 0.07, P < 0.0001; and R = 0.24, P = 0.04 in EPIC, MCCS, and NOWAC, respectively) in all three studies. Consistent results were obtained from the same analyses on controls only (Supplementary Figs. S1 and S2, bottom).
Enrichment analyses
We found an enrichment of epimutations in genomic regions characterized by open chromatin states, CpG islands, and shores (P = 0.02, P = 0.05, and P = 0.0003, respectively; Supplementary Table S1); in “inactive/poised promoters” (P < 0.0001), “heterochromatin/low signal/copy-number variation (CNV)” regions (P < 0.0001), and “polycomb-repressed” regions (P = 0.02; Supplementary Table S2); and in transcription factor binding sites (TFBS) targeted by two members of the polycomb repressive complex 2 (PRC2), EZH2 and SUZ12 (P < 0.0001 and P = 0.0005, respectively; Supplementary Table S3), and by the transcriptional corepressor ctBP2 (P = 0.001; Supplementary Table S3).
SEMs' stability over time
We investigated the stability of SEMs over time by analyzing longitudinal DNAm data from 33 healthy controls in EPIC Italy who also participated as healthy volunteers in the PEM-Turin study within the EXPOsOMICS study (38), and for whom DNAm was assessed again on average 18.7 years after recruitment (range, 16.4–20.3 years).
In a longitudinal regression model, log(SEM) increased significantly over time (increase per year = 0.168 ± 0.007; P < 0.0001; Supplementary Fig. S3). On average, 71% of SEMs were stable over time (range, 55%–93%). EZH2-specific SEMs were more stable than SEMs appearing over the whole genome (average EZH2-specific stable SEMs = 87%; range, 62%–100%; χ2 test for proportion, P < 0.0001).
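The percentage of stable SEMs can be illustrated with a short sketch. It assumes that a SEM counts as stable when the same CpG is called as a SEM in the same individual at both time points; the exact definition is given in the Supplementary Materials and Methods and may differ. The function name and the CpG identifiers are hypothetical.

```python
def stable_fraction(baseline_sems, followup_sems):
    """Fraction of baseline SEMs (CpG IDs) still called as SEMs at follow-up."""
    baseline = set(baseline_sems)
    if not baseline:
        return float("nan")  # no baseline SEMs, so stability is undefined
    return len(baseline & set(followup_sems)) / len(baseline)


# Hypothetical CpG identifiers for one individual at the two time points
baseline = ["cg0001", "cg0002", "cg0003", "cg0004"]
followup = ["cg0001", "cg0002", "cg0004", "cg0005", "cg0006"]

fraction = stable_fraction(baseline, followup)
print(f"{fraction:.0%} of baseline SEMs are stable")
```

Averaging this per-individual fraction across participants would give a summary comparable in form to the percentages reported above; restricting both lists to CpGs in EZH2 target regions would give the region-specific version.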
SEMs in tumor compared with normal adjacent tissues
We analyzed data from the TCGA project for lung, breast, and colorectal cancers; log(SEM) was significantly higher in tumors than in normal adjacent tissues (all P < 0.0001; Supplementary Fig. S4), for both whole-genome log(SEM) and EZH2-specific log(SEM). On average, 72% of SEMs found in normal adjacent tissue were conserved in the tumor (range, 54%–98%). The proportion of conserved EZH2-specific SEMs was significantly higher, 87% (range, 61%–97%; χ2 test for proportion, P < 0.0001). Enrichment analyses confirmed that SEMs are more likely to occur in usually silenced genomic regions, such as inactive or poised promoters, polycomb-repressed regions, and regions targeted by EZH2 and SUZ12.
It is worth observing that the majority of the CpGs in genomic regions targeted by EZH2 were on average hypomethylated (more than 80% of the CpGs had an average DNAm beta-value lower than 20%; Supplementary Fig. S5). Consequently, more than 95% of EZH2-specific SEMs occurred as abnormal hypermethylation of a locus that is hypomethylated in the overall sample.
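To make the direction of a SEM concrete, the sketch below flags extreme outliers at a single CpG and labels them as hypermethylated or hypomethylated. It assumes the common convention of three interquartile ranges beyond the first or third quartile as the threshold for an "extreme outlier"; the threshold used in this study is defined in its methods and is not restated here. The beta-values are hypothetical.

```python
import statistics


def call_sems(betas, k=3.0):
    """Label each individual's beta-value at one CpG as 'hyper', 'hypo', or None.

    A value is flagged when it lies more than k * IQR above Q3 or below Q1
    (k = 3 is an assumed threshold, not taken from the article).
    """
    q1, _, q3 = statistics.quantiles(betas, n=4)
    iqr = q3 - q1
    upper, lower = q3 + k * iqr, q1 - k * iqr
    labels = []
    for beta in betas:
        if beta > upper:
            labels.append("hyper")
        elif beta < lower:
            labels.append("hypo")
        else:
            labels.append(None)
    return labels


# Hypothetical CpG that is hypomethylated in most individuals,
# with one individual carrying an abnormal gain of methylation
betas = [0.05, 0.06, 0.04, 0.05, 0.07, 0.06, 0.05, 0.04, 0.06, 0.60]
labels = call_sems(betas)
print(labels)
```

At a locus where most beta-values sit near zero, the lower fence falls close to or below zero, so an outlier can essentially only appear as a gain of methylation. This is consistent with the observation that EZH2-specific SEMs are almost always hypermethylation events.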
In Supplementary Materials and Methods, we have shown further details about SEMs in EZH2 targets, SEMs' stability over time, and the analyses of data from TCGA.
Discussion
In this study, we have analyzed DNAm data from blood samples of approximately 4,500 case–control pairs nested within three large prospective cohorts: EPIC Italy, MCCS, and NOWAC. The main aim of this study was to investigate the association of the total number of SEMs with cancers using a prospective study design. In addition, we investigated SEMs' stability over time and the genomic regions in which SEMs appear more frequently.
SEMs increasing with aging and stability over time
The number of estimated SEMs per sample varied by cohort; however, we observed an exponential increase of SEMs with age in all cohorts (Fig. 1), confirming previous findings (27, 34). Differences in the number of SEMs between studies were mainly driven by different DNAm data normalization and preprocessing procedures, as well as by variable study sample size, which affects the distribution of DNAm values at each CpG. Consequently, the magnitude of the association of log(SEM) with age (Fig. 1) and epigenetic clocks (Supplementary Figs. S1 and S2) varied by cohort as well. We overcame these batch effect issues with our study design by assaying both samples of each matched case–control pair in the same batch. The results observed in our cross-sectional study and reported in the literature about the exponential increase of SEMs with age were further confirmed using longitudinal data, available for a subset of the EPIC Italy cohort included in the EXPOsOMICS study, for whom DNAm was assessed at recruitment and on average 19 years later (methodologic details in Supplementary Materials and Methods). In this longitudinal analysis, we also observed a high interindividual variability both in the total number and in the growth rate of SEMs over time (Supplementary Fig. S3), strengthening our hypothesis of SEMs as candidate biomarkers of accumulation of exposure-related DNA damage during aging. Accordingly, we observed a cross-sectional association of SEMs with lifestyle-related factors such as smoking, obesity, and dietary quality, as we observed previously with alcohol intake and socioeconomic status (24). Also, log(SEM) positively correlates with the widely studied biological aging measures based on the epigenetic clocks developed by Horvath and colleagues (Supplementary Figs. S1 and S2; ref. 7). The association between the two age-related biomarkers is not driven by their association with chronologic age, because AA is independent of chronologic age by definition (12).
We were not able to investigate whether changes in lifestyle may slow down the aging-related increase in SEMs. A recent study analyzing longitudinal data on SEMs in twins concluded that only a small percentage of the differences in SEMs' growth rate within individuals might be driven by underlying genetic background. These results suggest that other exposures may play a significant role, worthy of further investigation (27). On the other hand, using longitudinal data, we showed that once epimutations are established, most of them remain stable over time. Previous findings indicate that methylation patterns are transmittable during cell divisions (39), suggesting that SEMs may be inherited through mitosis.
SEMs' association with cancer risk
The main findings of this study were the associations of the number of SEMs with a higher risk of breast and lung cancers, and MBCN. The estimated ORs were not confounded by age because cases and controls were age-matched. Moreover, we further adjusted for age to control for potential residual confounding. The observed associations remained significant even after adjustment for smoking, BMI, physical activity, diet, alcohol consumption, and epigenetic clock measures. Although we observed an association of the total number of SEMs with cancer risk factors such as smoking, obesity, and epigenetic clocks, the results obtained in model 1 (minimally adjusted), model 2 (adjusted for various cancer risk factors), and models 3 and 4 (additionally adjusted for epigenetic clock measures) did not differ significantly. These results suggest that the increased number of SEMs that is associated with unhealthy lifestyle explains only a small part of the association of log(SEM) with cancer, meaning that other biological mechanisms, including inflammation, reduced DNA repair capacity (40), and other unmeasured environmental and lifestyle exposures (e.g., exposure to toxicants) could be the main drivers of these associations. In MCCS, DNAmGrimAge AA outperforms first-generation clocks in predicting several cancers, the strongest association being with risk of lung cancer, after adjustment for smoking intensity and other smoking-related variables (35). The association of log(SEM) with breast and lung cancer and with MBCN was minimally changed after adjustment for DNAmGrimAge AA, suggesting that SEMs and epigenetic clocks are independent DNAm-based biomarkers, likely involving distinct biomolecular alterations. Further studies are needed to clarify the underlying biological mechanisms linking SEMs and DNAmGrimAge to cancer risk.
Our results indicate that alterations of DNA methylation profiles could be detected in the blood years before cancer diagnosis and, together with previous studies (25, 27), suggest that an increasing number of SEMs in blood could be predictive of risk of future cancers. The differences between cases and matched controls increased as the time from blood collection to cancer diagnosis decreased (Fig. 3) in all but one type of cancer investigated. We found a significant trend of increasing OR as the TTD decreased in breast, lung, colorectal, and prostate cancers and MBCN, suggesting that the predictive utility of the log(SEM) biomarker is greater for short-term risk.
SEMs occur more likely in specific genomic regions
We found that regions and sites affected by epimutations are not entirely “stochastic”; instead, they are enriched in specific genomic regions, and randomly distributed inside them (34). This behavior could be defined as “local, but nonglobal, stochasticity.” Our findings confirmed that epimutations preferentially occur in DNA sequences associated with open chromatin, as previously observed by Ong and colleagues (41). Furthermore, SEMs were enriched in transcriptionally silenced genomic regions such as “inactive promoters,” “heterochromatin/low signal/CNV,” and “polycomb-repressed” regions. In addition, epimutations more likely appeared in TFBSs targeted by two members of PRC2, EZH2 and SUZ12, and the transcriptional corepressor, ctBP2. Similar patterns of DNAm alterations were described in normal breast tissue adjacent to cancerous breast tissue, compared with breast tissue in cancer-free women (42). Comparing tumor with normal adjacent tissue using data from TCGA project on breast, lung, and colorectal cancers, we also found similar results. Interestingly, EZH2-specific SEMs are more stable over time (and conserved in tumor from normal adjacent tissue) compared with epimutations appearing in the rest of the genome.
A possible mechanism of carcinogenesis
We observed that the vast majority of EZH2-specific epimutations occur as gain of methylation in CpGs unmethylated in the general population (Supplementary Fig. S5). Because both DNA methylation and the polycomb repression system are key players in cancer formation and progression, the SEM-driven reshaping of the epigenetic landscape observed with aging could play a role in PRC relocation, a crucial mechanism in the establishment of a more cancer-prone environment in healthy people years before diagnosis.
Transcriptional regulation by DNA methylation and by PRC2 proteins are related: the two rarely act simultaneously on CpG islands (43), and removal of the epigenetic mark provokes a redistribution of the PRC2-distinctive H3K27me3 in mammalian cells. At a functional level, the link between aging, PRC2, and global DNA methylation dysregulation involves the loss of self-renewal capacity of adult stem cells (44). Multipotent stem cell senescence in vitro is characterized by downregulation of PRC2 genes, including EZH2 and SUZ12 (44). Therefore, a downregulation of EZH2 and SUZ12 may induce a further dysregulation of PRC2 protein complex targets, including several tumor suppressor genes (45), such as p53 (46).
The dynamics of the interaction between the polycomb protein complex and DNA methylation are complex and not yet entirely understood. The two repressive systems are mutually exclusive, and DNA methylation prevents polycomb from accessing the promoter in vitro (47). In this study, we observed that aging may increase the enrichment of methylated sites at TFBSs targeted by EZH2 and SUZ12, consequently altering the efficacy of epigenetic regulation by the polycomb complex. Therefore, we could hypothesize that during aging, a more stable epigenetic silencing by DNA methylation could replace the more dynamic polycomb repressive signal, contributing to the early mechanisms involved in age-related diseases, specifically some types of cancer. As proposed by others, the tumor suppressor genes regulated by the polycomb complex may switch from a dynamic to a fixed repressive state (48–50). In this context, tumor suppressor genes would not work properly, letting cells grow abnormally and become malignant. More studies are needed to verify these data, which raise intriguing new hypotheses connecting aging and cancer. The fact that the SEM data were derived from prospective studies reinforces the observations made in patients with cancer when the disease was already present (51).
Study limitations
Although most risk factors were measured carefully in the three cohort studies, the procedure used to minimize variability arising from the different sources of information may have introduced bias in the regression models we used.
We measured DNA methylation levels in blood and not in target tissues for each cancer type. Tissue biopsy is the gold standard approach for patients' diagnosis and prognostication. However, especially for early stages, a tissue biopsy sampling could be difficult or even dangerous (52). Epigenetic signatures in whole-blood DNA might reflect the interaction of host genetic and environmental factors associated with cancer susceptibility as previously shown by others (53–57), and further supported by our results on tumor tissues.
We found significant associations of SEMs with three of the eight cancers investigated, and the effect sizes were small overall. These results suggest that not every epimutation is potentially dangerous, which warrants further research. Combining DNA methylation and gene expression data from both blood and tissue of the same individuals will help to elucidate whether or not specific genes or genomic regions influence cancer risk when affected by SEMs, keeping in mind that each type of cancer is a distinct disease, with its unique genetic landscape. Moreover, our results from the analyses stratified by TTD suggest that the predictive utility of the log(SEM) biomarker is greater for short-term risk.
Future studies are also needed to identify cancer-specific epimutational signatures and to understand the biological mechanisms associated with accumulation of epimutations during the lifespan, possibly involving the individual genetic background and DNA repair capacity.
Conclusions
A higher number of SEMs was significantly associated with an increased risk of breast and lung cancers and with MBCN. Using longitudinal data, we confirmed previous observations about the exponential increase of SEMs with aging, and we showed that most SEMs are stable over time and conserved in tumor compared with normal adjacent tissue. Finally, we showed that SEMs are more likely to occur in specific genomic regions, suggesting a biomolecular mechanism involving PRC2 proteins, which may deserve further investigation. These observations might open new avenues for understanding the biomolecular mechanisms of carcinogenesis.
Disclosure of Potential Conflicts of Interest
P.-A. Dugué reports grants from the National Health and Medical Research Council (1106016, 1011618, 1026892, 1027505, 1050198, 1087683, 1088405, 1043616, 209057, 396414, and 1074383) during the conduct of the study. A.M. Hodge reports grants from the National Health and Medical Research Council during the conduct of the study. D.R. English reports grants from the National Health and Medical Research Council during the conduct of the study. G.G. Giles reports grants from the National Health and Medical Research Council (paid to Cancer Council Victoria) during the conduct of the study. R.L. Milne reports grants from the National Health and Medical Research Council during the conduct of the study. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
A. Gagliardi: Conceptualization, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. P.-A. Dugué: Conceptualization, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. T.H. Nøst: Conceptualization, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. M.C. Southey: Data curation, funding acquisition, writing–review and editing. D.D. Buchanan: Writing–review and editing. D.F. Schmidt: Writing–review and editing. E. Makalic: Writing–review and editing. A.M. Hodge: Funding acquisition, writing–review and editing. D.R. English: Funding acquisition, writing–review and editing. N.W. Doo: Writing–review and editing. J.L. Hopper: Writing–review and editing. G. Severi: Funding acquisition, writing–review and editing. L. Baglietto: Funding acquisition, writing–review and editing. A. Naccarati: Funding acquisition, writing–review and editing. S. Tarallo: Writing–review and editing. L. Pace: Writing–review and editing. V. Krogh: Funding acquisition, writing–review and editing. D. Palli: Funding acquisition, writing–review and editing. S. Panico: Funding acquisition, writing–review and editing. C. Sacerdote: Funding acquisition, writing–review and editing. R. Tumino: Funding acquisition, writing–review and editing. E. Lund: Funding acquisition, writing–review and editing. G.G. Giles: Funding acquisition, writing–review and editing. B. Pardini: Writing–review and editing. T.M. Sandanger: Funding acquisition, writing–review and editing. R.L. Milne: Funding acquisition, writing–review and editing. P. Vineis: Funding acquisition, writing–review and editing. S. Polidoro: Conceptualization, data curation, writing–review and editing. G. Fiorito: Conceptualization, data curation, supervision, writing–review and editing.
Acknowledgments
The authors are very thankful to Dr. Akram Ghantous (IARC, Lyon, France) for the methylation analyses of PEM-Turin study, produced within the Exposomics EC FP7 grant (grant agreement no. 308610, to P. Vineis). The results here are, in part, based upon data generated by TCGA research network: https://www.cancer.gov/tcga. The EPIC Italy component of this research was supported by European Commission grant no. 633666, awarded to P. Vineis, and by the AIRC grant (Progetto IG 2013 N.14410, to C. Sacerdote) for part of the DNA methylation experiments. The Melbourne Collaborative Cohort Study (MCCS) cohort recruitment was funded by VicHealth and Cancer Council Victoria. The MCCS component of the work was funded by the Australian National Health and Medical Research Council, including grants 1106016 (to G.G. Giles); 1011618 (to L. Baglietto); 1026892 (to M.C. Southey); 1027505 (to D.R. English); 1050198 and 1087683 (to A.M. Hodge); 1088405 (to R.L. Milne); and 1043616, 209057, 396414, and 1074383 (to G.G. Giles). Cases and their vital status were ascertained through the Victorian Cancer Registry and the Australian Institute of Health and Welfare, including the National Death Index and the Australian Cancer Database. The NOWAC component of the work was supported by the European Research Council Advanced Researcher grant, 2008: Transcriptomics in cancer research (grant no. ERC-2008-AdG, to E. Lund).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.