Background:

Age-related epigenetic dysregulations are associated with several diseases, including cancer. The number of stochastic epigenetic mutations (SEM) has been suggested as a biomarker of life-course accumulation of exposure-related DNA damage; however, the predictive role of SEMs in cancer has seldom been investigated.

Methods:

A SEM, at a given CpG site, was defined as an extreme outlier of DNA methylation value distribution across individuals. We investigated the association of the total number of SEMs with the risk of eight cancers in 4,497 case–control pairs nested in three prospective cohorts. Furthermore, we investigated whether SEMs were randomly distributed across the genome or enriched in functional genomic regions.

Results:

In the three-study meta-analysis, the estimated ORs per one-unit increase in log(SEM) from logistic regression models adjusted for age and cancer risk factors were 1.25; 95% confidence interval (CI), 1.11–1.41 for breast cancer, and 1.23; 95% CI, 1.07–1.42 for lung cancer. In the Melbourne Collaborative Cohort Study, the OR for mature B-cell neoplasm was 1.46; 95% CI, 1.25–1.71. Enrichment analyses indicated that SEMs frequently occur in silenced genomic regions and in transcription factor binding sites regulated by EZH2 and SUZ12 (P < 0.0001 and P = 0.0005, respectively), two components of the polycomb repressive complex 2 (PRC2). Finally, we showed that PRC2-specific SEMs are generally more stable over time compared with SEMs occurring in the whole genome.

Conclusions:

The number of SEMs is associated with a higher risk of different cancers in prediagnostic blood samples.

Impact:

We identified a candidate biomarker for early cancer detection, and we described a carcinogenesis mechanism involving PRC2 proteins that is worthy of further investigation.

This article is featured in Highlights of This Issue, p. 1841

The concept of “life-course accumulation of exposures” and related damage has been proposed to explain the decline of physiologic functioning and the consequent increased disease morbidity and mortality during aging (1). The accumulation of environmental, socioeconomic, and behavioral exposures may cause long-term damage, which may be amplified by a decreased ability to repair damage as the body ages (1). For most diseases, including cancer, age is an important risk factor and the incidence of most malignancies increases exponentially with age (2).

Basic research, combined with the increasing capacity of large-scale technologies including “omics” measurements, has led to the formulation of exposure-driven models of carcinogenesis (3) in which functional changes in gene regulation and genomic mutations reflect the life-course accumulation of exposure-related DNA damage. The accumulation over time of somatic mutations in specific genes may lead to cancer development, but recent studies demonstrated that this molecular mechanism alone is not sufficient (4, 5).

Two mechanisms contribute to age-related DNA methylation changes: the “epigenetic drift” (6) and the “epigenetic clock” (7). Although both are related to aging, the “epigenetic clock” refers to specific CpG sites at which DNA methylation levels steadily increase or decrease with age and thus can be used to predict chronologic age with high accuracy (7). The concept of epigenetic age acceleration (AA) has been introduced as the difference between predicted DNA methylation age and the chronologic age (7, 8). Epigenetic AA may be a good biomarker of biological aging as it has been associated with longevity (9–12), several pathologic conditions (13, 14), noncommunicable disease risk factors like obesity (15), poor physical activity (16), and low socioeconomic status (17), cancer risk, and shorter cancer survival (10). Recent literature distinguishes Horvath's (7) and Hannum and colleagues' (8) “first-generation clocks” from DNAmPhenoAge (18) and DNAmGrimAge (19), called the “next-generation clocks,” with the latter being trained not only on age, but also on a complex set of biomarkers which in turn are associated with individual health status and mortality. Early findings indicate that the next-generation clocks may be capturing important aspects of accelerated biological aging. In a recent critique of the epigenetic clocks, Dugué and colleagues cautioned that early studies generally report stronger associations than later studies and are more likely to be affected by publication bias (20). In contrast, the “epigenetic drift” is a mechanism that involves the whole genome, suggesting a global dysregulation of DNA methylation patterns with age (21).

Two critical aspects of the epigenetic drift are genomic instability and chromatin deterioration during aging, which lead to an accumulation of epigenetic mutations (also known as “epimutations,” i.e., changes in gene activity not involving DNA mutations but rather gain or loss of DNA methyl groups, which are conserved in cells during mitosis; ref. 22). A higher number of stochastic epigenetic mutations (SEM) across the genome has been associated with risk factors such as cigarette smoking, alcohol intake (22), and exposure to toxicants (23). We recently reported several associations between lifestyle-related variables and the number of SEMs (24). Moreover, more SEMs may be associated with skewed X chromosome inactivation in women and with hepatocellular carcinoma tumor stage (25), suggesting a possible role of SEMs in other age-related diseases. It is important to specify the meaning of the term “epimutation.” Although some authors have used this term in a broader sense (26), including epigenetic changes driven by DNA mutations, here we use “epimutation” to mean a switch of the “epigenetic state” that is due not to underlying DNA sequence variation but to gain or loss of DNA methylation.

In this study, we investigated the associations between the number of SEMs across the genome and the risk of eight malignancies [breast, colorectal, lung, gastric, prostate, and kidney cancers, as well as urothelial cell carcinoma (UCC) and mature B-cell neoplasms (MBCN)] in 4,497 case–control pairs, matched on age and other relevant variables, nested within three large prospective cohorts from Italy [the Italian part of the European Prospective Investigation into Cancer and Nutrition (EPIC) Study], Australia [the Melbourne Collaborative Cohort Study (MCCS)], and Norway [the Norwegian Women and Cancer Study (NOWAC)]. This is the first prospective case–control study to assess the association between SEMs and cancer in blood-derived DNA. The evaluation of whole-blood DNA methylation as a cancer risk marker is of interest because blood is a convenient tissue to assay for constitutional methylation and its collection is noninvasive. Before this study, Wang and colleagues investigated this relationship in blood samples from 375 older individuals (27), whereas Teschendorff and colleagues investigated it in cancer cells (28). We also investigated the biomolecular mechanisms linking aging, DNA methylation patterns, and the risk of different cancers by analyzing the genome-wide distribution of epimutations, to identify functional genomic regions enriched in SEMs, and to describe a possible biomolecular mechanism of carcinogenesis.

Study sample

EPIC Italy, MCCS, and NOWAC are prospective cohort studies with demographic and lifestyle variables and blood samples collected from participants at recruitment. For each cohort, blood samples from a subset of subjects were previously selected for DNA methylation analyses in nested case–control studies, using the incidence density sampling method for case–control matching (10, 29–31). A total of 4,497 case–control matched pairs were included in this study (Table 1). Details of participant recruitment, relevant covariate acquisition, matching parameters, average absolute differences in age between cases and controls, and the time from recruitment to cancer diagnosis [named time to disease (TTD)] by study are reported in the Supplementary Materials and Methods.

Table 1.

Characteristics of the study population stratified for prospective cohorts.

Cohort sizes: EPIC Italyᵃ, n = 1,112; MCCSᵇ, n = 7,250; NOWACᶜ, n = 632.

| Characteristic | EPIC Italy controls | EPIC Italy cases | Pᵈ | MCCS controls | MCCS cases | Pᵈ | NOWAC controls | NOWAC cases | Pᵈ |
|---|---|---|---|---|---|---|---|---|---|
| N | 556 | 556 | | 3,482 | 3,482 | | 316 | 316 | |
| Ageᵉ | 53.5 (6.9) | 53.7 (7.0) | | 58.9 (7.6) | 59.1 (7.6) | | 55.9 (4.3) | 55.9 (4.3) | |
| Sex (% females) | 385 (69%) | 385 (69%) | | 1,356 (39%) | 1,356 (39%) | | 316 (100%) | 316 (100%) | |
| BMI | | | | | | | | | |
| Normal weight | 259 (47%) | 232 (42%) | | 2,647 (76%) | 2,583 (74%) | | 155 (49%) | 190 (61%) | |
| Overweight | 216 (39%) | 238 (43%) | | 659 (19%) | 662 (19%) | | 126 (40%) | 90 (29%) | |
| Obese | 79 (14%) | 84 (15%) | 0.25 | 176 (5%) | 237 (7%) | 0.01 | 33 (11%) | 34 (11%) | 0.01 |
| Smoking | | | | | | | | | |
| Never | 268 (49%) | 234 (43%) | | 1,671 (48%) | 1,613 (46%) | | 120 (38%) | 83 (26%) | |
| Former | 155 (28%) | 141 (26%) | | 1,335 (38%) | 1,379 (40%) | | 92 (29%) | 93 (29%) | |
| Current | 128 (23%) | 171 (31%) | 0.01 | 476 (14%) | 490 (14%) | 0.38 | 101 (32%) | 141 (44%) | 0.001 |
| Physical activity | | | | | | | | | |
| High | 209 (38%) | 183 (34%) | | 714 (21%) | 719 (21%) | | 60 (20%) | 78 (27%) | |
| Medium | 179 (32%) | 190 (35%) | | 2,021 (58%) | 2,010 (58%) | | 191 (64%) | 169 (57%) | |
| Low | 163 (30%) | 173 (32%) | 0.31 | 747 (22%) | 753 (22%) | 0.96 | 47 (16%) | 47 (16%) | 0.16 |
| Alcohol | | | | | | | | | |
| Nondrinkers | 70 (13%) | 72 (13%) | | 984 (28%) | 986 (28%) | | 49 (16%) | 53 (17%) | |
| Occasional drinkers | 346 (63%) | 319 (58%) | | 2,000 (57%) | 1,970 (56%) | | 251 (81%) | 246 (78%) | |
| Habitual drinkers | 134 (24%) | 155 (28%) | 0.26 | 498 (14%) | 526 (15%) | 0.61 | 11 (4%) | 16 (5%) | 0.57 |
| Diet | | | | | | | | | |
| High quality | 52 (9%) | 43 (8%) | | 528 (15%) | 526 (15%) | | 98 (32%) | 104 (34%) | |
| Medium quality | 272 (49%) | 282 (52%) | | 2,673 (77%) | 2,644 (76%) | | 97 (31%) | 113 (37%) | |
| Low quality | 226 (41%) | 221 (40%) | 0.58 | 281 (8%) | 312 (9%) | 0.41 | 114 (37%) | 92 (30%) | 0.16 |
| Education | | | | | | | | | |
| High | 63 (11%) | 58 (11%) | | 684 (20%) | 674 (19%) | | 73 (24%) | 82 (27%) | |
| Medium | 323 (59%) | 322 (59%) | | 2,100 (60%) | 2,071 (59%) | | 113 (37%) | 104 (34%) | |
| Low | 165 (30%) | 166 (30%) | 0.91 | 698 (20%) | 737 (21%) | 0.51 | 117 (39%) | 122 (40%) | 0.62 |

Note: Bold type indicates statistical significance (P < 0.05).

ᵃ Proportion of cases: breast cancer (45%), lung cancer (30%), and colorectal cancer (25%).

ᵇ Proportion of cases: breast cancer (11%), lung cancer (9%), colorectal cancer (23%), gastric cancer (5%), kidney cancer (4%), prostate cancer (24%), UCC (12%), and MBCNs (12%).

ᶜ Proportion of cases: breast cancer (59%) and lung cancer (41%).

ᵈ P values for the χ² test.

ᵉ Average case-matched control age differences in absolute values (±SD): 0.25 years (±0.26) in EPIC Italy, 1.00 (±1.01) in MCCS, and 0.13 (±0.33) in NOWAC.

This study was conducted following the principles of the Declaration of Helsinki and its subsequent revisions. EPIC was reviewed and approved by the HuGeF (currently Italian Institute for Genomic Medicine, Candiolo, Torino, Italy) Ethics Committee. The MCCS protocol was approved by the Cancer Council Victoria's Human Research Ethics Committee. NOWAC was approved by the Regional Committee for Medical and Health Research Ethics in North Norway. All study participants signed an informed consent.

DNA methylation analyses

Whole-genome DNA methylation was quantified using the Illumina Infinium HumanMethylation450 BeadChip. Detailed methods and data preprocessing procedures can be found in the Supplementary Materials and Methods. To account for the possible bias introduced by the interindividual variability in the proportion of white blood cells (WBC) in peripheral blood, we estimated the percentage of WBC fractions according to the Houseman algorithm (32), which performs inference using a quadratic programming technique known as linear constrained projection, where nonnegativity and normalization constraints on cellular proportions are imposed during inference (33). We excluded from the analysis bimodal and trimodal CpGs using the function findpeaks in the R package pracma, thus focusing on rare, stochastic events.

Statistical analyses

Identification of SEMs

We computed the total number of SEMs as the sum of extreme DNA methylation values (outliers) per individual, on the basis of a modified version of the procedure described by Gentilini and colleagues (34), which takes into account differential WBC proportions among individuals, as described in Supplementary Materials and Methods. Because the number of SEMs increased exponentially with age, we used a logarithmic transformation of the total number of SEMs [hereafter referred to as log(SEM)] for all association analyses.
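As a rough illustration of the counting step, the sketch below flags a value as an SEM when it lies more than k interquartile ranges outside the quartiles of that CpG's distribution across individuals, and then sums the flags per person. The threshold k = 3, the function name `count_sems`, and the toy matrix are assumptions made for illustration only. The WBC-adjusted procedure actually used is described in the Supplementary Materials and Methods and is not reproduced here.

```python
import statistics


def count_sems(beta, k=3.0):
    """Count extreme outliers per individual.

    beta[i][j] is the methylation value of individual i at CpG j.
    A value is flagged when it lies more than k * IQR below the first
    quartile or above the third quartile of CpG j across individuals.
    k = 3 is an assumed threshold, not taken from the text above.
    """
    n_individuals = len(beta)
    n_cpgs = len(beta[0])
    counts = [0] * n_individuals
    for j in range(n_cpgs):
        column = [beta[i][j] for i in range(n_individuals)]
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        for i, value in enumerate(column):
            if value < lower or value > upper:
                counts[i] += 1
    return counts


# Toy data: 5 individuals x 2 CpGs (hypothetical beta values).
toy_beta = [
    [0.10, 0.50],
    [0.11, 0.52],
    [0.12, 0.51],
    [0.13, 0.49],
    [0.90, 0.50],  # extreme value at the first CpG
]
sem_counts = count_sems(toy_beta)  # [0, 0, 0, 0, 1]
```

For individuals with at least one SEM, the quantity entering the association models would then be the natural logarithm of the count.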

Computation of epigenetic clock measures

We computed two measures of the epigenetic AA based on Horvath DNAmAge (7) and DNAmGrimAge (19). See Supplementary Materials and Methods for more details about computation of epigenetic clocks and AA measures.
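The definition of AA given in the Introduction (predicted DNA methylation age minus chronologic age) can be written in a few lines. The sketch below also includes a residual-based variant, in which AA is the residual from a least-squares regression of DNAm age on chronologic age. The residual variant is a commonly used alternative and is an assumption here, as are the function names; the computation used in this study is described in the Supplementary Materials and Methods.

```python
def age_acceleration_difference(dnam_age, chron_age):
    """AA as predicted DNAm age minus chronologic age."""
    return [d - a for d, a in zip(dnam_age, chron_age)]


def age_acceleration_residual(dnam_age, chron_age):
    """AA as the residual of DNAm age regressed on chronologic age.

    Assumed alternative definition; uses ordinary least squares
    with a single predictor.
    """
    n = len(chron_age)
    mean_x = sum(chron_age) / n
    mean_y = sum(dnam_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chron_age)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chron_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(chron_age, dnam_age)]


# Hypothetical ages in years.
aa_diff = age_acceleration_difference([52, 61, 69, 85], [50, 60, 70, 80])
# aa_diff is [2, 1, -1, 5]
```

By construction, the residual-based measure is uncorrelated with chronologic age in the sample used to fit the regression, whereas the simple difference need not be.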

Association of SEMs with cancer risk

We investigated the association between SEMs and the risk of eight types of cancer separately using log(SEM) as the predictor and case–control status as the outcome. ORs and confidence intervals (CI) were calculated using logistic regression models for a one-unit increase in log(SEM). For each cancer and cohort, we ran four regression models: model 1 included age, sex, and study-specific technical variables; model 2 included additional adjustment for smoking, body mass index (BMI), physical activity, alcohol intake, dietary quality, and education (as a proxy for socioeconomic status); model 3 included additional adjustment for Horvath epigenetic AA; and finally, model 4 included additional adjustment for DNAmGrimAge acceleration, which was found to be more strongly associated with cancer risk than Horvath epigenetic AA in the MCCS (35). All covariates were treated as categorical variables with three categories to harmonize sources of information across the three studies (see Supplementary Materials and Methods for more details on harmonization of covariates).
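To make the link between a logistic regression coefficient and the reported OR explicit, the sketch below converts a coefficient for log(SEM) and its standard error into an OR with a Wald 95% CI, and also does the reverse. The function names are hypothetical, and the Wald interval is an assumption about how the intervals were formed.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """OR and Wald CI for a one-unit increase in the predictor."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


def log_or_and_se(odds_ratio, lower, upper, z=Z_95):
    """Recover the coefficient and its SE from a reported OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported breast cancer estimate: OR 1.25 (95% CI, 1.11-1.41).
beta_breast, se_breast = log_or_and_se(1.25, 1.11, 1.41)
```

For the breast cancer estimate, this gives a coefficient of about 0.22 with a standard error of about 0.06 on the log-odds scale, which is the form needed as input for a meta-analysis.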

For associations with breast, lung, and colorectal cancer, which were investigated in more than one study, the overall OR estimates for the association between log(SEM) and cancer risk were calculated by random-effects meta-analysis fitted by restricted maximum likelihood (REML; ref. 36) using the R package metafor (37). Heterogeneity in the associations among studies was evaluated using the I² statistic. Further sensitivity analyses were performed by stratifying case–control pairs based on the TTD; ORs and CIs were calculated for TTD >10 years, TTD between 5 and 10 years, and TTD ≤5 years. The Cochran–Armitage test for trend was used to assess whether ORs followed a trend by TTD.
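The pooling step can be sketched as follows. Note that this sketch swaps in the DerSimonian–Laird moment estimator of the between-study variance because it has a closed form; the analysis itself used REML as implemented in metafor, which is iterative. The function name and the input values in the tests are hypothetical.

```python
import math


def random_effects_meta(log_ors, ses):
    """Random-effects pooling of study-level log ORs.

    Uses the DerSimonian-Laird estimator of tau^2 (an assumption made
    for simplicity; not the REML estimator used in the study).
    Returns the pooled log OR, its SE, tau^2, and I^2 in percent.
    """
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / sum_w
    # Cochran's Q measures dispersion of study estimates around the mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_rw = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, log_ors)) / sum_rw
    return {
        "pooled_log_or": pooled,
        "se": math.sqrt(1.0 / sum_rw),
        "tau2": tau2,
        "i2_percent": i2,
    }
```

When Q does not exceed its degrees of freedom, as an I² of 0 implies, the moment estimate of the between-study variance is zero and the pooled estimate reduces to the inverse-variance weighted mean.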

Further details about enrichment analyses, SEMs' stability over time, and comparison of SEMs in tumor versus normal adjacent tissues [data from The Cancer Genome Atlas (TCGA) project, https://portal.gdc.cancer.gov] are provided in the Supplementary Materials and Methods.

Data availability

The data generated and/or analyzed in this study can be accessed upon reasonable request to the originating cohorts. Access will be conditional on adherence to local ethical and security policy. The R code used for the analyses presented in the article is available upon request. EPIC DNAm partial data can be accessed through Gene Expression Omnibus accession number GSE51057.

Ethics reporting

All participants provided written informed consent, and all contributing cohorts confirmed compliance with their local research ethics committees or institutional review boards.

Association of cancer risk factors with SEMs

In the three cohorts, we observed an exponential increase in the number of SEMs with age, both in the complete study sample (Fig. 1, top; Pearson R = 0.17, P = 5 × 10⁻⁹; R = 0.04, P = 6 × 10⁻⁵; and R = 0.23, P = 2 × 10⁻⁹ in EPIC, MCCS, and NOWAC, respectively) and in controls only (Fig. 1, bottom; Pearson R = 0.15, P = 2 × 10⁻⁵; R = 0.04, P = 0.01; and R = 0.23, P = 1 × 10⁻⁸ in EPIC, MCCS, and NOWAC, respectively). Table 2 reports the cross-sectional associations of cancer risk factors with log(SEM) in the complete study sample and in controls only (more details in Supplementary Materials and Methods).
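An exponential increase with age means that log(SEM), rather than the raw count, is approximately linear in age, which is why the Pearson correlation is more naturally computed on the log scale. The sketch below illustrates this with synthetic counts; the growth rate, baseline, and function name are hypothetical and are not estimates from the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic example: counts growing exponentially with age.
ages = [40, 50, 60, 70]
sem_counts = [100 * math.exp(0.03 * age) for age in ages]
log_sem = [math.log(count) for count in sem_counts]

r_raw = pearson_r(ages, sem_counts)  # below 1: the relation is curved
r_log = pearson_r(ages, log_sem)     # essentially 1: linear on log scale
```

In real data the correlation is far weaker than in this noise-free example, as the reported R values show, but the same reasoning motivates the use of log(SEM) in all association analyses.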

Figure 1.

Exponential increase of the total number of SEMs with age. A–C, Mean and 95% CIs of the total number of SEMs (on a logarithmic scale) by age group in the three study cohorts, in cases and controls combined (top) and in controls only (bottom). R and P values refer to Pearson correlation test.

Table 2.

Distribution of log(SEM) by cohort and cancer risk factors, in the overall samples and in controls only.

| Distribution of log(SEM) | EPIC Italy Nᵃ | EPIC Italy, all samples: mean (SD) | EPIC Italy, all samples: P (R²)ᵇ | EPIC Italy, controls only: mean (SD) | EPIC Italy, controls only: P (R²)ᵇ | MCCS Nᵃ | MCCS, all samples: mean (SD) | MCCS, all samples: P (R²)ᵇ | MCCS, controls only: mean (SD) | MCCS, controls only: P (R²)ᵇ | NOWAC Nᵃ | NOWAC, all samples: mean (SD) | NOWAC, all samples: P (R²)ᵇ | NOWAC, controls only: mean (SD) | NOWAC, controls only: P (R²)ᵇ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age | | | | | | | | | | | | | | | |
| <50 | 328 (29%) | 5.79 (0.98) | | 5.79 (0.98) | | 1,029 (15%) | 6.55 (0.98) | | 6.54 (0.98) | | 61 (10%) | 5.04 (0.65) | | 5.16 (0.67) | |
| 50–55 | 257 (23%) | 6.00 (0.98) | | 6.01 (0.91) | | 925 (13%) | 6.59 (1.03) | | 6.58 (1.02) | | 180 (28%) | 5.32 (0.81) | | 5.44 (0.83) | |
| 55–60 | 335 (30%) | 6.11 (1.02) | | 6.04 (0.93) | | 1,289 (19%) | 6.66 (1.09) | | 6.59 (1.06) | | 244 (39%) | 5.46 (0.81) | | 5.46 (0.92) | |
| >60 | 192 (17%) | 6.28 (1.06) | 4 × 10⁻⁷ (2.86%) | 6.10 (0.84) | 0.02 (1.70%) | 3,721 (53%) | 6.69 (1.07) | 9 × 10⁻⁵ (0.26%) | 6.65 (1.05) | 0.05 (0.24%) | 147 (23%) | 5.74 (0.94) | 1 × 10⁻⁷ (5.42%) | 5.7 (0.9) | 0.03 (2.75%) |
| Sex | | | | | | | | | | | | | | | |
| Males | 342 (31%) | 6.08 (0.99) | | 6.04 (0.89) | | 4,252 (61%) | 6.60 (1.02) | | 6.58 (1.00) | | 0 (0%) | 0 (0) | | 0 (0) | |
| Females | 770 (69%) | 5.99 (1.03) | 0.16 (0.17%) | 5.94 (0.95) | 0.24 (0.24%) | 2,712 (39%) | 6.73 (1.11) | 1 × 10⁻⁷ (0.35%) | 6.67 (1.09) | 0.008 (0.20%) | 632 (100%) | 5.44 (0.87) | n/a | 5.48 (0.88) | |
| Smoking | | | | | | | | | | | | | | | |
| Never | 502 (45%) | 5.98 (1.00) | | 5.99 (0.96) | | 3,284 (47%) | 6.63 (1.07) | | 6.56 (1.04) | | 203 (32%) | 5.43 (0.86) | | 5.4 (0.84) | |
| Former | 296 (27%) | 5.97 (0.97) | | 5.93 (0.88) | | 2,714 (39%) | 6.68 (1.06) | | 6.67 (1.05) | | 185 (29%) | 5.44 (0.87) | | 5.49 (0.9) | |
| Current | 299 (27%) | 6.16 (1.11) | 0.03 (0.66%) | 5.97 (0.96) | 0.84 (0.06%) | 966 (14%) | 6.64 (1.01) | 0.37 (0.04%) | 6.63 (1.01) | 0.02 (0.17%) | 242 (38%) | 5.46 (0.9) | 0.94 (0.02%) | 5.53 (0.89) | 0.55 (0.38%) |
| BMIᶜ | | | | | | | | | | | | | | | |
| Normal weight | 491 (44%) | 5.95 (1.01) | | 5.88 (0.92) | | 5,230 (75%) | 6.64 (1.05) | | 6.60 (1.04) | | 345 (55%) | 5.45 (0.88) | | 5.46 (0.86) | |
| Overweight | 454 (41%) | 6.04 (0.99) | | 5.99 (0.89) | | 1,321 (19%) | 6.66 (1.10) | | 6.58 (1.03) | | 216 (34%) | 5.41 (0.87) | | 5.52 (0.93) | |
| Obese | 163 (15%) | 6.17 (1.12) | 0.04 (0.59%) | 6.18 (1.05) | 0.04 (1.14%) | 413 (6%) | 6.79 (1.05) | 0.01 (0.15%) | 6.68 (1.05) | 0.03 (0.19%) | 67 (11%) | 5.52 (0.87) | 0.65 (0.14%) | 5.46 (0.82) | 0.88 (0.08%) |
| Physical activity | | | | | | | | | | | | | | | |
| High | 392 (35%) | 5.99 (1.03) | | 5.97 (0.99) | | 1,433 (21%) | 6.70 (1.05) | | 6.67 (1.01) | | 94 (16%) | 5.44 (0.92) | | 5.27 (0.89) | |
| Medium | 369 (33%) | 6.00 (1.02) | | 5.93 (0.91) | | 4,031 (58%) | 6.66 (1.06) | | 6.63 (1.04) | | 360 (61%) | 5.44 (0.87) | | 5.54 (0.91) | |
| Low | 336 (30%) | 6.10 (1.03) | 0.31 (0.22%) | 6.02 (0.89) | 0.68 (0.14%) | 1,500 (22%) | 6.58 (1.05) | 0.01 (0.12%) | 6.52 (1.05) | 0.008 (0.17%) | 138 (23%) | 5.43 (0.87) | 0.98 (0.01%) | 5.42 (0.78) | 0.16 (1.24%) |
| Alcohol | | | | | | | | | | | | | | | |
| No | 142 (13%) | 6.03 (1.02) | | 5.96 (0.95) | | 1,970 (28%) | 6.66 (1.09) | | 6.62 (1.06) | | 102 (16%) | 5.43 (0.75) | | 5.39 (0.81) | |
| Occasional | 665 (60%) | 6.00 (1.01) | | 5.94 (0.91) | | 3,970 (57%) | 6.65 (1.05) | | 6.61 (1.02) | | 497 (79%) | 5.44 (0.88) | | 5.48 (0.88) | |
| Habitual | 289 (26%) | 6.08 (1.05) | 0.49 (0.31%) | 6.04 (0.98) | 0.55 (0.20%) | 1,024 (15%) | 6.63 (1.04) | 0.74 (0.00%) | 6.60 (1.04) | 0.67 (0.01%) | 27 (4%) | 5.64 (1.1) | 0.49 (0.23%) | 5.82 (1.02) | 0.24 (0.92%) |
| Diet | | | | | | | | | | | | | | | |
| High | 95 (9%) | 5.90 (0.84) | | 6.00 (0.95) | | 1,054 (15%) | 6.71 (1.06) | | 6.56 (0.96) | | 206 (33%) | 5.48 (0.88) | | 5.51 (0.89) | |
| Medium | 554 (50%) | 6.06 (1.02) | | 6.01 (0.96) | | 5,317 (76%) | 6.64 (1.06) | | 6.60 (1.04) | | 210 (34%) | 5.42 (0.82) | | 5.41 (0.82) | |
| Low | 447 (40%) | 6.01 (1.06) | 0.32 (0.21%) | 5.81 (0.83) | 0.16 (0.67%) | 593 (9%) | 6.65 (1.04) | 0.22 (0.04%) | 6.71 (1.04) | 0.02 (0.13%) | 202 (33%) | 5.44 (0.92) | 0.74 (0.1%) | 5.54 (0.94) | 0.51 (0.44%) |
| Education | | | | | | | | | | | | | | | |
| High | 121 (11%) | 5.94 (0.86) | | 5.94 (0.91) | | 1,358 (19%) | 6.65 (1.01) | | 6.54 (1.04) | | 239 (39%) | 5.33 (0.86) | | 5.36 (0.85) | |
| Medium | 645 (58%) | 5.96 (1.02) | | 5.95 (0.95) | | 4,171 (60%) | 6.68 (1.06) | | 6.65 (1.05) | | 217 (36%) | 5.45 (0.87) | | 5.45 (0.88) | |
| Low | 331 (30%) | 6.18 (1.07) | 0.003 (1.01%) | 6.02 (0.93) | 0.71 (0.12%) | 1,435 (21%) | 6.59 (1.07) | 0.01 (0.04%) | 6.57 (0.99) | 0.62 (0.007%) | 155 (25%) | 5.56 (0.86) | 0.04 (1.1%) | 5.71 (0.89) | 0.02 (2.68%) |

ᵃ Missing data are not reported in the table.

ᵇ P values were derived from an ANOVA model using the logarithm of the total number of SEMs as the outcome and each lifestyle variable as the independent predictor; R² indicates the proportion of variance explained by each predictor considered in this study.

ᶜ Normal weight, BMI <25; overweight, BMI 25 to <30; and obese, BMI ≥30.

Association of SEMs with the risk of cancers

The estimated ORs and CIs derived from the four logistic regression models described in Materials and Methods are reported in Table 3. The ORs from model 1 did not deviate significantly from those estimated in model 2 (additional adjustment for cancer risk factors), and models 3 and 4 (additional adjustment for epigenetic AA).

Table 3.

Results of the logistic regression models with log(SEM) as the predictor and case–control status as the outcome.

| Cancer | Model 1ᵃ: OR (95% CI)ᵉ | Model 1ᵃ: P (FDR) | Model 2ᵇ: OR (95% CI)ᵉ | Model 2ᵇ: P (FDR) | Model 3ᶜ: OR (95% CI)ᵉ | Model 3ᶜ: P (FDR) | Model 4ᵈ: OR (95% CI)ᵉ | Model 4ᵈ: P (FDR) | Studies and N | Median TTD in years (IQR) |
|---|---|---|---|---|---|---|---|---|---|---|
| Breast cancer | 1.25 (1.10–1.40) | 0.0002 (0.001) | 1.25 (1.11–1.41) | 0.0003 (0.002) | 1.24 (1.10–1.40) | 0.0004 (0.002) | 1.24 (1.10–1.40) | 0.0004 (0.002) | EPIC Italy, MCCS, and NOWAC studies meta-analysis; I² = 0; N = 250, 409, and 186 pairs, respectively | EPIC Italy, MCCS, and NOWAC studies: 7.0 yrs (7.1), 7.7 yrs (6.1), and 2.1 yrs (2.1), respectively |
| Colorectal cancer | 1.01 (0.91–1.13) | 0.67 (1.00) | 1.02 (0.91–1.14) | 0.68 (1.00) | 1.01 (0.89–1.15) | 0.85 (1.00) | 1.00 (0.89–1.25) | 0.86 (1.00) | EPIC Italy and MCCS studies meta-analysis; I² = 0; N = 139 and 835 pairs, respectively | EPIC Italy and MCCS studies: 6.3 yrs (5.0) and 9.3 yrs (8.0), respectively |
| Gastric cancer | 0.93 (0.72–1.20) | 0.57 (1.00) | 0.88 (0.65–1.16) | 0.35 (1.00) | 0.91 (0.68–1.22) | 0.54 (1.00) | 0.91 (0.68–1.22) | 0.54 (1.00) | MCCS; N = 170 pairs | 11.4 yrs (10.3) |
| Kidney cancer | 1.19 (0.90–1.57) | 0.23 (1.00) | 1.20 (0.87–1.66) | 0.25 (1.00) | 1.21 (0.87–1.67) | 0.25 (1.00) | 1.24 (0.89–1.72) | 0.21 (1.00) | MCCS; N = 143 pairs | 11.2 yrs (8.5) |
| Lung cancer | 1.25 (1.10–1.44) | 0.003 (0.02) | 1.23 (1.07–1.42) | 0.004 (0.02) | 1.22 (1.06–1.41) | 0.004 (0.02) | 1.18 (1.02–1.21) | 0.006 (0.03) | EPIC Italy, MCCS, and NOWAC studies meta-analysis; I² = 0; N = 167, 332, and 130 pairs, respectively | EPIC Italy, MCCS, and NOWAC studies: 7.4 yrs (5.7), 10.1 yrs (7.5), and 4.1 yrs (3.2), respectively |
| MBCN | 1.42 (1.22–1.67) | 5 × 10⁻⁶ (4 × 10⁻⁵) | 1.43 (1.22–1.67) | 5 × 10⁻⁶ (4 × 10⁻⁵) | 1.43 (1.21–1.68) | 2 × 10⁻⁵ (2 × 10⁻⁴) | 1.43 (1.22–1.69) | 1 × 10⁻⁵ (8 × 10⁻⁵) | MCCS; N = 439 | 10.5 yrs (8.1) |
| Prostate cancer | 1.01 (0.89–1.15) | 0.88 (1.00) | 1.01 (0.89–1.16) | 0.84 (1.00) | 1.01 (0.89–1.15) | 0.83 (1.00) | 1.02 (0.90–1.16) | 0.76 (1.00) | MCCS; N = 869 | 10.5 yrs (7.9) |
| UCC | 0.96 (0.83–1.12) | 0.61 (1.00) | 0.92 (0.79–1.08) | 0.33 (1.00) | 0.91 (0.78–1.07) | 0.27 (1.00) | 0.87 (0.74–1.03) | 0.1 (1.00) | MCCS; N = 428 | 6.3 yrs (6.8) |

Note: Bold type indicates statistical significance (P < 0.05).

Abbreviations: IQR, interquartile range; yrs, years.

ᵃ Model 1 is adjusted for matching variables and study-specific variables.

ᵇ Model 2 includes additional adjustment for smoking, BMI, dietary quality, alcohol intake, physical activity, and education.

ᶜ Model 3 includes additional adjustment for Horvath DNAmAge epigenetic AA.

ᵈ Model 4 includes additional adjustment for Horvath DNAmAge and DNAmGrimAge epigenetic AA.

ᵉ ORs per one-unit increase in log(SEM).

A higher number of SEMs was associated with an increased risk of breast cancer (three-study meta-analysis: OR = 1.25; 95% CI, 1.11–1.41; P = 0.0003; I² = 0%; Fig. 2A) and lung cancer (three-study meta-analysis: OR = 1.23; 95% CI, 1.07–1.42; P = 0.004; I² = 0%; Fig. 2B). In MCCS, log(SEM) was associated with MBCN (OR = 1.43; 95% CI, 1.22–1.67; P = 5 × 10⁻⁶; Table 3). ORs greater than one were also observed for colorectal, kidney, and prostate cancers, although the associations were not statistically significant (Table 3). In the analysis stratified by TTD, ORs increased significantly as TTD decreased for breast, lung, and colorectal cancers (all Ptrend < 0.001) and for MBCN and prostate cancer (both Ptrend < 0.05; Fig. 3).

Figure 2.

Total number of SEMs and risk of breast and lung cancer. Forest plots representing the REML meta-analysis across three studies for breast (A) and lung cancer (B), and the meta-analysis of EPIC Italy and MCCS for colorectal cancer (C).

Figure 3.

ORs significantly increase as TTD decreases in breast, lung, colorectal, and prostate cancers and MBCN. Forest plots indicating ORs stratified by the TTD and type of cancer. P values refer to the Cochran–Armitage test for trend.


Association of number of SEMs with epigenetic clocks

As shown in Supplementary Figs. S1 and S2 (top), the number of SEMs was positively correlated with Horvath DNAmAge (R = 0.25, P < 0.0001; R = 0.03, P = 0.001; and R = 0.20, P = 0.04 in EPIC, MCCS, and NOWAC, respectively) and with DNAmGrimAge (R = 0.25, P = 0.0005; R = 0.07, P < 0.0001; and R = 0.24, P = 0.04 in EPIC, MCCS, and NOWAC, respectively) in all three studies. Consistent results were obtained from the same analyses on controls only (Supplementary Figs. S1 and S2, bottom).

Enrichment analyses

Epimutations were enriched in genomic regions characterized by open chromatin states, CpG islands, and shores (P = 0.02, P = 0.05, and P = 0.0003, respectively; Supplementary Table S1); in “inactive/poised promoters” (P < 0.0001), “heterochromatin/low signal/copy-number variation (CNV)” regions (P < 0.0001), and “polycomb-repressed” regions (P = 0.02; Supplementary Table S2); and in transcription factor binding sites (TFBS) targeted by two members of the polycomb repressive complex 2 (PRC2), EZH2 and SUZ12 (P < 0.0001 and P = 0.0005, respectively; Supplementary Table S3), and by the transcriptional corepressor CtBP2 (P = 0.001; Supplementary Table S3).

SEMs' stability over time

We investigated the stability of SEMs over time by analyzing longitudinal DNAm data from 33 healthy controls in EPIC Italy who also participated as healthy volunteers in the PEM-Turin study within the EXPOsOMICS study (38), and for whom DNAm was assessed again on average 18.7 years after recruitment (range, 16.4–20.3 years).

In a longitudinal regression model, log(SEM) increased significantly over time (increase per year = 0.168 ± 0.007; P < 0.0001; Supplementary Fig. S3). On average, 71% of SEMs were stable over time (range, 55%–93%); EZH2-specific SEMs were more stable than SEMs across the whole genome (average EZH2-specific stable SEMs = 87%; range, 62%–100%; χ2 test for proportion, P < 0.0001).
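The stability percentage for one individual can be illustrated with a small sketch. It assumes that a SEM counts as "stable" when the same CpG is called as a SEM at both time points, with the baseline SEMs as the denominator; the exact definition used in the study is given in the Supplementary Materials and Methods. The function name, the `restrict_to` argument, and the CpG identifiers are hypothetical.

```python
def stable_fraction(baseline_sems, followup_sems, restrict_to=None):
    """Fraction of baseline SEMs (CpG IDs) still called as SEMs at follow-up.

    restrict_to: optional collection of CpG IDs (e.g., CpGs in EZH2 target
    regions) to which both SEM sets are limited before comparison.
    """
    baseline = set(baseline_sems)
    followup = set(followup_sems)
    if restrict_to is not None:
        region = set(restrict_to)
        baseline &= region
        followup &= region
    if not baseline:
        return float("nan")  # undefined when there are no baseline SEMs
    return len(baseline & followup) / len(baseline)

# Toy example with made-up CpG identifiers
baseline = {"cg01", "cg02", "cg03", "cg04"}
followup = {"cg01", "cg02", "cg03", "cg09", "cg10"}
ezh2_targets = {"cg01", "cg02", "cg09"}

overall = stable_fraction(baseline, followup)
ezh2 = stable_fraction(baseline, followup, restrict_to=ezh2_targets)
print(overall, ezh2)
```

Averaging this per-individual fraction across participants would give a summary comparable in spirit to the percentages reported above.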

SEMs in tumor compared with normal adjacent tissues

We analyzed data from the TCGA project for lung, breast, and colorectal cancers; log(SEM) was significantly higher in tumors than in normal adjacent tissues (all P < 0.0001; Supplementary Fig. S4), for both whole-genome log(SEM) and EZH2-specific log(SEM). On average, 72% of the SEMs found in normal adjacent tissue were conserved in the tumor (range, 54%–98%). The proportion of conserved EZH2-specific SEMs was significantly higher, 87% (range, 61%–97%; χ2 test for proportion, P < 0.0001). Enrichment analyses confirmed that SEMs are more likely to occur in usually silenced genomic regions, such as inactive or poised promoters, polycomb-repressed regions, and regions targeted by EZH2 and SUZ12.

It is worth observing that the majority of the CpGs in genomic regions targeted by EZH2 were on average hypomethylated (more than 80% of the CpGs had an average DNAm beta-value lower than 20%; Supplementary Fig. S5). Consequently, more than 95% of EZH2-specific SEMs occurred as abnormal hypermethylation of a locus that is hypomethylated in the overall sample.
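To make the notion of a SEM and its direction concrete, the following sketch flags extreme outliers per CpG and labels them as hyper- or hypomethylated. It assumes the commonly used rule from the SEM literature, in which a value is an outlier if it lies more than 3 × IQR below the first quartile or above the third quartile of the across-sample distribution; the multiplier `k`, the function name, and the toy beta values are assumptions for illustration only.

```python
import statistics

def sem_calls(betas_by_cpg, k=3.0):
    """Call SEMs per sample from a dict: CpG ID -> list of beta values.

    All lists must have one value per sample, in the same sample order.
    Returns a list (one entry per sample) of (cpg_id, direction) tuples.
    """
    n_samples = len(next(iter(betas_by_cpg.values())))
    calls = [[] for _ in range(n_samples)]
    for cpg, betas in betas_by_cpg.items():
        q1, _, q3 = statistics.quantiles(betas, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        for i, beta in enumerate(betas):
            if beta > upper:
                calls[i].append((cpg, "hyper"))
            elif beta < lower:
                calls[i].append((cpg, "hypo"))
    return calls

# Toy data: cgA is hypomethylated in most samples, with one extreme value
toy = {
    "cgA": [0.05, 0.06, 0.04, 0.05, 0.07, 0.06, 0.05, 0.90],
    "cgB": [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47],
}
calls = sem_calls(toy)
sem_counts = [len(c) for c in calls]
print(sem_counts, calls[-1])
```

At a locus where nearly all samples have low beta values, as in `cgA`, an outlier can essentially only arise as a gain of methylation, which is consistent with the observation that most EZH2-specific SEMs are hypermethylation events.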

Further details about SEMs in EZH2 targets, SEMs' stability over time, and the analyses of data from TCGA are provided in the Supplementary Materials and Methods.

In this study, we have analyzed DNAm data from blood samples of approximately 4,500 case–control pairs nested within three large prospective cohorts: EPIC Italy, MCCS, and NOWAC. The main aim of this study was to investigate the association of the total number of SEMs with cancers using a prospective study design. In addition, we investigated SEMs' stability over time and the genomic regions in which SEMs appear more frequently.

SEMs increasing with aging and stability over time

The number of estimated SEMs per sample varied by cohort; however, we observed an exponential increase of SEMs with age in all cohorts (Fig. 1), confirming previous findings (27, 34). Differences in the number of SEMs between studies were mainly driven by different DNAm data normalization and preprocessing procedures, as well as by variable study sample size, which affects the distribution of DNAm values at each CpG. Consequently, the magnitude of the association of log(SEM) with age (Fig. 1) and epigenetic clocks (Supplementary Figs. S1 and S2) varied by cohort as well. We overcame these batch effect issues through our study design, assaying the two samples of each matched case–control pair in the same batch. The results observed in our cross-sectional study and reported in the literature about the exponential increase of SEMs with age were further confirmed using longitudinal data, available for a subset of the EPIC Italy cohort included in the EXPOsOMICS study, for whom DNAm was assessed at recruitment and on average 19 years later (methodologic details in Supplementary Materials and Methods). In this longitudinal analysis, we also observed high interindividual variability both in the total number and in the growth rate of SEMs over time (Supplementary Fig. S3), strengthening our hypothesis of SEMs as candidate biomarkers of accumulation of exposure-related DNA damage during aging. Accordingly, we observed a cross-sectional association of SEMs with lifestyle-related factors such as smoking, obesity, and dietary quality, as we observed previously with alcohol intake and socioeconomic status (24). Also, log(SEM) positively correlates with the widely studied biological aging measures based on the epigenetic clocks developed by Horvath and colleagues (Supplementary Figs. S1 and S2; ref. 7). The association between the two age-related biomarkers is not driven by their association with chronologic age, because AA is independent of chronologic age by definition (12).
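The statement that AA is independent of chronologic age by definition can be illustrated with a short sketch. It assumes that AA is computed as the residual from a least-squares regression of DNAm age on chronologic age, which is a common definition in the epigenetic clock literature; the function name and the toy values are hypothetical.

```python
import statistics

def age_acceleration(chron_age, dnam_age):
    """Residuals of the least-squares regression of DNAm age on chronologic age."""
    mean_x = statistics.fmean(chron_age)
    mean_y = statistics.fmean(dnam_age)
    sxx = sum((x - mean_x) ** 2 for x in chron_age)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chron_age, dnam_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(chron_age, dnam_age)]

# Toy data (made up): chronologic age and DNAm age in years
ages = [40, 45, 50, 55, 60, 65]
dnam = [42.0, 44.5, 53.0, 54.0, 63.5, 64.0]
aa = age_acceleration(ages, dnam)

# Sample covariance between AA and chronologic age is zero up to rounding
mean_age = statistics.fmean(ages)
cov = sum((x - mean_age) * r for x, r in zip(ages, aa)) / (len(ages) - 1)
print([round(r, 2) for r in aa], abs(cov) < 1e-9)
```

Because least-squares residuals are uncorrelated with the regressor by construction, any correlation between log(SEM) and AA defined this way cannot be explained by a shared linear dependence on chronologic age.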

We were not able to investigate whether changes in lifestyle may slow down aging-related increase in SEMs. A recent study analyzing longitudinal data on SEMs in twins concluded that a small percentage of the differences in SEMs' growth rate within individuals might be driven by underlying genetic background. These results suggest other exposures may play a significant role, worthy of further investigation (27). On the other hand, using longitudinal data, we showed that once epimutations are established, most of them remain stable over time. Previous findings indicate that methylation patterns are transmittable during cell divisions (39), suggesting SEMs may be inherited through mitosis.

SEMs' association with cancer risk

The main findings of this study were the associations of the number of SEMs with a higher risk of breast and lung cancers, and MBCN. The estimated ORs were not confounded by age because cases and controls were age-matched. Moreover, we further adjusted for age to control for potential residual confounding. The observed associations remained significant even after adjustment for smoking, BMI, physical activity, diet, alcohol consumption, and epigenetic clock measures. Although we observed an association of the total number of SEMs with cancer risk factors like smoking, obesity, and epigenetic clocks, the results obtained in model 1 (minimally adjusted), model 2 (adjusted for various cancer risk factors), and models 3 and 4 (additionally adjusted for epigenetic clock measures) did not differ significantly. These results suggest that the increased number of SEMs associated with an unhealthy lifestyle explains only a small part of the association of log(SEM) with cancer, meaning that other biological mechanisms, including inflammation, reduced DNA repair capacity (40), and other unmeasured environmental and lifestyle exposures (e.g., exposure to toxicants) could be the main drivers of these associations. In MCCS, DNAmGrimAge AA outperforms first-generation clocks in predicting several cancers, the strongest association being with risk of lung cancer, after adjustment for smoking intensity and other smoking-related variables (35). The association of log(SEM) with breast and lung cancer and with MBCN was minimally changed after adjustment for DNAmGrimAge AA, suggesting that SEMs and epigenetic clocks are independent DNAm-based biomarkers, likely involving distinct biomolecular alterations. Further studies are needed to clarify the underlying biological mechanisms linking SEMs and DNAmGrimAge to cancer risk.

Our results indicate that alterations of DNA methylation profiles could be detected in the blood years before cancer diagnosis and, together with previous studies (25, 27), suggest that an increasing number of SEMs in blood could be predictive of risk of future cancers. The differences between cases and matched controls increased as the time from blood collection to cancer diagnosis decreased (Fig. 3) in all but one type of cancer investigated. We found a significant trend of increasing OR as the TTD decreased in breast, lung, colorectal, and prostate cancers and MBCN, suggesting that the predictive utility of the log(SEM) biomarker is greater for short-term risk.

SEMs are more likely to occur in specific genomic regions

We found that epimutations are not entirely “stochastic”; instead, they are enriched in specific genomic regions and randomly distributed within them (34). This behavior could be defined as “local, but nonglobal, stochasticity.” Our findings confirmed that epimutations preferentially occur in DNA sequences associated with open chromatin, as previously observed by Ong and colleagues (41). Furthermore, SEMs were enriched in transcriptionally silenced genomic regions such as “inactive promoters,” “heterochromatin/low signal/CNV,” and “polycomb-repressed” regions. In addition, epimutations were more likely to appear in TFBSs targeted by two members of PRC2, EZH2 and SUZ12, and by the transcriptional corepressor CtBP2. Similar patterns of DNAm alterations were described in normal breast tissue adjacent to cancerous breast tissue, compared with breast tissue in cancer-free women (42). Comparing tumor with normal adjacent tissue using data from the TCGA project on breast, lung, and colorectal cancers, we also found similar results. Interestingly, EZH2-specific SEMs are more stable over time (and more often conserved in tumor from normal adjacent tissue) compared with epimutations appearing in the rest of the genome.

A possible mechanism of carcinogenesis

We observed that the vast majority of EZH2-specific epimutations occur as gain of methylation in CpGs unmethylated in the general population (Supplementary Fig. S5). Because both DNA methylation and the polycomb repression system are key players in cancer formation and progression, the SEM-driven reshaping of the epigenetic landscape observed with aging could play a role in the relocation of PRCs, a crucial mechanism in the establishment of a more cancer-prone environment in healthy people years before the diagnosis.

Transcriptional regulation by DNA methylation and by PRC2 proteins are related: they rarely act simultaneously on CpG islands (43), and removal of the epigenetic mark provokes a redistribution of the PRC2-distinctive H3K27me3 in mammalian cells. At a functional level, the link between aging, PRC2, and global DNA methylation dysregulation involves the loss of self-renewal capacity of adult stem cells (44). Multipotent stem cell senescence in vitro is characterized by downregulation of PRC2 genes, including EZH2 and SUZ12 (44). Therefore, a downregulation of EZH2 and SUZ12 may induce a further dysregulation of PRC2 protein complex targets, including several tumor suppressor genes (45), such as p53 (46).

The dynamics of the interaction between the polycomb protein complex and DNA methylation are complex and not yet entirely understood. The two repressive systems are mutually exclusive, and DNA methylation prevents polycomb from accessing the promoter in vitro (47). In this study, we observed that aging may increase the enrichment of methylated sites at TFBSs targeted by EZH2 and SUZ12, consequently altering the efficacy of epigenetic regulation by the polycomb complex. Therefore, we could hypothesize that during aging, a more stable epigenetic silencing by DNA methylation could replace the more dynamic polycomb repressive signal, contributing to the early mechanisms involved in age-related diseases, specifically some types of cancer. As proposed by others, the tumor suppressor genes regulated by the polycomb complex may switch from a dynamic to a fixed repressive state (48–50). In this context, tumor suppressor genes would not work properly, letting cells grow abnormally and become malignant. More studies are needed to verify these findings, which raise intriguing new hypotheses connecting aging and cancer. The fact that the SEM data were obtained from prospective studies reinforces the observations made in patients with cancer when the disease was already present (51).

Study limitations

Although most risk factors were measured carefully in the three cohort studies, the procedure used to minimize variability arising from the different sources of information may have introduced bias into the regression models we used.

We measured DNA methylation levels in blood and not in target tissues for each cancer type. Tissue biopsy is the gold standard approach for patients' diagnosis and prognostication. However, especially at early stages, tissue biopsy sampling can be difficult or even dangerous (52). Epigenetic signatures in whole-blood DNA might reflect the interaction of host genetic and environmental factors associated with cancer susceptibility, as previously shown by others (53–57) and further supported by our results on tumor tissues.

We found significant associations of SEMs with three of the eight cancers investigated, and the effect sizes were small overall. These results suggest that not every epimutation is potentially dangerous and call for further research. Combining DNA methylation and gene expression data from both blood and tissue of the same individuals will help to elucidate whether or not specific genes or genomic regions influence cancer risk when affected by SEMs, keeping in mind that each type of cancer is a distinct disease, with its unique genetic landscape. Moreover, our results from the analyses stratified by TTD suggest that the predictive utility of the log(SEM) biomarker is greater for short-term risk.

Future studies are also needed to identify cancer-specific epimutational signatures and to understand the biological mechanisms associated with accumulation of epimutations during the lifespan, possibly involving the individual genetic background and DNA repair capacity.

Conclusions

A higher number of SEMs was significantly associated with an increased risk of breast and lung cancers and with MBCN. Using longitudinal data, we confirmed previous observations about the exponential increase of SEMs with aging and showed that most SEMs are stable over time and conserved in tumor compared with normal adjacent tissue. Finally, we showed that SEMs are more likely to occur in specific genomic regions, suggesting a biomolecular mechanism involving PRC2 proteins, which may deserve further investigation. These observations might open new avenues for understanding the biomolecular mechanisms of carcinogenesis.

P.-A. Dugué reports grants from the National Health and Medical Research Council (1106016, 1011618, 1026892, 1027505, 1050198, 1087683, 1088405, 1043616, 209057, 396414, and 1074383) during the conduct of the study. A.M. Hodge reports grants from the National Health and Medical Research Council during the conduct of the study. D.R. English reports grants from the National Health and Medical Research Council during the conduct of the study. G.G. Giles reports grants from the National Health and Medical Research Council (paid to Cancer Council Victoria) during the conduct of the study. R.L. Milne reports grants from the National Health and Medical Research Council during the conduct of the study. No potential conflicts of interest were disclosed by the other authors.

A. Gagliardi: Conceptualization, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. P.-A. Dugué: Conceptualization, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. T.H. Nøst: Conceptualization, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. M.C. Southey: Data curation, funding acquisition, writing–review and editing. D.D. Buchanan: Writing–review and editing. D.F. Schmidt: Writing–review and editing. E. Makalic: Writing–review and editing. A.M. Hodge: Funding acquisition, writing–review and editing. D.R. English: Funding acquisition, writing–review and editing. N.W. Doo: Writing–review and editing. J.L. Hopper: Writing–review and editing. G. Severi: Funding acquisition, writing–review and editing. L. Baglietto: Funding acquisition, writing–review and editing. A. Naccarati: Funding acquisition, writing–review and editing. S. Tarallo: Writing–review and editing. L. Pace: Writing–review and editing. V. Krogh: Funding acquisition, writing–review and editing. D. Palli: Funding acquisition, writing–review and editing. S. Panico: Funding acquisition, writing–review and editing. C. Sacerdote: Funding acquisition, writing–review and editing. R. Tumino: Funding acquisition, writing–review and editing. E. Lund: Funding acquisition, writing–review and editing. G.G. Giles: Funding acquisition, writing–review and editing. B. Pardini: Writing–review and editing. T.M. Sandanger: Funding acquisition, writing–review and editing. R.L. Milne: Funding acquisition, writing–review and editing. P. Vineis: Funding acquisition, writing–review and editing. S. Polidoro: Conceptualization, data curation, writing–review and editing. G. Fiorito: Conceptualization, data curation, supervision, writing–review and editing.

The authors are very thankful to Dr. Akram Ghantous (IARC, Lyon, France) for the methylation analyses of PEM-Turin study, produced within the Exposomics EC FP7 grant (grant agreement no. 308610, to P. Vineis). The results here are, in part, based upon data generated by TCGA research network: https://www.cancer.gov/tcga. The EPIC Italy component of this research was supported by European Commission grant no. 633666, awarded to P. Vineis, and by the AIRC grant (Progetto IG 2013 N.14410, to C. Sacerdote) for part of the DNA methylation experiments. The Melbourne Collaborative Cohort Study (MCCS) cohort recruitment was funded by VicHealth and Cancer Council Victoria. The MCCS component of the work was funded by the Australian National Health and Medical Research Council, including grants 1106016 (to G.G. Giles); 1011618 (to L. Baglietto); 1026892 (to M.C. Southey); 1027505 (to D.R. English); 1050198 and 1087683 (to A.M. Hodge); 1088405 (to R.L. Milne); and 1043616, 209057, 396414, and 1074383 (to G.G. Giles). Cases and their vital status were ascertained through the Victorian Cancer Registry and the Australian Institute of Health and Welfare, including the National Death Index and the Australian Cancer Database. The NOWAC component of the work was supported by the European Research Council Advanced Researcher grant, 2008: Transcriptomics in cancer research (grant no. ERC-2008-AdG, to E. Lund).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Ben-Shlomo Y, Kuh D. A life course approach to chronic disease epidemiology: conceptual models, empirical challenges and interdisciplinary perspectives. Int J Epidemiol 2002;31:285–93.
2. Berger NA, Savvides P, Koroukian SM, Kahana EF, Deimling GT, Rose JH, et al. Cancer in the elderly. Trans Am Clin Climatol Assoc 2006;117:147–55.
3. Lund E. An exposure driven functional model of carcinogenesis. Med Hypotheses 2011;77:195–8.
4. Lopez-Otin C, Blasco MA, Partridge L, Serrano M, Kroemer G. The hallmarks of aging. Cell 2013;153:1194–217.
5. Rozhok AI, DeGregori J. The evolution of lifespan and age-dependent cancer risk. Trends Cancer 2016;2:552–60.
6. Jones MJ, Goodman SJ, Kobor MS. DNA methylation and healthy human aging. Aging Cell 2015;14:924–32.
7. Horvath S. DNA methylation age of human tissues and cell types. Genome Biol 2013;14:R115.
8. Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell 2013;49:359–67.
9. Dugue PA, Bassett JK, Joo JE, Baglietto L, Jung CH, Wong EM, et al. Association of DNA methylation-based biological age with health risk factors and overall and cause-specific mortality. Am J Epidemiol 2018;187:529–38.
10. Dugue PA, Bassett JK, Joo JE, Jung CH, Ming Wong E, Moreno-Betancur M, et al. DNA methylation-based biological aging and cancer risk and survival: pooled analysis of seven prospective studies. Int J Cancer 2018;142:1611–9.
11. Marioni RE, Shah S, McRae AF, Chen BH, Colicino E, Harris SE, et al. DNA methylation age of blood predicts all-cause mortality in later life. Genome Biol 2015;16:25.
12. Chen BH, Marioni RE, Colicino E, Peters MJ, Ward-Caviness CK, Tsai PC, et al. DNA methylation-based measures of biological age: meta-analysis predicting time to death. Aging 2016;8:1844–65.
13. Horvath S, Gurven M, Levine ME, Trumble BC, Kaplan H, Allayee H, et al. An epigenetic clock analysis of race/ethnicity, sex, and coronary heart disease. Genome Biol 2016;17:171.
14. Marioni RE, Shah S, McRae AF, Ritchie SJ, Muniz-Terrera G, Harris SE, et al. The epigenetic clock is correlated with physical and cognitive fitness in the Lothian Birth Cohort 1936. Int J Epidemiol 2015;44:1388–96.
15. Horvath S, Erhart W, Brosch M, Ammerpohl O, von Schonfels W, Ahrens M, et al. Obesity accelerates epigenetic aging of human liver. Proc Natl Acad Sci U S A 2014;111:15538–43.
16. Quach A, Levine ME, Tanaka T, Lu AT, Chen BH, Ferrucci L, et al. Epigenetic clock analysis of diet, exercise, education, and lifestyle factors. Aging 2017;9:419–46.
17. Fiorito G, Polidoro S, Dugue PA, Kivimaki M, Ponzi E, Matullo G, et al. Social adversity and epigenetic aging: a multi-cohort study on socioeconomic differences in peripheral blood DNA methylation. Sci Rep 2017;7:16266.
18. Levine ME, Lu AT, Quach A, Chen BH, Assimes TL, Bandinelli S, et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging 2018;10:573–91.
19. Lu AT, Quach A, Wilson JG, Reiner AP, Aviv A, Raj K, et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging 2019;11:303–27.
20. Dugué P, Li S, Hopper JL, Milne RL. DNA methylation-based measures of biological aging. Epigenetics in human disease. Cambridge (MA): Academic Press; 2018. p. 39–64.
21. Teschendorff AE, West J, Beck S. Age-associated epigenetic drift: implications, and a case of epigenetic thrift? Hum Mol Genet 2013;22:R7–R15.
22. Yamashita S, Kishino T, Takahashi T, Shimazu T, Charvat H, Kakugawa Y, et al. Genetic and epigenetic alterations in normal tissues have differential impacts on cancer risk among tissues. Proc Natl Acad Sci U S A 2018;115:1328–33.
23. Haque MM, Nilsson EE, Holder LB, Skinner MK. Genomic clustering of differential DNA methylated regions (epimutations) associated with the epigenetic transgenerational inheritance of disease and phenotypic variation. BMC Genomics 2016;17:418.
24. Fiorito G, McCrory C, Robinson O, Carmeli C, Rosales CO, Zhang Y, et al. Socioeconomic position, lifestyle habits and biomarkers of epigenetic aging: a multi-cohort analysis. Aging 2019;11:2045–70.
25. Gentilini D, Scala S, Gaudenzi G, Garagnani P, Capri M, Cescon M, et al. Epigenome-wide association study in hepatocellular carcinoma: identification of stochastic epigenetic mutations through an innovative statistical approach. Oncotarget 2017;8:41890–902.
26. Oey H, Whitelaw E. On the meaning of the word ‘epimutation’. Trends Genet 2014;30:519–20.
27. Wang Y, Karlsson R, Jylhava J, Hedman AK, Almqvist C, Karlsson IK, et al. Comprehensive longitudinal study of epigenetic mutations in aging. Clin Epigenetics 2019;11:187.
28. Teschendorff AE, Jones A, Fiegl H, Sargent A, Zhuang JJ, Kitchener HC, et al. Epigenetic variability in cells of normal cytology is associated with the risk of future morphological transformation. Genome Med 2012;4:24.
29. Fasanelli F, Baglietto L, Ponzi E, Guida F, Campanella G, Johansson M, et al. Hypomethylation of smoking-related genes is associated with future lung cancer in four prospective cohorts. Nat Commun 2015;6:10192.
30. Baglietto L, Ponzi E, Haycock P, Hodge A, Bianca Assumma M, Jung CH, et al. DNA methylation changes measured in pre-diagnostic peripheral blood samples are associated with smoking and lung cancer risk. Int J Cancer 2017;140:50–61.
31. van Veldhoven K, Polidoro S, Baglietto L, Severi G, Sacerdote C, Panico S, et al. Epigenome-wide association study reveals decreased average methylation levels years before breast cancer diagnosis. Clin Epigenetics 2015;7:67.
32. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 2012;13:86.
33. Teschendorff AE, Breeze CE, Zheng SC, Beck S. A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies. BMC Bioinformatics 2017;18:105.
34. Gentilini D, Garagnani P, Pisoni S, Bacalini MG, Calzari L, Mari D, et al. Stochastic epigenetic mutations (DNA methylation) increase exponentially in human aging and correlate with X chromosome inactivation skewing in females. Aging 2015;7:568–78.
35. Dugue PA, Bassett JK, Wong EM, Joo JE, Li S, Yu C, et al. Biological aging measures based on blood DNA methylation and risk of cancer: a prospective study. MedRxiv 2020.04.08.20058727 [Preprint]. 2020. Available from: .
36. Breusch TS. Maximum likelihood estimation of random effects model. J Econometrics 1987;36:383–9.
37. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw 2010;36:1–48.
38. Vineis P, Chadeau-Hyam M, Gmuender H, Gulliver J, Herceg Z, Kleinjans J, et al. The exposome in practice: design of the EXPOsOMICS project. Int J Hyg Environ Health 2017;220:142–51.
39. Robertson KD. DNA methylation, methyltransferases, and cancer. Oncogene 2001;20:3139–55.
40. Slyskova J, Korenkova V, Collins AR, Prochazka P, Vodickova L, Svec J, et al. Functional, genetic, and epigenetic aspects of base and nucleotide excision repair in colorectal carcinomas. Clin Cancer Res 2012;18:5878–87.
41. Ong ML, Holbrook JD. Novel region discovery method for Infinium 450K DNA methylation data reveals changes associated with aging in muscle and neuronal pathways. Aging Cell 2014;13:142–55.
42. Teschendorff AE, Gao Y, Jones A, Ruebner M, Beckmann MW, Wachter DL, et al. DNA methylation outliers in normal breast tissue identify field defects that are enriched in cancer. Nat Commun 2016;7:10478.
43. Brinkman AB, Gu H, Bartels SJ, Zhang Y, Matarese F, Simmer F, et al. Sequential ChIP-bisulfite sequencing enables direct genome-scale investigation of chromatin and DNA methylation cross-talk. Genome Res 2012;22:1128–38.
44. Jung JW, Lee S, Seo MS, Park SB, Kurtz A, Kang SK, et al. Histone deacetylase controls adult stem cell aging by balancing the expression of polycomb genes and jumonji domain containing 3. Cell Mol Life Sci 2010;67:1165–76.
45. Zingg D, Debbache J, Schaefer SM, Tuncer E, Frommel SC, Cheng P, et al. The epigenetic modifier EZH2 controls melanoma growth and metastasis through silencing of distinct tumour suppressors. Nat Commun 2015;6:6051.
46. Shiogama S, Yoshiba S, Soga D, Motohashi H, Shintani S. Aberrant expression of EZH2 is associated with pathological findings and P53 alteration. Anticancer Res 2013;33:4309–17.
47. Sproul D, Meehan RR. Genomic insights into cancer-associated aberrant CpG island hypermethylation. Brief Funct Genomics 2013;12:174–90.
48. Ohm JE, McGarvey KM, Yu X, Cheng L, Schuebel KE, Cope L, et al. A stem cell-like chromatin pattern may predispose tumor suppressor genes to DNA hypermethylation and heritable silencing. Nat Genet 2007;39:237–42.
49. Baylin SB, Ohm JE. Epigenetic gene silencing in cancer - a mechanism for early oncogenic pathway addiction? Nat Rev Cancer 2006;6:107–16.
50. Widschwendter M, Fiegl H, Egle D, Mueller-Holzner E, Spizzo G, Marth C, et al. Epigenetic stem cell signature in cancer. Nat Genet 2007;39:157–8.
51. Tsai HC, Baylin SB. Cancer epigenetics: linking basic biology to clinical medicine. Cell Res 2011;21:502–17.
52. Constancio V, Nunes SP, Henrique R, Jeronimo C. DNA methylation-based testing in liquid biopsies as detection and prognostic biomarkers for the four major cancer types. Cells 2020;9:624.
53. Tahara T, Maegawa S, Chung W, Garriga J, Jelinek J, Estecio MR, et al. Examination of whole blood DNA methylation as a potential risk marker for gastric cancer. Cancer Prev Res 2013;6:1093–100.
54. Marsit CJ, Koestler DC, Christensen BC, Karagas MR, Houseman EA, Kelsey KT. DNA methylation array analysis identifies profiles of blood-derived DNA methylation associated with bladder cancer. J Clin Oncol 2011;29:1133–9.
55. Wong EM, Southey MC, Fox SB, Brown MA, Dowty JG, Jenkins MA, et al. Constitutional methylation of the BRCA1 promoter is specifically associated with BRCA1 mutation-associated pathology in early-onset breast cancer. Cancer Prev Res 2011;4:23–33.
56. Brennan K, Garcia-Closas M, Orr N, Fletcher O, Jones M, Ashworth A, et al. Intragenic ATM methylation in peripheral blood DNA as a biomarker of breast cancer risk. Cancer Res 2012;72:2304–13.
57. Hao X, Luo H, Krawczyk M, Wei W, Wang W, Wang J, et al. DNA methylation markers for diagnosis and prognosis of common cancers. Proc Natl Acad Sci U S A 2017;114:7414–9.