Background:

White blood cell (WBC) DNA may contain methylation patterns that are associated with subsequent breast cancer risk. Using a high-throughput array and samples collected, on average, 1.3 years prior to diagnosis, a case–cohort analysis nested in the prospective Sister Study identified 250 individual CpG sites that were differentially methylated between breast cancer cases and noncases. We examined five of the top 40 CpG sites in a case–control study nested in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) Cohort.

Methods:

We investigated the associations between prediagnostic WBC DNA methylation in 297 breast cancer cases and 297 frequency-matched controls. Two WBC DNA specimens from each participant were used: a proximate sample collected 1 to 2.9 years and a distant sample collected 4.2–7.3 years prior to diagnosis in cases or the comparable timepoints in controls. WBC DNA methylation level was measured using targeted bisulfite amplification sequencing. We used logistic regression to obtain ORs and 95% confidence intervals (CI).

Results:

A one-unit increase in percent methylation in ERCC1 in proximate WBC DNA was associated with increased breast cancer risk (adjusted OR = 1.29; 95% CI, 1.06–1.57). However, a one-unit increase in percent methylation in ERCC1 in distant WBC DNA was inversely associated with breast cancer risk (adjusted OR = 0.83; 95% CI, 0.69–0.98). None of the other ORs met the threshold for statistical significance.

Conclusions:

There was no convincing pattern between percent methylation in the five CpG sites and breast cancer risk.

Impact:

The link between prediagnostic WBC DNA methylation marks and breast cancer, if any, is poorly understood.

Cytosines followed by guanine on the same strand of the DNA backbone and connected by a phosphate are referred to as CpG sites. An important characteristic of CpG sites is that the cytosine can be methylated at C-5 (1). CpG site methylation is an epigenetic mechanism that plays an important role in gene expression and carcinogenesis, including breast cancer development. Two major patterns of aberrant DNA methylation are commonly found in both early and late-stage breast tumor tissue: (i) hypomethylation of the CpG sites of the entire genome; and (ii) hypermethylation in promoter regions in key regulatory genes (2, 3). Global hypomethylation may contribute to cancer by activating oncogenes and promoting genomic instability, whereas hypermethylation in promoter regions of tumor suppressor genes may silence gene expression of key proteins needed to control cell division (4).

Despite the well-recognized cell- and tissue-specificity of DNA methylation, there are some reports suggesting that white blood cell (WBC) DNA methylation is associated prospectively with breast cancer risk (5–7). One possible explanation for these findings is that WBC DNA methylation patterns are an early marker of developing breast cancer. For example, differences observed in methylation in WBC DNA between breast cancer cases and controls might arise as the result of a difference in the respective leukocyte profiles and represent an immune response to the growing tumor (8). If so, then methylation profiling of blood would hold promise for breast cancer detection, especially if differences were observed at a preclinical phase. Another possibility is that DNA methylation profiles in WBC reflect genetics, diet, lifestyle, or environment, and that associations between epigenetic patterns in WBC DNA and breast cancer are markers of increased risk.

Methylation array platforms provide single-nucleotide resolution of DNA methylation over a large portion of the genome. The Sister Study, with 298 incident breast cancer cases and a random sample of 612 noncases, was the first prospective study to investigate the link between genome-wide site-specific levels of WBC DNA methylation and subsequent breast cancer (9). Case subjects had an average time from WBC DNA collection to diagnosis of 465 days (range: 5–1389 days). Using an Illumina array that interrogated over 27,000 CpG sites, a total of 250 CpG sites were found to be differentially methylated between cases and controls, based on a false discovery rate threshold of Q < 0.05. A ROC analysis estimated a prediction accuracy of 65.8% for subsequent breast cancer, which exceeded both the Gail model and a genome-wide association study polymorphism model. The prediction accuracy of the top five statistically significant CpG sites (64.1%) was almost as strong as that of the complete 250 CpG site panel.

In this study, we further investigated the associations between prediagnostic WBC DNA methylation and breast cancer risk in a case–control study nested in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) prospective cohort. We specifically focused our analysis on a set of five CpG sites that were among the top forty reported as differentially methylated in the Sister Study (9). We used bisulfite amplicon sequencing (BSAS), a highly quantitative targeted method of next-generation sequencing of individual CpG sites (10). Another unique strength of our study design is that with serial prospective WBC DNA specimens available on each study subject, we were able to assess the extent to which alterations in the methylation of WBC DNA can be informative for early detection and to assess its potential for future risk prediction.

PLCO cohort

Study subjects were participants in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Screening Trial, a randomized controlled trial in which females and males, aged 55–74 at entry, were randomized into one of two arms: a control arm in which participants received their regular care and an intervention arm in which women were screened for lung, colorectal, and ovarian cancers and men were screened for prostate, lung, and colorectal cancers yearly for 6 years (11). Subjects were recruited between November 1993 and July 2001 in ten centers throughout the United States. All participants completed a written informed consent form as part of the PLCO Screening Trial. Institutional review boards at the NCI (Bethesda, MD) and each screening center approved the study. This analysis was also approved by the University of Massachusetts Institutional Review Board. Demographic and risk factor information was obtained through a baseline questionnaire completed by study participants at or very near to the time of randomization. In addition, shortly after the time of randomization, blood specimens were obtained from the participants, including 20,828 women randomized to the intervention arm (T0 visit). Blood specimens were also obtained at four additional times: at the start time of year 2 (T2), year 3 (T3), year 4 (T4), and year 5 (T5) of the trial. Thus, our study focuses on the intervention arm participants only. Breast and other cancers were primarily identified through annual study update questionnaires that ascertained type and date of cancer diagnosis in the previous year. Self-reported cancers were validated and histologically confirmed through medical records, death certificates, data from cancer registries, and information from next-of-kin for deceased participants.

Case–control study nested in PLCO trial cohort

Details of selection of cases and controls are provided in Supplementary Table S1. Briefly, we planned to identify 300 breast cancer cases who had serial prediagnostic bloods: a proximate sample collected 1.0–2.9 years prior to diagnosis (from a T3, T4, or T5 study visit) and a distant sample collected 4.2–7.3 years prior to diagnosis (always from the T0 study visit). We also planned to select a similar number of controls who had two serial bloods from women who did not develop breast cancer through December 31, 2013, frequency-matched on age at randomization (5-year age groups), study visit year of blood collections (T0, T3, T4, T5), and material type of the two WBC DNA samples of the cases (DNA newly extracted for this study from buffy coat, whole blood, or previously extracted DNA). After excluding one WBC DNA sample (distant/case) with an outlier methylation beta-value of 96.82 for C6orf79 from all analyses, we had 297 cases and 297 controls who had at least one WBC DNA specimen, and these were included in the risk factor analysis. Our proximate sample analysis was based on 292 breast cancer cases and 294 controls, and the distant sample analysis was based on 294 cases and 292 controls. A total of 97% of the cases and controls had two serial WBC DNA specimens and were included in both proximate and distant analyses. The majority of breast cancer cases were invasive (n = 231).

We did not include bloods collected within a year of breast cancer diagnosis because of the potential for reverse causality and the likelihood that such close proximity would mean the samples would not be useful for early detection. The average time from blood collection to diagnosis was 1.82 years (range: 1.0–2.9 years) for proximate specimens and 5.7 years (range: 4.2–7.3 years) for distant specimens.

Selection of CpG sites

We initially planned to examine methylation of the top five statistically significant CpG sites identified in the Sister Study (9) using targeted next-generation sequencing of PCR amplicons of bisulfite-modified DNA (10). We developed primers for the top five sites and used these primers in the amplification of mixtures (0%, 25%, 50%, 75%, and 100%) of methylated and unmethylated DNA (Zymo Research, D5013). In a test-sequencing run (sequencing methods described below), however, only one of these five CpG sites (ERCC1) showed a strong correlation between the expected and observed percent methylation; results for the other genes revealed significant bias, most likely due to bisulfite-induced sequence-specific DNA degradation that is further amplified by the PCR (12). We further attempted to develop primer sets for the next top 35 CpGs identified in the Sister Study (9). Because of the density of CpGs within the regions of interest, we could only develop primers for 33 of the top 40 CpGs in the Sister Study (including the top 5 CpG sites). Unmethylated and fully methylated DNA standards (Zymo Research, D5013) were mixed to create controls corresponding to 0%, 5%, 15%, 25%, 50%, 75%, and 100% methylation. We used these methylation controls to run two more sequencing test runs with ERCC1 and 28 additional PCR products and selected the five CpG sites with the highest R2 values for the observed versus expected percent methylation from our methylated DNA mixtures (Table 1). The amplicons identified for analysis were as follows: ERCC1, MGAT4C, C6orf79, PRKCDBP, and OPTN, ranked 1, 10, 14, 28, and 40, respectively, in the Sister Study. We note that the R2 for ERCC1 was very low in the second sequencing run (0.35), but at the time we attributed this to experimental error, and selected ERCC1 based on the high R2 in our first test run (0.99) and third test run (0.78) and because ERCC1 was the top CpG site in the Sister Study.
The primer sequences and annealing temperatures used to amplify the five CpG sites are provided in Supplementary Table S2.
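The R2 criterion used to select CpG sites can be illustrated with a minimal sketch. It assumes R2 is the squared Pearson correlation between the expected percent methylation of the standards and the percent methylation observed by sequencing; the exact fitting procedure is not stated here, and the observed values below are hypothetical, not study data.

```python
def r_squared(expected, observed):
    """Squared Pearson correlation between expected and observed values."""
    n = len(expected)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    if sxx == 0 or syy == 0:
        raise ValueError("both sequences must vary")
    return (sxy * sxy) / (sxx * syy)


# Expected percent methylation of the mixed standards (from the text).
expected = [0, 5, 15, 25, 50, 75, 100]
# Hypothetical observed values for one amplicon, for illustration only.
observed_hypothetical = [1.0, 4.0, 17.0, 22.0, 55.0, 70.0, 98.0]

r2 = r_squared(expected, observed_hypothetical)
print(f"R2 = {r2:.3f}")
```

An amplicon with strong PCR or bisulfite bias would show observed values that depart from the expected series and therefore a lower R2.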

Table 1.

Results of test runs for the five CpG sites selected for analysis based on performance in test sequencing run and their rank in the Sister Study.

| Gene | CpG site | Chromosome and position (GRCh38.p7) | Rank in the Sister Study | R2, 1st test run^a | R2, 2nd test run^b | R2, 3rd test run^b |
|---|---|---|---|---|---|---|
| ERCC1 | cg17378989 | 19:45423493 | 1 | 0.99 | 0.35 | 0.78 |
| MGAT4C | cg18344063 | 12:86838734 | 10 | – | 0.99 | 0.82 |
| C6orf79 (CCDC90A) | cg00214855 | 11:6320121 | 14 | – | 0.94 | 0.95 |
| PRKCDBP | cg18392783 | 10:13099832 | 28 | – | 0.92 | 0.77 |
| OPTN | cg27555776 | 6:13813715 | 40 | – | 0.96 | 0.92 |

R2 values are for observed versus expected % methylation.

aSequencing run included the top five differentially methylated CpG sites in previous study (9).

bSequencing run included ERCC1 and 28 additional CpG sites from the top 40 differentially methylated CpG sites in previous study (9).

DNA extraction, bisulfite modification

A total of 0.25 μg of DNA from each study sample and quality control sample (described below) was bisulfite-converted using EZ DNA Methylation-Lightning (Zymo Research, catalog number D5302) according to the manufacturer's protocol. Bisulfite-modified DNA was eluted in 30 μL buffer. All samples were randomly distributed across fifteen 96-well plates. Samples were blinded as to case–control status.

PLCO quality control specimens

A total of 119 quality control WBC DNA specimens from five women who were out of scope in the PLCO study were randomly inserted in duplicate or triplicate across plates to assess within- and across-plate variability. All control DNA samples were treated similarly to actual samples: sequencing methods are described below. Median coefficients of variation (CV) were as follows: ERCC1 (85.2%), MGAT4C (1.6%), PRKCDBP (32.5%), OPTN (44.7%), and C6orf79 (60.1%). The mean percent levels of methylation and SDs for ERCC1, PRKCDBP, OPTN, C6orf79, and MGAT4C were 0.8 (SD: 0.7), 1.7 (0.7), 1.5 (0.7), 1.1 (0.6), and 92.6 (1.6). The mean levels of all but MGAT4C are close to zero; as a result, the large CVs associated with these genes are not directly interpretable as these estimates are highly sensitive to small changes in mean levels. The CV of MGAT4C is well under 5%, indicating good assay reproducibility for this gene that has high mean levels of percent methylation (>90%). CVs did not vary substantially across batches; thus, CVs could not be improved by excluding certain batches.
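The sensitivity of the CV to a near-zero mean can be shown with a short sketch. It applies the standard definition, CV = SD / mean × 100, to the overall means and SDs quoted above for ERCC1 and MGAT4C. This is only an illustration: the median CVs reported in the text were presumably computed within replicate sets, so the numbers below are not expected to reproduce them.

```python
def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation as a percentage."""
    if mean <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return 100.0 * sd / mean


# Overall mean and SD of percent methylation quoted in the text.
cv_ercc1 = coefficient_of_variation(mean=0.8, sd=0.7)
cv_mgat4c = coefficient_of_variation(mean=92.6, sd=1.6)

print(f"ERCC1:  mean 0.8,  SD 0.7 -> CV {cv_ercc1:.1f}%")
print(f"MGAT4C: mean 92.6, SD 1.6 -> CV {cv_mgat4c:.1f}%")
```

The two genes have absolute SDs of similar order (0.7 and 1.6 percentage points), yet the CV differs by roughly fifty-fold because the denominator for ERCC1 is close to zero.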

In-house quality control specimens

Using a comparable approach to Masser and colleagues (10), we also included standards with known methylation levels (0%, 5%, 15%, 25%, 50%, 75%, and 100%) for each of the five genes on all 15 plates. The average R2 values for each gene across the 15 runs were 0.60, 0.85, 0.95, 0.79, and 0.98 for ERCC1, MGAT4C, C6orf79, PRKCDBP, and OPTN, respectively. A no-template control included in every PCR plate indicated that there was no contamination with a PCR product, as the sequencing result was always zero.

Next-generation sequencing

We used targeted next-generation sequencing of bisulfite-modified DNA for the present analysis of five CpG sites, following the BiSulfite Amplicon Sequencing methods recently described by Masser and colleagues (13).

One to 3 μL of bisulfite-modified DNA was used in a PCR reaction (MyTaqHS mix, Bioline) with each of the five primer sets (see Supplementary Table S2 for primer sequences) designed using Pyromark Assay Design Software (Qiagen) to amplify regions that include the targeted CpG. PCR products were purified to eliminate primers and enzymes and eluted in 30 μL buffer. All PCR products (1,380 × 5 = 6,900) were quantified using the Quant-iT PicoGreen dsDNA Kit (Invitrogen) and diluted to 2 ng/μL. Next, all five diluted PCR plates were pooled at a final concentration of 2 ng/μL for Nextera XT DNA Library Preparation (Illumina). Dual-indexed libraries, appropriate for multiplex sequencing, were generated using the Nextera XT DNA Sample Preparation Kit (Illumina) according to the manufacturer's protocol and methods detailed in Masser and colleagues (13). Double-stranded libraries were assessed using a High Sensitivity D1000 ScreenTape System (Agilent), quantified using Quant-iT PicoGreen on a Fusion Plate Reader (Packard Bioscience), and prepared (denaturation and dilution) for sequencing at the University of Massachusetts Genomic Core Facility using a NextSeq 550 Sequencer (Illumina).

Table 2.

Associations between demographic, lifestyle, reproductive, menstrual, and medical characteristics and breast cancer risk, adjusted for age (5-year age groups) and time of entry into study.a

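| Characteristic | Category | Cases (297), n | Cases (297), % | Controls (297), n | Controls (297), % | Odds ratio | 95% CI |
|---|---|---|---|---|---|---|---|
| Age (years) | <59 | 127 | 42.8 | 126 | 42.4 | | |
| | 60–64 | 80 | 26.9 | 81 | 27.3 | | |
| | 65–69 | 56 | 18.9 | 56 | 18.9 | | |
| | ≥70 | 34 | 11.5 | 34 | 11.5 | | |
| Time of entry | ≥Oct 1, 1997 | 137 | 46.1 | 140 | 47.1 | | |
| | ≤Sept 30, 1997 | 160 | 53.8 | 157 | 52.9 | | |
| Race | White, non-Hispanic | 273 | 90.2 | 267 | 91.9 | 1.00^b | |
| | Other | 24 | 9.8 | 30 | 8.1 | 0.78 | 0.44–1.37 |
| Body mass index (kg/m2) | 15–24.99 | 119 | 40.1 | 132 | 44.4 | 1.00^b | |
| | 25.0–29.99 | 100 | 33.7 | 101 | 34.0 | 1.10 | 0.76–1.60 |
| | ≥30 | 75 | 25.3 | 62 | 20.9 | 1.35 | 0.88–2.05 |
| Age at menarche (years) | <12 | 64 | 21.6 | 52 | 17.5 | 1.00^b | |
| | 12–13 | 168 | 56.6 | 165 | 55.6 | 0.83 | 0.54–1.27 |
| | ≥14 | 65 | 21.9 | 80 | 26.9 | 0.66 | 0.40–1.08 |
| Live births | None | 21 | 7.1 | 27 | 9.1 | 1.00^b | |
| | 1–2 births | 114 | 38.4 | 87 | 29.3 | 1.02 | 0.54–1.93 |
| | 3+ births | 156 | 52.5 | 189 | 63.6 | 0.63 | 0.34–1.17 |
| Age at first birth (years) | <20 | 43 | 14.5 | 46 | 15.5 | 1.00^b | |
| | 20–24 | 134 | 45.1 | 156 | 52.5 | 0.92 | 0.57–1.48 |
| | 25–29 | 65 | 21.9 | 50 | 16.8 | 1.40 | 0.80–2.45 |
| | ≥30 | 28 | 9.4 | 23 | 7.7 | 1.31 | 0.65–2.60 |
| | No live births | 21 | 7.1 | 27 | 9.1 | 1.38 | 0.68–2.80 |
| Type of menopause | Natural | 203 | 68.4 | 170 | 57.2 | 1.00^b | |
| | Bilateral oophorectomy | 76 | 25.6 | 115 | 38.7 | 0.54 | 0.38–0.78 |
| | Other | 14 | 4.7 | 7 | 2.4 | 1.68 | 0.66–4.28 |
| Age at natural menopause (years) | <50 | 128 | 43.1 | 154 | 51.9 | 1.00^b | |
| | 50–54 | 116 | 39.1 | 105 | 35.4 | 1.34 | 0.94–1.58 |
| | 55+ | 50 | 16.8 | 35 | 11.8 | 1.74 | 1.06–2.85 |
| History of benign breast disease | No | 176 | 59.3 | 211 | 71.0 | 1.00^b | |
| | Yes | 119 | 40.1 | 81 | 27.3 | 1.77 | 1.25–2.50 |
| Smoking status | Never | 157 | 52.9 | 194 | 52.8 | 1.00^b | |
| | Former/Current | 140 | 47.1 | 103 | 38.1 | 1.69 | 1.21–2.36 |
| Recent alcohol intake | No | 77 | 25.9 | 79 | 26.6 | 1.00^b | |
| | Yes | 204 | 68.7 | 198 | 66.7 | 1.06 | 0.73–1.53 |
| Family history of breast cancer | No | 244 | 82.2 | 261 | 87.8 | 1.00^b | |
| | Yes | 48 | 16.2 | 34 | 11.5 | 1.52 | 0.95–2.45 |
| Hormone use | Never | 78 | 26.3 | 80 | 26.9 | 1.00^b | |
| | Current/Former | 177 | 59.6 | 184 | 62.0 | 0.98 | 0.67–1.43 |
| | Former | 42 | 62.1 | 33 | 11.1 | 1.31 | 0.75–2.27 |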

aMissing values (cases, controls): body mass index (3, 2), type of menopause (4, 5), age at natural menopause (3, 3), history of benign breast disease (2, 5), smoking status (0, 27), recent alcohol use (16, 20), family history (5, 2).

bReferent category.

Bisulfite sequencing data analysis

Sequencing reads in FastQ format were uploaded to the Massachusetts Green High-Performance Computing Cluster for bioinformatic analyses. The Bismark bisulfite mapper was used to align reads generated from bisulfite-treated DNA to the reference gene sequences to determine cytosine methylation calls (14). Bismark performed alignments, deduplication, and methylation extraction and provided the overall project summary reports. Bismark genome preparation scripts were used to prepare the reference genes (i.e., ERCC1, MGAT4C, C6orf79, PRKCDBP, and OPTN) for bisulfite alignments. For the methylation analysis, sequence reads were quality filtered using FastQC and libraries were specified as nonstrand specific (nondirectional). Paired-end alignments with reference genes were performed using Bowtie 2 v2.3.2 (14, 15). Alignment output files were written in BAM format using Samtools v1.3 (16). The output formats were visualized in the genome viewer using SeqMonk v1.40.1 (17). The percent methylation in each CpG was calculated as (number of reads with methylated C/total reads) × 100 and was reported in Excel format for each reference gene. Final cytosine methylation and Bismark summary reports were generated and summarized for each run.
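The percent methylation formula above is simple enough to state as code. The following is a minimal sketch of that calculation for a single CpG; the function name and the example read counts are hypothetical and do not come from the study's pipeline.

```python
def percent_methylation(methylated_reads, total_reads):
    """(number of reads with methylated C / total reads) x 100 for one CpG."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= methylated_reads <= total_reads:
        raise ValueError("methylated_reads must lie between 0 and total_reads")
    return 100.0 * methylated_reads / total_reads


# Hypothetical counts: 609 methylated calls out of 60,900 reads covering the CpG.
example = percent_methylation(609, 60900)
print(f"{example:.2f}% methylation")
```

At levels near 1%, a change of a few hundred methylated reads out of tens of thousands is all that separates samples, which is why read depth matters for sites with low methylation.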

Read depth

Masser and colleagues (10) have reported that a read depth of 1,000× in BSAS is sufficient for methylation quantification. Read depth in the total number of WBC DNA samples for cases and controls (distant and proximate combined) was very high for four of the five genes. Average read depth (SD) for ERCC1, C6orf79, PRKCDBP, and OPTN was 60,900 (53,632), 107,711 (82,755), 160,243 (116,992), and 94,470 (71,416), respectively. Average read depth (SD) for MGAT4C was 9,136 (16,651). The number of samples with a read depth below 1,000 for each gene was as follows: ERCC1 (n = 0), C6orf79 (n = 0), PRKCDBP (n = 10), OPTN (n = 2), and MGAT4C (n = 241). For MGAT4C, read depth across samples was especially low on two of fifteen plates: 93% (147/158) of the samples on these plates had a read depth below 1,000. The percent of samples with read depth below 1,000 on the remaining plates for MGAT4C ranged from 0% to 41%. In our primary analysis, we excluded all samples with a read depth below 1,000; and for MGAT4C analyses only, we additionally excluded the small number of remaining samples on the two batches described above.
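The exclusion rule for the primary analysis can be sketched as a simple filter. The record layout, plate numbers, and depths below are hypothetical; only the 1,000-read threshold and the MGAT4C-specific plate exclusion come from the text.

```python
MIN_READ_DEPTH = 1000  # threshold stated in the text


def keep_sample(gene, depth, plate, low_depth_plates):
    """Apply the primary-analysis exclusions to one gene-by-sample record."""
    if depth < MIN_READ_DEPTH:
        return False
    # For MGAT4C only, drop every remaining sample on the low-depth plates.
    if gene == "MGAT4C" and plate in low_depth_plates:
        return False
    return True


# Hypothetical plate identifiers for the two low-depth MGAT4C plates.
low_depth_plates = {7, 12}

# Hypothetical records, for illustration only.
samples = [
    {"gene": "ERCC1", "plate": 3, "depth": 60900},
    {"gene": "MGAT4C", "plate": 3, "depth": 850},   # below threshold
    {"gene": "MGAT4C", "plate": 7, "depth": 4200},  # adequate depth, excluded plate
    {"gene": "MGAT4C", "plate": 5, "depth": 9100},
]

kept = [
    s for s in samples
    if keep_sample(s["gene"], s["depth"], s["plate"], low_depth_plates)
]
print([(s["gene"], s["plate"]) for s in kept])
```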

Statistical analysis

Logistic regression was used to estimate ORs and 95% confidence intervals (CI) for breast cancer with case–control status as the outcome. For primary analyses, the continuous percent methylation values of each of the five CpG sites were used as individual exposures of interest, analyzed separately for the proximate and distant WBC DNA collections, and adjusted for participant and DNA sample characteristics: age at entry (≤59, 60–64, 65–69, ≥70 years), blood collection visit (T0, T3, T4, T5) and source of DNA (proximate: whole blood, buffy coat; distant: previously extracted DNA, buffy coat). Statistical significance of the associations of each CpG site with breast cancer risk was based on two-sided likelihood ratio tests with P ≤ 0.05. Additional simultaneous adjustment for age at menarche, age at first live birth, parity, history of benign breast disease, cigarette smoking, alcohol intake, type of menopause, age at natural menopause, family history of breast cancer, and hormone use had a negligible influence on risk; therefore, these factors were not included in the models. In sensitivity analyses, we also present associations between ERCC1 percent methylation and breast cancer risk, separately by age categories (<60 years; ≥60 years) and according to tumor characteristics. We assessed interaction by age by examining the P value of the cross-product term.
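For readers less familiar with how a logistic regression coefficient becomes an OR per one-unit increase, the sketch below shows the standard conversion: OR = exp(β) with a Wald-type 95% CI of exp(β ± 1.96 × SE). The coefficient and standard error used here are back-calculated from the ERCC1 proximate estimate reported in this article (OR = 1.29; 95% CI, 1.06–1.57) purely for illustration; they are an assumption, not values taken from the fitted model, and the study's significance tests were likelihood ratio tests rather than Wald tests.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its SE to an OR with a Wald CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Back-calculated (assumed) coefficient and SE for a one-unit increase
# in ERCC1 percent methylation in proximate WBC DNA.
beta = math.log(1.29)
se = (math.log(1.57) - math.log(1.06)) / (2 * 1.96)

or_est, ci_low, ci_high = odds_ratio_with_ci(beta, se)
print(f"OR = {or_est:.2f} (95% CI, {ci_low:.2f}-{ci_high:.2f})")
```

Because the model is linear on the log-odds scale, the OR applies to each additional percentage point of methylation.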

OR estimates for established breast cancer risk factors were generally comparable in our analytic subgroup of 297 cases and 297 controls to those reported previously in the PLCO cohort (Table 2; ref. 18). Obesity, nulliparity, later age at first birth, later age at natural menopause, a family history of breast cancer, current or past smoking, and a personal history of benign breast disease were positively associated with risk. Later age at menarche, having three or more live births, and having a surgical menopause were inversely associated with risk.

The magnitudes of absolute differences in unadjusted mean WBC DNA percent methylation levels between cases and controls were very small, ranging from 0% for PRKCDBP in distant WBC DNA to 0.18% for ERCC1 in proximate WBC DNA (Table 3). In proximate WBC DNA, mean percent methylation level for ERCC1 was significantly higher in the cases than in the controls (1.07 vs. 0.89; P = 0.01). In distant WBC DNA, in contrast, mean percent methylation for ERCC1 was significantly lower in the cases than in the controls (0.93 vs. 1.11; P = 0.03). In addition, mean percent methylation for PRKCDBP in proximate DNA was significantly higher in cases than in the controls (2.18 vs. 2.00; P = 0.05). No other associations were observed that met the threshold of P ≤ 0.05. On the basis of all samples, the interquartile ranges for percent methylation for ERCC1, MGAT4C, C6orf79, PRKCDBP, and OPTN were 0.37–1.36, 90.1–93.2, 0.45–1.37, 1.26–2.56, and 0.89–1.61, respectively.

Table 3.

Unadjusted mean percent methylation level and SE in breast cancer cases and controls in five genes, separately for proximate (1–3 years prior to diagnosis) and distant (4–7 years prior to diagnosis) WBC DNA collections.

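Proximate WBC DNA: 1–3 years prediagnosis for cases; cases (n = 294)^a, controls (n = 292)^a. Distant WBC DNA: 4–7 years prediagnosis; cases (n = 292)^b, controls (n = 294)^b.

| Gene | Proximate cases, mean | Proximate cases, SE | Proximate controls, mean | Proximate controls, SE | Proximate P | Distant cases, mean | Distant cases, SE | Distant controls, mean | Distant controls, SE | Distant P |
|---|---|---|---|---|---|---|---|---|---|---|
| ERCC1 | 1.07 | 0.05 | 0.89 | 0.05 | 0.01 | 0.93 | 0.07 | 1.11 | 0.05 | 0.03 |
| MGAT4C | 90.81 | 0.30 | 90.60 | 0.36 | 0.65 | 90.94 | 0.33 | 90.47 | 0.37 | 0.34 |
| C6orf79 | 1.00 | 0.04 | 0.93 | 0.04 | 0.22 | 1.01 | 0.04 | 1.00 | 0.04 | 0.82 |
| PRKCDBP | 2.18 | 0.07 | 2.00 | 0.06 | 0.05 | 2.06 | 0.07 | 2.06 | 0.06 | 0.97 |
| OPTN | 1.31 | 0.03 | 1.25 | 0.03 | 0.23 | 1.26 | 0.03 | 1.31 | 0.03 | 0.23 |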

aMissing values (proximate cases, controls): MGAT4C (61, 64), PRKCDBP (3, 3), OPTN (0, 1).

bMissing values (distant cases, controls): MGAT4C (61, 66), PRKCDBP (3, 1), OPTN (0, 1).

Table 4 shows adjusted ORs associated with a one-unit increase in percent methylation (corresponding to a 1% increase in methylation) for each of the five CpG sites, separately for proximate and distant-collected PLCO WBC DNA. A one-unit increase in ERCC1 percent methylation in proximate WBC DNA was significantly associated with an increased risk of breast cancer (adjusted OR = 1.29; 95% CI, 1.06–1.57). Conversely, a one-unit increase in ERCC1 percent methylation in distant WBC DNA was significantly inversely associated with risk (adjusted OR = 0.83; 95% CI, 0.69–0.98). For comparison, the adjusted proximate OR for breast cancer in the 90th percentile of WBC DNA methylation (corresponding to ≥2.2%) was 3.0 (95% CI, 1.4–6.5) compared with the 10th percentile (corresponding to 0%–0.33%). The adjusted distant OR for breast cancer in the 90th percentile (corresponding to ≥2.2%) was 0.66 (95% CI, 0.29–1.59) compared with the 10th percentile (corresponding to 0%–0.31%). There was a borderline positive association between a one-unit increase in percent methylation in proximate WBC DNA in PRKCDBP and breast cancer risk (OR = 1.16; 95% CI, 1.00–1.35). Results were also unchanged in the above analyses with no exclusions of data for read depth. For example, the comparable ORs for a one-unit increase in percent methylation of MGAT4C in proximate and distant DNA were 1.02 (95% CI, 0.99–1.05) and 1.03 (95% CI, 1.0–1.06), respectively. There was no discernible pattern of association between DNA methylation patterns in the five genes and breast cancer risk factors. As an example, we show the association between ERCC1 methylation in proximate and distant WBC DNA and breast cancer risk factors for control subjects only (Supplementary Table S3). Finally, separate analyses were conducted to examine associations between ERCC1 percent methylation and breast cancer risk by tumor characteristics and age (Supplementary Table S4).
For both proximate and distant WBC DNA, ORs for invasive, ER-positive, and PR-positive cases, and by stage, were similar in direction to those in the all-case analysis in Table 4. The P value for the interaction between age and ERCC1 percent methylation did not reach the threshold for statistical significance for proximate or distant WBC DNA.

Table 4.

Adjusted^a ORs for one-unit change in percent methylation and breast cancer risk for five genes, separately for proximate and distant WBC DNA collections.

Proximate WBC DNA: 1–3 years prediagnosis for cases. Distant WBC DNA: 4–7 years prediagnosis for cases.

| Gene | Proximate cases | Proximate controls | Proximate OR (95% CI) | Distant cases | Distant controls | Distant OR (95% CI) |
|---|---|---|---|---|---|---|
| ERCC1 | 294 | 292 | 1.29 (1.06–1.57) | 292 | 294 | 0.83 (0.69–0.98) |
| MGAT4C | 233 | 228 | 1.00 (0.97–1.05) | 231 | 228 | 1.02 (0.98–1.06) |
| C6orf79 | 294 | 291 | 1.18 (0.91–1.55) | 291 | 294 | 1.03 (0.80–1.32) |
| PRKCDBP | 291 | 289 | 1.16 (1.00–1.34) | 292 | 294 | 1.00 (0.87–1.16) |
| OPTN | 294 | 291 | 1.21 (0.89–1.64) | 291 | 294 | 0.83 (0.61–1.12) |

aORs are adjusted for age at entry (5-year age groups), study year of blood collection (proximate: T3, T4, T5; distant: restricted to T0), and material type (proximate: whole blood, buffy coat; distant: previously extracted DNA, buffy coat).

In this study, we attempted to validate the Sister Study findings for five CpG sites in a new study population. In the Sister Study (9), a one-unit increase in percent methylation in a single CpG site in each of three genes was significantly associated with decreased breast cancer risk [ERCC1 HR: 0.83 (95% CI, 0.78–0.88), C6orf79 HR: 0.89 (95% CI, 0.85–0.93), and PRKCDBP HR: 0.73 (95% CI, 0.65–0.81)]. Conversely, a one-unit increase in percent methylation in a single CpG site in each of two genes was significantly associated with increased breast cancer risk [MGAT4C HR: 1.19 (95% CI, 1.15–1.26) and OPTN HR: 2.00 (95% CI, 1.59–2.51)]. The percent methylation (average beta values × 100) in WBC DNA in noncases was 11.5, 86.4, 13.1, 5.1, and 1.5 for ERCC1, MGAT4C, C6orf79, PRKCDBP, and OPTN, respectively. The differences in average percent methylation between cases and noncases were −1.0, 0.96, −1.1, −0.44, and 0.15, respectively.

In contrast, we found no association between percent methylation level and breast cancer risk for four of the five CpG sites (MGAT4C, C6orf79, PRKCDBP, and OPTN), and qualitatively different associations for the CpG site in ERCC1 that depended on the timing of the WBC DNA collection. Specifically, we found an increased risk for breast cancer in our proximate analyses (1–2.9 years prior to diagnosis in cases, which was most comparable with the time between blood draw and diagnosis in the Sister Study) and an inverse association in our distant analyses (4.2–7.3 years prior to diagnosis in cases).

Results from other studies, including an updated analysis of the Sister Study (5), provide little additional support for an association between these five CpG sites and breast cancer risk. For example, based on an analysis of four cohorts (1663 cases; 1885 controls), Bodelon and colleagues (19) reported that none of the CpG sites on the HM450K array (including the five CpG sites we examined) were associated with breast cancer risk overall or after stratification by time of blood collection (<5, 5–10, >10 years). Furthermore, an updated Sister Study analysis using the HM450K array (1552 cases; 1224 noncases; ref. 5) confirmed only five CpG sites from among the entire 250 CpG sites identified in the original study: only one was among the original top five CpG sites, and none of these were among the five CpG sites assessed in our study.

As previously noted, observed differences in WBC DNA methylation between cases and controls are very small, and existing laboratory methods to assess percent methylation may lack sufficient sensitivity to reliably detect such small differences (20). We found a statistically significant difference between cases and noncases for one of the five genes we examined, ERCC1. The small difference in methylation that we report for ERCC1 could represent a change in a minor population of cells, such as immune cells responding to a lesion in the breast, or detection of a circulating precancerous cell shed from the breast (8). However, any interpretation of the biological significance of ERCC1 is further complicated because the direction of the significant associations depended on the time between blood collection and diagnosis. Given the variation in direction of association and the knowledge that other studies have not confirmed an association between ERCC1 WBC DNA methylation and breast cancer risk, we suggest that our results are due to chance.
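The chance explanation can be illustrated with a simple multiple-comparisons calculation. The sketch below is an illustration only and not part of our analysis: it assumes 10 tests (five CpG sites at each of two timepoints), a two-sided alpha of 0.05 per test, and independence between tests. The independence assumption is an approximation, because the proximate and distant samples came from the same participants. The function names are hypothetical.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of one or more false positives across n_tests independent
    tests when every null hypothesis is true."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed test count: five CpG sites x two blood collection timepoints.
n_sites = 5
n_timepoints = 2
n_tests = n_sites * n_timepoints

fwer = prob_at_least_one_false_positive(n_tests)
threshold = bonferroni_alpha(n_tests)
print(f"Tests: {n_tests}")
print(f"P(at least one false positive) = {fwer:.3f}")
print(f"Bonferroni per-test alpha = {threshold:.4f}")
```

Under these assumptions, the probability of at least one nominally significant result when no true association exists is roughly 40%, so isolated significant findings with opposite directions are not surprising.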

We designed our study to examine the potential value of WBC DNA methylation for early detection and risk prediction. We therefore excluded individuals whose blood collection occurred within one year of diagnosis, as detecting increased methylation within one year of diagnosis would provide limited clinical advantage beyond current screening approaches. It remains possible that our inability to confirm the findings from the original Sister Study is related to the exclusion of cases with blood collection within one year of diagnosis, as the Sister Study included these cases. It is also possible that differences in population characteristics across studies contributed to the observed differences. For instance, participants' age at enrollment was approximately 56 years (range: 27–75 years) in the Sister Study, whereas the PLCO study population was much older at entry (range: 55–74 years; ref. 21). In addition, the Sister Study includes only women with a biological sister with breast cancer, and its participants were more likely to be nulliparous than those in the PLCO cohort (18.2% vs. 7.9%). However, differences in study characteristics do not seem a likely explanation, given the lack of confirmation reported by the updated Sister Study (5). On the basis of our results and the weight of evidence in the literature, we suggest that the association between WBC DNA methylation and breast cancer risk, if it exists, is likely complex.

Because of the limited variation in levels of methylation in CpG sites in WBC DNA between cases and controls, and the limited ability to detect such small differences with state-of-the-art technology, we think this approach is unlikely to be useful for breast cancer risk identification without major future advances. As has been required of genome-wide association studies (GWAS), large replication studies should be required of future epidemiologic studies of DNA methylation patterns in WBC DNA and breast cancer risk. Finally, studying DNA methylation patterns directly in breast cell DNA from sources such as breast milk, breast tissue biopsies, or circulating tumor cells may be logistically challenging but may be more likely to move the field of breast cancer detection and prediction forward.

S.R. Sturgeon reports grants from NIH during the conduct of the study. E.P. Browne reports grants from NIH during the conduct of the study. J. Einson reports grants from NIH during the conduct of the study. M. Halabi reports grants from NIH during the conduct of the study. T. Kania reports grants from NIH during the conduct of the study. R. Balasubramanian reports grants from NCI during the conduct of the study. K.T. Kelsey reports other from Cellintec outside the submitted work; in addition, K.T. Kelsey has a patent for PCT/US2012/039699 and the title is “Methods Using DNA Methylation for Identifying a Cell or a Mixture of Cells for Prognosis and Diagnosis of Diseases, and for Cell Remediation Therapies.” United States, Serial No. 14/089,398 filed 11/25/13. European Patent, Serial No. 12789375.8 filed 10/7/13. Canada, Serial No. 2,869,295 filed 5/25/12 issued. K.F. Arcaro reports grants from NIH/NCI during the conduct of the study. No disclosures were reported by the other authors.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health (Bethesda, MD).

S.R. Sturgeon: Conceptualization, formal analysis, investigation, writing–original draft, writing–review and editing. D.A. Sela: Conceptualization, investigation, writing–original draft, writing–review and editing. E.P. Browne: Data curation, formal analysis, investigation, methodology, writing–original draft, writing–review and editing. J. Einson: Investigation, writing–original draft. A. Rani: Data curation, visualization, writing–original draft. M. Halabi: Formal analysis, investigation, writing–review and editing. T. Kania: Formal analysis, investigation, writing–review and editing. A. Keezer: Formal analysis, investigation, writing–review and editing. R. Balasubramanian: Formal analysis, writing–original draft, writing–review and editing. R.G. Ziegler: Conceptualization, writing–review and editing. C. Schairer: Conceptualization, methodology, writing–original draft, writing–review and editing. K.T. Kelsey: Conceptualization, investigation, writing–review and editing. K.F. Arcaro: Conceptualization, formal analysis, supervision, investigation, writing–original draft, writing–review and editing.

The authors thank Eva Goldwater, MS, at the University of Massachusetts Amherst (Amherst, MA) for her assistance with the statistical analysis. This study was supported by the National Cancer Institute (Bethesda, MD) of the National Institutes of Health (Bethesda, MD) under award number U01CA184 (to S.R. Sturgeon) and by the National Cancer Institute Intramural Research Program (to R.G. Ziegler and C. Schairer).

The costs of publication of this article were defrayed, in part, by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Foley DL, Craig JM, Morley R, Olsson CJ, Dwyer T, Smith K, et al. Prospects for epigenetic epidemiology. Am J Epidemiol 2009;169:389–400.
2. Jackson K, Yu MC, Arakawa K, Fiala E, Youn B, Fiegl H, et al. DNA hypomethylation is prevalent even in low-grade breast cancers. Cancer Biol Ther 2004;3:1225–31.
3. Campan M, Weisenberger DJ, Laird PW. DNA methylation profiles of female steroid hormone-driven human malignancies. Curr Top Microbiol Immunol 2006;310:141–78.
4. Ehrlich M. DNA methylation in cancer: too much, but also too little. Oncogene 2002;21:5400–13.
5. Xu Z, Sandler DP, Taylor JA. Blood DNA methylation and breast cancer: a prospective case-cohort analysis in the sister study. J Natl Cancer Inst 2020;112:87–94.
6. van Veldhoven K, Polidoro S, Baglietto L, Severi G, Sacerdote C, Panico S, et al. Epigenome-wide association study reveals decreased average methylation levels years before breast cancer diagnosis. Clin Epigenetics 2015;7:67.
7. Severi G, Southey MC, English DR, Jung CH, Lonie A, McLean C, et al. Epigenome-wide methylation in DNA from peripheral blood as a marker of risk for breast cancer. Breast Cancer Res Treat 2014;148:665–73.
8. Garcia-Closas M, Gail MH, Kelsey KT, Ziegler RG. Searching for blood DNA methylation markers of breast cancer risk and early detection. J Natl Cancer Inst 2013;105:678–80.
9. Xu Z, Bolick SC, Deroo LA, Weinberg CR, Sandler DP, Taylor JA. Epigenome-wide association study of breast cancer using prospectively collected sister study samples. J Natl Cancer Inst 2013;105:694–700.
10. Masser DR, Berg AS, Freeman WM. Focused, high accuracy 5-methylcytosine quantitation with base resolution by benchtop next-generation sequencing. Epigenetics Chromatin 2013;6:33.
11. Prorok PC, Andriole GL, Bresalier RS, Buys SS, Chia D, Crawford ED, et al. Design of the prostate, lung, colorectal and ovarian (PLCO) cancer screening trial. Control Clin Trials 2000;21:273S–309S.
12. Olova N, Krueger F, Andrews S, Oxley D, Berrens RV, Branco MR, et al. Comparison of whole-genome bisulfite sequencing library preparation strategies identifies sources of biases affecting DNA methylation data. Genome Biol 2018;19:33–2.
13. Masser DR, Stanford DR, Freeman WM. Targeted DNA methylation analysis by next-generation sequencing. J Vis Exp 2015;96:52488.
14. Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics 2011;27:1571–2.
15. Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods 2012;9:357–9.
16. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009;25:2078–9.
17. SeqMonk: a tool to visualise and analyse high throughput mapped sequence data [homepage on the Internet]. 2017. Available from: https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/2007.
18. Lacey JV Jr, Kreimer AR, Buys SS, Marcus PM, Chang SC, Leitzmann MF, et al. Breast cancer epidemiology according to recognized breast cancer risk factors in the prostate, lung, colorectal and ovarian (PLCO) cancer screening trial cohort. BMC Cancer 2009;9:84.
19. Bodelon C, Ambatipudi S, Dugue PA, Johansson A, Sampson JN, Hicks B, et al. Blood DNA methylation and breast cancer risk: a meta-analysis of four prospective cohort studies. Breast Cancer Res 2019;21:62–9.
20. Wong EM, Southey MC, Terry MB. Integrating DNA methylation measures to improve clinical risk assessment: are we there yet? The case of BRCA1 methylation marks to improve clinical risk assessment of breast cancer. Br J Cancer 2020;122:1133–40.
21. Sandler DP, Hodgson ME, Deming-Halverson SL, Juras PS, D'Aloisio AA, Suarez LM, et al. The Sister Study cohort: baseline methods and participant characteristics. Environ Health Perspect 2017;125:127003.