Background:

DNA methylation patterning is cell-type–specific, and altered DNA methylation is well established to occur early in breast carcinogenesis, affecting non-cancerous, histopathologically normal breast tissue. Previous work assessing risk factor–associated alterations to DNA methylation in breast tissue has been limited, and even less research has been published on breast milk, a noninvasively obtained biospecimen containing sloughed mammary epithelial cells that may reveal early alterations indicative of cancer risk.

Methods:

Here, we present a novel reference library for estimating the cellular composition of breast tissue and milk, and we use it to assess cell-type–independent alterations to DNA methylation associated with established breast cancer–risk factors in solid breast tissue (n = 95) and breast milk (n = 48) samples, using genome-scale DNA methylation measures from the Illumina HumanMethylation450 array.

Results:

We identified 772 CpGs that were hypermethylated with increasing age (P < 0.01) in both breast tissue and breast milk samples. These age-associated hypermethylated CpG loci were significantly enriched for CpG island shore regions, which are known to be important for regulating gene expression. Among the shared hypermethylated loci mapping to genes, a differentially methylated region was identified in the promoter region of SFRP2, a gene observed to undergo promoter hypermethylation in breast cancer.

Conclusions:

Our findings suggest the potential to identify epigenetic biomarkers of breast cancer risk in noninvasively obtained, tissue-specific breast milk specimens.

Impact:

This work demonstrates the potential of breast milk as a noninvasive source of biomarkers of breast cancer risk, which could improve our ability to detect early-stage disease and reduce the overall disease burden.

Breast cancer presents a major public health burden, with an estimated 281,550 new cases and 43,600 deaths among women in the United States alone in 2021 (1). The high burden of disease and the increased survival observed when diagnoses are made at earlier stages (1) underscore the importance of early detection in breast cancer. Precision prevention strategies for breast cancer are limited, and assessment of tissue-specific molecular profiles offers new opportunities to understand and act on individual breast cancer risk.

Documented intrinsic factors associated with breast cancer risk include age, family history of disease, and genetic factors such as high-penetrance germline mutations in BRCA1 and BRCA2 and low-penetrance SNPs (2–7). Although genetic approaches have aided in identifying women at risk of developing breast cancer (8), they provide an incomplete picture of disease risk. The study of epigenetic alterations related to disease risk, such as alterations to DNA methylation, presents an opportunity to expand our molecular understanding of disease risk and of lifestyle-associated risk factors. Lifestyle factors associated with breast cancer risk include alcohol consumption, diet, and reproductive factors such as parity (2, 3). Other reproductive factors, such as earlier age at menarche and later age at first birth, have also been associated with increased breast cancer risk (9, 10).

Alterations to DNA methylation are widespread early in breast carcinogenesis (11). Most of the DNA methylation alterations observed in invasive tumors are already present in pre-invasive ductal carcinoma in situ (12, 13), indicating that these alterations occur early in breast carcinogenesis. In addition, we demonstrated that histologically normal tissue collected from the same breast as a tumor harbors alterations to DNA methylation relative to breast tissue collected from the opposite breast (14), suggesting that early alterations to DNA methylation are present in a field of cells in the affected breast. By better understanding the somatic alterations associated with specific breast cancer–risk factors that are shared between healthy target tissue and tumors, we can develop new opportunities for precision prevention through risk models and interventions that reduce disease burden.

Prior work in women without breast cancer demonstrated that DNA methylation profiles in solid breast tissue are associated with age, an established breast cancer–risk factor (15). Because DNA methylation in the breast tissue of disease-free women is associated with established risk factors, and alterations to DNA methylation are known to occur early in neoplastic transformation, DNA methylation profiles of breast tissue could serve as powerful biomarkers of disease risk. However, DNA methylation patterning is cell- and tissue-specific, and realizing the potential of DNA methylation measures in risk assessment and precision prevention requires measurements in the target tissue. Obtaining tissue by breast biopsy is invasive and inappropriate for risk assessment. Breast milk, by contrast, contains shed breast epithelial cells (and other cell types), providing a tissue-specific biospecimen that can be obtained noninvasively. Using prospectively collected breast milk samples, we previously showed that DNA methylation alterations in breast milk are associated with breast cancer risk (16).

Whether breast milk can serve as an effective surrogate for solid breast biopsies in genome-scale DNA methylation profiling is an open question. In addition, and critically, prior work has been limited by the inability to apply reference-based cell-type estimation to breast milk and breast tissue biospecimens. Here, we develop and apply a novel reference-based cell-type library to identify DNA methylation alterations that are independent of cellular composition and associated with established breast cancer–risk factors, and we assess whether these alterations are shared between solid breast tissue and breast milk in disease-free women.

Study populations

Approximately half of the subjects donating solid breast tissue (n = 95; Susan G. Komen Tissue Bank) had a first-degree family member with a history of breast or ovarian cancer (subsequently referred to as a family history of disease), compared with approximately a third of subjects donating breast milk [n = 24; New Hampshire Birth Cohort Study (NHBCS); Table 1]. Solid breast tissue donors were also, on average, slightly older and spanned a wider range of ages (mean age, 36.5 years) than breast milk donors (mean age, 32.6 years). In addition, although approximately half of solid breast tissue donors had never given birth, all breast milk donors had at least one child, and 41.7% of milk samples were collected from mothers for whom this was their first child.

Table 1.

Subject demographics for two study cohorts.

Characteristic                      Solid breast tissue (n = 95)   Breast milk (n = 24)   P
Age, mean (SD)                      36.51 (13.23)                  32.57 (5.42)           0.157
Parity, n (%)                                                                             <0.001
  0                                 46 (48.9)                      0 (0.0)
  1                                 14 (14.9)                      10 (41.7)
  2                                 16 (17.0)                      9 (37.5)
  3+                                18 (19.1)                      5 (20.8)
Reproductive age, mean (SD)         13.74 (6.31)                   16.84 (5.95)           0.143
Family history of disease, n (%)    44 (51.8)                      8 (34.8)               0.226

Normal breast tissues

Fresh-frozen breast tissue was obtained from the Susan G. Komen Tissue Bank for 95 cancer-free women who donated breast biopsies. Briefly, samples classified by the Susan G. Komen Tissue Bank study pathologist as having a high proportion of epithelial cells were selected from subjects spanning a wide age range, with an approximately equal distribution of parous and nulliparous women. Following DNA extraction and bisulfite modification, DNA methylation was measured with the Illumina 450k array as previously described (15). DNA methylation data are available at GSE88883.

Breast milk samples

Bilateral breast milk samples (n = 48) were obtained from participants (n = 24) in the NHBCS. Characteristics of the NHBCS have been previously described (17). Briefly, pregnant women were enrolled at approximately 24 to 28 weeks' gestation from prenatal clinics in New Hampshire. Eligibility criteria included age of 18–45 years, English literacy, use of a private, unregulated water system (e.g., private well) at home, no plans to move, and a singleton pregnancy. Data on subject characteristics, lifestyle factors, reproductive history, and general health were obtained from questionnaires and medical record reviews. Participants were asked to bring bilateral breast milk specimens to their postpartum follow-up appointment at approximately 6 weeks postpartum. Breast milk was processed, DNA was extracted and bisulfite modified, and DNA methylation was measured with the Illumina 450k array as previously described (16). DNA methylation data are available at GSE133918.

Data processing

Sample intensity data (IDAT) files were obtained for solid breast tissue from GEO dataset GSE88883 (15) and for breast milk from GEO dataset GSE133918 (16). IDAT files were processed using the functional normalization (funnorm) method from the R package minfi (18), and subsequent probe-type normalization was performed using beta-mixture quantile normalization (19). The following were excluded from subsequent analysis: probes with a detection P value >1.0 × 10⁻⁶ in more than 5% of samples in either dataset, probes with common SNP(s) within 5 base pairs of the target CpG dinucleotide, probes previously described as potentially cross-hybridizing (20), and sex-specific probes. Of the 485,512 probes on the Infinium HumanMethylation450 BeadChip, 414,950 were included in the final analysis.
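The detection P-value filter described above can be sketched in Python. This is a minimal illustration, not the minfi-based pipeline used in the study; it assumes detection P values are held in NumPy arrays with probes as rows (in the same order in both datasets) and that the SNP, cross-hybridizing, and sex-chromosome exclusions have already been compiled into a set of probe IDs. The function names are hypothetical.

```python
import numpy as np


def passes_detection(det_p, threshold=1e-6, max_fail_frac=0.05):
    """Boolean mask of probes (rows) that pass the detection P-value filter.

    det_p has shape (n_probes, n_samples). A probe fails if its detection
    P value exceeds `threshold` in more than `max_fail_frac` of samples.
    """
    fail_frac = (det_p > threshold).mean(axis=1)
    return fail_frac <= max_fail_frac


def filter_probes(probe_ids, det_p_by_dataset, excluded):
    """Keep probes that pass detection in every dataset and are not excluded.

    Assumes every detection P-value matrix lists probes in the order of
    `probe_ids`; `excluded` is a set of probe IDs to drop (e.g., SNP-affected,
    cross-hybridizing, or sex-chromosome probes).
    """
    keep = np.ones(len(probe_ids), dtype=bool)
    for det_p in det_p_by_dataset:
        keep &= passes_detection(np.asarray(det_p))  # failing in either dataset drops the probe
    keep &= np.array([pid not in excluded for pid in probe_ids])
    return [pid for pid, k in zip(probe_ids, keep) if k]
```

Combining the masks with `&=` means a probe that fails in either dataset is removed, matching the "in either dataset" rule.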

Reference library development and cell-type estimation

Among cell types present in breast tissue and breast milk, we identified cell-specific differential methylation and developed a novel reference library for DNA methylation-based cell-type deconvolution. We used data for mammary epithelial (n = 2), endothelial (n = 2), and fibroblast (n = 2) cell lines (ScienCell Research Laboratories; GSE74877; ref. 21), isolated adipocytes from abdominal subcutaneous fat (n = 29) in obese and non-obese women (GSE67024; ref. 22), and purified B cells (n = 6), CD4+ T cells (n = 7), CD8+ T cells (n = 6), NK cells (n = 6), monocytes (n = 6), and neutrophils (n = 6; GSE110554; ref. 23). CpG loci in these reference datasets were restricted to those that passed all filtering steps in both the solid breast tissue and breast milk data, as described under Data processing. The IDOL algorithm was used to identify the optimal reference library for each cell type (24).

To validate the reference library, we used DNA methylation profiles from in silico mixtures of immune cells (n = 12; GSE110554; ref. 23) and from isolated adipocytes from abdominal subcutaneous fat (n = 30; GSE58622). Relative proportions of all cell types were estimated using the epidish function (25), specifying the robust partial correlations (RPC) method (26). Library performance was evaluated using Pearson correlation tests between the true relative proportions of cells in the in silico mixtures and the proportions estimated with the novel reference library. Relative proportions of reference library cell types were then estimated in the solid breast tissue (n = 95) and breast milk (n = 48) DNA methylation data by applying the reference library with the RPC method within the epidish function.
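The RPC approach treats each sample's methylation profile as a robust linear regression on the reference profiles. The sketch below is an illustrative NumPy approximation written for this text, not the EpiDISH implementation: it fits a Huber M-estimator by iteratively reweighted least squares (with an intercept and a median-absolute-residual scale, both assumptions on our part), sets negative coefficients to zero, and rescales the remaining ones to sum to 1. The function names, the tuning constant k = 1.345, and the convergence settings are hypothetical choices.

```python
import numpy as np


def huber_irls(X, y, k=1.345, max_iter=50, tol=1e-8):
    """Huber M-estimation by iteratively reweighted least squares."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(max_iter):
        resid = y - X @ coef
        scale = np.median(np.abs(resid)) / 0.6745  # robust residual scale
        if scale < 1e-12:  # (near-)perfect fit: nothing left to reweight
            break
        u = np.abs(resid) / scale
        w = k / np.maximum(u, k)  # weight 1 when |u| <= k, k/|u| otherwise
        sw = np.sqrt(w)
        new_coef = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef


def estimate_proportions(sample_beta, ref):
    """Relative cell-type proportions for one sample.

    sample_beta: (n_cpgs,) beta values at the library CpGs.
    ref: (n_cpgs, n_cell_types) reference methylation profiles.
    """
    X = np.column_stack([np.ones(ref.shape[0]), ref])  # intercept + references
    coef = huber_irls(X, np.asarray(sample_beta, dtype=float))[1:]  # drop intercept
    coef = np.clip(coef, 0.0, None)  # negative contributions set to zero
    total = coef.sum()
    return coef / total if total > 0 else coef
```

Downweighting large residuals lets a handful of poorly fitting CpGs have less influence on the estimates than they would under ordinary least squares.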

Statistical analysis

Interrogated risk factors for analyses included age (continuous), parity (binary), reproductive age (continuous), BMI (continuous), and family history of breast or ovarian cancer (binary). Reproductive age was calculated as:
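$$
\text{reproductive age} =
\begin{cases}
\text{age at first birth} - \text{age at menarche}, & \text{for women with at least one live birth},\\
\text{age at donation} - \text{age at menarche}, & \text{for nulliparous women}.
\end{cases}
$$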
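As a worked illustration of the reproductive-age definition (years from menarche to first live birth for parous women, or to sample donation for nulliparous women), the calculation can be written as a small function. The function name and the use of `None` to mark nulliparity are assumptions made for this sketch.

```python
from typing import Optional


def reproductive_age(age_at_menarche: float,
                     age_at_donation: float,
                     age_at_first_birth: Optional[float] = None) -> float:
    """Reproductive age in years.

    Parous women (age_at_first_birth given): first birth minus menarche.
    Nulliparous women (age_at_first_birth is None): donation minus menarche.
    """
    if age_at_first_birth is not None:
        return age_at_first_birth - age_at_menarche
    return age_at_donation - age_at_menarche
```

For example, a woman with menarche at 12 and a first birth at 28 has a reproductive age of 16 years regardless of her age at donation, whereas a nulliparous 30-year-old with menarche at 13 has a reproductive age of 17 years.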

Univariate associations between estimated cellular proportions and continuous breast cancer–risk factors (age, reproductive age, and BMI) were assessed separately in solid breast tissue and breast milk using Pearson correlation tests in the R package stats. Associations between estimated cellular proportions and binary breast cancer–risk factors (parity and family history of breast or ovarian cancer) were assessed using two-sample t tests with Bonferroni correction from the R package rstatix.
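The univariate tests described above map onto standard SciPy routines. The following sketch uses `scipy.stats.pearsonr` and `scipy.stats.ttest_ind` in place of the R packages stats and rstatix. It assumes proportions are stored in a dictionary keyed by cell type, that the Bonferroni family is the set of cell types tested for one risk factor, and that Welch's unequal-variance t test is used; these are our assumptions, not details confirmed by the study.

```python
import numpy as np
from scipy import stats


def correlate_proportions(proportions, risk_factor):
    """Pearson r and P value between each cell type's proportion and a continuous risk factor."""
    results = {}
    for cell_type, props in proportions.items():
        r, p = stats.pearsonr(props, risk_factor)
        results[cell_type] = (float(r), float(p))
    return results


def ttest_proportions(proportions, in_group):
    """Welch two-sample t tests per cell type with Bonferroni-adjusted P values.

    in_group is a boolean vector (e.g., parous vs. nulliparous).
    """
    in_group = np.asarray(in_group, dtype=bool)
    n_tests = len(proportions)
    results = {}
    for cell_type, props in proportions.items():
        props = np.asarray(props, dtype=float)
        t, p = stats.ttest_ind(props[in_group], props[~in_group], equal_var=False)
        results[cell_type] = (float(t), min(1.0, float(p) * n_tests))  # Bonferroni, capped at 1
    return results
```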

Differential methylation was assessed using linear mixed-effects models, modeling the relationship between logit-transformed beta values (M values) and breast cancer–risk factors. Models were also adjusted for the estimated relative proportions of epithelial cells, endothelial cells, fibroblasts, adipocytes, monocytes, and neutrophils.
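M values are the base-2 logit of beta values, M = log2(beta / (1 − beta)). The sketch below converts beta values to M values and fits one regression per CpG on a risk factor plus cell-type covariates. It is a deliberately simplified stand-in for the mixed-effects models: it uses ordinary least squares and omits random effects (for example, a per-subject random intercept, which we assume would account for paired left and right milk samples). The clipping constant `eps` and the function names are hypothetical.

```python
import numpy as np


def beta_to_m(beta, eps=1e-6):
    """M value = log2(beta / (1 - beta)); betas are clipped to avoid infinities."""
    b = np.clip(np.asarray(beta, dtype=float), eps, 1 - eps)
    return np.log2(b / (1 - b))


def risk_factor_effects(m, risk_factor, covariates):
    """Per-CpG OLS coefficient for a risk factor, adjusted for covariates.

    m: (n_cpgs, n_samples) M values.
    risk_factor: (n_samples,) values, e.g., age.
    covariates: (n_samples, k), e.g., a subset of estimated cell-type proportions
      (not all of them, since proportions summing to 1 are collinear with the intercept).
    """
    X = np.column_stack([np.ones(len(risk_factor)), risk_factor, covariates])
    coefs, *_ = np.linalg.lstsq(X, np.asarray(m, dtype=float).T, rcond=None)
    return coefs[1]  # row 1 holds the risk-factor coefficient for every CpG
```

Solving all CpGs in one `lstsq` call works because every CpG shares the same design matrix.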

Annotation data for the Illumina HumanMethylation450 array were used to define the genomic context of included CpG loci. Cochran–Mantel–Haenszel tests were used to assess enrichment of CpG island context among CpG loci differentially methylated (P < 0.01) in both solid breast tissue and breast milk, separately for hypermethylated and hypomethylated CpG sets. These tests allow stratification by array probe type, and the background set was all 414,950 CpG loci included in the analysis.
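The Mantel–Haenszel common odds ratio and the CMH test statistic can be computed directly from one 2 × 2 table per probe-type stratum. This NumPy/SciPy sketch omits the continuity correction and confidence intervals (our simplification); the table layout (rows: differentially methylated or not; columns: in the genomic region of interest or not) and the function name are assumptions for illustration.

```python
import numpy as np
from scipy.stats import chi2


def cmh_enrichment(tables):
    """Mantel-Haenszel common odds ratio and CMH chi-square test (no continuity correction).

    tables: sequence of 2x2 tables [[a, b], [c, d]], one per stratum (e.g., probe type).
    Rows: differentially methylated / not; columns: in region / not in region.
    """
    t = np.asarray(tables, dtype=float)
    a, b, c, d = t[:, 0, 0], t[:, 0, 1], t[:, 1, 0], t[:, 1, 1]
    n = a + b + c + d
    or_mh = np.sum(a * d / n) / np.sum(b * c / n)  # common odds ratio across strata
    expected_a = (a + b) * (a + c) / n
    var_a = (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    stat = np.sum(a - expected_a) ** 2 / np.sum(var_a)
    p_value = chi2.sf(stat, df=1)
    return float(or_mh), float(stat), float(p_value)
```

In the enrichment setting, the "not differentially methylated" row would be filled from the remaining loci in the 414,950-CpG background, with one table for Infinium type I probes and one for type II probes.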

Data for validation analysis and relevance to tumor

To validate age-related differentially methylated (P < 0.01) CpG loci shared between biospecimen types, we leveraged data obtained from normal breast tissue in an independent and publicly available dataset (n = 121, GSE101961; ref. 27). Data were processed and cell-type proportions were estimated as described above.

DNA methylation data were obtained for 392 breast tumors and 82 adjacent normal samples from The Cancer Genome Atlas (TCGA; ref. 11). Data were normalized as described above. For each CpG island whose shore contained loci hypermethylated with increasing age in both solid breast tissue and breast milk, mean methylation was calculated across all CpG loci mapping to that island.
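Summarizing CpG islands by the mean of their constituent loci is a simple group-by operation. In this sketch, the CpG-to-island mapping is assumed to come from the array annotation, and the tumor-versus-normal comparison is reduced to a difference in mean beta values; the statistical test applied to the TCGA data is not reproduced here. Function and variable names are hypothetical.

```python
from collections import defaultdict

import numpy as np


def island_means(beta, cpg_to_island):
    """Mean beta value per CpG island for every sample.

    beta: dict mapping CpG ID -> 1D array of beta values across samples.
    cpg_to_island: dict mapping CpG ID -> island ID (from the array annotation).
    """
    grouped = defaultdict(list)
    for cpg, island in cpg_to_island.items():
        if cpg in beta:  # skip CpGs removed during filtering
            grouped[island].append(np.asarray(beta[cpg], dtype=float))
    return {island: np.vstack(rows).mean(axis=0) for island, rows in grouped.items()}


def tumor_minus_normal(tumor_means, normal_means):
    """Difference in average island methylation, tumor minus adjacent normal."""
    return {island: float(tumor_means[island].mean() - normal_means[island].mean())
            for island in tumor_means if island in normal_means}
```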

Data availability

All data used in this study are publicly available and can be accessed from GEO under the accession numbers GSE88883 (solid tissue samples) and GSE133918 (breast milk samples).

Study populations

DNA methylation was measured with the Illumina 450k array in solid breast tissue samples from 95 healthy donors to the Susan G. Komen Tissue Bank (Table 1). Of these individuals, 46 (48.9%) were nulliparous, and the mean age was 36.5 years. DNA methylation was also measured with the Illumina 450k array in 48 paired breast milk samples (left and right breast) collected approximately 6 weeks postpartum from 24 lactating mothers in the NHBCS (Table 1). The mean age of breast milk donors was 32.6 years. For 10 (41.7%) of the included mothers, this was their first live birth.

Cellular composition of solid breast tissue and breast milk estimated with DNA methylation

We produced a novel reference library for DNA methylation-based cell-type deconvolution in breast milk and solid breast tissue using publicly available data for mammary epithelial, endothelial, and fibroblast cell lines (GSE74877), isolated adipocytes from abdominal subcutaneous fat in obese and non-obese women (GSE67024), and purified B cells, CD4+ T cells, CD8+ T cells, NK cells, monocytes, and neutrophils (GSE110554). The optimized reference library included 677 probes to estimate the relative proportions of epithelial cells, endothelial cells, fibroblasts, adipocytes, B cells, CD4+ T cells, CD8+ T cells, NK cells, monocytes, and neutrophils (Fig. 1A).

Figure 1.

Breast-specific DNA methylation reference library for cell-type deconvolution in breast tissue and breast milk. A, Heatmap of the CpG weights assigned to each cell type for the 677 probes included in the breast-specific reference library. Mammary epithelial cells, endothelial cells, fibroblasts, adipocytes, B cells, CD4+ T cells, CD8+ T cells, NK cells, monocytes, and neutrophils were included in the library. Unsupervised hierarchical clustering of CpG loci and cell types, respectively, was conducted using Manhattan distance. B, Distribution of the estimated relative proportion of each cell type represented in the reference library across solid breast tissue (n = 95) and breast milk (n = 48) samples.


We next validated the accuracy and performance of our breast-specific reference library. First, we applied the library to estimate relative proportions of immune cells in in silico mixtures of DNA methylation profiles from purified cell types (n = 12). Estimates for all six immune cell types were strongly correlated (R² > 0.97) with the true relative proportions in the in silico mixtures (Supplementary Fig. S1). Then, in an independent dataset of isolated adipocytes from abdominal subcutaneous fat (n = 30; GSE58622), our breast-specific reference library estimated a median adipocyte content of 99% (Supplementary Fig. S2).
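The in silico validation amounts to mixing purified profiles in known proportions, re-estimating those proportions, and correlating estimates with the truth for each cell type. In the sketch below, non-negative least squares (`scipy.optimize.nnls`) stands in for the RPC estimator purely to keep the example short; mixing weights are drawn from a flat Dirichlet distribution, and the reference matrix is random. All of these are assumptions for illustration, not the study's procedure.

```python
import numpy as np
from scipy.optimize import nnls


def make_mixtures(ref, n_mixtures, rng):
    """Mix reference profiles (n_cpgs, n_cell_types) using random known proportions."""
    props = rng.dirichlet(np.ones(ref.shape[1]), size=n_mixtures)
    return ref @ props.T, props  # mixtures: (n_cpgs, n_mixtures)


def estimate_nnls(mixtures, ref):
    """Non-negative least-squares proportions, rescaled to sum to 1 per mixture."""
    est = np.array([nnls(ref, mixtures[:, j])[0] for j in range(mixtures.shape[1])])
    return est / est.sum(axis=1, keepdims=True)


def r_squared_by_cell_type(true_props, est_props):
    """Squared Pearson correlation between true and estimated proportions."""
    return np.array([np.corrcoef(true_props[:, k], est_props[:, k])[0, 1] ** 2
                     for k in range(true_props.shape[1])])
```

Reporting R² per cell type, rather than pooled across cell types, keeps a poorly estimated minor cell type from being masked by well-estimated abundant ones.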

Applying our library to breast tissue and breast milk DNA methylation data, we observed expected differences in the cellular composition of solid breast tissue and breast milk (Fig. 1B). Among the 95 solid breast tissue samples, the cell types with the highest median estimated relative abundance were adipocytes (48.2%), epithelial cells (33.0%), and fibroblasts (8.3%). Among the 48 breast milk samples, the most abundant cell types were epithelial cells (49.5%), neutrophils (15.0%), and adipocytes (12.6%).

Associations of biospecimen cellular composition with breast cancer–risk factors

Breast cancer–risk factors were associated with sample cellular composition in solid breast tissue. Age was significantly negatively associated with the relative proportions of epithelial cells, B cells, and CD8+ T cells (P < 0.05) and significantly positively associated with fibroblast proportions (P < 0.05; Supplementary Fig. S3). BMI was significantly negatively associated with epithelial cell proportions (P < 0.05) and significantly positively associated with endothelial cell and monocyte proportions (P < 0.05; Supplementary Fig. S4). Reproductive age was significantly negatively associated with B-cell, CD8+ T-cell, and NK-cell proportions (P < 0.05) and significantly positively associated with adipocyte proportions (P < 0.05; Supplementary Fig. S5). Parity was significantly negatively associated with B-cell proportions (P < 0.05) and significantly positively associated with fibroblast and NK-cell proportions (P < 0.05; Supplementary Fig. S6). Family history of disease was not significantly associated with the relative proportion of any cell type (Supplementary Fig. S7).

In breast milk, cellular composition was also associated with breast cancer–risk factors. Age was significantly negatively associated with the relative proportions of B cells, CD4+ T cells, CD8+ T cells, and NK cells (P < 0.05; Supplementary Fig. S8). BMI was significantly positively associated with CD4+ T-cell proportions (P < 0.05; Supplementary Fig. S9). Family history of disease was significantly negatively associated with adipocyte proportions (P < 0.05) and significantly positively associated with neutrophil proportions (P < 0.05; Supplementary Fig. S7). No significant associations were observed between cell-type proportions and reproductive age (Supplementary Fig. S10) or parity (Supplementary Fig. S6).

DNA methylation differences between solid breast tissue and breast milk are largely attributable to differences in cellular composition

Comparing the methylome of breast milk with that of solid breast tissue, we identified 327,547 differentially methylated CpG loci (FDR < 0.05) after adjusting for subject age and parity (Supplementary Fig. S11A). After additional adjustment for cellular composition, specifically the relative proportions of epithelial cells, endothelial cells, fibroblasts, adipocytes, monocytes, and neutrophils in each sample, the observed differential methylation was strongly attenuated (Supplementary Fig. S11B). These findings were robust to sensitivity analyses additionally adjusting for family history of disease (Supplementary Fig. S11C and S11D), performed to account for the different proportions of subjects with a family history of disease in the two cohorts. In age-, parity-, and cell-type–adjusted models, CpG loci hypermethylated in breast milk relative to solid breast tissue were significantly enriched for CpG island shore regions and CpG-sparse open sea regions, and depleted for CpG-dense CpG island regions. Similarly, hypomethylated loci were enriched for open sea regions and depleted for CpG islands, but were also depleted for shore regions (Supplementary Fig. S11C).

Shared associations of DNA methylation with reproductive age, family history of breast cancer, and BMI in solid breast tissue and breast milk

Reproductive age was used to assess a possible association between reproductive history, including age at menarche and age at first birth, and breast DNA methylation. Reproductive age was defined as the difference between age at first birth and age at menarche for parous women, and between age at donation and age at menarche for nulliparous women. After adjusting for underlying cellular composition, reproductive age was significantly associated (FDR < 0.05) with the methylation status of 55 CpG loci in solid breast tissue (Supplementary Fig. S12A) and 17 CpG loci in breast milk (Supplementary Fig. S12B). At a nominal significance threshold of P < 0.01, 9,076 loci were differentially methylated with increasing reproductive age in solid breast tissue and 5,140 in breast milk. Of these, 42 hypermethylated and 42 hypomethylated CpG loci were shared between the two tissue types (Supplementary Fig. S12C). Of the shared hypermethylated loci, 4 mapped to a region within GRM2 (Supplementary Table S1), whereas 2 of the shared hypomethylated loci mapped to RASA3, a member of the Ras pathway (Supplementary Table S2). These findings were robust to sensitivity analyses adjusting for parity in solid tissue samples, performed because that cohort included nulliparous women and reproductive age was calculated differently depending on parity (Supplementary Tables S1 and S2).

Approximately half of the solid breast tissue samples and a third of the breast milk samples were collected from women with a family history of breast or ovarian cancer (Table 1). In the analysis of family history of breast or ovarian cancer (binary) and DNA methylation, no CpG loci met a significance threshold of FDR < 0.05 in either solid breast tissue or breast milk. At a nominal significance threshold of P < 0.01, 3,025 loci were differentially methylated in solid breast tissue (Supplementary Fig. S13A) and 25,434 loci in breast milk (Supplementary Fig. S13B). Of these, 7 loci were hypermethylated in both solid breast tissue and breast milk, and 9 loci were hypomethylated in both tissues, including a locus mapping to the gene body of BRCA2 (Supplementary Fig. S13C; Supplementary Tables S3 and S4).

When assessing the association between pre-pregnancy BMI and breast milk DNA methylation, 4 CpG loci were differentially methylated at a significance threshold of FDR < 0.05 (Supplementary Fig. S14A). Among these 4 loci, 3 showed decreased gene body methylation, in HLA-DQA2, VAC14, and TBC1D22A, with increasing pre-pregnancy BMI (Supplementary Table S5). Although no CpG loci met a significance threshold of FDR < 0.05 for the association between BMI and solid breast tissue DNA methylation (Supplementary Fig. S14B), 20 CpG loci were consistently hypermethylated with increasing BMI across tissue types and 9 CpG loci were consistently hypomethylated (Supplementary Fig. S14C; Supplementary Tables S6 and S7).

Hypermethylated CpG loci in solid breast tissue and breast milk of parous women are enriched for CpG sparse open sea regions

In solid breast tissue, 30,137 loci were differentially methylated (FDR < 0.05) in women with at least one live birth relative to nulliparous women after adjusting for the estimated cellular composition of each sample (Fig. 2A). In breast milk, 235 CpG loci were differentially methylated (FDR < 0.05) in women with at least one prior live birth relative to women for whom the current pregnancy was their first live birth (Fig. 2B). At a nominal significance threshold of P < 0.01, 37,284 loci were differentially methylated in solid breast tissue and 8,440 in breast milk, with 537 loci consistently hypermethylated in both tissue types (Fig. 2C; Supplementary Tables S8 and S9). These 537 hypermethylated loci showed 1.8-fold enrichment for CpG-sparse open sea regions [95% confidence interval (CI), 1.5–2.2] and depletion for CpG-dense CpG island regions (0.3-fold; 95% CI, 0.2–0.4; Fig. 2D). The 85 loci consistently hypomethylated in both tissue types included 5 loci mapping to the promoter region of the tumor-suppressor gene TAGLN (Supplementary Table S9).

Figure 2.

CpG sites significantly differentially methylated with parity from epigenome-wide association analysis in (A) solid breast tissue (parous relative to nulliparous) and (B) breast milk (women parous before the current pregnancy relative to first-time mothers), adjusted for cell type. Red dashed lines indicate a significance threshold of FDR < 0.05. C, Overlap of identified hypermethylated and hypomethylated loci (P < 0.01) in solid breast tissue and breast milk. D, Enrichment analysis of all overlapping hypermethylated CpG loci (P < 0.01) in solid breast tissue and breast milk.


CpG loci that are hypermethylated in solid breast tissue and breast milk with increasing age are enriched for CpG island bordering shore regions

In solid breast tissue, methylation levels at 45,885 loci were significantly associated with age at donation after adjusting for the estimated cellular composition of each sample (Q < 0.05; Fig. 3A). In breast milk, no CpG loci showed an association between methylation status and age at an FDR significance threshold of Q < 0.05 (Fig. 3B). Because of the markedly smaller sample size and the expected narrower age range of subjects who donated breast milk (Table 1), a nominal significance threshold of P < 0.01 was used for subsequent comparisons. At this threshold, and adjusting for sample cellular composition, 46,445 loci were associated with age in solid breast tissue and 8,450 loci in breast milk. Among these differentially methylated loci, 130 were consistently hypomethylated with increasing age in both tissue types and 772 were consistently hypermethylated (Fig. 3C; Supplementary Tables S10 and S11). The 772 consistently hypermethylated loci were significantly enriched for CpG island shore regions (OR, 1.6; 95% CI, 1.3–1.9; Fig. 3D). Because age and parity are positively correlated, associations were additionally assessed in models adjusting for both chronological age and parity. In this combined model, fewer loci were consistently hyper- or hypomethylated in both tissue types in association with age and parity, respectively (Supplementary Tables S12 and S13).
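Identifying loci that are "consistently" hypermethylated or hypomethylated reduces to intersecting the two nominally significant sets and requiring the same sign of effect. A minimal sketch, assuming per-dataset results are stored as dictionaries keyed by CpG ID (a hypothetical layout and function name):

```python
def consistent_loci(coef_a, p_a, coef_b, p_b, alpha=0.01):
    """CpGs with P < alpha in both datasets and the same direction of effect.

    Returns (hypermethylated, hypomethylated) sets of CpG IDs.
    """
    hyper, hypo = set(), set()
    for cpg in coef_a.keys() & coef_b.keys():
        if p_a[cpg] >= alpha or p_b[cpg] >= alpha:
            continue  # not nominally significant in both datasets
        if coef_a[cpg] > 0 and coef_b[cpg] > 0:
            hyper.add(cpg)
        elif coef_a[cpg] < 0 and coef_b[cpg] < 0:
            hypo.add(cpg)
    return hyper, hypo
```

Loci that are significant in both datasets but change in opposite directions are excluded from both sets.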

Figure 3.

Epigenome-wide association analysis identifying CpG sites that are significantly differentially methylated with increasing age, adjusted for estimated proportions of epithelial cells, endothelial cells, fibroblasts, adipocytes, monocytes, and neutrophils in (A) solid breast tissue and (B) breast milk. Red dashed lines indicate a significance threshold of FDR < 0.05. C, Overlap of identified hypermethylated and hypomethylated loci (P < 0.01) with increasing age in solid breast tissue and breast milk. D, Enrichment analysis of 772 overlapping hypermethylated CpG loci (P < 0.01) with increasing age in solid breast tissue and breast milk.


To replicate the cell-type–adjusted age-related findings, we used publicly available data (GSE101961) from a study of the association between age and DNA methylation (measured on the Illumina 450k array) in breast tissue from 121 disease-free donors (27). In total, 588 of the shared age-related CpG loci replicated in this independent dataset: 72 of the 130 hypomethylated loci (55%) and 516 of the 772 hypermethylated loci (67%; Fig. 4A and B). By comparison, when the same validation was applied to 130 and 772 randomly selected CpG loci, only 21 loci (16%) were hypomethylated and 113 loci (15%) were hypermethylated with age.
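The comparison with randomly selected loci can be framed as the replication rate of the shared loci versus the rate for random loci sets of the same size. The sketch below repeats the random draw many times to form an empirical baseline and a permutation-style P value; the study reports a single random selection, so the repeated draws, the seed, and the function names are our assumptions.

```python
import random


def replication_rate(loci, replicated):
    """Fraction of `loci` that replicated in the validation dataset."""
    loci = list(loci)
    return sum(1 for cpg in loci if cpg in replicated) / len(loci)


def random_baseline(candidate_pool, n, replicated, observed_rate, n_draws=1000, seed=0):
    """Mean replication rate of random size-n sets, and the fraction of draws
    (with an add-one correction) whose rate is at least the observed rate."""
    rng = random.Random(seed)
    pool = list(candidate_pool)
    rates = [replication_rate(rng.sample(pool, n), replicated) for _ in range(n_draws)]
    p_empirical = (1 + sum(r >= observed_rate for r in rates)) / (1 + n_draws)
    return sum(rates) / n_draws, p_empirical
```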

Figure 4.

Association between age and DNA methylation in an independent dataset of solid breast tissue (n = 121, GSE101961) in (A) 130 shared hypomethylated loci (P < 0.01, Fig. 3C) with increasing age in solid breast tissue and breast milk, (B) 772 shared hypermethylated loci (P < 0.01, Fig. 3C) with increasing age in solid breast tissue and breast milk. Red loci indicate loci with consistent hypomethylation in (A) and hypermethylation in (B), respectively. C, Differential methylation of CpG islands, assessed as the mean methylation of all loci mapping to a given CpG island, in TCGA tumor relative to adjacent normal tissue for the 223 CpG islands whose shores contain significantly hypermethylated CpG loci with increasing age in both solid breast tissue and breast milk.


CpG islands with shores that contain age-associated hypermethylated loci demonstrate hypermethylation in tumor relative to adjacent normal tissue

Our age-association results identified 223 CpG island shores containing loci that were significantly hypermethylated with increasing age in both breast tissue and breast milk samples. We then tested whether the neighboring CpG islands exhibited hypermethylation in breast tumors using 450k array data from TCGA tumors (n = 392) and adjacent normal tissue (n = 82). We identified 94 CpG islands (42% of the 223 assessed island regions) with significant hypermethylation in breast tumors compared with adjacent normal tissue (Fig. 4C). Among these, the island most hypermethylated in tumor relative to adjacent normal tissue mapped to the promoter region of SST.
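The island-level comparison summarizes each CpG island by the mean beta value of its probes in every sample (as in Fig. 4C) and then compares tumor with adjacent normal samples. The sketch below assumes a Welch t statistic with a normal-approximation two-sided P value. That approximation is reasonable for groups as large as the TCGA sets (n = 392 and 82) but not for the tiny toy example shown. The test actually used in the study is not specified here, and the probe and island names are hypothetical.

```python
import math
from statistics import mean, variance

def island_means(betas_by_probe, probe_to_island, island):
    """Per-sample mean beta over all probes mapping to one CpG island.

    betas_by_probe maps probe ID -> beta values, one per sample, in a fixed sample order.
    """
    probes = [p for p, isl in probe_to_island.items() if isl == island]
    n_samples = len(betas_by_probe[probes[0]])
    return [mean(betas_by_probe[p][s] for p in probes) for s in range(n_samples)]

def welch_t(tumor, normal):
    """Mean difference, Welch t statistic, and a normal-approximation two-sided P value."""
    se = math.sqrt(variance(tumor) / len(tumor) + variance(normal) / len(normal))
    diff = mean(tumor) - mean(normal)
    t = diff / se
    return diff, t, math.erfc(abs(t) / math.sqrt(2))

# Hypothetical probes: probe_1 and probe_2 lie in island_A; probe_3 lies in another island.
probe_to_island = {"probe_1": "island_A", "probe_2": "island_A", "probe_3": "island_B"}
tumor_betas = {
    "probe_1": [0.60, 0.55, 0.70, 0.65],
    "probe_2": [0.50, 0.45, 0.60, 0.55],
    "probe_3": [0.90, 0.92, 0.88, 0.91],
}
normal_betas = {
    "probe_1": [0.20, 0.25, 0.15, 0.22],
    "probe_2": [0.10, 0.15, 0.05, 0.12],
    "probe_3": [0.89, 0.90, 0.91, 0.90],
}

tumor_island = island_means(tumor_betas, probe_to_island, "island_A")
normal_island = island_means(normal_betas, probe_to_island, "island_A")
diff, t_stat, p_value = welch_t(tumor_island, normal_island)
print(round(diff, 2))  # 0.42
```

When many islands are tested, as with the 223 islands here, the per-island P values would normally also be adjusted for multiple comparisons.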

SFRP2 age-associated promoter hypermethylation is observed in tumor and adjacent normal tissue

Of the 772 CpG loci hypermethylated in both solid breast tissue and breast milk, 10 mapped to the promoter region of SFRP2 (Supplementary Table S14), the largest number of overlapping loci mapping to any single gene. The genes with the next-largest numbers were GRM2, with 7 overlapping hypermethylated loci in its promoter CpG island, and HOXC13, with 5 overlapping hypermethylated loci in its gene body. Of the 10 hypermethylated loci in SFRP2, 9 mapped to the CpG island shore within the promoter. The mean methylation beta value of the 10 loci in each sample was positively correlated with age in both solid breast tissue and breast milk (Fig. 5A). In TCGA data, mean methylation of these 10 loci was also intermediate in both tumor and adjacent normal tissue, independent of breast cancer subtype (Fig. 5B). Importantly, across the SFRP2 promoter in TCGA breast tumor and adjacent normal tissue, intermediate methylation levels were observed in the CpG island shore in both tissue types, whereas intermediate methylation of the CpG island itself was observed only in tumor tissue (Fig. 5C). Thus, SFRP2 promoter CpG island shore methylation is already present in adjacent normal tissue and does not depend on breast cancer subtype.
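Figure 5A summarizes each sample by the mean beta value of the overlapping SFRP2 loci and relates it to age without adjustment. Below is a minimal sketch of that summary followed by a Pearson correlation. The samples are hypothetical and use 3 loci in place of the 10 listed in Supplementary Table S14. The choice of Pearson's r is an assumption, since the association measure behind Fig. 5A is not stated here.

```python
import math
from statistics import mean

def sample_means(beta_rows):
    """Mean beta across the selected loci for each sample (one row per sample)."""
    return [mean(row) for row in beta_rows]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: 5 samples x 3 loci (the study used 10 SFRP2 loci).
ages = [22, 27, 31, 36, 40]
beta_rows = [
    [0.20, 0.22, 0.18],
    [0.24, 0.26, 0.22],
    [0.27, 0.30, 0.27],
    [0.33, 0.31, 0.35],
    [0.36, 0.38, 0.37],
]
r = pearson_r(ages, sample_means(beta_rows))
print(f"r = {r:.3f}")
```

Averaging the loci before correlating reduces probe-level noise, at the cost of treating all loci in the region as equally informative.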

Figure 5.

SFRP2 promoter CpG island shore DNA methylation in solid breast tissue, breast milk, and breast tumor tissue. A, Unadjusted association between age and mean DNA methylation of the 10 overlapping hypermethylated loci in SFRP2 (Supplementary Table S14) in solid breast tissue and breast milk, respectively. B, Distribution of the mean methylation of the 10 overlapping hypermethylated loci in SFRP2 (Supplementary Table S14) in tumor and adjacent normal TCGA (The Cancer Genome Atlas) samples across tumor subtypes. C, Distribution of methylation of probes across the SFRP2 promoter in TCGA breast tumor and adjacent normal tissue. Loci mapping to the promoter CpG island are indicated in black and loci mapping to the adjacent CpG island shore are indicated in gray. Loci that are hypermethylated with increasing age in both solid breast tissue and breast milk (Supplementary Table S14) are indicated in black.

Using genome-scale DNA methylation data, we compared the associations of established breast cancer–risk factors with DNA methylation in both solid breast tissue and breast milk. Using a novel reference library for cell-type–specific DNA methylation in breast tissue, we identified, as expected, differences in the cellular composition of solid breast tissue and breast milk. Because the references for breast epithelial cells, endothelial cells, and fibroblasts were derived from cell lines, their methylomes may differ from those of the corresponding cells isolated from tissue. Although this reference-based approach remains advantageous over reference-free approaches, additional work is needed to further develop breast-specific reference libraries for component cell types.
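Reference-based deconvolution treats each bulk methylation profile as a mixture of reference cell-type profiles. The sketch below illustrates the idea only and is not the authors' library or estimation procedure. It substitutes a plain non-negative least-squares fit, solved by projected gradient descent and rescaled so the proportions sum to 1. The cell types, CpGs, and beta values are all hypothetical.

```python
def deconvolve(reference, sample, n_iter=2000):
    """Estimate cell-type proportions by non-negative least squares, rescaled to sum to 1.

    reference: dict mapping cell type -> reference beta values at K CpGs.
    sample: K beta values for one bulk sample, in the same CpG order.
    """
    cell_types = list(reference)
    # K x C design matrix: one row per CpG, one column per cell type.
    R = [[reference[c][k] for c in cell_types] for k in range(len(sample))]
    # Step size 1/L, where L = squared Frobenius norm of R bounds the gradient's Lipschitz constant.
    step = 1.0 / sum(v * v for row in R for v in row)
    w = [1.0 / len(cell_types)] * len(cell_types)
    for _ in range(n_iter):
        # Gradient of 0.5 * ||R w - y||^2 is R^T (R w - y).
        resid = [sum(r * wj for r, wj in zip(row, w)) - y for row, y in zip(R, sample)]
        grad = [sum(R[k][j] * resid[k] for k in range(len(R))) for j in range(len(w))]
        # Gradient step, then projection onto the non-negative orthant.
        w = [max(0.0, wj - step * g) for wj, g in zip(w, grad)]
    total = sum(w)
    if total == 0:
        raise ValueError("all estimated proportions are zero")
    return {c: wj / total for c, wj in zip(cell_types, w)}

# Hypothetical reference profiles at 6 CpGs for 3 illustrative cell types.
reference = {
    "epithelial": [0.9, 0.9, 0.1, 0.1, 0.1, 0.1],
    "fibroblast": [0.1, 0.1, 0.9, 0.9, 0.1, 0.1],
    "adipocyte": [0.1, 0.1, 0.1, 0.1, 0.9, 0.9],
}
# Bulk sample constructed as a 50/30/20 mixture of the references.
sample = [0.50, 0.50, 0.34, 0.34, 0.26, 0.26]
props = deconvolve(reference, sample)
print({c: round(p, 3) for c, p in props.items()})  # {'epithelial': 0.5, 'fibroblast': 0.3, 'adipocyte': 0.2}
```

The recovery works here because each reference profile is dominated by distinct CpGs. In practice, reference libraries are built from CpGs selected to discriminate the cell types, and noisy or cell line–derived references (as noted above) can bias the estimates.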

Solid tissue had a higher relative proportion of adipocytes, whereas breast milk had higher proportions of epithelial and immune cells. This reference library was critical for adjusting for cellular composition in downstream analyses, allowing direct comparisons between breast milk and solid breast tissue that are not confounded by differences in the underlying cellular composition of each tissue type. In addition, the statistically significant associations we identified between estimated cellular composition and the investigated breast cancer–risk factors in both solid breast tissue and breast milk reinforce the importance of this adjustment: without it, risk factor–associated results could be confounded by differences in cellular composition.
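The confounding argument can be made concrete with ordinary least squares. Suppose methylation at a locus depends only on the epithelial fraction, and the epithelial fraction rises with age. An age-only model then reports a spurious age effect, which disappears once the estimated proportion is added as a covariate. Below is a minimal sketch with hypothetical data. A real analysis would typically fit such models per CpG (often on M-values) with a statistics package, and the helper `ols` is only illustrative.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y, solved by Gaussian elimination."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        # Partial pivoting: swap in the row with the largest entry in this column.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef

# Hypothetical samples: epithelial fraction tends to rise with age in this toy data.
ages = [25, 28, 30, 33, 35, 38, 40, 42]
epithelial = [0.20, 0.25, 0.22, 0.30, 0.35, 0.33, 0.40, 0.45]
# Toy assumption: methylation depends only on epithelial fraction, not on age.
beta = [0.3 + 0.5 * e for e in epithelial]

unadjusted = ols([[1.0, a] for a in ages], beta)
adjusted = ols([[1.0, a, e] for a, e in zip(ages, epithelial)], beta)
print(f"age coefficient, unadjusted: {unadjusted[1]:.4f}")
print(f"age coefficient, adjusted for epithelial fraction: {adjusted[1]:.1e}")
```

With several cell types, the estimated proportions sum to 1, so one proportion is usually omitted from the model to avoid perfect collinearity with the intercept.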

Although few differentially methylated loci associated with reproductive age, family history of disease, or BMI were shared between tissue types, the small sample size, particularly for breast milk, may have precluded their identification. Despite this limited overlap, family history of disease was associated in both tissue types with gene body hypomethylation, which is typically associated with reduced gene expression, of the well-documented tumor-suppressor gene BRCA2. This suggests that molecular alterations associated with disease risk in breast tissue may be detectable in breast milk, a far less invasive biospecimen for stratifying disease risk. Finally, although the use of breast milk as a biomarker of disease risk is limited to lactating women, findings in breast milk could potentially be extended to nipple aspirate fluid collected from non-lactating women (28).

Despite minimal overlap between breast milk and solid breast tissue in CpG loci differentially methylated with increasing BMI, 4 CpG loci in breast milk were differentially methylated with increasing BMI after correction for multiple comparisons, whereas no CpG loci in solid breast tissue met this significance threshold. One of these loci mapped to the gene body of TBC1D22A, a gene previously linked to obesity (29). This highlights the potential to identify risk factor–associated molecular alterations in breast milk and motivates further investigation of this biospecimen for identifying alterations indicative of disease risk.
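This section does not state which multiple-comparison correction was applied. Assuming a Benjamini-Hochberg false discovery rate (FDR) procedure, a common choice for array-wide CpG tests, the adjustment can be sketched as follows. The P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity of q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical P values from 10 CpG-level tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
q_values = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(q_values) if q < 0.05]
print(significant)  # [0, 1]
```

In this example, the test with nominal P = 0.039 passes an uncorrected 0.05 threshold but not the corrected one. This illustrates why few loci survive correction in small samples, where the number of tests across the array is large relative to the statistical power.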

More shared differentially methylated loci were observed between solid breast tissue and breast milk for increasing age and for parity. Shared hypermethylated loci associated with parity were enriched for CpG-sparse open sea regions, which commonly overlap enhancers. This suggests that these parity-associated alterations to DNA methylation may play an important role in gene regulation, which is notable because parity is associated with a decreased risk of breast cancer. Shared alterations associated with age and with parity were each attenuated in models adjusting for both risk factors; this may reflect a more nuanced interaction between the two risk factors that the present study is underpowered to assess.

Shared hypermethylated loci with increasing age in solid breast tissue and breast milk were significantly enriched for CpG island shore regions. Previous work has shown that the CpG island hypermethylation observed in tumor tissue relative to adjacent normal tissue may begin in the shore regions bordering CpG islands (14, 30). Shore alterations that are already present in adjacent normal tissue would go undetected in comparisons of tumor with adjacent normal tissue. Therefore, CpG island shore hypermethylation with increasing age in non-diseased tissue may indicate early alterations in DNA methylation associated with increased risk of disease.
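Enrichment of age-associated loci in a genomic context such as CpG island shores is commonly tested with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2 x 2 table of selected/not selected versus shore/not shore. The specific test used in the study is not stated in this section. The following is therefore a sketch under that assumption, with hypothetical counts.

```python
from math import comb

def shore_enrichment(k, n, K, N):
    """One-sided hypergeometric enrichment test.

    k: selected loci in the region class (e.g., shores); n: all selected loci;
    K: array loci in the region class; N: all array loci tested.
    Returns (fold enrichment, P(X >= k)).
    """
    fold = (k * N) / (n * K)
    upper = min(n, K)
    # Upper tail of the hypergeometric distribution, computed exactly with integers.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return fold, tail / comb(N, n)

# Hypothetical counts, not the study's: 25 of 50 selected loci in shores vs. 200 of 1,000 array loci.
fold, p_value = shore_enrichment(k=25, n=50, K=200, N=1000)
print(f"fold = {fold:.1f}, P = {p_value:.2e}")
```

Exact integer arithmetic avoids underflow in the individual terms. For array-scale counts (about 450,000 loci), the same function works but is slower, and a library implementation of the hypergeometric survival function would usually be preferred.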

To investigate this further, we assessed the methylation status of CpG islands in TCGA tumor relative to adjacent normal tissue for islands whose shores contained loci hypermethylated with age. This analysis explored whether hypermethylation seeded in CpG island shores in normal tissue may encroach into the adjacent CpG islands in tumor tissue. CpG island hypermethylation in tumor tissue was observed for islands whose shores were hypermethylated with increasing age in solid breast tissue and breast milk (94 of the 223 islands assessed). Notably, the most hypermethylated of the assessed CpG islands mapped to the promoter of SST. SST encodes somatostatin, a hormone that is elevated during pregnancy and lactation (31) and that indirectly inhibits mammary tumor growth by inhibiting hormones and growth factors that promote tumor growth (32). In addition, previous work found hypermethylation of the SST promoter CpG island shore in adjacent normal tissue relative to paired tissue collected from the contralateral breast, as well as hypermethylation of the CpG island itself in paired tumor tissue (14).

We identified SFRP2 as the gene with the greatest number of shared hypermethylated loci with increasing age in solid breast tissue and breast milk. SFRP2 encodes secreted frizzled-related protein 2, a Wnt pathway antagonist whose secretion has been reported to be decreased in multiple tumor types, including breast tumors (33). Furthermore, promoter hypermethylation of SFRP2 has been identified as a mechanism of decreased protein expression in breast tumor tissue and has been proposed as a potential tumor biomarker (34).

Of the 10 shared hypermethylated loci identified in SFRP2, 9 mapped to the CpG island shore in the promoter region, and the remaining locus mapped to the adjacent promoter CpG island. In TCGA breast tumor and adjacent normal methylation data, we observed similar intermediate levels of methylation at shore loci in both tumor and adjacent normal tissue. In contrast, increased methylation of the adjacent island was observed only in tumor tissue, with island loci remaining hypomethylated in adjacent normal tissue. Taken together, these findings suggest that DNA methylation alterations to the SFRP2 promoter CpG island shore may occur early in carcinogenesis, priming the adjacent CpG island for subsequent hypermethylation. Such early alterations at shore sites could serve as biomarkers of disease risk.

This study was limited by its small sample size, particularly for breast milk. It was also limited by the unpaired nature of the solid breast tissue and breast milk samples, which prevented direct comparisons between tissue types. Paired sampling was not feasible because of the difficulty and ethical concerns of collecting breast biopsies from healthy lactating mothers. However, developing a breast tissue–specific reference library for estimating cellular composition greatly enhanced our ability to compare results across tissue types with minimal confounding by differences in underlying cellularity. In addition, the small sample size limited our ability to address multiple testing, and we therefore present these initial findings as hypothesis generating. Despite these limitations, we identified breast cancer–risk factor–associated DNA methylation alterations common to solid breast tissue and breast milk. We also confirmed in independent datasets that some of the identified loci show methylation patterns consistent with our findings. The identified alterations were consistent with prior findings from our group and others on breast tumorigenesis. Together, these results support the potential of breast milk as a noninvasive biomarker of disease risk that harbors molecular alterations similar to those observed in solid breast tissue.

M.E. Muse reports a pending patent for a method to deconvolute breast tissue and breast milk cell proportions using reference DNA methylation profiles. L.A. Salas reports grants from the Congressionally Directed Medical Research Programs/Department of Defense and NIGMS during the conduct of the study. B.C. Christensen reports a pending patent for a system and method for deconvolution of breast tissue and breast milk cell proportions using reference DNA methylation profiles. No disclosures were reported by the other authors.

M.E. Muse: Conceptualization, formal analysis, visualization, methodology, writing–original draft, writing–review and editing. C.D. Carroll: Formal analysis, visualization, writing–review and editing. L.A. Salas: Supervision, methodology, writing–review and editing. M.R. Karagas: Supervision, writing–review and editing. B.C. Christensen: Conceptualization, supervision, funding acquisition, methodology, writing–review and editing.

This work was supported by funds from the Burroughs-Wellcome/Dartmouth Big Data in the Life Sciences Training Program (to M.E. Muse); the National Institute of General Medical Sciences [Centers of Biomedical Research Excellence (COBRE) Center for Molecular Epidemiology at Dartmouth P20GM104416; to M.R. Karagas and subawards to B.C. Christensen and L.A. Salas]; the National Cancer Institute (R01CA216265 and R01CA253976; to B.C. Christensen); the National Institute of Environmental Health Sciences (P01ES022832), the Environmental Protection Agency (RD-83544201), and UH3OD023275 (to M.R. Karagas); and CDMRP/Department of Defense (W81XWH-20-1-0778; to L.A. Salas).

The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).

1. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA Cancer J Clin 2021;71:7–33.

2. Dumitrescu RG, Cotarla I. Understanding breast cancer risk: where do we stand in 2005? J Cell Mol Med 2005;9:208–21.

3. Barnard ME, Boeke CE, Tamimi RM. Established breast cancer–risk factors and risk of intrinsic tumor subtypes. Biochim Biophys Acta 2015;1856:73–85.

4. Maas P, Barrdahl M, Joshi AD, Auer PL, Gaudet MM, Milne RL, et al. Breast cancer risk from modifiable and nonmodifiable risk factors among white women in the United States. JAMA Oncol 2016;2:1295–302.

5. Michailidou K, Hall P, Gonzalez-Neira A, Ghoussaini M, Dennis J, Milne RL, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet 2013;45:353–61.

6. Michailidou K, Beesley J, Lindstrom S, Canisius S, Dennis J, Lush MJ, et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nat Genet 2015;47:373–80.

7. Michailidou K, Lindström S, Dennis J, Beesley J, Hui S, Kar S, et al. Association analysis identifies 65 new breast cancer–risk loci. Nature 2017;551:92–94.

8. Lee A, Mavaddat N, Wilcox AN, Cunningham AP, Carver T, Hartley S, et al. BOADICEA: a comprehensive breast cancer–risk prediction model incorporating genetic and nongenetic risk factors. Genet Med 2019;21:1708–18.

9. Kelsey JL, Gammon MD, John EM. Reproductive factors and breast cancer. Epidemiol Rev 1993;15:36–47.

10. Phipps AI, Buist DSM, Malone KE, Barlow WE, Porter PL, Kerlikowske K, et al. Reproductive history and risk of three breast cancer subtypes defined by three biomarkers. Cancer Causes Control 2011;22:399–405.

11. Koboldt DC, Fulton RS, McLellan MD, Schmidt H, Kalicki-Veizer J, McMichael JF, et al. Comprehensive molecular portraits of human breast tumours. Nature 2012;490:61–70.

12. Johnson KC, Koestler DC, Fleischer T, Chen P, Jenson EG, Marotti JD, et al. DNA methylation in ductal carcinoma in situ related with future development of invasive breast cancer. Clin Epigenetics 2015;7:75.

13. Fleischer T, Frigessi A, Johnson KC, Edvardsen H, Touleimat N, Klajic J, et al. Genome-wide DNA methylation profiles in progression to in situ and invasive carcinoma of the breast with impact on gene transcription and prognosis. Genome Biol 2014;15:435.

14. Muse ME, Titus AJ, Salas LA, Wilkins OM, Mullen C, Gregory KJ, et al. Enrichment of CpG island shore region hypermethylation in epigenetic breast field cancerization. Epigenetics 2020;15:1093–106.

15. Johnson KC, Houseman EA, King JE, Christensen BC. Normal breast tissue DNA methylation differences at regulatory elements are associated with the cancer-risk factor age. Breast Cancer Res 2017;19:81.

16. Salas LA, Lundgren SN, Browne EP, Punska EC, Anderton DL, Karagas MR, et al. Prediagnostic breast milk DNA methylation alterations in women who develop breast cancer. Hum Mol Genet 2020;29:662–73.

17. Gilbert-Diamond D, Cottingham KL, Gruber JF, Punshon T, Sayarath V, Gandolfi AJ, et al. Rice consumption contributes to arsenic exposure in US women. Proc Natl Acad Sci U S A 2011;108:20656–60.

18. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 2014;30:1363–9.

19. Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics 2013;29:189–96.

20. Zhou W, Laird PW, Shen H. Comprehensive characterization, annotation, and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res 2017;45:e22.

21. Holm K, Staaf J, Lauss M, Aine M, Lindgren D, Bendahl P-O, et al. An integrated genomics analysis of epigenetic subtypes in human breast tumors links DNA methylation patterns to chromatin states in normal mammary cells. Breast Cancer Res 2016;18:27.

22. Arner P, Sinha I, Thorell A, Rydén M, Dahlman-Wright K, Dahlman I. The epigenetic signature of subcutaneous fat cells is linked to altered expression of genes implicated in lipid metabolism in obese women. Clin Epigenetics 2015;7:93.

23. Salas LA, Koestler DC, Butler RA, Hansen HM, Wiencke JK, Kelsey KT, et al. An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray. Genome Biol 2018;19:64.

24. Koestler DC, Jones MJ, Usset J, Christensen BC, Butler RA, Kobor MS, et al. Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL). BMC Bioinformatics 2016;17:120.

25. Teschendorff AE, Relton CL. Statistical and integrative system-level analysis of DNA methylation data. Nat Rev Genet 2018;19:129–47.

26. Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, et al. Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015;12:453–7.

27. Song MA, Brasky TM, Weng DY, McElroy JP, Marian C, Higgins MJ, et al. Landscape of genome-wide age-related DNA methylation in breast tissue. Oncotarget 2017;8:114648–62.

28. Wrensch MR, Petrakis NL, Gruenke LD, Ernster VL, Miike R, King EB, et al. Factors associated with obtaining nipple aspirate fluid: analysis of 1428 women and literature review. Breast Cancer Res Treat 1990;15:39–51.

29. Liu AY, Gu D, Hixson JE, Rao DC, Shimmin LC, Jaquish CE, et al. Genome-wide linkage and regional association study of obesity-related phenotypes: the GenSalt study. Obesity 2014;22:545–56.

30. Skvortsova K, Masle-Farquhar E, Luu P-L, Song JZ, Qu W, Zotenko E, et al. DNA hypermethylation encroachment at CpG island borders in cancer is predisposed by H3K4 monomethylation patterns. Cancer Cell 2019;35:297–314.

31. Goldstein A, Armony-Sivan R, Rozin A, Weller A. Somatostatin levels during infancy, pregnancy, and lactation: a review. Peptides 1995;16:1321–6.

32. Watt HL, Kharmate G, Kumar U. Biology of somatostatin in breast cancer. Mol Cell Endocrinol 2008;286:251–61.

33. Suzuki H, Toyota M, Caraway H, Gabrielson E, Ohmura T, Fujikane T, et al. Frequent epigenetic inactivation of Wnt antagonist genes in breast cancer. Br J Cancer 2008;98:1147–56.

34. Veeck J, Noetzel E, Bektas N, Jost E, Hartmann A, Knüchel R, et al. Promoter hypermethylation of the SFRP2 gene is a high-frequent alteration and tumor-specific epigenetic marker in human breast cancer. Mol Cancer 2008;7:83.
This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.
