Background: Random periareolar fine-needle aspiration (RP-FNA) is increasingly used in trials of breast cancer prevention for biomarker assessments. DNA methylation markers may have value as surrogate endpoint biomarkers, but this requires identification of biologically relevant markers suitable for paucicellular, lymphocyte-contaminated clinical samples.

Methods: Unbiased whole-genome 5-aza-2′-deoxycytidine (5AZA)–induced gene expression assays, followed by several phases of qualitative and quantitative methylation-specific PCR (MSP) testing, were used to identify novel breast cancer DNA methylation markers optimized for clinical FNA samples.

Results: The initial 5AZA experiment identified 453 genes whose expression was potentially regulated by promoter region methylation. Informatics filters excluded 173 genes unlikely to yield useful DNA methylation markers. MSP assays were designed for 271 of the remaining 280 genes and, ultimately, 33 genes were identified that were differentially methylated in clinical breast cancer samples, as compared with benign RP-FNA samples, and never methylated in lymphocytes. A subset of these markers was validated by quantitative multiplex MSP in extended clinical sample sets. Using a novel permutation method for analysis of quantitative methylation data, PSAT1, GNE, CPNE8, and CXCL14 were found to correlate strongly with specific clinical and pathologic features of breast cancer. In general, our approach identified markers methylated in a smaller subpopulation of tumor cells than those identified in published methylation array studies.

Conclusions: Clinically relevant DNA methylation markers were identified using a 5AZA-induced gene expression approach.

Impact: These breast cancer-relevant, FNA-optimized DNA methylation markers may have value as surrogate endpoint biomarkers in RP-FNA studies. Cancer Epidemiol Biomarkers Prev; 22(12); 2212–21. ©2013 AACR.

This article is highlighted in the In This Issue feature, p. 2143

Random periareolar fine-needle aspiration (RP-FNA) samples are frequently used in phase 2 trials of breast cancer prevention for assessments of surrogate endpoint biomarkers (1–7). DNA methylation is a potentially reversible early event in breast carcinogenesis (8–11), and is readily assessable in paucicellular RP-FNA samples (12–15). For some genes, higher levels of methylation are identified in benign breast cells from cancer patients than in similar samples from unaffected women (16–18), and there is often a correlation between specific methylation changes observed in benign breast tissue and those in the associated cancers (16, 19, 20). This suggests that measures of DNA methylation in benign breast tissue may predict breast cancer risk. In fact, DNA methylation of some genes is observed more frequently in breast epithelial cells obtained by RP-FNA from high-risk women than from lower-risk women (17, 18, 21); in addition, tamoxifen has been shown to reduce methylation measured in RP-FNA samples for some genes (7). Although the risk signals generated by these candidate gene studies are intriguing, the clinical relevance of hazard ratios in the 2.5 to 3.0 range is marginal at best.

Clinically useful RP-FNA-based breast cancer risk stratification may be achievable with better DNA methylation markers. Methylation arrays have greatly expanded the list of potential markers (22–28), but have not necessarily identified genes whose expression is regulated by promoter region methylation. We used a genome-wide 5-aza-2′-deoxycytidine (5AZA) approach to identify genes whose expression is upregulated with demethylation. The gene lists were subsequently refined based on several phases of qualitative and quantitative methylation-specific PCR (MSP) assays targeting specific promoter regions in benign and malignant breast samples and lymphocytes. A selected subset of markers was assessed in a large panel of clinical breast cancer FNA and benign breast RP-FNA samples to confirm clinical relevance.

Cell lines and primary cell cultures for demethylation studies

Six breast cancer cell lines were selected for demethylation studies based on known tumor suppressor gene expression regulation by promoter region hypermethylation: HCC1569 (CCND2), HCC1954 (SCGB3A1, APC, RASSF1A), MCF-7 (RAR-β2), MDA-MB-231 (ESR1), UACC3199 (BRCA1), and BT-549 (hypermethylator phenotype). Apart from MCF10A, we specifically avoided immortalized benign human mammary epithelial cell lines for this experiment as these cells frequently show tumor suppressor gene methylation (e.g., p16) and gene expression profiles that are intermediate between normal breast epithelial cells and breast cancer (29). Instead, we opted to test six first-passage benign human mammary epithelial cell cultures (HME) generated in serum-free media from small fragments of normal breast tissue obtained from young women undergoing fibroadenoma excision. Fresh benign breast tissue was minced and digested overnight on a rotator at 37°C in 1 mg/mL collagenase (C2674, Sigma-Aldrich) in Mammary Epithelial Cell Growth Medium (MEGM; NC9523177, Fisher Scientific) and then centrifuged at 200g for 3 minutes. The pellet was rinsed with PBS, centrifuged again, and then treated with 0.25% trypsin in EDTA at 37°C for 3 minutes to dissociate the cell clumps. After neutralizing the trypsin, the cells were centrifuged for 3 minutes at 200g, resuspended in MEGM with 5% FBS, and plated into tissue culture flasks. The media was changed to MEGM without serum the next day. Contaminating fibroblasts were removed by a quick trypsinization 3 to 5 days later.

5AZA treatment

Dose-dependent cytostasis was observed in 5AZA-treated cells, and was most pronounced for the primary benign breast epithelial cell cultures. The 5AZA dose (0.5 μmol/L) was selected based on evaluation of growth curves and induction of BNC1, SERPINB, and TKTL1 gene expression measured by reverse transcription polymerase chain reaction (RT-PCR) in benign and malignant cells. The breast cancer cell lines, HME cultures, and MCF10A cells were treated with 0.5 μmol/L 5AZA (Sigma-Aldrich) in dimethylsulfoxide (DMSO) or DMSO alone for 6 days, after which RNA was prepared using the Illumina TotalPrep kit (AMIL1791, Life Technologies). This RNA was used for whole-genome expression analysis.

Gene expression analysis

Whole-genome expression was assessed using the Illumina HumanWG-6-v3 chip. The initial analysis was done in GeneSpring as follows. Flags were set to present for entities with detection P values of 0.2 or less and absent for entities with detection P values of more than 0.4. Raw signals with values less than 1 were reset to 1. Quantile normalization was used, and baseline transformation performed on the basis of the median expression level for all entities. Of the 48,701 entities included on the Illumina array, flags were present or marginal for 75% of the samples for 27,324 entities, and these were retained. The complete dataset is available on the NCBI GEO site (GSE41692).
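The flagging and normalization steps described above can be sketched in a few lines. This is a minimal illustration in plain Python, not GeneSpring's implementation. The function names are hypothetical, the 75% rule is read here as "present or marginal in at least 75% of samples", and the baseline transformation to the median is omitted.

```python
from statistics import mean

def flag_call(det_p, present_max=0.2, absent_min=0.4):
    """Classify one detection P value as present (P), marginal (M), or absent (A)."""
    if det_p <= present_max:
        return "P"
    if det_p > absent_min:
        return "A"
    return "M"

def keep_entity(det_p_row, min_fraction=0.75):
    """Retain an entity flagged present or marginal in enough samples (assumed rule)."""
    calls = [flag_call(p) for p in det_p_row]
    n_ok = sum(c != "A" for c in calls)
    return n_ok / len(calls) >= min_fraction

def quantile_normalize(columns):
    """Quantile-normalize samples; `columns` holds one list of signals per sample."""
    # Raw signals below 1 are reset to 1, as described in the text.
    floored = [[max(v, 1.0) for v in col] for col in columns]
    n = len(floored[0])
    sorted_cols = [sorted(col) for col in floored]
    # Target distribution: mean of the i-th smallest value across samples.
    target = [mean(col[i] for col in sorted_cols) for i in range(n)]
    out = []
    for col in floored:
        order = sorted(range(n), key=lambda i: col[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = target[rank]  # ties are broken by position
        out.append(normalized)
    return out
```

After normalization every sample shares the same set of values, so differences between samples reflect rank order rather than overall signal intensity.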

Informatics filters

Publicly available informatics tools were used to limit the gene list identified in the 5AZA experiment to those most likely to yield useful epigenetic markers (30–34). Specific exclusions included duplicate genes (192), no CpG island (87), known imprinted genes (5), X-chromosome and likely imprinted genes (30), pseudogenes or noncoding RNA (6), poorly annotated or discontinued records (12), and CpG island more than 2,000 bp proximal to the transcription start site (33).
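The exclusion counts can be reconciled with the totals reported in the Results (645 entities, 453 unique genes, 280 genes passing the filters). The short check below uses only numbers stated in the text; the dictionary keys are shorthand labels introduced for illustration.

```python
# Exclusion counts as listed in the text (labels are shorthand).
exclusions = {
    "duplicate genes": 192,
    "no CpG island": 87,
    "known imprinted genes": 5,
    "X-chromosome / likely imprinted": 30,
    "pseudogenes or noncoding RNA": 6,
    "poorly annotated or discontinued": 12,
    "CpG island >2,000 bp from TSS": 33,
}

entities_selected = 645  # entities selected from the 5AZA experiment (Results)

# Removing duplicates gives the number of unique genes.
unique_genes = entities_selected - exclusions["duplicate genes"]

# Removing every exclusion gives the genes that passed the filters.
passed_filters = entities_selected - sum(exclusions.values())

# Genes removed by the filters after de-duplication.
excluded_unique = unique_genes - passed_filters

print(unique_genes, passed_filters, excluded_unique)  # 453 280 173
```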

Qualitative methylation-specific PCR

Qualitative methylation-specific PCR (MSP) assays were designed and run as described previously (21). For genes previously shown to be epigenetically regulated, published sequences were used. For novel genes, primers were designed within the CpG island as close to the transcriptional start site as possible (median 5′ distance from the transcription start site was 97 bp with a median product size of 113 bp). Primer sequences used in the initial MSP screen are shown in Supplementary Table S1. For the second MSP screen in clinical samples, HotStarTaq (Qiagen) was used, and MgCl2 concentrations and PCR cycle numbers were optimized to permit the detection of 60 pg of methylated, sodium bisulfite (NaBS)-treated genomic DNA in a 6-ng DNA sample (Supplementary Table S2).

Quantitative multiplex methylation-specific PCR

Quantitative multiplex MSP (QM-MSP) assays were designed as described previously by Fackler (35) using the same conditions and quality assurance standards we have previously described (17, 18). Results were expressed as a methylation fraction, which is calculated as methylated copies/(methylated copies + unmethylated copies). Primer and probe sequences are shown in Supplementary Table S3.
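As a small worked example of the methylation fraction defined above, the sketch below computes methylated copies/(methylated copies + unmethylated copies). The function name and the copy numbers in the usage line are hypothetical.

```python
def methylation_fraction(methylated_copies, unmethylated_copies):
    """Return methylated / (methylated + unmethylated) copies."""
    total = methylated_copies + unmethylated_copies
    if total <= 0:
        # No amplifiable template; such a sample would fail quality assurance.
        raise ValueError("no copies detected")
    return methylated_copies / total

# Hypothetical sample: 60 methylated and 5,940 unmethylated copies.
fraction = methylation_fraction(60, 5940)
print(f"{fraction:.2%}")  # 1.00%
```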

Extended FNA sample set for clinical and pathologic correlations

Five markers were selected from the QM-MSP screen for assessment in an expanded clinical sample set that included all of the samples used in the initial QM-MSP screen with the addition of new, prospectively acquired samples. This research was performed in accordance with an assurance filed with and approved by the U.S. Department of Health and Human Services. Institutional Review Board approval was obtained, and informed consent was documented in writing for each participant. Unselected patients with untreated stages 1 to 3 primary breast cancer (n = 52) underwent FNA sampling of the primary tumor at the time of definitive breast cancer surgery. In addition, benign breast epithelium was obtained by RP-FNA from patients with breast cancer (n = 52) and women never diagnosed with breast cancer (n = 90) as previously described (18, 21). Samples not meeting quality assurance standards for QM-MSP (∼15%) were excluded from analysis; therefore, the number of samples shown in the tables is lower than the number collected. The total number of samples included in this analysis (archival plus prospectively acquired) is 97 breast cancers (one bilateral breast cancer), 104 benign RP-FNA samples from cancer patients, and 223 benign RP-FNA samples from women never diagnosed with breast cancer.

Clinical information was prospectively collected on case report forms and tumor data, including tumor size, nodal status, associated ductal carcinoma in situ (DCIS), expression of estrogen receptor (ER), progesterone receptor, and Her-2/neu, proliferative index (Ki67), and p53 expression, were subsequently abstracted from the pathology reports.

A permutation method for comparing quantitative methylation data between patient groups

For the purposes of summarization and comparison, quantitative methylation data are typically dichotomized using an arbitrary threshold value above which a sample is scored as methylated and below which a sample is scored as not methylated. This approach artificially constrains the data, reducing it to the level of qualitative MSP data. To avoid this when making comparisons between patient groups, we dichotomized the patient variable of interest, ER status for example, and then plotted the proportion of variable (+) and variable (−) cases that exceeded a continuous range of possible methylation threshold values (e.g., 10⁻⁸ to 0.5). These two curves (Fig. 1) were compared as follows. Methylation values were randomly permuted to destroy the association, if any, between methylation of a specific gene and the selected clinicopathologic variable of interest. For each permuted data set, we calculated the proportions of cases in each of the two categories as in Fig. 1. Note that if there is no association between the methylation and the clinicopathologic variable, the two curves are expected to be identical. We then computed the differences between the two curves and used their summation as a measure of the difference between the two groups. A null distribution for the sum of the differences was developed over the course of 10,000 iterations, and P values were calculated based on the null distribution. To account for the multiple comparisons included in this analysis, false-discovery rate (FDR)–corrected P values were also calculated by the method of Benjamini and Hochberg (36).
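The permutation procedure can be sketched as follows. This is an illustrative reading of the description, not the authors' code. The threshold grid (50 log-spaced values), the strict "greater than threshold" rule, the two-sided comparison, and all function names are assumptions.

```python
import random

def log_spaced_thresholds(low=1e-8, high=0.5, n=50):
    """Assumed grid of methylation thresholds, evenly spaced on a log scale."""
    ratio = (high / low) ** (1.0 / (n - 1))
    return [low * ratio ** i for i in range(n)]

def proportion_above(values, threshold):
    """Fraction of samples whose methylation fraction exceeds the threshold."""
    return sum(v > threshold for v in values) / len(values)

def sum_of_differences(pos, neg, thresholds):
    """Sum over thresholds of the gap between the (+) and (-) curves."""
    return sum(proportion_above(pos, t) - proportion_above(neg, t)
               for t in thresholds)

def permutation_p_value(pos, neg, thresholds, n_iter=10_000, seed=0):
    """Return (observed SoD, two-sided permutation P value)."""
    rng = random.Random(seed)
    observed = sum_of_differences(pos, neg, thresholds)
    pooled = list(pos) + list(neg)
    n_pos = len(pos)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # break any link between methylation and group
        sod = sum_of_differences(pooled[:n_pos], pooled[n_pos:], thresholds)
        if abs(sod) >= abs(observed):
            extreme += 1
    return observed, extreme / n_iter

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this sketch, a reported P value of 0 means that none of the permuted data sets produced a sum of differences as extreme as the observed one. In pure Python, 10,000 iterations on roughly 100 samples takes noticeable time, so a vectorized implementation would be preferable for routine use.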

Figure 1.

Statistical approach for comparing quantitative methylation data between two groups. In this example, the proportion of samples positive for PSAT1 methylation across a range of threshold values is plotted for estrogen receptor positive (solid line) and estrogen receptor negative cancers (dashed line). A permutation method is used to calculate the sum of differences (SoD) between the two curves and the statistical significance of that difference. For these curves, SoD = 17.71 and P = 0.


Initially, 5AZA-induced gene expression was used to identify potential breast cancer-relevant methylation markers. The resulting gene list was refined using informatics filters followed by three phases of MSP testing in panels of cell lines, benign primary breast epithelial cell cultures, lymphocytes, and clinical samples. A subset of five markers fully optimized for RP-FNA samples was tested in an extended clinical sample set to confirm clinical relevance. The marker discovery and validation pipeline is shown in Supplementary Fig. S1.

5AZA-induced gene expression

Volcano plots initially identified 72 genes whose expression was induced 2-fold or more with an FDR of less than 0.05 in breast cancer cell lines treated with 5AZA. Expression of two genes was reduced by 5AZA by 2-fold or more with FDR less than 0.05 (HDAC4 and SMYD3). Hierarchical cluster analysis of baseline expression of 72 genes induced by 5AZA discriminated perfectly between the benign and malignant samples (Fig. 2). The pattern of expression for these 72 genes was highly consistent among the benign breast epithelial cells and more variable among the cancers. Although MCF10A cells clustered with the benign breast epithelial cells, they were recognized as a distinct subclass. Thirty-six of these 72 genes showed reduced baseline expression in the breast cancers as compared with the benign cells (Fig. 2, orange box), but only 20 of these passed the informatics filters required for advancement to the MSP screens. To reduce the probability of overlooking useful genes, we ultimately advanced genes with 4-fold or greater induction by 5AZA in one or more breast cancer cell lines, or 1.5-fold or greater induction by 5AZA in two or more breast cancer cell lines. In total, 645 entities, representing 453 unique genes, were selected from the 5AZA experiment. Of these, 280 passed the informatics filters, but MSP assays could be optimized for only 271 of these.
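The relaxed advancement rule stated above (4-fold or greater induction in one or more cell lines, or 1.5-fold or greater induction in two or more) can be written as a simple predicate. The function name and the example fold-change values are hypothetical.

```python
def advance_gene(fold_changes):
    """Apply the advancement rule to one gene.

    fold_changes: 5AZA-versus-DMSO expression ratios, one per breast
    cancer cell line (linear scale, not log2).
    """
    strong = sum(fc >= 4.0 for fc in fold_changes)    # lines with >=4-fold induction
    moderate = sum(fc >= 1.5 for fc in fold_changes)  # lines with >=1.5-fold induction
    return strong >= 1 or moderate >= 2

# Hypothetical genes measured across six cell lines.
print(advance_gene([4.2, 1.0, 1.1, 0.9, 1.0, 1.2]))  # True: one line at >=4-fold
print(advance_gene([1.6, 1.7, 1.0, 1.0, 1.1, 0.8]))  # True: two lines at >=1.5-fold
print(advance_gene([3.9, 1.0, 1.0, 1.2, 1.1, 0.9]))  # False: neither criterion met
```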

Figure 2.

Unsupervised hierarchical cluster analysis of baseline expression of 72 genes that were significantly upregulated by 5AZA in breast cancer cell lines. The 36 genes with reduced baseline expression in breast cancer cell lines compared with benign cells (orange box) were advanced to the informatics filters. The list was ultimately expanded to include any genes induced 4-fold or greater in one or more breast cancer cell lines, or 1.5-fold or greater in two or more breast cancer cell lines. Cell type: breast cancer cell line, benign breast epithelial cell primary culture, and immortalized benign breast epithelial MCF10A cells. Color scale based on Log2 normalized expression.


Characteristics of the 271 5AZA-induced genes advanced to the MSP screens

There were 31 (11.4%) transcription factors represented among the 271 5AZA-induced genes, which is a significant enrichment over the 1,857 (7.4%) transcription factors included among the 25,186 unique Entrez IDs found in the Illumina genome (P = 0.011). This list included only three homeobox genes, HOXB5, HOXD1, and MSX1, which is consistent with the total number of homeobox genes found in the Illumina genome (P = 0.974). Gene ontology analysis showed enrichment for several biologic processes, including response to stimulus (corrected P = 0.013), multiorganism processes (corrected P = 0.006), and developmental processes (corrected P = 0.006). Enrichment for response to stimulus genes was most significant for the response to stress terms (corrected P = 0.0008).
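The text does not name the test behind the transcription factor enrichment P value. One plausible way to frame the comparison is a one-sided hypergeometric test, sketched below with the counts given above. The choice of test and the function name are assumptions, so the result need not match the reported P = 0.011 exactly.

```python
from math import comb

def hypergeom_upper_tail(k, n_draw, n_success, n_total):
    """P(X >= k) when n_draw items are drawn without replacement from
    n_total items, of which n_success belong to the category of interest."""
    denom = comb(n_total, n_draw)
    upper = min(n_draw, n_success)
    num = sum(comb(n_success, i) * comb(n_total - n_success, n_draw - i)
              for i in range(k, upper + 1))
    return num / denom

# Counts from the text: 31 transcription factors among 271 genes,
# versus 1,857 among 25,186 unique Entrez IDs on the array.
observed_rate = 31 / 271
background_rate = 1857 / 25186
p_enrichment = hypergeom_upper_tail(31, 271, 1857, 25186)

print(f"{observed_rate:.1%} vs {background_rate:.1%}")  # 11.4% vs 7.4%
print(p_enrichment)
```

A two-sided chi-square or Fisher exact test would be an equally reasonable choice and would give a somewhat different P value.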

Initial qualitative MSP validation in cultured cells and lymphocytes

The 271 genes identified in the 5AZA experiment were tested by qualitative MSP assays in 10 human breast cancer cell lines, six benign early passage primary HME cultures, and four lymphocyte samples (Supplementary Table S1). These breast cancer cell lines were selected to include ER-positive, ER-negative, HER2-positive, and HER2-negative cancers and did not overlap with the cell lines used in the 5AZA experiment. Because our final implementation is in RP-FNA samples, many markers underwent more extensive testing in lymphocytes. Markers generating any signal in lymphocytes were excluded prior to validation in clinical samples. Ninety-four of these 271 genes were methylated in breast cancer cell lines but not in benign HME cultures (34 of these genes were also methylated in lymphocytes). Of these, 72 did not occur on previously published methylation array-based marker lists (Fig. 3; refs. 22–25, 37). Many common breast cancer methylation markers were absent from our list, including 13 that we have recently been assessing for breast cancer–risk stratification: LOX, BNC1, CFTR, HS3ST2, CTSZ, ESR1, BRCA1, CDH13, RASSF1, CCND2, APC, RAR-β2, and SCGB3A1. Nine of these 13 genes did not meet expression array detection P value thresholds, and the others failed to meet thresholds for 5AZA-induced upregulation. Notably, only six of these 13 genes were among the 1,136 markers identified by Van der Auwera using the Infinium Human Methylation-27 platform (22), and only two were among the 263 identified by Hill using the same platform (24). None occurred on any of the other array-derived lists shown in Fig. 3.

Figure 3.

Comparison of gene lists identified by whole-genome methylation screens. Van der Auwera (22), Hill (24), and Fackler (23) used the Infinium Human Methylation-27 platform. Fackler used additional filters based on hormone receptor status of the tumors. The Cancer Genome Atlas data is derived from an Infinium Human Methylation-450 array (37). We selected loci associated with named genes where the mean β was 0.4 or higher for malignant than benign samples (similar to the criteria used by Hill). Kamalakaran (25) used an MspI digestion approach called the Methylation Oligonucleotide Microarray Analysis (MOMA).


Qualitative MSP validation in clinical samples

From the 271 5AZA-induced genes assessed by qualitative MSP in breast cancer cell lines and HME cultures, 102 were excluded because of methylation signals in lymphocytes, 99 because no methylation signals could be detected for any cell type, and 21 because of very infrequent methylation in the cancer cell lines. The remaining 49 markers were assessed by qualitative MSP in a panel of fresh-frozen primary breast cancers (15), benign RP-FNA samples from untreated women recently diagnosed with breast cancer (5), and benign RP-FNA samples from women never diagnosed with breast cancer (10). Methylation prevalence was at least 20 percentage points higher in cancers than in benign samples for 33 of these markers and at least 40 percentage points higher for 15 markers (Supplementary Table S2). Eight of these latter markers do not occur on the methylation array-based marker lists included in Fig. 3 (HLA-B, LAT2, FBLN2, VCAN, ADM, FLNC, ARTN, and CLDN1; refs. 22–25, 37).

In silico comparison with The Cancer Genome Atlas data

The TCGA data present an opportunity to understand the characteristics of our 5AZA-derived marker list in relation to methylation array-derived lists. TCGA data are derived from the Infinium Human Methylation-450 array, which quantifies the level of methylation (β-value) for each of 485,000 CpGs distributed across the genome for 627 breast cancers and 97 benign breast samples. We initially identified 140 loci associated with unique named genes where the mean β was 0.4 or higher for malignant than benign samples (Fig. 3). Of the 33 5AZA genes differentially methylated in cancer as compared with benign samples (tumor minus benign methylation prevalence, ≥0.2), only one, GNE, occurred among these 140 most differentially methylated TCGA genes. Next we expanded the TCGA list to include any probe with a mean β difference greater than 0 for a unique named gene (n = 9,509). This list was ordered by FDR for tumor versus benign differentiation, to determine how many of our 33 differentially methylated genes could have been derived from TCGA data and how far down the list they occur. Fifteen of our 33 differentially methylated 5AZA genes were recognized as differentially methylated in the TCGA data, and their positions on the sorted list ranged from 199 for AKR1B1 (FDR = 8.5 × 10⁻¹³⁵) to 6721 for GPX1 (FDR = 1.3 × 10⁻¹⁹). Finally, we identified the TCGA probe closest to our amplicon for the 49 genes assessed by MSP in clinical samples and compared the tumor versus benign discrimination from the MSP data (difference in methylation prevalence) with the TCGA data (difference in mean β). We were able to identify TCGA probes 2 to 279 base pairs (mean, 71) from the center of our MSP amplicons for each of these 49 genes. There was significant correlation between tumor versus benign discrimination for MSP and TCGA for these 49 markers (Spearman r = 0.406, P = 0.004). In addition, the difference in mean β values for malignant versus benign samples from the TCGA data was 0.085 for the 15 markers with the greatest difference in methylation prevalence by MSP (≥0.4 difference threshold) as compared with only 0.028 for the 16 least discriminatory markers (<0.2 difference threshold, P = 0.025). This suggests that the best 5AZA-derived markers would be identifiable within the TCGA data if the filtering algorithms were generously relaxed.
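The MSP-versus-TCGA comparison above rests on a Spearman rank correlation between two per-gene quantities: the difference in methylation prevalence (MSP) and the difference in mean β (TCGA). A minimal stdlib sketch of that statistic follows. The function names are hypothetical, and the example values are made up for illustration rather than taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up per-gene values, for illustration only.
msp_prevalence_diff = [0.45, 0.20, 0.05, 0.60, 0.30]
tcga_mean_beta_diff = [0.09, 0.01, 0.03, 0.12, 0.02]
print(spearman(msp_prevalence_diff, tcga_mean_beta_diff))
```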

QM-MSP validation in clinical samples

A subset of the genes with higher methylation prevalence in malignant than benign clinical samples was selected for QM-MSP testing. To select genes whose methylation status was largely independent of the methylation status of other genes, hierarchical clustering of MSP methylation results for 15 breast cancers was performed (data not shown). This identified two gene clusters. The first cluster included genes that were methylated in the majority of the cancers (AKR1B1, PSAT1, CYP24A1, GBP4, HLA-B, LAT2, CPNE8, and UCHL1) and the second cluster included genes methylated in only a fraction of the cancers and clustering with one to five other genes. Four markers from cluster 1 and eight distributed across cluster 2 (12 markers total) were selected for QM-MSP validation in a panel of 51 archival primary breast cancer FNAs, 59 archival benign RP-FNA samples from untreated women recently diagnosed with breast cancer, and 145 archival benign RP-FNA samples from women never diagnosed with breast cancer. These genes included WDR66, UCHL1, FBLN2, PSAT1, CLDN1, BIRC3, CYP24A1, CCNA, GNE, CXCL14, CPNE8, and AKR1B1. Of these, all except BIRC3, GNE, and WDR66 were differentially methylated between malignant and paired benign samples, with P values between 0 and 0.013 by sum-of-differences analysis (Supplementary Table S3).

Fresh-frozen breast cancer versus breast cancer FNA

Methylation prevalence for these 12 genes was compared between the 15 fresh-frozen primary breast cancers used in the initial MSP screen and the 51 breast cancer FNA samples assessed in the QM-MSP screen using greater than 0.1% as the threshold for classifying a sample as positive, as this is the reported sensitivity of MSP (38). Methylation was identified at a modestly higher frequency in fresh-frozen breast cancer tissue than in breast cancer FNA samples for each of the 12 genes (mean 0.53 vs. 0.43, P = 0.13), and correlation between the samples for methylation prevalence was poor (Spearman r = 0.343, P = 0.298). This may reflect tumor heterogeneity, which is missed by FNA sampling.

QM-MSP validation in an expanded clinical sample set

Four markers with higher levels of methylation in malignant than in paired benign samples were selected for QM-MSP validation in an expanded clinical sample set (CPNE8, PSAT1, CXCL14, and CLDN1). GNE was included as the fifth marker because it had shown a trend for differential methylation in malignant as compared with paired benign samples (P = 0.085), and methylation in benign samples had only been observed for women recently diagnosed with a primary breast cancer. The sample set included 97 breast cancer FNAs (one bilateral breast cancer, Table 1), 104 benign RP-FNA samples from patients with breast cancer, and 223 benign RP-FNA samples from women never diagnosed with breast cancer.

Table 1.

Characteristics of the cancer patients and cancers

| Feature | Value |
| --- | --- |
| Number of patients | 96 |
| Number of cancers | 97 |
| Patient characteristics | |
| Mean age (range) | 56 (31–93) |
| Race/ethnicity | |
| Asian (%) | 2 (2.1) |
| African-American (%) | 25 (25.8) |
| Non-Hispanic Caucasian (%) | 63 (64.9) |
| Hispanic (%) | 7 (7.2) |
| Menopausal status | |
| Premenopausal (%) | 35 (36.1) |
| Postmenopausal (%) | 62 (63.1) |
| Known BRCA gene mutation | |
| Tumor characteristics | |
| Histology (%) | |
| Infiltrating ductal | 81 (83.5) |
| Infiltrating lobular | 10 (10.3) |
| Mucinous | 2 (2.1) |
| Metaplastic | 2 (2.1) |
| Medullary | 1 (1.0) |
| Small cell | 1 (1.0) |
| Grade (%) | |
| 1 | 19 (19.6) |
| 2 | 39 (40.2) |
| 3 | 38 (39.2) |
| Unknown | 1 (1.0) |
| Associated DCIS (%) | |
| Any | 76 (78.4) |
| Unknown | 4 (4.1) |
| ≥25% | 19 (19.6) |
| Unknown | 6 (6.2) |
| Tumor size (%) | |
| T1 | 41 (42.3) |
| T2 | 49 (50.5) |
| T3 | 7 (7.2) |
| Nodal status (%) | |
| pN0 | 64 (66.0) |
| pN1 | 20 (20.6) |
| pN2 | 11 (11.3) |
| pN3 | 2 (2.1) |
| Biomarkers (%) | |
| ER positive | 66 (68.0) |
| ER unknown | 1 (1.0) |
| PR positive | 64 (66.0) |
| PR unknown | 1 (1.0) |
| HER2 positive | 9 (9.3) |
| HER2 unknown | 1 (1.0) |
| Ki67 ≥15% | 61 (62.9) |
| Ki67 unknown | 1 (1.0) |
| p53 ≥10% | 30 (30.9) |
| p53 unknown | 4 (4.1) |

Each of these five markers is methylated at a greater frequency in primary breast cancer FNAs than in benign RP-FNAs, and each, except GNE, shows significantly greater methylation in primary cancers than in paired benign samples (Table 2). The sum-of-differences permutation method was used to assess the association between methylation of specific genes and various clinical and pathologic features of the patients and tumors (Table 3). Most notably, PSAT1 methylation was associated with low-grade, low-proliferation, hormone receptor (HR)-positive, lymph node–positive breast cancer in postmenopausal Caucasian women and with infiltrating lobular carcinoma. Conversely, GNE methylation was associated with high-grade, HR-negative breast cancer in younger women. Most of the markers showed greater methylation in tumors from Caucasian women than in tumors from African-American women, and CXCL14 methylation was associated with HER-2/neu-positive breast cancer. Methylation of CLDN1 in a benign RP-FNA sample was marginally predictive of methylation of CLDN1 in the paired cancer sample (P = 0.058). This was not observed for the other four markers.

Table 2.

Frequency of methylation (≥0.1%) assessed by QM-MSP in primary cancer FNAs and benign RP-FNAs

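| Symbol | CA (97) | Benign (327) | P value | SoD CA_B9CA (81) | P value (SoD) |
| --- | --- | --- | --- | --- | --- |
| CPNE8 | 0.54 | 0.14 | <0.0001 | 34.1 | |
| PSAT1 | 0.44 | 0.07 | <0.0001 | 20.6 | |
| CXCL14 | 0.40 | 0.08 | <0.0001 | 25.7 | |
| CLDN1 | 0.36 | 0.11 | <0.0001 | 21.9 | 0.0002 |
| GNE | 0.07 | 0.02 | 0.008 | 0.9 | 0.09 |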

NOTE: The number in parentheses in the column header is the number of samples tested. SoD CA_B9CA is the sum of difference for methylation frequencies of primary cancers and paired benign samples (see Materials and Methods). Greater numbers indicate higher levels of methylation in greater numbers of cancer samples.

P value (SoD) is the P value for the SoD values computed by a permutation method.

Table 3.

DNA methylation in cancer samples in relation to clinical and pathologic features

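| Gene | SoD (a) | P value (b) | FDR P value (c) |
| --- | --- | --- | --- |
| Estrogen receptor positive | | | |
| PSAT1 | 17.71 | | |
| GNE | −2.38 | 0.001 | 0.018 |
| CPNE8 | 16.35 | 0.007 | 0.06 |
| Progesterone receptor positive | | | |
| PSAT1 | 16.23 | | |
| CPNE8 | 18.32 | 0.003 | 0.038 |
| GNE | −1.88 | 0.006 | 0.054 |
| Caucasian (vs. African American) | | | |
| PSAT1 | 15.28 | 0.0008 | 0.017 |
| CPNE8 | 19.16 | 0.004 | 0.038 |
| CXCL14 | 10.22 | 0.059 | 0.198 |
| Grade I or II (vs. III) | | | |
| PSAT1 | 14.68 | 0.0004 | 0.014 |
| GNE | −1.89 | 0.004 | 0.038 |
| Ki67 <15% | | | |
| PSAT1 | 14.06 | 0.0008 | 0.017 |
| Age >50 | | | |
| CXCL14 | 14.81 | 0.004 | 0.038 |
| GNE | −1.54 | 0.036 | 0.145 |
| PSAT1 | 9.10 | 0.037 | 0.145 |
| Lobular histology (vs. ductal) | | | |
| PSAT1 | 17.26 | 0.014 | 0.092 |
| CXCL14 | 17.65 | 0.027 | 0.122 |
| Postmenopausal | | | |
| PSAT1 | 10.03 | 0.015 | 0.094 |
| HER-2/neu positive | | | |
| CXCL14 | 19.02 | 0.019 | 0.1 |
| Lymph node positive | | | |
| PSAT1 | 8.75 | 0.042 | 0.155 |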

^a Sum of differences (see Materials and Methods). A positive value indicates that the listed clinical or pathologic feature is associated with DNA methylation.

^b Calculated using the permutation method described in the Materials and Methods section.

^c False discovery rate–corrected P value.
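For readers unfamiliar with false discovery rate correction, the following is a minimal sketch of the Benjamini-Hochberg procedure (ref. 36 in the reference list), assuming that this is the correction behind the fdr P values. The function name and the input values are illustrative. The fdr P values in Table 3 cannot be reproduced from the table alone, because the correction depends on the full set of tests performed, not only the rows shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented raw P values; not study data.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(a, 4) for a in adj])
```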

Whole-genome array-based approaches have identified hundreds of potential methylation markers in breast cancer (22–25, 37) using probes specific for individual CpGs found near named genes. Studies using the same methylation platform will generally identify different markers depending on the samples tested and the filtering criteria used, and these marker lists diverge even further across platforms (Fig. 3). Methylation arrays examine individual CpGs, but one highly methylated CpG does not necessarily equate to dense CpG methylation in a region critical for regulation of gene expression. Indeed, for a subset of genes, CpG methylation is a consequence of transcriptional repression rather than a cause, as evidenced by the failure of 5AZA treatment to induce expression (39).

To identify DNA methylation markers with a high probability of regulating gene expression, we first identified genes expressed at higher levels in benign breast epithelial cells than in breast cancer cells and then selected those that were induced by 5AZA treatment in the cancer cells. Two phases of MSP screening then identified 33 markers that were differentially methylated in clinical breast cancer samples as compared with clinical benign samples and that generated no signal in lymphocytes (Supplementary Table S2, δCA-B9 ≥ 0.2). These are breast cancer-relevant methylation markers optimized for FNA samples.

The clinical relevance of a subset of these markers was demonstrated in an expanded clinical sample set using QM-MSP. Consistent with other studies (19, 23, 24, 26, 28, 40–44), DNA methylation was observed more frequently in HR-positive than HR-negative breast cancer, and this difference was most marked for PSAT1, CPNE8, UCHL1, and AKR1B1. Notably, PSAT1 methylation was associated with low-grade, HR-positive, lymph node–positive breast cancer in postmenopausal women and infiltrating lobular cancer. In addition, GNE methylation was associated with high-grade, HR-negative breast cancer and CXCL14 methylation with HER2-positive breast cancer.

Only 12 of the 33 5AZA-selected genes that we identified as differentially methylated in malignant as compared with benign clinical samples were also identified by methylation array studies (CYP24A1, NBL1, CPNE8, AKR1B1, UCHL1, HBA2, GSTP1, GBP4, PSAT1, IRF7, PYCARD, and GNE; refs. 22–25, 37). In addition, many common tumor suppressor genes well documented to be regulated by promoter-region hypermethylation were missed by our 5AZA approach and by methylation array approaches (e.g., BRCA1, CFTR, CTSZ, ESR1, LOX, RAR-β, and SCGB3A1), although CCND2, APC, RASSF1, CDH13, BNC1, and HS3ST2 do occur among the 1,136 genes identified by Van der Auwera and colleagues using the Infinium Human Methylation-27 platform. These genes were missed by our 5AZA approach because they failed to meet expression level or quality thresholds on the initial gene expression array, as described in the Results section. Methylation array data are generally filtered on mean β-scores for individual CpGs to retain genes in which neighboring CpGs are methylated in the majority of DNA copies in the majority of samples. Such filtering will overlook genes that are methylated in a relatively small subpopulation of cells or in only a small subset of tumors. Methylation-specific PCR, used for methylation marker discovery in the past and for initial validation in our study, will generate a signal when only 0.1% of DNA copies are methylated (38). This may permit recognition of minor cell populations important for tumor biology and maintenance. In this regard, the sum-of-difference permutation method we described for analysis of quantitative methylation data essentially compares the “sizes” of methylated populations between two groups and is ideal for recognizing differences in the relative abundance of minor cell populations.
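The contrast drawn above between a mean β-score filter and a 0.1% MSP detection limit can be made concrete with a small numerical sketch. The per-tumor methylated fractions and the 0.5 mean-β cutoff below are invented for illustration (array studies use a range of cutoffs); only the 0.1% detection limit comes from the text.

```python
MSP_DETECTION_LIMIT = 0.001  # 0.1% of DNA copies methylated, as cited in the text

def detected_by_msp(methylated_fraction, limit=MSP_DETECTION_LIMIT):
    """MSP-style call: positive when the methylated fraction reaches the limit."""
    return methylated_fraction >= limit

def retained_by_mean_beta(fractions, cutoff=0.5):
    """Array-style filter: keep a locus only if its mean beta reaches the cutoff.

    The cutoff is a hypothetical value chosen for illustration.
    """
    return sum(fractions) / len(fractions) >= cutoff

# Invented fractions of methylated DNA copies at one locus in six tumors,
# representing methylation confined to a minor cell population.
minor_population_locus = [0.02, 0.0, 0.05, 0.01, 0.0, 0.03]

msp_positive = [detected_by_msp(f) for f in minor_population_locus]
kept_by_array = retained_by_mean_beta(minor_population_locus)

print(f"MSP-positive tumors: {sum(msp_positive)} of {len(msp_positive)}")
print(f"Retained by mean-beta filter: {kept_by_array}")
```

Under these assumptions the locus is called methylated in most tumors by the sensitive assay, yet it would be discarded by a filter that requires methylation in the majority of DNA copies.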

The clinical utility of methylation markers in breast cancer has not yet been well established, and it is unclear whether the greatest value will come from loci methylated in major cell populations in most cancers, or from genes regulated by promoter methylation in minor cell populations. Our primary interest is in RP-FNA markers for breast cancer risk stratification. For this application, identification of minor cell populations (e.g., tumor-initiating cells) may be particularly relevant. However, our data suggest that tumor FNA samples may not capture the entire methylation spectrum identifiable in fresh-frozen tumor tissue. Additional work is required to recognize the markers most relevant to tumor initiation, to validate tumor FNA as an adequate sampling approach, and to map relevant methylation changes in benign breast tissue.

No potential conflicts of interest were disclosed.

Conception and design: D. Bu, C. Lewis, A.F. Gazdar, D.M. Euhus

Development of methodology: D. Bu, C. Lewis, A.F. Gazdar, D.M. Euhus

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C. Lewis, V. Sarode, A. Lazorowitz, R. Rao, M. Leitch, V. Andrews, A.F. Gazdar, D.M. Euhus

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D. Bu, C. Lewis, M. Chen, X. Ma, A. Moldrem, A.F. Gazdar, D.M. Euhus

Writing, review, and/or revision of the manuscript: C. Lewis, V. Sarode, R. Rao, V. Andrews, A.F. Gazdar, D.M. Euhus

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): D. Bu, C. Lewis, A. Moldrem, D.M. Euhus

Study supervision: D.M. Euhus

The BRCA1-methylated cell line, UACC3199, was kindly provided by the Arizona Cancer Center, Tucson, Arizona.

This work was financially supported by the Department of Defense Breast Cancer Research Program, contract number W81XWH-07-1-0262, to D.M. Euhus.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Fabian CJ, Kimler BF, Zalles CM, Klemp JR, Petroff BK, Khan QJ, et al. Reduction in Ki-67 in benign breast tissue of high-risk women with the lignan secoisolariciresinol diglycoside. Cancer Prev Res (Phila) 2010;3:1342–50.
2. Bartels PH, Fabian CJ, Kimler BF, Ranger-Moore JR, Frank DH, Yozwiak ML, et al. Karyometry of breast epithelial cells acquired by random periareolar fine needle aspiration in women at high risk for breast cancer. Anal Quant Cytol Histol 2007;29:63–70.
3. Fabian CJ, Kimler BF, Zalles CM, Khan QJ, Mayo MS, Phillips TA, et al. Reduction in proliferation with six months of letrozole in women on hormone replacement therapy. Breast Cancer Res Treat 2007;106:75–84.
4. Fabian CJ, Kimler BF, Brady DA, Mayo MS, Chang CH, Ferraro JA, et al. A phase II breast cancer chemoprevention trial of oral alpha-difluoromethylornithine: breast tissue, imaging, and serum and urine biomarkers. Clin Cancer Res 2002;8:3105–17.
5. Khan SA, Chatterton RT, Michel N, Bryk M, Lee O, Ivancic D, et al. Soy isoflavone supplementation for breast cancer risk reduction: a randomized phase II trial. Cancer Prev Res (Phila) 2012;5:309–19.
6. Baker JC Jr, Ostrander JH, Lem S, Broadwater G, Bean GR, D'Amato NC, et al. ESR1 promoter hypermethylation does not predict atypia in RPFNA nor persistent atypia after 12 months tamoxifen chemoprevention. Cancer Epidemiol Biomarkers Prev 2008;17:1884–90.
7. Euhus D, Bu D, Xie XJ, Sarode V, Ashfaq R, Hunt K, et al. Tamoxifen downregulates ets oncogene family members ETV4 and ETV5 in benign breast tissue: implications for durable risk reduction. Cancer Prev Res (Phila) 2011;4:1852–62.
8. Novak P, Jensen TJ, Garbe JC, Stampfer MR, Futscher BW. Stepwise DNA methylation changes are linked to escape from defined proliferation barriers and mammary epithelial cell immortalization. Cancer Res 2009;69:5251–8.
9. Umbricht CB, Evron E, Gabrielson E, Ferguson A, Marks J, Sukumar S. Hypermethylation of 14-3-3 sigma (stratifin) is an early event in breast cancer. Oncogene 2001;20:3348–53.
10. Tommasi S, Karm DL, Wu X, Yen Y, Pfeifer GP. Methylation of homeobox genes is a frequent and early epigenetic event in breast cancer. Breast Cancer Res 2009;11:R14.
11. Tlsty TD, Crawford YG, Holst CR, Fordyce CA, Zhang J, McDermott K, et al. Genetic and epigenetic changes in mammary epithelial cells may mimic early events in carcinogenesis. J Mammary Gland Biol Neoplasia 2004;9:263–74.
12. Vasilatos SN, Broadwater G, Barry WT, Baker JC Jr, Lem S, Dietze EC, et al. CpG island tumor suppressor promoter methylation in non-BRCA-associated early mammary carcinogenesis. Cancer Epidemiol Biomarkers Prev 2009;18:901–14.
13. Bean GR, Ibarra Drendall C, Goldenberg VK, Baker JC Jr, Troch MM, Paisie C, et al. Hypermethylation of the breast cancer-associated gene 1 promoter does not predict cytologic atypia or correlate with surrogate end points of breast cancer risk. Cancer Epidemiol Biomarkers Prev 2007;16:50–6.
14. Bean GR, Bryson AD, Pilie PG, Goldenberg V, Baker JC Jr, Ibarra C, et al. Morphologically normal-appearing mammary epithelial cells obtained from high-risk women exhibit methylation silencing of INK4a/ARF. Clin Cancer Res 2007;13:6834–41.
15. Bean GR, Scott V, Yee L, Ratliff-Daniel B, Troch MM, Seo P, et al. Retinoic acid receptor-B2 promoter methylation in random periareolar fine needle aspiration. Cancer Epidemiol Biomarkers Prev 2005;14:790–8.
16. Van der Auwera I, Bovie C, Svensson C, Trinh XB, Limame R, van Dam P, et al. Quantitative methylation profiling in tumor and matched morphologically normal tissues from breast cancer patients. BMC Cancer 2010;10:97.
17. Euhus DM, Bu D, Ashfaq R, Xie XJ, Bian A, Leitch AM, et al. Atypia and DNA methylation in nipple duct lavage in relation to predicted breast cancer risk. Cancer Epidemiol Biomarkers Prev 2007;16:1812–21.
18. Euhus DM, Bu D, Milchgrub S, Xie XJ, Bian A, Leitch AM, et al. DNA methylation in benign breast epithelium in relation to age and breast cancer risk. Cancer Epidemiol Biomarkers Prev 2008;17:1051–9.
19. Feng W, Shen L, Wen S, Rosen DG, Jelinek J, Hu X, et al. Correlation between CpG methylation profiles and hormone receptor status in breast cancers. Breast Cancer Res 2007;9:R57.
20. Cho YH, Yazici H, Wu HC, Terry MB, Gonzalez K, Qu M, et al. Aberrant promoter hypermethylation and genomic hypomethylation in tumor, adjacent normal tissues and blood from breast cancer patients. Anticancer Res 2010;30:2489–96.
21. Lewis CM, Cler LR, Bu DW, Zochbauer-Muller S, Milchgrub S, Naftalis EZ, et al. Promoter hypermethylation in benign breast epithelium in relation to predicted breast cancer risk. Clin Cancer Res 2005;11:166–72.
22. Van der Auwera I, Yu W, Suo L, Van Neste L, van Dam P, Van Marck EA, et al. Array-based DNA methylation profiling for breast cancer subtype discrimination. PLoS ONE 2010;5:e12616.
23. Fackler MJ, Umbricht CB, Williams D, Argani P, Cruz LA, Merino VF, et al. Genome-wide methylation analysis identifies genes specific to breast cancer hormone receptor status and risk of recurrence. Cancer Res 2011;71:6195–207.
24. Hill VK, Ricketts C, Bieche I, Vacher S, Gentle D, Lewis C, et al. Genome-wide DNA methylation profiling of CpG islands in breast cancer identifies novel genes associated with tumorigenicity. Cancer Res 2011;71:2988–99.
25. Kamalakaran S, Varadan V, Giercksky Russnes HE, Levy D, Kendall J, Janevski A, et al. DNA methylation patterns in luminal breast cancers differ from non-luminal subtypes and can identify relapse risk independent of other clinical variables. Mol Oncol 2011;5:77–92.
26. Holm K, Hegardt C, Staaf J, Vallon-Christersson J, Jonsson G, Olsson H, et al. Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns. Breast Cancer Res 2010;12:R36.
27. Fang F, Turcan S, Rimner A, Kaufman A, Giri D, Morris LG, et al. Breast cancer methylomes establish an epigenomic foundation for metastasis. Sci Transl Med 2011;3:75ra25.
28. Li L, Lee KM, Han W, Choi JY, Lee JY, Kang GH, et al. Estrogen and progesterone receptor status affect genome-wide DNA methylation profile in breast cancer. Hum Mol Genet 2010;19:4273–7.
29. Tlsty TD, Romanov SR, Kozakiewicz BK, Holst CR, Haupt LM, Crawford YG. Loss of chromosomal integrity in human mammary epithelial cells subsequent to escape from senescence. J Mammary Gland Biol Neoplasia 2001;6:235–43.
30. Genome Bioinformatics Group of UC Santa Cruz, The UCSC Genome Browser, http://genome.ucsc.edu/cgi-bin/hgGateway. Accessed August 18, 2013.
31. Jirtle RL, geneimprint, http://www.geneimprint.com/site/genes-by-species. Accessed August 18, 2013.
32. Li LC, Dahiya R. MethPrimer: designing primers for methylation PCRs. Bioinformatics 2002;18:1427–31.
33. Premier Biosoft International, Beacon Designer, version 7.21.
34. Applied Biosystems, Methyl Primer Express, version 1.0.
35. Fackler MJ, McVeigh M, Mehrotra J, Blum MA, Lange J, Lapides A, et al. Quantitative multiplex methylation-specific PCR assay for the detection of promoter hypermethylation in multiple genes in breast cancer. Cancer Res 2004;64:4442–52.
36. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 1995;57:289–300.
37. US Department of Health and Human Services. National Cancer Institute. The Cancer Genome Atlas. http://cancergenome.nih.gov. Accessed August 18, 2013.
38. Herman JG, Graff JR, Myohanen S, Nelkin BD, Baylin SB. Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci U S A 1996;93:9821–6.
39. Sproul D, Nestor C, Culley J, Dickson JH, Dixon JM, Harrison DJ, et al. Transcriptionally repressed genes become aberrantly methylated and distinguish tumors of different lineages in breast cancer. Proc Natl Acad Sci U S A 2011;108:4364–9.
40. Widschwendter M, Siegmund KD, Muller HM, Fiegl H, Marth C, Muller-Holzner E, et al. Association of breast cancer DNA methylation profiles with hormone receptor status and response to tamoxifen. Cancer Res 2004;64:3807–13.
41. Sunami E, Shinozaki M, Sim MS, Nguyen SL, Vu AT, Giuliano AE, et al. Estrogen receptor and HER2/neu status affect epigenetic differences of tumor-related genes in primary breast tumors. Breast Cancer Res 2008;10:R46.
42. Park SY, Kwon HJ, Choi Y, Lee HE, Kim SW, Kim JH, et al. Distinct patterns of promoter CpG island methylation of breast cancer subtypes are associated with stem cell phenotypes. Mod Pathol 2012;25:185–96.
43. Lee JS, Fackler MJ, Lee JH, Choi C, Park MH, Yoon JH, et al. Basal-like breast cancer displays distinct patterns of promoter methylation. Cancer Biol Ther 2010;9:1017–24.
44. Ronneberg JA, Fleischer T, Solvang HK, Nordgard SH, Edvardsen H, Potapenko I, et al. Methylation profiling with a panel of cancer related genes: association with estrogen receptor, TP53 mutation status and expression subtypes in sporadic breast cancer. Mol Oncol 2011;5:61–76.