Abstract
Background: Random periareolar fine-needle aspiration (RP-FNA) is increasingly used in trials of breast cancer prevention for biomarker assessments. DNA methylation markers may have value as surrogate endpoint biomarkers, but this requires identification of biologically relevant markers suitable for paucicellular, lymphocyte-contaminated clinical samples.
Methods: Unbiased whole-genome 5-aza-2′-deoxycytidine (5AZA)–induced gene expression assays, followed by several phases of qualitative and quantitative methylation-specific PCR (MSP) testing, were used to identify novel breast cancer DNA methylation markers optimized for clinical FNA samples.
Results: The initial 5AZA experiment identified 453 genes whose expression was potentially regulated by promoter region methylation. Informatics filters excluded 173 genes unlikely to yield useful DNA methylation markers. MSP assays were designed for 271 of the remaining genes and, ultimately, 33 genes were identified that were differentially methylated in clinical breast cancer samples, as compared with benign RP-FNA samples, and never methylated in lymphocytes. A subset of these markers was validated by quantitative multiplex MSP in extended clinical sample sets. Using a novel permutation method for analysis of quantitative methylation data, PSAT1, GNE, CPNE8, and CXCL14 were found to correlate strongly with specific clinical and pathologic features of breast cancer. In general, our approach identified markers methylated in a smaller subpopulation of tumor cells than those identified in published methylation array studies.
Conclusions: Clinically relevant DNA methylation markers were identified using a 5AZA-induced gene expression approach.
Impact: These breast cancer-relevant, FNA-optimized DNA methylation markers may have value as surrogate endpoint biomarkers in RP-FNA studies. Cancer Epidemiol Biomarkers Prev; 22(12); 2212–21. ©2013 AACR.
This article is highlighted in the In This Issue feature, p. 2143
Introduction
Random periareolar fine-needle aspiration (RP-FNA) samples are frequently used in phase 2 trials of breast cancer prevention for assessments of surrogate endpoint biomarkers (1–7). DNA methylation is a potentially reversible early event in breast carcinogenesis (8–11), and is readily assessable in paucicellular RP-FNA samples (12–15). For some genes, higher levels of methylation are identified in benign breast cells from cancer patients than in similar samples from unaffected women (16–18), and there is often a correlation between specific methylation changes observed in benign breast tissue and those in the associated cancers (16, 19, 20). This suggests that measures of DNA methylation in benign breast tissue may predict breast cancer risk, and, in fact, DNA methylation of some genes is observed more frequently in breast epithelial cells obtained by RP-FNA from high-risk women as compared with lower risk women (17, 18, 21); in addition, tamoxifen has been shown to reduce methylation measured in RP-FNA samples for some genes (7). Although the risk signals generated by these candidate gene studies are intriguing, the clinical relevance of hazard ratios in the 2.5 to 3.0 range is marginal at best.
Clinically useful RP-FNA-based breast cancer risk stratification may be achievable with better DNA methylation markers. Methylation arrays have greatly expanded the list of potential markers (22–28), but have not necessarily identified genes whose expression is regulated by promoter region methylation. We used a genome-wide 5-aza-2′-deoxycytidine (5AZA) approach to identify genes whose expression is upregulated with demethylation. The gene lists were subsequently refined based on several phases of qualitative and quantitative methylation-specific PCR (MSP) assays targeting specific promoter regions in benign and malignant breast samples and lymphocytes. A selected subset of markers was assessed in a large panel of clinical breast cancer FNA and benign breast RP-FNA samples to confirm clinical relevance.
Materials and Methods
Cell lines and primary cell cultures for demethylation studies
Six breast cancer cell lines were selected for demethylation studies based on known tumor suppressor gene expression regulation by promoter region hypermethylation: HCC1569 (CCND2), HCC1954 (SCGB3A1, APC, RASSF1A), MCF-7 (RAR-β2), MDA-MB-231 (ESR1), UACC3199 (BRCA1), and BT-549 (hypermethylator phenotype). Apart from MCF10A, we specifically avoided immortalized benign human mammary epithelial cell lines for this experiment as these cells frequently show tumor suppressor gene methylation (e.g., p16) and gene expression profiles that are intermediate between normal breast epithelial cells and breast cancer (29). Instead, we opted to test six first-passage benign human mammary epithelial cell cultures (HME) generated in serum-free media from small fragments of normal breast tissue obtained from young women undergoing fibroadenoma excision. Fresh benign breast tissue was minced and digested overnight on a rotator at 37°C in 1 mg/mL collagenase (C2674, Sigma-Aldrich) in Mammary Epithelial Cell Growth Medium (MEGM; NC9523177, Fisher Scientific) and then centrifuged at 200g for 3 minutes. The pellet was rinsed with PBS, centrifuged again, and then treated with 0.25% trypsin in EDTA at 37°C for 3 minutes to dissociate the cell clumps. After neutralizing the trypsin, the cells were centrifuged for 3 minutes at 200g, resuspended in MEGM with 5% FBS, and plated into tissue culture flasks. The media was changed to MEGM without serum the next day. Contaminating fibroblasts were removed by a quick trypsinization 3 to 5 days later.
5AZA treatment
Dose-dependent cytostasis was observed in 5AZA-treated cells, and was most pronounced for the primary benign breast epithelial cell cultures. The 5AZA dose (0.5 μmol/L) was selected based on evaluation of growth curves and induction of BNC1, SERPINB, and TKTL1 gene expression measured by reverse transcription polymerase chain reaction (RT-PCR) in benign and malignant cells. The breast cancer cell lines, HME cultures, and MCF10A cells were treated with 0.5 μmol/L 5AZA (Sigma-Aldrich) in dimethylsulfoxide (DMSO) or DMSO alone for 6 days, after which RNA was prepared using the Illumina TotalPrep kit (AMIL1791, Life Technologies). This RNA was used for whole-genome expression analysis.
Gene expression analysis
Whole-genome expression was assessed using the Illumina HumanWG-6-v3 chip. The initial analysis was done in GeneSpring as follows. Flags were set to present for entities with detection P values of 0.2 or less and absent for entities with detection P values of more than 0.4. Raw signals with values less than 1 were reset to 1. Quantile normalization was used, and baseline transformation performed on the basis of the median expression level for all entities. Of the 48,701 entities included on the Illumina array, flags were present or marginal for 75% of the samples for 27,324 entities, and these were retained. The complete dataset is available on the NCBI GEO site (GSE41692).
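The preprocessing steps above can be sketched in a few lines. This is a minimal illustration only, not the GeneSpring implementation: the function names are hypothetical, the "marginal" flag is assumed to cover detection P values between the two stated cutoffs, "75% of the samples" is read as "at least 75%", and ties in quantile normalization are broken arbitrarily.

```python
from statistics import mean

def detection_flag(detection_p):
    """Map a detection P value to a flag using the cutoffs stated in the text."""
    if detection_p <= 0.2:
        return "present"
    if detection_p > 0.4:
        return "absent"
    return "marginal"  # assumption: values between the two cutoffs

def floor_raw_signal(signal):
    """Reset raw signals below 1 to 1."""
    return max(signal, 1.0)

def quantile_normalize(columns):
    """Quantile-normalize a list of per-sample signal lists of equal length.

    Each value is replaced by the mean, across samples, of the values that
    share its within-sample rank. Ties are broken by position (assumption).
    """
    n = len(columns[0])
    sorted_columns = [sorted(col) for col in columns]
    rank_means = [mean(col[r] for col in sorted_columns) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

def retain_entity(flags, min_fraction=0.75):
    """Keep an entity flagged present or marginal in at least min_fraction of samples."""
    kept = sum(flag != "absent" for flag in flags)
    return kept / len(flags) >= min_fraction
```

In this sketch each inner list passed to `quantile_normalize` holds the signals for one sample, so all samples share the same distribution after normalization.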
Informatics filters
Publicly available informatics tools were used to limit the gene list identified in the 5AZA experiment to those most likely to yield useful epigenetic markers (30–34). Specific exclusions included duplicate genes (192), no CpG island (87), known imprinted genes (5), X-chromosome and likely imprinted genes (30), pseudogenes or noncoding RNA (6), poorly annotated or discontinued records (12), and CpG island more than 2,000 bp proximal to the transcription start site (33).
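As a bookkeeping check, the exclusion counts listed above can be tallied against the 645 entities (453 unique genes) reported in the Results as selected from the 5AZA experiment. The dictionary keys below are shorthand labels chosen for this sketch.

```python
# Exclusion counts as listed in the text; key names are illustrative labels.
exclusions = {
    "duplicate genes": 192,
    "no CpG island": 87,
    "known imprinted genes": 5,
    "X-chromosome or likely imprinted genes": 30,
    "pseudogenes or noncoding RNA": 6,
    "poorly annotated or discontinued records": 12,
    "CpG island >2,000 bp from transcription start site": 33,
}

entities_selected = 645  # entities selected from the 5AZA experiment (Results)

# Removing duplicates alone gives the number of unique genes.
unique_genes = entities_selected - exclusions["duplicate genes"]

# Removing every exclusion gives the genes that passed the informatics filters.
genes_passing = entities_selected - sum(exclusions.values())

print(unique_genes, genes_passing)  # 453 280
```

Both totals agree with the counts reported in the Results (453 unique genes, 280 passing the filters).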
Qualitative methylation-specific PCR
Qualitative methylation-specific PCR (MSP) assays were designed and run as described previously (21). For genes previously shown to be epigenetically regulated, published sequences were used. For novel genes, primers were designed within the CpG island as close to the transcriptional start site as possible (median 5′ distance from the transcription start site was 97 bp with a median product size of 113 bp). Primer sequences used in the initial MSP screen are shown in Supplementary Table S1. For the second MSP screen in clinical samples, HotStarTaq (Qiagen) was used, and MgCl2 concentrations and PCR cycles were optimized to permit the detection of 60 pg of methylated NaBS-treated gDNA in a 6-ng DNA sample (Supplementary Table S2).
Quantitative multiplex methylation-specific PCR
Quantitative multiplex MSP (QM-MSP) assays were designed as described previously by Fackler (35) using the same conditions and quality assurance standards we have previously described (17, 18). Results were expressed as a methylation fraction, which is calculated as methylated copies/(methylated copies + unmethylated copies). Primer and probe sequences are shown in Supplementary Table S3.
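The methylation fraction defined above is a simple ratio. A minimal sketch follows; the function name and the error handling for an empty denominator are assumptions made for illustration.

```python
def methylation_fraction(methylated_copies, unmethylated_copies):
    """Return methylated / (methylated + unmethylated) copies."""
    total = methylated_copies + unmethylated_copies
    if total <= 0:
        # Assumption: a sample with no measurable copies cannot be scored.
        raise ValueError("no methylated or unmethylated copies detected")
    return methylated_copies / total

# Hypothetical example: 1 methylated copy among 1,000 total copies.
print(methylation_fraction(1, 999))  # 0.001, i.e. 0.1%
```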
Extended FNA sample set for clinical and pathologic correlations
Five markers were selected from the QM-MSP screen for assessment in an expanded clinical sample set that included all of the samples used in the initial QM-MSP screen with the addition of new, prospectively acquired samples. This research was performed in accordance with an assurance filed with and approved by the U.S. Department of Health and Human Services. Institutional Review Board approval was obtained, and informed consent was documented in writing for each participant. Unselected patients with untreated stages 1 to 3 primary breast cancer (n = 52) underwent FNA sampling of the primary tumor at the time of definitive breast cancer surgery. In addition, benign breast epithelium was obtained by RP-FNA from patients with breast cancer (n = 52) and women never diagnosed with breast cancer (n = 90) as previously described (18, 21). Samples not meeting quality assurance standards for QM-MSP (∼15%) were excluded from analysis; therefore, the number of samples shown in the tables is lower than the number collected. The total number of samples included in this analysis (archival plus prospectively acquired) is 97 breast cancers (one bilateral breast cancer), 104 benign RP-FNA samples from cancer patients, and 223 benign RP-FNA samples from women never diagnosed with breast cancer.
Clinical information was prospectively collected on case report forms, and tumor data, including tumor size, nodal status, associated ductal carcinoma in situ (DCIS), expression of estrogen receptor (ER), progesterone receptor, and Her-2/neu, proliferative index (Ki67), and p53 expression, were subsequently abstracted from the pathology reports.
A permutation method for comparing quantitative methylation data between patient groups
For the purposes of summarization and comparison, quantitative methylation data are typically dichotomized using an arbitrary threshold value above which a sample is scored as methylated and below which a sample is scored as not methylated. This approach artificially constrains the data, reducing it to the level of qualitative MSP data. To avoid this when making comparisons between patient groups, we dichotomized the patient variable of interest, ER status for example, and then plotted the proportion of variable (+) and variable (−) cases that exceeded a continuous range of possible methylation threshold values (e.g., 10⁻⁸ to 0.5). These two curves (Fig. 1) were compared as follows. Methylation values were randomly permuted to destroy the association, if any, between methylation of a specific gene and the selected clinicopathologic variable of interest. For each permuted data set, we calculated the proportions of cases in each of the two categories as in Fig. 1. Note that if there is no association between the methylation and the clinicopathologic variable, the two curves are expected to be identical. We then computed the differences between the two curves and used their sum (the sum of differences, SoD) as a measure of the difference between the two groups. A null distribution for the SoD was developed over the course of 10,000 iterations, and P values were calculated based on the null distribution. To account for the multiple comparisons included in this analysis, false-discovery rate (FDR)–corrected P values were also calculated by the method of Benjamini and Hochberg (36).
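The permutation procedure described above can be sketched as follows. This is an illustrative reading of the method, not the authors' code. Several details are assumptions: the thresholds are placed on a log-spaced grid of 50 points (the text does not specify the grid, and the magnitude of the sum depends on it), the differences are summed with their sign, and the P value is the two-sided fraction of permutations whose absolute sum is at least as large as the observed one. The Benjamini and Hochberg adjustment is the standard step-up procedure.

```python
import random

def log_threshold_grid(low=1e-8, high=0.5, points=50):
    """Log-spaced methylation thresholds (grid size is an assumption)."""
    ratio = (high / low) ** (1 / (points - 1))
    return [low * ratio ** k for k in range(points)]

def proportion_exceeding(values, thresholds):
    """Proportion of samples whose methylation exceeds each threshold."""
    return [sum(v > t for v in values) / len(values) for t in thresholds]

def sum_of_differences(values, labels, thresholds):
    """Signed sum of (variable-positive minus variable-negative) curve differences."""
    positive = [v for v, lab in zip(values, labels) if lab]
    negative = [v for v, lab in zip(values, labels) if not lab]
    curve_pos = proportion_exceeding(positive, thresholds)
    curve_neg = proportion_exceeding(negative, thresholds)
    return sum(p - n for p, n in zip(curve_pos, curve_neg))

def permutation_p_value(values, labels, thresholds, iterations=10000, seed=0):
    """Return (observed SoD, two-sided permutation P value)."""
    rng = random.Random(seed)
    observed = sum_of_differences(values, labels, thresholds)
    shuffled = list(values)
    extreme = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # break any methylation/label association
        if abs(sum_of_differences(shuffled, labels, thresholds)) >= abs(observed):
            extreme += 1
    return observed, extreme / iterations

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR-adjusted P values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this reading, a reported P value of 0 means that none of the permuted data sets produced a sum as extreme as the observed one, so the true value is below the resolution of the permutation count.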
Figure 1. Statistical approach for comparing quantitative methylation data between two groups. In this example, the proportion of samples positive for PSAT1 methylation across a range of threshold values is plotted for estrogen receptor positive (solid line) and estrogen receptor negative cancers (dashed line). A permutation method is used to calculate the sum of differences (SoD) between the two curves and the statistical significance of that difference. For these curves, SoD = 17.71 and P = 0.
Results
Initially, 5AZA-induced gene expression was used to identify potential breast cancer-relevant methylation markers. The resulting gene list was refined using informatics filters followed by three phases of MSP testing in panels of cell lines, benign primary breast epithelial cell cultures, lymphocytes, and clinical samples. A subset of five markers fully optimized for RP-FNA samples was tested in an extended clinical sample set to confirm clinical relevance. The marker discovery and validation pipeline is shown in Supplementary Fig. S1.
5AZA-induced gene expression
Volcano plots initially identified 72 genes whose expression was induced 2-fold or more with an FDR of less than 0.05 in breast cancer cell lines treated with 5AZA. Expression of two genes was reduced by 5AZA by 2-fold or more with FDR less than 0.05 (HDAC4 and SMYD3). Hierarchical cluster analysis of baseline expression of 72 genes induced by 5AZA discriminated perfectly between the benign and malignant samples (Fig. 2). The pattern of expression for these 72 genes was highly consistent among the benign breast epithelial cells and more variable among the cancers. Although MCF10A cells clustered with the benign breast epithelial cells, they were recognized as a distinct subclass. Thirty-six of these 72 genes showed reduced baseline expression in the breast cancers as compared with the benign cells (Fig. 2, orange box), but only 20 of these passed the informatics filters required for advancement to the MSP screens. To reduce the probability of overlooking useful genes, we ultimately advanced genes with 4-fold or greater induction by 5AZA in one or more breast cancer cell lines, or 1.5-fold or greater induction by 5AZA in two or more breast cancer cell lines. In total, 645 entities, representing 453 unique genes, were selected from the 5AZA experiment. Of these, 280 passed the informatics filters, but MSP assays could be optimized for only 271 of these.
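The expanded advancement rule (4-fold or greater induction in one or more cell lines, or 1.5-fold or greater induction in two or more) can be expressed as a small predicate. The function name and the input layout, one fold-induction value per breast cancer cell line, are assumptions for this sketch.

```python
def advance_gene(fold_inductions, high_fold=4.0, low_fold=1.5):
    """Decide whether a gene advances, given its 5AZA fold induction per cell line.

    A gene advances if it is induced >= high_fold in at least one line,
    or >= low_fold in at least two lines.
    """
    lines_high = sum(fc >= high_fold for fc in fold_inductions)
    lines_low = sum(fc >= low_fold for fc in fold_inductions)
    return lines_high >= 1 or lines_low >= 2

# Hypothetical fold inductions across the six breast cancer cell lines.
print(advance_gene([4.2, 1.0, 1.1, 0.9, 1.0, 1.2]))  # True: 4-fold in one line
print(advance_gene([1.6, 1.7, 1.0, 1.0, 0.8, 1.1]))  # True: 1.5-fold in two lines
print(advance_gene([1.6, 1.0, 1.0, 1.0, 0.8, 1.1]))  # False
```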
Figure 2. Unsupervised hierarchical cluster analysis of baseline expression of 72 genes that were significantly upregulated by 5AZA in breast cancer cell lines. The 36 genes with reduced baseline expression in breast cancer cell lines compared with benign cells (orange box) were advanced to the informatics filters. The list was ultimately expanded to include any genes induced 4-fold or greater in one or more breast cancer cell lines, or 1.5-fold or greater in two or more breast cancer cell lines. Cell type: breast cancer cell line, benign breast epithelial cell primary culture, and immortalized benign breast epithelial MCF10A cells. Color scale based on Log2 normalized expression.
Characteristics of the 271 5AZA-induced genes advanced to the MSP screens
There were 31 (11.4%) transcription factors represented among the 271 5AZA-induced genes, which is a significant enrichment over the 1,857 (7.4%) transcription factors included among the 25,186 unique Entrez IDs found in the Illumina genome (P = 0.011). This list included only three homeobox genes, HOXB5, HOXD1, and MSX1, which is consistent with the total number of homeobox genes found in the Illumina genome (P = 0.974). Gene ontology analysis showed enrichment for several biologic processes, including response to stimulus (corrected P = 0.013), multiorganism processes (corrected P = 0.006), and developmental processes (corrected P = 0.006). Enrichment for response to stimulus genes was most significant for the response to stress terms (corrected P = 0.0008).
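The text does not name the test used for the transcription factor enrichment. As one plausible way to check the order of magnitude, the sketch below uses a one-sample z-test for a proportion with the normal approximation. This is an assumption, not the authors' stated method; a hypergeometric or chi-square test would be equally reasonable choices, and the function name is hypothetical.

```python
import math

def one_sample_proportion_z_test(successes, n, p0):
    """Two-sided z-test of an observed proportion against a reference proportion p0."""
    expected = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    z = (successes - expected) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

tf_in_list, list_size = 31, 271            # transcription factors among 5AZA-induced genes
tf_in_genome, genome_size = 1857, 25186    # transcription factors among Entrez IDs on the array

reference_rate = tf_in_genome / genome_size
z, p = one_sample_proportion_z_test(tf_in_list, list_size, reference_rate)

print(round(100 * tf_in_list / list_size, 1))  # 11.4
print(round(100 * reference_rate, 1))          # 7.4
print(z, p)
```

Under these assumptions the two-sided P value comes out at roughly 0.01, the same order as the reported P = 0.011.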
Initial qualitative MSP validation in cultured cells and lymphocytes
The 271 genes identified in the 5AZA experiment were tested by qualitative MSP assays in 10 human breast cancer cell lines, six benign early passage primary HME cultures, and four lymphocyte samples (Supplementary Table S1). These breast cancer cell lines were selected to include ER-positive, ER-negative, HER2-positive, and HER2-negative cancers and did not overlap with the cell lines used in the 5AZA experiment. Because our final implementation is in RP-FNA samples, many markers underwent more extensive testing in lymphocytes. Markers generating any signal in lymphocytes were excluded prior to validation in clinical samples. Ninety-four of these 271 genes were methylated in breast cancer cell lines but not in benign HME cultures (34 of these genes were also methylated in lymphocytes). Of these, 72 did not occur on previously published methylation array-based marker lists (Fig. 3; refs. 22–25, 37). Many common breast cancer methylation markers were absent from our list, including 13 that we have recently been assessing for breast cancer–risk stratification: LOX, BNC1, CFTR, HS3ST2, CTSZ, ESR1, BRCA1, CDH13, RASSF1, CCND2, APC, RAR-β2, and SCGB3A1. Nine of these 13 genes did not meet expression array detection P value thresholds, and the others failed to meet thresholds for 5AZA-induced upregulation. Notably, only six of these 13 genes were among the 1,136 markers identified by Van der Auwera using the Infinium Human Methylation-27 platform (22), and only two were among the 263 identified by Hill using the same platform (24). None occurred on any of the other array-derived lists shown in Fig. 3.
Figure 3. Comparison of gene lists identified by whole-genome methylation screens. Van der Auwera (22), Hill (24), and Fackler (23) used the Infinium Human Methylation-27 platform. Fackler used additional filters based on hormone receptor status of the tumors. The Cancer Genome Atlas data is derived from an Infinium Human Methylation-450 array (37). We selected loci associated with named genes where the mean β was higher by 0.4 or more in malignant than in benign samples (similar to the criteria used by Hill). Kamalakaran (25) used an MspI digestion approach called the Methylation Oligonucleotide Microarray Analysis (MOMA).
Qualitative MSP validation in clinical samples
From the 271 5AZA-induced genes assessed by qualitative MSP in breast cancer cell lines and HME cultures, 102 were excluded because of methylation signals in lymphocytes, 99 because no methylation signals could be detected for any cell type, and 21 because of very infrequent methylation in the cancer cell lines. The remaining 49 markers were assessed by qualitative MSP in a panel of fresh-frozen primary breast cancers (15), benign RP-FNA samples from untreated women recently diagnosed with breast cancer (5), and benign RP-FNA samples from women never diagnosed with breast cancer (10). Methylation prevalence was higher in cancers than in benign samples by 20 percentage points or more for 33 of these markers and by 40 percentage points or more for 15 markers (Supplementary Table S2). Eight of these latter markers do not occur on the methylation array-based marker lists included in Fig. 3 (HLA-B, LAT2, FBLN2, VCAN, ADM, FLNC, ARTN, and CLDN1; refs. 22–25, 37).
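The marker funnel and the prevalence-difference criterion can be sketched as follows. The exclusion counts come from the text; the per-sample methylation calls in the example are invented for illustration, and the function and tier names are hypothetical.

```python
screened = 271
excluded = {
    "methylation signal in lymphocytes": 102,
    "no methylation signal in any cell type": 99,
    "very infrequent methylation in cancer cell lines": 21,
}
advanced_to_clinical = screened - sum(excluded.values())
print(advanced_to_clinical)  # 49

def prevalence(calls):
    """Fraction of samples scored methylated (calls are True/False)."""
    return sum(calls) / len(calls)

def prevalence_difference(cancer_calls, benign_calls):
    """Tumor minus benign methylation prevalence."""
    return prevalence(cancer_calls) - prevalence(benign_calls)

def discrimination_tier(difference):
    """Classify a marker by the thresholds used in the text (0.2 and 0.4)."""
    if difference >= 0.4:
        return "difference >= 0.4"
    if difference >= 0.2:
        return "difference >= 0.2"
    return "difference < 0.2"

# Hypothetical marker: methylated in 9 of 15 cancers and 1 of 15 benign samples.
cancer_calls = [True] * 9 + [False] * 6
benign_calls = [True] * 1 + [False] * 14
diff = prevalence_difference(cancer_calls, benign_calls)
print(discrimination_tier(diff))  # difference >= 0.4
```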
In silico comparison with The Cancer Genome Atlas data
The TCGA data presents an opportunity to understand the characteristics of our 5AZA-derived marker list in relation to methylation array-derived lists. TCGA data is derived from the Infinium Human Methylation-450 array, which quantifies the level of methylation (β-value) for each of 485,000 CpGs distributed across the genome for 627 breast cancers and 97 benign breast samples. We initially identified 140 loci associated with unique named genes where the mean β was higher by 0.4 or more in malignant than in benign samples (Fig. 3). Of the 33 5AZA genes differentially methylated in cancer as compared with benign samples (tumor minus benign methylation prevalence, ≥0.2), only one occurred among these 140 most differentially methylated TCGA genes. This gene was GNE. Next we expanded the TCGA list to include any probe with a mean β difference greater than 0 for a unique named gene (n = 9,509). This list was ordered by FDR for tumor versus benign differentiation, to determine how many of our 33 differentially methylated genes could have been derived from TCGA data and how far down the list they occur. Fifteen of our 33 differentially methylated 5AZA genes were recognized as differentially methylated in the TCGA data, and their positions on the sorted list ranged from 199 for AKR1B1 (FDR = 8.5 × 10⁻¹³⁵) to 6721 for GPX1 (FDR = 1.3 × 10⁻¹⁹). Finally, we identified the TCGA probe closest to our amplicon for the 49 genes assessed by MSP in clinical samples and compared the tumor versus benign discrimination from the MSP data (difference in methylation prevalence) with the TCGA data (difference in mean β). We were able to identify TCGA probes 2 to 279 base pairs (mean, 71) from the center of our MSP amplicons for each of these 49 genes. There was significant correlation between tumor versus benign discrimination for MSP and TCGA for these 49 markers (Spearman r = 0.406, P = 0.004).
In addition, the difference in mean β values for malignant versus benign samples from the TCGA data was 0.085 for the 15 markers with the greatest difference in methylation prevalence by MSP (≥0.4 difference threshold) as compared with only 0.028 for the 16 least discriminatory markers (<0.2 difference threshold, P = 0.025). This suggests that the best 5AZA-derived markers would be identifiable within the TCGA data if the filtering algorithms were generously relaxed.
QM-MSP validation in clinical samples
A subset of the genes with higher methylation prevalence in malignant than benign clinical samples was selected for QM-MSP testing. In order to select genes whose methylation status was largely independent of the methylation status of other genes, hierarchical clustering of MSP methylation results for 15 breast cancers was performed (data not shown). This identified two gene clusters. The first cluster included genes that were methylated in the majority of the cancers (AKR1B1, PSAT1, CYP24A1, GBP4, HLA-B, LAT2, CPNE8, and UCHL1) and the second cluster included genes methylated in only a fraction of the cancers and clustering with one to five other genes. Four markers from cluster 1 and eight distributed across cluster 2 (12 markers total) were selected for QM-MSP validation in a panel of 51 archival primary breast cancer FNAs, 59 archival benign RP-FNA samples from untreated women recently diagnosed with breast cancer, and 145 archival benign RP-FNA samples from women never diagnosed with breast cancer. These genes included WDR66, UCHL1, FBLN2, PSAT1, CLDN1, BIRC3, CYP24A1, CCNA, GNE, CXCL14, CPNE8, and AKR1B1. Of these, all except BIRC3, GNE, and WDR66 were differentially methylated between malignant and paired benign samples with P values between 0 and 0.013 by sum-of-differences analysis (Supplementary Table S3).
Fresh-frozen breast cancer versus breast cancer FNA
Methylation prevalence for these 12 genes was compared between the 15 fresh-frozen primary breast cancers used in the initial MSP screen and the 51 breast cancer FNA samples assessed in the QM-MSP screen using greater than 0.1% as the threshold for classifying a sample as positive, as this is the reported sensitivity of MSP (38). Methylation was identified at a modestly higher frequency in fresh-frozen breast cancer tissue than in breast cancer FNA samples for each of the 12 genes (mean 0.53 vs. 0.43, P = 0.13), and correlation between the samples for methylation prevalence was poor (Spearman r = 0.343, P = 0.298). This may reflect tumor heterogeneity, which is missed by FNA sampling.
QM-MSP validation in an expanded clinical sample set
Four markers with higher levels of methylation in malignant than in paired benign samples were selected for QM-MSP validation in an expanded clinical sample set (CPNE8, PSAT1, CXCL14, and CLDN1). GNE was included as the fifth marker because it had shown a trend for differential methylation in malignant as compared with paired benign samples (P = 0.085), and methylation in benign samples had only been observed for women recently diagnosed with a primary breast cancer. The sample set included 97 breast cancer FNAs (one bilateral breast cancer, Table 1), 104 benign RP-FNA samples from patients with breast cancer, and 223 benign RP-FNA samples from women never diagnosed with breast cancer.
Table 1. Characteristics of the cancer patients and cancers
Feature | Value |
---|---|
Number of patients | 96 |
Number of cancers | 97 |
Patient characteristics | |
Mean age (range) | 56 (31–93) |
Race/ethnicity | |
Asian (%) | 2 (2.1) |
African-American (%) | 25 (25.8) |
Non-Hispanic Caucasian (%) | 63 (64.9) |
Hispanic (%) | 7 (7.2) |
Menopausal status | |
Premenopausal (%) | 35 (36.1) |
Postmenopausal (%) | 62 (63.9) |
Known BRCA gene mutation | 0 |
Tumor characteristics | |
Histology (%) | |
Infiltrating ductal | 81 (83.5) |
Infiltrating lobular | 10 (10.3) |
Mucinous | 2 (2.1) |
Metaplastic | 2 (2.1) |
Medullary | 1 (1.0) |
Small cell | 1 (1.0) |
Grade (%) | |
1 | 19 (19.6) |
2 | 39 (40.2) |
3 | 38 (39.2) |
Unknown | 1 (1.0) |
Associated DCIS (%) | |
Any | 76 (78.4) |
Unknown | 4 (4.1) |
≥25% | 19 (19.6) |
Unknown | 6 (6.2) |
Tumor size (%) | |
T1 | 41 (42.3) |
T2 | 49 (50.5) |
T3 | 7 (7.2) |
Nodal status (%) | |
pN0 | 64 (66.0) |
pN1 | 20 (20.6) |
pN2 | 11 (11.3) |
pN3 | 2 (2.1) |
Biomarkers (%) | |
ER positive | 66 (68.0) |
ER unknown | 1 (1.0) |
PR positive | 64 (66.0) |
PR unknown | 1 (1.0) |
HER2 positive | 9 (9.3) |
HER2 unknown | 1 (1.0) |
Ki67 ≥15% | 61 (62.9) |
Ki67 unknown | 1 (1.0) |
p53 ≥ 10% | 30 (30.9) |
p53 unknown | 4 (4.1) |
Each of these five markers is methylated at a greater frequency in primary breast cancer FNAs than in benign RP-FNAs, and each, except GNE, shows significantly greater methylation in primary cancers than in paired benign samples (Table 2). The sum-of-differences permutation method was used to assess the association between methylation of specific genes and various clinical and pathologic features of the patients and tumors (Table 3). Most notably, PSAT1 methylation was associated with low-grade, low-proliferation, hormone receptor (HR)-positive, lymph node–positive breast cancer in postmenopausal Caucasian women and with infiltrating lobular carcinoma. Conversely, GNE methylation was associated with high-grade, HR-negative breast cancer in younger women. Most of the markers showed greater methylation in tumors from Caucasian women than in tumors from African-American women, and CXCL14 methylation was associated with HER-2/neu-positive breast cancer. Methylation of CLDN1 in a benign RP-FNA sample was marginally predictive of methylation of CLDN1 in the paired cancer sample (P = 0.058). This was not observed for the other four markers.
Table 2. Frequency of methylation (≥0.1%) assessed by QM-MSP in primary cancer FNAs and benign RP-FNAs
Symbol | CA (97) | Benign (327) | P value | SoD CA_B9CA (81) | P value (SoD) |
---|---|---|---|---|---|
CPNE8 | 0.54 | 0.14 | <0.0001 | 34.1 | 0 |
PSAT1 | 0.44 | 0.07 | <0.0001 | 20.6 | 0 |
CXCL14 | 0.40 | 0.08 | <0.0001 | 25.7 | 0 |
CLDN1 | 0.36 | 0.11 | <0.0001 | 21.9 | 0.0002 |
GNE | 0.07 | 0.02 | 0.008 | 0.9 | 0.09 |
NOTE: The number in parentheses in the column header is the number of samples tested. SoD CA_B9CA is the sum of difference for methylation frequencies of primary cancers and paired benign samples (see Materials and Methods). Greater numbers indicate higher levels of methylation in greater numbers of cancer samples.
P value (SoD) is the P value for the SoD values computed by a permutation method.
Table 3. DNA methylation in cancer samples in relation to clinical and pathologic features
Gene | SoDᵃ | P valueᵇ | FDR P valueᶜ |
---|---|---|---|
Estrogen receptor positive | |||
PSAT1 | 17.71 | 0 | 0 |
GNE | −2.38 | 0.001 | 0.018 |
CPNE8 | 16.35 | 0.007 | 0.06 |
Progesterone receptor positive | |||
PSAT1 | 16.23 | 0 | 0 |
CPNE8 | 18.32 | 0.003 | 0.038 |
GNE | −1.88 | 0.006 | 0.054 |
Caucasian (vs. African American) | |||
PSAT1 | 15.28 | 0.0008 | 0.017 |
CPNE8 | 19.16 | 0.004 | 0.038 |
CXCL14 | 10.22 | 0.059 | 0.198 |
Grade I or II (vs. III) | |||
PSAT1 | 14.68 | 0.0004 | 0.014 |
GNE | −1.89 | 0.004 | 0.038 |
Ki67 <15% | |||
PSAT1 | 14.06 | 0.0008 | 0.017 |
Age >50 | |||
CXCL14 | 14.81 | 0.004 | 0.038 |
GNE | −1.54 | 0.036 | 0.145 |
PSAT1 | 9.10 | 0.037 | 0.145 |
Lobular histology (vs. ductal) | |||
PSAT1 | 17.26 | 0.014 | 0.092 |
CXCL14 | 17.65 | 0.027 | 0.122 |
Postmenopausal | |||
PSAT1 | 10.03 | 0.015 | 0.094 |
HER-2/neu positive | |||
CXCL14 | 19.02 | 0.019 | 0.1 |
Lymph node positive | |||
PSAT1 | 8.75 | 0.042 | 0.155 |
^a Sum of differences (see Materials and Methods). A positive value indicates that the listed clinical or pathologic feature is associated with DNA methylation.
^b Calculated using the permutation method described in the Materials and Methods section.
^c False discovery rate–corrected P value.
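The false discovery rate correction reported in Table 3 can be illustrated with a short sketch. The correction procedure is not named in this section, so the Benjamini-Hochberg step-up method is assumed here purely for illustration; the function name `benjamini_hochberg` and the example P values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order.

    Assumption: the article does not state which FDR procedure was used;
    Benjamini-Hochberg is shown here only as a common choice.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw P values for four gene-feature comparisons.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"P = {p:.3f}  FDR-adjusted = {q:.3f}")
```

Under this scheme, the adjusted value for a given comparison depends on the total number of comparisons tested, not only on those that appear in the table.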
Discussion
Whole-genome array-based approaches have identified hundreds of potential methylation markers in breast cancer (22–25, 37) using probes specific for individual CpGs found near named genes. Studies using the same methylation platform will generally identify different markers depending on the samples tested and the filtering criteria used. These marker lists will be even more divergent for different platforms (Fig. 3). Methylation arrays examine individual CpGs, and one highly methylated CpG does not necessarily equate to dense CpG methylation in a region critical for regulation of gene expression. Indeed, for a subset of genes, CpG methylation is a consequence of transcriptional repression rather than a cause, as evidenced by failure of 5AZA treatment to induce expression (39).
To identify DNA methylation markers with a high probability of regulating gene expression, we started by identifying genes expressed at higher levels in benign breast epithelial cells than in breast cancer cells and then selected genes that were induced by 5AZA treatment in the cancer cells. Two phases of MSP screening were used to identify 33 markers differentially methylated in clinical breast cancer samples as compared with clinical benign samples without generating signals in lymphocytes (Supplementary Table S2, δCA-B9 ≥ 0.2). These are breast cancer-relevant methylation markers optimized for FNA samples.
The clinical relevance of a subset of these markers was demonstrated in an expanded clinical sample set using QM-MSP. Consistent with other studies (19, 23, 24, 26, 28, 40–44), DNA methylation was observed more frequently in HR-positive than HR-negative breast cancer, and this difference was most marked for PSAT1, CPNE8, UCHL1, and AKR1B1. Notably, PSAT1 methylation was associated with low-grade, HR-positive, lymph node–positive breast cancer in postmenopausal women and infiltrating lobular cancer. In addition, GNE methylation was associated with high-grade, HR-negative breast cancer and CXCL14 methylation with HER2-positive breast cancer.
Only 12 of the 33 5AZA-selected genes we identified as differentially methylated in malignant as compared with benign clinical samples were also identified by methylation array studies (CYP24A1, NBL1, CPNE8, AKR1B1, UCHL1, HBA2, GSTP1, GBP4, PSAT1, IRF7, PYCARD, and GNE; refs. 22–25, 37). In addition, many common tumor suppressor genes well documented to be regulated by promoter-region hypermethylation were missed by our 5AZA approach and by methylation array approaches (e.g., BRCA1, CFTR, CTSZ, ESR1, LOX, RAR-β, and SCGB3A1), although CCND2, APC, RASSF1, CDH13, BNC1, and HS3ST2 do occur among the 1,136 genes identified by Van der Auwera using the Infinium Human Methylation-27 platform. These genes were missed by our 5AZA approach because they failed to meet expression level or quality thresholds on the initial gene expression array, as described in the Results section. Methylation array data are generally filtered based on mean β-scores for individual CpGs in order to retain genes where neighboring CpGs are methylated for the majority of DNA copies in the majority of samples. This will overlook genes that are methylated in a relatively small subpopulation of cells or methylated in only a small subset of tumors. Methylation-specific PCR, used for methylation marker discovery in the past and for initial validation in our study, will generate a signal when only 0.1% of DNA copies are methylated (38). This may permit recognition of minor cell populations important for tumor biology and maintenance. In this regard, the sum-of-differences permutation method we described for analysis of quantitative methylation data essentially compares the “sizes” of methylated populations between two groups and is ideal for recognizing differences in the relative abundance of minor cell populations.
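The idea of comparing the “sizes” of methylated populations between two groups with a permutation test can be sketched as follows. The exact sum-of-differences statistic is defined in Materials and Methods and is not reproduced in this section, so the definition below is an assumption made only for illustration: for each methylation cutoff in a hypothetical grid (`THRESHOLDS`), the percentage of samples at or above the cutoff is computed in each group, and the differences are summed. The function names, cutoff grid, and example values are likewise hypothetical.

```python
import random

# Hypothetical grid of percent-methylation cutoffs (not from the article).
THRESHOLDS = [0.1, 1.0, 5.0, 10.0, 25.0, 50.0]


def sum_of_differences(group_a, group_b, thresholds=THRESHOLDS):
    """Assumed statistic: summed difference, in percentage points, between
    the share of group A and group B samples at or above each cutoff."""
    total = 0.0
    for t in thresholds:
        frac_a = sum(v >= t for v in group_a) / len(group_a)
        frac_b = sum(v >= t for v in group_b) / len(group_b)
        total += 100.0 * (frac_a - frac_b)
    return total


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation P value obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = sum_of_differences(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = sum_of_differences(pooled[:n_a], pooled[n_a:])
        if abs(stat) >= abs(observed):
            hits += 1
    return observed, hits / n_perm


if __name__ == "__main__":
    # Hypothetical percent-methylation values for two groups of samples.
    feature_present = [0.0, 0.0, 0.5, 12.0, 30.0, 60.0]
    feature_absent = [0.0, 0.0, 0.0, 0.0, 0.2, 0.0]
    sod, p = permutation_p_value(feature_present, feature_absent)
    print(f"SoD = {sod:.1f}, permutation P = {p:.4f}")
```

Because the statistic in this sketch counts samples above low cutoffs as well as high ones, a group in which many samples carry a small methylated fraction can score as high as a group in which a few samples are heavily methylated. With this counting convention, a P value of 0 would mean that no permuted statistic reached the observed one.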
The clinical utility of methylation markers in breast cancer has not yet been well established, and it is unclear whether the greatest value will come from loci methylated in major cell populations in most cancers, or from genes regulated by promoter methylation in minor cell populations. Our primary interest is in RP-FNA markers for breast cancer risk stratification. For this application, identification of minor cell populations (e.g., tumor-initiating cells) may be particularly relevant. However, our data suggest that tumor FNA samples may not capture the entire methylation spectrum identifiable in fresh-frozen tumor tissue. Additional work is required to recognize the markers most relevant to tumor initiation, to validate tumor FNA as an adequate sampling approach, and to map relevant methylation changes in benign breast tissue.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: D. Bu, C. Lewis, A.F. Gazdar, D.M. Euhus
Development of methodology: D. Bu, C. Lewis, A.F. Gazdar, D.M. Euhus
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C. Lewis, V. Sarode, A. Lazorowitz, R. Rao, M. Leitch, V. Andrews, A.F. Gazdar, D.M. Euhus
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D. Bu, C. Lewis, M. Chen, X. Ma, A. Moldrem, A.F. Gazdar, D.M. Euhus
Writing, review, and/or revision of the manuscript: C. Lewis, V. Sarode, R. Rao, V. Andrews, A.F. Gazdar, D.M. Euhus
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): D. Bu, C. Lewis, A. Moldrem, D.M. Euhus
Study supervision: D.M. Euhus
Acknowledgments
The BRCA1-methylated cell line, UACC3199, was kindly provided by the Arizona Cancer Center, Tucson, Arizona.
Grant Support
This work was financially supported by the Department of Defense Breast Cancer Research Program, contract number W81XWH-07-1-0262, to D. M. Euhus.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.