Abstract
Methods to determine individualized breast cancer risk lack sufficient sensitivity to select women most likely to benefit from preventive strategies. Alterations in DNA methylation occur early in breast cancer. We hypothesized that cancer-specific methylation markers could enhance breast cancer risk assessment. We evaluated 380 women without a history of breast cancer. We determined their menopausal status or menstrual cycle phase, risk of developing breast cancer (Gail model), and breast density and obtained random fine-needle aspiration (rFNA) samples for assessment of cytopathology and cumulative methylation index (CMI). Eight methylated gene markers were identified through whole-genome methylation analysis and included novel and previously established breast cancer detection genes. We performed correlative and multivariate linear regression analyses to evaluate DNA methylation of a gene panel as a function of clinical factors associated with breast cancer risk. CMI and individual gene methylation were independent of age, menopausal status or menstrual phase, lifetime Gail risk score, and breast density. CMI and individual gene methylation for the eight genes increased significantly (P < 0.001) with increasing cytological atypia. The findings were verified with multivariate analyses correcting for age, log (Gail), log (percent density), rFNA cell number, and body mass index. Our results demonstrate a significant association between cytological atypia and high CMI, which does not vary with menstrual phase or menopause and is independent of Gail risk and mammographic density. Thus, CMI is an excellent candidate breast cancer risk biomarker, warranting larger prospective studies to establish its utility for cancer risk assessment. Cancer Prev Res; 9(8); 673–82. ©2016 AACR.
Introduction
Current tools to determine an individual woman's risk of developing breast cancer are insensitive. Biomarkers have the potential to improve the accuracy of risk assessment models based on family and personal health history in selecting women likely to develop breast cancer. Risk assessment today consists of models such as Gail and is based on endocrine factors, family history, and other parameters. The prime age for risk assessment and preventive interventions is 35 to 60 years (1, 2). The effect of endocrine variations (e.g., endogenous in perimenopausal women or exogenous caused by hormone replacement therapy in postmenopausal women) on biomarkers is unknown and must be understood before hormone concentrations can be used for either risk assessment or as surrogate endpoints in phase II chemoprevention studies. Mammographic breast density is also a risk factor for breast cancer but is not regularly included in risk assessment models (3).
Current literature suggests that the presence of ductal or lobular atypical hyperplasia is associated with a 3- to 4-fold increased risk for both ductal carcinoma in situ (DCIS) and invasive ductal carcinomas (4–6), and the absolute risk is higher, with the cumulative incidence approaching 30% at 25 years (6). Random periareolar fine needle aspiration (rFNA) of the breast and cytologic examination of the retrieved cells is emerging as a powerful, minimally invasive tool for assessing short-term breast cancer risk in asymptomatic women and for tracking response to chemopreventive intervention (7–11). Fabian and colleagues (7) reported a 5-fold increased risk of breast cancer, at a median follow-up of 45 months, associated with the presence of cytologic atypia in women at high risk for a new breast cancer. Cytologic evaluation, however, is subjective and dependent on the expertise of the pathologist. An objective molecular test would provide a valuable adjunct to cytology and would improve the discriminatory performance of present risk assessment models, not only in women currently identified as high risk, but also in those who are missed by present models.
Molecular markers have the potential to detect early changes in the breast that are predictive of future breast cancer occurrence and to allow selection of women who will benefit most from risk reduction strategies. Methylation-mediated silencing of tumor suppressor genes can contribute to the establishment and maintenance of the malignant phenotype (12, 13). High levels of DNA methylation occur in invasive and in situ carcinomas (14–19), asymptomatic breasts of high risk compared with average risk women (15–17, 20), and in normal breast tissue 2 to 4 cm distant to tumor (16, 21, 22). We use the quantitative multiplex methylation-specific PCR (QM-MSP) technique that quantifies methylation levels for up to 12 genes in the same aliquot of DNA (14, 23) and have successfully assayed as few as 50 cells obtained through ductal lavage, nipple fluid aspiration, rFNA, or biopsy of solid tissue. Using QM-MSP, one copy of methylated DNA is detected in 100,000 copies of unmethylated DNA (14, 20, 21, 23, 24).
Because DNA methylation changes appear early in disease (16, 21, 22), we hypothesized that detection of hypermethylated genes in rFNA can identify patients with increased cancer risk among women at average risk of developing breast cancer. Furthermore, we hypothesized that the DNA cumulative methylation index (CMI) of a panel of markers in breast tissue would not vary significantly by menstrual phase or menopausal status and would independently reveal subjects at risk who are not identified by other clinical assessment strategies. To test these hypotheses, we initiated a multicenter cohort study of rFNA in healthy women at normal and high risk, in which we examined the relationships between CMI in rFNA samples and endocrine states (menstrual phase and menopause) and correlated CMI and individual methylated genes with known breast cancer risk factors, including lifetime Gail risk score and breast density.
Materials and Methods
Eligibility
Women 35 to 60 years of age were eligible if they had no personal history of breast cancer, no recent breast biopsy, and were at least 5 years from any previous cancer (with the exception of non-melanoma skin cancer or cervical carcinoma in situ). Women with implants or a history of breast radiation were excluded. A negative 2-view mammogram within approximately 3 months prior to enrollment was required. Recruitment was designed to enroll similar numbers of pre- and postmenopausal women, as determined by serum concentrations of follicle-stimulating hormone (FSH), progesterone, and estradiol. Menstrual phase was determined on the basis of estradiol and progesterone levels. Prior use of endocrine agents for breast cancer prevention was allowed if used for less than 12 months, at least 2 years previously. Oral contraceptives, hormone replacement therapy, and vaginal/topical hormonal preparations were not permitted within 3 months of study enrollment; use of daily aspirin, nonsteroidal anti-inflammatory drugs (NSAIDs), fish oil supplements (omega-3 fatty acids), multivitamins, or vitamins C and E was not allowed within 2 weeks of the rFNA procedure. Participants had to understand and sign a written informed consent approved by the Institutional Review Boards (IRB) of participating institutions.
Subject recruitment
Subjects were recruited through the Lynn Sage Breast Imaging and Surgery Centers and the Bluhm Family Program for Breast Cancer Early Detection and Prevention (part of the Lynn Sage Breast Center) at Northwestern University and through the Sidney Kimmel Comprehensive Cancer Center and Avon Breast Center at Johns Hopkins (Baltimore, MD). Women seen for screening mammography, diagnostic imaging, evaluation of benign breast problems, or breast cancer risk evaluation were offered participation. Women were also recruited through the Dr. Susan Love Research Foundation's Army of Women. The Love/Avon Army of Women volunteers (25) were sent e-mails providing information about the study and study participation, and the study was also advertised in media presentations. All recruitment material was submitted to and approved by the local IRB prior to implementation.
Study design
Eligible participants were scheduled for an in-person clinic visit; in most cases, all study procedures occurred in a single visit (Consort Flow Diagram, Fig. 1). Participants completed personal and medical history questionnaires and had vital signs recorded. Blood for hormone levels and rFNA cells were collected following a breast exam and mammogram (if required to be repeated for the study). Nipple aspirate fluid (NAF) and serum were stored for future studies. The participants also consented to follow-up for up to 10 years for updates about breast health, any breast biopsies, and development of breast cancer, to directly determine the association between gene methylation in rFNA and risk.
Gail risk assessment
We collected data through patient-completed questionnaires, confirmed with clinical data wherever available, and estimation of breast cancer risk was performed using the Gail model version at the NCI website (26). This model generates 5-year and lifetime probabilities of breast cancer diagnosis based on age, age at menarche, age at first-term pregnancy, number of first-degree relatives with breast cancer, number of biopsies, and whether atypia was present on any biopsy.
Mammographic breast density
Bilateral breast density was measured by 2-view mammography, performed less than 14 weeks prior to the study visit. If a bilateral mammogram was 14 to 26 weeks prior to the study visit, a single LCC (left craniocaudal) view mammogram was performed; if the routine mammogram was performed more than 26 weeks previously, a repeat bilateral 2-view mammogram was required. Full-field digital mammography (FFDM) was used in the majority of women. If a recent mammogram was a film-screen study, this was digitized using the Vidar DiagnosticPRO Advantage Film Digitizer. Only the LCC view from the mammograms was used for the assessment of breast density.
We used CUMULUS software and followed the method of Byng and colleagues (27). Assessment of each digital mammogram was performed in 5 steps: (i) displayed the digital mammogram, (ii) determined the area of the entire breast (excluding lesion markers and possible pectoralis major muscle along the chest wall) by summing the number of pixels within the entire breast (BA), (iii) determined a threshold signal value that separates background fat from fibroglandular regions, (iv) determined the area of fibroglandular tissues (FA) within the breast area by summing the number of pixels below the threshold signal value, and (v) determined the 2-dimensional breast density by calculating the ratio FA/BA (percent density, PD). PD values from each view of both breasts were averaged to determine mean PD for each subject.
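The ratio in step (v) can be illustrated with a minimal sketch. This is not the CUMULUS implementation: the function names, the toy pixel grid, the mask, and the threshold value are all hypothetical, and the `dense_is_below` flag is an assumption that simply mirrors the wording of step (iv), since which side of the threshold counts as fibroglandular depends on image polarity.

```python
def percent_density(image, breast_mask, threshold, dense_is_below=True):
    """Return percent density, 100 * FA / BA, for one mammogram view.

    image        : 2-D list of pixel signal values (hypothetical toy input)
    breast_mask  : 2-D list of booleans, True for pixels inside the breast area
    threshold    : signal value separating fat from fibroglandular tissue
    dense_is_below: assumption; True counts pixels below the threshold as FA
    """
    breast_area = 0   # BA: number of pixels inside the breast
    fibro_area = 0    # FA: number of fibroglandular pixels inside the breast
    for row_pixels, row_mask in zip(image, breast_mask):
        for value, inside in zip(row_pixels, row_mask):
            if not inside:
                continue
            breast_area += 1
            is_dense = value < threshold if dense_is_below else value > threshold
            if is_dense:
                fibro_area += 1
    if breast_area == 0:
        raise ValueError("breast mask is empty")
    return 100.0 * fibro_area / breast_area


def mean_percent_density(pd_values):
    """Average the PD values of several views into one per-subject value."""
    return sum(pd_values) / len(pd_values)


# Toy 2 x 4 image; the last column lies outside the breast mask.
toy_image = [[10, 20, 80, 90],
             [15, 25, 85, 95]]
toy_mask = [[True, True, True, False],
            [True, True, True, False]]

pd_view = percent_density(toy_image, toy_mask, threshold=50)
print(round(pd_view, 1))
```

In this toy grid, 4 of the 6 in-mask pixels fall below the threshold, so the sketch reports a PD of about 66.7%.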
Blood samples
We obtained approximately 28 mL of whole blood from each participant for determination of estradiol, progesterone, and FSH levels to confirm menstrual/menopausal status. Blood was also collected for future DNA extraction. Samples were stored frozen at −80°C.
rFNA
rFNA was performed on each breast, as previously described (7, 11). Briefly, buffered lidocaine was infiltrated into the skin, then the parenchyma of each breast in 2 locations (upper inner and upper outer periareolar). The rFNA procedure was performed with a 21-G needle attached to a 10-cc syringe with 4 to 5 passes at each location and 8 to 10 passes per breast. In a small number of cases at Johns Hopkins, ultrasound guidance was used to target dense areas of breast tissue. Following the procedure, a cold pack was applied to the breast and the subjects were asked to wear a firm sports bra with compression wraps.
The cells retrieved from each breast were pooled, maintained on ice, and delivered as soon as possible to the laboratories of Drs. Khan and Sukumar for processing. Each sample was split equally and placed into either Cytolyte (for cytology and DNA extraction for QM-MSP analysis) or into PBS (for future hormone studies). ThinPrep slides were prepared for cytomorphologic evaluation.
Cytologic examination
All slides were evaluated for cytomorphology in a blinded fashion by a single cytopathologist (C.M. Zalles), who assigned a categorical assessment of nonproliferative, hyperplasia, borderline hyperplasia with atypia, or hyperplasia with atypia, as well as a Masood semiquantitative index score (7–11, 28).
Selection of markers and quantitative analysis of methylation
The 8 genes used in this study were methylated gene markers of breast cancer: RASSF1 (RASSF1A), CCND2 (Cyclin D2), RARB (RAR-beta), SCGB3A1 (HIN1), TWIST1, AKR1B1, TM6SF1, and TMEFF2 (14, 21, 23, 24). All 8 markers were also stably methylated through tumor progression (29, 30). On the basis of their frequent occurrence in preneoplastic as well as invasive breast cancer, we selected this panel of markers for the current study as most likely to be hypermethylated in the early stages of transformation of normal breast epithelial cells. DNA was extracted from rFNA cells, treated with sodium bisulfite, and analyzed by QM-MSP as described previously (14, 23). Individual gene methylation (M) was calculated as $\%\,\mathrm{M}_{\text{gene}} = \dfrac{\#\,\text{methylated copies}}{\#\,\text{methylated} + \#\,\text{unmethylated copies}} \times 100$. CMI = sum of % M for all genes.
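As a worked illustration of the two formulas above, the following minimal sketch computes % M per gene and the CMI for one sample. The copy numbers are invented for illustration only (and only 3 of the 8 genes are shown); the function names are hypothetical and are not part of the QM-MSP protocol.

```python
def percent_methylation(methylated_copies, unmethylated_copies):
    """% M for one gene: methylated / (methylated + unmethylated) * 100."""
    total = methylated_copies + unmethylated_copies
    if total == 0:
        raise ValueError("no copies detected for this gene")
    return 100.0 * methylated_copies / total


def cumulative_methylation_index(copies_by_gene):
    """CMI: sum of % M over all genes in the panel.

    copies_by_gene maps gene name -> (methylated copies, unmethylated copies).
    """
    return sum(percent_methylation(m, u) for m, u in copies_by_gene.values())


# Hypothetical copy numbers for a single rFNA sample (illustration only).
example_sample = {
    "RASSF1": (20, 980),   # 2.0 % M
    "CCND2": (5, 995),     # 0.5 % M
    "TWIST1": (0, 1000),   # 0.0 % M
}

example_cmi = cumulative_methylation_index(example_sample)
print(example_cmi)
```

Because CMI is a plain sum, a single highly methylated gene can dominate the index; the units are percentage points summed across the panel.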
Statistical considerations
The primary study objectives were to determine whether gene DNA methylation varied by age, menstrual phase, or menopausal status and to correlate CMI with established risk factors including Gail risk score, mammographic density, and cytomorphology. In our study, these clinical factors followed a non-Gaussian distribution according to the D'Agostino and Pearson omnibus K² normality test (31). We used nonparametric Spearman correlation to examine dependency among CMI/risk factors as continuous variables. Univariate analyses were performed using the Wilcoxon–Mann–Whitney rank test when 2 groups were compared and the Kruskal–Wallis test when more than 2 groups were compared. Before performing multivariate linear regression analysis, a common Box–Cox transformation (32) was applied to reduce the overall skewness of gene methylation variables, and the log transformation was applied to breast percent density, Masood score, and Gail score. The Box–Cox transform takes the form $(y^{\lambda} - 1)/\lambda$ and is frequently used to improve normality and linearity. Since all gene methylation variables had highly skewed distributions, we searched over candidate λ values and selected λ = 0.2, which gave relatively small skewness across all gene methylation variables. Results were considered significant at P ≤ 0.05 for clinical factors and cumulative methylation scores. Significance was set at P ≤ 0.006 after Bonferroni correction for multiple comparisons of the 8 genes (0.05/8 ≈ 0.006). Receiver operating characteristic (ROC) analyses were used to characterize the performance of the 8-gene panel by estimating the area under the ROC curve (AUC), sensitivity, specificity, classification accuracy, and likelihood ratio, along with the 95% confidence intervals. All statistical analyses were performed using RStudio version 0.98.953 (RStudio, Inc.) and GraphPad Prism version 5.04 for Windows (GraphPad Software).
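The Box–Cox step and the Bonferroni threshold can be sketched as follows. This is an illustrative reconstruction, not the authors' R code: the selection criterion used here (smallest mean absolute skewness across variables) is an assumption, since the text states only that λ was chosen to give relatively small skewness, and the toy data and candidate grid are invented.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1) / lam; log(y) in the limit lam == 0."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0
    return m3 / m2 ** 1.5


def best_lambda(variables, candidates):
    """Pick one common lambda for all variables.

    Assumed criterion: minimize the mean absolute skewness of the
    transformed variables over the candidate lambda values.
    """
    def score(lam):
        skews = [abs(skewness([box_cox(y, lam) for y in var])) for var in variables]
        return sum(skews) / len(skews)
    return min(candidates, key=score)


# Toy right-skewed "methylation" values for two hypothetical genes.
toy_gene_a = [0.1, 0.2, 0.5, 1.0, 2.0, 30.0]
toy_gene_b = [0.0, 0.3, 0.4, 0.9, 3.0, 18.0]

raw_skew = skewness(toy_gene_a)
transformed_skew = skewness([box_cox(y, 0.2) for y in toy_gene_a])
chosen = best_lambda([toy_gene_a, toy_gene_b], candidates=[0.1, 0.2, 0.5, 1.0])

# Bonferroni-corrected threshold for 8 per-gene comparisons.
bonferroni_alpha = 0.05 / 8   # 0.00625, reported as 0.006
```

With a positive λ the transform is defined at y = 0, which matters because many per-gene methylation values are zero; the λ = 0 (log) case would require strictly positive values.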
Results
Patient characteristics
From August 2008 to September 2011, 382 women were enrolled in the study, and 2 women were subsequently excluded because of unevaluable mammograms. Of 380 evaluable women, 200 were from Johns Hopkins and 180 were from Northwestern University. The majority of participants at both centers (71% from Johns Hopkins and 57% from Northwestern University) were recruited through 3 Army of Women “e-blasts.” Patient characteristics are summarized in Table 1. Median 5-year and lifetime risks of breast cancer using the Gail model were 1.4% and 12.7%, respectively. Sixty of 378 women (15.9%) had a lifetime risk of ≥20%. To evaluate breast density, BI-RADS scores and percent glandular density were independently calculated. BI-RADS 3 or 4 was observed in 233 of 378 (62%) women, slightly higher than previously reported values (33).
Patient characteristics
Patient characteristics | Number |
---|---|
Age, y | 50 (35–60) |
Race | N = 380 |
Caucasian | 327 (86.1%) |
African-American | 40 (10.5%) |
Other | 13 (3.4%) |
Menopausal status (measured by hormones) | N = 366 |
Pre | 168 (45.9%) |
Post | 198 (54.1%) |
Body Mass Index (BMI) | N = 380 |
Median (range) | 26.8 (18.7–51.3) |
Low–Normal (≤25) | 144 (37.9%) |
High (>25) | 213 (62.1%) |
Menstrual phase | N = 166 |
Follicular | 53 (31.9%) |
Mid-cycle | 67 (40.4%) |
Luteal | 46 (27.7%) |
Lifetime risk (Gail model) | N = 378 |
Median (range) | 12.7 (5.6–54.1) |
<10 | 112 (29.6%) |
10–19 | 206 (54.5%) |
≥20 | 60 (15.9%) |
Breast percent density | N = 379 |
Median percent density (range) | 18.1 (2.5–72.8) |
Percent density | |
<5 | 15 (4%) |
5–24 | 264 (70%) |
25–49 | 92 (24%) |
50–75 | 8 (2%) |
>75 | 0 (0%) |
Cytology, standard | N = 380 |
Insufficient | 26 (6.8%) |
Benign | 207 (54.5%) |
Borderline atypical | 8 (2.1%) |
Atypical/Indeterminate | 136 (35.8%) |
Suspicious | 3 (0.8%) |
Masood cytology score | N = 354 |
Median (range) | 13 (7–18) |
Normal (6–10) | 30 (8.5%) |
Hyperplasia, benign (11–14) | 192 (54.2%) |
Hyperplasia, atypical (15–18) | 132 (37.3%) |
Breast cancer risk factors: lifetime Gail risk, breast density, and Masood score in rFNA
To provide a framework for our study, we compared our cohort to the general population of healthy women in terms of breast cancer risk. Considering clinical factors as continuous variables, we observed a significant inverse correlation between age and rFNA cytomorphology (P < 0.0001, Masood; Table 2). Consistent with the pattern in the general population, an inverse correlation between age and percent breast density (P = 0.005) was observed. Lifetime Gail risk estimates among individuals in our cohort positively correlated with mammographic percent density (PD; r = 0.166; P = 0.002) and Masood cytology score (r = 0.156; P = 0.004; Table 2). Among women in our study, PD positively correlated with the Masood cytology score (P = 0.0004); thus, women with denser breasts were more likely to have cytological atypia. Cytomorphology was evaluable in 354 of 380 rFNA samples. Twenty-six of 380 samples (6.8%) were insufficient, as they either had fewer than 100 epithelial cells or an uninterpretable cellular morphology. Among all samples, using the categorical descriptor of cytology, 207 of 380 (54.5%) had benign or borderline-normal cytology and 147 of 380 (38.7%) had atypical cytology, including 8 borderline-atypical samples and 3 suspicious for malignancy (Table 1). These latter 3 women were investigated per clinical standards, no cancer was found, and they have not subsequently developed breast cancer. Using Masood scores, 132 of 354 rFNA samples (37.3%) had cytologic atypia, demonstrating concordance with standard cytology. Among women for whom both Gail risk and Masood scores were available (N = 353), 57 (16.1%) had a lifetime risk of ≥20%, 132 (37.4%) had cytological atypia, and 28 (7.9%) had both high lifetime probability by the Gail model and cytologic atypia (Masood ≥15).
Correlation between clinical risk factors within the study cohort
Age | Lifetime Gail risk | Percent density | Masood score |
---|---|---|---|
Number of sample pairs | 356 | 356 | 332 |
Spearman r | −0.256 | −0.15 | −0.246 |
95% CI | −0.353 to −0.153 | −0.253 to −0.0441 | −0.348 to −0.139 |
P (two-tailed) | < 0.0001 | 0.005 | < 0.0001 |
Lifetime Gail assessment | Age | Percent density | Masood score |
Number of sample pairs | 356 | 355 | 331 |
Spearman r | −0.256 | 0.166 | 0.156 |
95% CI | −0.353 to −0.153 | 0.0596–0.268 | 0.0461–0.263 |
P (two-tailed) | < 0.0001 | 0.002 | 0.004 |
Percent density | Age | Lifetime Gail assess | Masood score |
Number of sample pairs | 356 | 355 | 331 |
Spearman r | −0.15 | 0.166 | 0.195 |
95% CI | −0.253 to −0.0441 | 0.0596–0.268 | 0.0855–0.299 |
P (two-tailed) | 0.005 | 0.002 | 0.0004 |
Masood score | Age | Lifetime Gail assess | Percent density |
Number of sample pairs | 332 | 331 | 331 |
Spearman r | −0.246 | 0.156 | 0.195 |
95% CI | −0.348 to −0.139 | 0.0461–0.263 | 0.0855–0.299 |
P (two-tailed) | < 0.0001 | 0.004 | 0.0004 |
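The Spearman coefficients reported in this table are rank correlations. A minimal sketch of the computation, assuming average ranks for tied values and using invented toy data (none of the numbers below come from the study), is:

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0   # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_r(x, y):
    """Spearman r: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data: a monotonic but non-linear relationship gives r = 1.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 8, 16, 32]
toy_r = spearman_r(toy_x, toy_y)
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed distributions noted in the statistical methods, which is why a rank-based measure suits these variables.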
Incidence of gene DNA methylation in rFNA
Methylation is an early event in breast cancer (21, 22) and has been observed to be higher than normal in breast epithelial cells with atypia (9, 16, 17, 20). We examined DNA methylation by QM-MSP in the rFNA samples [n = 354 rFNA samples; median yield of cells following enumeration of stained cytology slides was 700 epithelial cells (range: 0–74,000)]. DNA methylation in AKR1B1, TM6SF1, TMEFF2, RASSF1, CCND2, HIN1, RARB, and TWIST1 genes was examined. Among individual genes, the median methylation was low (0%–2%), but the range of methylation was large (0%–30% methylation; Fig. 2, Supplementary Table S1). Similarly, CMI median values were low (CMI = 8 units), but the range was large (CMI = 2–96 units; Fig. 2, Supplementary Table S1).
Gene methylation in rFNA of healthy women. QM-MSP was performed to quantitate the level of methylation for 8 individual genes, as indicated on the x-axis. Methylation (M) was calculated as $\%\,\mathrm{M}_{\text{gene}} = \dfrac{\#\,\text{methylated copies}}{\#\,\text{methylated} + \#\,\text{unmethylated copies}} \times 100$ (using the left y-axis) and CMI = sum of % M for all genes within the panel (right y-axis). Median methylation is indicated by the bars, and each symbol in the scatter plot represents a sample.
Association between DNA methylation and menopause or menstrual cycle
Recent results suggest that the DNA methyltransferases, DNMT3A, DNMT3B, and DNMT1, are under the regulation of female sex steroid hormones during the menstrual cycle (34, 35) implying that methylation could fluctuate cyclically. For DNA methylation to be a powerful marker of risk, it must not change as a function of estradiol, progesterone, or FSH levels (36, 37). Using univariate analysis, no association was observed between CMI and menopausal status (P = 0.715), but RARB gene methylation was significantly higher (P = 0.004) in postmenopausal women (Supplementary Table S2). No association was observed between CMI and menstrual phase (P = 0.289), follicular, luteal, or mid-cycle phases measured by estradiol and progesterone levels, nor did individual genes vary significantly (Supplementary Table S3). Therefore, for this panel of markers, interactions with menopausal and menstrual phase status did not need to be considered when evaluating DNA methylation as a potential independent risk factor.
Correlation between age and DNA methylation
Among the rFNA samples studied, analyzing parameters as continuous variables, cumulative methylation did not correlate with subject age (Spearman r = −0.048, P = 0.367; Table 3). Individual gene methylation also did not correlate with age (Table 3), with the exception of RARB [Spearman r = 0.172, P = 0.001, 95% confidence interval (CI), 0.067–0.274].
Correlation between Cumulative Methylation and clinical risk factors, and between age or Masood cytology score and individual gene methylation
Variable | N | Spearman r (95% CI) | P-value |
Correlation between Cumulative Methylation and Clinical Risk Factors | |||
Age | 357 | −0.048 (−0.154 to 0.059) | 0.367 |
Lifetime Gail Assessment | 356 | 0.005 (−0.103 to 0.111) | 0.933 |
Percent Density | 356 | 0.0966 (−0.011 to 0.202) | 0.069 |
Masood Score | 332 | 0.385 (0.287 to 0.476) | < 0.0001 |
Correlation between Age and Individual Gene Methylation | |||
RASSF1 | 357 | −0.085 (−0.190 to 0.022) | 0.1102 |
TWIST1 | 357 | −0.041 (−0.148 to 0.066) | 0.436 |
CCND2 | 355 | −0.040 (−0.015 to 0.068) | 0.455 |
HIN1 | 357 | −0.036 (−0.142 to 0.071) | 0.496 |
RARB | 357 | 0.172 (0.067 to 0.274) | 0.001 |
AKR1B1 | 357 | −0.119 (−0.223 to 0.012) | 0.024 |
TM6SF1 | 357 | 0.024 (−0.083 to 0.130) | 0.656 |
TMEFF2 | 357 | −0.003 (−0.110 to 0.104) | 0.953 |
Correlation between Masood Cytology and Individual Gene Methylation | |||
RASSF1 | 332 | 0.413 (0.316 to 0.501) | < 0.0001 |
TWIST1 | 332 | 0.301 (0.195 to 0.397) | < 0.0001 |
CCND2 | 331 | 0.464 (0.372 to 0.547) | < 0.0001 |
HIN1 | 332 | 0.312 (0.208 to 0.408) | < 0.0001 |
RARB | 332 | −0.0927 (−0.201 to 0.083) | 0.092 |
AKR1B1 | 332 | 0.253 (0.146 to 0.354) | < 0.0001 |
TM6SF1 | 332 | 0.228 (0.121 to 0.331) | < 0.0001 |
TMEFF2 | 332 | 0.025 (−0.086 to 0.135) | 0.652 |
Lack of correlation between lifetime Gail risk and DNA methylation
There was no significant correlation between lifetime Gail risk and cumulative methylation using Spearman statistics to evaluate these factors as continuous variables (CMI, P = 0.933; Table 3). Nor was this association significant by multivariate analysis (P = 0.384; Supplementary Table S4) after adjusting for age, body mass index (BMI), log (PD), and cell number. Similarly, individual gene methylation was not significantly correlated with lifetime Gail risk (Table 3, Supplementary Table S4). Therefore, no association between Gail risk estimates and DNA methylation was found.
Lack of correlation between mammographic breast density and DNA methylation
No significant correlation between PD and CMI was observed using Spearman analysis (P = 0.069; Table 3), nor was a significant association evident with multivariate analysis after adjusting for age, BMI, log (Gail risk), and cell number (P = 0.248; Supplementary Table S5). Similarly, no significant association was observed for individual genes (Supplementary Table S5). We concluded that breast density is unlikely to be associated with methylation of the genes in this panel.
Correlation between cytomorphology and DNA methylation
We observed a moderate and highly significant correlation between Masood cytology scores and CMI (P < 0.0001; Spearman coefficient = 0.385, 95% CI, 0.287–0.476; Table 3). Masood score also correlated with methylation of most individual genes, as evaluated by Spearman analyses [AKR1B1 (P < 0.0001), TM6SF1 (P < 0.0001), CCND2 (P < 0.0001), RASSF1 (P < 0.0001), TWIST1 (P < 0.0001), and HIN1 (P < 0.0001); Table 3] and by multivariate analyses after adjustment for age, BMI, log (PD), log (Gail), and cell number (Supplementary Table S6). This was clearly demonstrated by visualizing CMI as a function of Masood score. A significant difference in methylation was observed between samples with Masood scores of 11 to 14 (hyperplasia, benign) and those with scores of 15 to 18 (hyperplasia, atypical; P < 0.0001, Mann–Whitney; Fig. 3A). A steady incremental increase in CMI occurred as a function of increasing Masood scores (P < 0.0001, Kruskal–Wallis), shown in scatter plots (Fig. 3A). Individual gene methylation also increased incrementally with increasing Masood score; the increase was most pronounced among proliferative benign samples (Masood 11–14) for CCND2 (Fig. 3B) and was more specifically related to atypical hyperplasia (Masood 15–18) for RASSF1, AKR1B1, TM6SF1, TWIST1, and HIN1 (Fig. 3B). In contrast, RARB (P = 0.092) and TMEFF2 (P = 0.652) methylation did not correlate with Masood score (Fig. 3B). ROC analysis of Masood 8–14 (benign) versus Masood 15–18 (atypia, hyperplasia) showed an AUC = 0.698 (P < 0.0001). At a CMI cutoff of 6.55 units, the sensitivity was 39.8%, specificity was 83.8%, and likelihood ratio was 2.46.
Association between DNA methylation and Masood cytology score. CMI was plotted as a function of Masood score for each rFNA sample. A, association between Masood score and CMI: Box plots show statistically significant increases in median cumulative methylation (y-axis) as a function of increasing Masood score (x-axis). Scatter plots indicate the cumulative methylation level for the 8-gene panel for each rFNA sample (y-axis) and Masood score (x-axis). Significant differences observed between groups (P < 0.0001; Kruskal–Wallis). B, association between Masood score and individual gene methylation. Represented for each gene is the median and interquartile range of the % methylation (y-axis) within each Masood scoring group, indicated on the x-axis, where the identity of each gene(s) is indicated in the title. Kruskal–Wallis P is shown. C, hierarchical cluster analysis. Two-dimensional hierarchical cluster analysis was performed plotting rank (Spearman) dissimilarity for genes (columns) and rFNA samples (rows). Two principal sample clusters were observed: Cluster 1 contained 67% of the rFNA samples, that included all the samples with normal cytology (mainly in Cluster 1A; Masood 8–10), 79.8% of samples with benign cytology (Masood 11–14), and 43.8% of samples with atypical cytology (Masood ≥ 15). Cluster 2 was enriched in samples with atypia (56.3% of samples with atypical cytology) and depleted of samples with benign cytology (20% of samples). Cluster 2A was defined mainly by samples with high HIN1, RASSF1, and TWIST1, whereas Cluster 2B was defined mainly by samples with high AKR1B1 and TM6SF1 methylation.
A minimal marker panel consisting of the 5 genes most specific to high Masood scores was evaluated (Supplementary Fig. S1). Significantly higher methylation was observed in cytologically atypical than in benign rFNA samples (P < 0.0001), as shown by the box plots and a line plot of median CMI versus Masood score (with interquartile range). ROC analysis of Masood 8–14 (benign) versus Masood 15–18 (atypia, hyperplasia) showed a higher AUC of 0.726 (P < 0.0001) than that of the 8-gene panel. At a CMI cutoff of 4.05 units, the sensitivity was 36.7%, specificity was 86.3%, and likelihood ratio was 2.68 (Supplementary Fig. S1). Therefore, the 5-gene methylation panel improved on the 8-gene panel in its ability to specifically discriminate between benign samples and samples with atypia, albeit with a slight loss in sensitivity.
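The likelihood ratios reported for the two panels are consistent with the standard positive likelihood ratio, sensitivity / (1 − specificity). The following minimal sketch recomputes them from the sensitivity and specificity values quoted in the text; the function name is illustrative only, and the assumption that the reported figure is the positive likelihood ratio is ours, not stated in the source.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity); both inputs are fractions in [0, 1)."""
    return sensitivity / (1.0 - specificity)

# Sensitivity and specificity as reported in the text, expressed as fractions.
lr_8_gene = positive_likelihood_ratio(0.398, 0.838)  # 8-gene panel, CMI cutoff 6.55
lr_5_gene = positive_likelihood_ratio(0.367, 0.863)  # 5-gene panel, CMI cutoff 4.05

print(round(lr_8_gene, 2), round(lr_5_gene, 2))  # prints: 2.46 2.68
```

The higher ratio for the 5-gene panel comes from its higher specificity, which outweighs the small drop in sensitivity.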
Correlations between gene methylation and standard cytology were consistent with those found with the continuous, numerical Masood score, with significantly higher CMI observed in atypical than benign rFNA samples (Supplementary Table S7).
Correlation between epigenetic markers within the methylation panel
Two-dimensional hierarchical cluster analysis of the raw methylation values was performed. The samples grouped into 2 main clusters (Fig. 3C; calculations based on Spearman dissimilarity), with the normal/benign samples enriched in Cluster 1 and atypical samples enriched in Cluster 2. To investigate the gene-to-gene relationship between the epigenetic markers in the panel, a Spearman correlation matrix was generated (Supplementary Table S8). Considering significance to be P ≤ 0.006 after Bonferroni adjustment, TWIST1 and TM6SF1 significantly correlated with all other markers. RASSF1, CCND2, AKR1B1, and HIN1 significantly correlated with nearly all the other markers, including each other. RARB and TMEFF2 significantly correlated with the fewest genes, which included each other.
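As an illustration of the two calculations named above, the sketch below computes a Spearman coefficient (the Pearson correlation of tie-averaged ranks) for one pair of markers and a Bonferroni-adjusted significance threshold. The methylation values are hypothetical, and the function names are ours. The source does not state the number of comparisons used for the adjustment; dividing 0.05 by the 8 markers gives 0.00625, which is close to the reported threshold of 0.006, but that reading is an assumption.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman coefficient; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

def bonferroni_threshold(alpha, n_comparisons):
    """Per-test significance threshold after Bonferroni adjustment."""
    return alpha / n_comparisons

# Hypothetical % methylation values for two markers across five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [5.0, 6.0, 7.0, 8.0, 7.0]

rho = spearman_rho(gene_a, gene_b)
threshold = bonferroni_threshold(0.05, 8)  # assumption: 8 comparisons
```

Repeating `spearman_rho` over every pair of markers would fill a correlation matrix of the kind described; P values for each coefficient would need a separate test that is not sketched here.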
Assessment of gene methylation in rFNA over time
To evaluate the agreement between the baseline and the 6-month methylation measures, we performed QM-MSP on 2 samples of rFNA collected 6 months apart from each of 15 patients in the placebo arm of a clinical trial (11). We dichotomized all genes on the basis of whether the methylation levels were above or below the corresponding medians measured at the same time point. Cohen κ was calculated for each gene and for the CMI. κ was not evaluable for TMEFF2 because a non-zero methylation level was observed in only one observation, and thus its variance was not estimable. Except for AKR1B1, which did not demonstrate agreement, the κ statistics for all other genes and for CMI showed fair-to-moderate agreement (κ = 0.182–0.571; Supplementary Table S9). Because of the small sample size (n = 15), we were not able to assess the significance of these analyses.
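The agreement calculation described above can be sketched as follows: values at each time point are dichotomized at that time point's median, and Cohen κ is computed from the paired binary calls. The data are hypothetical and the function names are ours. Treating values equal to the median as "not above" is an assumption, since the source does not say how such values were handled.

```python
from statistics import median

def dichotomize(values):
    """True where a value is strictly above the median of its own time point."""
    m = median(values)
    return [v > m for v in values]

def cohen_kappa(a, b):
    """Cohen's kappa for paired binary calls; None when chance agreement is 1."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n  # fraction called "above" at each time point
    expected = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if expected == 1.0:
        return None  # both series constant, so kappa is not estimable
    return (observed - expected) / (1 - expected)

# Hypothetical % methylation for one gene in six patients at two time points.
baseline = [0.0, 1.2, 3.4, 0.5, 7.8, 2.2]
month6 = [0.1, 0.9, 4.0, 2.5, 6.1, 0.3]

kappa = cohen_kappa(dichotomize(baseline), dichotomize(month6))
```

With these made-up values, four of six patients receive the same call at both time points and the chance-expected agreement is 0.5, so κ is 1/3.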
Discussion
We report results from a study that was carefully designed to establish the baseline characteristics of a quantitative methylation assay of a panel of genes detected in DNA from cells retrieved by rFNA from healthy breasts of women unselected by risk for breast cancer. These data will form the foundation of a larger study that establishes the value of these promising risk indicators for the identification of women at high risk for breast cancer, regardless of their risk profile or hormonal status. Our major findings demonstrate that CMI of our selected gene panel has many of the desirable features required of a risk marker: It does not vary by age, menopausal status, or menstrual phase; and it is independent of calculated risk using standard breast cancer risk factors and of mammographic density. We did, however, find a significant correlation between presence of hypermethylated genes in rFNA and cytologic atypia. These findings support our hypothesis that analysis of cancer-specific hypermethylated genes, along with cytology, has the potential to identify individuals with increased cancer risk among women at average risk of developing breast cancer. In this initial study, we have demonstrated the robustness of CMI of our select panel of genes as a candidate individualized risk indicator in breast epithelium.
Our study has several strengths. First, we enrolled women regardless of estimated risk, since current models discriminate poorly at the individual level (38). All studies of rFNA reported to date have recruited women already known to be at high risk for breast cancer by Gail criteria (7, 8, 15), abnormal cytology (39), or presence of a germline BRCA1 or BRCA2 mutation (17). The vast majority (about 80%) of women who develop breast cancer are not at high risk due to family history or BRCA1/2 mutation, and 70% to 80% have never had a breast biopsy (40). Therefore, the majority of women who develop breast cancer have either weak or no identifiable risk factors. Our study addresses this reality by including all interested women, regardless of risk status. Indeed, 84% of the women in our study were at average risk (5-year risk of 1.7%) according to Gail criteria.
Second, for the first time, we combine a facile method of cell collection with an evaluation of whether methylated gene markers have the potential, in the future, to provide a molecular test to assess risk in the average woman. Our large dual-center prospective study consisted of women at average risk of developing breast cancer who underwent rFNA of their healthy breasts. The Love-Avon Army of Women website enabled brisk enrollment. Recruitment was highly successful; volunteers were well informed, highly motivated, and compliant; and the ratio of eligible to ineligible volunteers was very high. Among the minimally invasive/noninvasive techniques currently under investigation for retrieving cells from the breast, such as nipple aspiration, ductal lavage, and rFNA (24, 41), rFNA has several advantages. The rFNA procedure is feasible and informative in women regardless of risk for breast cancer. Its advantages over other methods of cell retrieval from the breast include the use of simple tools, the reproducibly high yields of cells amenable to cytologic and molecular analyses, and the ability to repeat sampling over several years of follow-up.
Breast cancer risk is currently assessed by the Gail model (42) on the basis of age, first-degree family history, reproductive factors such as age of menarche and of first-term pregnancy, and the number of surgical biopsies of the breast. This last factor is particularly weighty if atypical hyperplasia is observed in the core biopsy or surgically resected breast tissue (6, 43). The inclusion of mammographic density into standard risk prediction models is associated with a surprisingly modest improvement in risk prediction, considering the large increase in risk observed in women with extremely dense breasts compared with those with predominantly fatty breasts (44–46). However, with the exception of atypical hyperplasia and very dense breasts, the risk associated with each of these factors is modest. Even the strongest risk factors of atypia (47, 48) and dense breasts (49) identify many women who will never develop breast cancer, leading to a search for molecular markers that would increase both the sensitivity and the specificity of present risk assessment methods.
In breast samples obtained using rFNA, molecular markers such as RNAs, miRNAs, methylated genes, and proteins offer objective readouts and can be detected with great sensitivity and specificity (reviewed in ref. 41). In this study, we assessed multigene methylation quantitatively (expressed cumulatively as CMI and %M for individual genes) by QM-MSP. QM-MSP was feasible in all rFNA samples, regardless of low or not evaluable cellularity as assessed by cytopathology. We found that CMI did not differ by menstrual or menopausal status and correlated strongly with cytological atypia as determined by Masood scoring and by standard cytology. Stability across the menstrual cycle and over time, together with a strong correlation with cytologic atypia, implies that CMI has the potential to be a robust breast cancer risk indicator, which may reflect similar features as cytologic atypia but is quantitative and objective and has a larger dynamic range. In a pilot study of baseline and 6-month methylation measures in rFNA samples collected 6 months apart from a small cohort of 15 patients (11), we found that CMI and most individual genes showed fair-to-moderate but not strong agreement (Supplementary Table S9), stressing the importance of examining the question of reproducibility over time in greater depth. These findings emphasize the need for follow-up of the women in this study and for larger studies with known cancer outcomes to establish the utility of CMI for cancer risk assessment.
We conclude that the CMI of the 8-gene panel (or even as few as 5 genes) has the potential to provide an ancillary test to cytology or Masood score. The Masood score offers numerical evaluation that is more amenable to determining whether there is value to adding CMI assessment to routine cytological evaluation of rFNAs. Future validation studies and follow-up of the current cohort over time for incident breast cancer will reveal the value of quantitative methylation measurement in rFNA for risk assessment.
Disclosure of Potential Conflicts of Interest
V. Stearns reports receiving other commercial research support from Abbvie, Celgene, Medimmune, Merck, Novartis, Pfizer, and Puma. M.J. Fackler and S. Sukumar report receiving commercial research grants from, having ownership interest (including patents) in, and being consultants/advisory board members of Cepheid. Z.L. Bujanda has ownership interest in a patent application for the cMethDNA method (no financial interest). No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: V. Stearns, M.J. Fackler, L.K. Jacobs, S.A. Khan, S. Sukumar
Development of methodology: V. Stearns, M.J. Fackler, Z.L. Bujanda, R.T. Chatterton, L.K. Jacobs, C. Shehata, S.A. Khan, S. Sukumar
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): V. Stearns, M.J. Fackler, R.T. Chatterton, L.K. Jacobs, N.F. Khouri, K. Kenney, C. Shehata, S.C. Jeter, J.A. Wolfman, C.M. Zalles, S.A. Khan, S. Sukumar
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): V. Stearns, M.J. Fackler, S. Hafeez, C. Shehata, P. Huang, S.A. Khan, S. Sukumar
Writing, review, and/or revision of the manuscript: V. Stearns, M.J. Fackler, Z.L. Bujanda, R.T. Chatterton, L.K. Jacobs, C. Shehata, S.C. Jeter, C.M. Zalles, P. Huang, S.A. Khan, S. Sukumar
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): V. Stearns, M.J. Fackler, S. Hafeez, Z.L. Bujanda, D. Ivancic, K. Kenney, C. Shehata, S.C. Jeter, S.A. Khan, S. Sukumar
Study supervision: V. Stearns, M.J. Fackler, C. Shehata, S.A. Khan, S. Sukumar
Other (performed breast density analyses): J.A. Wolfman
Grant Support
This study was supported by a grant from Avon Foundation #02-2011-108 (to S.A. Khan, V. Stearns, and S. Sukumar), N01-CN-35157 (to S.A. Khan), Breast Cancer Research Foundation (to V. Stearns) and the SKCCC Core grant P30 CA006973 (to V. Stearns and S. Sukumar).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.