Abstract
Purpose: Previously, we found that gene expression in histologically normal breast epithelium (NlEpi) from women at high breast cancer risk can resemble gene expression in NlEpi from cancer-containing breasts. Therefore, we hypothesized that gene expression characteristic of a cancer subtype might be seen in NlEpi of breasts containing that subtype.
Experimental Design: We examined gene expression in 46 cases of microdissected NlEpi from untreated women undergoing breast cancer surgery. For 30 age-matched cases [15 estrogen receptor (ER)+, 15 ER−], we measured expression on Affymetrix U133A arrays. In 16 independent cases (9 ER+, 7 ER−), we validated selected genes using quantitative real-time PCR (qPCR). We then compared gene expression between NlEpi and invasive breast cancer using four publicly available data sets.
Results: We identified 198 genes that are differentially expressed between NlEpi from breasts with ER+ (NlEpiER+) compared with ER− cancers (NlEpiER−). These include genes characteristic of ER+ and ER− cancers (e.g., ESR1, GATA3, and CX3CL1, FABP7). qPCR validated the microarray results in both the 30 original cases and the 16 independent cases. Gene expression in NlEpiER+ and NlEpiER− resembled gene expression in ER+ and ER− cancers, respectively: 25% to 53% of the genes or probes examined in four external data sets overlapped between NlEpi and the corresponding cancer subtype.
Conclusions: Gene expression differs in NlEpi of breasts containing ER+ compared with ER− breast cancers. These differences echo differences in ER+ and ER− invasive cancers. NlEpi gene expression may help elucidate subtype-specific risk signatures, identify early genomic events in cancer development, and locate targets for prevention and therapy. Clin Cancer Res; 17(2); 236–46. ©2010 AACR.
This article is featured in Highlights of This Issue, p. 201
We find that gene expression differs in histologically normal epithelium of breasts containing ER+ compared with ER− breast cancers. These differences reflect the gene expression differences in ER+ compared with ER− invasive cancers. Explanations of these findings include an effect of the extracellular environment, a field effect, predisposition to particular tumor subtypes due to inherited susceptibility genes, or detection of breast cancer subtype-specific genomic changes prior to histologic abnormality. Regardless of which explanation is correct, normal epithelium gene expression profiles could help define a breast cancer subtype-specific risk signature, identify the initial genomic differences between subtypes, and suggest new targets for prevention and therapy, which is especially important for ER− cancers.
Introduction
Breast cancer is a heterogeneous disease. One well-documented and clinically important dichotomizing characteristic of breast cancers is expression (or not) of estrogen receptor–α (ER). ER expression is of major importance in breast cancer prevention, treatment, and outcome (1–3), and may even help define a cancer's cell of origin (4).
Many of the genomic aberrations present in each subtype of invasive breast cancers can also be detected in earlier lesions, such as carcinoma in situ and even hyperplastic lesions (for review see ref. 5), suggesting that these aberrations are important to breast cancer initiation or early progression. Unexpectedly, alterations—genetic, epigenetic, gene expression, and protein—have been found in histologically normal breast tissue, but their biological significance and clinical utility are poorly understood (refs. 6–16; for review see ref. 17). They may mark increased risk, or be evidence of a field effect (e.g., due to an exposure or to an occult dysregulation of a gene or pathway). They could reveal breast cancer's earliest genomic changes, they could be random effects, or they could be a response to a tumor existing in the breast. Although challenging to address, a better understanding of the changes in histologically normal tissue should provide insight into breast cancer risk, initiation, and early progression.
Recently, we found that gene expression in histologically normal breast epithelium (NlEpi) in women with breast cancers could be distinguished from NlEpi gene expression in women without breast cancer who were undergoing reduction mammoplasty (11, 18), and that NlEpi gene expression in women at high risk of breast cancer undergoing prophylactic mastectomy resembles NlEpi gene expression in women with breast cancer (18). This suggests that aberrant NlEpi gene expression may indicate increased breast cancer risk by showing either the cancer's earliest genomic changes, or the influence of the microenvironment on the epithelium. These findings led us to hypothesize that specific gene expression profiles might characterize NlEpi associated with particular subtypes. To test this hypothesis, the goals of this study were, first, to compare gene expression in NlEpi in breasts containing 2 cancer subtypes, ER+ and ER−, and, second, to ask whether these gene expression profiles mirrored the gene expression profiles in ER+ and ER− breast cancers from independent cases. If this were true, then NlEpi expression could help define a risk signature for specific cancer subtypes, identify genomic features distinguishing the subtypes early in cancer development, and suggest new targets for subtype-specific prevention and therapy, which is of particular importance for ER− cancers.
Materials and Methods
Case selection
All cases were obtained using an IRB-approved protocol for collection of de-identified breast tissue not required for diagnosis. For microarray analysis, cases were randomly selected from women with ER+ or ER− tumors undergoing cancer surgery. No patient had undergone chemotherapy or radiation treatment prior to tissue acquisition. Each ER+ case was age matched (within 2 years) to an ER− case, to account for effects of age on gene expression (refs. 19–21; Table 1). Seventeen of 30 cases had been used in other studies (11, 18). For quantitative real-time PCR (qPCR) validation of the array data, an independent set of 16 cases (9 ER+ and 7 ER−) was selected using the same criteria as described earlier in the text (Supplementary Table S1).
Patients with ER+ tumors (n = 15) | | | | | Patients with ER− tumors (n = 15) | | | | |
---|---|---|---|---|---|---|---|---|---|
Sample | Age (y) | NlEpi distance from tumor (cm) | ER/PR/HER2c | Staged | Sample | Age (y) | NlEpi distance from tumor (cm) | ER/PR/HER2c | Staged |
351Ha | 35 | <2 | +/+/− | IIA | 319Hb | 34 | <2 | −/−/− | IIIA |
297BHa | 46 | <2 | +/+/− | IIIC | 364Hb | 47 | >2 | −/−/+ | IIIA |
359Ha | 48 | <2 | +/+/NA | I | 289Hb | 47 | <2 | −/−/− | IIA |
304BHa,b | 49 | <2 | +/+/+ | I | 342Hb | 48 | <2 | −/−/− | IIA |
248Ha | 49 | >2 | +/+/− | I | 273Hb | 49 | <2 | −/−/+ | III |
345H | 58 | >2 | +/+/− | IIB | 272Hb | 58 | >2 | −/−/+ | IIIC |
388AHb | 58 | <2 | +/−/− | I | 322Hb | 58 | <2 | −/−/− | I |
232Hb | 59 | >2 | +/+/NA | 0 | 226Ha,b | 61 | <2 | −/−/+ | IIIA |
258Ha | 65 | <2 | +/+/− | IIA | 452H | 63 | <2 | −/−/− | I |
405H | 67 | >2 | +/+/− | III | 298H | 67 | <2 | −/−/− | IIB |
311H | 69 | >2 | +/+/− | IIA | 429H | 69 | <2 | −/−/− | IIA |
453H | 76 | >2 | +/−/− | II | 333Hb | 76 | >2 | −/−/+ | IIB |
394H | 81 | >2 | +/+/− | IIA | 413H | 79 | <2 | −/−/− | I |
275H | 85 | <2 | +/+/− | IIIA | 296H | 82 | <2 | −/−/− | I |
228H | 92 | <2 | +/−/− | IIB | 401H | 94 | <2 | −/−/− | IIIA |
RNA extraction and microarray hybridization
Tissue preparation, microdissection, RNA extraction, amplification, hybridization, and normalization were completed as described previously (11, 18). Briefly, tissues were snap frozen, embedded in optimal cutting temperature embedding medium, sectioned at 10 μm, and stained with hematoxylin and eosin (50% diluted with H2O); histologically normal epithelium, both terminal ductal-lobular units (TDLU) and ducts, was then identified and microdissected (Supplementary Fig. S1). Most NlEpi samples (n = 20) were “tumor-adjacent” (i.e., located 1–2 cm from the tumor, on blocks lacking malignant cells). Some NlEpi samples (n = 10) lay further away, but still in the same quadrant as the tumor, because most surgeries were lumpectomies. Great care was taken to avoid microdissecting any proliferative cells, even simple hyperplastic lesions. Most NlEpi samples consisted only of TDLUs, but approximately 20% of samples (mainly from older patients) contained some ducts as well. RNA was extracted using the PicoPure extraction kit (Molecular Devices). For cases undergoing microarray analysis, the RNA was amplified, and gene expression was measured using the Affymetrix U133A chip (Affymetrix), a technique that yields reliable and reproducible results (22). The .cel files were processed with MAS 5.0 using standard procedures for quality control, and normalization was limited to rescaling each sample to a mean intensity of 200. The microarray data from these samples are available from the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo) under accession GSE21947.
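The rescaling step described above can be illustrated with a minimal Python sketch. It assumes a plain arithmetic mean (the vendor software may trim extreme values before averaging), and the function names and signal values are hypothetical, chosen only for illustration.

```python
from statistics import mean

TARGET_INTENSITY = 200.0  # target mean intensity stated in the text


def scaling_factor(signals, target=TARGET_INTENSITY):
    """Factor that rescales one array so that its mean signal equals `target`.

    Assumption: a plain arithmetic mean is used here for brevity.
    """
    return target / mean(signals)


def rescale(signals, target=TARGET_INTENSITY):
    """Multiply every probe signal on one array by that array's scaling factor."""
    factor = scaling_factor(signals, target)
    return [s * factor for s in signals]


# Hypothetical raw signals for a single array (illustrative numbers only).
raw = [50.0, 100.0, 150.0]
scaled = rescale(raw)
print(scaling_factor(raw), scaled)
```

A large scaling factor indicates a dim array, which is why the factor itself can serve as a per-sample quality measure.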
Identification of differentially expressed probes
Probes with less than 20% detectable hybridization and samples with a scaling factor greater than 10 were removed. Gene expression data of the probes that passed the quality control filters were analyzed using the method known as Bayesian Analysis of Differential Gene Expression (BADGE; ref. 23) to identify probes that were differentially expressed in NlEpi of patients with ER+ compared with ER− cancers. We have used this approach previously (18, 24). BADGE uses a model-averaging approach to identify probes with different expression in 2 biological conditions and scores the evidence of differential expression by the probability that the fold change of expression is greater than 1 or less than 1. The probability score is then calculated as 1 minus this probability, so that the smaller the probability score, the stronger the evidence of differential expression. The method has very high sensitivity but low specificity with small sample sizes; with 20 samples per group the sensitivity to detect a fold change of 2 or more can be 100% but the specificity can be less than 70% (23). Therefore, to reduce the chance of false positives, we used an extrinsic leave-one-out cross-validation implemented in BADGE to select those probes that showed robust changes of expression between groups (25). Leave-one-out cross-validation consisted of removing one case at a time from the data set and using the remaining samples to detect the probes with differential expression with a false discovery rate (FDR) less than 5%, chosen to trade off sensitivity and specificity. Probes selected 80% or more of the time were included in the final list of differentially expressed genes, and the final probability scores and fold changes are based on all samples (Table 2). Heatmaps were generated using the package Heatplus from Bioconductor, and simple hierarchical clustering was used to cluster samples based on their expression profiles.
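The leave-one-out selection rule (keep a probe only if it is selected in at least 80% of the rounds) can be sketched as follows. This is not the BADGE implementation: the per-round selector here is a simple fold-change threshold used as a stand-in for the Bayesian score with FDR < 5%, and all function names and intensity values are hypothetical.

```python
from statistics import mean


def fold_change_selector(group_a, group_b, min_fold=2.0):
    """Stand-in selector (NOT BADGE): probes whose ratio of group means is
    at least `min_fold` in either direction. Each sample is a list of probe
    intensities, so group_a[i][p] is probe p in sample i."""
    selected = set()
    for p in range(len(group_a[0])):
        ratio = mean(s[p] for s in group_a) / mean(s[p] for s in group_b)
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            selected.add(p)
    return selected


def leave_one_out_stable_probes(group_a, group_b, selector, min_fraction=0.8):
    """Return probes chosen by `selector` in at least `min_fraction` of the
    rounds, where each round removes exactly one sample from the data set."""
    n_probes = len(group_a[0])
    counts = [0] * n_probes
    rounds = 0
    for i in range(len(group_a)):  # leave out one sample from group A
        for p in selector(group_a[:i] + group_a[i + 1:], group_b):
            counts[p] += 1
        rounds += 1
    for j in range(len(group_b)):  # leave out one sample from group B
        for p in selector(group_a, group_b[:j] + group_b[j + 1:]):
            counts[p] += 1
        rounds += 1
    return [p for p in range(n_probes) if counts[p] / rounds >= min_fraction]


# Hypothetical intensities: 4 samples per group, 3 probes per sample.
er_pos = [[400.0, 100.0, 350.0], [420.0, 110.0, 350.0],
          [380.0, 90.0, 50.0], [410.0, 100.0, 50.0]]
er_neg = [[100.0, 100.0, 100.0], [110.0, 105.0, 100.0],
          [90.0, 95.0, 100.0], [100.0, 100.0, 100.0]]

print(fold_change_selector(er_pos, er_neg))
print(leave_one_out_stable_probes(er_pos, er_neg, fold_change_selector))
```

In this toy data set the third probe passes the threshold on the full data but depends on two high samples, so it is selected in only 6 of 8 rounds and is dropped, which is the kind of unstable call the cross-validation is meant to filter out.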
Gene | ER+:ER− foldb | Gene | ER+:ER− foldb | Gene | ER+:ER− foldb | Gene | ER+:ER− foldb |
---|---|---|---|---|---|---|---|
UGT2B28 | 15.65 | BUB3 | 2.28 | CUL4B | 2.08 | ANPEP | 0.40 |
CXCL13 | 11.76 | SFRS1 | 2.28 | ADCY1 | 2.08 | IGHG1 | 0.40 |
GRIA2 | 5.52 | CTSO | 2.27 | ATP9A | 2.07 | PTGDS | 0.40 |
TNFSF11 | 4.41 | SAV1 | 2.27 | ZNF451 | 2.07 | SAT1 | 0.40 |
TFPI2 | 4.22 | ALOX15B | 2.27 | YWHAZ | 2.07 | KRT23 | 0.40 |
EREG | 4.20 | FUSIP1 | 2.26 | WAC | 2.07 | COL6A1 | 0.39 |
SLC26A3 | 4.18 | SHQ1 | 2.26 | TM9SF2 | 2.04 | S100A4 | 0.39 |
TSPAN8 | 4.13 | CCDC47 | 2.25 | CD74 | 0.49 | RARRES1 | 0.39 |
CYP24A1 | 4.11 | SERHL2 | 2.24 | LSR | 0.49 | DEFB1 | 0.39 |
CPB1 | 4.06 | TCP1 | 2.24 | FKBP11 | 0.48 | CD14 | 0.38 |
NAIP | 4.05 | COIL | 2.23 | C1QA | 0.48 | S100A11 | 0.38 |
MYBPC1 | 3.92 | NA | 2.23 | SNRPD2 | 0.48 | PTGDS | 0.38 |
STC1 | 3.87 | PI15 | 2.22 | IFITM1 | 0.47 | CX3CL1 | 0.37 |
BMP5 | 3.71 | LONP2 | 2.21 | S100A11 | 0.47 | GSTA1 | 0.37 |
GPR144 | 3.49 | ZBTB44 | 2.21 | BCL3 | 0.47 | SH3BGRL3 | 0.37 |
CLCA2 | 3.41 | XRCC5 | 2.21 | STAG2 | 0.47 | PSPH | 0.37 |
HPGD | 3.04 | FYCO1 | 2.20 | RPL13 | 0.47 | TGFB2 | 0.37 |
NEDD4L | 2.94 | ZFC3H1 | 2.20 | GMFG | 0.47 | MAF | 0.36 |
SERHL | 2.93 | ERBB4 | 2.19 | APOC1 | 0.47 | PLTP | 0.36 |
HMGCS2 | 2.90 | GOLGA8B | 2.19 | GBP1 | 0.47 | G0S2 | 0.35 |
STK32B | 2.86 | FRY | 2.19 | CITED2 | 0.47 | HSPG2 | 0.35 |
TRIB2 | 2.81 | GATA3 | 2.19 | LTB | 0.46 | PI3 | 0.35 |
BAT1 | 2.80 | ERMAP | 2.18 | TNFAIP2 | 0.46 | RARRES1 | 0.34 |
C14ORF132 | 2.63 | NSF | 2.18 | PDZK1IP1 | 0.46 | RPS26 | 0.33 |
ZNF167 | 2.60 | SIDT1 | 2.17 | H2AFX | 0.46 | S100A6 | 0.33 |
GYG2 | 2.59 | RAB3GAP2 | 2.17 | ATP5E | 0.46 | NA | 0.33 |
KCTD7 | 2.55 | GFRA1 | 2.16 | IGKC | 0.46 | NOV | 0.32 |
PNMAL1 | 2.54 | GPRC5B | 2.16 | CRABP2 | 0.45 | VGLL1 | 0.31 |
TTLL5 | 2.52 | ISG20L2 | 2.16 | NA | 0.45 | THBS1 | 0.31 |
PDZK1 | 2.52 | BLCAP | 2.16 | CIITA | 0.45 | IGHA2 | 0.30 |
DIO2 | 2.50 | GNB1 | 2.16 | TRBC1 | 0.45 | CCL19 | 0.30 |
GPLD1 | 2.50 | ABAT | 2.15 | TRAF3IP3 | 0.45 | IGKC | 0.30 |
LYST | 2.48 | STC2 | 2.15 | CX3CL1 | 0.45 | LOC100293440 | 0.29 |
HSP90AB1 | 2.45 | IL20RA | 2.15 | RPL36 | 0.45 | IGLc | 0.29 |
FGFR3 | 2.44 | TTC19 | 2.15 | CD24 | 0.45 | IGKC | 0.29 |
LANCL1 | 2.43 | TCF7L2 | 2.15 | SPP1 | 0.44 | CCL5 | 0.28 |
NA | 2.41 | CEP68 | 2.15 | CCL8 | 0.44 | IGLc | 0.27 |
MAP4K5 | 2.41 | CPD | 2.14 | LTF | 0.44 | UBD | 0.27 |
AKR1D1 | 2.40 | MME | 2.14 | FOSL2 | 0.44 | NA | 0.25 |
PEX11B | 2.40 | YTHDF1 | 2.13 | LOC91316 | 0.44 | IGLc | 0.25 |
SERINC3 | 2.36 | KLF7 | 2.13 | C10ORF81 | 0.43 | MUC7 | 0.22 |
FAM188A | 2.33 | AHNAK | 2.12 | FABP4 | 0.43 | LOC100291464 | 0.21 |
PSMB1 | 2.33 | NA | 2.12 | JUNB | 0.43 | LYZ | 0.21 |
TBC1D8B | 2.32 | MTUS1 | 2.12 | FABP7 | 0.43 | MGC29506 | 0.19 |
TRIM37 | 2.32 | ADRA2A | 2.11 | RPL28 | 0.43 | S100A8 | 0.19 |
HEY1 | 2.32 | TRIM36 | 2.11 | CXCL14 | 0.42 | IGLc | 0.18 |
MSMB | 2.31 | PTPRZ1 | 2.11 | LIF | 0.41 | CCL2 | 0.18 |
RAB3GAP1 | 2.31 | MCCC2 | 2.11 | ID2 | 0.41 | S100A9 | 0.18 |
NPY2R | 2.30 | ESR1 | 2.10 | C4BPA | 0.41 | CD52 | 0.17 |
AXL | 2.30 | ITPR1 | 2.10 | TFEB | 0.41 | IGKC | 0.17 |
SPRY1 | 2.30 | ADAM12 | 2.10 | AIF1 | 0.40 | IGLL3 | 0.13 |
ABLIM1 | 2.30 | PIP4K2A | 2.09 | CCL3 | 0.40 | IGHM | 0.11 |
STC1 | 2.30 | RYBP | 2.09 | TOP2A | 0.40 | IGLc | 0.11 |
RAB28 | 2.29 | POGK | 2.08 | STAB1 | 0.40 | CSN2 | 0.09 |
aProbability scores calculated in BADGE ranged from 4.45E-8 to 1.81E-2.
bFold change calculated in BADGE (see Materials and Methods).
cCannot distinguish between multiple gene family members.
Abbreviation: NA, not available.
Validation of microarray data via quantitative real-time PCR
Seven genes were selected for qPCR validation of gene expression data (ABLIM, AHNAK, CRABP2, CX3CL1, CXCL14, ESR1, and NEDD4L) on the basis of consistent expression among cases within ER+ and ER− groups, mean probe expression ≥ 200, more than 2-fold difference in expression between the groups, a strongly significant P value, and biological relevance to cancer. Several of the genes have been implicated in breast cancer–associated pathways (e.g., ESR1 encodes the estrogen receptor; CRABP2 is involved in the retinoic acid pathway; CX3CL1 and CXCL14 are involved in the immune response). An endogenous control gene, CPSF6, was selected on the basis of consistent expression between groups and a probe expression greater than 200. For each gene, we selected intron-spanning TaqMan assays (ABI) that overlapped with the Affymetrix probe target on the U133A chip and that generated amplicons of less than 110 nucleotides.
For each qPCR, 2 ng unamplified RNA was reverse transcribed using random hexamers with the TaqMan Multiscript RT reagent kit (ABI). For each sample, the dCT value for each test gene was calculated by subtracting the CT value of the reference gene, CPSF6, from the CT value of the test gene. For both the NlEpiER+ and NlEpiER− groups, each test gene's mean dCT value and SEM were calculated using a 1-tailed 2-sample t test (from the data analysis package in Microsoft Excel). To assess each test gene in individual samples, the dCT value for individual NlEpiER− samples was compared with the mean dCT value for the NlEpiER+ samples, and the dCT values for individual NlEpiER+ samples were compared with the mean dCT value for the NlEpiER− samples. For each comparison, we defined validation as expression in the direction predicted by microarray analysis. These individual assessments generated the rates of validation described in the text.
For graphical presentation of quantified and summarized qPCR results, mean ddCT values for each gene were calculated by subtracting the mean dCT value of the reference group (NlEpiER+, as defined in the microarray analysis) from the mean dCT value of the test group (NlEpiER−). Data were plotted as 2^(−ddCT) (mean fold change) using GraphPad Prism software. To determine whether dCT values differed between groups, we used a 1-tailed 2-sample t test assuming equal variances to compare dCT values for individual samples between groups. Significantly different dCT values (P < 0.05) are denoted by an asterisk.
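The dCT and ddCT arithmetic described above can be written out as a short Python sketch. The CT values are hypothetical and the function names are illustrative; only the formulas (dCT = CT of test gene minus CT of CPSF6; ddCT = mean dCT of NlEpiER− minus mean dCT of NlEpiER+; fold change = 2^(−ddCT)) come from the text.

```python
from statistics import mean


def delta_ct(ct_test_gene, ct_reference_gene):
    """dCT for one sample: CT of the test gene minus CT of the reference gene."""
    return ct_test_gene - ct_reference_gene


def mean_fold_change(dct_test_group, dct_reference_group):
    """2^(-ddCT), where ddCT = mean dCT(test group) - mean dCT(reference group)."""
    ddct = mean(dct_test_group) - mean(dct_reference_group)
    return 2.0 ** (-ddct)


def sample_validates(sample_dct, other_group_mean_dct, expected_higher_in_sample_group):
    """Per-sample check: is expression in the direction the microarray predicted?
    A lower dCT means higher expression."""
    if expected_higher_in_sample_group:
        return sample_dct < other_group_mean_dct
    return sample_dct > other_group_mean_dct


# Hypothetical CT values as (test gene, CPSF6) pairs; illustrative only.
er_pos_ct = [(26.0, 24.0), (27.0, 25.0)]  # reference group, NlEpiER+
er_neg_ct = [(28.0, 25.0), (29.0, 26.0)]  # test group, NlEpiER-

dct_er_pos = [delta_ct(t, r) for t, r in er_pos_ct]
dct_er_neg = [delta_ct(t, r) for t, r in er_neg_ct]

fold = mean_fold_change(dct_er_neg, dct_er_pos)
print(dct_er_pos, dct_er_neg, fold)
```

With these numbers the gene is one cycle later in the ER− group after normalization, so its expression there is half that of the ER+ group.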
Annotation and analysis of differentially expressed genes
The list of differentially expressed genes was compared with published data sets and was uploaded into DAVID (http://david.abcc.ncifcrf.gov/home.jsp) and analyzed with the Functional Annotation Enrichment Analysis to determine overrepresented GO terms and PANTHER functions. Ingenuity Pathway Analysis (www.ingenuity.com) was also used to identify biological functions, canonical pathways, and functional gene classification terms.
Comparison of NlEpi gene expression to existing breast cancer gene expression data sets
To compare gene expression in NlEpi with invasive breast cancer, we used a publicly available breast cancer gene expression data set (26). The data set was downloaded from NCBI's GEO database with accession number GSE3494. To be compatible with this data set, we also recalculated the signal intensity from the 30 NlEpi samples' .cel files using the RMA algorithm (27). To investigate how previously reported ER-related genes are expressed in NlEpi, we retrieved 3 lists of genes that distinguish ER+ from ER− breast cancers (28–30). These gene lists were mapped to our own expression data using official gene symbol as a common ID. When multiple probes in our gene expression data set corresponded to the same gene ID in the published gene lists, we used only the probe with highest average expression level across all samples. Differentially expressed genes were selected using a simple t test with a correction for multiple hypotheses testing (FDR).
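Two steps in this comparison lend themselves to a brief sketch: keeping, for each gene, the probe with the highest average expression, and adjusting t-test P values for multiple testing. The text does not name the FDR procedure; the Benjamini-Hochberg adjustment below is an assumption, and the probe identifiers, expression values, and P values are hypothetical.

```python
from statistics import mean


def collapse_to_best_probe(expression, probe_to_gene):
    """For each gene, keep the probe with the highest mean expression
    across all samples. `expression` maps probe ID -> list of sample values."""
    best = {}
    for probe, values in expression.items():
        gene = probe_to_gene[probe]
        average = mean(values)
        if gene not in best or average > best[gene][1]:
            best[gene] = (probe, average)
    return {gene: probe for gene, (probe, _) in best.items()}


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order.
    Assumption: the source only says 'FDR', not which procedure was used."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical probe IDs and expression values (illustrative only).
expression = {
    "probe_1": [100.0, 120.0],
    "probe_2": [300.0, 340.0],
    "probe_3": [80.0, 90.0],
}
probe_to_gene = {"probe_1": "ESR1", "probe_2": "ESR1", "probe_3": "GATA3"}

chosen = collapse_to_best_probe(expression, probe_to_gene)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(chosen, q_values)
```

Genes would then be called differentially expressed when their adjusted value falls below the chosen cut-off (0.05 in the comparisons reported here).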
Results
Identification of genes that are differentially expressed between NlEpiER+ compared with NlEpiER−
Cases were selected for microarray analysis on the basis of tumor immunohistochemical staining for ER. Then, 15 ER+ cases were each age matched to an ER− case, to minimize age-related differences in gene expression (19–21). On the basis of sample availability, individual cases could not be directly matched for ethnicity, cancer stage, or distance of the NlEpi sample from the tumor. Table 1 presents clinical pathologic information for these 30 cases. NlEpi was microdissected from each case. RNA was isolated, amplified, and used for microarray analysis (see Materials and Methods).
Using BADGE (23) and then applying an extrinsic leave-one-out 80% cross-validation, we identified 216 probes, reflecting 198 genes, that were significantly differentially expressed between NlEpiER+ compared with NlEpiER−. These are presented in Table 2 (by relative fold change) and in Supplementary Table S2 (alphabetically). Of the 216 probes, 115 (53.2%), corresponding to 111 genes, had a higher gene expression in NlEpiER+ samples, and 101 (46.8%), corresponding to 87 genes, had a higher gene expression in NlEpiER− samples. Six genes were represented by multiple probes and 5 probes represented unidentified targets. As expected, and as shown in Figure 1, unsupervised clustering analysis utilizing the 216 probes separated the samples based on ER status.
For validation of these expression data, we used qPCR, selecting primers for 7 of the differentially expressed genes (ABLIM, AHNAK, CRABP2, CX3CL1, CXCL14, ESR1, and NEDD4L), and 1 endogenous control gene (CPSF6). We first tested the microarray data on unamplified RNA that remained from 15 of the 30 cases used for the microarray analysis (7 ER+ and 8 ER−), in a technical validation of the microarray data. We examined the 7 genes' expression in 6 to 8 of these samples. Overall, 43 of 51 (84%) reactions validated the microarray data, with 5 of the 7 genes confirming the microarray data in every sample tested (CRABP2, CX3CL1, CXCL14, ESR1, and NEDD4L, 100% of PCRs validated), and 2 of the 7 genes not confirming the microarray (AHNAK and ABLIM, 43% of PCRs validated). When we calculated mean fold change in expression for each gene between groups, we found that the mean fold change was in the expected direction for 5 of the genes (CRABP2, ESR1, NEDD4L, CX3CL1, and CXCL14) as shown in Figure 2A.
Testing the gene expression profile in independent cases of NlEpi from breasts with ER+ and ER− breast cancers
Next, we tested the gene expression data on NlEpi from an independent set of breast cancer cases. For this prospective validation, we used the same techniques to obtain RNA from NlEpi of 16 independent cases (9 ER+ and 7 ER−) with similar clinical pathologic features as the 30 cases used for microarray analysis (Supplementary Table S1). We then tested these 16 cases with each of the 7 qPCR primers. Overall, 82 of 106 (77%) reactions validated the microarray data. When we calculated mean fold change in expression for each gene between groups, we found that 3 of the 7 genes (ABLIM, CX3CL1, and CXCL14) confirmed the microarray results. Mean fold changes were in the expected direction for another 3 genes (AHNAK, CRABP2, and ESR1), but did not reach significance, perhaps due to the small number of cases. The nonvalidating PCRs were distributed evenly across all 16 cases. These results are shown in Figure 2B.
Annotation of the ER+ versus ER− NlEpi gene expression profile
The 216-probe list was analyzed with DAVID and Ingenuity, which together identified functional categories implicated in carcinogenesis, including cell adhesion, motility, transcription, cell cycle, immune response, and hormonal activity and regulation (Supplementary Table S3; Supplementary Fig. S2). The 198 unique genes (Table 2; Supplementary Table S2) included genes known to be involved in ER function or characteristic of ER+ tumors. These genes were overexpressed in NlEpiER+ compared with NlEpiER−. Examples include ESR1 itself (28, 31, 32), ABAT (33), GATA3 (28, 32, 34–36), GFRA1 (33), PDZK1 (33), STC2 (28, 33), and ERBB4 (29). Similarly, genes characteristic of ER− cancers were relatively overexpressed in NlEpiER− compared with NlEpiER+. Examples include CX3CL1 (33), FABP7 (33), GBP1 (37), KRT23 (38), RARRES1 (33), S100A8 (33), and THBS1 (28). In addition, the 198 genes included family members of genes implicated in breast cancer, such as multiple ribosomal-related proteins and S100 calcium binding proteins, which were all overexpressed in NlEpiER−, and YWHAZ, an antiapoptotic protein associated with anthracycline resistance (39), which was overexpressed in NlEpiER+.
We noted that expression of multiple immune-related genes was increased in NlEpi of ER− cancers. At least 18 of the 87 (21%) genes with higher expression in NlEpi of ER− cancers are immune related (e.g., CXCL1, multiple CCLs, multiple Ig genes). To examine whether these genes were expressed in the epithelium or were due to an increase in the number of infiltrating lymphocytes, we performed immunohistochemistry for p63 (a myoepithelial cell marker) and LCA (a pan-leucocyte marker) on adjacent sections of a representative subgroup of cases (7 ER+ and 11 ER−). The number of lymphocytes (LCA+ cells with characteristic morphology) per TDLU was determined by a pathologist blinded to ER status. No difference in the number of lymphocytes per TDLU was seen between the ER+ and ER− cases (Supplementary Table S4; Supplementary Fig. S3). Thus, increased expression of immune-related genes in NlEpiER− compared with NlEpiER+ is not due to quantitative differences in lymphocyte infiltration.
Gene set enrichment analysis (GSEA) of the whole NlEpi expression data set identified overexpression of keratin genes in the NlEpiER− group (FDR < 0.25). The keratin genes in the gene set included markers of basal breast cancers (KRT17, KRT15, KRT13, KRT6B, KRT5; ref. 40) and also of luminal breast cancers (KRT8, KRT18). Two other gene sets of interest that were differentially expressed (although with higher FDR) included ribosomal genes, which were also overexpressed in the NlEpiER− group, and genes defining ER, which were overexpressed in the NlEpiER+ group. These results are depicted in Supplementary Figures S4–S7.
Comparison of NlEpi gene expression to invasive breast cancer gene expression
We wished to compare gene expression in NlEpi to gene expression in invasive breast cancers. We first compared the expression pattern of the 216 probes in our data set to their expression pattern in a publicly available breast cancer gene expression data set. We used the data set of Miller et al. (26) because ER status is provided and it is based on the same platform (Affymetrix U133A) as this study. Figure 3 shows side by side the expression pattern of the 216 probes in our 30 NlEpi samples and the 247 breast cancers. We find that 115 of the 216 (53%) probes, representing 103 genes, are differentially expressed (FDR < 0.05) in ER+ compared with ER− breast tumors. Even though approximately half of the 216 probes are not significantly different between ER+ and ER− breast tumors, Figure 3 shows that the overall pattern is quite similar for most of the 216 probes between NlEpi and tumors.
Next, we wished to investigate how genes previously reported to distinguish ER+ from ER− cancers are expressed in NlEpi. Therefore, we retrieved 3 gene lists reported previously to be differentially expressed in ER+ versus ER− breast cancers, and analyzed the genes' expression in our NlEpi data set (28–30). We found that 25% to 31% of the genes that distinguish ER+ from ER− cancers are also differentially expressed in NlEpiER+ compared with NlEpiER−. These results are shown in Figure 4. Specifically, in the Tozlu gene list (29), 11 of 35 (31%) genes overexpressed in ER+ cancers are also overexpressed in NlEpiER+ (FDR < 0.05). These genes include ESR1, GATA3, BCL2, MYB, AR, STC2, and ERBB4. In the Gruvberger gene list (28), 9 of the 36 (25%) genes distinguishing ER+ from ER− cancers are also differentially expressed (FDR < 0.05) in NlEpiER+ compared with NlEpiER−. These include ESR1, GATA3, STC2 (up in both NlEpiER+ and ER+ cancers), and CDH3, EGFR, S100A8 (up in both NlEpiER− and ER− cancers). In the van't Veer gene lists (30), 76 of the 248 (31%) genes whose expression distinguishes ER+ from ER− breast cancers are also differentially expressed (FDR < 0.05) in NlEpiER+ compared with NlEpiER−. These genes include classic components of the luminal A signature, ESR1 and GATA3, as well as ERBB4, BCL2, and MYB, which are all overexpressed in NlEpiER+; and genes marking basal tumors, CDH3, CX3CL1, FABP7, and KRT23, which are all overexpressed in NlEpiER−.
In each comparison between NlEpi gene expression and an external data set, many genes were not differentially expressed, and a small number were differentially expressed in the “wrong” direction. The proportion of these “wrongly directed” genes was always smaller than the proportion of “rightly directed” genes, and the proportion decreased as cancer sample size increased, suggesting that “wrongly directed” genes represent a combination of small sample size and microarray related artifacts. Specifically, in the Tozlu data set (29), there were 0 of 35 (0%) genes in the “wrong” direction compared with 11 of 35 (31%) in the “right” direction; in the Gruvberger data set (28), there were 5 of 36 (14%) compared with 9 of 36 (25%); in the van't Veer data set (30), there were 9 of 248 (4%) compared with 78 of 248 (31%); and in the Miller data set (26), there were 7 of 216 (3%) probes compared with 115 of 216 (53%).
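The proportions quoted above can be recomputed directly from the stated counts. The short sketch below does only that arithmetic; the variable and function names are illustrative, and the counts are those given in the preceding paragraph.

```python
# Counts as stated in the preceding paragraph:
# (genes or probes in the "wrong" direction, in the "right" direction, total examined)
comparisons = {
    "Tozlu (29)": (0, 11, 35),
    "Gruvberger (28)": (5, 9, 36),
    "van't Veer (30)": (9, 78, 248),
    "Miller (26)": (7, 115, 216),
}

def percent(count, total):
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)

summary = {}
for name, (wrong, right, total) in comparisons.items():
    summary[name] = (percent(wrong, total), percent(right, total))
    print(f"{name}: wrong {wrong}/{total} ({summary[name][0]}%), "
          f"right {right}/{total} ({summary[name][1]}%)")
```

In every comparison the "wrong" count is smaller than the "right" count, which is the pattern described in the text.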
In sum, we found overlap between NlEpi and the corresponding cancer subtype in 25% to 53% of the genes or probes examined. Because the NlEpi was microdissected, whereas the cancer gene expression data sets were derived from cancers that were not microdissected and thus contain heterogeneous cell types, the similarities between NlEpi and cancer gene expression may be underestimated.
Discussion
We investigated gene expression in histologically normal breast epithelium microdissected from breasts with either ER+ or ER− breast cancers. We found that gene expression in NlEpiER+ differs from gene expression in NlEpiER−, and that gene expression in each type of NlEpi resembles expression of the corresponding type of invasive breast cancer (i.e., ER+ or ER−).
There are several possible (and not mutually exclusive) explanations for our findings. One explanation is that the NlEpi gene expression profile reflects the influence of the extracellular environment. The extracellular environment unquestionably plays a crucial role in cancer development, and likely acts prior to tumor invasion (24, 41, 42). A coincident cancer could influence NlEpi gene expression, although our previous findings suggest that neither contamination nor paracrine effects of a cancer would explain aberrant NlEpi gene expression (18). An intriguing possibility is that particular women may be predisposed to develop particular tumor subtypes due to inherited susceptibility genes. Evidence supporting this explanation includes the finding that gene expression is significantly influenced by germline polymorphisms (43), and the observation that components of breast cancer's prognostic signatures are present in non-neoplastic tissue of susceptible animals (44). Alternatively, the NlEpi gene expression profile could reflect a field effect in some part of the breast (for review see ref. 17) that results in a predisposition for that area to transform into a particular cancer subtype. We do not know the size of the field because we have not comprehensively sampled distinct geographic regions of the breast. This model might be particularly relevant to women who are BRCA gene mutation carriers. Finally, combined with our previous finding that gene expression in NlEpi from women at high risk of breast cancer can resemble gene expression in NlEpi from cancer-containing breasts, the present findings suggest that characteristic features of breast cancer subgroups may be detectable prior to histologic abnormalities. The NlEpi gene expression profile could show some of the earliest genomic changes of ER+ and ER− cancers. These changes may be present before histologic abnormality, either in cancer-initiating cells or in cells that eventually develop into the tumor. 
The relationship between the NlEpi abnormalities and any intratumoral heterogeneity that eventually develops is unknown (45, 46).
Regardless of which explanation(s) is correct, analysis of NlEpi gene expression and comparison to 4 independent cancer data sets offer insight into events occurring early in carcinogenesis. In particular, NlEpi expression of ER and associated cofactors and downstream signals seems fundamental to the development of ER+ cancers. Conversely, expression of these genes is absent in NlEpi associated with ER− cancers. This observation is consistent with the substantial benefits of anti-estrogens in preventing only ER+ cancers (1, 47, 48). ER− tumors have no signature feature analogous to ER expression, and may be more heterogeneous than ER+ tumors (49). A consequence is that prevention and treatment are less successful for ER− tumors than for ER+ tumors. More generally, the overlap between genes and probes differentially expressed in NlEpi subtypes and in the corresponding cancer subtype is striking (25%–53% of the genes or probes examined). This observation suggests that histologically normal epithelium may be quite genomically active and that additional genome-wide approaches to investigate the NlEpi landscape could be highly promising.
The various analyses of NlEpi gene expression results suggested several directions that would be worth pursuing. Examination of the gene list itself suggested that expression of immune-related genes was generally increased in NlEpiER−, and that this increase was not due to an increase in the number of infiltrating lymphocytes. One possible explanation is that the lymphocytes infiltrating NlEpiER− differ from those infiltrating NlEpiER+ and generate a distinct signature. Another possible explanation may be found in recent reports of high expression of immunomodulatory genes in ER− breast cancers composed predominantly of epithelial cells and lacking lymphocytic infiltrates (49, 50). The immunomodulatory gene expression in this setting may be associated with good prognosis, and is consistent with data from cell lines (51). Furthermore, immunoglobulin genes themselves have been reported to be expressed in breast cancer cells and breast cancer cell lines (52–55). This area would be promising to investigate further.
GSEA of our expression data identified several gene sets that may differ between NlEpiER+ and NlEpiER−. ER-related genes were overexpressed in NlEpiER+, which seems to confirm the validity of our data. Ribosome-related genes and keratin genes were overexpressed in NlEpiER−. NlEpiER−'s overexpression of ribosomal genes may reflect increased cell metabolism and protein synthesis, or be related to lineage-specific cell fate (56). Overrepresentation of ribosomal genes has been reported in basal breast cancers (which are ER−; ref. 57) and may be related to increased cell proliferation or MYC activity, which has also been reported by others (58). The explanation for overexpression of keratin genes is not clear, but one could speculate that it suggests alterations to the cells' structural integrity, or even a response to signals from the extracellular environment. Unlike the BADGE-derived gene list and the DAVID analyses, GSEA did not identify immune function as an overrepresented category; this may be due to differences between the analytic approaches.
This study has limitations. One is its small sample size, which is due to the logistic and technical challenges of obtaining and investigating fresh tissue of untreated patients. To counterbalance this limitation, we adopted a study design suitable for small sample size: the cases in each group were tightly age matched (because age exerts a major influence on gene expression in the breast; refs. 19–21), the normal epithelium was microdissected away from other cells to enrich for a homogeneous cell population, we utilized a statistical approach appropriate for small sample size, and we validated the microarray results on independent NlEpi samples and compared them to external cancer data sets. The study's second limitation is the possibility that our NlEpi samples were contaminated by malignant cells. We made every effort to avoid this by having a breast pathologist (A. de las Morenas) review every 10th slide from each section and diagnose each area on it: both histologically normal epithelium and any abnormal areas (cancerous or not) were marked. Every effort was made to avoid any non-normal area. Thus, we think it is unlikely that the microdissected samples were contaminated to any substantial extent.
In conclusion, gene expression differs in NlEpi of breasts containing ER+ compared with ER− breast cancers. These gene expression differences reflect the gene expression differences in ER+ compared with ER− cancers. This finding implies that genomic changes characteristic of specific breast cancer subtypes may be detectable before histologic evidence of abnormalities. Future work could examine whether gene expression characteristic of the breast cancer intrinsic or molecular subtypes (31, 32, 59) is detectable in NlEpi, and whether NlEpi gene expression reflects gene expression in each subtype. This would be clinically relevant because each subtype has characteristic genomic (31, 32, 59), clinical, and pathologic features (30, 60). Therefore, normal epithelium gene expression profiles could help define a breast cancer subtype-specific risk signature, identify the initial genomic differences between subtypes, and suggest new targets for prevention and therapy, which is especially important for ER− cancers.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant Support
C.L. Rosenberg was supported by PHS CA115434, the Avon Foundation, and the LaPann funds. X. Ge is supported by NIH grant GM083226.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.