Purpose: Previously, we found that gene expression in histologically normal breast epithelium (NlEpi) from women at high breast cancer risk can resemble gene expression in NlEpi from cancer-containing breasts. Therefore, we hypothesized that gene expression characteristic of a cancer subtype might be seen in NlEpi of breasts containing that subtype.

Experimental Design: We examined gene expression in 46 cases of microdissected NlEpi from untreated women undergoing breast cancer surgery. From 30 age-matched cases [15 estrogen receptor (ER)+, 15 ER−] we used Affymetrix U133A arrays. From 16 independent cases (9 ER+, 7 ER−), we validated selected genes using quantitative real-time PCR (qPCR). We then compared gene expression between NlEpi and invasive breast cancer using four publicly available data sets.

Results: We identified 198 genes that are differentially expressed between NlEpi from breasts with ER+ (NlEpiER+) compared with ER− cancers (NlEpiER−). These include genes characteristic of ER+ cancers (e.g., ESR1 and GATA3) and of ER− cancers (e.g., CX3CL1 and FABP7). qPCR validated the microarray results in both the 30 original cases and the 16 independent cases. Gene expression in NlEpiER+ and NlEpiER− resembled gene expression in ER+ and ER− cancers, respectively: 25% to 53% of the genes or probes examined in four external data sets overlapped between NlEpi and the corresponding cancer subtype.

Conclusions: Gene expression differs in NlEpi of breasts containing ER+ compared with ER− breast cancers. These differences echo differences in ER+ and ER− invasive cancers. NlEpi gene expression may help elucidate subtype-specific risk signatures, identify early genomic events in cancer development, and locate targets for prevention and therapy. Clin Cancer Res; 17(2); 236–46. ©2010 AACR.

This article is featured in Highlights of This Issue, p. 201

Translational Relevance

We find that gene expression differs in histologically normal epithelium of breasts containing ER+ compared with ER− breast cancers. These differences reflect the gene expression differences in ER+ compared with ER− invasive cancers. Explanations of these findings include an effect of the extracellular environment, a field effect, predisposition to particular tumor subtypes due to inherited susceptibility genes, or detection of breast cancer subtype-specific genomic changes prior to histologic abnormality. Regardless of which explanation is correct, normal epithelium gene expression profiles could help define a breast cancer subtype-specific risk signature, identify the initial genomic differences between subtypes, and suggest new targets for prevention and therapy, which is especially important for ER− cancers.

Breast cancer is a heterogeneous disease. One well-documented and clinically important dichotomizing characteristic of breast cancers is expression (or not) of estrogen receptor–α (ER). ER expression is of major importance in breast cancer prevention, treatment, and outcome (1–3), and may even help define a cancer's cell of origin (4).

Many of the genomic aberrations present in each subtype of invasive breast cancer can also be detected in earlier lesions, such as carcinoma in situ and even hyperplastic lesions (for review see ref. 5), suggesting that these aberrations are important to breast cancer initiation or early progression. Unexpectedly, alterations (genetic, epigenetic, gene expression, and protein) have also been found in histologically normal breast tissue, but their biological significance and clinical utility are poorly understood (refs. 6–16; for review see ref. 17). They may mark increased risk or be evidence of a field effect (e.g., due to an exposure or to an occult dysregulation of a gene or pathway). They could reveal breast cancer's earliest genomic changes, they could be random effects, or they could be a response to a tumor existing in the breast. Although these questions are challenging to address, a better understanding of the changes in histologically normal tissue should provide insight into breast cancer risk, initiation, and early progression.

Recently, we found that gene expression in histologically normal breast epithelium (NlEpi) in women with breast cancers could be distinguished from NlEpi gene expression in women without breast cancer who were undergoing reduction mammoplasty (11, 18), and that NlEpi gene expression in women at high risk of breast cancer undergoing prophylactic mastectomy resembles NlEpi gene expression in women with breast cancer (18). This suggests that aberrant NlEpi gene expression may indicate increased breast cancer risk by reflecting either the cancer's earliest genomic changes or the influence of the microenvironment on the epithelium. These findings led us to hypothesize that specific gene expression profiles might characterize NlEpi associated with particular subtypes. To test this hypothesis, the goals of this study were, first, to compare gene expression in NlEpi in breasts containing 2 cancer subtypes, ER+ and ER−, and, second, to ask whether these gene expression profiles mirrored the gene expression profiles in ER+ and ER− breast cancers from independent cases. If this were true, then NlEpi expression could help define a risk signature for specific cancer subtypes, identify genomic features distinguishing the subtypes early in cancer development, and suggest new targets for subtype-specific prevention and therapy, which is of particular importance for ER− cancers.

Case selection

All cases were obtained using an IRB-approved protocol for collection of de-identified breast tissue not required for diagnosis. For microarray analysis, cases were randomly selected from women with ER+ or ER− tumors undergoing cancer surgery. No patient had undergone chemotherapy or radiation treatment prior to tissue acquisition. Each ER+ case was age matched (within 2 years) to an ER− case to account for effects of age on gene expression (refs. 19–21; Table 1). Seventeen of the 30 cases had been used in other studies (11, 18). For quantitative real-time PCR (qPCR) validation of the array data, an independent set of 16 cases (9 ER+ and 7 ER−) was selected using the same criteria (Supplementary Table S1).

Table 1.

Clinical pathologic characteristics of 30 breast cancer patients whose histologically normal breast epithelium (NlEpi) was analyzed by microarray

| ER+ case (n = 15) | Age (y) | NlEpi distance from tumor (cm) | ER/PR/HER2ᶜ | Stageᵈ | ER− case (n = 15) | Age (y) | NlEpi distance from tumor (cm) | ER/PR/HER2ᶜ | Stageᵈ |
|---|---|---|---|---|---|---|---|---|---|
| 351Hᵃ | 35 | <2 | +/+/− | IIA | 319Hᵇ | 34 | <2 | −/−/− | IIIA |
| 297BHᵃ | 46 | <2 | +/+/− | IIIC | 364Hᵇ | 47 | >2 | −/−/+ | IIIA |
| 359Hᵃ | 48 | <2 | +/+/NA | | 289Hᵇ | 47 | <2 | −/−/− | IIA |
| 304BHᵃ,ᵇ | 49 | <2 | +/+/+ | | 342Hᵇ | 48 | <2 | −/−/− | IIA |
| 248Hᵃ | 49 | >2 | +/+/− | | 273Hᵇ | 49 | <2 | −/−/+ | III |
| 345H | 58 | >2 | +/+/− | IIB | 272Hᵇ | 58 | >2 | −/−/+ | IIIC |
| 388AHᵇ | 58 | <2 | +/−/− | | 322Hᵇ | 58 | <2 | −/−/− | |
| 232Hᵇ | 59 | >2 | +/+/NA | | 226Hᵃ,ᵇ | 61 | <2 | −/−/+ | IIIA |
| 258Hᵃ | 65 | <2 | +/+/− | IIA | 452H | 63 | <2 | −/−/− | |
| 405H | 67 | >2 | +/+/− | III | 298H | 67 | <2 | −/−/− | IIB |
| 311H | 69 | >2 | +/+/− | IIA | 429H | 69 | <2 | −/−/− | IIA |
| 453H | 76 | >2 | +/−/− | II | 333Hᵇ | 76 | >2 | −/−/+ | IIB |
| 394H | 81 | >2 | +/+/− | IIA | 413H | 79 | <2 | −/−/− | |
| 275H | 85 | <2 | +/+/− | IIIA | 296H | 82 | <2 | −/−/− | |
| 228H | 92 | <2 | +/−/− | IIB | 401H | 94 | <2 | −/−/− | IIIA |

ᵃUsed in Tripathi et al. (11).

ᵇUsed in Graham et al. (18).

ᶜEstrogen receptor, progesterone receptor, and HER2 status of the coincident tumor.

ᵈStage according to AJCC staging criteria (61).

Abbreviation: NA, not available.

RNA extraction and microarray hybridization

Tissue preparation, microdissection, RNA extraction, amplification, hybridization, and normalization were performed as described previously (11, 18). Briefly, tissues were snap frozen, embedded in optimal cutting temperature embedding medium, sectioned at 10 μm, and stained with hematoxylin and eosin (diluted 50% with H2O); histologically normal epithelium, comprising both terminal ductal-lobular units (TDLU) and ducts, was then identified and microdissected (Supplementary Fig. S1). Most NlEpi samples (n = 20) were “tumor-adjacent” (i.e., located 1–2 cm from the tumor, on blocks lacking malignant cells). The remaining NlEpi samples (n = 10) lay farther away but, because most surgeries were lumpectomies, still in the same quadrant as the tumor. Great care was taken to avoid microdissecting any proliferative cells, even simple hyperplastic lesions. Most NlEpi samples consisted only of TDLUs, but approximately 20% of samples (mainly from older patients) also contained some ducts. RNA was extracted using the PicoPure extraction kit (Molecular Devices). For cases undergoing microarray analysis, the RNA was amplified, and gene expression was measured using the Affymetrix U133A chip (Affymetrix), a technique that yields reliable and reproducible results (22). The .cel files were processed with MAS 5.0 using standard quality-control procedures; normalization was limited to rescaling each sample to a mean intensity of 200. The microarray data from these samples are available from the NCBI Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo) under accession GSE21947.
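The per-array rescaling to a mean intensity of 200 can be sketched as follows. This is a minimal illustration under stated assumptions, not the MAS 5.0 code: the "mean intensity" is assumed to be a trimmed mean with 2% cut from each tail (modeled on the usual MAS 5.0 convention, not stated in the text), and the function names and simulated signals are hypothetical. The returned scaling factor is the quantity screened by the quality-control cutoff of 10.

```python
import numpy as np
from scipy.stats import trim_mean

TARGET_INTENSITY = 200.0   # target mean intensity stated in the text
MAX_SCALING_FACTOR = 10.0  # arrays above this cutoff were excluded at QC


def global_scale(signal, target=TARGET_INTENSITY, trim=0.02):
    """Rescale one array so that its trimmed mean signal equals `target`.

    `trim` is the fraction cut from EACH tail before averaging; 0.02 is an
    ASSUMPTION modeled on MAS 5.0, not a value reported in the study.
    Returns (rescaled_signal, scaling_factor).
    """
    signal = np.asarray(signal, dtype=float)
    scaling_factor = target / trim_mean(signal, trim)
    return signal * scaling_factor, scaling_factor


def passes_scaling_qc(scaling_factor, max_sf=MAX_SCALING_FACTOR):
    """True if the array's scaling factor does not exceed the QC cutoff."""
    return scaling_factor <= max_sf


# Hypothetical usage on simulated (not real) probe-set signals for one array.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=4.0, sigma=1.2, size=5000)
scaled, sf = global_scale(raw)
print(f"scaling factor = {sf:.2f}, passes QC: {passes_scaling_qc(sf)}")
```

A large scaling factor means the array's raw signal was unusually dim, which is why it serves as a simple per-array quality flag.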

Identification of differentially expressed probes

Probes with less than 20% detectable hybridization and samples with a scaling factor greater than 10 were removed. Expression data for the probes that passed these quality-control filters were analyzed using Bayesian Analysis of Differential Gene Expression (BADGE; ref. 23) to identify probes that were differentially expressed in NlEpi of patients with ER+ compared with ER− cancers. We have used this approach previously (18, 24). BADGE uses a model-averaging approach to identify probes with different expression in 2 biological conditions, and it scores the evidence of differential expression by the probability that the fold change in expression is greater than 1 (or less than 1). The probability score is then calculated as 1 minus this probability, so that the smaller the probability score, the stronger the evidence of differential expression. With small sample sizes, the method has very high sensitivity but low specificity: with 20 samples per group, the sensitivity to detect a fold change of 2 or more can be 100%, but the specificity can be less than 70% (23). Therefore, to reduce the chance of false positives, we used an extrinsic leave-one-out cross-validation implemented in BADGE to select probes that showed robust changes in expression between groups (25). Leave-one-out cross-validation consisted of removing one case at a time from the data set and using the remaining samples to detect probes with differential expression at a false discovery rate (FDR) below 5%, a threshold chosen to balance sensitivity and specificity. Probes selected in at least 80% of the iterations were included in the final list of differentially expressed genes; the final probability scores and fold changes are based on all samples (Table 2). Heatmaps were generated using the Bioconductor package HeatPlus, and simple hierarchical clustering was used to cluster samples based on their expression profiles.
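The leave-one-out selection rule can be illustrated with the sketch below. BADGE itself is a Bayesian model-averaging method and is not reimplemented here; as an explicitly swapped-in stand-in for each fold's BADGE call, the sketch uses a Welch t test with Benjamini-Hochberg FDR control. Only the leave-one-out loop, the 5% FDR, and the 80% selection-frequency threshold come from the text; the function names and simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Running minimum from the largest p-value down keeps q-values monotone.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def select_probes(X, is_er_pos, fdr=0.05):
    """STAND-IN for one BADGE run: boolean mask of probes with q < fdr.

    X is probes x samples; is_er_pos is a boolean vector over samples.
    """
    p = ttest_ind(X[:, is_er_pos], X[:, ~is_er_pos], axis=1, equal_var=False).pvalue
    return bh_fdr(p) < fdr


def loo_selection_frequency(X, is_er_pos, fdr=0.05):
    """Fraction of leave-one-out iterations in which each probe is selected."""
    n_samples = X.shape[1]
    counts = np.zeros(X.shape[0])
    for left_out in range(n_samples):
        keep = np.arange(n_samples) != left_out
        counts += select_probes(X[:, keep], is_er_pos[keep], fdr)
    return counts / n_samples


def robust_probes(X, is_er_pos, fdr=0.05, min_freq=0.8):
    """Indices of probes selected in at least `min_freq` of the iterations."""
    return np.flatnonzero(loo_selection_frequency(X, is_er_pos, fdr) >= min_freq)


# Simulated example: 200 probes, 15 + 15 samples, first 10 probes truly shifted.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
is_er_pos = np.array([True] * 15 + [False] * 15)
X[:10, is_er_pos] += 3.0
print(robust_probes(X, is_er_pos))
```

Probes whose evidence sits near the FDR cutoff, so that their call flips when any of several samples is removed, tend to fall below the 80% threshold; strongly shifted probes survive every fold.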

Table 2.

A total of 216 probes (identified by gene name) differentially expressed between histologically normal breast epithelium of breasts with ER+ compared with ER− cancersᵃ

| Gene | ER+:ER− foldᵇ | Gene | ER+:ER− foldᵇ | Gene | ER+:ER− foldᵇ | Gene | ER+:ER− foldᵇ |
|---|---|---|---|---|---|---|---|
| UGT2B28 | 15.65 | BUB3 | 2.28 | CUL4B | 2.08 | ANPEP | 0.40 |
| CXCL13 | 11.76 | SFRS1 | 2.28 | ADCY1 | 2.08 | IGHG1 | 0.40 |
| GRIA2 | 5.52 | CTSO | 2.27 | ATP9A | 2.07 | PTGDS | 0.40 |
| TNFSF11 | 4.41 | SAV1 | 2.27 | ZNF451 | 2.07 | SAT1 | 0.40 |
| TFPI2 | 4.22 | ALOX15B | 2.27 | YWHAZ | 2.07 | KRT23 | 0.40 |
| EREG | 4.20 | FUSIP1 | 2.26 | WAC | 2.07 | COL6A1 | 0.39 |
| SLC26A3 | 4.18 | SHQ1 | 2.26 | TM9SF2 | 2.04 | S100A4 | 0.39 |
| TSPAN8 | 4.13 | CCDC47 | 2.25 | CD74 | 0.49 | RARRES1 | 0.39 |
| CYP24A1 | 4.11 | SERHL2 | 2.24 | LSR | 0.49 | DEFB1 | 0.39 |
| CPB1 | 4.06 | TCP1 | 2.24 | FKBP11 | 0.48 | CD14 | 0.38 |
| NAIP | 4.05 | COIL | 2.23 | C1QA | 0.48 | S100A11 | 0.38 |
| MYBPC1 | 3.92 | NA | 2.23 | SNRPD2 | 0.48 | PTGDS | 0.38 |
| STC1 | 3.87 | PI15 | 2.22 | IFITM1 | 0.47 | CX3CL1 | 0.37 |
| BMP5 | 3.71 | LONP2 | 2.21 | S100A11 | 0.47 | GSTA1 | 0.37 |
| GPR144 | 3.49 | ZBTB44 | 2.21 | BCL3 | 0.47 | SH3BGRL3 | 0.37 |
| CLCA2 | 3.41 | XRCC5 | 2.21 | STAG2 | 0.47 | PSPH | 0.37 |
| HPGD | 3.04 | FYCO1 | 2.20 | RPL13 | 0.47 | TGFB2 | 0.37 |
| NEDD4L | 2.94 | ZFC3H1 | 2.20 | GMFG | 0.47 | MAF | 0.36 |
| SERHL | 2.93 | ERBB4 | 2.19 | APOC1 | 0.47 | PLTP | 0.36 |
| HMGCS2 | 2.90 | GOLGA8B | 2.19 | GBP1 | 0.47 | G0S2 | 0.35 |
| STK32B | 2.86 | FRY | 2.19 | CITED2 | 0.47 | HSPG2 | 0.35 |
| TRIB2 | 2.81 | GATA3 | 2.19 | LTB | 0.46 | PI3 | 0.35 |
| BAT1 | 2.80 | ERMAP | 2.18 | TNFAIP2 | 0.46 | RARRES1 | 0.34 |
| C14ORF132 | 2.63 | NSF | 2.18 | PDZK1IP1 | 0.46 | RPS26 | 0.33 |
| ZNF167 | 2.60 | SIDT1 | 2.17 | H2AFX | 0.46 | S100A6 | 0.33 |
| GYG2 | 2.59 | RAB3GAP2 | 2.17 | ATP5E | 0.46 | NA | 0.33 |
| KCTD7 | 2.55 | GFRA1 | 2.16 | IGKC | 0.46 | NOV | 0.32 |
| PNMAL1 | 2.54 | GPRC5B | 2.16 | CRABP2 | 0.45 | VGLL1 | 0.31 |
| TTLL5 | 2.52 | ISG20L2 | 2.16 | NA | 0.45 | THBS1 | 0.31 |
| PDZK1 | 2.52 | BLCAP | 2.16 | CIITA | 0.45 | IGHA2 | 0.30 |
| DIO2 | 2.50 | GNB1 | 2.16 | TRBC1 | 0.45 | CCL19 | 0.30 |
| GPLD1 | 2.50 | ABAT | 2.15 | TRAF3IP3 | 0.45 | IGKC | 0.30 |
| LYST | 2.48 | STC2 | 2.15 | CX3CL1 | 0.45 | LOC100293440 | 0.29 |
| HSP90AB1 | 2.45 | IL20RA | 2.15 | RPL36 | 0.45 | IGLᶜ | 0.29 |
| FGFR3 | 2.44 | TTC19 | 2.15 | CD24 | 0.45 | IGKC | 0.29 |
| LANCL1 | 2.43 | TCF7L2 | 2.15 | SPP1 | 0.44 | CCL5 | 0.28 |
| NA | 2.41 | CEP68 | 2.15 | CCL8 | 0.44 | IGLᶜ | 0.27 |
| MAP4K5 | 2.41 | CPD | 2.14 | LTF | 0.44 | UBD | 0.27 |
| AKR1D1 | 2.40 | MME | 2.14 | FOSL2 | 0.44 | NA | 0.25 |
| PEX11B | 2.40 | YTHDF1 | 2.13 | LOC91316 | 0.44 | IGLᶜ | 0.25 |
| SERINC3 | 2.36 | KLF7 | 2.13 | C10ORF81 | 0.43 | MUC7 | 0.22 |
| FAM188A | 2.33 | AHNAK | 2.12 | FABP4 | 0.43 | LOC100291464 | 0.21 |
| PSMB1 | 2.33 | NA | 2.12 | JUNB | 0.43 | LYZ | 0.21 |
| TBC1D8B | 2.32 | MTUS1 | 2.12 | FABP7 | 0.43 | MGC29506 | 0.19 |
| TRIM37 | 2.32 | ADRA2A | 2.11 | RPL28 | 0.43 | S100A8 | 0.19 |
| HEY1 | 2.32 | TRIM36 | 2.11 | CXCL14 | 0.42 | IGLᶜ | 0.18 |
| MSMB | 2.31 | PTPRZ1 | 2.11 | LIF | 0.41 | CCL2 | 0.18 |
| RAB3GAP1 | 2.31 | MCCC2 | 2.11 | ID2 | 0.41 | S100A9 | 0.18 |
| NPY2R | 2.30 | ESR1 | 2.10 | C4BPA | 0.41 | CD52 | 0.17 |
| AXL | 2.30 | ITPR1 | 2.10 | TFEB | 0.41 | IGKC | 0.17 |
| SPRY1 | 2.30 | ADAM12 | 2.10 | AIF1 | 0.40 | IGLL3 | 0.13 |
| ABLIM1 | 2.30 | PIP4K2A | 2.09 | CCL3 | 0.40 | IGHM | 0.11 |
| STC1 | 2.30 | RYBP | 2.09 | TOP2A | 0.40 | IGLᶜ | 0.11 |
| RAB28 | 2.29 | POGK | 2.08 | STAB1 | 0.40 | CSN2 | 0.09 |

ᵃProbability scores calculated in BADGE ranged from 4.45E-8 to 1.81E-2.

ᵇFold change calculated in BADGE (see Materials and Methods).

ᶜCannot distinguish between multiple gene family members.

Abbreviation: NA, not available.

Validation of microarray data via quantitative real-time PCR

Seven genes were selected for qPCR validation of the gene expression data (ABLIM, AHNAK, CRABP2, CX3CL1, CXCL14, ESR1, and NEDD4L) on the basis of consistent expression among cases within the ER+ and ER− groups, mean probe expression ≥ 200, a more than 2-fold difference in expression between the groups, a strongly significant P value, and biological relevance to cancer. Several of these genes have been implicated in breast cancer–associated pathways (e.g., ESR1 encodes the estrogen receptor; CRABP2 is involved in the retinoic acid pathway; CX3CL1 and CXCL14 are involved in the immune response). An endogenous control gene, CPSF6, was selected on the basis of consistent expression between groups and a mean probe expression greater than 200. For each gene, we selected intron-spanning TaqMan assays (ABI) that overlapped the Affymetrix probe target on the U133A chip and that generated amplicons shorter than 110 nucleotides.

For each qPCR, 2 ng of unamplified RNA was reverse transcribed with random hexamers using the TaqMan Multiscript RT reagent kit (ABI). For each sample, the dCT value of each test gene was calculated by subtracting the CT value of the reference gene, CPSF6, from the CT value of the test gene. For both the NlEpiER+ and NlEpiER− groups, the mean dCT value and SEM of each test gene were calculated, and the groups were compared using a 1-tailed 2-sample t test (data analysis package in Microsoft Excel). To assess each test gene in individual samples, the dCT value of each NlEpiER− sample was compared with the mean dCT value of the NlEpiER+ samples, and the dCT value of each NlEpiER+ sample was compared with the mean dCT value of the NlEpiER− samples. For each comparison, we defined validation as expression in the direction predicted by the microarray analysis. These individual assessments generated the rates of validation described in the text.
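The per-sample validation rule can be sketched as follows. Because a lower dCT means higher expression relative to CPSF6, "expression in the predicted direction" becomes a comparison of each sample's dCT against the opposite group's mean dCT. The function names and CT values below are hypothetical illustrations, not study data.

```python
import numpy as np


def delta_ct(ct_test, ct_ref):
    """dCT = CT(test gene) - CT(reference gene CPSF6), sample by sample."""
    return np.asarray(ct_test, dtype=float) - np.asarray(ct_ref, dtype=float)


def validation_calls(dct_er_pos, dct_er_neg, higher_in_er_pos):
    """Per-sample validation calls for one gene.

    Each NlEpiER+ sample is compared with the mean dCT of the NlEpiER- group,
    and each NlEpiER- sample with the mean dCT of the NlEpiER+ group.
    `higher_in_er_pos` is the direction predicted by the microarray.
    Returns booleans: ER+ samples first, then ER- samples.
    """
    pos = np.asarray(dct_er_pos, dtype=float)
    neg = np.asarray(dct_er_neg, dtype=float)
    if higher_in_er_pos:
        pos_ok = pos < neg.mean()  # ER+ sample expresses more than the ER- mean
        neg_ok = neg > pos.mean()  # ER- sample expresses less than the ER+ mean
    else:
        pos_ok = pos > neg.mean()  # ER+ sample expresses less than the ER- mean
        neg_ok = neg < pos.mean()  # ER- sample expresses more than the ER+ mean
    return np.concatenate([pos_ok, neg_ok])


# Hypothetical CT values for a gene predicted to be higher in NlEpiER+.
dct_pos = delta_ct([25.0, 25.5, 26.0], [20.0, 20.0, 20.0])  # dCT 5.0, 5.5, 6.0
dct_neg = delta_ct([27.0, 27.5, 25.2], [20.0, 20.0, 20.0])  # dCT 7.0, 7.5, 5.2
calls = validation_calls(dct_pos, dct_neg, higher_in_er_pos=True)
print(f"{calls.sum()} of {calls.size} reactions validated")  # 5 of 6 reactions validated
```

Pooling such calls across genes and samples gives overall rates of the kind reported in the Results (e.g., 43 of 51 reactions).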

For graphical presentation of the quantified and summarized qPCR results, the mean ddCT value for each gene was calculated by subtracting the mean dCT value of the reference group (NlEpiER+, as defined in the microarray analysis) from the mean dCT value of the test group (NlEpiER−). Data were plotted as 2^(−ddCT) (mean fold change) using GraphPad Prism software. To determine whether dCT values differed between groups, we compared the dCT values of individual samples between groups using a 1-tailed 2-sample t test assuming equal variances. Significantly different dCT values (P < 0.05) are denoted by an asterisk.
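The summary calculation can be sketched as below. As an assumption, SciPy's `ttest_ind` stands in for the Excel and GraphPad Prism routines used in the study (its `alternative` argument requires SciPy 1.6 or later); the dCT values and function names are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def ddct_fold_change(dct_test_group, dct_ref_group):
    """Mean fold change 2^(-ddCT), with ddCT = mean dCT(test) - mean dCT(ref)."""
    ddct = np.mean(dct_test_group) - np.mean(dct_ref_group)
    return 2.0 ** (-ddct)


def one_tailed_dct_pvalue(dct_test_group, dct_ref_group, higher_in_test):
    """1-tailed, equal-variance 2-sample t test on per-sample dCT values.

    Higher expression means LOWER dCT, so a gene expected to be higher in the
    test group is tested with alternative='less'. (SciPy stands in for Excel.)
    """
    alternative = "less" if higher_in_test else "greater"
    result = ttest_ind(dct_test_group, dct_ref_group, equal_var=True,
                       alternative=alternative)
    return result.pvalue


# Hypothetical dCT values: NlEpiER- is the test group, NlEpiER+ the reference.
dct_er_neg = [4.0, 4.5, 5.0]
dct_er_pos = [6.0, 6.5, 7.0]
print(ddct_fold_change(dct_er_neg, dct_er_pos))  # 4.0
print(one_tailed_dct_pvalue(dct_er_neg, dct_er_pos, higher_in_test=True) < 0.05)  # True
```

Here ddCT = 4.5 − 6.5 = −2, so the plotted fold change is 2^2 = 4, i.e., about fourfold higher expression in the NlEpiER− group.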

Annotation and analysis of differentially expressed genes

The list of differentially expressed genes was compared with published data sets and was uploaded into DAVID (http://david.abcc.ncifcrf.gov/home.jsp) and analyzed with the Functional Annotation Enrichment Analysis to determine overrepresented GO terms and PANTHER functions. Ingenuity Pathway Analysis (www.ingenuity.com) was also used to identify biological functions, canonical pathways, and functional gene classification terms.
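The idea behind an overrepresented annotation term can be illustrated with a one-sided hypergeometric test, sketched below. This is a generic illustration rather than DAVID's own scoring (DAVID reports the EASE score, a more conservative modified Fisher exact P value), and the counts in the example are hypothetical.

```python
from scipy.stats import hypergeom


def overrepresentation_p(k, n_list, n_category, n_background):
    """Upper-tail P(X >= k) for X ~ Hypergeometric (generic test, NOT DAVID's EASE).

    k            : genes in the submitted list annotated to the term
    n_list       : size of the submitted gene list
    n_category   : background genes annotated to the term
    n_background : size of the background (e.g., all genes on the array)
    """
    # SciPy's parameterization is hypergeom(M=population, n=successes, N=draws);
    # sf(k - 1) gives the upper tail P(X >= k).
    return hypergeom.sf(k - 1, n_background, n_category, n_list)


# Hypothetical counts: 12 of 198 listed genes carry a term that annotates
# 300 of 12,000 background genes (about 5 would be expected by chance).
print(overrepresentation_p(12, 198, 300, 12000))
```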

Comparison of NlEpi gene expression to existing breast cancer gene expression data sets

To compare gene expression in NlEpi with invasive breast cancer, we used a publicly available breast cancer gene expression data set (26), downloaded from NCBI's GEO database (accession number GSE3494). For compatibility with this data set, we also recalculated signal intensities from the .cel files of the 30 NlEpi samples using the RMA algorithm (27). To investigate how previously reported ER-related genes are expressed in NlEpi, we retrieved 3 lists of genes that distinguish ER+ from ER− breast cancers (28–30). These gene lists were mapped to our expression data using the official gene symbol as a common ID. When multiple probes in our data set corresponded to the same gene symbol in a published list, we used only the probe with the highest average expression across all samples. Differentially expressed genes were selected using a simple t test with correction for multiple hypothesis testing (FDR).
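The probe-to-gene mapping rule can be sketched with pandas as follows, assuming RMA-summarized values are already in a probes × samples table. The function names and toy probe IDs are hypothetical.

```python
import pandas as pd


def collapse_to_genes(expr, probe_to_symbol):
    """Keep one probe per gene symbol: the one with the highest mean expression.

    expr            : DataFrame, probes (index) x samples (columns)
    probe_to_symbol : Series mapping probe ID -> official gene symbol
    Returns a DataFrame indexed by gene symbol; unannotated probes are dropped.
    """
    symbols = probe_to_symbol.reindex(expr.index)
    mapped = symbols.notna()
    mean_expr = expr[mapped].mean(axis=1)
    # idxmax returns, for each symbol, the probe ID with the largest mean.
    best_probe = mean_expr.groupby(symbols[mapped]).idxmax()
    collapsed = expr.loc[best_probe.to_numpy()]
    collapsed.index = best_probe.index
    return collapsed


def restrict_to_list(collapsed, published_symbols):
    """Rows for the published genes that are present in our data."""
    return collapsed.loc[collapsed.index.intersection(published_symbols)]


# Toy data: two probes map to ESR1, one to GATA3, one is unannotated.
expr = pd.DataFrame({"s1": [1.0, 5.0, 3.0, 9.0], "s2": [2.0, 6.0, 3.0, 9.0]},
                    index=["p1", "p2", "p3", "p4"])
probe_to_symbol = pd.Series({"p1": "ESR1", "p2": "ESR1", "p3": "GATA3"})
print(restrict_to_list(collapse_to_genes(expr, probe_to_symbol), ["GATA3", "ESR1"]))
```

Keeping the highest-expressed probe favors the probe most likely to be above background, at the cost of ignoring the others.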

Identification of genes that are differentially expressed between NlEpiER+ compared with NlEpiER−

Cases were selected for microarray analysis on the basis of tumor immunohistochemical staining for ER. Then, 15 ER+ cases were each age matched to an ER− case, to minimize age-related differences in gene expression (19–21). On the basis of sample availability, individual cases could not be directly matched for ethnicity, cancer stage, or distance of the NlEpi sample from the tumor. Table 1 presents clinical pathologic information for these 30 cases. NlEpi was microdissected from each case. RNA was isolated, amplified, and used for microarray analysis (see Materials and Methods).

Using BADGE (23) followed by extrinsic leave-one-out cross-validation with an 80% selection threshold, we identified 216 probes, representing 198 genes, that were significantly differentially expressed between NlEpiER+ and NlEpiER−. These are presented in Table 2 (by relative fold change) and in Supplementary Table S2 (alphabetically). Of the 216 probes, 115 (53.2%), corresponding to 111 genes, showed higher expression in NlEpiER+ samples, and 101 (46.8%), corresponding to 87 genes, showed higher expression in NlEpiER− samples. Six genes were represented by multiple probes, and 5 probes represented unidentified targets. As expected, and as shown in Figure 1, unsupervised clustering analysis using the 216 probes separated the samples by ER status.
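A comparable unsupervised clustering can be sketched with SciPy in place of the Bioconductor HeatPlus package used in the study. Per-probe standardization, correlation distance, and average linkage are assumptions, not settings reported in the text, and the data are simulated. Because the probes were selected for differing between the ER groups, a clean two-way split is the expected outcome.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_samples(X, n_clusters=2, method="average"):
    """Hierarchically cluster samples (columns of a probes x samples matrix).

    ASSUMED settings: each probe is centered and scaled across samples,
    samples are compared by correlation distance, and the tree is cut
    into `n_clusters` groups.
    """
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    tree = linkage(Z.T, method=method, metric="correlation")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Simulated data: 50 probes x 30 samples, first 15 samples play the "ER+" role.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 30))
X[:25, :15] += 3.0  # probes higher in the first group
X[25:, :15] -= 3.0  # probes lower in the first group
print(cluster_samples(X))
```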

Figure 1.

Clustering of NlEpi samples based on gene expression. Clustering of the 30 NlEpi samples using the 216 probes identified as differentially expressed between the NlEpiER+ (ERp) and NlEpiER− (ERn) samples. The relative abundance of each transcript for each sample is represented as a colored block showing fold changes greater than 2 in either direction.

For validation of these expression data, we used qPCR, selecting primers for 7 of the differentially expressed genes (ABLIM, AHNAK, CRABP2, CX3CL1, CXCL14, ESR1, and NEDD4L) and 1 endogenous control gene (CPSF6). In a technical validation of the microarray data, we first tested unamplified RNA that remained from 15 of the 30 cases used for the microarray analysis (7 ER+ and 8 ER−). We examined the expression of the 7 genes in 6 to 8 of these samples. Overall, 43 of 51 (84%) reactions validated the microarray data: 5 of the 7 genes (CRABP2, CX3CL1, CXCL14, ESR1, and NEDD4L) confirmed the microarray data in every sample tested (100% of PCRs validated), whereas the other 2 genes (AHNAK and ABLIM) did not (43% of PCRs validated). When we calculated the mean fold change in expression for each gene between groups, the mean fold change was in the expected direction for 5 of the genes (CRABP2, ESR1, NEDD4L, CX3CL1, and CXCL14), as shown in Figure 2A.

Figure 2.

qPCR validation of microarray data. A, qPCR results using unamplified RNA from the original samples (technical validation). B, qPCR results using unamplified RNA from independent samples (prospective validation). Each panel shows qPCR results for the 7 genes tested. The 7 test genes are listed on the x-axis, and the mean fold change in expression is shown on the y-axis. NlEpiER+ serves as the reference group. Fold changes were calculated using the ddCT method and are represented as 2^(−ddCT). Error bars depict the SEM of dCT values. *, significantly different dCT values (P < 0.05).

Testing the gene expression profile in independent cases of NlEpi from breasts with ER+ and ER− breast cancers

Next, we tested the gene expression data on NlEpi from an independent set of breast cancer cases. For this prospective validation, we used the same techniques to obtain RNA from NlEpi of 16 independent cases (9 ER+ and 7 ER−) with clinical pathologic features similar to those of the 30 cases used for microarray analysis (Supplementary Table S1). We then tested these 16 cases with each of the 7 qPCR primer sets. Overall, 82 of 106 (77%) reactions validated the microarray data. When we calculated the mean fold change in expression for each gene between groups, 3 of the 7 genes (ABLIM, CX3CL1, and CXCL14) confirmed the microarray results. Mean fold changes were in the expected direction for another 3 genes (AHNAK, CRABP2, and ESR1) but did not reach significance, perhaps because of the small number of cases. The nonvalidating PCRs were distributed evenly across all 16 cases. These results are shown in Figure 2B.

Annotation of the ER+ versus ER− NlEpi gene expression profile

The 216-probe list was analyzed with DAVID and Ingenuity, which together identified functional categories implicated in carcinogenesis, including cell adhesion, motility, transcription, cell cycle, immune response, and hormonal activity and regulation (Supplementary Table S3; Supplementary Fig. S2). The 198 unique genes (Table 2; Supplementary Table S2) included genes known to be involved in ER function or characteristic of ER+ tumors, and these genes were overexpressed in NlEpiER+ compared with NlEpiER−. Examples include ESR1 itself (28, 31, 32), ABAT (33), GATA3 (28, 32, 34–36), GFRA1 (33), PDZK1 (33), STC2 (28, 33), and ERBB4 (29). Similarly, genes characteristic of ER− cancers were relatively overexpressed in NlEpiER− compared with NlEpiER+. Examples include CX3CL1 (33), FABP7 (33), GBP1 (37), KRT23 (38), RARRES1 (33), S100A8 (33), and THBS1 (28). In addition, the 198 genes included family members of genes implicated in breast cancer, such as multiple ribosomal-related proteins and S100 calcium-binding proteins, which were all overexpressed in NlEpiER−, and YWHAZ, an antiapoptotic protein associated with anthracycline resistance (39), which was overexpressed in NlEpiER+.

We noted that expression of multiple immune-related genes was increased in NlEpi of ER− cancers. At least 18 of the 86 (21%) genes with higher expression in NlEpi of ER− cancers are immune related (e.g., CXCL1, multiple CCLs, multiple Ig genes). To examine whether these genes were expressed in the epithelium or reflected an increased number of infiltrating lymphocytes, we performed immunohistochemistry for p63 (a myoepithelial cell marker) and LCA (a pan-leukocyte marker) on adjacent sections of a representative subgroup of cases (7 ER+ and 11 ER−). The number of lymphocytes (LCA+ cells with characteristic morphology) per TDLU was determined by a pathologist blinded to ER status. No difference in the number of lymphocytes per TDLU was seen between the ER+ and ER− cases (Supplementary Table S4; Supplementary Fig. S3). Thus, the increased expression of immune-related genes in NlEpiER− compared with NlEpiER+ is not due to quantitative differences in lymphocyte infiltration.

Gene set enrichment analysis (GSEA) of the whole NlEpi expression data set identified overexpression of keratin genes in the NlEpiER− group (FDR < 0.25). The keratin genes in the gene set included markers of basal breast cancers (KRT17, KRT15, KRT13, KRT6B, KRT5; ref. 40) and also of luminal breast cancers (KRT8, KRT18). Two other gene sets of interest that were differentially expressed (although with higher FDR) included ribosomal genes, which were also overexpressed in the NlEpiER− group, and genes defining ER, which were overexpressed in the NlEpiER+ group. These results are depicted in Supplementary Figures S4–S7.

Comparison of NlEpi gene expression to invasive breast cancer gene expression

We wished to compare gene expression in NlEpi with gene expression in invasive breast cancers. We first compared the expression pattern of the 216 probes in our data set with their expression pattern in a publicly available breast cancer gene expression data set. We used the data set of Miller et al. (26) because it provides ER status and is based on the same platform (Affymetrix U133A) as this study. Figure 3 shows, side by side, the expression pattern of the 216 probes in our 30 NlEpi samples and in the 247 breast cancers. We find that 115 of the 216 (53%) probes, representing 103 genes, are differentially expressed (FDR < 0.05) in ER+ compared with ER− breast tumors. Even though approximately half of the 216 probes are not significantly different between ER+ and ER− breast tumors, Figure 3 shows that for most of the 216 probes the overall pattern is quite similar between NlEpi and tumors.

Figure 3.

Gene expression in histologically normal breast epithelium (NlEpi) compared with invasive breast cancer. Each row is a probe, each column is a sample. A, expression pattern of the 216 probes that show differential expression between NlEpiER+ and NlEpiER−. ER+ indicates our 15 NlEpi ER+ samples, ER− indicates our 15 NlEpi ER− samples. B, expression of the same 216 probes in the breast cancer data set of Miller et al. (26). The probes (rows) are in the same order in A and B. ER+ indicates the 213 ER+ cancers, ER− indicates the 34 ER− cancers. The black bars at the right indicate whether a gene's expression is statistically different between ER+ and ER− breast cancers at P < 0.05. Numbers across the top indicate the number of samples (NlEpi in A, cancers in B). Red indicates relative overexpression, green indicates relative underexpression.

Next, we investigated how genes previously reported to distinguish ER+ from ER− cancers are expressed in NlEpi. We retrieved 3 gene lists previously reported to be differentially expressed in ER+ versus ER− breast cancers (28–30) and analyzed the genes' expression in our NlEpi data set. We found that 25% to 31% of the genes that distinguish ER+ from ER− cancers are also differentially expressed in NlEpiER+ compared with NlEpiER− (Figure 4). Specifically, in the Tozlu gene list (29), 11 of 35 (31%) genes overexpressed in ER+ cancers are also overexpressed in NlEpiER+ (FDR < 0.05). These genes include ESR1, GATA3, BCL2, MYB, AR, STC2, and ERBB4. In the Gruvberger gene list (28), 9 of the 36 (25%) genes distinguishing ER+ from ER− cancers are also differentially expressed (FDR < 0.05) in NlEpiER+ compared with NlEpiER−. These include ESR1, GATA3, and STC2 (up in both NlEpiER+ and ER+ cancers), and CDH3, EGFR, and S100A8 (up in both NlEpiER− and ER− cancers). In the van't Veer gene list (30), 76 of the 248 (31%) genes whose expression distinguishes ER+ from ER− breast cancers are also differentially expressed (FDR < 0.05) in NlEpiER+ compared with NlEpiER−. These genes include ESR1 and GATA3, classic components of the luminal A signature, as well as ERBB4, BCL2, and MYB, all of which are overexpressed in NlEpiER+; and CDH3, CX3CL1, FABP7, and KRT23, which mark basal tumors and are all overexpressed in NlEpiER−.
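The comparison above counts a gene as overlapping only when it is differentially expressed in NlEpi in the same direction as in the cancer gene list. A minimal sketch of that bookkeeping follows, assuming each list is stored as a dictionary mapping gene symbol to direction (+1 for higher in ER+, −1 for higher in ER−) and that the NlEpi dictionary holds only genes passing the significance threshold. The function name and the GENE_X/GENE_Y entries are hypothetical; this is not the authors' code or the published lists.

```python
from typing import Dict, Tuple


def direction_overlap(cancer: Dict[str, int], nlepi: Dict[str, int]) -> Tuple[int, int, int]:
    """Compare a cancer ER-status gene list with NlEpi results.

    cancer: gene -> +1 if higher in ER+ cancers, -1 if higher in ER- cancers.
    nlepi:  gene -> +1 if higher in NlEpiER+, -1 if higher in NlEpiER-;
            assumed to hold only genes passing the NlEpi threshold (e.g. FDR < 0.05).
    Returns (right_direction, wrong_direction, genes_in_cancer_list).
    """
    right = wrong = 0
    for gene, direction in cancer.items():
        if gene not in nlepi:
            continue  # not differentially expressed in NlEpi
        if nlepi[gene] == direction:
            right += 1
        else:
            wrong += 1
    return right, wrong, len(cancer)


# Illustrative inputs: directions for the named genes follow the text above;
# GENE_X and GENE_Y are hypothetical placeholders.
cancer_list = {"ESR1": 1, "GATA3": 1, "STC2": 1, "CDH3": -1, "EGFR": -1,
               "GENE_X": 1, "GENE_Y": -1}
nlepi_hits = {"ESR1": 1, "GATA3": 1, "STC2": 1, "CDH3": -1, "EGFR": -1,
              "GENE_X": -1}
right, wrong, total = direction_overlap(cancer_list, nlepi_hits)
print(f"{right}/{total} right, {wrong}/{total} wrong")  # → 5/7 right, 1/7 wrong
```

Genes absent from the NlEpi dictionary (here GENE_Y) count toward the list total but toward neither direction, matching the "not differentially expressed" category described in the text.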

Figure 4.

Expression of genes that distinguish ER+ from ER− invasive breast cancer in histologically normal breast epithelium (NlEpi). Each panel shows results from one comparison. Each comparison takes a gene list that distinguishes ER+ from ER− invasive breast cancer (28–30) and examines each gene's expression in the NlEpi samples. Within each panel, each row is a gene from the invasive cancer study and each column is an NlEpi sample. Across the top of each panel, ER+ indicates the 15 NlEpiER+ samples; ER− indicates the 15 NlEpiER− samples. A black horizontal bar at the right of each panel indicates whether a gene distinguishing ER+ from ER− invasive cancer is differentially expressed (P < 0.05) in NlEpiER+ versus NlEpiER−. Red indicates relative overexpression, green indicates relative underexpression. A, Tozlu gene list: 11 of the 35 genes are significantly differentially expressed in the “correct” direction in our NlEpiER+ and NlEpiER− samples. In this gene list, all 11 significant genes happen to be overexpressed in the NlEpiER+ compared with NlEpiER− samples. B, Gruvberger gene list: 9 of the 36 genes are significantly differentially expressed in the “correct” direction in our NlEpiER+ and NlEpiER− samples. C, van't Veer gene list: 76 of the 248 genes are significantly differentially expressed in the “correct” direction in our NlEpiER+ and NlEpiER− samples.


In each comparison between NlEpi gene expression and an external data set, many genes were not differentially expressed, and a small number were differentially expressed in the “wrong” direction. The proportion of these “wrongly directed” genes was always smaller than the proportion of “rightly directed” genes, and it decreased as the cancer sample size increased, suggesting that “wrongly directed” genes reflect a combination of small sample size and microarray-related artifacts. Specifically, in the Tozlu data set (29), 0 of 35 genes (0%) were in the “wrong” direction compared with 11 of 35 (31%) in the “right” direction; in the Gruvberger data set (28), 5 of 36 (14%) compared with 9 of 36 (25%); in the van't Veer data set (30), 9 of 248 (4%) compared with 76 of 248 (31%); and in the Miller data set (26), 7 of 216 probes (3%) compared with 115 of 216 (53%).
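The percentages above follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes them, rounded to whole percents, from the numbers given in the Results; the dictionary layout and function names are only a convenience for illustration. For the van't Veer list it uses the 76 "rightly directed" genes reported in the Results and in Figure 4.

```python
# Counts reported in the text: (wrong_direction, right_direction, total).
COUNTS = {
    "Tozlu": (0, 11, 35),
    "Gruvberger": (5, 9, 36),
    "van't Veer": (9, 76, 248),  # 76 as reported in the Results and Figure 4
    "Miller": (7, 115, 216),     # probes rather than genes
}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


def summarize(counts):
    """Map each data set to (% wrong direction, % right direction)."""
    return {name: (percent(w, n), percent(r, n)) for name, (w, r, n) in counts.items()}


summary = summarize(COUNTS)
for name, (wrong_pct, right_pct) in summary.items():
    print(f"{name}: {wrong_pct}% wrong vs {right_pct}% right")
right_pcts = [right_pct for _, right_pct in summary.values()]
print(f"overlap range: {min(right_pcts)}% to {max(right_pcts)}%")  # → 25% to 53%
```

The minimum and maximum "right direction" percentages reproduce the 25% to 53% overlap range summarized in the next paragraph.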

In sum, we found overlap between NlEpi and the corresponding cancer subtype in 25% to 53% of the genes or probes examined. Because our NlEpi samples were microdissected, whereas the cancer gene expression data sets were derived from non-microdissected cancers containing heterogeneous cell types, these figures may underestimate the similarity between NlEpi and cancer gene expression.

We investigated gene expression in histologically normal breast epithelium microdissected from breasts with either ER+ or ER− breast cancers. We found that gene expression in NlEpiER+ differs from gene expression in NlEpiER−, and that gene expression in each type of NlEpi resembles expression of the corresponding type of invasive breast cancer (i.e., ER+ or ER−).

There are several possible, and not mutually exclusive, explanations for our findings. One explanation is that the NlEpi gene expression profile reflects the influence of the extracellular environment. The extracellular environment plays a crucial role in cancer development and likely acts before tumor invasion (24, 41, 42). A coincident cancer could influence NlEpi gene expression, although our previous findings suggest that neither contamination nor paracrine effects of a cancer explain aberrant NlEpi gene expression (18). A second, intriguing possibility is that particular women may be predisposed to develop particular tumor subtypes because of inherited susceptibility genes. Evidence supporting this explanation includes the finding that gene expression is significantly influenced by germline polymorphisms (43), and the observation that components of breast cancer prognostic signatures are present in non-neoplastic tissue of susceptible animals (44). Alternatively, the NlEpi gene expression profile could reflect a field effect in some part of the breast (for review, see ref. 17) that predisposes that area to transform into a particular cancer subtype. We do not know the size of such a field because we have not comprehensively sampled distinct geographic regions of the breast. This model might be particularly relevant to women who carry BRCA gene mutations. Finally, combined with our previous finding that gene expression in NlEpi from women at high risk of breast cancer can resemble gene expression in NlEpi from cancer-containing breasts, the present findings suggest that characteristic features of breast cancer subgroups may be detectable before histologic abnormalities appear. The NlEpi gene expression profile could capture some of the earliest genomic changes of ER+ and ER− cancers. These changes may be present before histologic abnormality, either in cancer-initiating cells or in cells that eventually develop into the tumor. The relationship between the NlEpi abnormalities and any intratumoral heterogeneity that eventually develops is unknown (45, 46).

Regardless of which explanation(s) is correct, analysis of NlEpi gene expression and comparison with 4 independent cancer data sets offer insight into events occurring early in carcinogenesis. In particular, NlEpi expression of ER, its associated cofactors, and its downstream signals seems fundamental to the development of ER+ cancers. Conversely, expression of these genes is absent in NlEpi associated with ER− cancers. This observation is consistent with the substantial benefit of anti-estrogens in preventing ER+, but not ER−, cancers (1, 47, 48). ER− tumors have no signature feature analogous to ER expression and may be more heterogeneous than ER+ tumors (49). One consequence is that prevention and treatment are less successful for ER− tumors than for ER+ tumors. More generally, the overlap between genes and probes differentially expressed in NlEpi subtypes and in the corresponding cancer subtypes is striking (25%–53% of the genes or probes examined). This observation suggests that histologically normal epithelium may be genomically active and that additional genome-wide approaches to investigating the NlEpi landscape could be highly promising.

Our analyses of NlEpi gene expression suggest several directions worth pursuing. Examination of the gene list itself suggested that expression of immune-related genes was generally increased in NlEpiER−, and that this increase was not due to a greater number of infiltrating lymphocytes. One possible explanation is that the lymphocytes infiltrating NlEpiER− differ from those infiltrating NlEpiER+ and generate a distinct signature. Another possible explanation comes from recent reports of high expression of immunomodulatory genes in ER− breast cancers that are composed predominantly of epithelial cells and lack lymphocytic infiltrates (49, 50). In this setting, immunomodulatory gene expression may be associated with good prognosis, and it is consistent with data from cell lines (51). Furthermore, immunoglobulin genes themselves have been reported to be expressed in breast cancer cells and breast cancer cell lines (52–55). This area is promising for further investigation.

GSEA of our expression data identified several gene sets that may differ between NlEpiER+ and NlEpiER−. ER-related genes were overexpressed in NlEpiER+, which supports the validity of our data. Ribosome-related genes and keratin genes were overexpressed in NlEpiER−. Overexpression of ribosomal genes in NlEpiER− may reflect increased cell metabolism and protein synthesis, or may be related to lineage-specific cell fate (56). Overrepresentation of ribosomal genes has been reported in basal breast cancers (which are ER−; ref. 57) and may be related to increased cell proliferation or MYC activity, which has also been reported by others (58). The explanation for the overexpression of keratin genes is not clear; one could speculate that it reflects alterations to the cells' structural integrity, or even a response to signals from the extracellular environment. Unlike the BADGE-derived gene list and the DAVID analyses, GSEA did not identify immune function as an overrepresented category; this discrepancy may reflect the different analytic approaches.
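For readers unfamiliar with GSEA, its core statistic is a running-sum enrichment score computed over a ranked gene list: the sum steps up at each gene in the set and down at each gene outside it, and the score is the maximum deviation from zero. The sketch below implements that weighted statistic under simplifying assumptions: genes are ranked by a single per-gene statistic (for example, a hypothetical NlEpiER+ versus NlEpiER− comparison), and no permutation-based significance testing is performed. The function name and toy genes G1 to G6 are hypothetical; this is not the authors' analysis pipeline.

```python
from typing import Dict, Set


def enrichment_score(scores: Dict[str, float], gene_set: Set[str], p: float = 1.0) -> float:
    """GSEA-style running-sum enrichment score (no permutation testing).

    scores: gene -> ranking statistic (positive = higher in the first group).
    gene_set: genes belonging to the set being tested.
    p: weight exponent; p = 1 gives the weighted score, p = 0 the unweighted one.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest statistic first
    hits = [g for g in ranked if g in gene_set]
    n_miss = len(ranked) - len(hits)
    norm = sum(abs(scores[g]) ** p for g in hits)
    if not hits or n_miss == 0 or norm == 0:
        raise ValueError("gene set must partially overlap the ranked list with nonzero weight")
    running = best = 0.0
    for g in ranked:
        # Step up by the gene's normalized weight on a hit, down by a constant on a miss.
        if g in gene_set:
            running += abs(scores[g]) ** p / norm
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero, with its sign
    return best


# Hypothetical ranking statistics for six toy genes.
toy = {"G1": 3.0, "G2": 2.0, "G3": 1.0, "G4": -1.0, "G5": -2.0, "G6": -3.0}
print(enrichment_score(toy, {"G1", "G2"}))  # set concentrated at the top: positive score
print(enrichment_score(toy, {"G5", "G6"}))  # set concentrated at the bottom: negative score
```

A positive score indicates a set concentrated among genes higher in the first group, and a negative score a set concentrated among genes higher in the second group. In practice, significance is assessed by comparing the observed score with scores from permuted sample labels, which this sketch omits.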

This study has limitations. The first is its small sample size, which reflects the logistical and technical challenges of obtaining and investigating fresh tissue from untreated patients. To counterbalance this limitation, we adopted a study design suited to a small sample size: the cases in each group were tightly age matched (because age exerts a major influence on gene expression in the breast; refs. 19–21); the normal epithelium was microdissected away from other cells to enrich for a homogeneous cell population; we used a statistical approach appropriate for small sample sizes; and we validated the microarray results on independent NlEpi samples and compared them with external cancer data sets. The second limitation is the possibility that our NlEpi samples were contaminated by malignant cells. To minimize this risk, a breast pathologist (A de las Morenas) reviewed every 10th slide from each section and marked both histologically normal epithelium and any abnormal areas (cancerous or not), and every effort was made to avoid any area that was not histologically normal. Thus, we think it is unlikely that the microdissected samples were contaminated to any substantial extent.

In conclusion, gene expression differs in NlEpi of breasts containing ER+ compared with ER− breast cancers. These gene expression differences reflect the gene expression differences in ER+ compared with ER− cancers. This finding implies that genomic changes characteristic of specific breast cancer subtypes may be detectable before histologic evidence of abnormalities. Future work could examine whether gene expression characteristic of the breast cancer intrinsic or molecular subtypes (31, 32, 59) is detectable in NlEpi, and whether NlEpi gene expression reflects gene expression in each subtype. This would be clinically relevant because each subtype has characteristic genomic (31, 32, 59), clinical, and pathologic features (30, 60). Therefore, normal epithelium gene expression profiles could help define a breast cancer subtype-specific risk signature, identify the initial genomic differences between subtypes, and suggest new targets for prevention and therapy, which is especially important for ER− cancers.

No potential conflicts of interest were disclosed.

C.L. Rosenberg was supported by PHS CA115434, the Avon Foundation, and the LaPann funds. X. Ge is supported by NIH grant GM083226.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Fisher B, Costantino JP, Wickerham DL, Redmond CK, Kavanah M, Cronin WM, et al. Tamoxifen for prevention of breast cancer: report of the National Surgical Adjuvant Breast and Bowel Project P-1 Study. J Natl Cancer Inst 1998;90:1371–88.
2. Early Breast Cancer Trialists' Collaborative Group. Tamoxifen for early breast cancer: an overview of the randomised trials. Lancet 1998;351:1451–67.
3. Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomised trials. Lancet 2005;365:1687–717.
4. Ince TA, Richardson AL, Bell GW, Saitoh M, Godar S, Karnoub AE, et al. Transformation of different human breast epithelial cell types leads to distinct tumor phenotypes. Cancer Cell 2007;12:160–70.
5. Sgroi DC. Preinvasive breast cancer. Annu Rev Pathol;5:193–221.
6. Deng G, Lu Y, Zlotnikov G, Thor AD, Smith HS. Loss of heterozygosity in normal tissue adjacent to breast carcinomas. Science 1996;274:2057–9.
7. Yan PS, Venkataramu C, Ibrahim A, Liu JC, Shen RZ, Diaz NM, et al. Mapping geographic zones of cancer risk with epigenetic biomarkers in normal breast tissue. Clin Cancer Res 2006;12:6626–36.
8. Clarke CL, Sandle J, Jones AA, Sofronis A, Patani NR, Lakhani SR. Mapping loss of heterozygosity in normal human breast cells from BRCA1/2 carriers. Br J Cancer 2006;95:515–9.
9. Grigoriadis A, Mackay A, Reis-Filho JS, Steele D, Iseli C, Stevenson BJ, et al. Establishment of the epithelial-specific transcriptome of normal and malignant human breast cells based on MPSS and array expression data. Breast Cancer Res 2006;8:R56.
10. Larson PS, de las Morenas A, Cupples LA, Huang K, Rosenberg CL. Genetically abnormal clones in histologically normal breast tissue. Am J Pathol 1998;152:1591–8.
11. Tripathi A, King C, de la Morenas A, Perry VK, Burke B, Antoine GA, et al. Gene expression abnormalities in histologically normal breast epithelium of breast cancer patients. Int J Cancer 2008;122:1557–66.
12. Larson PS, de las Morenas A, Bennett SR, Cupples LA, Rosenberg CL. Loss of heterozygosity or allele imbalance in histologically normal breast epithelium is distinct from loss of heterozygosity or allele imbalance in co-existing carcinomas. Am J Pathol 2002;161:283–90.
13. Larson PS, Schlechter BL, de las Morenas A, Garber JE, Cupples LA, Rosenberg CL. Allele imbalance, or loss of heterozygosity, in normal breast epithelium of sporadic breast cancer cases and BRCA1 gene mutation carriers is increased compared with reduction mammoplasty tissues. J Clin Oncol 2005;23:8613–9.
14. Batchelder AJ, Gordon-Weeks AN, Walker RA. Altered expression of anti-apoptotic proteins in non-involved tissue from cancer-containing breasts. Breast Cancer Res Treat 2009;114:63–9.
15. Ding L, Erdmann C, Chinnaiyan AM, Merajver SD, Kleer CG. Identification of EZH2 as a molecular marker for a precancerous state in morphologically normal breast tissues. Cancer Res 2006;66:4095–9.
16. Van Der Auwera I, Bovie C, Svensson C, Trinh XB, Limame R, van Dam P, et al. Quantitative methylation profiling in tumor and matched morphologically normal tissues from breast cancer patients. BMC Cancer;10:97.
17. Heaphy CM, Griffith JK, Bisoffi M. Mammary field cancerization: molecular evidence and clinical importance. Breast Cancer Res Treat 2009.
18. Graham K, de Las Morenas A, Tripathi A, King C, Kavanah M, Mendez J, et al. Gene expression in histologically normal epithelium from breast cancer patients and from cancer-free prophylactic mastectomy patients shares a similar profile. Br J Cancer.
19. Anders CK, Acharya CR, Hsu DS, Broadwater G, Garman K, Foekens JA, et al. Age-specific differences in oncogenic pathway deregulation seen in human breast tumors. PLoS One 2008;3:e1373.
20. Euhus DM, Bu D, Milchgrub S, Xie XJ, Bian A, Leitch AM, et al. DNA methylation in benign breast epithelium in relation to age and breast cancer risk. Cancer Epidemiol Biomarkers Prev 2008;17:1051–9.
21. Yau C, Fedele V, Roydasgupta R, Fridlyand J, Hubbard A, Gray JW, et al. Aging impacts transcriptomes but not genomes of hormone-dependent breast cancers. Breast Cancer Res 2007;9:R59.
22. King C, Guo N, Frampton GM, Gerry NP, Lenburg ME, Rosenberg CL. Reliability and reproducibility of gene expression measurements using amplified RNA from laser-microdissected primary breast tissue with oligonucleotide arrays. J Mol Diagn 2005;7:57–64.
23. Sebastiani P, Xe H, Ramoni MF. Bayesian analysis of comparative microarray experiments by model averaging. Bayesian Analysis Journal 2006;1:707–32.
24. Emery LA, Tripathi A, King C, Kavanah M, Mendez J, Stone MD, et al. Early dysregulation of cell adhesion and extracellular matrix pathways in breast cancer progression. Am J Pathol 2009;175:1292–302.
25. Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002;1:203–9.
26. Miller LD, Smeds J, George J, Vega VB, Vergara L, Ploner A, et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci U S A 2005;102:13550–5.
27. Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 2003;31:e15.
28. Gruvberger S, Ringner M, Chen Y, Panavally S, Saal LH, Borg A, et al. Estrogen receptor status in breast cancer is associated with remarkably distinct gene expression patterns. Cancer Res 2001;61:5979–84.
29. Tozlu S, Girault I, Vacher S, Vendrell J, Andrieu C, Spyratos F, et al. Identification of novel genes that co-cluster with estrogen receptor alpha in breast tumor biopsy specimens, using a large-scale real-time reverse transcription-PCR approach. Endocr Relat Cancer 2006;13:1109–20.
30. van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415:530–6.
31. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al. Molecular portraits of human breast tumours. Nature 2000;406:747–52.
32. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001;98:10869–74.
33. Wirapati P, Sotiriou C, Kunkel S, Farmer P, Pradervand S, Haibe-Kains B, et al. Meta-analysis of gene expression profiles in breast cancer: toward a unified understanding of breast cancer subtyping and prognosis signatures. Breast Cancer Res 2008;10:R65.
34. Eeckhoute J, Keeton EK, Lupien M, Krum SA, Carroll JS, Brown M. Positive cross-regulatory loop ties GATA-3 to estrogen receptor alpha expression in breast cancer. Cancer Res 2007;67:6477–83.
35. Wilson BJ, Giguere V. Meta-analysis of human cancer microarrays reveals GATA3 is integral to the estrogen receptor alpha pathway. Mol Cancer 2008;7:49.
36. Mehra R, Varambally S, Ding L, Shen R, Sabel MS, Ghosh D, et al. Identification of GATA3 as a breast cancer prognostic marker by global gene expression meta-analysis. Cancer Res 2005;65:11259–64.
37. Creighton CJ. A gene transcription signature associated with hormone independence in a subset of both breast and prostate cancers. BMC Genomics 2007;8:199.
38. Herschkowitz JI, Simin K, Weigman VJ, Mikaelian I, Usary J, Hu Z, et al. Identification of conserved gene expression features between murine mammary carcinoma models and human breast tumors. Genome Biol 2007;8:R76.
39. Li Y, Zou L, Li Q, Haibe-Kains B, Tian R, Desmedt C, et al. Amplification of LAPTM4B and YWHAZ contributes to chemotherapy resistance and recurrence of breast cancer. Nat Med;16:214–8.
40. Charafe-Jauffret E, Ginestier C, Monville F, Finetti P, Adelaide J, Cervera N, et al. Gene expression profiling of breast cell lines identifies potential new basal markers. Oncogene 2006;25:2273–84.
41. Troester MA, Lee MH, Carter M, Fan C, Cowan DW, Perez ER, et al. Activation of host wound responses in breast cancer microenvironment. Clin Cancer Res 2009;15:7020–8.
42. Ma XJ, Dahiya S, Richardson E, Erlander M, Sgroi DC. Gene expression profiling of the tumor microenvironment during breast cancer progression. Breast Cancer Res 2009;11:R7.
43. Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, et al. Genetics of gene expression surveyed in maize, mouse and man. Nature 2003;422:297–302.
44. Lukes L, Crawford NP, Walker R, Hunter KW. The origins of breast cancer prognostic gene expression profiles. Cancer Res 2009;69:310–8.
45. Allred DC, Wu Y, Mao S, Nagtegaal ID, Lee S, Perou CM, et al. Ductal carcinoma in situ and the emergence of diversity during breast cancer evolution. Clin Cancer Res 2008;14:370–8.
46. Park SY, Gonen M, Kim HJ, Michor F, Polyak K. Cellular and genetic diversity in the progression of in situ human breast carcinomas to an invasive phenotype. J Clin Invest;120:636–44.
47. Fisher B, Costantino JP, Wickerham DL, Cecchini RS, Cronin WM, Robidoux A, et al. Tamoxifen for the prevention of breast cancer: current status of the National Surgical Adjuvant Breast and Bowel Project P-1 study. J Natl Cancer Inst 2005;97:1652–62.
48. Vogel VG, Costantino JP, Wickerham DL, Cronin WM, Cecchini RS, Atkins JN, et al. Effects of tamoxifen vs raloxifene on the risk of developing invasive breast cancer and other disease outcomes: the NSABP Study of Tamoxifen and Raloxifene (STAR) P-2 trial. JAMA 2006;295:2727–41.
49. Speers C, Tsimelzon A, Sexton K, Herrick AM, Gutierrez C, Culhane A, et al. Identification of novel kinase targets for the treatment of estrogen receptor-negative breast cancer. Clin Cancer Res 2009;15:6327–40.
50. Teschendorff AE, Miremadi A, Pinder SE, Ellis IO, Caldas C. An immune response gene expression module identifies a good prognosis subtype in estrogen receptor negative breast cancer. Genome Biol 2007;8:R157.
51. Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 2006;10:515–27.
52. Babbage G, Ottensmeier CH, Blaydes J, Stevenson FK, Sahota SS. Immunoglobulin heavy chain locus events and expression of activation-induced cytidine deaminase in epithelial breast cancer cell lines. Cancer Res 2006;66:3996–4000.
53. Chen Z, Gu J. Immunoglobulin G expression in carcinomas and cancer cell lines. FASEB J 2007;21:2931–8.
54. Qiu X, Zhu X, Zhang L, Mao Y, Zhang J, Hao P, et al. Human epithelial cancers secrete immunoglobulin g with unidentified specificity to promote growth and survival of tumor cells. Cancer Res 2003;63:6488–95.
55. Zheng J, Huang J, Mao Y, Liu S, Sun X, Zhu X, et al. Immunoglobulin gene transcripts have distinct VHDJH recombination characteristics in human epithelial cancer cells. J Biol Chem 2009;284:13610–9.
56. Young DW, Hassan MQ, Pratap J, Galindo M, Zaidi SK, Lee SH, et al. Mitotic occupancy and lineage-specific transcriptional control of rRNA genes by Runx2. Nature 2007;445:442–6.
57. Chandriani S, Frengen E, Cowling VH, Pendergrass SA, Perou CM, Whitfield ML, et al. A core MYC gene expression signature is prominent in basal-like breast cancer but only partially overlaps the core serum response. PLoS One 2009;4:e6693.
58. Alles MC, Gardiner-Garden M, Nott DJ, Wang Y, Foekens JA, Sutherland RL, et al. Meta-analysis and gene set enrichment relative to er status reveal elevated activity of MYC and E2F in the “basal” breast cancer subgroup. PLoS One 2009;4:e4710.
59. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 2003;100:8418–23.
60. van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 2002;347:1999–2009.
61. AJCC cancer staging manual. 6th ed. New York: Springer; 2002.