Abstract
The normal duct-lobular system of the breast is lined by two epithelial cell types, inner luminal secretory cells and outer contractile myoepithelial cells. We have generated comprehensive expression profiles of the two normal cell types, using immunomagnetic cell separation and gene expression microarray analysis. The cell-type specificity was confirmed at the protein level by immunohistochemistry in normal breast tissue. New prognostic markers for survival were identified when the luminal- and myoepithelial-specific molecules were evaluated on breast tumor tissue microarrays. Nuclear expression of luminal epithelial marker galectin 3 correlated with a shorter overall survival in these patients, and the expression of SPARC (osteonectin), a myoepithelial marker, was an independent marker of poor prognosis in breast cancers as a whole. These data provide a framework for the interpretation of breast cancer molecular profiling experiments, the identification of potential new diagnostic markers, and development of novel indicators of prognosis.
INTRODUCTION
The terminal duct-lobular unit of the breast, the structure from which the majority of breast cancers arise, is composed of two types of epithelial cells. The inner or luminal cells, which are potential milk secreting cells, are surrounded by an outer basal layer of contractile myoepithelial cells. Most breast carcinomas express phenotypic markers that are consistent with an origin from luminal cells (1). The biology of normal luminal cells is the key to understanding breast cancer initiation, with genetic alterations occurring in both normal cells and epithelial hyperplastic lesions driving the earliest stages of progression (2, 3, 4).
Despite the luminal origin of breast tumors, a subset of invasive ductal carcinomas in the breast also express markers specific for myoepithelial cells (5, 6, 7, 8, 9, 10). Recent studies using cDNA microarray analysis of primary human breast tumors have also identified a basal-like subset of invasive ductal carcinomas (11) based on their patterns of gene expression. These tumors exhibiting a basal phenotype have been reported to have an aggressive phenotype and poorer prognosis for the patient (12, 13, 14). Although it is interesting to speculate on the cells from which these tumors may be derived, there is little evidence currently to suggest a myoepithelial origin for basal-like breast cancers.
Although tumor expression profiling data have begun to unravel the complexity of breast cancer, corresponding transcriptional studies of the two normal epithelial cell types have lagged behind. Immunomagnetic methods have been developed for large-scale purification of normal human luminal and myoepithelial breast cells from reduction mammoplasty samples, amenable to detailed molecular analysis (15). We report here the cDNA microarray analysis of separated normal luminal and myoepithelial cells. The objectives of this study were to provide a baseline reference dataset to help understand preexisting and forthcoming tumor expression profiles, to determine whether novel cell-type specific markers can be used for tumor subclassification and for differential diagnosis, and to identify new predictive and prognostic markers that could include potential targets for future therapy. Comparison of the transcriptional profiles of normal adult human luminal and myoepithelial cells does identify novel markers, including some which provide significant prognostic information for primary breast cancers.
MATERIALS AND METHODS
Cell Preparations.
Purified populations of ∼10⁷ normal human breast luminal and myoepithelial cells were prepared from individual reduction mammoplasty samples (16) with modifications to enhance purity (15). Briefly, the two cell types were immunomagnetically sorted from primary cultures by combined positive magnetic-activated cell sorting with antibodies against the luminal epithelial membrane marker EMA (rat monoclonal ICR2) and the myoepithelial membrane antigen CD10 (mouse monoclonal DAKO-CALLA clone SS2/36), followed by negative Dynabead selection with mouse monoclonal antibodies against a different myoepithelial cell-surface antigen (Santa Cruz Biotechnology anti-β-4-integrin clone A9) and another luminal antigen (Dako BerEp-4 Epithelial Antigen).
cDNA Microarray Hybridizations.
The cDNA microarrays used in this study were constructed at the Sanger Centre as part of the Ludwig Institute for Cancer Research/Cancer Research UK Microarray Consortium, containing 9,930 sequence-validated cDNA clones representing ∼6000 unique human gene sequences (see web site for details and protocols).8 Clone annotation was based on the National Center for Biotechnology Information 34 assembly of the human genome with the Sanger Clone IDs mapping to Ensembl.9
RNA was prepared according to standard protocols (17), and preparations from nine luminal and nine myoepithelial samples (four of which were paired samples from the same patient) were individually hybridized against a common breast reference RNA in duplicated dye-swap experiments. The breast reference RNA was created by combining equal quantities of total RNA from the following breast cell lines, MDA-MB-361, MDA-MB-231, MDA-MB-435, BT20, HBL100, GI101, BT474, T47D, MCF7, SKBR3, ZR-75-1, and MDA-MB-468.
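The way a duplicated dye-swap pair can be combined into a single expression ratio may be sketched as follows. This is a minimal illustration, not the processing pipeline used in this study: it assumes two-color Cy5/Cy3 labeling and log2 ratios, and the function name and all intensity values are hypothetical.

```python
import math

def dye_swap_log_ratio(sample_cy5, ref_cy3, sample_cy3, ref_cy5):
    """Combine a dye-swap pair into one log2(sample/reference) value.

    Assumption: array 1 labels the sample with Cy5 and the reference with Cy3,
    and array 2 swaps the dyes. Averaging the two sample/reference log ratios
    cancels a multiplicative dye bias that affects one channel.
    """
    forward = math.log2(sample_cy5 / ref_cy3)  # sample in Cy5 channel
    reverse = math.log2(sample_cy3 / ref_cy5)  # sample in Cy3 channel (swapped)
    return (forward + reverse) / 2.0

# Hypothetical clone: true sample/reference ratio of 4, Cy5 channel inflated 1.5-fold.
combined = dye_swap_log_ratio(sample_cy5=600.0, ref_cy3=100.0,
                              sample_cy3=400.0, ref_cy5=150.0)
```

In this hypothetical case the forward and reverse arrays over- and underestimate the log ratio by the same amount, so their average recovers log2(4) = 2.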
Image Processing and Data Analysis.
Fluorescent images of hybridized microarrays were captured using either the GenePix 4000 (Axon) dual color confocal laser scanner and GenePix software or a GSI Lumonics 4000 scanner and ScanArray software, and were quantitated and background subtracted using GSI Lumonics Quantarray 3.0 software. The log expression ratios were normalized by lowess local regression (18) in the statistical platform S-Plus version 6.1 for Windows (Insightful). All raw fluorescence intensity data and microarray image files have been deposited in ArrayExpress,10 the public repository for microarray-based gene expression data, in compliance with minimum information about a microarray experiment (MIAME) standards (19), under the accession number E-MEXP-36.
Genes were filtered from the total set of 9930 by excluding clones with low mean intensity values (<20th percentile of highest intensity across all arrays), consistent local artifact, or low mean absolute deviation (<0.3). This resulted in a filtered gene list of 1896 targets for unsupervised and supervised analysis. Any remaining missing values were imputed using the k-nearest neighbor method in Significance Analysis of Microarrays (SAM).
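The intensity and variability filters can be illustrated with a short sketch. It rests on assumptions that are not spelled out above: the intensity cutoff is taken as the 20th percentile of the per-clone mean intensities, the variability measure is the mean (not median) absolute deviation of the normalized log ratios, and the local-artifact filter is omitted because it requires image-level information. All function names and data are hypothetical.

```python
from statistics import mean

def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a list of numbers."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def mean_abs_dev(values):
    """Mean absolute deviation around the mean."""
    m = mean(values)
    return mean(abs(v - m) for v in values)

def filter_clones(intensity, log_ratio, pct=20.0, min_mad=0.3):
    """Return clone IDs that pass both filters.

    intensity: dict clone -> list of per-array intensities
    log_ratio: dict clone -> list of per-array normalized log ratios
    """
    mean_int = {c: mean(v) for c, v in intensity.items()}
    cutoff = percentile(list(mean_int.values()), pct)
    return [c for c in intensity
            if mean_int[c] >= cutoff and mean_abs_dev(log_ratio[c]) >= min_mad]

# Hypothetical toy data: six clones measured on two arrays.
toy_intensity = {"a": [10, 10], "b": [100, 100], "c": [200, 200],
                 "d": [300, 300], "e": [400, 400], "f": [500, 500]}
toy_log_ratio = {"a": [1.0, -1.0], "b": [0.1, -0.1], "c": [1.0, -1.0],
                 "d": [0.5, -0.5], "e": [0.0, 0.0], "f": [2.0, -2.0]}
kept = filter_clones(toy_intensity, toy_log_ratio)
```

In this toy set, clone "a" fails the intensity filter, and clones "b" and "e" fail the variability filter.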
Unsupervised hierarchical clustering was carried out using “hclust” in S-Plus, as well as the Cluster package, and plotted with Treeview.11 Differentially expressed genes were identified by application of the SAM (version 1.12) Excel add-in.12 Supervised analysis was carried out by nearest shrunken centroid classification for class prediction using the Prediction Analysis of Microarrays package,13 implemented in R (1.6.2).14
Reverse Transcription-PCR.
Ten μg total RNA was reverse transcribed from an oligo-dT primer under conventional conditions (Superscript II; Life Technologies, Inc.). The resulting reaction was diluted 10-fold in water, and 2 μl were used as a template for PCR amplification. PCR was performed under standard conditions in 50 μl for 25–40 cycles (primer sequences and cycle numbers are given in Supplementary Table S4). Products were resolved by standard agarose gel electrophoresis. Differential expression was confirmed by densitometry of ethidium bromide staining on conventional agarose gels using NIH Image software.
Immunohistochemistry.
Antibodies to differentially expressed genes were obtained commercially where available. Sections were dewaxed in xylene overnight, taken to ethanol (99.7–100% v/v), and blocked for endogenous peroxidase in methanol for 10 min. Sections were subjected to specific high temperature antigen retrieval techniques and blocked in normal horse serum (2.5%; Vector Labs) for 20 min, and primary antibodies were applied for 30 min. SPARC was subjected to 2 min of pressure cooking in citrate buffer (pH 6.0), 1/5 dilution (Novocastra); S100A2 received 2 min of pressure cooking, dilution 1/100 (Dako); maspin received 18 min of microwaving in Dako Target Retrieval Solution (pH 6.0), dilution 1/100 (Novocastra); galectin 3 (LGALS3) received 2 min of pressure cooking, dilution 1/750 (Novocastra); CLDN4 received 18 min of microwaving, dilution 1/100 (Zymed); CD24 received 3 min of pressure cooking, dilution 1/100 (Serotech). All antibodies were diluted in Tris-buffered saline. The primary antibodies were rinsed off in 0.1% Tween 20 in Tris-buffered saline, and sections were developed using Vectastain Universal ABC kit (Vector Labs) and visualized with 3,3′-diaminobenzidine (Dako).
Tissue Microarrays.
Breast tumors were selected from the archives of the Istituto di Anatomia Patologica (Sassari, Italy), with appropriate local ethical committee approval. A total of 566 unselected primary breast cancers comprising all grades and types was retrieved and reviewed by an experienced pathologist. Up to 10 years of clinical follow-up data were available for all cases (3–110 months, mean = 62 months, median = 73 months). The paraffin blocks were marked and punched with 0.6-mm² tumor cores taken from the donor blocks for inclusion in duplicate recipient tissue array blocks using a precision tissue array instrument (Beecher Instruments; Ref. 20).
Survival analysis was carried out using the statistical platform S-Plus version 6.1 for Windows (Insightful) on our right-censored clinical follow-up data from the cases on the tissue microarray, using the log-rank test and the Cox proportional hazards model.
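For readers unfamiliar with the log-rank test, a minimal two-group version for right-censored data is sketched below. This is an illustrative standard-library implementation, not the S-Plus routine used for the analysis; the function name and the toy follow-up times are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test on right-censored follow-up data.

    times_*: follow-up times; events_*: 1 = death observed, 0 = censored.
    Returns (chi_square, p_value) on 1 degree of freedom.
    Assumes at least one event time with subjects at risk in both groups,
    otherwise the variance is zero and the statistic is undefined.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)          # deaths at time t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n                      # observed minus expected in A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value

# Hypothetical toy cohort: all deaths in group A occur before any in group B.
chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

At each distinct death time the observed deaths in one group are compared with the number expected if both groups shared the same hazard; censored cases contribute only to the at-risk counts.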
RESULTS
cDNA Microarray Analysis.
The microarray data from normal breast luminal and myoepithelial cells were analyzed first in an unsupervised manner, to determine the innate differences between the cell preparations, and then using supervised algorithms, to identify the most discriminatory genes associated with each cell type. Using the normalized data from the 1896 gene list, unsupervised hierarchical clustering on the normal breast luminal and myoepithelial preparations was carried out (Fig. 1A). The sample dendrogram clearly separates two main branches, each consisting of one of the two cell types, exemplifying the inherent differences between the two epithelial cell types of the breast and the consistency of the cell separation and microarray analysis methods used. Clustering of the 1896 gene set also identifies luminal- and myoepithelial-specific gene clusters, representative regions of which are shown (Fig. 1, B and C).
A list of genes that were significantly differentially expressed between the two cell types was identified using SAM. With a false discovery rate of 1%, 132 myoepithelial and 77 luminal differential genes were found (Supplementary Table S1). Expression ratios and SAM scores for the top 50 most differentially expressed genes in each cell type are shown (Table 1). The data were also analyzed by a supervised classification method to identify those genes that are the most predictive of each cell type, using a class prediction algorithm based on the nearest shrunken centroid method (21). Using Prediction Analysis of Microarrays, a classifier was first trained using the 1896 gene set, before cross-validation and plotting of the cross-validated error curves, to determine the threshold (amount of shrinkage) that gives the minimum cross-validated error rate (Supplementary Figure S2). Applying a threshold of 2.7 gives a misclassification rate of 0 using 42 cDNA clones corresponding to 33 unique genes (Fig. 2A; Supplementary Figure S3). The cross-validated class probabilities by sample type (Fig. 2B) demonstrate that these 33 genes accurately classify all of the samples into their correct classes (cell type).
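The role of the threshold in nearest shrunken centroid classification can be sketched as follows. This is a simplified illustration of the soft-thresholding step only; it does not reproduce the Prediction Analysis of Microarrays package, the standardization of the centroid differences is assumed to have been done already, and the clone names and scores are hypothetical.

```python
def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values within +/-delta become 0."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude

def surviving_clones(scores, delta):
    """Return clones that still contribute to the classifier after shrinkage.

    scores: dict clone -> list of per-class standardized differences between
            the class centroid and the overall centroid.
    A clone whose shrunken differences are zero in every class drops out.
    """
    return [c for c, ds in scores.items()
            if any(soft_threshold(d, delta) != 0.0 for d in ds)]

# Hypothetical standardized differences for the two classes (luminal, myoepithelial).
toy_scores = {"clone_x": [3.5, -3.5], "clone_y": [1.0, -1.0]}
selected = surviving_clones(toy_scores, delta=2.7)
```

Raising the threshold removes more clones from the classifier, which is why the cross-validated error curve is used to choose the amount of shrinkage.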
Confirmation of Differential Expression.
To confirm the specific identity and differential expression of our cell type specific markers, semiquantitative reverse transcription-PCR was carried out on the four patient-matched luminal and myoepithelial samples used in our microarray analysis. Differential expression was confirmed for 56 of 62 (90%) genes by reverse transcription-PCR (primers, cycle numbers, and confirmatory data for the 66 unique genes from Table 1 are given in Supplementary Table S4).
These genes were next examined at the protein level in paraffin-embedded archival samples by immunohistochemistry, where the availability of appropriate antibodies made this possible. Differential luminal expression in normal breast lobules is shown for claudin 4 (CLDN4), CD24, and LGALS3 proteins. CLDN4 staining shows a strong membrane component of luminal epithelial cells consistent with its role in tight junction adhesion and does not stain the basement membrane of these cells (Fig. 3A). CD24 stains the cytoplasmic compartment of normal luminal epithelial cells as well as the apical cell surface (Fig. 3B). LGALS3 stains the nucleus and cytoplasm of luminal cells differentially compared with myoepithelial cells and also stains intralobular fibroblasts in the breast (Fig. 3C).
Differential myoepithelial expression in normal breast lobules is demonstrated for S100A2, SERPINB5, and SPARC proteins. S100A2 shows a strong nuclear and cytoplasmic staining specifically in the myoepithelial cells, with no expression in the stromal cells (Fig. 3E). Maspin (SERPINB5) expression is also restricted to myoepithelial cells, with strong nuclear and cytoplasmic staining (Fig. 3F). Osteonectin (SPARC) stains the cytoplasmic compartment of myoepithelial cells differentially to luminal cells and also stains inter- and intralobular fibroblasts (Fig. 3G).
Evaluation of Prognostic Significance Using Tissue Microarrays.
To evaluate whether the expression of the luminal and myoepithelial markers demonstrated any correlation with prognosis in breast cancer, immunohistochemistry was carried out on a tissue microarray consisting of 566 primary breast tumors of all types and grades for which outcome data in the form of overall survival was available. A summary of the results of univariate analysis is given in Table 2.
The luminal epithelial marker LGALS3 showed a loss of expression in approximately one-half of all assessable tumors on the tissue microarray, with 213 of 431 cases (49.4%) LGALS3 positive. This loss of expression did not correlate with prognostic outcome in all tumors (P = 0.597); however, when the subcellular localization of LGALS3 was evaluated, tumors with nuclear positivity (9 of 431, 2.1%, Fig. 3D) showed a statistically significant (P = 0.00895, log-rank test) poorer overall survival than negative cases or those in which expression was restricted to the cytoplasm (Fig. 4A). All of these nuclear positive cases were also positive for cyclin D1 (data not shown). There was no correlation between LGALS3 expression and age, grade, size, ER, progesterone receptor, or tumor type (data not shown). In multivariate analysis, loss of LGALS3 expression just failed to reach formal statistical significance as an independent prognostic factor (P = 0.051).
Loss of expression from normal luminal epithelial cells to invasive cancer was also seen for other luminal markers tested. CLDN4 was positive in 245 of 331 tumors (74.0%) and CD24 positive in 126 of 426 tumors (29.6%). Although loss of expression shows a clear association with breast cancer development, neither marker conferred any independent prognostic information, nor were they correlated with age, grade, size, ER, progesterone receptor, or tumor type (data not shown).
The myoepithelial marker osteonectin (SPARC) was found to be positive in 17 of 350 (4.9%) assessable tumor cores (Fig. 3H). When Kaplan-Meier curves for overall survival were plotted, a markedly poorer prognosis was observed for SPARC-positive tumors. This was found to be statistically significant by the log-rank test in all tumors (P = 0.00844, Fig. 4B). By multivariate analysis (Table 3), SPARC was found to be an independent prognostic factor (P = 0.0057, Cox proportional hazards), conferring the highest relative risk of all factors fitted (6.88, 95% confidence interval 1.75–27.04), although the confidence interval is wide, given the small number of positive tumors.
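The relationship between the reported relative risk, its confidence interval, and the P value can be checked with a short calculation. The sketch below assumes that the interval is a Wald interval, symmetric on the log scale, and that the P value comes from the corresponding two-sided Wald test; neither assumption is stated in the analysis above, and the function name is hypothetical.

```python
import math

def wald_p_from_ci(hazard_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, z score, and two-sided P value
    from a hazard ratio and its 95% confidence interval.

    Assumes the interval is exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hazard_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p

se, z, p = wald_p_from_ci(6.88, 1.75, 27.04)
```

Under these assumptions the calculation gives a P value of roughly 0.006, in line with the value reported for SPARC. The geometric mean of the interval limits, sqrt(1.75 × 27.04) ≈ 6.88, also matches the point estimate, as expected for an interval that is symmetric on the log scale.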
Other myoepithelial markers tested showed a proportion of breast tumors expressing these basal proteins. S100A2 was positive in 8 of 443 cases (1.8%), whereas maspin (SERPINB5) was positive in 108 of 333 cases (32.4%). S100A2 conferred no independent prognostic information, and whereas maspin expression appeared to indicate a better overall survival, this did not reach statistical significance (P = 0.092). No association with clinicopathological variables or survival was observed with maspin expression or its subcellular localization.
DISCUSSION
Expression profiling of purified normal luminal epithelial and myoepithelial cells in the breast provides a basis for interpretation of the large amount of microarray data currently being generated for breast tumors. Identification of subclassifications of breast tumors termed basal-like and luminal-like (11), which differ in their clinical outcome (13), clearly demonstrates the need for accurate determination of the patterns of gene expression in these normal cells of the breast. Our observations of myoepithelial-specific genes such as S100A2, LGALS7, CSTA, and BPAG1 in our cell preparations, which also cluster together to define the basal-like group in these tumor studies (22), demonstrate the usefulness of such an approach and will help to accurately classify the proposed breast cancer stratifications. Our normal luminal cell preparations are hormone receptor negative, in common with the vast majority of normal luminal epithelial cells in the breast. It is therefore not surprising that there is little overlap between our luminal epithelial profiles and those of the luminal-like tumors in these classifications (11, 13), which are almost exclusively ER positive and whose associated genes are ER-responsive. The cell-type specific expression profiles in the normal breast also provide a baseline for studies investigating breast cancer progression (23), outcome prediction (24, 25), and local (26) and distant metastasis (27).
Our data also help to clarify previous transcriptional studies using human mammary epithelial cells (HMECs) as the normal component. Such cells were derived from cultures of unsorted normal breast epithelium. As it is the myoepithelially derived cells that have the greatest proliferative potential in vitro (28), HMECs are essentially myoepithelial-like (i.e., basal) in phenotype (29). Consequently, when HMECs are compared with (luminally derived) breast cancer cells or solid tumors, the markers that emerge as differentially expressed are essentially those represented in our luminal versus myoepithelial lists, rather than specific tumor markers per se, as has been inferred (30).
Generation of a larger and more accurate panel of markers for differentiating the normal epithelial cell types will have a major impact in patient management. Routine histopathological discrimination of in situ from invasive cancer in the breast uses the retention of the myoepithelial layer as a critical diagnostic criterion, with huge implications for planning appropriate surgery. Improving the differential diagnosis from, for example, small needle core biopsies using novel myoepithelial-specific markers will make an important contribution to clinical practice.
Genes differentially expressed between luminal and myoepithelial cells were also found to confer independent prognostic information for breast cancer patients. Galectin 3 is a gene expressed in normal luminal epithelial but not myoepithelial cells. Galectin 3 is thought to regulate many biological processes and has been associated with ERBB2 expression (31). Down-regulation of galectin 3 has been implicated in breast cancer progression (32). Tissue microarray analysis showed that loss of galectin 3 expression by malignant epithelial cells was seen in approximately one-half of all tumors assessed. Of particular interest is the correlation of nuclear LGALS3 expression and poor outcome. Nuclear galectin 3 has been reported to have a growth promoting activity through cyclin D1 induction (33), and all of these (9 of 431) tumors were also positive for cyclin D1. This observation demonstrates the ability of our approach to identify luminal epithelial-specific markers in the normal breast and use them to monitor breast cancer progression and establish their relationship with patient prognosis.
SPARC (osteonectin) modulates cellular interaction with the extracellular matrix by its binding to structural matrix proteins such as collagen and vitronectin. It was found by our cDNA microarray analysis to be differentially expressed between myoepithelial and luminal cells. Up-regulation of SPARC has been associated with increased invasive potential in breast cancer cells in vitro (34), and SPARC has been identified as a breast tumor marker by serial analysis of gene expression (35). SPARC was found to be positive in malignant epithelial cells in 4.9% of all breast tumors examined in our tissue microarray study. SPARC positivity was associated with significantly poorer overall survival by univariate analysis (P = 0.00844), and SPARC was an independent prognostic indicator by multivariate analysis (P = 0.0057). Aberrant expression of myoepithelial proteins has been recognized in breast tumors for some time, and data demonstrating poor prognosis of these tumors have largely been associated with ER- and lymph node-negative tumors (13, 14). Here, we present the identification of a protein that is expressed in the myoepithelial cells but not the luminal cells of the normal breast and whose expression in breast tumors confers a very poor clinical outcome, regardless of ER or lymph node status.
Expanding the list of normal cell-type specific genes not only provides novel diagnostic and prognostic markers but will also assist in understanding the multistep progression from epithelial cells to invasive cancer in the breast. Recent studies of breast cancer stem cell phenotypes (36) have identified CD24−/CD44+ cells as having tumorigenic potential, these markers being associated respectively with normal luminal and normal myoepithelial cells. At the other end of the progression spectrum, among 17 reporter genes associated with metastasis (27) were up-regulation of COL1A1 and down-regulation of MYLK, two of the most discriminant myoepithelial genes in the present study. Collectively, these observations highlight the importance of the normal breast cell expression profiles in understanding breast cancer. They also provide a unique dataset in which targets for future therapeutic intervention may be identified.
Grant support: The microarray consortium is funded by the Wellcome Trust, Cancer Research UK and the Ludwig Institute of Cancer Research. J. Reis-Filho is the recipient of the Gordon Signy International Fellowship Award and is partially supported by Ph.D. Grant SFRH/BD/5386/2001 from the Fundação para a Ciência e a Tecnologia, Portugal, and Programa Operacional Ciência, Tecnologia e Inovação POCTI/CBO/45157/2002. A. Cossu, M. Budroni, and G. Palmieri are partially funded by Regione Autonoma della Sardegna.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: C. Jones and A. Mackay contributed equally to this work. The human I.M.A.G.E. cDNA clone collection was obtained from the Medical Research Council Human Genome Mapping Project Resource Centre (Hinxton, United Kingdom). All cDNA clone resequencing was performed by Team 56 at the Sanger Institute.
Requests for reprints: Sunil R. Lakhani, The Breakthrough Toby Robins Breast Cancer Research Centre Institute of Cancer Research, Fulham Road, London SW3 6JB, United Kingdom. Phone: 020-7153-5525; Fax: 020-7153-5533; E-mail: [email protected]
Internet address: http://www.sanger.ac.uk/Projects/Microarrays/.
Internet address: http://www.ensembl.org.
Internet address: http://www.ebi.ac.uk/arrayexpress.
Internet address: http://rana.lbl.gov/EisenSoftware.htm.
Internet address: http://www-stat.stanford.edu/∼tibs/SAM/.
Internet address: http://www-stat.stanford.edu/∼tibs/PAM/.
Internet address: http://www.r-project.org.
Sanger ID | Ensembl | Gene ID | Significance Analysis of Microarrays (SAM) score | Fold difference |
---|---|---|---|---|
Luminal genes | ||||
741497_A | ENSG00000148346 | LCN2 | −3.64 | 0.18 |
741497_C | ENSG00000148346 | LCN2 | −3.80 | 0.18 |
111213_B | ENSG00000167755 | KLK6 | −2.83 | 0.20 |
357842_A | ENSG00000175315 | CST6 | −2.98 | 0.21 |
376599_A | ENSG00000175315 | CST6 | −2.71 | 0.23 |
204335_A | ENSESTG00000020862 | CD24 | −4.71 | 0.26 |
341021_A | ENSG00000008517 | NK4 | −3.00 | 0.26 |
724533_B | ENSG00000101443 | WFDC2 | −2.81 | 0.28 |
153508_A | ENSG00000186996 | CLDN4 | −3.74 | 0.29 |
809923_A | ENSESTG00000024749 | TNFAIP2 | −3.12 | 0.30 |
346130_B | ENSG00000186996 | CLDN4 | −3.81 | 0.31 |
346510_A | ENSG00000186996 | CLDN4 | −4.07 | 0.31 |
25433_A | ENSG00000143153 | ATP1B1 | −5.66 | 0.32 |
1257299_A | ENSG00000163975 | MFI2 | −3.21 | 0.35 |
767629_A | ENSESTG00000006616 | RARRES1 | −2.90 | 0.36 |
137018_A | ENSG00000012171 | SEMA3B | −4.07 | 0.37 |
180786_A | ENSG00000006210 | CX3CL1 | −3.08 | 0.38 |
183573_A | ENSG00000006210 | CX3CL1 | −2.71 | 0.40 |
153925_B | ENSG00000052344 | PRSS8 | −4.82 | 0.40 |
382660_A | ENSESTG00000003790 | KIAA1641 | −3.71 | 0.41 |
201516_B | ENSG00000184930 | MTND4 | −3.32 | 0.41 |
182635_A | ENSG00000070404 | FSTL3 | −2.54 | 0.42 |
151761_A | ENSG00000185499 | MUC1 | −4.30 | 0.43 |
165830_B | ENSG00000006210 | CX3CL1 | −2.63 | 0.43 |
357613_B | ENSG00000184930 | MTND4 | −3.87 | 0.44 |
308173_A | ENSG00000184689 | MTND6 | −2.73 | 0.45 |
233818_A | ENSG00000183503 | MTCO2 | −3.94 | 0.45 |
809822_A | ENSG00000129353 | CTL2 | −4.79 | 0.46 |
324225_B | ENSG00000133321 | RARRES3 | −3.72 | 0.47 |
34461_A | ENSG00000131981 | LGALS3 | −4.39 | 0.47 |
149218_A | ENSG00000184930 | MTND4 | −2.73 | 0.48 |
41288_A | ENSG00000185215 | TNFAIP2 | −3.27 | 0.48 |
293168_A | ENSG00000184689 | MTND6 | −2.88 | 0.49 |
24593_B | ENSG00000129353 | CTL2 | −3.32 | 0.49 |
156398_A | | unknown | −2.83 | 0.49 |
365623_A | ENSG00000184930 | MTND4 | −3.47 | 0.50 |
782280_B | | HDCRA | −3.04 | 0.50 |
320142_A | ENSG00000184316 | MTATP6 | −2.66 | 0.50 |
163072_A | ENSG00000103335 | KIAA0233 | −2.80 | 0.50 |
167150_A | ENSG00000169246 | KIAA0220 | −2.83 | 0.50 |
302373_C | ENSG00000143153 | ATP1B1 | −3.36 | 0.51 |
1659533_B | ENSG00000124159 | MATN4 | −3.61 | 0.51 |
307769_A | ENSG00000122034 | GTF3A | −3.27 | 0.52 |
244297_A | ENSG00000141934 | PPAP2C | −3.88 | 0.52 |
262390_A | ENSG00000182240 | BACE2 | −4.40 | 0.52 |
772402_A | ENSG00000099860 | GADD45B | −2.55 | 0.53 |
294203_A | ENSG00000109062 | SLC9A3R1 | −3.06 | 0.53 |
796298_B | ENSG00000110721 | CHK | −3.54 | 0.54 |
49200_A | ENSG00000074416 | MGLL | −2.84 | 0.55 |
123164_A | ENSG00000065361 | ERBB3 | −2.83 | 0.57 |
Myoepithelial genes | ||||
298509_A | ENSG00000108821 | COL1A1 | 3.54 | 9.23 |
214997_A | ENSG00000108821 | COL1A1 | 3.58 | 9.22 |
323321_A | ENSESTG00000026432 | COL1A1 | 3.26 | 8.29 |
341752_A | ENSG00000178939 | LGALS7 | 4.17 | 7.97 |
810813_B | ENSG00000160675 | S100A2 | 3.66 | 7.48 |
300737_A | ENSG00000065534 | MYLK | 2.50 | 7.11 |
188036_C | ENSG00000151914 | BPAG1 | 2.94 | 6.85 |
188036_A | ENSG00000151914 | BPAG1 | 3.39 | 6.38 |
264525_A | ENSG00000065534 | MYLK | 3.24 | 5.92 |
249977_A | ENSG00000100234 | TIMP3 | 3.18 | 5.59 |
270187_A | ENSG00000102265 | TIMP1 | 3.57 | 5.04 |
310019_A | ENSG00000065534 | MYLK | 2.40 | 4.98 |
stSG89269 | ENSG00000100234 | TIMP3 | 2.78 | 4.79 |
266325_A | ENSG00000102265 | TIMP1 | 3.42 | 4.79 |
327165_A | ENSG00000166628 | SERPINB5 | 3.14 | 4.72 |
263278_A | ENSG00000113140 | SPARC | 3.89 | 4.66 |
1404774_A | ENSG00000087494 | PTHLH | 2.42 | 4.55 |
346130_A | ENSG00000169474 | SPRR1B | 2.73 | 4.45 |
51003_A | ENSESTG00000011065 | DKK3 | 2.88 | 4.11 |
359747_A | ENSG00000178939 | LGALS7 | 3.01 | 4.09 |
813614_C | ENSG00000169474 | SPRR1B | 2.91 | 3.91 |
141815_A | ENSG00000101384 | JAG1 | 3.20 | 3.89 |
302294_A | ENSG00000166033 | PRSS11 | 3.10 | 3.65 |
346610_A | ENSG00000175793 | SFN | 4.86 | 3.56 |
324700_A | ENSG00000149968 | MMP3 | 2.50 | 3.34 |
38967_A | ENSG00000166033 | PRSS11 | 3.39 | 3.30 |
810017_A | ENSG00000104368 | PLAT | 3.12 | 3.26 |
111081_A | ENSG00000169688 | MT1F | 3.08 | 3.21 |
809810_A | ENSG00000137699 | TRIM29 | 2.68 | 3.19 |
1840568_A | ENSG00000184330 | S100A7 | 2.75 | 3.14 |
347284_A | ENSG00000105974 | CAV1 | 3.33 | 3.13 |
788192_A | ENSG00000087494 | PTHLH | 3.24 | 3.09 |
195273_A | ENSG00000139219 | COL1A1 | 2.55 | 3.02 |
293270_A | ENSG00000107987 | AKR1C2 | 3.21 | 2.94 |
728114_A | ENSG00000166899 | FABP5 | 2.46 | 2.93 |
1568010_A | ENSG00000109321 | AREG | 2.85 | 2.90 |
137665_A | ENSG00000073282 | TP73L | 2.89 | 2.82 |
364409_A | ENSG00000121552 | CSTA | 2.54 | 2.80 |
22117_A | ENSG00000136699 | FLJ20297 | 3.75 | 2.77 |
813614_A | ENSG00000169474 | SPRR1B | 2.61 | 2.69 |
297392_A | ENSG00000187193 | MT1L | 3.79 | 2.61 |
127982_A | ENSG00000104368 | PLAT | 2.84 | 2.58 |
148057_A | ENSG00000117318 | ID3 | 3.74 | 2.56 |
66946_A | ENSG00000169688 | MT1B | 3.13 | 2.54 |
26285_A | ENSG00000091409 | ITGA6 | 3.30 | 2.50 |
267759_A | ENSG00000105974 | CAV1 | 2.53 | 2.49 |
149370_A | ENSG00000109861 | CTSC | 4.47 | 2.45 |
274164_A | ENSG00000187193 | MT1F | 4.07 | 2.44 |
292784_A | ENSG00000109861 | CTSC | 3.69 | 2.43 |
418137_A | ENSG00000105281 | SLC1A5 | 5.24 | 2.42 |
Sanger ID . | Ensembl . | Gene ID . | Statistical Analysis of Microarray score . | Fold difference . |
---|---|---|---|---|
Luminal genes | ||||
741497_A | ENSG00000148346 | LCN2 | −3.64 | 0.18 |
741497_C | ENSG00000148346 | LCN2 | −3.80 | 0.18 |
111213_B | ENSG00000167755 | KLK6 | −2.83 | 0.20 |
357842_A | ENSG00000175315 | CST6 | −2.98 | 0.21 |
376599_A | ENSG00000175315 | CST6 | −2.71 | 0.23 |
204335_A | ENSESTG00000020862 | CD24 | −4.71 | 0.26 |
341021_A | ENSG00000008517 | NK4 | −3.00 | 0.26 |
724533_B | ENSG00000101443 | WFDC2 | −2.81 | 0.28 |
153508_A | ENSG00000186996 | CLDN4 | −3.74 | 0.29 |
809923_A | ENSESTG00000024749 | TNFAIP2 | −3.12 | 0.30 |
346130_B | ENSG00000186996 | CLDN4 | −3.81 | 0.31 |
346510_A | ENSG00000186996 | CLDN4 | −4.07 | 0.31 |
25433_A | ENSG00000143153 | ATP1B1 | −5.66 | 0.32 |
1257299_A | ENSG00000163975 | MFI2 | −3.21 | 0.35 |
767629_A | ENSESTG00000006616 | RARRES1 | −2.90 | 0.36 |
137018_A | ENSG00000012171 | SEMA3B | −4.07 | 0.37 |
180786_A | ENSG00000006210 | CX3CL1 | −3.08 | 0.38 |
183573_A | ENSG00000006210 | CX3CL1 | −2.71 | 0.40 |
153925_B | ENSG00000052344 | PRSS8 | −4.82 | 0.40 |
382660_A | ENSESTG00000003790 | KIAA1641 | −3.71 | 0.41 |
201516_B | ENSG00000184930 | MTND4 | −3.32 | 0.41 |
182635_A | ENSG00000070404 | FSTL3 | −2.54 | 0.42 |
151761_A | ENSG00000185499 | MUC1 | −4.30 | 0.43 |
165830_B | ENSG00000006210 | CX3CL1 | −2.63 | 0.43 |
357613_B | ENSG00000184930 | MTND4 | −3.87 | 0.44 |
308173_A | ENSG00000184689 | MTND6 | −2.73 | 0.45 |
233818_A | ENSG00000183503 | MTCO2 | −3.94 | 0.45 |
809822_A | ENSG00000129353 | CTL2 | −4.79 | 0.46 |
324225_B | ENSG00000133321 | RARRES3 | −3.72 | 0.47 |
34461_A | ENSG00000131981 | LGALS3 | −4.39 | 0.47 |
149218_A | ENSG00000184930 | MTND4 | −2.73 | 0.48 |
41288_A | ENSG00000185215 | TNFAIP2 | −3.27 | 0.48 |
293168_A | ENSG00000184689 | MTND6 | −2.88 | 0.49 |
24593_B | ENSG00000129353 | CTL2 | −3.32 | 0.49 |
156398_A | | unknown | −2.83 | 0.49 |
365623_A | ENSG00000184930 | MTND4 | −3.47 | 0.50 |
782280_B | | HDCRA | −3.04 | 0.50 |
320142_A | ENSG00000184316 | MTATP6 | −2.66 | 0.50 |
163072_A | ENSG00000103335 | KIAA0233 | −2.80 | 0.50 |
167150_A | ENSG00000169246 | KIAA0220 | −2.83 | 0.50 |
302373_C | ENSG00000143153 | ATP1B1 | −3.36 | 0.51 |
1659533_B | ENSG00000124159 | MATN4 | −3.61 | 0.51 |
307769_A | ENSG00000122034 | GTF3A | −3.27 | 0.52 |
244297_A | ENSG00000141934 | PPAP2C | −3.88 | 0.52 |
262390_A | ENSG00000182240 | BACE2 | −4.40 | 0.52 |
772402_A | ENSG00000099860 | GADD45B | −2.55 | 0.53 |
294203_A | ENSG00000109062 | SLC9A3R1 | −3.06 | 0.53 |
796298_B | ENSG00000110721 | CHK | −3.54 | 0.54 |
49200_A | ENSG00000074416 | MGLL | −2.84 | 0.55 |
123164_A | ENSG00000065361 | ERBB3 | −2.83 | 0.57 |
Myoepithelial genes | | | | |
298509_A | ENSG00000108821 | COL1A1 | 3.54 | 9.23 |
214997_A | ENSG00000108821 | COL1A1 | 3.58 | 9.22 |
323321_A | ENSESTG00000026432 | COL1A1 | 3.26 | 8.29 |
341752_A | ENSG00000178939 | LGALS7 | 4.17 | 7.97 |
810813_B | ENSG00000160675 | S100A2 | 3.66 | 7.48 |
300737_A | ENSG00000065534 | MYLK | 2.50 | 7.11 |
188036_C | ENSG00000151914 | BPAG1 | 2.94 | 6.85 |
188036_A | ENSG00000151914 | BPAG1 | 3.39 | 6.38 |
264525_A | ENSG00000065534 | MYLK | 3.24 | 5.92 |
249977_A | ENSG00000100234 | TIMP3 | 3.18 | 5.59 |
270187_A | ENSG00000102265 | TIMP1 | 3.57 | 5.04 |
310019_A | ENSG00000065534 | MYLK | 2.40 | 4.98 |
stSG89269 | ENSG00000100234 | TIMP3 | 2.78 | 4.79 |
266325_A | ENSG00000102265 | TIMP1 | 3.42 | 4.79 |
327165_A | ENSG00000166628 | SERPINB5 | 3.14 | 4.72 |
263278_A | ENSG00000113140 | SPARC | 3.89 | 4.66 |
1404774_A | ENSG00000087494 | PTHLH | 2.42 | 4.55 |
346130_A | ENSG00000169474 | SPRR1B | 2.73 | 4.45 |
51003_A | ENSESTG00000011065 | DKK3 | 2.88 | 4.11 |
359747_A | ENSG00000178939 | LGALS7 | 3.01 | 4.09 |
813614_C | ENSG00000169474 | SPRR1B | 2.91 | 3.91 |
141815_A | ENSG00000101384 | JAG1 | 3.20 | 3.89 |
302294_A | ENSG00000166033 | PRSS11 | 3.10 | 3.65 |
346610_A | ENSG00000175793 | SFN | 4.86 | 3.56 |
324700_A | ENSG00000149968 | MMP3 | 2.50 | 3.34 |
38967_A | ENSG00000166033 | PRSS11 | 3.39 | 3.30 |
810017_A | ENSG00000104368 | PLAT | 3.12 | 3.26 |
111081_A | ENSG00000169688 | MT1F | 3.08 | 3.21 |
809810_A | ENSG00000137699 | TRIM29 | 2.68 | 3.19 |
1840568_A | ENSG00000184330 | S100A7 | 2.75 | 3.14 |
347284_A | ENSG00000105974 | CAV1 | 3.33 | 3.13 |
788192_A | ENSG00000087494 | PTHLH | 3.24 | 3.09 |
195273_A | ENSG00000139219 | COL1A1 | 2.55 | 3.02 |
293270_A | ENSG00000107987 | AKR1C2 | 3.21 | 2.94 |
728114_A | ENSG00000166899 | FABP5 | 2.46 | 2.93 |
1568010_A | ENSG00000109321 | AREG | 2.85 | 2.90 |
137665_A | ENSG00000073282 | TP73L | 2.89 | 2.82 |
364409_A | ENSG00000121552 | CSTA | 2.54 | 2.80 |
22117_A | ENSG00000136699 | FLJ20297 | 3.75 | 2.77 |
813614_A | ENSG00000169474 | SPRR1B | 2.61 | 2.69 |
297392_A | ENSG00000187193 | MT1L | 3.79 | 2.61 |
127982_A | ENSG00000104368 | PLAT | 2.84 | 2.58 |
148057_A | ENSG00000117318 | ID3 | 3.74 | 2.56 |
66946_A | ENSG00000169688 | MT1B | 3.13 | 2.54 |
26285_A | ENSG00000091409 | ITGA6 | 3.30 | 2.50 |
267759_A | ENSG00000105974 | CAV1 | 2.53 | 2.49 |
149370_A | ENSG00000109861 | CTSC | 4.47 | 2.45 |
274164_A | ENSG00000187193 | MT1F | 4.07 | 2.44 |
292784_A | ENSG00000109861 | CTSC | 3.69 | 2.43 |
418137_A | ENSG00000105281 | SLC1A5 | 5.24 | 2.42 |
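
The fold differences above are easier to compare across the two gene lists on a log2 scale, where luminal-enriched and myoepithelial-enriched genes sit symmetrically around zero. The following is a minimal sketch, assuming (from the sign pattern in the table, not from an explicit statement) that the fold difference is the myoepithelial-to-luminal expression ratio. The function names and the four selected rows are illustrative only.

```python
import math


def log2_fold(fold_difference):
    """Log2 of a fold difference; negative values mean the ratio is below 1."""
    return math.log2(fold_difference)


def reciprocal_fold(fold_difference):
    """Re-express a ratio below 1 as enrichment in the other cell type."""
    return 1.0 / fold_difference


# Fold differences copied from the table (assumed myoepithelial / luminal).
rows = {"LCN2": 0.18, "LGALS3": 0.47, "SPARC": 4.66, "COL1A1": 9.23}

for gene, fold in rows.items():
    side = "myoepithelial" if fold > 1 else "luminal"
    enrichment = fold if fold > 1 else reciprocal_fold(fold)
    print(f"{gene}: log2 = {log2_fold(fold):+.2f}, "
          f"{enrichment:.1f}-fold higher in {side} cells")
```

Under this assumption, a value such as 0.18 for LCN2 corresponds to roughly 5.6-fold higher expression in luminal cells.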
Factor | No. cases | Mean survival to death from breast cancer (mo) | SE | P |
---|---|---|---|---|
Size of tumor | | | | <0.0001 |
T1 | 178 | 112.2 | 3.04 | |
T2 | 170 | 97.5 | 3.86 | |
T3 | 21 | 76.9 | 4.92 | |
T4 | 38 | 50.2 | 5.79 | |
Nodal status | | | | <0.0001 |
Positive | 208 | 92.8 | 3.28 | |
Negative | 262 | 115.2 | 2.29 | |
Stage | | | | <0.0001 |
I | 124 | 117.9 | 3.05 | |
II | 197 | 102.7 | 3.4 | |
III | 50 | 66.4 | 4.85 | |
IV | 34 | 46.1 | 6.59 | |
Grade | | | | 0.00224 |
1 | 103 | 108.1 | 4.78 | |
2 | 118 | 93.2 | 5.45 | |
3 | 39 | 81.7 | 7.8 | |
Estrogen receptor | | | | 0.796 |
Positive | 242 | 98.9 | 3.08 | |
Negative | 205 | 98.7 | 3.28 | |
Progesterone receptor | | | | 0.561 |
Positive | 255 | 101.1 | 2.94 | |
Negative | 215 | 98.3 | 3.16 | |
LGALS3 | | | | 0.597 |
Positive | 213 | 96.1 | 3.2 | |
Negative | 218 | 99.7 | 3.14 | |
CD24 | | | | 0.827 |
Positive | 126 | 99.3 | 4.1 | |
Negative | 300 | 99.6 | 2.75 | |
CLDN4 | | | | 0.707 |
Positive | 245 | 101.7 | 2.95 | |
Negative | 86 | 99.4 | 4.86 | |
SPARC | | | | 0.00844 |
Positive | 17 | 67.9 | 11.1 | |
Negative | 333 | 100.9 | 2.53 | |
S100A2 | | | | 0.209 |
Positive | 8 | 114.7 | 10.84 | |
Negative | 435 | 97.8 | 2.28 | |
MASPIN | | | | 0.092 |
Positive | 108 | 103.3 | 4.34 | |
Negative | 225 | 96.3 | 3.12 | |
Factor | Hazard ratio for death from breast cancer (95% confidence interval) | P (Cox) |
---|---|---|
Age | 1.05 (1.02–1.08) | 0.0016 |
Stage | 1.83 (1.24–2.72) | 0.0026 |
Grade | 2.53 (1.46–4.40) | 0.001 |
SPARC positive | 6.88 (1.75–27.04) | 0.0057 |
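
The hazard ratios and confidence intervals in the multivariate table can be checked for internal consistency. The sketch below assumes the intervals are standard Wald intervals, symmetric on the log scale, which the table does not state explicitly. Under that assumption the point estimate should equal the geometric mean of the bounds, and the standard error of the log hazard ratio, and hence an approximate two-sided P value, can be recovered from the interval width. The function names are illustrative and not taken from the source.

```python
import math


def implied_hazard_ratio(lower, upper):
    """Point estimate implied by a log-symmetric interval: the geometric mean."""
    return math.sqrt(lower * upper)


def log_hr_standard_error(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a 95% Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_p_value(hazard_ratio, lower, upper):
    """Approximate two-sided P value from a normal test on log(HR)."""
    z_stat = math.log(hazard_ratio) / log_hr_standard_error(lower, upper)
    return math.erfc(abs(z_stat) / math.sqrt(2))


# SPARC-positive row: hazard ratio 6.88, 95% CI 1.75 to 27.04.
hr, lower, upper = 6.88, 1.75, 27.04
print(f"implied HR: {implied_hazard_ratio(lower, upper):.2f}")
print(f"SE of log(HR): {log_hr_standard_error(lower, upper):.3f}")
print(f"approximate P: {wald_p_value(hr, lower, upper):.4f}")
```

The wide interval for SPARC reflects the small number of SPARC-positive cases, so the recovered standard error is large relative to those for age or stage.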
Acknowledgments
We thank the staff of the Sanger Institute Microarray Facility for the supply of arrays, lab protocols, and technical advice (David Vetrie, Cordelia Langford, Adam Whittaker, Neil Sutton) and the staff of the Quantarray/GeneSpring data files and all data analysis and databases relating to elements on the arrays (Kate Rice, Rob Andrews, Adam Butler, Harish Chudasama). We also thank Professor Alan Ashworth for continued support and helpful discussions pertaining to this project.