Abstract
Breast cancer is a heterogeneous disease. Comprehensive gene expression profiles obtained using DNA microarrays have revealed previously indistinguishable subtypes of noninflammatory breast cancer (NIBC) related to different features of mammary epithelial biology and significantly associated with survival. Inflammatory breast cancer (IBC) is a rare, particular, and aggressive form of disease. Here we have investigated whether the five molecular subtypes described for NIBC (luminal A and B, basal, ERBB2 overexpressing, and normal breast-like) were also present in IBC. We monitored the RNA expression of ∼8,000 genes in 83 breast tissue samples including 37 IBC, 44 NIBC, and 2 normal breast samples. Hierarchical clustering identified the five subtypes of breast cancer in both NIBC and IBC samples. These subtypes were highly similar to those defined in previous studies and associated with similar histoclinical features. The robustness of this classification was confirmed by the use of both alternative gene set and analysis method, and the results were corroborated at the protein level. Furthermore, we show that the differences in gene expression between NIBC and IBC and between IBC with and without pathologic complete response that we have recently reported persist in each subtype. Our results show that the expression signatures defining molecular subtypes of NIBC are also present in IBC. Obtained using different patient series and different microarray platforms, they reinforce confidence in the expression-based molecular taxonomy but also give evidence for its universality in breast cancer, independently of a specific clinical form.
Introduction
Breast cancer is a major cause of morbidity and mortality in the world (1). The different clinical forms of disease at diagnosis include early-stage, locally advanced, and metastatic breast cancer. A rare (∼5% of all cases) but often lethal form of locally advanced breast cancer is inflammatory breast cancer (IBC). IBC is defined upon clinical and/or pathologic criteria. Clinical inflammatory signs arise quickly and involve more than one third of the breast [T4d of the tumor-node-metastasis-Union International Contre Cancer (UICC) classification]. Pathologic criterion is the presence of tumor emboli in dermal lymphatic vessels.
IBC displays epidemiologic, histoclinical, and prognostic differences from non-IBC (NIBC) (for review, see refs. 2, 3). Some geographic regions have higher incidence of IBC, and recently IBC incidence increased more than that of NIBC (4). IBC frequently displays high histologic grade, negativity of hormone receptors, neoangiogenesis and tissue invasiveness (2), and an aggressive clinical behavior with high frequency of axillary lymph node involvement at diagnosis and, in ∼35% of cases, distant metastasis. Despite advances in the multidisciplinary treatment, the prognosis is much less favorable than for NIBC, with a 5-year survival ranging from 30% to 50% (2, 4). Like NIBC, IBC is a heterogeneous disease, notably in terms of survival or response to primary chemotherapy with a rate of pathologic complete response (PCR) <15% to 25%. Prognostic features (5) remain contested, and to date, no clinical or molecular parameter can reliably predict PCR to standard anthracycline-based chemotherapy (6, 7). Because of its relative scarcity and the difficulty of obtaining diagnostic samples of sufficient size, little is known about the molecular basis of IBC (3). Significant advances recently came from the development of experimental models (cell lines, xenografts; for review, see ref. 8).
Comprehensive gene expression profiles defined using DNA microarrays (9, 10) have led to important insights into the molecular heterogeneity of cancers by revealing biologically and clinically relevant subtypes of tumors previously indistinguishable with conventional approaches. This was first reported in diffuse large B-cell lymphoma (11) with the identification of two prognostic subclasses derived from two different stages of B-cell maturation. A similar cell-of-origin classifier was then described for breast cancer with the definition of five subtypes of disease, which are related to mammary epithelial biology (luminal A and B, basal, ERBB2-overexpressing, and normal breast–like subtypes), and associated with different histoclinical features including clinical outcome (12–18). Specific genome alterations (17) and distinct gene expression changes in response to chemotherapy (16) have been associated with some subtypes, further suggesting that the five subtypes may represent distinct diseases. However, this new classification has been established in the locally advanced or localized forms of NIBC and has not yet been described for IBC.
IBC has been recently studied using DNA microarrays, first on cell lines (19), then more recently on clinical specimens (20). We monitored the RNA expression of ∼8,000 genes in 81 clinical breast cancer samples including 37 IBC and identified a 109-gene signature whose expression discriminated between NIBC and IBC (20). Global analysis showed extensive transcriptional heterogeneity of IBC samples, suggesting the existence of distinct subtypes unrecognized by classic approaches. Here we have investigated whether the molecular subtypes of NIBC are also present in IBC. We show for the first time that the same subtypes as in NIBC do exist in IBC and are associated with similar histoclinical features.
Material and Methods
Patients and Samples. Tumor samples were obtained from 81 women who underwent initial surgery at the Institut Paoli-Calmettes for an invasive breast adenocarcinoma. The 81 samples included 37 pretreatment tumor specimens from 37 patients with IBC. The diagnosis of IBC was defined upon clinical and/or pathologic criteria. Clinically, cancer was characterized by diffuse brawny induration of the skin of the breast with an erysipeloid edge (UICC T4d). Pathologic criterion was the presence of tumor embolization of superficial dermal lymphatics. A series of 44 other samples represented 44 NIBCs: locally advanced (UICC T3, T4a, b, c; 14 cases) and localized with (18 cases) and without (12 cases) pathologic axillary lymph node involvement. The main histoclinical characteristics of patients are summarized in Table 1. After removal, samples were macrodissected by pathologists and quickly frozen and stored at −160°C. After dissection, all of the tumor samples contained more than 60% of tumor cells as assessed using frozen sections adjacent to the profiled samples. All medical records and tumor sections were de novo reviewed before analysis.
Table 1. Patient and tumor characteristics
Characteristics* | Total (N = 81), no. patients (% of evaluated cases) | NIBC (n = 44), no. (%) | IBC (n = 37), no. (%) | P
---|---|---|---|---
Age (y) | | | | 0.12
Median (range) | 58 (24-86) | 62 (39-86) | 55 (24-81) |
Histological type (81) | | | | 0.19
Ductal | 67 (83) | 34 (77) | 33 (89) |
Lobular | 11 (14) | 7 (16) | 4 (11) |
Other | 3 (3) | 3 (7) | 0 (0) |
SBR grade (81) | | | | 0.07
1 and 2 | 38 (47) | 25 (57) | 13 (35) |
3 | 43 (53) | 19 (43) | 24 (65) |
Dermal lymphatic emboli (81) | | | | <0.01
Present | 26 (32) | 0 (0) | 26 (70) |
Absent | 55 (68) | 44 (100) | 11 (30) |
ER status (81) | | | | 0.36
Negative | 32 (39) | 15 (34) | 17 (46) |
Positive | 49 (61) | 29 (66) | 20 (54) |
PR status (81) | | | | 0.06
Negative | 30 (37) | 12 (27) | 18 (49) |
Positive | 51 (63) | 32 (73) | 19 (51) |
ERBB2 status (81) | | | | 0.44
Negative, 0-1+ | 61 (75) | 35 (80) | 26 (70) |
Positive, 2-3+ | 20 (25) | 9 (20) | 11 (30) |
P53 status (81) | | | | 0.35
Negative | 51 (63) | 29 (66) | 22 (59) |
Positive | 30 (37) | 15 (34) | 15 (41) |
Abbreviation: SBR, Scarff-Bloom-Richardson.
*Number of evaluated cases among 81 patients.
After incisional diagnostic biopsy, patients with IBC or locally advanced NIBC were treated with anthracycline-based chemotherapy followed (for clinically nonprogressive and consenting patients) by mastectomy and locoregional radiotherapy. The pathologic response to chemotherapy was assessed on mastectomy specimens and was scored in four grades as described (21). Grades 1 and 2 were considered as pathologic complete responses (PCR+ samples), and grades 3 and 4 as failures (PCR− samples). After surgery, patients with early-stage breast cancer were treated by radiotherapy and adjuvant systemic therapy (chemotherapy and/or hormone therapy) when appropriate.
In addition, we profiled two RNA samples from normal breast tissue (Clontech, Palo Alto, CA), representing pools of whole normal breast from six (NB1) and four (NB2) individuals, and RNA extracted from 14 cell lines including 7 mammary epithelial cell lines. These cell lines were associated to three of the five previously described molecular subtypes of breast cancer. They included 5 luminal NIBC cell lines (BT-474, MCF-7, MDA-MB-134, SK-BR-3, and T-47D), some of which display amplification of ERBB2, 1 basal-like epithelial cell line (HME-1), and 1 IBC cell line (SUM-149). All cell lines were obtained from American Type Culture Collection (Manassas, VA; http://www.atcc.org/), except SUM-149 (kindly provided by S.P. Ethier, Ann Arbor, MI), and were grown as recommended.
RNA Isolation. Total RNA was extracted from frozen samples and cell lines by standard methods (22). Integrity was controlled by denaturing formaldehyde agarose gel electrophoresis and Agilent analysis (Bioanalyzer, Palo Alto, CA).
DNA Microarray Production. Gene expression profiling was done with homemade nylon DNA microarrays produced as previously described (23), and radioactive detection. Briefly, all cDNA clones were PCR amplified in 96-well microtiter plates and then spotted onto Hybond-N+ 2 × 7 cm membranes (Amersham Biosciences, Freiburg, Germany) stuck to glass slides with a 64-pin print head on a MicroGrid II microarrayer (Apogent Discoveries, Cambridge, United Kingdom). They contained 8,016 spotted cDNA clones, representing 7,874 IMAGE clones and 142 control clones. The IMAGE clones included 6,664 named genes (according to the 155 Unigene release, ∼2,500 of them being selected for their relation to oncogenesis) and 1,210 expressed sequence tags. All hybridized microarrays came from the same printing batch. Before RNA hybridization, hybridization with a 33P-labeled oligonucleotide sequence common to all PCR products was done to control the quality of spotting and the amount of target DNA accessible in each spot.
DNA Microarray Hybridizations. After stripping, microarrays were hybridized with 33P-labeled probes made from 2 μg of total RNA. Probe preparations, hybridizations, and washes were done as previously described (23). Briefly, 2 μg of total RNA were retrotranscribed in the presence of [α-33P]dCTP (Amersham Biosciences). Hybridizations were carried out during 48 hours at 68°C in a final volume of 500 μL of buffer. After washes, arrays were exposed for 24 to 72 hours to phosphorimaging plates. Detection scanning was done with a FUJI BAS 5000 machine at 25-μm resolution (Raytest, Paris, France) and quantification of hybridization signals with the ArrayGauge software (Fuji Ltd, Tokyo, Japan). All hybridization images were inspected for artifacts and aberrant spots or microarray regions were excluded from analyses. The full list of clones and more details on the preparation of the microarrays and probes and hybridizations are available on the Web (http://tagc.univ-mrs.fr/pub/).
DNA Microarray Data Analysis. Before analysis, a filter procedure eliminated noninformative genes, retaining 2,300 genes that were well measured (expression level >2 × background signal) in at least 50% of the samples in at least one of the three classes (IBC, NIBC, and normal breast). Signal intensities were then normalized, first for the amount of spotted DNA (24) and then for the variability of experimental conditions (24) using LOWESS for each print-tip group (25). Data were log2 transformed before analysis.
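For illustration only, the "well measured" filter and the log2 transformation described above might be sketched in Python as follows. This is not the authors' code: the function names, argument layout, and class labels are hypothetical, and the normalization steps (spotted-DNA amount, print-tip LOWESS) are deliberately left out.

```python
import math


def is_well_measured(signals, backgrounds, classes, fold=2.0, min_fraction=0.5):
    """Return True if one gene passes the filter in at least one class.

    signals, backgrounds: per-sample values for a single gene (assumed layout).
    classes: per-sample class label, e.g. "IBC", "NIBC", "NB".
    A sample counts as well measured when signal > fold * background.
    """
    per_class = {}
    for s, b, c in zip(signals, backgrounds, classes):
        passed, total = per_class.get(c, (0, 0))
        per_class[c] = (passed + (s > fold * b), total + 1)
    # Keep the gene if any class has at least min_fraction well-measured samples.
    return any(p / t >= min_fraction for p, t in per_class.values())


def log2_transform(values):
    """Log2-transform strictly positive intensities."""
    return [math.log2(v) for v in values]
```

A gene would be retained when `is_well_measured(...)` is true, and only the retained genes would then be normalized and log2 transformed.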
Biologically relevant subtypes of NIBC have been defined using an “intrinsic set” of ∼500 genes (12–14). These genes were selected by the authors among ∼8,000 genes tested based on a lower variation in expression between paired samples from the same patients than between samples from different patients. To test the generality of these subtypes in our series of samples, we first analyzed samples with genes common to this gene set and ours. Of the 500-gene set used by Sorlie et al. (13, 14), 120 overlapped with our set of 2,300 filtered genes. We also analyzed our 2,300 genes by both unsupervised and supervised analysis to show the robustness of this new molecular taxonomy.
Unsupervised hierarchical clustering (cluster program; ref. 26) was applied to investigate the relations among genes and among samples, using data median-centered on genes, Pearson correlation as similarity metric, and centroid linkage clustering. Results were displayed using TreeView program (26). Supervised analysis was applied to identify genes that discriminate between the three major molecular subtypes and between the five subtypes. Each subtype was compared with the others (3 comparisons in the first analysis, 10 comparisons in the second one) by using a discriminating score (DS) combined with iterative random permutations. The DS was calculated for each gene (27) as DS = (M1 − M2) / (S1 + S2), where M1 and S1, respectively, represent mean and SD of expression levels of the gene in subgroup 1, and M2 and S2 in subgroup 2. Confidence levels were estimated by 200 iterative random permutations of samples as previously described (28) with a significance threshold producing fewer than five false positives. The final list of discriminator genes for the analysis of 3 or 5 subtypes included the genes identified as discriminator in at least one of the 3 or 10 comparisons, respectively. Once identified, the classification power of the discriminator signature was illustrated by annotating each sample according to the strongest correlation coefficient of its expression profile with the median profile of each molecular subtype (3 or 5 subtypes) calculated among the NIBC samples. No annotation was given when the correlation was <0.15 with any subtype.
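As a minimal sketch of the discriminating score defined above, the following Python fragment computes DS for one gene and counts how many random relabelings of the samples reach the observed score. It rests on assumptions that are not stated in the text: the sample SD (n − 1 denominator) is used, the permutation is shown for a single gene rather than across the whole gene set, and all function names are hypothetical.

```python
import random
import statistics


def discriminating_score(group1, group2):
    """DS = (M1 - M2) / (S1 + S2) for one gene.

    Assumption: S1 and S2 are sample standard deviations.
    """
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    return (m1 - m2) / (s1 + s2)


def permutation_exceedances(values, n1, observed, n_perm=200, seed=0):
    """Count random relabelings whose |DS| is at least |observed|.

    values: expression levels of one gene in all samples of both subgroups.
    n1: size of subgroup 1; the remaining samples form subgroup 2.
    """
    rng = random.Random(seed)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        ds = discriminating_score(shuffled[:n1], shuffled[n1:])
        if abs(ds) >= abs(observed):
            count += 1
    return count
```

In a full analysis, the same shuffled labels would be applied to every gene at once so that the number of false-positive genes at a given DS threshold can be estimated.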
Statistical Analysis. Statistical analyses were done using SPSS software (version 10.0.5). Correlations between sample groups and histoclinical data were studied by using Fisher's exact test or χ2 test when appropriate. The follow-up was measured from the date of diagnosis to last follow-up for nonmetastatic patients. Metastasis-free survival was measured from diagnosis until the date of the first distant metastasis. All other patients were censored at the time of the last follow-up. Survival was estimated with the Kaplan-Meier method and compared between groups with the log-rank test. Data concerning patients without metastatic relapse at last follow-up were censored. A P value < 0.05 was considered significant.
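The Kaplan-Meier estimate mentioned above can be illustrated with a short, self-contained Python sketch. It is not the SPSS procedure used in the study; the function name and input layout are hypothetical, and the log-rank comparison is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up time per patient.
    events: True if distant metastasis occurred at that time,
            False if the patient was censored at last follow-up.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        at_this_time = [e for tt, e in data if tt == t]
        n_events = sum(1 for e in at_this_time if e)
        if n_events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Patients with events or censoring at t leave the risk set.
        at_risk -= len(at_this_time)
        i += len(at_this_time)
    return curve
```

For example, with four patients followed for 1, 2, 3, and 4 time units, the second being censored, the estimate steps down at times 1, 3, and 4 only.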
Results
Selection of a Common Intrinsic Gene Set and Performance in the Norway/Stanford Tumors. One hundred twenty genes were common to our 2,300 filtered genes and a 500-gene set used by Sorlie et al. (13, 14) to identify five molecular subtypes in breast cancer. We investigated whether these 120 genes were still able to discriminate the same five subtypes in the Norway/Stanford series. Available RNA expression data for the 122 breast tissue samples (115 cancer and 7 nonmalignant tissues) from Sorlie et al. (14) were submitted to hierarchical clustering based on the expression of the 120-gene set. Results are shown in Fig. 1.
Hierarchical clustering of expression data from Sorlie et al. (14) based on 120 common genes. Hierarchical clustering of the 122 Norway/Stanford samples based on mRNA expression levels of 120 genes common to our 2,300 genes and the intrinsic 500-gene set used by Sorlie et al. Rows, genes; columns, samples. Expression level of each gene in a single sample is relative to its median level across all samples and is depicted according to a color scale shown at the bottom. Red and green, expression levels above and below the median, respectively. The magnitude of deviation from the median is represented by the color saturation. Gray, missing data. The dendrogram of samples (above matrix) represents overall similarities in gene expression profiles. Under the dendrogram, the horizontal colored boxes delimit the five subgroups of tumor: luminal A (dark blue box), luminal B (light blue box), ERBB2-overexpressing (pink box), basal (red box), and normal breast–like (green box). Branches of the core samples for each of the five centroids are similarly color coded. Black branches, samples with low correlation to any subgroup. Colored bars, right, locations of three gene clusters of interest: ER (dark blue bar), ERBB2 (pink bar), and basal cluster (red bar). Some genes included in these clusters are referenced by their HUGO abbreviation as used in Locus Link.
Gene clustering revealed groups of coordinately expressed genes, some of which represented known expression signatures corresponding to defined biological processes or cell types: the “estrogen receptor (ER) cluster,” including ESR1, GATA3, XBP1, HNF3A, MUC1, and CCND1, with a prominent role in the classification of samples, the “ERBB2 cluster” including ERBB2 and GRB7, and the “basal cluster” including KRT5 and TRIM29.
Samples clustered in two main branches (left and right) and five major subgroups (horizontal colored boxes under the dendrogram in Fig. 1): two subgroups (luminal A and luminal B), characterized by high expression of the ER cluster and relatively low expression of the ERBB2 and basal clusters, were in the left branch, and three subgroups (ERBB2-overexpressing, basal, and normal breast–like) were in the right branch. Despite the limited number of genes, these subgroups were very similar to those previously described, and 89% of samples clustered together in the same manner. In addition, the luminal A, the ERBB2-overexpressing, and the basal subgroups were characterized by the same major gene clusters (ER, ERBB2, and basal clusters, respectively). No gene cluster was evident for the luminal B and normal breast–like subgroups, possibly because of the low number of genes. As reported, the luminal B samples showed lower expression of the ER cluster as compared with luminal A samples, and the normal breast–like samples displayed strong expression of the basal gene cluster and low expression of the ER cluster. This clustering suggested that the 120-gene set was sufficient to identify biologically relevant subtypes.
We then computed the typical expression profile of the 122 samples for the 120 genes, hereafter designated centroid. The samples from the luminal A, the ERBB2-overexpressing, and the basal centroids were easily defined based on the expression pattern of the ER, the ERBB2, and the basal gene clusters, respectively, and by selecting tumors with the highest correlation with each other within the subgroup: 45 samples for luminal A (correlation >0.31), 16 for ERBB2-overexpressing (correlation >0.46), and 19 for basal (correlation >0.51). With the 120-gene set, the samples from the normal breast–like and the luminal B centroids were very similar to those defined by Sorlie et al.: the former included 8 samples (correlation >0.36), all contained in that of Sorlie et al., whereas the latter included 11 samples (correlation >0.36), 7 of which were in that of Sorlie et al. These samples are color coded in the dendrogram in Fig. 1. As reported by Sorlie et al., the basal centroid was the most homogeneous one. The centroid expression for each of the five tumor subgroups was calculated as the average expression for each of the 120 genes in the corresponding samples.
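The centroid construction and the correlation-based assignment used in this study can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names and the dictionary layout are hypothetical, and Pearson correlation is assumed as the correlation measure, in line with the similarity metric used for clustering.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def centroid(profiles):
    """Average expression of each gene across the core samples of a subtype."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def assign_subtype(profile, centroids, min_corr=0.15):
    """Return (subtype, r) for the best-correlated centroid.

    The subtype is None when the best correlation is below min_corr,
    mirroring the rule that no annotation is given below 0.15.
    """
    best, best_r = None, -1.0
    for name, c in centroids.items():
        r = pearson(profile, c)
        if r > best_r:
            best, best_r = name, r
    return (best if best_r >= min_corr else None), best_r
```

Each tumor or cell line profile, restricted to the same 120 genes and ordered identically, would be passed to `assign_subtype` together with the five centroids.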
Confirmation of Biologically Relevant Subtypes in Breast Cell Lines. To validate our approach, we investigated whether breast cancer cell lines with known luminal, ERBB2-overexpressing, and basal characteristics could be similarly classified. We calculated correlations of each of the 8 cell line profiles (BT-474 was profiled twice) with each centroid: 2 cell lines were closest to the basal centroid (HME-1, SUM-149), 2 to the ERBB2 centroid (BT-474, SK-BR-3), and 3 to the luminal A centroid (MCF-7, MDA-MB-134, T-47D), in perfect agreement with their known phenotypical features. No cell line displayed a correlation <0.15 with any centroid.
Clustering of cell lines based on the 120 genes is shown in Fig. 2A. Two groups of cell lines were clearly discriminated by the ER cluster. In the left group, two subgroups were evidenced with (from left to right) strong expression of the ERBB2 cluster (subgroup I) and of the basal cluster (subgroup II), respectively. Branches of the dendrogram were color coded according to the strongest correlation of the cell line with the centroid. As shown, the grouping of cell lines was in perfect agreement with the subtype to which they were closest: subgroup I included the ERBB2-overexpressing cell lines with known ERBB2-positive status, subgroup II the basal-like cell lines with known ER-negative status, and the right group contained luminal A–like cell lines with known ER-positive status.
Hierarchical clustering of cancer cell lines, inflammatory/noninflammatory breast cancer, and normal breast tissue samples based on 120 genes. A and B, clustering of 8 breast cancer cell lines (A) and 81 breast cancer tissue samples (44 NIBC and 37 IBC) and 2 normal breast tissue samples (NB1 and NB2) using expression levels of 120 genes (B). Branches of the dendrogram are color coded according to the strongest correlation of the sample with the centroid as defined in Fig. 1. Gray branches, samples with low correlation to any centroid (<0.15). Between each dendrogram and colored matrix, some histoclinical features of samples according to a color ladder (unavailable, oblique feature): type (NIBC, white; IBC, black; normal breast, gray), SBR grade (1, white; 2, gray; 3, black), ER immunohistochemistry status (negative, white; positive, black), PR immunohistochemistry status (negative, white; positive, black), ERBB2 immunohistochemistry status (negative, white; positive, black), P53 immunohistochemistry status (negative, white; positive, black), distant metastasis during follow-up (absent, white; present, black). Colored bars, right, locations of the ER (dark blue bar), ERBB2 (pink bar), and basal clusters (red bar). C, distribution of all breast cancer tissue samples across the five molecular subtypes. ER pos, PR pos, and ERBB2 pos, percentage of cases with positive immunohistochemistry status within each subtype. D, Kaplan-Meier plots of metastasis-free survival of the three major subtypes of breast cancer samples (NIBC and IBC combined; dark blue, luminal A, 26 patients; pink, ERBB2-overexpressing, 15 patients; red, basal, 16 patients).
Identification of Molecular Subtypes in Inflammatory Breast Cancer. We then looked for subtypes in our series of 83 breast tissue samples, including 37 IBC. We calculated correlations of each sample with each centroid: 26 were closer to luminal A centroid (12 NIBC and 14 IBC), 7 to luminal B centroid (4 NIBC and 3 IBC), 15 to ERBB2-overexpressing centroid (6 NIBC and 9 IBC), 16 to basal centroid (8 NIBC and 8 IBC), and 6 to normal breast–like centroid (2 NIBC, 2 IBC, and 2 NB). Thirteen samples, 12 NIBC and 1 IBC, displayed a low correlation (<0.15) with any centroid and were not assigned to any centroid (Fig. 2C).
There was good agreement between these assignments and protein expression data: all the luminal A/B tumors were ER positive by immunohistochemistry, 29 of 31 basal or ERBB2-overexpressing tumors were ER negative, and all 15 ERBB2-overexpressing tumors were ERBB2 positive by immunohistochemistry. In addition, 27 of the 68 tumors that correlated with a centroid were analyzed by immunohistochemistry with antibodies against luminal (CK8/18) and basal (CK5/6 and CDH3) markers (29, 30) to establish their luminal A/B and basal characteristics at the protein level. All luminal A/B tumors with available immunohistochemistry data showed strong staining for CK8/18 and no staining for CK5/6 and CDH3. Conversely, all basal tumors with available immunohistochemistry data showed strong staining for CK5/6 or CDH3 or both. These results confirmed not only the existence of molecular subtypes in NIBC but also strongly suggested their existence in IBC.
We applied hierarchical clustering to the expression levels of the 120 genes in the 83 samples, including the IBC (Fig. 2B). As expected, two major groups of samples were discriminated by the ER cluster in close agreement with the immunohistochemistry status for ER: in the left group (strong expression of the cluster), 39 (98%) of 40 samples were ER positive, whereas in the right group (low expression), 31 (76%) of 41 samples were ER negative. A majority of tumors was grade 3 in the right group, which included most of ER-negative cases, ERBB2-positive cases, and P53-positive cases. The right group was subdivided in at least two subgroups that displayed strong expression of the basal cluster and of the ERBB2 cluster, respectively.
Branches of the dendrogram were color coded according to the strongest correlation of the sample with the centroid. There was a strong association between the grouping of samples and the subtype to which they were closest. The left group included all 26 luminal A–like samples and 6 of 7 luminal B–like samples. Although the 7 luminal B–like samples were more scattered than the luminal A samples, many of them clustered near each other. The right group contained all 15 ERBB2-overexpressing cases, all 16 basal-like samples, and 5 of 6 normal breast–like samples. Here also, several subgroups of samples were evidenced. Fourteen of 15 ERBB2-overexpressing tumors clustered together in the same subgroup. Similarly, 12 of 16 tumors closest to the basal-like subtype were in the same subgroup, as were the 5 normal breast–like samples in another subgroup.
As shown in Fig. 2B, the five molecular subtypes concerned all samples, whether NIBC or IBC. The IBC samples were intermingled with the NIBC across all subgroups and distributed in all five molecular subtypes. This indicates that the different subtypes are present in both NIBC and IBC with similar expression patterns.
Robustness of the Taxonomy. The robustness and reliability of the subtype classification were confirmed by unsupervised and supervised analyses of an alternative gene set including all 2,300 genes. Global hierarchical clustering sorted breast cancer samples into two groups strongly related to the subtyping: 90% of samples of the right group were luminal A- or luminal B–like, and 82% of samples of the left group were basal-like, ERBB2-overexpressing, or normal breast–like (data not shown). To test the hypothesis that the expression signatures defining the different subtypes are equivalent in NIBC and IBC, we applied supervised analysis based on the 2,300 genes. The NIBC samples were used as a learning set to develop a molecular signature discriminating either three (luminal A, basal, and ERBB2-overexpressing) or five centroid-based subtypes (same plus luminal B and normal breast-like). Using a discriminating score combined with permutation tests, we identified 108 and 187 genes as discriminatory between the three and the five subtypes, respectively. As expected, the resulting classification of NIBC was in good agreement with their centroid-based subtype (100% and 91%; Fig. 3A). We applied these signatures to the subtype prediction of the IBC samples used as a validation set. Here also, the prediction strongly correlated with the centroid-based subtyping, with 90% and 78% concordance for the 3-class and the 5-class analysis, respectively (Fig. 3A). Hierarchical clustering based on expression of the 108-gene signature (3-class analysis) is shown in Fig. 3B. Clustering classified NIBC samples into three groups that perfectly correlated with their subtype (Fig. 3B, left). Similar clustering applied to IBC samples identified three major groups strongly associated with the subtyping: the first group contained 7 basal samples, the second group contained 8 ERBB2-overexpressing samples, and the third group contained 15 samples including 12 luminal A samples (Fig. 3B, right).
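The learning-set/validation-set logic described above can be illustrated with a minimal nearest-centroid sketch. This is not the authors' pipeline: the gene selection by discriminating score and permutation tests is omitted, the centroid is taken as the per-gene mean (an assumption), and all expression values, function names, and sample labels below are hypothetical toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def build_centroids(profiles, labels):
    """Per-gene mean expression for each subtype of the learning set."""
    grouped = {}
    for profile, label in zip(profiles, labels):
        grouped.setdefault(label, []).append(profile)
    return {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in grouped.items()
    }


def predict(profile, centroids):
    """Assign the subtype whose centroid correlates best with the profile."""
    return max(centroids, key=lambda label: pearson(profile, centroids[label]))


def concordance(predicted, reference):
    """Fraction of samples whose prediction matches the reference subtype."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


# Hypothetical learning set (stands in for NIBC): 4 genes per sample.
learning = [
    [2.0, 1.5, -1.0, -1.2],
    [1.8, 1.7, -0.8, -1.5],
    [-1.0, -1.2, 2.0, 1.6],
    [-1.3, -0.9, 1.7, 1.9],
]
learning_labels = ["luminal A", "luminal A", "basal", "basal"]

# Hypothetical validation set (stands in for IBC) with reference subtypes.
validation = [[1.5, 1.9, -0.7, -1.1], [-0.8, -1.4, 1.5, 2.1]]
validation_labels = ["luminal A", "basal"]

centroids = build_centroids(learning, learning_labels)
predicted = [predict(p, centroids) for p in validation]
agreement = concordance(predicted, validation_labels)
print(predicted, agreement)
```

The concordance value plays the role of the accuracy reported in Fig. 3A: the share of validation samples whose predicted subtype agrees with the centroid-based subtype.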
Figure 3. Supervised analysis to test the robustness of the molecular subtypes. A, subtype prediction using supervised analysis applied to expression levels of all 2,300 genes for discriminating three (luminal A, basal, ERBB2-overexpressing) or five molecular centroid–based subtypes (same plus luminal B and normal breast-like). The list of discriminator genes was defined on a learning set including the NIBC samples (26 and 32 cases) and was then tested on a validation set including the IBC samples (31 and 36 cases). The number of discriminatory genes differed for the 3- and the 5-subtype prediction. The accuracy represents the concordance between the subtype prediction by supervised analysis and the centroid-based subtype. B, hierarchical clustering based on expression of the 108-gene signature identified in the 3-subtype supervised analysis and applied to the NIBC samples (left) and the IBC samples (right). Branches of the dendrogram of samples are color coded according to the centroid-based subtype as in Fig. 2B. Colored bars to the right, locations of three gene clusters: basal (red bar), ERBB2 (pink bar), and ER (dark blue bar).
Correlation of Molecular Subtypes with Histoclinical Data. We then searched for correlations between the three major molecular subtypes (luminal A, basal, and ERBB2-overexpressing) and relevant features of tumors [grade, ER, progesterone receptor (PR), ERBB2 and P53 immunohistochemistry status, clinical outcome]. The luminal B and normal breast–like subtypes were excluded from analysis because of the low number of samples.
We first combined the corresponding IBC and NIBC samples (n = 57). As expected, correlations (P < 0.001, Fisher's exact test) with ER, PR, and ERBB2 status were found (Table 2). A significant association with grade, P53 status, and clinical outcome was also evidenced. The respective ratios of grade 3 samples were 27%, 67%, and 88% in the luminal A–like subtype, the ERBB2-overexpressing subtype, and the basal-like subtype (P = 0.002, χ2 test), and the respective ratios of P53-positive samples were 15%, 47%, and 75% (P = 0.001, Fisher's exact test). Finally, with a median follow-up of 47 months after diagnosis (range, 2-135), the respective 5-year metastasis-free survival was 67%, 36%, and 48% (P = 0.03, log-rank test; Fig. 2D).
Table 2. Correlations of the three major molecular tumor subtypes with histoclinical characteristics (NIBC and IBC are merged). Values are number of patients (% of evaluated cases).

Characteristics* | Luminal A (n = 26) | ERBB2+ (n = 15) | Basal (n = 16) | P
---|---|---|---|---
SBR grade (57) | | | | 0.002
1 | 4 (15) | 0 (0) | 1 (6) |
2 | 15 (58) | 5 (33) | 1 (6) |
3 | 7 (27) | 10 (67) | 14 (88) |
ER status (57) | | | | <0.001
Negative | 0 (0) | 14 (93) | 15 (94) |
Positive | 26 (100) | 1 (7) | 1 (6) |
PR status (57) | | | | <0.001
Negative | 1 (4) | 12 (80) | 15 (94) |
Positive | 25 (96) | 3 (20) | 1 (6) |
ERBB2 status (57) | | | | <0.001
Negative, 0-1+ | 25 (96) | 0 (0) | 16 (100) |
Positive, 2-3+ | 1 (4) | 15 (100) | 0 (0) |
P53 status (57) | | | | 0.001
Negative | 22 (85) | 8 (53) | 4 (25) |
Positive | 4 (15) | 7 (47) | 12 (75) |
*Number of evaluated cases among 57 patients.
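As an illustration of the χ2 test quoted for grade, the sketch below computes the Pearson χ2 statistic on the SBR grade counts of Table 2 using only the Python standard library. It is a minimal sketch and not the authors' statistical software; the function names are hypothetical, and the closed-form tail probability used here holds only for 4 degrees of freedom. Several expected counts are below 5, so the χ2 approximation is rough for a table of this size.

```python
from math import exp


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_tail_df4(x):
    """Upper-tail probability of a chi-square variable with exactly 4 df."""
    return exp(-x / 2) * (1 + x / 2)


# SBR grade counts from Table 2; columns: luminal A, ERBB2+, basal.
grade_table = [
    [4, 0, 1],    # grade 1
    [15, 5, 1],   # grade 2
    [7, 10, 14],  # grade 3
]

stat, df = chi_square(grade_table)
p_value = chi2_tail_df4(stat)  # valid because df == 4 for a 3 x 3 table
print(f"chi2 = {stat:.2f}, df = {df}, P = {p_value:.4f}")
```

With these counts the P value falls in the same range as the P = 0.002 reported for grade.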
To further compare molecular subtypes between NIBC and IBC, we analyzed NIBC and IBC separately and examined whether the same correlations were present within each type. Strong correlations (P < 0.001, Fisher's exact test) between the molecular subtyping and ER, PR, and ERBB2 status were confirmed in both NIBC and IBC separately, with significantly more ER-positive cases and more PR-positive cases in luminal A samples as compared with ERBB2-overexpressing samples and basal samples, and more ERBB2-positive cases in ERBB2-overexpressing samples as compared with luminal A samples and basal samples. Other correlations were significant in one type of disease and not in the other, but with a similar tendency. In NIBC (n = 26), there was a significant association of subtype classification with grade (P = 0.005, χ2 test) and P53 status (P < 0.001, Fisher's exact test), with more grade 3 and P53-positive samples in the basal-like subtype than in the two other subtypes. Although the difference was not significant, probably due to the small number of samples, the 5-year metastasis-free survival was 38% in the basal-like subtype, 55% in the ERBB2-overexpressing subtype, and 60% in the luminal A–like subtype (P = 0.13, log-rank test). In IBC (n = 31), the 5-year metastasis-free survival was 60% in the basal subtype, 19% in the ERBB2-overexpressing subtype, and 73% in the luminal A subtype (P = 0.08, log-rank test). Similar to NIBC, more samples from the basal-like subtype were grade 3 (100%) or P53-positive (63%) than samples from the ERBB2-overexpressing subtype (67% and 44%) and the luminal A–like subtype (50% and 29%; P = 0.17, χ2 test for grade, and P = 0.29, Fisher's exact test for P53 status). Finally, among 21 IBC available for assessment of PCR (8 PCR+ and 13 PCR−), the rate of PCR was 27% in the luminal A, 80% in the basal, and 20% in the ERBB2-overexpressing subtypes (P = 0.08, χ2 test). No similar correlation existed between the PCR rate and grade or immunohistochemistry data for ER, PR, ERBB2, and P53 in these patients.
Discrimination of Inflammatory Breast Cancer/Non-Inflammatory Breast Cancer Samples and PCR+/PCR− Inflammatory Breast Cancer Samples in Molecular Subtypes. The major finding of this study is that the five molecular subtypes are highly correlated with the cell-of-origin distinction, but not with the IBC/NIBC type. This observation suggests that the 120-gene set does not reflect the difference seen between IBC and NIBC. We have recently identified (20), on the same series of samples, a 109-gene signature, the expression of which discriminates IBC from NIBC samples with an accuracy rate of 84%. We tested this signature in each of the molecular subtypes (Fig. 4A). As shown, the signature performed well for discriminating NIBC and IBC within each of the five subtypes of samples. The proportion of samples correctly classified was 92% for luminal A (P < 0.001, χ2 test), 100% for luminal B (P = 0.008, χ2 test), 73% for ERBB2-overexpressing (P = 0.08, χ2 test), 63% for basal (P = 0.3, χ2 test), 100% for normal breast–like (P = 0.04, χ2 test), and 92% for other cases (P = 0.015, χ2 test).
Figure 4. Supervised classification of breast cancer in each molecular subtype based on two different gene expression signatures. A, samples from each molecular subtype are classified using a 109-gene expression signature associated with NIBC/IBC discrimination (20). Genes (rows) are ordered from top to bottom by their decreasing discriminating score. In each subtype, samples (columns) are ordered from left to right according to the correlation coefficient of their expression profile with the median profile of the IBC samples. Dashed orange line, threshold 0 that separates the two classes of samples, IBC class (left) and NIBC class (right). Bottom, histoclinical type of tumors (□, NIBC; ▪, IBC). B, samples from each of the three main molecular subtypes are classified using an 85-gene expression signature associated with discrimination between IBC with and without pathologic complete response after primary chemotherapy (20). Genes (rows) are ordered from top to bottom by their decreasing discriminating score. In each subtype, samples (columns) are ordered from left to right according to the correlation coefficient of their expression profile with the median profile of the IBC samples with PCR. Dashed orange line, threshold 0 that separates the two classes of samples, PCR+ (left) and PCR− (right). Bottom, pathologic response to primary chemotherapy (▪, PCR+; □, PCR−).
We have also previously identified an 85-gene signature that discriminates PCR+ and PCR− IBC patients (20). We investigated whether the 85-gene signature still performed well in the three main subtypes (Fig. 4B). All eight patients with PCR were correctly classified, whatever their subtype. The proportion of samples correctly classified was 82% for luminal A (P = 0.02, χ2 test), 80% for ERBB2-overexpressing (P = 0.17, χ2 test), and 100% for basal (P = 0.02, χ2 test).
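The per-subtype evaluation of a signature, as described for Fig. 4, can be sketched as follows: each sample is scored by the correlation of its expression profile with the median profile of a reference class, a threshold of 0 separates the two classes, and accuracy is tallied within each subtype. This is a minimal sketch under assumptions, not the published procedure: the function names are hypothetical, the profiles are invented toy values over 4 genes, and expression values are assumed to be centered so that a threshold of 0 is meaningful.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def median_profile(profiles):
    """Per-gene median across the reference samples."""
    return [median(col) for col in zip(*profiles)]


def classify(profile, reference, positive, negative, threshold=0.0):
    """Positive class if correlation with the reference exceeds the threshold."""
    return positive if pearson(profile, reference) > threshold else negative


def accuracy_by_subtype(samples, reference, positive, negative):
    """Fraction of correctly classified samples within each subtype."""
    counts = {}
    for subtype, true_class, profile in samples:
        correct, total = counts.get(subtype, (0, 0))
        hit = classify(profile, reference, positive, negative) == true_class
        counts[subtype] = (correct + hit, total + 1)
    return {s: correct / total for s, (correct, total) in counts.items()}


# Hypothetical reference profiles for the positive class (here "IBC").
ibc_reference = median_profile([
    [1.0, 0.8, -0.9, -1.1],
    [1.2, 0.6, -1.0, -0.7],
    [0.9, 1.1, -0.8, -1.0],
])

# Hypothetical samples: (molecular subtype, true class, expression profile).
samples = [
    ("luminal A", "IBC", [0.9, 0.7, -1.0, -0.8]),
    ("luminal A", "NIBC", [-1.1, -0.6, 0.9, 1.0]),
    ("basal", "IBC", [0.8, 1.0, -0.7, -1.2]),
    ("basal", "NIBC", [0.7, 0.9, -1.1, -0.6]),  # resembles IBC: misclassified
]

result = accuracy_by_subtype(samples, ibc_reference, "IBC", "NIBC")
print(result)
```

Reporting accuracy per subtype rather than overall is what allows a signature to be judged independently of the subtype composition of the series.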
Discussion
The current prognostic classification of breast cancer based on pathologic and clinical factors is not sufficient to capture the heterogeneity and complex cascade of events that drive the clinical behavior of tumors. Consequently, molecularly distinct diseases are grouped in similar clinical classes. High-throughput molecular techniques such as DNA microarrays provide novel tools to tackle this heterogeneity. Genome-wide expression studies of various cancers have identified subtypes of tumors previously unrecognized but biologically and clinically relevant with respect to both cellular features and prognosis (11–15, 31–33). Such tumor subtypes probably represent different diseases that ideally would require different treatment.
In breast cancer, five molecular subtypes have been repeatedly described in NIBC (12, 13, 15, 17, 18). Recently, Sorlie et al. confirmed the capacity of an intrinsic 500-gene set for identifying these subtypes in three independent panels of tumors (14). Because of the known histoclinical differences between IBC and NIBC and the heterogeneity of IBC, we have investigated whether these molecular subtypes are also present in IBC. Using DNA microarrays with ∼8,000 genes, we profiled a series of NIBC and IBC.
To test in our series the presence of subtypes reported by Sorlie et al. with a 500-gene set (13, 14), we first defined a set of 120 genes that overlapped between this 500-gene set and the 2,300 well-measured and variably expressed genes of our arrays, and we used a similar approach based on hierarchical clustering and centroids. Despite reduction of the gene set, the five tumor subtypes were easily recognized in the 122 Norway/Stanford tumors, with 89% of samples clustering in the same subtypes. This result suggested the validity of our data and the robustness of the subtype classification and allowed computation of five centroids.
As expected, in our panel of cell lines and tumor samples, NIBC as well as IBC displayed transcriptional heterogeneity. Analysis of the correlation of each sample with each centroid correctly identified the five subtypes in all samples, including IBC, and hierarchical clustering grouped together samples from the same subtype. Thus, despite differences in patient series and microarray platforms, our results confirmed those of previous studies, with the identification of the five molecular subtypes in NIBC. Furthermore, they show for the first time the existence of the same five subtypes in IBC. Protein data, available for markers of luminal and basal type for some of our tumors, perfectly corroborated the luminal-like and the basal-like character of the samples, further supporting the validity of the taxonomy. By using an alternative gene set (2,300 genes) and different methods of analysis, we further showed that the classification was consistent and independent of the method, demonstrating its robustness and reliability. Very similar distribution of samples was observed by global unsupervised hierarchical clustering. Furthermore, 3-class or 5-class supervised analyses applied to the 2,300 genes and to the NIBC samples as a learning set identified gene signatures that successfully predicted the centroid-based subtype of IBC in 90% or 78% of cases, respectively.
The correlations of the three major (luminal A, basal, and ERBB2-overexpressing) subtypes of breast cancer with histoclinical factors were in perfect agreement with the literature, reinforcing the idea that the subtype classification may improve the management of breast cancer (13–15, 34). For example, basal-like tumors were frequently associated with high grade, negative ER, PR, and ERBB2 status, positive P53 status, and poor survival and should be offered a treatment different from that of luminal-like tumors. ER positivity is a good indicator of luminal-like subtype and is currently a major factor of good prognosis. However, several expression profiling studies have shown that ER-positive breast cancer is a heterogeneous subpopulation of breast cancer, both biologically and clinically. As previously described (12–15, 18), and for both NIBC and IBC, most ER and/or PR-positive samples were luminal A/B samples, and a few of them were ERBB2-overexpressing, basal, or normal breast–like cases. Immunohistochemistry determination of ER status, which may not indicate the functionality of the ER pathway in all cases, might perform less well in terms of prognosis than more refined assays such as gene expression signatures, which can discriminate subclasses of ER-positive tumors with different clinical outcome (13–15, 35); the difference in 5-year metastasis-free survival between luminal A and luminal B tumors (36% versus 67%, respectively; P = 0.04, log-rank test) is consistent with such observation. Even more interestingly, IBC and NIBC subtypes not only displayed similar gene expression patterns but also similar differences in grade, protein levels (hormone receptors, ERBB2, P53), and clinical outcome. The three main subtypes showed different metastasis-free survival in both NIBC and IBC, with longer survival in the luminal A subtype than in the basal and ERBB2-overexpressing subtypes. For the first time, we also found a difference in the rate of PCR after primary chemotherapy between the three subtypes, which calls for analysis of a larger series of cases.
Our results, obtained using different patients, different clinical forms of breast cancer, and different microarray techniques, not only reinforce confidence in this new molecular taxonomy but also give evidence for its universality in breast cancer independently of a specific clinical form. The molecular subtypes of breast cancer can be identified in IBC, as in NIBC. This suggests that the intrinsic biology related to the cell type is more important to determine the transcriptional pattern than the clinical aspect of the disease. However, we also show that the recently reported differences in gene expression between NIBC and IBC and between PCR+ and PCR− IBC (20) persist in each subtype, suggesting that these gene signatures may be regulated in a manner common to all subtypes. Molecular subtype and inflammatory character are two independent features of breast cancers. Both can be recognized by gene profiling.
Acknowledgments
Grant support: Institut Paoli-Calmettes, Institut National de la Santé et de la Recherche Médicale, Université de la Méditerranée, grants from the Ministries of Health and Research (Cancéropôle), the Association pour la Recherche contre le Cancer (Printemps 2002, No. 4719), and Ligue Nationale contre le Cancer (label).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank F. Birg and D. Maraninchi for encouragement and discussions and S.P. Ethier for the gift of the SUM-149 cell line.