Purpose: Molecular classification of breast cancer has been proposed based on gene expression profiles of human tumors. Luminal, basal-like, normal-like, and erbB2+ subgroups were identified and were shown to have different prognoses. The goal of this research was to determine if these different molecular subtypes of breast cancer also respond differently to preoperative chemotherapy.

Experimental Design: Fine needle aspirations of 82 breast cancers were obtained before starting preoperative paclitaxel followed by 5-fluorouracil, doxorubicin, and cyclophosphamide chemotherapy. Gene expression profiling was done with Affymetrix U133A microarrays and the previously reported “breast intrinsic” gene set was used for hierarchical clustering and multidimensional scaling to assign molecular class.

Results: The basal-like and erbB2+ subgroups were associated with the highest rates of pathologic complete response (CR), 45% [95% confidence interval (95% CI), 24-68] and 45% (95% CI, 23-68), respectively, whereas the luminal tumors had a pathologic CR rate of 6% (95% CI, 1-21). No pathologic CR was observed among the normal-like cancers (95% CI, 0-31). Molecular class was not independent of conventional clinicopathologic predictors of response such as estrogen receptor status and nuclear grade. None of the 61 genes associated with pathologic CR in the basal-like group were associated with pathologic CR in the erbB2+ group, suggesting that the molecular mechanisms of chemotherapy sensitivity may vary between these two estrogen receptor–negative subtypes.

Conclusions: The basal-like and erbB2+ subtypes of breast cancer are more sensitive to paclitaxel- and doxorubicin-containing preoperative chemotherapy than the luminal and normal-like cancers.

Breast cancer is a clinically heterogeneous disease. Histologically similar tumors may have different prognoses and may respond to therapy differently. These differences in clinical behavior are believed to be due to molecular differences between histologically similar tumors. DNA microarray technology is ideally suited to reveal such molecular differences. A novel molecular classification of breast cancer based on gene expression profiles was recently proposed (1). The investigators identified a set of genes whose expression varied between tumors but was stable within the same tumor (“intrinsic gene set”; n = 534), which accounted for much of the molecular difference among 42 breast cancers, and did hierarchical cluster analysis to identify subgroups of cancers with distinct gene expression profiles. Luminal, basal-like, normal-like, and erbB2+ subgroups were identified and were shown to have different prognoses (1-4). These results were confirmed in follow-up experiments by the same group and others using larger numbers of cases. The basal-like (mostly estrogen receptor negative) and erbB2+ (mostly HER-2 amplified and estrogen receptor negative) subgroups had the shortest relapse-free and overall survival, whereas the luminal-type (estrogen receptor–positive) tumors had a more favorable clinical outcome (2-4). There are no published data on how the different molecular classes of breast cancer respond to chemotherapy. The goal of the current project was to examine whether these molecular subclasses of breast cancer also respond differently to anthracycline- and paclitaxel-containing preoperative chemotherapy.

Fine needle aspirations of breast cancer were collected in a prospectively designed pharmacogenomic marker discovery study at the Nellie B. Connally Breast Center of the University of Texas M.D. Anderson Cancer Center. The goal of the ongoing clinical study was to develop multigene predictors of pathologic complete response (CR) to preoperative therapy. The current analysis was undertaken to examine whether molecular class is associated with sensitivity to chemotherapy. Gene expression results from the first 82 patients with stage I to III breast cancer were included in this analysis. Patient characteristics are presented in Table 1. Fine needle aspiration was done using a 23- or 25-gauge needle before starting preoperative chemotherapy with 12 weeks of paclitaxel followed by 4 courses of 5-fluorouracil, doxorubicin, and cyclophosphamide. Cells from 2 to 3 passes were collected into vials containing 1 mL of RNAlater solution (Ambion, Austin, TX) and stored at −80°C. The median RNA yield of the 82 specimens was 2.0 μg (range, 1-22 μg). Approximately 70% of all aspirations yielded at least 1 μg of total RNA, the amount required for gene expression profiling. The main reason for failure to obtain sufficient RNA was acellular aspirations (low cell yield). The cellular composition of fine needle aspiration samples was previously reported; in brief, these samples contain on average 80% neoplastic cells, and the remaining cells are infiltrating leukocytes (5). They contain few or no stromal cells (fibroblasts and adipocytes) or normal breast epithelium. Of the 82 RNA specimens used in this analysis, 33 were included in a previous pharmacogenomic analysis using cDNA arrays (6). These 33 cases were profiled on both platforms (Affymetrix U133A and proprietary cDNA), and the results of the cross-platform comparison of gene expression data were published separately (7). All patients underwent surgery after completing 24 weeks of preoperative chemotherapy.
Grossly visible residual cancer was measured, and representative sections were submitted for routine histopathologic examination. When there was no grossly visible residual cancer, the slices of the specimen were radiographed and all areas of radiologically and/or architecturally abnormal tissue were entirely submitted for histopathologic study. Patients without any residual invasive cancer in the breast and axillary lymph nodes were considered to have pathologic CR; patients with residual in situ cancer (DCIS) only were also considered to have pathologic CR. Estrogen receptor and HER-2 status were determined by routine clinical diagnostic methods [mouse monoclonal anti–estrogen receptor antibody 6F11 (Novocastra/Vector Laboratories, Burlingame, CA) for estrogen receptor, and a fluorescence in situ hybridization assay for HER-2 amplification (PathVision kit, Vysis, Downers Grove, IL)] on a diagnostic core needle biopsy obtained before or concomitantly with the research fine needle aspiration. Nuclear grade was defined by the modified Black's nuclear grading system (1 = low grade, 2 = intermediate grade, and 3 = high grade; ref. 8). The study was approved by the Institutional Review Board of M.D. Anderson Cancer Center, and all patients signed informed consent.

Table 1.

Clinical information and demographics of the patients included in the study (n = 82)

Female 82 (100%) 
Median age 52 y (range 29-79) 
Race  
    Caucasian 56 (68%) 
    African American 11 (13%) 
    Asian 7 (9%) 
    Hispanic 6 (7%) 
    Mixed 2 (2%) 
Histology  
    Invasive ductal 73 (89%) 
    Mixed ductal/lobular 6 (7%) 
    Invasive lobular 1 (1%) 
    Invasive mucinous 2 (2%) 
Tumor-node-metastasis stage  
    T1 7 (9%) 
    T2 46 (56%) 
    T3 15 (18%) 
    T4 14 (17%) 
    N0 28 (34%) 
    N1 38 (46%) 
    N2 8 (10%) 
    N3 8 (10%) 
Nuclear grade (modified Black's nuclear grade)  
    1 2 (2%) 
    2 23 (37%) 
    3 35 (61%) 
Estrogen receptor positive* 35 (43%) 
Estrogen receptor negative 47 (57%) 
HER-2 positive 57 (70%) 
HER-2 negative 25 (30%) 
Neoadjuvant therapy  
    Weekly T (80 mg/m2) × 12 + FAC × 4 69 (84%) 
    3-weekly T (225 mg/m2 CI) × 4 + FAC × 4 13 (16%) 
Pathologic CR 21 (26%) 
Residual disease 61 (74%) 
*Cases where >10% of tumor cells stained positive for estrogen receptor with immunohistochemistry were considered positive.

Cases that showed gene copy number >2.0 were considered HER-2 positive.

T, paclitaxel; CI, 24-hour continuous infusion; and FAC, 5-fluorouracil (500 mg/m2), doxorubicin (50 mg/m2), and cyclophosphamide (500 mg/m2).

RNA was extracted from fine needle aspiration samples using the RNeasy Kit (Qiagen, Valencia, CA). The amount and quality of RNA were assessed with a DU-640 UV Spectrophotometer (Beckman Coulter, Fullerton, CA) and with an Agilent 2100 Bioanalyzer RNA 6000 LabChip kit (Agilent Technologies, Palo Alto, CA). Profiling was done without second-round amplification using a minimum of 1 μg of total RNA. Double-stranded cDNA was synthesized, followed by an in vitro transcription reaction to generate biotinylated cRNA. Biotin-labeled and fragmented cRNA was hybridized to Affymetrix U133A gene chips overnight at 42°C. The Affymetrix GeneChip system was used for hybridization and scanning, and the dCHIP V1.3 (http://dchip.com) software was used to generate probe-level signals and to normalize data across arrays.

dCHIP V1.3 software was used for normalization; this program normalizes all arrays to one standard array that represents a chip with median overall intensity. After normalization, feature-level intensity was estimated from the 75th percentile of pixel-level intensity of each feature. The individual probe intensities were then aggregated to form a single measure of intensity for each probe set, using the perfect match model. Statistical analysis was done using the BRB-ArrayTools version 3.0 software package (http://linus.nci.nih.gov/BRB-ArrayTools.html). Complete linkage hierarchical clustering was done with the previously published breast cancer intrinsic gene set, with 1 − Pearson correlation coefficient as the distance metric (1). Cluster reproducibility and the robustness of the dendrograms were examined by the method proposed by McShane et al. (9) based on 500 perturbations. Tumors clustering together in significant dendrogram branches were categorized as one molecular class. We also used multidimensional scaling with Euclidean distance as the metric to provide a graphical representation of the distances among samples. This method also allowed a global test of statistical significance of whether the expression profiles form distinct clusters (rather than represent a single multivariate Gaussian distribution). Genes differentially expressed in a particular molecular class compared with all other tumors, and between cases of pathologic CR and residual cancer within a single molecular subgroup, were identified using the significance analysis of microarrays (SAM) software with 1,000 sample permutations. SAM uses permutations to estimate the false discovery rate, and an adjustable threshold allows control of the false discovery rate (10).
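A minimal sketch of the clustering step, assuming a samples × probe-sets matrix of normalized expression values already restricted to the intrinsic probe sets. It uses SciPy rather than BRB-ArrayTools and does not attempt to reproduce that package's internals or the McShane reproducibility index; the `simulate` helper and its four artificial expression blocks are hypothetical stand-ins for real data. SciPy's `correlation` metric is 1 − Pearson correlation between rows, which matches the distance described above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_samples(expr: np.ndarray, n_clusters: int = 4) -> np.ndarray:
    """Complete-linkage clustering of samples using 1 - Pearson correlation.

    expr: (n_samples, n_probe_sets) matrix of normalized expression values.
    Returns an integer cluster label (1..n_clusters) for every sample.
    """
    # metric="correlation" gives 1 - Pearson r between each pair of rows (samples).
    tree = linkage(expr, method="complete", metric="correlation")
    # Cut the dendrogram into at most n_clusters flat clusters.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def simulate(n_per_group: int = 10, n_genes: int = 100, seed: int = 0):
    """Hypothetical toy data: each of 4 groups over-expresses its own gene block."""
    rng = np.random.default_rng(seed)
    block = n_genes // 4
    rows, truth = [], []
    for g in range(4):
        mean = np.zeros(n_genes)
        mean[g * block:(g + 1) * block] = 3.0
        rows.append(mean + rng.normal(0.0, 0.5, size=(n_per_group, n_genes)))
        truth += [g] * n_per_group
    return np.vstack(rows), np.array(truth)


expr, truth = simulate()
labels = cluster_samples(expr)
```

Cutting the tree with `criterion="maxclust"` fixes the number of classes in advance; in the study, classes were instead taken from dendrogram branches judged reproducible.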

Pathologic complete response rates were calculated for each molecular class and assessed in univariate analysis (χ2 test) and multivariate analysis (logistic regression). Estrogen receptor and HER-2 status, nuclear grade, tumor size, and lymph node involvement were included in the multivariate analysis. We built logistic regression–based prediction models with various combinations of clinical variables and molecular class to examine whether knowledge of the molecular class improves prediction accuracy beyond what can be achieved by combining routine clinical variables.
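As a sketch of the univariate test, SciPy's `chi2_contingency` can be applied to the molecular class × response counts reported in Table 2 (luminal, normal-like, erbB2+, and basal-like against residual disease and pathologic CR). This only illustrates the χ2 computation and is not the authors' analysis code. The normal-like row has an expected pathologic CR count below 5, so the χ2 approximation is rough for these data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from Table 2; columns are (residual disease, pathologic CR).
counts = np.array([
    [28, 2],   # luminal
    [10, 0],   # normal-like
    [11, 9],   # erbB2+
    [12, 10],  # basal-like
])

chi2, p_value, dof, expected = chi2_contingency(counts)
# SciPy applies Yates' continuity correction only when dof == 1, so none is used here.
print(f"chi2 = {chi2:.2f}, dof = {dof}, P = {p_value:.4f}")
print("smallest expected count:", expected.min().round(2))
```

The resulting P value is below 0.001, consistent with the value reported in Table 2.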

Hierarchical clustering with the breast cancer intrinsic gene set reveals previously described molecular classes in fine needle aspiration specimens. The intrinsic breast cancer gene set consists of 534 genes whose expression showed significantly larger variation between tumors than between paired samples from the same tumor in an early seminal publication (1). Of these intrinsic genes, 424 were represented on the Affymetrix U133A chip. We did supervised hierarchical clustering with the 689 Affymetrix probe sets that represented these 424 genes to define the molecular classes of breast tumors in our data. The tumors clustered into four major classes. The reproducibility indices of the four clusters were 0.82, 0.76, 0.85, and 0.78, respectively, which indicates reasonably robust clusters (9). Tumors within each molecular subtype corresponded well to the previously described clinicopathologic phenotypes of luminal (n = 30), normal-like (n = 10), basal-like (n = 22), and erbB2+ (n = 20) cancers (Fig. 1A). All of the luminal tumors were estrogen receptor positive by immunohistochemistry. All but two cases (80%) of the erbB2+ molecular class had HER-2 gene amplification by fluorescence in situ hybridization analysis. All but one of the basal-like tumors (95%) were estrogen receptor negative, and 75% of these tumors were also of high nuclear grade. These groups did not differ significantly in nodal status, tumor size, or patient age distribution (Fig. 1B). Multidimensional scaling analysis also confirmed the presence of significant clustering of the cases (global test of significance P = 0.04, Fig. 1C). To examine how sensitive the cluster results are to the gene set used for clustering, we did a multidimensional scaling analysis using the probe sets with the highest variance (top 10%) across all samples (2,228 probe sets, including 229 probe sets overlapping with the intrinsic gene set).
Cases with the same molecular class (as defined by the intrinsic gene set) continued to cluster together (global test of significance P = 0.047; Fig. 1E). This suggests that the gene signature–based groups are robust.
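The two steps in this paragraph, keeping the top 10% most variable probe sets and projecting the samples with Euclidean-distance multidimensional scaling, can be sketched as below. Classical (Torgerson) MDS is an assumption here; BRB-ArrayTools' exact MDS implementation and its global test of significance are not reproduced. The random matrix is placeholder data. Its 22,283 columns correspond to the number of probe sets on the U133A array, which is why 10% gives 2,228.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def top_variance(expr: np.ndarray, fraction: float = 0.10) -> np.ndarray:
    """Keep the probe sets (columns) with the highest variance across samples."""
    n_keep = max(1, int(round(fraction * expr.shape[1])))
    order = np.argsort(expr.var(axis=0))[::-1]
    return expr[:, order[:n_keep]]


def classical_mds(expr: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Classical (Torgerson) MDS of the samples (rows) from Euclidean distances."""
    d2 = squareform(pdist(expr, metric="euclidean")) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering      # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Scale each leading eigenvector by the square root of its eigenvalue.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


rng = np.random.default_rng(1)
expr = rng.normal(size=(82, 22283))   # placeholder values, not study data
filtered = top_variance(expr)         # 2,228 probe sets
coords = classical_mds(filtered)      # 82 x 3 coordinates for a 3-D plot
```

With as many components as the data's rank, classical MDS reproduces the original Euclidean distances exactly; with three components it gives the best three-dimensional approximation in the least-squares sense of the centered Gram matrix.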

Fig. 1.

Different molecular classes of breast cancer show distinct clinicopathologic features and respond differently to chemotherapy. A, hierarchical clustering using the 689 probe sets (424 unique genes) corresponding to the breast intrinsic gene set (n = 82 cases). B, correlation between molecular subclass and clinicopathologic characteristics in univariate analysis. C, graphical representation of the distances among the samples that belong to different molecular classes using multidimensional scaling (P = global test of significance for molecular subclasses). D, cases with pathologic CR form a separate cluster in multidimensional scaling analysis. E, when genes with the greatest variance (top 10%, n = 2228) across samples were used for multidimensional scaling, rather than the intrinsic gene set, molecular subgroups continued to cluster together.

To further define the molecular differences between the subgroups, we identified genes differentially expressed among the four molecular classes using SAM analysis of the most variably expressed probe sets (n = 2,228). At a stringent false discovery rate of 0.0001, 372 probe sets representing 298 genes were identified as differentially expressed among the four groups (Supplementary Table S1). High expression of estrogen receptor 1 and of several known estrogen receptor–inducible genes, such as X-box binding protein 1 and SLC39A6, characterized the luminal subgroup. The basal-like subgroup was characterized by the expression of keratin 17, keratin 5, and γ-aminobutyric acid receptor π subunit, among others. The erbB2+ subtype was characterized by overexpression of genes located in the HER-2 amplicon, including erbB2 and GRB7. In contrast, only 15 genes were overexpressed in the normal-like group. These gene lists could be used to further characterize the molecular subclasses and to develop supervised molecular class prediction methods.
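The SAM software itself is not reproduced here. As a rough illustration of how a permutation-based false discovery rate works for a four-class comparison, the sketch below scores each probe set with a one-way ANOVA F statistic (an assumption; SAM uses its own regularized statistic with a fudge factor s0), counts the probe sets above a cutoff, and estimates the number of false calls as the median count obtained after randomly permuting the class labels. It omits SAM's estimate of the proportion of truly null genes, so it is somewhat conservative. The class sizes follow the four classes above; the expression values are simulated.

```python
import numpy as np


def f_statistics(expr: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """One-way ANOVA F statistic for every probe set (column) across classes."""
    classes = np.unique(labels)
    grand_mean = expr.mean(axis=0)
    ss_between = np.zeros(expr.shape[1])
    ss_within = np.zeros(expr.shape[1])
    for c in classes:
        x = expr[labels == c]
        class_mean = x.mean(axis=0)
        ss_between += len(x) * (class_mean - grand_mean) ** 2
        ss_within += ((x - class_mean) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(labels) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_fdr(expr, labels, cutoff, n_perm=1000, seed=0):
    """Probe sets called at `cutoff` and a permutation estimate of their FDR."""
    rng = np.random.default_rng(seed)
    n_called = int((f_statistics(expr, labels) >= cutoff).sum())
    # Number of probe sets passing the cutoff when class labels are shuffled.
    null_counts = [
        int((f_statistics(expr, rng.permutation(labels)) >= cutoff).sum())
        for _ in range(n_perm)
    ]
    expected_false = float(np.median(null_counts))
    fdr = min(1.0, expected_false / n_called) if n_called else 0.0
    return n_called, fdr


# Simulated data: 82 tumors in classes of 30, 10, 20, and 22; the first 20 of
# 500 probe sets shift with class, the rest are pure noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2, 3], [30, 10, 20, 22])
expr = rng.normal(size=(82, 500))
expr[:, :20] += 2.0 * labels[:, None]
n_called, fdr = permutation_fdr(expr, labels, cutoff=10.0, n_perm=200)
```

Raising the cutoff lowers the estimated FDR at the cost of fewer calls, which is the trade-off behind choosing a threshold as strict as 0.0001.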

Correlation between molecular class and pathologic complete response to preoperative chemotherapy. The rates of pathologic CR differed significantly among the four molecular classes of breast cancer defined by clustering with the intrinsic gene set. Basal-like and erbB2+ subgroups had the highest rates of pathologic CR, 45% [95% confidence interval (95% CI), 24-68] and 45% (95% CI, 23-68), respectively, whereas luminal tumors had a pathologic CR rate of 6% (95% CI, 1-21). No pathologic CR was observed in the normal-like subclass (Table 2). We next used multidimensional scaling to explore whether the breast intrinsic gene set can separate cases with pathologic CR from those with residual disease (Fig. 1D). The global test of significance showed that the pathologic CR and residual disease cases formed significantly separate clusters (P = 0.026).
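The text does not state how the 95% CIs for the class-specific response rates were computed. One common choice for small counts is the exact (Clopper-Pearson) binomial interval, sketched below as an assumption using beta-distribution quantiles from SciPy and the counts in Table 2. For 0 of 10 normal-like tumors it gives an upper limit of about 31%, matching the reported 0-31.

```python
from scipy.stats import beta


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) binomial confidence interval for k successes in n."""
    lower = 0.0 if k == 0 else float(beta.ppf(alpha / 2, k, n - k + 1))
    upper = 1.0 if k == n else float(beta.ppf(1 - alpha / 2, k + 1, n - k))
    return lower, upper


# (pathologic CR, total) per molecular class, from Table 2.
counts = {"luminal": (2, 30), "normal-like": (0, 10), "erbB2+": (9, 20), "basal-like": (10, 22)}
for name, (k, n) in counts.items():
    lo, hi = clopper_pearson(k, n)
    print(f"{name}: {k}/{n} = {k / n:.0%} (95% CI, {lo:.0%}-{hi:.0%})")
```

Exact intervals are wide at these sample sizes, which is why the luminal and normal-like intervals overlap even though their point estimates differ.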

Table 2.

Correlation between molecular classification and pathologic complete response

Molecular classification | No pathologic CR (residual disease), n [% (95% CI)] | Pathologic CR, n [% (95% CI)]
    Luminal A/B subtype | 28 [93% (78-99)] | 2 [7% (1-22)]
    Normal breast like | 10 [100% (69-100)] | 0 [0% (0-31)]
    erbB2+ | 11 [55% (32-77)] | 9 [45% (23-68)]
    Basal subtype | 12 [55% (32-76)] | 10 [45% (24-68)]
P < 0.001

Next, we examined the clinicopathologic variables associated with pathologic CR in these data. Age < 50 years and estrogen receptor–negative status were independently associated with a higher likelihood of pathologic CR in a multivariate analysis including age, estrogen receptor and HER-2 status, tumor size, clinical nodal status, and nuclear grade (Table 3). To examine whether knowledge of the molecular class improves estimation of the probability of pathologic CR beyond what can be achieved with routine clinical variables, we built three logistic regression models combining the clinical variables (age, tumor and node stage), the histopathologic variables (grade, estrogen receptor, and HER-2 status), and the molecular class in various combinations. For this analysis, we merged the luminal and normal-like groups because there was no pathologic CR in the normal-like category and these tumors were phenotypically similar to the luminal tumors (HER-2 normal and estrogen receptor positive), which also had low pathologic CR rates. Molecular class was not independently associated with pathologic CR because of its high correlation with estrogen receptor status and nuclear grade in this cohort. We constructed Receiver Operating Characteristic curves to measure the predictive accuracy of logistic regression models including (a) clinical + pathologic variables, (b) clinical variables + molecular classification, and (c) clinical + pathologic variables + molecular class (Fig. 2). The three models yielded similar areas under the Receiver Operating Characteristic curve. This indicates that molecular class can substitute for histopathologic characteristics (estrogen receptor, HER-2 status, and grade) in predicting pathologic CR but provides little additional information when these characteristics are already included.
More directed supervised class prediction methods may be needed to develop a multigene predictor of pathologic CR. Such predictors can be developed by identifying informative genes that are differentially expressed between cases of pathologic CR and residual disease and combining these genes into a weighted prediction score or algorithm.
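The area under the Receiver Operating Characteristic curve used to compare the three models equals the probability that a randomly chosen pathologic CR case receives a higher predicted probability than a randomly chosen residual-disease case, with ties counting one half. A minimal NumPy sketch of that calculation follows; the input scores would be, for example, a logistic regression model's fitted probabilities, and the small arrays shown are made-up illustrations, not study data. AUCs computed on the same cases used to fit the models tend to be optimistic.

```python
import numpy as np


def roc_auc(scores, outcomes) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney form).

    scores: predicted probability of pathologic CR for each patient.
    outcomes: 1 for pathologic CR, 0 for residual disease.
    """
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=bool)
    pos, neg = scores[outcomes], scores[~outcomes]
    # Compare every pathologic CR case with every residual-disease case.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Hypothetical fitted probabilities from two competing models for the same six patients.
outcome = [1, 1, 0, 0, 0, 1]
model_a = [0.9, 0.7, 0.2, 0.4, 0.1, 0.6]
model_b = [0.8, 0.3, 0.5, 0.4, 0.1, 0.6]
print(roc_auc(model_a, outcome), roc_auc(model_b, outcome))
```

Because the AUC depends only on the ranking of the scores, two models can reach similar AUCs even when their fitted coefficients differ, as with the three models compared here.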

Table 3.

Multivariate analysis of predictive factors for pathologic CR

Model 1: clinical and histologic variables. Model 2: clinical variables and molecular classification. Model 3: clinical and histologic variables and molecular classification. Each cell gives OR (95% CI), P; the first category listed under each variable is the reference, and "not in model" marks variables that a model did not include.

Variable | Model 1 | Model 2 | Model 3
Age (y)
    <50 (reference)
    >50 | 0.27 (0.08-0.91), 0.035 | 0.17 (0.06-0.45), <0.001 | 0.43 (0.11-1.7), 0.43
Tumor (cm)
    <5 (reference)
    >5 | 0.55 (0.14-2.3), 0.41 | 0.53 (0.15-1.8), 0.28 | 0.64 (0.14-2.9), 0.56
Node
    N0 (reference)
    N1-3 | 0.96 (0.31-3.0), 0.94 | 0.65 (0.22-2.0), 0.18 | 0.90 (0.22-3.7), 0.90
Estrogen receptor
    Negative (reference)
    Positive | 0.12 (0.02-0.31), <0.001 | not in model | 0.08 (0.02-0.35), 0.001
HER-2
    Negative (reference)
    Positive | 1.77 (0.42-7.5), 0.43 | not in model | 0.32 (0.03-3.6), 0.34
Nuclear grade
    1/2 (reference)
    3 | 2.6 (0.81-8.4), 0.11 | not in model | 2.5 (0.4-13.6), 0.30
Histology
    Ductal (reference)
    Other | 1.14 (0.17-7.5), 0.89 | not in model | 2.3 (0.11-1.7), 0.76
Molecular classification
    Luminal/normal-like (reference)
    Normal-like | not in model | 0 (0-…), 0.99 | 0 (0-…), 0.99
    Basal-like | not in model | 3.3 (1.0-11), 0.06 | 0.8 (0.12-5.5), 0.83
    erbB2+ | not in model | 4.4 (1.2-17), 0.026 | 7.8 (0.62-100), 0.11

NOTE: Multivariate analysis of different combinations of clinical (age, tumor, node) and histopathologic characteristics (grade, estrogen receptor and HER-2 status, and histologic type) and molecular class as variables. Three distinct prediction models were examined: clinical plus histopathologic variables (model 1), clinical variables plus molecular class (model 2), and all three types of information together (model 3).

Fig. 2.

Receiver Operating Characteristic curves for logistic regression models. Three different prediction models were compared: clinical plus histopathologic variables (model 1), clinical variables plus molecular classification (model 2), and clinical plus histopathologic variables plus molecular classification (model 3). All three models performed similarly.

Genes associated with pathologic complete response in the different molecular subgroups. We next examined whether the genes whose expression was associated with pathologic CR differed between the basal-like and erbB2+ subtypes. Because only two cases of pathologic CR were observed in the luminal group and none in the normal-like group, these groups were not included in this exploratory analysis. Seventy-two probe sets (corresponding to 61 genes) were differentially expressed between basal-like tumors that achieved a pathologic CR and those that did not (Table 4). All highly variable probe sets (n = 2,228) were used in this analysis, and the false discovery rate was set at 5%. In contrast, within the erbB2+ group, no genes were identified at a false discovery rate < 10%. At a false discovery rate of 50%, 16 probe sets (15 genes) were identified; however, half of these could be false discoveries (data not shown). Greater variance of gene expression, and therefore greater molecular heterogeneity, among erbB2+ tumors compared with basal-like tumors, combined with the small sample size (erbB2+; n = 20), may explain why it was difficult to identify genes in this group. Importantly, none of the genes associated with pathologic CR in the basal-like group was associated with pathologic CR in the erbB2+ group. We also assessed whether the fold differences in expression of pathologic CR-associated genes between cases with pathologic CR and residual disease were correlated between basal-like and erbB2+ tumors. There was no correlation (P = 0.19), which suggests that the genes associated with chemotherapy sensitivity differ between these two molecular subgroups of breast cancer.
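The cross-subtype comparison described above can be sketched as follows: for each probe set, compute the log2 ratio of mean expression in residual-disease versus pathologic CR tumors separately within basal-like and erbB2+ tumors, then correlate the two vectors. Pearson correlation on log2 ratios is an assumption (the text does not name the correlation measure), the expression values are assumed to be positive linear-scale signals, and the data below are simulated; only the class sizes and pathologic CR counts come from Table 2.

```python
import numpy as np
from scipy.stats import pearsonr


def log2_fold_difference(expr: np.ndarray, pcr: np.ndarray) -> np.ndarray:
    """log2(mean in residual disease / mean in pathologic CR) for every probe set.

    expr: (n_samples, n_probe_sets) positive, linear-scale signals.
    pcr: boolean array, True for tumors with pathologic CR.
    """
    return np.log2(expr[~pcr].mean(axis=0) / expr[pcr].mean(axis=0))


rng = np.random.default_rng(0)
n_probe_sets = 72  # e.g. the probe sets selected in the basal-like comparison
basal = rng.lognormal(mean=7.0, sigma=0.5, size=(22, n_probe_sets))
erbb2 = rng.lognormal(mean=7.0, sigma=0.5, size=(20, n_probe_sets))
basal_pcr = np.arange(22) < 10   # 10 of 22 basal-like tumors with pathologic CR
erbb2_pcr = np.arange(20) < 9    # 9 of 20 erbB2+ tumors with pathologic CR

r, p = pearsonr(log2_fold_difference(basal, basal_pcr),
                log2_fold_difference(erbb2, erbb2_pcr))
print(f"r = {r:.2f}, P = {p:.2f}")
```

Working on the log scale makes a twofold increase and a twofold decrease symmetric around zero, so up- and down-regulated genes contribute equally to the correlation.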

Table 4.

Genes associated with pathologic CR in basal-like breast cancer

Rank Probe set Gene symbol Fold difference of means between residual disease and pathologic CR
1 213060_s_at* CHI3L2 4.117 
2 200052_s_at ILF2 2.645 
3 213338_at RIS1 3.181 
4 214433_s_at* SELENBP1 2.735 
5 204319_s_at RGS10 2.183 
6 213005_s_at ANKRD15 2.517 
7 221561_at SOAT1 2.156 
8 212190_at SERPINE2 2.448 
9 209387_s_at TM4SF1 3.096 
10 221727_at PC4 1.842 
11 213716_s_at SECTM1 2.183 
12 203165_s_at SLC33A1 1.826 
13 207414_s_at PACE4 1.732 
14 201819_at SCARB1 2.118 
15 217983_s_at RNASET2 1.915 
16 214540_at HIST1H2BO 1.864 
17 218538_s_at MRS2L 1.794 
18 202506_at SSFA2 1.678 
19 215071_s_at HIST1H2AC 2.099 
20 202988_s_at RGS1 2.328 
21 220624_s_at ELF5 2.521 
22 221505_at* ANP32E 2.51 
23 208370_s_at DSCR1 2.332 
24 204407_at TTF2 1.913 
25 218398_at MRPS30 1.595 
26 213754_s_at TRIM26 1.844 
27 210147_at ART3 3.943 
28 204809_at CLPX 1.813 
29 202035_s_at SFRP1 6.461 
30 209389_x_at DBI 1.792 
31 201897_s_at CKS1B 1.937 
32 209142_s_at UBE2G1 1.746 
33 209340_at UAP1 1.724 
34 203362_s_at MAD2L1 1.967 
35 217028_at CXCR4 2.097 
36 205044_at* GABRP 6.224 
37 36711_at MAFF 2.271 
38 202023_at EFNA1 1.721 
39 212915_at PDZRN3 2.539 
40 217851_s_at C20orf45 1.734 
41 211762_s_at KPNA2 1.849 
42 213134_x_at* BTG3 2.007 
43 204162_at KNTC2 2.283 
44 212276_at LPIN1 2.219 
45 219768_at B7-H4 1.843 
46 209551_at MGC11061 1.925 
47 203744_at HMGB3 1.457 
48 200975_at PPT1 1.628 
49 221931_s_at SEC13L 1.796 
50 209786_at HMGN4 1.63 
51 218963_s_at KRT23 3.088 
52 219209_at MDA5 2.337 
53 214214_s_at C1QBP 1.799 
54 209656_s_at TM4SF10 2.645 
55 203706_s_at* FZD7 2.603 
56 206055_s_at SNRPA1 1.799 
57 204825_at MELK 1.735 
58 212762_s_at TCF7L2 1.928 
59 203423_at RBP1 1.82 
60 210605_s_at* MFGE8 2.085 
61 214835_s_at SUCLG2 1.576 

NOTE: There are 72 probe sets significant by SAM, corresponding to 61 genes. The median false discovery rate among the 72 significant probe sets is 0.04. Genes are ranked by significance.

The goal of the current project was to further evaluate the clinical relevance of a novel gene expression–based classification system of breast cancer. This new classification is based on gene expression signatures of variably expressed genes in breast cancer (1). It has previously been shown that the molecular classes differ in long-term survival (1-4). However, these earlier studies cannot show whether the differences in survival are due to different metastatic potential or to different sensitivity to adjuvant chemotherapy or hormonal therapy, because the patients included in them received various forms of multimodality treatment. In the current study, we examined newly diagnosed stage I to III breast cancers that all received preoperative taxane- and anthracycline-containing chemotherapy followed by surgery, to determine whether the molecular classes differ in chemotherapy sensitivity as measured by pathologic response.

All previous reports on molecular classification used frozen breast cancer tissues for gene expression profiling. The current study differs from these in that we used fine needle aspiration specimens. Surgically resected cancer tissues differ from fine needle aspiration in cellular composition. The fine needle aspiration material contains 80% to 90% pure neoplastic cells whereas surgical biopsies or core needle biopsies contain a variable amount of stromal cells. It was therefore of interest to determine if the intrinsic gene set that discriminated molecular class in surgical specimens could also separate molecular classes of breast cancer in fine needle aspiration data. If such separation can be observed, this would suggest that these informative genes are primarily expressed in neoplastic cells rather than in stromal cells.

Because there are presently no uniformly accepted class prediction tools for defining the molecular classes of breast cancer from gene expression data, we did hierarchical clustering and multidimensional scaling analysis with the breast cancer intrinsic gene set, which mimics the original class discovery process. Our results in fine needle aspiration data were very similar to those reported by others for surgical tissues. The two most readily distinguishable molecular classes of breast cancer are the basal-like and luminal subtypes, whereas the normal-like class is the least robust. This may be because the original samples in this category contained a significant amount of contaminating normal breast tissue. In our series, the basal-like, erbB2+, and luminal subclasses were distinguished by some of the same genes and histologic phenotypes as previously reported. This supports the hypothesis that these clusters represent genuinely different diseases within breast cancer (3, 4).

The different molecular classes of breast cancer showed different sensitivities to preoperative chemotherapy. The basal-like and erbB2+ subgroups had the highest rates of pathologic CR, 45% (95% CI, 24-68) and 45% (95% CI, 23-68), respectively. The luminal and normal-like tumors had low pathologic CR rates of 6% (95% CI, 1-21) and 0% (95% CI, 0-31), respectively. However, molecular class was not independent of the conventional clinicopathologic predictors of response (i.e., nuclear grade and estrogen receptor status). The basal-like and erbB2+ tumors were predominantly of high nuclear grade, and the basal-like tumors were almost all estrogen receptor negative. Both of these characteristics are known to be associated with a higher likelihood of pathologic CR to preoperative chemotherapy (11–13). Because of this association, incorporating molecular class into a logistic regression–based predictor of response did not improve prediction accuracy compared with using routine clinical and pathologic variables only. Therefore, more focused gene signature–based predictors will likely need to be developed through supervised outcome prediction methods.
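The section does not state how the 95% CIs for the pathologic CR rates were computed. Assuming exact (Clopper-Pearson) binomial intervals, the basal-like and erbB2+ intervals can be recomputed from the counts given later in this section (10 of 22 and 9 of 20 pathologic CRs). The sketch below uses only the Python standard library; `clopper_pearson` and its helpers are hypothetical names, and bisection on the binomial CDF stands in for the beta-quantile formula a statistics package would use.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing, iters=100):
    """Bisection on [0, 1] for f(p) == target, given f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies below mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Return (rate, lower, upper) for x responders out of n, exact two-sided interval."""
    # Lower bound: P(X >= x) = alpha/2; this tail probability increases with p.
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: P(X <= x) = alpha/2; this tail probability decreases with p.
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return x / n, lower, upper


if __name__ == "__main__":
    for label, (x, n) in {"basal-like": (10, 22), "erbB2+": (9, 20)}.items():
        rate, lo, hi = clopper_pearson(x, n)
        print(f"{label}: {rate:.0%} (95% CI, {lo:.0%}-{hi:.0%})")
```

The printed intervals can be compared with the reported ones; a different interval method (e.g., Wilson) would give somewhat narrower bounds for samples this small.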

The best way to define a multigene predictor of response to chemotherapy is not known. One approach is to group all breast cancers into responders (e.g., pathologic CR) and nonresponders (e.g., residual disease), define the gene expression differences between these groups, and use this information to construct a response prediction score or a machine learning–based predictor. This approach has been applied successfully to develop prognostic signatures for breast cancer and has also shown promise in small pilot studies of chemotherapy response prediction (6, 14, 15). Developing the best possible supervised classifier for predicting pathologic CR from this data set was not the goal of the current analysis. However, if distinct molecular classes of breast cancer exist, one could hypothesize that stratifying patients by molecular class may yield more accurate class-specific predictors than using the data unstratified.
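One way to make the unstratified-versus-stratified comparison concrete is a compound covariate predictor: each gene is weighted by its Welch t statistic for pathologic CR versus residual disease, the top-ranked genes are summed into a score, and the score is thresholded at the midpoint between the two groups' mean scores. This is offered only as an illustrative supervised method and is not claimed to be the method of references 6, 14, or 15. All names (`train_compound_covariate`, `train_stratified`, `n_genes`) are hypothetical, and the cross-validation a real application would need is omitted.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic (a minus b); each group needs >= 2 values with nonzero spread."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def train_compound_covariate(X, y, n_genes):
    """Hypothetical compound covariate predictor.

    X: samples, each a list of expression values in the same gene order.
    y: 1 = pathologic CR, 0 = residual disease.
    Returns a function mapping one sample to a predicted label (1 or 0).
    """
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    n_all = len(X[0])
    t = [welch_t([s[g] for s in pos], [s[g] for s in neg]) for g in range(n_all)]
    # Keep the n_genes genes with the largest |t|; their t values act as weights,
    # so genes lower in pCR cases get negative weights and still raise pCR scores.
    top = sorted(range(n_all), key=lambda g: abs(t[g]), reverse=True)[:n_genes]

    def score(x):
        return sum(t[g] * x[g] for g in top)

    # Threshold at the midpoint between the mean training scores of the two groups.
    threshold = (mean([score(x) for x in pos]) + mean([score(x) for x in neg])) / 2
    return lambda x: 1 if score(x) > threshold else 0


def train_stratified(groups, n_genes):
    """Fit one predictor per molecular class; groups maps class name -> (X, y)."""
    return {cls: train_compound_covariate(X, y, n_genes) for cls, (X, y) in groups.items()}
```

Calling `train_stratified` with one `(X, y)` pair per molecular class fits class-specific predictors, while calling `train_compound_covariate` on the pooled data gives the unstratified comparator.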

As an exploratory analysis, we attempted to define the molecular differences between tumors that are extremely chemotherapy sensitive (pathologic CR) and those that are more resistant (residual disease) separately within the basal-like and erbB2+ groups. In the basal-like group (n = 22, including 10 pathologic CR), 61 genes were identified that were significantly associated with pathologic CR. Importantly, these associations cannot be explained by estrogen receptor status or high grade (the two conventional strong predictors of pathologic CR), because the basal-like group consists almost exclusively of high-grade, estrogen receptor–negative tumors. We could not define a robust gene set that correlated with pathologic CR in the erbB2+ group (n = 20, including 9 pathologic CR). Moreover, the genes that were associated with pathologic CR in the basal-like group were not associated with pathologic CR in the erbB2+ group. This suggests that distinct sets of genes are associated with pathologic CR in the different molecular classes.
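The within-class comparison described above can be illustrated with a plain two-sided permutation test on a Welch t statistic for each gene, after which gene lists from different classes are compared by set intersection. This is a simplified stand-in and not necessarily the study's method; for comparison, SAM (reference 10) adds a fudge factor to the denominator of the statistic and estimates the false discovery rate across all genes. The function names, the `alpha` cutoff, and the permutation count are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic (a minus b)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for one gene.

    values: expression of the gene across samples.
    labels: 1 = pathologic CR, 0 = residual disease (same order as values).
    """
    rng = random.Random(seed)  # seeded so results are reproducible

    def t_of(labs):
        a = [v for v, lab in zip(values, labs) if lab == 1]
        b = [v for v, lab in zip(values, labs) if lab == 0]
        return welch_t(a, b)

    observed = abs(t_of(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between label and expression
        if abs(t_of(shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate from ever being exactly 0.
    return (hits + 1) / (n_perm + 1)


def significant_genes(expr, labels, alpha=0.01, **kwargs):
    """Genes with permutation p < alpha; expr maps gene name -> values across samples."""
    return {g for g, vals in expr.items()
            if permutation_pvalue(vals, labels, **kwargs) < alpha}
```

With one gene set per molecular class (say, hypothetical `basal_set` and `erbb2_set`), `basal_set & erbb2_set` gives the overlap; an empty intersection corresponds to the finding reported above. Note that testing thousands of genes at a fixed `alpha` without multiple-testing control would yield many false positives.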

It is tempting to speculate on the biological function of the genes that are differentially expressed between cases with pathologic CR and those with residual cancer. However, not all of these genes necessarily play a causative role in determining sensitivity to chemotherapy. Some may reflect distant downstream transcriptional effects of biological events that influence drug sensitivity, and a few could represent spurious discoveries. From the standpoint of gaining mechanistic insight into the biology of chemotherapy sensitivity or resistance, these gene lists should be regarded as hypothesis-generating, and further in vitro experimentation will be required to show a functional role for any particular molecule.

In summary, these results indicate that the major molecular classes of breast cancer can be detected in gene expression data regardless of tissue sampling method (i.e., fine needle aspirations, core needle, or surgical biopsies). The different molecular classes of breast cancer not only have different prognoses but also show distinct sensitivities to preoperative chemotherapy. The basal-like and erbB2+ subtypes of breast cancer are more sensitive to paclitaxel- and doxorubicin-containing preoperative chemotherapy than the luminal and normal-like cancers. The genes associated with pathologic CR were different between the basal-like and erbB2+ subgroups, which suggests that the mechanisms of chemotherapy sensitivity may vary across the subtypes. The possibility that distinct predictive signatures can be developed for the different molecular subtypes of breast cancer warrants further examination.

Grant support: The Nellie B. Connally Breast Cancer Research Fund, Millennium Pharmaceuticals, The Dee Simmons Fund, University of Texas M.D. Anderson Cancer Center Aventis Drug Development Award (L. Pusztai), The Susan G. Komen Breast Cancer Foundation (grant LF2002-044HM; W.F. Symmans), Association pour la Recherche sur le Cancer (R. Rouzier), and National Cancer Institute Specialized Program of Research Excellence in Breast Cancer (grant P50-CA58223-09A1; C.M. Perou).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).

1. Perou CM, Sorlie T, Eisen MB, et al. Molecular portraits of human breast tumours. Nature 2000;406:747–52.
2. Sorlie T, Perou CM, Tibshirani R, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 2001;98:10869–74.
3. Sorlie T, Tibshirani R, Parker J, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 2003;100:8418–23.
4. Sotiriou C, Neo SY, McShane LM, et al. Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc Natl Acad Sci U S A 2003;100:10393–8.
5. Symmans WF, Ayers M, Clark EA, et al. Total RNA yield and microarray gene expression profiles from fine-needle aspiration biopsy and core-needle biopsy samples of breast carcinoma. Cancer 2003;97:2960–71.
6. Ayers M, Symmans WF, Stec J, et al. Gene expression profiles predict complete pathologic response to neoadjuvant paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide chemotherapy in breast cancer. J Clin Oncol 2004;22:2284–93.
7. Stec J, Wang J, Coombes K, et al. Comparison of the predictive accuracy of DNA array based multigene classifiers across cDNA arrays and Affymetrix GeneChips. J Mol Diagnostics. In press 2005.
8. Cutler SJ, Black MM, Mork T, et al. Further observations on prognostic factors in cancer of the female breast. Cancer 1969;24:653–67.
9. McShane LM, Radmacher MD, Freidlin B, et al. Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data. Bioinformatics 2002;18:1462–9.
10. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 2001;98:5116–21.
11. Mathieu MC, Rouzier R, Llombart-Cussac A, et al. The poor responsiveness of infiltrating lobular breast carcinomas to neoadjuvant chemotherapy can be explained by their biological profile. Eur J Cancer 2004;40:342–51.
12. Rouzier R, Extra JM, Klijanienko J, et al. Incidence and prognostic significance of complete axillary downstaging after primary chemotherapy in breast cancer patients with T1 to T3 tumors and cytologically proven axillary metastatic lymph nodes. J Clin Oncol 2002;20:1304–10.
13. Kuerer HM, Newman LA, Smith TL, et al. Clinical course of breast cancer patients with complete pathologic primary tumor and axillary lymph node response to doxorubicin-based neoadjuvant chemotherapy. J Clin Oncol 1999;17:460–9.
14. Gianni L, Zambetti M, Clark K, et al. Gene expression profiles of paraffin-embedded core biopsy tissue predict response to chemotherapy in patients with locally advanced breast cancer. ASCO Annual Meeting Proceedings 2004;22:501.
15. Chang JC, Wooten EC, Tsimelzon A, et al. Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet 2003;362:362–9.
