Purpose:

Expression-based classifiers to predict pathologic complete response (pCR) after neoadjuvant chemotherapy (NACT) are not routinely used in the clinic. We aimed to build and validate a classifier for pCR after NACT.

Patients and Methods:

We performed a prospective multicenter study (EXPRESSION) including 114 patients treated with anthracycline/taxane-based NACT. Pretreatment core needle biopsies from 91 patients were used for gene expression analysis and classifier construction, followed by validation in five external cohorts (n = 619).

Results:

A 20-gene classifier established in the EXPRESSION cohort using a Youden index–based cut-off point predicted pCR in the validation cohorts with an accuracy, AUC, negative predictive value (NPV), positive predictive value, sensitivity, and specificity of 0.811, 0.768, 0.829, 0.587, 0.216, and 0.962, respectively. Alternatively, aiming for a high NPV by defining the cut-off point for classification based on the complete responder with the lowest predicted probability of pCR in the EXPRESSION cohort led to an NPV of 0.960 upon external validation. With this extreme-low cut-off point, a recommendation to not treat with anthracycline/taxane-based NACT would be possible for 121 of 619 unselected patients (19.5%) and 112 of 322 patients with luminal breast cancer (34.8%). The analysis of the molecular subtypes showed that the identification of patients who do not achieve a pCR by the 20-gene classifier was particularly relevant in luminal breast cancer.

Conclusions:

The novel 20-gene classifier reliably identifies patients who do not achieve a pCR in about one third of luminal breast cancers in both the EXPRESSION and combined validation cohorts.

Translational Relevance

Patients with breast cancer receive neoadjuvant chemotherapy (NACT) to reduce the size of locally advanced tumors. Overtreatment of many patients is a problem, because only a minority of unselected patients achieve a pathologic complete response (pCR), but all are at risk for chemotherapy-related toxicities. Ideally, a classifier should inform each patient whether or not she will respond to treatment. However, this has been difficult to achieve. An alternative approach is to construct a classifier that makes a very reliable statement, but only for a subset of patients. Here, we established and externally validated a 20-gene classifier that identifies patients who do not achieve a pCR with high confidence (negative predictive value: 0.960). One limitation is that a reliable recommendation not to treat with anthracycline/taxane-based NACT is possible only for one fifth of the patients. However, these mostly luminal patients can be spared the toxicity of a chemotherapy that is not optimally effective, and other therapeutic strategies could be considered.

Patients with breast cancer receive neoadjuvant chemotherapy (NACT) to reduce the size of locally advanced tumors with the aim of improving operability and enabling breast-conserving surgery. In addition, NACT provides a unique opportunity to evaluate the response to chemotherapeutic agents in vivo and to assess the predictive value of a biomarker without superimposed prognostic effects. Accordingly, NACT is the preferred treatment method for patients with a clear indication for chemotherapy, according to several current consensus recommendations (1, 2). In addition, the response to NACT provides information on long-term prognosis, because the pathologic complete response (pCR) indicates a reduced risk of disease recurrence for the individual patient, especially in aggressive tumors (3). However, pCR rates for commonly used anthracycline- and taxane-based neoadjuvant therapies in unselected patients are relatively low, ranging between approximately 15% and 40% (4–8). Thus, the majority of patients do not benefit from the treatment, but can nevertheless suffer from different degrees of chemotherapy-induced toxicity. Because of the close relationship between pCR and survival, a genomic classifier that accurately predicts pCR would be helpful for patient selection.

A main goal of a predictive classifier is to reliably identify patients who are likely to respond to NACT, and therefore should receive treatment. It is equally important to be able to identify patients with a low likelihood of response to spare these patients unnecessary toxicity. In the latter scenario, it is crucial that a therapy that may benefit the patient is not withheld. This means that a high negative predictive value (NPV) is required, which informs about the proportion of true nonresponders among the patients who are classified as nonresponders. Previous efforts to predict pCR after taxane/anthracycline-based neoadjuvant therapy from gene expression profiles using multigene classifiers (9–22) have led to sensitivities and specificities ranging from 29% to 92% and 49% to 100%, respectively (Supplementary Table S1A). Limitations of the presently available studies include a high variability in the sensitivity and specificity reported in the different studies, the relatively small numbers of patients available for classifier construction, and the lack of extensive validation of classifiers in several additional, external cohorts. To address these limitations, we constructed a classifier that predicts pCR after sequential NACT with docetaxel, followed by a combination of fluorouracil, epirubicin, and cyclophosphamide (FEC) in early breast cancer. A transparent pipeline of statistical modeling and validation steps was created to construct and evaluate the classifier. Genome-wide gene expression data from pretreatment core needle biopsies (CNB) of patients with breast cancer, who were treated within the prospective, one-armed, noncomparative, multicenter phase II EXPRESSION clinical trial, were used to build the classifier.
The performance of the classifier was externally validated in five publicly available transcriptomics datasets of patients with breast cancer who were neoadjuvantly treated with taxanes and/or anthracyclines, and it was further compared with a classifier constructed using only clinicopathologic variables, or by using a combination of the genes and clinicopathologic variables. Alternatively, we also focused on attaining a high negative or positive predictive value (NPV and PPV) by defining the cut-off point for classification based on the complete responder with the lowest, or the noncomplete responder with the highest, predicted probability of pCR in the EXPRESSION cohort, followed by external validation.
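The NPV-oriented cut-off rule described above can be illustrated with a minimal sketch. It assumes that a patient is called a predicted nonresponder when her predicted probability lies strictly below the lowest probability seen among the complete responders of the training cohort; the function names and the toy numbers are hypothetical and are not study data.

```python
def extreme_low_cutoff(probs, achieved_pcr):
    """Lowest predicted pCR probability among the true complete responders."""
    responder_probs = [p for p, y in zip(probs, achieved_pcr) if y]
    if not responder_probs:
        raise ValueError("need at least one complete responder")
    return min(responder_probs)


def predicted_nonresponders(probs, cutoff):
    """Indices of patients whose predicted probability is below the cut-off."""
    # Assumption: strict inequality, so the lowest responder is never flagged.
    return [i for i, p in enumerate(probs) if p < cutoff]


def npv(probs, achieved_pcr, cutoff):
    """Share of true non-pCR cases among the patients predicted as non-pCR."""
    flagged = predicted_nonresponders(probs, cutoff)
    if not flagged:
        return None  # no patient received a "no pCR" prediction
    true_negatives = sum(1 for i in flagged if not achieved_pcr[i])
    return true_negatives / len(flagged)


# Toy training cohort (made-up values for illustration only).
train_probs = [0.05, 0.10, 0.22, 0.30, 0.45, 0.70, 0.90]
train_pcr = [False, False, True, False, False, True, True]
cutoff = extreme_low_cutoff(train_probs, train_pcr)
print(cutoff, predicted_nonresponders(train_probs, cutoff))
```

By construction this rule gives an NPV of 1 in the cohort used to define the cut-off, which is why the NPV that matters is the one obtained in independent cohorts. The trade-off is that only the patients below the cut-off receive a prediction at all.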

We established a 20-gene classifier that with high confidence identifies patients who do not achieve a pCR in the external cohorts (NPV: 0.960), when used with an extreme-low cut-off point. A limitation is that a recommendation would only be possible for one-fifth of the patients with an extremely low predicted probability of pCR, whereas no reliable statement about pCR would be possible for the remainder of the patients. However, the patients predicted to not achieve a pCR could potentially be spared the toxicity of a not optimally effective chemotherapy, and consequently other therapeutic strategies are warranted for these patients.

Study design

The multicenter, prospective, noncomparative phase II EXPRESSION trial (Fig. 1A; Supplementary Table S1B; identifier EudraCT: 2008-006381-29) was performed in agreement with the Helsinki Declaration and the ICH Guidelines on Good Clinical Practice. It was approved by the Ethical Review Board of the Medical Association of Rhineland-Palatinate, Germany. All patients gave written informed consent.

Figure 1.

Genome-wide gene expression data of patients with breast cancer treated within the prospective, one-armed, noncomparative, multicenter phase II EXPRESSION clinical trial. A, Study design. RNA for gene expression profiling was isolated from pretreatment CNBs. The patients were thereafter treated with sequential NACT, consisting of three cycles of docetaxel (100 mg/m²), followed by three cycles of FEC (5-FU 500 mg/m², epirubicin 100 mg/m², cyclophosphamide 500 mg/m²). B, CONSORT diagram illustrating the flow of patients through the study. Patient characteristics are in Supplementary Table S1C. C, Volcano plot with the log2FC displayed on the x-axis and the negative log10 P value for differential gene expression (pCR vs. non-pCR) on the y-axis. The horizontal dotted line indicates P = 0.01; the vertical dotted lines indicate, from left to right, FC < 0.5 (log2FC < −1), FC < 2/3 = 0.667 (log2FC < −0.585), FC > 1.5 (log2FC > 0.585), FC > 2 (log2FC > 1). D, Heatmap based on hierarchical clustering of the top-100 probesets with an absolute FC > 1.5 [log2FC > log2(1.5)] ranked according to the P value.

Patients

Distant metastases were ruled out by chest x-ray, sonography of the liver, and a bone scan. Eastern Cooperative Oncology Group performance status had to be 0–2. Baseline characteristics of patients who received at least one cycle of chemotherapy (intention-to-treat population) and of patients with gene expression data available are shown in Supplementary Table S1C. Eligible patients were required to have adequate organ function (bilirubin and serum creatinine within normal limits, aspartate aminotransferase and alanine aminotransferase ≤ 2.0 × upper limit of normal, neutrophils ≥ 1,500/μL, platelets ≥ 100,000/μL, hemoglobin ≥ 100 g/L, and normal left ventricular ejection fraction > 50%). Cardiac function was assessed using echocardiography before the start of therapy. Two CNBs were acquired for diagnostics that included the determination of tumor grade, estrogen receptor (ER), progesterone receptor (PR), and HER2 status. Two additional biopsies were obtained for RNA isolation and gene expression profiling, both of which were stored in RNAlater at −80°C.

Treatment

Patients received three 3-week cycles of docetaxel (100 mg/m²) followed by three 3-week cycles of 5-fluorouracil (500 mg/m²), epirubicin (100 mg/m²), and cyclophosphamide (500 mg/m²). Primary prophylaxis with G-CSF was performed in case of febrile neutropenia and was given for each consecutive cycle. Cycles were delayed if neutrophils were < 1,500/μL and/or platelets were < 100,000/μL. A dose reduction of chemotherapy was indicated in case of neurotoxicity ≥ grade 2, and any nonhematologic toxicity ≥ grade 3 (except nausea/vomiting, alopecia). Toxicities were graded according to NCI Common Toxicity Criteria. Only serious adverse events were collected. Once study treatment was discontinued because of toxicities, it was not allowed to be resumed. Clinical response was assessed by sonography and physical examination. Patients who progressed (i.e., increase in size of more than 25% or detection of a new breast lesion) were excluded from protocol treatment. Surgery was performed within 28 days of neoadjuvant therapy completion. Additional adjuvant therapy was given at the discretion of the investigator (e.g., radiotherapy depending on initial tumor extension, antihormonal therapy according to hormone receptor status, and/or trastuzumab according to HER2 status, following current guideline recommendations).

Endpoints

The primary endpoint was the identification of a gene expression signature predicting pCR after sequential neoadjuvant treatment with docetaxel followed by FEC. In this trial, pCR was defined as the absence of invasive or noninvasive cancer in the breast.

Gene expression analysis, classifier construction, and validation

RNA isolation from pretreatment CNBs and Affymetrix microarray analysis using HG-U133 Plus 2.0 arrays are described in Supplementary Materials and Methods. CEL files have been deposited under GSE140494.

The statistical software R version 3.4.4 was used for all analyses. Statistical analyses were performed as detailed in Supplementary Materials and Methods. The pipeline for classifier construction and validation is comprehensively described in Supplementary Materials and Methods and illustrated in Supplementary Fig. S1A. Briefly, the data were randomly split into a training set (80%) and a test set (20%), stratified according to the response variable to ensure that the training set and the test set had the same proportion of patients with pCR. The strongest differentially expressed genes were identified in the training set using a limma t test. Gene sets for classifier construction were determined by ranking all probesets with |log2(FC)| > log2(1.5) according to the P value (lowest to highest) and selecting the top-x probesets (x = 2, 3, 4, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, and 100). Classifiers were built using five algorithms: linear discriminant analysis (LDA), ℓ1- and ℓ2-regularized logistic regression (LogReg ℓ1 and ℓ2), a support vector machine (SVM) with a radial basis function kernel, and random forest (RF). This procedure was repeated 100 times with different training and test splits. Classifier performance was assessed in the training sets using cross-validation, where the performance values were averaged across 50 repetitions of 5-fold cross-validation, and in the test sets. The algorithm (LogReg ℓ2) and the numbers of top-ranked probesets (20) that combined the highest possible mean area under the ROC curve (AUC; based on the test sets) with a low number of probesets were selected. The top-20 probesets that were most frequently included among the 100 different ranked lists of top-20 probesets were next combined into a final classifier, which was then applied to the entire cohort. The cut-off point for prediction was decided on the basis of the ROC curve and the Youden statistic. 
Finally, five additional breast cancer datasets, with information about chemotherapy response and clinicopathologic factors, were used for validation [total N = 619 patients; GSE20194, GSE20271, GSE22093, GSE23988, and one internally generated breast cancer dataset (DUS; ref. 23); Supplementary Table S1D]. Classifiers were also constructed on the basis of clinicopathologic variables only (age, grade, ER, and HER2), and a combination of the clinicopathologic variables and the top 20 probesets.
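The probeset selection step of the pipeline (filter by |log2(FC)| > log2(1.5), rank by P value, keep the top-x) can be sketched as follows. This is a simplified illustration in Python rather than the R/limma code used in the study; the function name, the input layout, and the probeset IDs `ps_A` to `ps_E` with their values are hypothetical.

```python
import math


def select_top_probesets(stats, top_x, min_fc=1.5):
    """Return the top_x probeset IDs after fold-change filtering.

    stats maps a probeset ID to a (log2_fc, p_value) pair, e.g. taken from
    a differential expression analysis of the training set.
    """
    threshold = math.log2(min_fc)  # log2(1.5) is about 0.585
    # Keep probesets whose absolute log2 fold change exceeds the threshold.
    passing = [(p, pid) for pid, (log2_fc, p) in stats.items()
               if abs(log2_fc) > threshold]
    # Rank from lowest to highest P value (ties broken by probeset ID).
    passing.sort()
    return [pid for _, pid in passing[:top_x]]


# Made-up example values, not study data.
toy_stats = {
    "ps_A": (1.30, 0.0004),
    "ps_B": (0.40, 0.0001),   # smallest P value, but fold change too small
    "ps_C": (-0.97, 0.0020),  # downregulated probesets pass via abs()
    "ps_D": (0.60, 0.0100),
    "ps_E": (-0.50, 0.0003),
}
print(select_top_probesets(toy_stats, top_x=2))
```

In the pipeline described above, this selection would be repeated for each of the 100 training sets and for each value of x, so that no information from the corresponding test set enters the ranking.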

Chemotherapy response

Between January 2010 and May 2014, 114 patients were enrolled (Fig. 1A and B; Supplementary Table S1C) of whom 107 patients, with available response data after completing therapy, underwent surgery. pCR was observed for 23.4%, partial response (pPR) for 49.5%, and no change (pNC) for 27.1%. The pCR rate was significantly higher in ER− than ER+ patients [15/37 (40.5%) vs. 10/70 patients (14.3%), P = 0.004; Fisher exact test], PR− than PR+ patients [17/50 (34.0%) vs. 8/57 (14.0%), P = 0.021] and in patients with high-grade tumors [17/50 (34.0%) vs. 8/55 patients (14.5%), P = 0.023; Supplementary Table S1E]. Only ER status was significantly associated with pCR in the multivariate analysis, including ER status, HER2 status, menopausal status, age, tumor size, and tumor grade [OR 0.265; 95% confidence interval (CI), 0.088–0.794; P = 0.018; Supplementary Table S1F].
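For readers who want to retrace the Fisher exact test on the ER comparison (15 of 37 vs. 10 of 70 patients with pCR), a minimal standard-library sketch follows. It assumes the common two-sided definition that sums the probabilities of all tables that are no more likely than the observed one; the function name is hypothetical, and the study itself used R.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum all tables that are at most as probable as the observed table
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Rows: first group 15 pCR / 22 non-pCR, second group 10 pCR / 60 non-pCR.
p_er = fisher_exact_two_sided(15, 22, 10, 60)
print(f"P = {p_er:.4f}")
```

The result should be of the same order as the P = 0.004 reported above; small differences are possible if a different two-sided convention is used.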

Gene expression profiling

Treatment response data after surgery and Affymetrix gene expression data from pretreatment CNB were available for 91 patients (Supplementary Table S1C). No obvious effect related to gene array batch, study center, or RNA input amount was observed (Supplementary Fig. S1B). In total, 7,233 of 54,675 probesets were differentially expressed (P < 0.05) when patients with pCR were compared with the other response groups combined (Fig. 1C). However, only 35 probesets corresponding to 26 unique genes with an FDR-adjusted P < 0.05 were identified. Of the 35, 27 probesets (21 genes) were upregulated and eight probesets (five genes) were downregulated in the patients who achieved a pCR (Supplementary Table S2A). Hierarchical clustering, including the top-100 probesets with |log2(FC)| > log2(1.5) ranked according to P value, revealed two main clusters (Fig. 1D), with the majority of the pathologic complete responders located in the smaller of the two. In contrast, comparing all responders (complete and partial) with patients for whom no change was observed did not identify any genes with a significant FDR-adjusted P value (Supplementary Table S2B) and was not pursued further. While one can argue that a better endpoint for a classifier would be to identify patients with progressive disease who did not benefit at all from the given therapy, this was not possible in this study because only 3 patients had progressive disease after the first three treatment cycles and were then switched to an alternative therapy.

The genes with significantly higher expression in the pathologic complete responders (FDR-adjusted P < 0.05) were primarily related to immune response and inflammation, in agreement with previous publications (23, 24), including immunoglobulins (IGHM, IGLC1), IL2 and IL6 receptors (IL2RA, IL6R), programmed cell death 1 ligand 2 (PDCD1LG2), guanylate binding protein 1 and 5 (GBP1, GBP5), and indoleamine 2,3-dioxygenase 1 (IDO). Gene ontology (GO) enrichment analysis of genes that were upregulated in patients with pCR [unadjusted P < 0.01; log2(FC) > log2(1.5); 333 probesets] accordingly revealed a strong enrichment of immune-related GOs, including “chemokine-mediated signaling pathway,” “adaptive immune response,” and “innate immune response,” as well as others, such as “G1–S transition of mitotic cell cycle,” and “apoptotic process” (Supplementary Table S2C). The identification of proliferation-related transcripts among the genes that are upregulated in complete responders is in agreement with the identification of tumor grade as a significant factor for pCR and with previous literature (25). Significantly higher gene expression (FDR-adjusted P < 0.05) in the patients who did not achieve a pCR was observed for the nonvoltage-gated sodium channel encoded by SCNN1A, the thiamine transporter encoded by SLC19A2, the adaptor protein SH2B1, microtubule-associated protein 9 (MAP9), and nonprotein-coding RNA gene negative regulator of antiviral response (NRAV; Supplementary Table S2D). Only one significant GO (calcium-dependent phospholipid binding) was identified after P-value adjustment for the genes downregulated in patients with pCR.

Prediction of pCR in the EXPRESSION cohort

We next aimed to construct a gene expression–based classifier for prediction of pCR to NACT. For comparison, a classifier was built using only clinical variables as input, and the probesets of the best-performing gene expression–based classifier were also combined with the clinical variables (work pipeline: Supplementary Fig. S1A).

First, the patients in the EXPRESSION trial cohort were randomly assigned to training and test sets, stratified according to the response variable. The Affymetrix data were initially restricted to the 22,277 probesets that are present on both the HG U133 A and Plus 2.0 chips. This was necessary to later enable the validation of the classifier performance in the external datasets that were all analyzed using Affymetrix HG U133 A. Probesets for classifier construction were selected on the basis of ranking all probesets with |log2(FC)| > log2(1.5) according to the P value for differential expression between complete responders (pCR) and patients who did not achieve a pCR (pPR+pNC) in the training set, followed by defining variable sets that included different numbers of top-ranked probesets, ranging from top-2 to top-100. Using these different variable sets as input, classifiers were built using five algorithms (LDA, LogReg ℓ1, LogReg ℓ2, SVM, RF). The performance of all classifier variants was first assessed in the training sets using cross-validation, and subsequently in the test sets. This procedure was repeated 100 times with different training-test splits. As expected, the mean performance of all classifiers across the 100 iterations was better in the training sets compared with the test sets, and with lower variance (Supplementary Fig. S1C).
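The repeated, response-stratified 80/20 splitting can be sketched in a few lines. This is only an illustration of the idea using the Python standard library; the function name, the rounding of the per-class test size, and the use of seeds 0 to 99 are assumptions, and the class sizes are taken from the legend of Table 1.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with the pCR proportion kept in both."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in (True, False):
        # Shuffle each response class separately, then cut off the test part.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# True = pCR, False = no pCR (23 and 68 patients per the Table 1 legend).
labels = [True] * 23 + [False] * 68
# 100 repetitions with different random splits, as in the pipeline.
splits = [stratified_split(labels, seed=s) for s in range(100)]
train0, test0 = splits[0]
print(len(train0), len(test0), sum(labels[i] for i in test0))
```

Stratifying per class keeps the share of complete responders nearly identical in every training and test set, which matters here because the pCR class is small.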

To attain a single best-performing gene expression–based classifier, we selected the algorithm and the number of probesets based on the mean test AUC. The highest mean test AUC was obtained using ℓ2-regularized logistic regression (LogReg ℓ2; Fig. 2A). The number of probesets finally used for the classifier construction was the number that achieved the highest possible mean AUC in combination with a low number of probesets; the aim of including as few genes as possible was motivated by the future need to transfer the analysis to a technology other than Affymetrix arrays. Between 2 and 5 probesets resulted in lower AUCs compared with 10 to 100 probesets (Fig. 2B). No significant increase in the mean AUC was achieved for probeset numbers higher than 10, as illustrated by the overlapping 95% CIs. We continued to build the classifier with 20 probesets [Table 1 (25–42)].

Figure 2.

Classifier construction in the EXPRESSION cohort. Selection of algorithm (A) and number of probesets (B) to be used for classifier construction based on the mean test AUC of the EXPRESSION study. C, For the gene expression classifier (top-20 probeset, LogReg ℓ2), the cut-off point that provided the best balance between sensitivity and specificity (same weight) was determined in the EXPRESSION cohort from the ROC curve based on Youden statistic. D, Cut-off point determination in the EXPRESSION cohort based on Youden statistic for the classifier constructed only on the basis of the clinical variables. E, Cut-off point determination in the EXPRESSION cohort based on Youden statistic for the classifier constructed using a combination of the clinicopathologic variables and the top-20 probesets.

Table 1.

Probesets included in the top-20 classifier, FC, log2FC, mean and median expression log2 values, and interquartile range (IQR), for patients with pCR (n = 23) and no pCR (n = 68).

| Probeset | βᵃ | Gene | Categoryᵇ | FCᶜ | log2FC | Mean no pCR | Mean pCR | Median no pCR | Median pCR | IQR no pCR | IQR pCR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 202270_at | 0.102 | GBP1 | Immune (39) | 2.469 | 1.304 | 8.234 | 9.538 | 8.160 | 9.585 | 1.533 | 0.596 |
| 207781_s_at | 0.261 | ZNF711 | TF-development (41) | 1.999 | 1.000 | 4.122 | 5.122 | 3.885 | 5.212 | 1.116 | 1.527 |
| 211637_x_at | 0.096 | NAᵈ | | 3.224 | 1.689 | 7.398 | 9.087 | 7.364 | 9.049 | 1.889 | 2.044 |
| 209681_at | −0.223 | SLC19A2 | Carrier/channel (29) | 0.512 | −0.966 | 9.232 | 8.265 | 9.046 | 8.392 | 1.287 | 1.131 |
| 216237_s_at | 0.190 | MCM5 | Proliferation (38) | 1.511 | 0.595 | 7.999 | 8.594 | 7.946 | 8.694 | 0.474 | 0.711 |
| 216491_x_at | 0.063 | IGHM | Immune (34) | 3.889 | 1.960 | 7.058 | 9.017 | 7.103 | 8.880 | 2.715 | 2.650 |
| 206140_at | 0.475 | LHX2 | TF-development (30) | 1.524 | 0.608 | 3.882 | 4.490 | 3.755 | 4.257 | 0.405 | 1.184 |
| 210029_at | 0.044 | IDO1 | Immune (45) | 2.396 | 1.260 | 6.603 | 7.864 | 6.347 | 7.851 | 1.736 | 1.527 |
| 217236_x_at | 0.103 | NAᵈ | | 1.676 | 0.745 | 5.212 | 5.958 | 5.090 | 5.559 | 0.734 | 1.126 |
| 203453_at | −0.053 | SCNN1A | Carrier/channel (35) | 0.378 | −1.405 | 8.836 | 7.432 | 9.141 | 7.636 | 1.775 | 2.221 |
| 206341_at | 0.163 | IL2RA | Immune (33) | 1.500 | 0.585 | 4.877 | 5.462 | 4.844 | 5.398 | 0.756 | 0.719 |
| 202901_x_at | 0.129 | CTSS | Immune (36) | 1.992 | 0.994 | 6.858 | 7.852 | 6.836 | 7.925 | 1.136 | 1.095 |
| 219740_at | 0.141 | VASH2 | Angiogenesis; microtubule (46) (37) | 2.095 | 1.067 | 5.527 | 6.594 | 5.249 | 6.507 | 1.233 | 1.829 |
| 215207_x_at | 0.104 | NAᵈ | | 1.520 | 0.604 | 6.696 | 7.300 | 6.634 | 7.348 | 0.891 | 0.776 |
| 220145_at | −0.200 | MAP9 | Microtubule (40) | 0.551 | −0.860 | 7.413 | 6.553 | 7.530 | 6.613 | 1.004 | 1.568 |
| 202269_x_at | 0.003 | GBP1 | Immune (39) | 2.201 | 1.138 | 8.489 | 9.627 | 8.487 | 9.774 | 1.489 | 0.889 |
| 219385_at | 0.060 | SLAMF8 | Immune (32) | 1.598 | 0.676 | 6.303 | 6.979 | 6.201 | 7.127 | 0.882 | 0.830 |
| 206693_at | 0.444 | IL7 | Immune (43) | 1.546 | 0.629 | 4.197 | 4.826 | 4.114 | 4.703 | 0.980 | 1.003 |
| 210880_s_at | 0.476 | EFS | Immune (31) | 1.504 | 0.589 | 6.654 | 7.243 | 6.749 | 7.259 | 0.765 | 0.668 |
| 218875_s_at | 0.117 | FBXO | Proliferation (44) | 1.610 | 0.687 | 6.722 | 7.409 | 6.606 | 7.451 | 0.828 | 0.915 |

ᵃEstimated regression coefficients by logistic regression with ℓ2 penalty.

ᵇCategory to which the genes were manually assigned on the basis of literature: Immune, immune system associated, including cytokine-responsive factors; TF-development, transcription factors that play a role in development; carrier/channel, carriers or channels of the cell membrane; proliferation, proliferation-associated gene including cell-cycle control factors; regulation.

ᶜFC > 1 denotes higher expression in patients with pCR compared with non-pCR and FC < 1 lower expression in patients with pCR compared with non-pCR.

ᵈNA, not available; probeset does not map uniquely to one gene (Bioconductor packages hgu133plus2.db version 3.2.3 and AnnotationDbi version 1.40.0).

The top-20 classifier was next applied to the complete EXPRESSION cohort to determine the optimal cut-off point by inspection of the corresponding ROC curve. The cut-off point (0.532), which corresponds to a predicted probability, was determined on the basis of Youden statistic to provide the best balance between sensitivity and specificity (Fig. 2C). Of the 23 patients in the EXPRESSION cohort who achieved a pCR, 18 were classified as complete responders by the top-20 classifier, corresponding to a sensitivity of 0.783; of the 66 patients who did not achieve a pCR, 64 were also classified as such, giving a specificity of 0.970 (Table 2). Of the 20 patients predicted to have a complete response, 18 did indeed achieve a pCR (PPV: 0.900); and, conversely, of the 69 patients predicted to not have a complete response, 64 did not achieve a pCR (NPV: 0.928).
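The performance values in this paragraph follow directly from the four cells of the confusion matrix (18 true positives, 5 false negatives, 2 false positives, 64 true negatives). The short sketch below retraces that arithmetic; the function name is illustrative only.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard performance measures from the four confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # pCR cases correctly predicted
        "specificity": tn / (tn + fp),   # non-pCR cases correctly predicted
        "ppv": tp / (tp + fp),           # predicted pCR that achieved pCR
        "npv": tn / (tn + fn),           # predicted non-pCR without pCR
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts stated in the text for the EXPRESSION cohort (top-20 classifier).
metrics = diagnostic_metrics(tp=18, fn=5, fp=2, tn=64)
rounded = {name: round(value, 3) for name, value in metrics.items()}
print(rounded)
# {'sensitivity': 0.783, 'specificity': 0.97, 'ppv': 0.9, 'npv': 0.928, 'accuracy': 0.921}
```

The resulting accuracy of 0.921 matches the EXPRESSION entry for the top-20 classifier in Table 2.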

Table 2.

Classifier performance (accuracy, AUC, NPV, PPV, sensitivity, specificity) in the EXPRESSION cohort, all external validation cohorts combined, and each external validation cohort separately, for the top-20 gene expression classifier (Top-20), the classifier constructed on the basis of clinical variables (Clinical), and the classifier constructed using a combination of the clinical variables and the top-20 classifier probesets (Combination).

| Metric | Cohort | Top-20 | Clinical | Combination | Random |
|---|---|---|---|---|---|
| Accuracy | EXPRESSION | 0.921 | 0.697 | 0.910 | 0.617 |
| | Validation combined | 0.811 | 0.695 | 0.809 | 0.678 |
| | GSE20194 | 0.816 | 0.724 | 0.801 | 0.676 |
| | GSE20271 | 0.878 | 0.682 | 0.851 | 0.786 |
| | GSE22093 | 0.692 | 0.551 | 0.667 | 0.564 |
| | GSE23988 | 0.649 | 0.614 | 0.632 | 0.568 |
| | DUS | 0.920 | 0.827 | 0.907 | 0.750 |
| AUC | EXPRESSION | 0.928 | 0.676 | 0.920 | 0.500 |
| | Validation combined | 0.768 | 0.768 | 0.774 | 0.500 |
| | GSE20194 | 0.746 | 0.783 | 0.750 | 0.500 |
| | GSE20271 | 0.839 | 0.807 | 0.847 | 0.500 |
| | GSE22093 | 0.698 | 0.641 | 0.700 | 0.500 |
| | GSE23988 | 0.674 | 0.686 | 0.702 | 0.500 |
| | DUS | 0.932 | 0.873 | 0.940 | 0.500 |
| NPV | EXPRESSION | 0.928 | 0.831 | 0.939 | 0.742 |
| | Validation combined | 0.829 | 0.909 | 0.849 | 0.798 |
| | GSE20194 | 0.831 | 0.925 | 0.845 | 0.797 |
| | GSE20271 | 0.900 | 0.966 | 0.915 | 0.878 |
| | GSE22093 | 0.699 | 0.750 | 0.708 | 0.679 |
| | GSE23988 | 0.679 | 0.774 | 0.680 | 0.684 |
| | DUS | 0.926 | 0.947 | 0.938 | 0.853 |
| PPV | EXPRESSION | 0.900 | 0.433 | 0.826 | 0.258 |
| | Validation combined | 0.587 | 0.370 | 0.543 | 0.202 |
| | GSE20194 | 0.632 | 0.406 | 0.514 | 0.203 |
| | GSE20271 | 0.500 | 0.254 | 0.389 | 0.122 |
| | GSE22093 | 0.600 | 0.381 | 0.462 | 0.321 |
| | GSE23988 | 0.250 | 0.423 | 0.286 | 0.316 |
| | DUS | 0.857 | 0.444 | 0.700 | 0.147 |
| Sensitivity | EXPRESSION | 0.783 | 0.565 | 0.826 | 0.258 |
| | Validation combined | 0.216 | 0.728 | 0.352 | 0.202 |
| | GSE20194 | 0.226 | 0.774 | 0.340 | 0.203 |
| | GSE20271 | 0.222 | 0.833 | 0.389 | 0.122 |
| | GSE22093 | 0.120 | 0.640 | 0.240 | 0.321 |
| | GSE23988 | 0.056 | 0.611 | 0.111 | 0.316 |
| | DUS | 0.545 | 0.727 | 0.636 | 0.147 |
| Specificity | EXPRESSION | 0.970 | 0.742 | 0.939 | 0.742 |
| | Validation combined | 0.962 | 0.686 | 0.925 | 0.798 |
| | GSE20194 | 0.966 | 0.712 | 0.918 | 0.797 |
| | GSE20271 | 0.969 | 0.662 | 0.915 | 0.878 |
| | GSE22093 | 0.962 | 0.509 | 0.868 | 0.679 |
| | GSE23988 | 0.923 | 0.615 | 0.872 | 0.684 |
| | DUS | 0.984 | 0.844 | 0.953 | 0.853 |

Note: pCR was predicted by the Youden statistic–based cut-off point. “Random” indicates the performance measures obtained by randomly assigning pCR status to the number of patients who would be expected to achieve a pCR based on the anticipated pCR rate, computed as described in Supplementary Materials and Methods.

Next, we built a classifier based only on four clinicopathologic parameters: age, tumor grade, ER status, and HER2 status. The best-performing algorithm (LogReg ℓ2) was selected by the highest mean test AUC, and the cut-off point (0.271) was determined on the basis of the Youden statistic (Fig. 2D; Supplementary Fig. S1D and S1E). Compared with the top-20 gene expression–based classifier, its performance in the EXPRESSION cohort was worse for every measure (Table 2).

Finally, a classifier was built that incorporated both the top-20 probesets and the clinical variables. After selection of the cut-off point based on the Youden statistic (Fig. 2E), this classifier displayed slightly better sensitivity (0.826) and NPV (0.939), but worse PPV (0.826), than the top-20 classifier without clinical variables (Table 2; Supplementary Fig. S1D and S1E). Results for the subgroup of ER+/HER2− patients were similar to those for the entire EXPRESSION cohort (Supplementary Table S3).
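The Youden statistic–based choice of cut-off point used for the classifiers can be illustrated with a minimal sketch. It is not the study's code: the function name `youden_cutoff`, the convention that a patient is classified as pCR when the predicted probability is at or above the cut-off, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def youden_cutoff(y_true, p_pred):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    y_true: 1 = pCR, 0 = no pCR; p_pred: predicted probability of pCR.
    Candidate cut-offs are the observed predicted probabilities; a patient is
    classified as pCR when p_pred >= cutoff (an assumed convention).
    """
    y_true = np.asarray(y_true, dtype=bool)
    p_pred = np.asarray(p_pred, dtype=float)
    best_cutoff, best_j = None, -np.inf
    for c in np.unique(p_pred):
        pred = p_pred >= c
        sensitivity = np.sum(pred & y_true) / np.sum(y_true)
        specificity = np.sum(~pred & ~y_true) / np.sum(~y_true)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # keep the lowest cut-off among ties
            best_cutoff, best_j = float(c), float(j)
    return best_cutoff, best_j

# Hypothetical toy data, not taken from the study.
y = [0, 0, 0, 0, 1, 0, 1, 1]
p = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
cutoff, j = youden_cutoff(y, p)
print(f"cut-off = {cutoff}, J = {j:.3f}")
```

Because J weights sensitivity and specificity equally, the selected cut-off does not specifically favor a high NPV or a high PPV.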

Classifier performance in external cohorts

Because a large number of probesets were available for selection when optimizing the classifier on the EXPRESSION dataset, the resulting sensitivity, specificity, PPV, and NPV may be overestimated because of overfitting. To evaluate the generalizability of the performance, the three classifiers were next applied to five external validation cohorts for which Affymetrix gene array data and information on NACT response were available (Supplementary Table S1D). Only datasets in which patients received NACT similar to that in this study were analyzed, that is, a combination treatment including a taxane, an anthracycline, fluorouracil (or capecitabine), and/or cyclophosphamide.

A general drop in classifier performance was observed in the external cohorts for most of the performance metrics (Table 2). The highest AUC (0.774) in the combined external cohort was achieved by the classifier that combined the expression values of the top-20 probesets with the clinical variables, but overall there was no large difference in AUC between the three classifiers (range: 0.768–0.774). The highest sensitivity (0.728; 91/125 true complete responders were classified as complete responders) and NPV (0.909; 339/373 patients classified as not having a pCR did indeed not have a pCR) in the combined external cohorts were achieved by the classifier that was constructed only on the basis of clinical variables, while using the top-20 probesets delivered the highest specificity (0.962; 475/494 patients who did not achieve a pCR were also classified as not having a complete response) and PPV (0.587; 27/46 patients classified as complete responders were true complete responders). While a PPV of 0.587 appears low when considering absolute numbers, this still corresponds to a substantial increase compared with “random guessing” which would result in a PPV of 0.202 (Table 2, column “Random”). For instance, anticipating that approximately 25% of the patients will achieve a pCR, based on information from previous, similar studies, randomly assigning complete responder status to every fourth patient would yield a PPV of 0.25.
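The performance measures quoted above for the top-20 classifier in the combined external cohorts follow directly from a 2×2 confusion matrix. The sketch below rebuilds that matrix from the fractions given in the text (27/46, 475/494, and 125 complete responders among 619 patients); the function name `metrics` is an assumption. It also shows why the prevalence is the natural baseline for PPV: if pCR status is assigned at random, the expected PPV equals the pCR rate, which is consistent with the "Random" value of 0.202, although the exact procedure is the one described in the Supplementary Materials and Methods.

```python
def metrics(tp, fp, tn, fn):
    """Standard performance measures from a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Expected PPV when pCR status is assigned at random.
        "prevalence": (tp + fn) / n,
    }

# Counts derived from the text: 27 of 46 predicted responders were true
# responders, 475 of 494 non-responders were classified correctly, and
# 125 of the 619 patients achieved a pCR.
top20 = metrics(tp=27, fp=46 - 27, tn=475, fn=125 - 27)

for name, value in top20.items():
    print(f"{name}: {value:.3f}")
```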

Classification with extreme cut-off points

Our classifier predicts whether or not there will be a complete response to treatment; that is, each patient is assigned to one of two possible classes. With logistic regression, the classifier predicts the probability of pCR, a continuous variable between 0 and 1. A cut-off point is then defined: patients above it are classified as having a complete response (pCR) and patients below it as failing to have a complete response (pPR+pNC). The cut-off point can, for instance, be determined on the basis of the Youden statistic, as was done above; this statistic measures classifier performance when equal weight is given to sensitivity and specificity. To obtain a high NPV or PPV, we instead applied cut-off points that were lower or higher than the Youden statistic–based cut-off point (Fig. 3). For the top-20 classifier, good agreement was observed between the predicted probabilities of pCR and the observed pCR rates in the EXPRESSION cohort, including in the intervals with the lowest and highest predicted probabilities (Fig. 3A). Therefore, the cut-off point for the top-20 classifier was defined on the basis of the complete responder with the lowest predicted probability of pCR in the EXPRESSION cohort (0.060). When applied to the external validation cohorts, this led to an NPV of 0.960: of the 126 patients who were predicted to not have a complete response, 121 did indeed not achieve a pCR (Fig. 3B; Table 3A). These patients could potentially be spared the toxicity of a not optimally effective chemotherapy, and other therapeutic strategies could be considered. A disadvantage of the extreme-low cut-off point is that a reliable recommendation to not treat with anthracycline/taxane-based NACT would be possible for only approximately one-fifth of all unselected patients (20.2%), whereas no reliable statement about pCR would be possible for the remaining patients (PPV 0.243). Using the same approach for the classifier combining gene expression and clinical factors led to similar results (Supplementary Fig. S1F). The approach was not used for the classifier including only clinical variables, because that classifier included only dichotomized categorical variables. Patients who did not achieve a pCR and were identified by the extreme-low cut-off point (predicted probability of response in the range 0.000–0.060) differed from those who were identified by the Youden statistic–based cut-off point but remained above the extreme-low cut-off point (0.060–0.532), primarily by lower expression of genes related to the immune response (Supplementary Table S4A) and higher expression of estrogen receptor (ESR1) and of genes positively correlating with ESR1 (Supplementary Table S4B).
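The extreme cut-off procedure can be sketched as follows: the cut-off points are fixed in the training cohort and only then evaluated in an independent cohort. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of a strict "below the cut-off" rule, and the toy cohorts are hypothetical.

```python
import numpy as np

def extreme_cutoffs(y_train, p_train):
    """Derive extreme cut-off points from the training cohort.

    low  = lowest predicted probability among complete responders
    high = highest predicted probability among non-responders
    """
    y_train = np.asarray(y_train, dtype=bool)
    p_train = np.asarray(p_train, dtype=float)
    return float(p_train[y_train].min()), float(p_train[~y_train].max())

def npv_below(y, p, low):
    """NPV and patient count for predicted probabilities strictly below `low`.

    Assumes at least one patient falls below the cut-off.
    """
    y = np.asarray(y, dtype=bool)
    below = np.asarray(p, dtype=float) < low
    return float(np.sum(below & ~y) / np.sum(below)), int(np.sum(below))

# Hypothetical training cohort (1 = pCR, 0 = no pCR).
y_train = [0, 0, 0, 1, 0, 1, 0, 1, 1]
p_train = [0.01, 0.03, 0.05, 0.06, 0.30, 0.55, 0.73, 0.80, 0.95]
low, high = extreme_cutoffs(y_train, p_train)

# Hypothetical validation cohort, scored with the same fitted classifier.
y_val = [0, 0, 1, 0, 0, 1, 1, 0]
p_val = [0.02, 0.04, 0.05, 0.055, 0.20, 0.50, 0.90, 0.75]
npv, n_below = npv_below(y_val, p_val, low)
print(f"low = {low}, high = {high}, NPV = {npv:.3f} in {n_below} patients")
```

By construction the NPV below the low cut-off is 1.0 in the training cohort, so only the value obtained in an independent cohort is informative.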

Figure 3.

Classifier calibration and identification of patient subgroups with high NPV and PPV by cut-off point selection based on extreme-low or -high predicted probability of pCR. A, Calibration plots for the top-20 classifier for the EXPRESSION cohort (top; 89 patients) and the combined external cohorts (bottom, 619 patients), allowing a visual comparison of predicted probabilities and empirical responses in equally spaced intervals. For each interval, the mean predicted probability of pCR is used as x-coordinate, while the corresponding y-coordinate is the proportion of true complete responders in this interval. Red and blue tick marks: patients with pCR and no pCR. The diagonal line represents the perfect agreement between predicted probabilities of pCR and the proportion of true complete responders. To enable a direct comparison with the classifiers that include clinical variables, only the patients for whom clinical variables were available were also used for the top-20 classifier based on gene expression only. B, The cut-off point for classification for the top-20 classifier was defined on the basis of the complete responder with the lowest predicted probability of pCR in the EXPRESSION cohort (dotted line; 0.06), or the patient who did not have a complete response with the highest probability of pCR (dotted line; 0.73; top) and applied to the external cohorts (bottom). The dotted line in the middle (0.532) indicates the cut-off point determined by Youden statistic. Red and blue dots represent patients with pCR and no pCR (pPR+pNC), respectively.
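The coordinates of a calibration plot as described in panel A can be computed as in the sketch below. It is an illustration only: the number of intervals (`n_bins=5`), the function name, and the toy data are assumptions, since the caption does not state how many intervals were used.

```python
import numpy as np

def calibration_points(y_true, p_pred, n_bins=5):
    """Per equally spaced interval: (mean predicted probability,
    observed pCR rate, number of patients). Empty intervals are skipped."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    # Interior edges only, so np.digitize returns indices 0 .. n_bins - 1.
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bin_index = np.digitize(p_pred, inner_edges)
    points = []
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            points.append((float(p_pred[in_bin].mean()),
                           float(y_true[in_bin].mean()),
                           int(in_bin.sum())))
    return points

# Hypothetical toy data (1 = pCR, 0 = no pCR).
y = [0, 0, 0, 0, 1, 1, 1, 1]
p = [0.05, 0.10, 0.15, 0.30, 0.35, 0.70, 0.90, 0.95]
points = calibration_points(y, p)
for mean_pred, observed, n in points:
    print(f"x = {mean_pred:.3f}, y = {observed:.3f}, n = {n}")
```

Points close to the diagonal (x equal to y) indicate good agreement between predicted probabilities and observed pCR rates.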

Table 3.

(A) Performance measures for the top-20 classifier with the Youden statistic–based (0.532), extreme-low (0.060), and extreme-high (0.730) cut-off points in the EXPRESSION cohort and the combined external validation cohorts. (B) NPV for the top-20 classifier with a cut-off point of 0.060 in the four intrinsic subtypes in the combined external cohorts.

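(A)

| Measure | Cohort | Cut-off 0.060 | Cut-off 0.532 | Cut-off 0.730 |
| --- | --- | --- | --- | --- |
| Accuracy | EXPRESSION | 0.573 | 0.921 | 0.809 |
| Accuracy | Combined validation | 0.389 | 0.811 | 0.816 |
| AUC^a | EXPRESSION | 0.928 | 0.928 | 0.928 |
| AUC^a | Combined validation | 0.768 | 0.768 | 0.768 |
| NPV | EXPRESSION | 1.000 | 0.928 | 0.795 |
| NPV | Combined validation | 0.960 | 0.829 | 0.815 |
| PPV | EXPRESSION | 0.377 | 0.900 | 1.000 |
| PPV | Combined validation | 0.243 | 0.587 | 0.867 |
| Sensitivity | EXPRESSION | 1.000 | 0.783 | 0.261 |
| Sensitivity | Combined validation | 0.960 | 0.216 | 0.104 |
| Specificity | EXPRESSION | 0.424 | 0.970 | 1.000 |
| Specificity | Combined validation | 0.245 | 0.962 | 0.996 |

(B)

| | Luminal A | Luminal B | HER2+ | ER−/HER2− |
| --- | --- | --- | --- | --- |
| NPV | 0.986 | 0.949 | 0.714 | 1.000 |
| Random | 0.959 | 0.891 | 0.667 | 0.646 |
| n total | 194 | 128 | 105 | 192 |
| n pCR | 8 | 14 | 35 | 68 |
| n <0.060 | 73 (37.6%) | 39 (30.5%) | 7 (6.7%) | 7 (3.6%) |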

Note: “Random” indicates the performance measures obtained by randomly assigning pCR status to the number of patients who would be expected to achieve a pCR based on the anticipated pCR rate, computed as described in Supplementary Materials and Methods. n total, total number of patients in each subgroup; n pCR, number of patients with pathologic complete response in each subgroup; n <0.060, number of patients with a predicted probability of pCR below 0.060.

^a AUC does not depend on the cut-off point; identical values are therefore obtained for all three cut-off points.
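That the AUC is independent of the cut-off point can be seen from its rank-based definition, sketched below: it is the probability that a randomly chosen complete responder receives a higher predicted probability than a randomly chosen non-responder. The function name `auc_rank` and the toy data are assumptions for illustration; no cut-off appears anywhere in the calculation.

```python
import numpy as np

def auc_rank(y_true, p_pred):
    """AUC as the fraction of responder/non-responder pairs in which the
    responder has the higher predicted probability (ties count 0.5)."""
    y_true = np.asarray(y_true, dtype=bool)
    p_pred = np.asarray(p_pred, dtype=float)
    pos, neg = p_pred[y_true], p_pred[~y_true]
    diff = pos[:, None] - neg[None, :]  # all pairwise differences
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))

# Hypothetical toy data (1 = pCR, 0 = no pCR).
y = [0, 0, 1, 0, 1, 1]
p = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]
auc = auc_rank(y, p)
print(f"AUC = {auc:.3f}")
```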

Correspondingly, the extreme-high cut-off point was defined on the basis of the patient without a pCR who had the highest predicted probability of pCR in the EXPRESSION cohort (0.730; Fig. 3B) and was then applied to the external validation cohorts. For the top-20 classifier, this yielded a PPV of 0.867 (Table 3A), corresponding to 13 true complete responders among the 15 patients predicted to achieve a pCR. A critical limitation is that such a high-confidence prediction of complete response would be possible for only approximately 2% of all patients.

Extreme cut-off points in the intrinsic breast cancer subtypes

The NPV of the top-20 classifier with the extreme-low cut-off point (<0.060) was further analyzed in the intrinsic breast cancer subtypes of the combined external cohorts (Table 3B). Very high NPVs of 0.986, 0.949, and 1.000 were obtained for the luminal A, luminal B, and ER−/HER2− subtypes, respectively, while the corresponding value for HER2+ carcinomas was lower (0.714). The percentage of patients identified by the classifier as not achieving a pCR differed strongly between the subtypes: 37.6% and 30.5% in luminal A and luminal B carcinomas, respectively, versus 6.7% in HER2+ and 3.6% in ER−/HER2− carcinomas.
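The subtype figures can be cross-checked against the overall result with a short calculation. The sketch below uses the patient numbers and NPVs from Table 3B; the per-subtype counts of true non-responders are not reported in the table and are inferred here by rounding NPV × n, which is an assumption. The variable and function names are likewise illustrative.

```python
# Values from Table 3B (combined external cohorts, cut-off point 0.060).
subtypes = {
    "Luminal A": {"n_total": 194, "n_below": 73, "npv": 0.986},
    "Luminal B": {"n_total": 128, "n_below": 39, "npv": 0.949},
    "HER2+": {"n_total": 105, "n_below": 7, "npv": 0.714},
    "ER-/HER2-": {"n_total": 192, "n_below": 7, "npv": 1.000},
}

def summarize(groups):
    """Fraction of patients below the cut-off and inferred error counts."""
    rows = {}
    for name, g in groups.items():
        # Inferred, not reported: true non-responders below the cut-off.
        true_neg = round(g["npv"] * g["n_below"])
        rows[name] = {
            "fraction_below": g["n_below"] / g["n_total"],
            "true_neg": true_neg,
            "false_neg": g["n_below"] - true_neg,
        }
    return rows

rows = summarize(subtypes)
total_true_neg = sum(r["true_neg"] for r in rows.values())
total_false_neg = sum(r["false_neg"] for r in rows.values())

# Pooled luminal A and luminal B carcinomas.
luminal_below = subtypes["Luminal A"]["n_below"] + subtypes["Luminal B"]["n_below"]
luminal_total = subtypes["Luminal A"]["n_total"] + subtypes["Luminal B"]["n_total"]
luminal_fraction = luminal_below / luminal_total

print(total_true_neg, total_false_neg, f"{luminal_fraction:.1%}")
```

Under this inference, the subtype counts add up to the 121 correctly identified patients and 5 misclassified complete responders reported for the combined external cohorts, and the pooled luminal fraction corresponds to 112 of 322 patients.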

Prognostic gene signatures have already demonstrated a high level of evidence (LoE) and clinical utility to guide adjuvant systemic therapy decisions in certain subgroups of early breast cancer (43). However, the situation is less convincing for predicting response to neoadjuvant anthracycline- and taxane-based chemotherapy. A problem with NACT in breast cancer is the overtreatment of a relatively large proportion of patients, because all are at risk for chemotherapy-related toxicities, but only a minority achieves a pCR. Because NACT is nevertheless the standard of care for patients with locally advanced breast cancer and increasingly used in smaller tumors, it may be of particular interest to identify patients who do not achieve a pCR but are still exposed to the side effects of the chemotherapy and for whom alternative therapeutic strategies, for example, inhibitors of cyclin-dependent kinase 4/6, PARP, or immune checkpoints, could be considered. This requires a classifier with a high NPV, that is, where a high proportion of the patients predicted to not have a complete response indeed do not achieve a pCR, so that a therapeutic approach which may benefit a patient will not be withheld.

So far, gene expression signatures lack a high LoE for predicting the therapeutic effect of NACT. Several published classifiers for predicting pCR after NACT reported NPVs above 0.90 (Supplementary Table S1A), but most were not validated in several external cohorts, and the earliest studies in particular were based on very small training and validation sets. The DLDA30 classifier (13) was independently validated in two further studies (14, 19) and can therefore be considered the most extensively validated signature for NACT response in breast cancer. There was no overlap of genes/probesets between the 30 genes included in DLDA30 and the top-20 classifier presented in this article (Supplementary Fig. S1G). Nevertheless, 20 probesets (67%) included in DLDA30 do show a significant difference between pCR and non-pCR in our cohort (unadjusted P values). It should be noted that no immune-related genes are included in DLDA30, whereas immune-related genes dominate our top-20 classifier, which contributes strongly to the differences between the two signatures. The overlap between our top-20 classifier and IRSN23 (18), which includes only immune-related genes, and MPCP155 (21), which focuses on metabolic genes, was restricted to one gene for each classifier (Supplementary Fig. S1G). This lack of extensive overlap may at first sight appear discouraging, but it is unlikely that a single gene set exists that performs better than all others, as further explored in Supplementary Fig. S1H. The reported NPV for DLDA30 was 0.96 in the original study (13), that is, similar to that in our study, but decreased to 0.88, 0.92, and 0.91 when the classifier was applied to further validation cohorts (14, 19). One study reported an NPV higher than that in this study: the MPCP155 signature achieved an NPV of 0.97 in a validation cohort of 259 ER+/HER2− patients (21). MPCP155 predicted 144 patients to be highly chemosensitive, when in reality only 22 patients had a complete response to therapy, and 115 patients to have low chemosensitivity, of whom 112 did not achieve a pCR. These numbers indicate that their classifier may work similarly to ours with the extreme-low cut-off point; that is, it reliably identifies patients who do not achieve a pCR, but at the cost of not giving a reliable prediction of pCR.

A strength of this study is that the identification of a predictive classifier based on gene expression in pretreatment biopsies was the primary endpoint of a prospective multicenter study in which the patients were treated with anthracycline- and taxane-based NACT. Furthermore, we tested the performance of our classifier in different molecular subtypes of the validation cohorts. One important limitation is that the study had to be discontinued prematurely because of slow recruitment and because guidelines were changed during the study to include trastuzumab and pertuzumab as part of neoadjuvant therapy in HER2+ patients (44). As with HER2+ patients, the treatment landscape for triple-negative breast cancer also changed (45). Nevertheless, the number of patients in the EXPRESSION trial was sufficient to build a classifier for further validation in the external cohorts, which included 619 patients. In the EXPRESSION cohort, an NPV of 0.928 was obtained for the top-20 gene expression–based classifier; in the combined external cohorts this was reduced to 0.829, underscoring the risk of overoptimism if one relies exclusively on a single cohort. An alternative approach is to construct a classifier that makes a statement with high confidence, but only for a subset of patients defined by very low (or very high) predicted probabilities. The comparison of predicted and empirical probabilities of pCR by calibration plots suggested good agreement between the two, including in patients with particularly low predicted probabilities. To identify patients with a very low probability of complete response to NACT, the cut-off point was therefore defined on the basis of the complete responder with the lowest predicted probability of pCR in the EXPRESSION cohort and then applied to the external validation cohorts, yielding an NPV of 0.960. This means that 121 patients could have been spared a not optimally effective chemotherapy, while a beneficial therapy would have been withheld from 5 of the 126 patients predicted to not achieve a pCR. However, because of the low PPV (0.243), no convincing statement about pCR would be possible for the remaining 493 of the 619 patients of the combined external cohorts, who were classified as complete responders. The patients who were predicted to not achieve a complete response and had an extremely low predicted probability of pCR were characterized by very high expression of ESR1 and low expression of immune-related transcripts, indicating the absence of an effective antitumor immune response.

Analysis of the individual intrinsic breast cancer subtypes with the top-20 classifier and the extreme-low cut-off point identified about one-third of patients with luminal breast cancer in the external cohorts as not having a complete response to neoadjuvant therapy, with very high NPVs of 0.986 (luminal A) and 0.949 (luminal B). For ER−/HER2− carcinomas, the fraction of patients classified as not achieving a pCR was small (3.6%). The HER2+ carcinomas showed a much lower NPV, demonstrating that the top-20 classifier does not reliably identify patients who do not achieve a pCR in this subtype.

To identify patients with a very high probability of complete response to NACT, the cut-off point was defined on the basis of the patient with the highest predicted probability of pCR among those who did not have a complete response in the EXPRESSION cohort and then applied to the external cohorts. For the top-20 classifier, this yielded a PPV of 0.867, considerably higher than the PPV of 0.587 attained with the more common Youden statistic–based approach to cut-off point optimization. However, only approximately 2% of all patients were identified as complete responders with this cut-off point.

In a previous study, Yu and colleagues divided the range of predicted probabilities into five equally spaced intervals and reported pCR rates separately for patients treated with either anthracycline, paclitaxel plus anthracycline, or docetaxel plus anthracycline (46). The lowest and highest intervals yielded the lowest and highest pCR rates, respectively. This result agrees with this study, in which the highest NPV and PPV were obtained for patients with the lowest and highest predicted probabilities of pCR, respectively. However, in contrast to Yu and colleagues, we defined cut-off points for predicted probabilities in the training cohort and tested NPV and PPV in independent cohorts.

In conclusion, identifying a gene expression–based classifier for pCR in the prospective multicenter phase II EXPRESSION study, validating it in published external cohorts, and applying established Youden statistic–based cut-off points resulted in NPV and PPV similar to those reported in previous studies, which would not be sufficient for clinical application. However, a novel strategy with an extreme-low cut-off point identified patients who did not achieve a pCR with high confidence. These patients could possibly be spared the toxicity of NACT. Clinical application of such a classifier would nevertheless require several steps. First, the classifier presented here was established and validated using Affymetrix gene array data and should be transferred to a more suitable technology such as qRT-PCR or next-generation sequencing. Moreover, it is crucial that a therapeutic approach that may benefit the patient is not withheld. Thus, a randomized clinical trial would be required before classifier-based recommendations not to treat can be made. The development of alternative therapeutic strategies should also be prioritized for this patient group.

B. Aktas reports personal fees from Amgen, AstraZeneca, Celgene, Novartis, and Roche outside the submitted work. H.-C. Kolberg reports personal fees and nonfinancial support from Carl Zeiss meditec, AstraZeneca, Pfizer, Novartis, Roche, GenomicHealth/Exact Sciences, Theraclion, Amgen, MSD, TEVA, and GSK; personal fees from Onkowissen, SurgVision, and Janssen Cilag; and nonfinancial support from LIV Pharma, Tesaro, and Daiichi Sankyo outside the submitted work; in addition, H.-C. Kolberg is stock owner of Theraclion and co-owner of Phaon scientific. M. Battista reports personal fees from Roche Pharma AG, Pharma Mar AG, Tesaro Bio Gmbh, Clovis Oncology, and AstraZeneca outside the submitted work. K.E. Weber reports a patent for 18209672.7 - 1111 issued. S. Loibl reports grants and other from Amgen, AbbVie, Roche, Celgene, Novartis, and Pfizer; other from SeaGen, Prime/Medscape, Eirgenix, BMS, Merck, and Puma; grants from Immunomedics; grants, personal fees, and other from DSI; and personal fees from Chugai outside the submitted work; in addition, S. Loibl has a patent for EP14153692.0 pending. M. Schmidt reports grants from Federal Ministry of Education and Research (BMBF, NGFN project Oncoprofile, no. 01GR0816) and Sanofi-Aventis during the conduct of the study, as well as grants and personal fees from AstraZeneca, Novartis, and Pierre-Fabre; grants, personal fees, and nonfinancial support from Pfizer, Roche, and Pantarhei Bioscience; personal fees from Lilly, SeaGen, Eisai, and MSD; grants and nonfinancial support from BioNTech; and grants from Genentech outside the submitted work; in addition, M. Schmidt has a patent for EP 2951317 B1: A method for predicting the benefit from inclusion of a taxane in a chemotherapy regimen in patients with breast cancer issued and a patent for EP 2390370 B1: A method for predicting the response of a tumor in a patient suffering from or at risk of developing recurrent gynecologic cancer toward a chemotherapeutic agent issued. 
No disclosures were reported by the other authors.

K. Edlund: Conceptualization, data curation, formal analysis, supervision, validation, investigation, methodology, writing–original draft, writing–review and editing. K. Madjar: Software, formal analysis. A. Lebrecht: Conceptualization, supervision, writing–review and editing. B. Aktas: Conceptualization, supervision, writing–review and editing. H. Pilch: Conceptualization, supervision, writing–review and editing. G. Hoffmann: Conceptualization, formal analysis. M. Hofmann: Conceptualization, formal analysis. H.-C. Kolberg: Conceptualization, formal analysis. D. Boehm: Conceptualization, resources. M. Battista: Conceptualization, resources. M. Seehase: Data curation, investigation. K. Stewen: Conceptualization, writing–review and editing. S. Gebhard: Conceptualization, data curation, formal analysis, validation, investigation, methodology. C. Cadenas: Conceptualization, data curation, supervision, writing–original draft, writing–review and editing. R. Marchan: Conceptualization, data curation, supervision, writing–original draft, writing–review and editing. W. Brenner: Conceptualization, supervision, writing–review and editing. A. Hasenburg: Conceptualization, supervision, writing–review and editing. H. Koelbl: Conceptualization, supervision, writing–review and editing. C. Solbach: Conceptualization, supervision, writing–review and editing. M. Gehrmann: Conceptualization, supervision, writing–review and editing. B. Tanner: Conceptualization, supervision, writing–review and editing. K.E. Weber: Conceptualization, supervision, writing–review and editing. S. Loibl: Conceptualization, supervision, writing–review and editing. A. Sachinidis: Formal analysis, investigation. J. Rahnenführer: Conceptualization, data curation, formal analysis, supervision, validation, investigation, methodology, writing–review and editing. M. Schmidt: Conceptualization, resources, data curation, supervision, funding acquisition, validation, investigation, writing–original draft, project administration, writing–review and editing. J.G. Hengstler: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, validation, writing–original draft, writing–review and editing.

We thank Margit Henry and Tamara Rotshteyn for valuable support with the microarray analyses. The study was supported by the Federal Ministry of Education and Research (BMBF, NGFN project Oncoprofile, no. 01GR0816) and Sanofi-Aventis.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Burstein
HJ
,
Curigliano
G
,
Loibl
S
,
Dubsky
P
,
Gnant
M
,
Poortmans
P
, et al
Estimating the benefits of therapy for early-stage breast cancer: the St. Gallen International Consensus Guidelines for the primary therapy of early breast cancer 2019
.
Ann Oncol
2019
;
30
:
1541
57
.
2.
Ditsch
N
,
Untch
M
,
Thill
M
,
Muller
V
,
Janni
W
,
Albert
US
, et al
AGO recommendations for the diagnosis and treatment of patients with early breast cancer: update 2019
.
Breast Care
2019
;
14
:
224
45
.
3.
Spring
LM
,
Fell
G
,
Arfe
A
,
Sharma
C
,
Greenup
R
,
Reynolds
KL
, et al
Pathologic complete response after neoadjuvant chemotherapy and impact on breast cancer recurrence and survival: a comprehensive meta-analysis
.
Clin Cancer Res
2020
;
26
:
2838
48
.
4.
Gianni
L
,
Mansutti
M
,
Anton
A
,
Calvo
L
,
Bisagni
G
,
Bermejo
B
, et al
Comparing neoadjuvant nab-paclitaxel vs paclitaxel both followed by anthracycline regimens in women with ERBB2/HER2-negative breast cancer-the evaluating treatment with neoadjuvant abraxane (ETNA) trial: a randomized phase 3 clinical trial
.
JAMA Oncol
2018
;
4
:
302
8
.
5.
Iwata
H
,
Sato
N
,
Masuda
N
,
Nakamura
S
,
Yamamoto
N
,
Kuroi
K
, et al
Docetaxel followed by fluorouracil/epirubicin/cyclophosphamide as neoadjuvant chemotherapy for patients with primary breast cancer
.
Jpn J Clin Oncol
2011
;
41
:
867
75
.
6.
Untch
M
,
Jackisch
C
,
Schneeweiss
A
,
Conrad
B
,
Aktas
B
,
Denkert
C
, et al
Nab-paclitaxel versus solvent-based paclitaxel in neoadjuvant chemotherapy for early breast cancer (GeparSepto-GBG 69): a randomised, phase 3 trial
.
Lancet Oncol
2016
;
17
:
345
56
.
7.
von Minckwitz
G
,
Eidtmann
H
,
Rezai
M
,
Fasching
PA
,
Tesch
H
,
Eggemann
H
, et al
Neoadjuvant chemotherapy and bevacizumab for HER2-negative breast cancer
.
N Eng J Med
2012
;
366
:
299
309
.
8.
von Minckwitz
G
,
Kummel
S
,
Vogel
P
,
Hanusch
C
,
Eidtmann
H
,
Hilfrich
J
, et al
Intensified neoadjuvant chemotherapy in early-responding breast cancer: phase III randomized GeparTrio study
.
J Natl Cancer Inst
2008
;
100
:
552
62
.
9.
Ayers
M
,
Symmans
WF
,
Stec
J
,
Damokosh
AI
,
Clark
E
,
Hess
K
, et al
Gene expression profiles predict complete pathologic response to neoadjuvant paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide chemotherapy in breast cancer
.
J Clin Oncol
2004
;
22
:
2284
93
.
10.
Chang
JC
,
Wooten
EC
,
Tsimelzon
A
,
Hilsenbeck
SG
,
Gutierrez
MC
,
Elledge
R
, et al
Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer
.
Lancet
2003
;
362
:
362
9
.
11.
Farmer
P
,
Bonnefoi
H
,
Anderle
P
,
Cameron
D
,
Wirapati
P
,
Becette
V
, et al
A stroma-related gene signature predicts resistance to neoadjuvant chemotherapy in breast cancer
.
Nat Med
2009
;
15
:
68
74
.
12.
Hatzis
C
,
Pusztai
L
,
Valero
V
,
Booser
DJ
,
Esserman
L
,
Lluch
A
, et al
A genomic predictor of response and survival following taxane-anthracycline chemotherapy for invasive breast cancer
.
JAMA
2011
;
305
:
1873
81
.
13.
Hess
KR
,
Anderson
K
,
Symmans
WF
,
Valero
V
,
Ibrahim
N
,
Mejia
JA
, et al
Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer
.
J Clin Oncol
2006
;
24
:
4236
44
.
14.
Lee
JK
,
Coutant
C
,
Kim
YC
,
Qi
Y
,
Theodorescu
D
,
Symmans
WF
, et al
Prospective comparison of clinical and genomic multivariate predictors of response to neoadjuvant chemotherapy in breast cancer
.
Clin Cancer Res
2010
;
16
:
711
8
.
15.
Liedtke
C
,
Hatzis
C
,
Symmans
WF
,
Desmedt
C
,
Haibe-Kains
B
,
Valero
V
, et al
Genomic grade index is associated with response to chemotherapy in patients with breast cancer
.
J Clin Oncol
2009
;
27
:
3185
91
.
16.
Naoi
Y
,
Kishi
K
,
Tanei
T
,
Tsunashima
R
,
Tominaga
N
,
Baba
Y
, et al
Prediction of pathologic complete response to sequential paclitaxel and 5-fluorouracil/epirubicin/cyclophosphamide therapy using a 70-gene classifier for breast cancers
.
Cancer
2011
;
117
:
3682
90
.
17. Rodriguez AA, Makris A, Wu MF, Rimawi M, Froehlich A, Dave B, et al. DNA repair signature is associated with anthracycline response in triple negative breast cancer patients. Breast Cancer Res Treat 2010;123:189–96.
18. Sota Y, Naoi Y, Tsunashima R, Kagara N, Shimazu K, Maruyama N, et al. Construction of novel immune-related signature for prediction of pathological complete response to neoadjuvant chemotherapy in human breast cancer. Ann Oncol 2014;25:100–6.
19. Tabchy A, Valero V, Vidaurre T, Lluch A, Gomez H, Martin M, et al. Evaluation of a 30-gene paclitaxel, fluorouracil, doxorubicin, and cyclophosphamide chemotherapy response predictor in a multicenter randomized trial in breast cancer. Clin Cancer Res 2010;16:5351–61.
20. Thuerigen O, Schneeweiss A, Toedt G, Warnat P, Hahn M, Kramer H, et al. Gene expression signature predicting pathologic complete response with gemcitabine, epirubicin, and docetaxel in primary breast cancer. J Clin Oncol 2006;24:1839–45.
21. Tsunashima R, Naoi Y, Kagara N, Shimoda M, Shimomura A, Maruyama N, et al. Construction of multi-gene classifier for prediction of response to and prognosis after neoadjuvant chemotherapy for estrogen receptor positive breast cancers. Cancer Lett 2015;365:166–73.
22. Turner N, Forcato M, Nuzzo S, Malorni L, Bicciato S, Di Leo A. A multifactorial ‘Consensus Signature’ by in silico analysis to predict response to neoadjuvant anthracycline-based chemotherapy in triple-negative breast cancer. NPJ Breast Cancer 2015;1:15003.
23. Schmidt M, Hellwig B, Hammad S, Othman A, Lohr M, Chen Z, et al. A comprehensive analysis of human gene expression profiles identifies stromal immunoglobulin kappa C as a compatible prognostic marker in human solid tumors. Clin Cancer Res 2012;18:2695–703.
24. Denkert C, von Minckwitz G, Brase JC, Sinn BV, Gade S, Kronenwett R, et al. Tumor-infiltrating lymphocytes and response to neoadjuvant chemotherapy with or without carboplatin in human epidermal growth factor receptor 2-positive and triple-negative primary breast cancers. J Clin Oncol 2015;33:983–91.
25. Schmidt M, Bohm D, von Torne C, Steiner E, Puhl A, Pilch H, et al. The humoral immune system has a key prognostic impact in node-negative breast cancer. Cancer Res 2008;68:5405–13.
26. Boulware MJ, Subramanian VS, Said HM, Marchant JS. Polarized expression of members of the solute carrier SLC19A gene family of water-soluble multivitamin transporters: implications for physiological function. Biochem J 2003;376:43–8.
27. Chou SJ, Tole S. Lhx2, an evolutionarily conserved, multifunctional regulator of forebrain development. Brain Res 2019;1705:1–14.
28. Deneka A, Korobeynikov V, Golemis EA. Embryonal Fyn-associated substrate (EFS) and CASS4: the lesser-known CAS protein family members. Gene 2015;570:25–35.
29. Dunbier AK, Ghazoui Z, Anderson H, Salter J, Nerurkar A, Osin P, et al. Molecular profiling of aromatase inhibitor-treated postmenopausal breast tumors identifies immune-related correlates of resistance. Clin Cancer Res 2013;19:2775–86.
30. Flynn MJ, Hartley JA. The emerging role of anti-CD25 directed therapies as both immune modulators and targeted agents in cancer. Br J Haematol 2017;179:20–35.
31. Hsu HM, Chu CM, Chang YJ, Yu JC, Chen CT, Jian CE, et al. Six novel immunoglobulin genes as biomarkers for better prognosis in triple-negative breast cancer by gene co-expression network analysis. Sci Rep 2019;9:4484.
32. Hummler E. Epithelial sodium channel, salt intake, and hypertension. Curr Hypertens Rep 2003;5:11–8.
33. Kim SJ, Schätzle S, Ahmed SS, Haap W, Jang SH, Gregersen PK, et al. Increased cathepsin S in Prdm1(-/-) dendritic cells alters the T(FH) cell repertoire and contributes to lupus. Nat Immunol 2017;18:1016–24.
34. Nieuwenhuis J, Adamopoulos A, Bleijerveld OB, Mazouzi A, Stickel E, Celie P, et al. Vasohibins encode tubulin detyrosinating activity. Science 2017;358:1453–6.
35. Nowinska K, Ciesielska U, Piotrowska A, Jablonska K, Partynska A, Paprocka M, et al. MCM5 expression is associated with the grade of malignancy and Ki-67 antigen in LSCC. Anticancer Res 2019;39:2325–35.
36. Praefcke GJK. Regulation of innate immune functions by guanylate-binding proteins. Int J Med Microbiol 2018;308:237–45.
37. Ramkumar A, Jong BY, Ori-McKenney KM. ReMAPping the microtubule landscape: how phosphorylation dictates the activities of microtubule-associated proteins. Dev Dyn 2018;247:138–55.
38. Rhie SK, Yao L, Luo Z, Witt H, Schreiner S, Guo Y, et al. ZFX acts as a transcriptional activator in multiple types of human tumors by binding downstream of transcription start sites at the majority of CpG island promoters. Genome Res 2018;28:310–20.
39. Schreiber M, Weigelt M, Karasinsky A, Anastassiadis K, Schallenberg S, Petzold C, et al. Inducible IL-7 hyperexpression influences lymphocyte homeostasis and function and increases allograft rejection. Front Immunol 2019;10:742.
40. Wang X, Zhang T, Zhang S, Shan J. Prognostic values of F-box members in breast cancer: an online database analysis and literature review. Biosci Rep 2019;39:BSR20180949.
41. Wu RY, Kong PF, Xia LP, Huang Y, Li ZL, Tang YY, et al. Regorafenib promotes antitumor immunity via inhibiting PD-L1 and IDO1 expression in melanoma. Clin Cancer Res 2019;25:4530–41.
42. Sato Y. The vasohibin family: a novel family for angiogenesis regulation. J Biochem 2013;153:5–11.
43. Harris LN, Ismaila N, McShane LM, Hayes DF. Use of biomarkers to guide decisions on adjuvant systemic therapy for women with early-stage invasive breast cancer: American Society of Clinical Oncology clinical practice guideline summary. J Oncol Pract 2016;12:384–9.
44. Gianni L, Pienkowski T, Im YH, Roman L, Tseng LM, Liu MC, et al. Efficacy and safety of neoadjuvant pertuzumab and trastuzumab in women with locally advanced, inflammatory, or early HER2-positive breast cancer (NeoSphere): a randomised multicentre, open-label, phase 2 trial. Lancet Oncol 2012;13:25–32.
45. Schmid P, Cortes J, Pusztai L, McArthur H, Kümmel S, Bergh J, et al. Pembrolizumab for early triple-negative breast cancer. N Engl J Med 2020;382:810–21.
46. Yu K, Sang QA, Lung PY, Tan W, Lively T, Sheffield C, et al. Personalized chemotherapy selection for breast cancer using gene expression profiles. Sci Rep 2017;7:43294.

Supplementary data