Abstract
Purpose: Histologic tumor grade is a well-established prognostic factor for breast cancer, and tumor grade–associated genes are the common denominator of many prognostic gene signatures. The objectives of this study are as follows: (a) to develop a simple gene expression index for tumor grade (molecular grade index or MGI), and (b) to determine whether MGI and our previously described HOXB13:IL17BR index together provide improved prognostic information.
Experimental Design: From our previously published list of genes whose expression correlates with both tumor grade and tumor stage progression, we selected five cell cycle–related genes to build MGI and evaluated MGI in two publicly available microarray data sets totaling 410 patients. Using two additional cohorts (n = 323), we developed a real-time reverse transcription PCR assay for MGI, validated its prognostic utility, and examined its interaction with HOXB13:IL17BR.
Results: MGI performed consistently as a strong prognostic factor and was comparable with a more complex 97-gene genomic grade index in multiple data sets. In patients treated with endocrine therapy, MGI and HOXB13:IL17BR modified each other's prognostic performance. High MGI was associated with significantly worse outcome only in combination with high HOXB13:IL17BR, and likewise, high HOXB13:IL17BR was significantly associated with poor outcome only in combination with high MGI.
Conclusions: We developed and validated a five-gene reverse transcription PCR assay for MGI suitable for analyzing routine formalin-fixed paraffin-embedded clinical samples. The combination of MGI and HOXB13:IL17BR outperforms either alone and identifies a subgroup (∼30%) of early stage estrogen receptor–positive breast cancer patients with very poor outcome despite endocrine therapy.
The most recent (2005) St. Gallen consensus guidelines for treatment selection for early stage breast cancer consider both risk of recurrence and endocrine responsiveness to better balance risk and benefit of systemic adjuvant therapies (1). To better define risk stratification, genome-wide expression profiling studies have created multiple prognostic gene signatures for breast cancer (2, 3). An important issue is whether these signatures overlap in their prognostic information and whether combining several of these signatures would provide more accurate prognosis. In one comparative study, four signatures (the intrinsic subtypes, 70-gene signature, wound response signature, and Recurrence Score) were found to be highly concordant in classifying patients into low and high-risk groups (4). Notably, combining these signatures did not yield significant improvement in predictive accuracy, suggesting that the prognostic information provided by these signatures is largely overlapping (4). Sotiriou et al. (5) showed that a 97-gene tumor grade signature was comparable with the 70-gene signature and the Recurrence Score algorithm (6) in independent cohorts, and they hypothesized that most of the prognostic power of these signatures comes from genes associated with cellular proliferation (7).
Previously, we conducted a microarray analysis of 60 patients with hormone receptor–positive breast cancer treated with standard 5 years of adjuvant tamoxifen therapy (8). To facilitate discovery of novel biomarkers predictive of clinical outcome beyond standard prognostic factors, patients who developed recurrences were matched to those who did not with respect to tumor stage and grade. We identified three genes associated with clinical outcome, HOXB13, IL17BR, and CHDH, none of which had been previously implicated in breast cancer. Because high HOXB13 and low IL17BR expression levels are associated with recurrence, we proposed that a simple HOXB13:IL17BR two-gene ratio could serve as a novel biomarker for predicting recurrence in breast cancer patients receiving adjuvant tamoxifen therapy. Subsequent studies by us (9, 10) and others (11, 12) have shown that HOXB13:IL17BR is both prognostic (i.e., predicts the risk of breast cancer recurrence) and predictive of tamoxifen benefit (i.e., tamoxifen response/resistance).
Given the well-established prognostic importance of histologic tumor grade (13) and tumor grade–associated genes (7) in breast cancer, a simple robust gene expression assay for tumor grade would be a valuable clinical tool. Herein, we developed a simple 5-gene tumor grade signature (MGI for molecular grade index) that recapitulates tumor grade and show that it predicts clinical outcome with comparable performance to the well-characterized 97-gene tumor grade index. We implemented a robust real-time reverse transcription-PCR (RT-PCR) assay for MGI and show that MGI together with HOXB13:IL17BR provides more accurate prognosis than either biomarker alone.
Patients and Methods
Patients and tumor samples. The designation of training and test sets and the workflow of data analyses are outlined in Fig. 1. Previously published microarray data sets (accessions GSE3494 and GSE1456) are described in Supplementary Data. Two additional patient cohorts were analyzed by real-time RT-PCR. The first cohort [Massachusetts General Hospital (MGH) cohort] used a retrospective case-cohort design (14) and was derived from 683 stage I, II, and III patients with estrogen receptor–positive primary breast cancer treated at the MGH from 1991 to 1999. Clinical follow-up data were obtained from tumor registry and hospital records. Cases were all patients who developed distant metastasis during follow-up; controls were randomly selected from patients who remained disease free at last follow-up to achieve a 2:1 ratio of controls to cases. In addition, controls were matched to cases with respect to adjuvant therapy and time of diagnosis. For ∼80% of the cases and controls, both clinical outcome data and formalin-fixed paraffin-embedded tumor blocks were successfully retrieved. The final cohort consisted of 79 cases and 160 controls, and its patient and tumor characteristics are summarized in Supplementary Table S1. The second cohort (Oxford cohort; see Supplementary Table S2) consisted of 84 patients from the Oxford series described previously (6); all patients had estrogen receptor–positive primary breast cancer and were treated with adjuvant tamoxifen therapy. This study used portions of RNA previously isolated from frozen tumor samples (6).
Gene expression analysis by real-time RT-PCR. We used primer/probe sequences for HOXB13, IL17BR, ESR1, PGR, CHDH, ACTB, HMBS, SDHA, and UBC described previously (9), and designed primer/probe sequences for the five molecular grade genes (BUB1B, CENPA, NEK2, RACGAP1, and RRM2) and ERBB2 (HER2; Supplementary Table S2), using Primer Express (Applied Biosystems, Inc.). For each formalin-fixed paraffin-embedded sample, two 7-μm tissue sections were subjected to gross macrodissection to enrich for tumor content. RNA extraction, reverse transcription, and TaqMan RT-PCR using the ABI 7900HT instrument (Applied Biosystems, Inc.) were done as previously described (9). The cycling threshold numbers were normalized to the mean cycling threshold of four reference genes (ACTB, HMBS, SDHA, and UBC).
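The reference-gene normalization described above can be illustrated with a minimal Python sketch. The function name, the example values, and the sign convention (mean reference cycling threshold minus target cycling threshold, so that larger values indicate higher expression) are assumptions made for illustration; the text specifies only that cycling thresholds were normalized to the mean of the four reference genes.

```python
import numpy as np

# Reference genes named in the text.
REFERENCE_GENES = ["ACTB", "HMBS", "SDHA", "UBC"]

def normalize_ct(ct_by_gene):
    """Normalize raw cycling thresholds (Ct) for one sample.

    ct_by_gene maps gene symbol -> raw Ct. The sign convention is an
    assumption: mean reference Ct minus target Ct, so higher values
    correspond to higher expression.
    """
    ref_mean = np.mean([ct_by_gene[g] for g in REFERENCE_GENES])
    return {
        gene: float(ref_mean - ct)
        for gene, ct in ct_by_gene.items()
        if gene not in REFERENCE_GENES
    }

# Hypothetical Ct values for one sample.
sample = {"ACTB": 20.0, "HMBS": 26.0, "SDHA": 24.0, "UBC": 22.0,
          "BUB1B": 28.0, "RRM2": 25.0}
normalized = normalize_ct(sample)
```

Averaging over several reference genes makes the normalization less sensitive to variation in any single reference transcript.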
Calculation of gene expression indices. Normalized expression levels for the five molecular grade genes from microarrays or RT-PCR were standardized to a mean of 0 and a SD of 1 across samples within each data set and then combined into a single index per sample as the first principal component (15). Standardization of the primary data within each data set was necessary to account for the different platforms (microarrays and RT-PCR) and sample types (frozen and formalin-fixed paraffin-embedded).
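As a rough illustration of this calculation, the sketch below standardizes each gene within a data set and projects the samples onto the first principal component. It is a sketch under stated assumptions, not the authors' code: the function name, the use of a singular value decomposition, the orientation of the component's sign, and the simulated data are all hypothetical.

```python
import numpy as np

def molecular_grade_index(expr):
    """Combine per-gene expression into one index per sample.

    expr: array of shape (n_samples, n_genes) with normalized expression
    for the grade genes within a single data set.
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize each gene to mean 0 and SD 1 across samples.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    # Rows of vt are the principal axes of the standardized data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    pc1 = vt[0]
    # The sign of a principal component is arbitrary; orient it so that
    # higher expression gives a higher index (an assumption of this sketch).
    if pc1.sum() < 0:
        pc1 = -pc1
    return z @ pc1

# Simulated data: five genes driven by one shared latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=60)
genes = latent[:, None] + 0.3 * rng.normal(size=(60, 5))
mgi = molecular_grade_index(genes)
```

Because every gene is centered before the projection, the resulting index has a mean of 0 within the data set, which is consistent with using the mean (0) as a cutpoint.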
HOXB13:IL17BR was calculated as described previously (9); the means and SDs for HOXB13 and IL17BR used for standardizing the MGH cohort were derived from an analysis of 190 formalin-fixed paraffin-embedded samples from a separate population-based cohort of estrogen receptor–positive lymph node–negative breast cancer patients (data not shown).
Genomic grade index (GGI) was calculated from microarray data using the 128 Affymetrix probe sets representing 97 genes and scaled within each data set to have a mean of −1 for grade 1 tumors and +1 for grade 3 tumors as described before (5).
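The scaling step can be sketched as a linear transformation of a raw score. The code below covers only the rescaling; how the raw score is computed from the probe sets is described in ref. 5 and is not reproduced here. The function name and the example values are hypothetical.

```python
import numpy as np

def rescale_to_grade_anchors(raw, grade):
    """Linearly rescale a raw score so that grade 1 tumors have a mean
    of -1 and grade 3 tumors have a mean of +1 within one data set."""
    raw = np.asarray(raw, dtype=float)
    grade = np.asarray(grade)
    mean_g1 = raw[grade == 1].mean()
    mean_g3 = raw[grade == 3].mean()
    scale = 2.0 / (mean_g3 - mean_g1)
    offset = (mean_g1 + mean_g3) / 2.0
    return scale * (raw - offset)

# Hypothetical raw scores and histologic grades.
raw_scores = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
grades = [1, 1, 2, 2, 3, 3]
scaled = rescale_to_grade_anchors(raw_scores, grades)
```

After this transformation, a cutpoint of 0 lies midway between the grade 1 and grade 3 means.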
Statistical analysis. The cutpoint for MGI was determined as follows. Initial analysis of MGI in the Uppsala cohort indicated good discrimination of grade 1 and grade 3 tumors using the mean (0) as cutpoint. This cutpoint was further supported by receiver operating characteristic analysis (Supplementary Fig. S1; ref. 16). GGI was dichotomized at the cutpoint of 0 as described previously (5). The cutpoint of 0.06 for HOXB13:IL17BR, previously defined to stratify patients treated with adjuvant tamoxifen into low and high risk of recurrence (9), was applied directly in this study.
Kaplan-Meier analysis with log-rank test and Cox proportional hazards regression were done to assess the association of gene expression indices with clinical outcome. Multivariate Cox regression models were used to assess the prognostic capacity of gene expression indices after adjusting for known prognostic factors. The proportional hazards assumption was checked by scaled Schoenfeld residuals; variables violating the proportional hazards assumption were adjusted for in the model through stratification. To account for the case-cohort design of the MGH cohort, we used weighted Kaplan-Meier analysis and Cox regression models with modifications to handle case-cohort designs (17, 18) as implemented in the survey package in R.
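To make the idea of a weighted Kaplan-Meier estimate concrete, the following Python sketch computes a survival curve in which each patient carries a sampling weight. It is not the implementation in the R survey package, and the weighting scheme in the example (weight 1 for cases, weight 3 for sampled controls) is purely hypothetical; the actual case-cohort adjustments follow refs. 17 and 18.

```python
import numpy as np

def weighted_kaplan_meier(time, event, weight):
    """Weighted Kaplan-Meier estimator.

    time:   follow-up time for each patient
    event:  True if the event was observed, False if censored
    weight: sampling weight for each patient
    Returns the distinct event times and the survival estimate just
    after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    weight = np.asarray(weight, dtype=float)
    event_times = np.unique(time[event])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = weight[time >= t].sum()           # weighted risk set
        failed = weight[(time == t) & event].sum()  # weighted events at t
        s *= 1.0 - failed / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Hypothetical data: two cases (weight 1) and three sampled controls (weight 3).
times = [2, 3, 4, 5, 6]
events = [True, False, True, False, False]
weights = [1, 3, 1, 3, 3]
event_times, surv = weighted_kaplan_meier(times, events, weights)
```

With all weights set to 1, the function reduces to the ordinary Kaplan-Meier estimator.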
To test for interaction between dichotomized MGI and HOXB13:IL17BR in Cox regression models, the Wald statistic was used in the MGH cohort and the likelihood ratio test was used in the Oxford cohort. Correlations of continuous variables with categorical factors were examined using the nonparametric two-sample Wilcoxon test, or the Kruskal-Wallis test for factors with more than two levels. All statistical analyses were done in R. All significance tests were two sided, and a P value of <0.05 was considered significant.
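For readers less familiar with interaction testing, the two tests named above can be written out in standard form. The notation is introduced here for illustration and does not appear in the original: M and H are 0/1 indicators for high MGI and high HOXB13:IL17BR, h_0(t) is the baseline hazard, and ℓ is the log partial likelihood.

```latex
% Cox model with an interaction term between the two dichotomized indices
h(t \mid M, H) = h_0(t)\,\exp\!\left(\beta_1 M + \beta_2 H + \beta_3 M H\right)

% Null hypothesis of no interaction
H_0 : \beta_3 = 0

% Wald statistic
W = \left( \frac{\hat\beta_3}{\widehat{\mathrm{SE}}(\hat\beta_3)} \right)^{2}

% Likelihood ratio statistic: full model versus model without the interaction term
\Lambda = 2\left[ \ell_{\text{full}} - \ell_{\text{reduced}} \right]

% Under H_0, both W and \Lambda are asymptotically \chi^2 distributed with 1 degree of freedom
```

The two statistics are asymptotically equivalent, but they can differ in small samples.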
Results
Gene selection for MGI. In our previous study of gene expression associated with breast cancer progression, we showed hundreds of genes differentially expressed between tumors of low and high histologic grade (19). Moreover, we showed that a subset of 39 genes with increased expression in high-grade tumors were also more highly expressed in the invasive compared with the preinvasive stage of breast cancer, suggesting a role for these genes in invasive growth (19). Thus, we focused on this subset of 39 genes to develop a prognostic tumor grade signature. First, we confirmed their high correlation with tumor grade in a large publicly available microarray data set (Uppsala cohort, n = 251; ref. 20). Next, we narrowed this list to five genes based on functional annotation of the genes [genes involved in different cell cycle phases (21) and processes], association with clinical outcome (Uppsala cohort, n = 236), and correlation with tumor grade in another independent 60-patient cohort (see Supplementary Tables S3-5; ref. 8).
Through the use of unsupervised (i.e., without using any clinical outcome data) principal component analysis, we combined the five-gene expression pattern into a single MGI score and showed that MGI strongly correlated with tumor grade (Supplementary Fig. S1). Using the mean value (0) of MGI as a cutpoint, most of the grade 1 and grade 3 tumors were classified correctly (89% overall accuracy), and grade 2 tumors were stratified into two groups (59% and 41% in the low and high MGI group, respectively).
Prognostic performance of MGI in breast cancer patients. Using two independent publicly available microarray data sets (Uppsala and Stockholm data sets; Fig. 1), we examined the capacity of MGI to predict clinical outcome in breast cancer patients and compared MGI with the previously described 97-gene GGI (5). In the Uppsala data set, receiver operating characteristic analysis indicated that MGI and GGI were comparable in discriminating grade 1 and grade 3 tumors, and Kaplan-Meier analysis (MGI dichotomized at a cutpoint of 0) separated patients into two subgroups with significantly different risks of breast cancer death (Fig. 2A-B). The survival curves and hazard ratios were comparable with those generated by GGI (Fig. 2C). Similar results were obtained in the independent Stockholm test data set (Fig. 2D-F). Therefore, our results showed that a simple five-gene index could reproduce the prognostic performance of the more complex 97-gene signature; we note that although MGI was developed entirely independently of GGI, four (BUB1B, CENPA, RACGAP1, and RRM2) of the five genes were among the 97-gene signature, and the fifth gene, NEK2, was only 2 positions down from the 112 grade 3–associated probe sets included in GGI (5).
Development and validation of an RT-PCR assay for MGI. We designed primers and probes for the five MGI genes in the TaqMan real-time PCR (RT-PCR) assay format (Supplementary Table S3). To validate the RT-PCR–based MGI assay, we carried out a retrospective case-cohort study. The cases were patients who were treated at the MGH between 1991 and 1999 and developed distant metastasis during follow-up. Controls were randomly selected from patients treated during the same period who were disease free at their most recent follow-up (Supplementary Table S1). Patients were treated variably with the therapy of best choice by their medical oncologist, including no systemic therapy, hormonal therapy, and chemotherapy. Because we were most interested in determining the therapy-independent prognostic utility of MGI, we matched controls with cases with respect to systemic therapy.
Similar to the microarray data sets analyzed above, the RT-PCR–based MGI also accurately discriminated grade 1 and grade 3 tumors (86% accuracy) using the same cutpoint of 0 as described above (Fig. 3A). Kaplan-Meier analysis indicated that high MGI was significantly associated with high risk of distant metastasis irrespective of nodal status (Fig. 3B-D). To remove potential confounding effect from mixed treatments in this cohort, we further performed Kaplan-Meier analysis of MGI within each treatment group. Indeed, MGI was significantly associated with distant metastasis-free survival in each of the untreated, endocrine-, and endocrine + chemotherapy–treated subgroups (Supplementary Fig. S2); the lack of significance in the chemotherapy-alone group might be due to its small sample size (n = 20). In addition, MGI was significantly prognostic in both postmenopausal and premenopausal patients and in patients with small tumors (<2 cm) or with intermediate tumor grade (grade 2; Supplementary Fig. S3). Finally, in a multivariate Cox regression model adjusting for age, tumor size, tumor grade, lymph node status, and systemic therapy, MGI remained highly significant with a hazard ratio of 4.7 (2.1-10.8; Table 1).
| Variable | Comparison | Hazard ratio (95% confidence interval) | P |
|---|---|---|---|
| MGI | High vs low | 4.7 (2.1-10.8) | 0.0002 |
| Tumor size | >2 cm vs <2 cm | 0.8 (0.4-1.5) | 0.4580 |
| Tumor grade | | | 0.0011 |
| | II vs I | 1.6 (0.5-5.2) | 0.4331 |
| | III vs I | 5.6 (1.5-20.6) | 0.0105 |
| Age | ≥35 y vs <35 y | 0.7 (0.2-1.9) | 0.4687 |
| Node status | Pos. vs neg. | 1.2 (0.6-2.3) | 0.5581 |
| Treatment | | | 0.5733 |
| | Chemo vs none | 0.9 (0.4-2.4) | 0.8837 |
| | Endo vs none | 1.5 (0.5-4.5) | 0.4406 |
| | Chemo + endo vs none | 1.0 (0.3-3.5) | 0.9939 |
Abbreviations: Pos, positive; Neg, negative; Chemo, chemotherapy; Endo, endocrine therapy.
Complementary prognostic value of MGI and HOXB13:IL17BR. To explore whether HOXB13:IL17BR provides additional prognostic information to MGI and vice versa, we analyzed both indices in patients with lymph node–negative tumors who received adjuvant endocrine therapy (n = 93). In this patient group, MGI and HOXB13:IL17BR were each strongly associated with risk of distant metastasis (Fig. 4A-B). When both were considered together, MGI was highly significant in stratifying patients into low and high-risk groups only when the tumors had high HOXB13:IL17BR, and likewise, HOXB13:IL17BR was only significant in stratifying patients with tumors having high MGI (P for interaction = 0.09; Supplementary Fig. S4). We therefore combined MGI and HOXB13:IL17BR to stratify patients into three risk groups (low risk, low MGI and low or high HOXB13:IL17BR; intermediate risk, high MGI and low HOXB13:IL17BR; and high risk, high for both MGI and HOXB13:IL17BR, accounting for 48%, 24%, and 28% of the patients, respectively). Kaplan-Meier analysis of these three groups indicated that high MGI and high HOXB13:IL17BR together predicted very poor outcome for the high-risk group (hazard ratio versus low-risk group = 26.4; 95% confidence interval, 5.8-121; Fig. 4C). The Kaplan-Meier estimate of the 10-year distant metastasis-free survival probability was 98% (96-100%), 87% (77-99%), and 60% (47-78%) for the low, intermediate, and high-risk groups, respectively. Furthermore, after adjusting for systemic therapy and standard prognostic factors (age, tumor size, and grade) in a multivariate Cox regression model, the combined index remained highly statistically significant (Table 2), demonstrating the strong independent prognostic value of combining MGI and HOXB13:IL17BR.
| Variable | Comparison | Hazard ratio (95% confidence interval) | P |
|---|---|---|---|
| MGI + HOXB13:IL17BR | | | 0.0007 |
| | Intermediate vs low | 5.5 (0.9-34.6) | 0.0720 |
| | High vs low | 24.2 (4.3-135.2) | 0.0003 |
| Tumor size | >2 cm vs ≤2 cm | 1.0 (0.3-2.9) | 0.9804 |
| Age | ≥35 y vs <35 y | 0.1 (0.0-0.4) | 0.0036 |
| Treatment | Endo vs chemo + endo | 11.5 (2.2-59.4) | 0.0034 |
NOTE: Tumor grade was adjusted for by stratification.
To further substantiate the prognostic power of combining MGI and HOXB13:IL17BR, we examined these two indices in an independent test cohort of 84 patients with estrogen receptor–positive breast cancer uniformly treated with adjuvant tamoxifen therapy (Oxford cohort; ref. 6). After applying the same cutpoints to these two indices and the same combination algorithm as described above, the resulting low-, intermediate-, and high-risk groups consisted of 44%, 24%, and 32% of the patients, respectively, in keeping with their proportions seen in the MGH cohort. Again, Kaplan-Meier analysis indicated that the high-risk group with tumors high for both indices had the worst clinical outcome [hazard ratios versus low-risk group = 7.9 (2.2-28.2); Fig. 4D], and the likelihood ratio test indicated a statistically significant interaction between these two indices (P = 0.036; see interaction plots in Supplementary Fig. S5).
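The combination algorithm applied to both cohorts can be summarized in a few lines of Python. The cutpoints (0 for MGI and 0.06 for HOXB13:IL17BR) are those stated in the Statistical analysis section; the function name and the handling of values falling exactly on a cutpoint (treated here as "low") are assumptions of this sketch.

```python
MGI_CUTPOINT = 0.0            # mean of the standardized index
HOXB13_IL17BR_CUTPOINT = 0.06 # previously defined cutpoint (9)

def risk_group(mgi, hoxb13_il17br):
    """Assign one of three risk groups from the two indices.

    Values exactly equal to a cutpoint are treated as low, which is an
    assumption of this sketch.
    """
    if mgi <= MGI_CUTPOINT:
        # Low MGI: low risk regardless of HOXB13:IL17BR.
        return "low"
    if hoxb13_il17br <= HOXB13_IL17BR_CUTPOINT:
        # High MGI with low HOXB13:IL17BR.
        return "intermediate"
    # High for both indices.
    return "high"

# Hypothetical index values for four patients.
examples = [(-0.8, -1.2), (-0.3, 1.5), (0.9, -0.4), (1.1, 0.7)]
groups = [risk_group(m, h) for m, h in examples]
```

The rule is asymmetric by design: HOXB13:IL17BR changes the assigned group only for tumors with high MGI, mirroring the interaction observed between the two indices.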
Discussion
Given the importance of tumor grade in prognosis and the existence of hundreds of genes whose expression levels highly correlate with tumor grade and proliferation (5, 19), it is not surprising that a multitude of seemingly distinct prognostic signatures could be developed. Furthermore, the prognostic robustness and redundancy of these genes suggest that a much simpler assay involving a few genes may be sufficient. For example, Ivshina et al. (22) showed that a 264-gene tumor grade signature can be reduced to 6 genes in silico, whereas Sotiriou et al. (6) noted that only a fraction of the 97 genes for GGI are needed for prognosis. In this study, through a combined data- and knowledge-driven approach, we developed a five-gene tumor grade signature and implemented it in a robust RT-PCR assay. One important characteristic of MGI is that its calculation does not involve complex weighting trained on clinical outcome. Instead, it is a molecular correlate of tumor grade and derives its prognostic capacity from the latter (so-called “bottom-up” approach; ref. 3). The advantage of MGI over histologic tumor grade is 2-fold. First, like GGI (5), it classifies grade 2 tumors as either grade 1–like or grade 3–like, removing most of the ambiguity of pathologic tumor grading. Second, because an RT-PCR–based assay can be standardized in the clinical laboratory, it also removes the subjectivity and inter/intraobserver variability associated with pathologic grading (13).
Most notably, our results show that the prognostic accuracy of MGI can be augmented by also considering HOXB13:IL17BR and vice versa, suggesting a simple algorithm that stratifies patients into three risk groups. MGI and HOXB13:IL17BR seem to represent two distinct prognostic modules in breast cancer, as suggested by the observation that HOXB13:IL17BR but not MGI is strongly associated with the estrogen receptor signaling pathway (Supplementary Fig. S6). A recent small interfering RNA study of HOXB13 in breast cancer cell lines suggests that expression of HOXB13 blocks apoptotic cell death (Z. Wang, D. Sgroi, unpublished data).
The St. Gallen guidelines classify estrogen receptor–positive node-negative breast cancer patients into low and intermediate-risk groups, with the majority falling into the latter. For example, applying these guidelines to the MGH cohort resulted in the classification of 86% of the patients into the intermediate-risk group. An important treatment decision is whether to withhold chemotherapy for some of the patients in the intermediate-risk group, a question targeted by the TAILORx and MINDACT prospective clinical trials (24, 25). Significantly, this large intermediate-risk group could be reclassified by MGI and HOXB13:IL17BR into low (43%), intermediate (26%), or high (31%) risk (Supplementary Fig. S7). Therefore, the use of MGI and HOXB13:IL17BR could identify a large subgroup of low-risk women who may be spared from toxic chemotherapy and a subgroup at high risk for whom more intense chemotherapy regimens or new therapeutic agents should be considered.
High tumor grade or mitotic index predicts benefit from chemotherapy in node-negative breast cancer patients (26). Indeed, high MGI predicted complete pathologic response to preoperative paclitaxel followed by 5-fluorouracil, doxorubicin, and cyclophosphamide in estrogen receptor–positive patients (X. Ma, M. Erlander, D. Sgroi, unpublished data).
In summary, we developed and validated MGI as a powerful prognostic factor in estrogen receptor–positive breast cancer. Furthermore, MGI and HOXB13:IL17BR can be combined to provide more accurate prognostic information than either alone. The identification of a subset of patients with very poor outcome using these two biomarkers should facilitate clinical trial designs and drug development to target those cancers with both high MGI and high HOXB13:IL17BR.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant support: National Cancer Institute grant RO1-1CA112021-01 (D.C. Sgroi), the Department of Defense grant W81XWH-04-1-0606 (D.C. Sgroi), the Susan G. Komen Breast Cancer Foundation grant BCTR0402932 (D.C. Sgroi), a grant from the Avon Foundation (D.C. Sgroi) and the NCI SPORE in breast cancer at Massachusetts General Hospital (D.C. Sgroi).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincancerres.aacrjournals.org/).
Acknowledgments
We thank Yen Tran and Thao Ho for excellent technical assistance.