Abstract
Purpose: Selection of treatment options with the highest likelihood of successful outcome for individual breast cancer patients is based to a large degree on accurate classification into subgroups with poor and good prognosis reflecting a different probability of disease recurrence and survival after therapy. Here we propose a breast cancer classification algorithm taking into account three main prognostic features determined at the time of diagnosis: estrogen receptor (ER) status; lymph node (LN) status; and gene expression signatures associated with distinct therapy outcome.
Experimental Design: Using microarray expression profiling and quantitative reverse transcription-PCR analyses, we compared expression profiles of the 70-gene breast cancer survival signature in established breast cancer cell lines and primary breast carcinomas from cancer patients. We classified 295 breast cancer patients using 14-, 13-, 6-, and 4-gene survival predictor signatures into subgroups having statistically distinct probability of therapy failure (P < 0.0001). We evaluated the prognostic power of breast cancer survival predictor signatures alone and in combination with ER and LN status using Kaplan-Meier analysis.
Results: The breast cancer survival predictor algorithm allowed highly accurate classification into subgroups with dramatically distinct 5- and 10-year survival after therapy of a large cohort of 295 breast cancer patients with either ER+ or ER− tumors as well as LN+ or LN− disease (P < 0.0001, log-rank test).
Conclusions: Our data imply that quantitative laboratory tests measuring expression profiles of a limited set of identified small gene clusters may be useful in stratification of breast cancer patients at the time of diagnosis into subgroups with statistically distinct probability of positive outcome after therapy and assisting in selection of optimal treatment strategies. The estimated increase in survival due to the optimization of treatment protocols may reach many thousands of breast cancer survivors every year at the 10-year follow-up check point.
Introduction
Highly accurate prognostic tests are essential for individualized decision-making process during clinical management of cancer patients leading to a rational and more efficient selection of appropriate therapeutic interventions and improved outcome after therapy. In breast cancer, patients are classified into broad subgroups with poor and good prognosis reflecting a different probability of disease recurrence and survival after therapy. Distinct prognostic subgroups are identified using a combination of clinical and pathological criteria: age; primary tumor size; status of axillary lymph nodes (LNs); histological type and pathological grade of tumor; and hormone receptor status (1, 2).
One of the most critical treatment decisions during the clinical management of breast cancer patients is the use of adjuvant systemic therapy. Adjuvant systemic therapy significantly improves disease-free and overall survival in breast cancer patients with both LN− and LN+ disease (3, 4). It is generally accepted that breast cancer patients with poor prognosis would gain the most benefits from the adjuvant systemic therapy (1, 2).
Diagnosis of LN status is important in therapeutic decision-making, prediction of disease outcome, and probability of breast cancer recurrence. Invasion into axillary LNs is recognized as one of the most important prognostic factors (5, 6, 7). Most patients diagnosed with LN− breast cancer can be effectively treated with surgery and local radiation therapy. However, results of several studies show that 22–33% of breast cancer patients with no detectable LN involvement and classified into a good prognosis subgroup develop recurrence of disease after a 10-year follow-up (4). Therefore, accurate identification of breast cancer patients with LN− tumors who are at high risk of recurrence is critically important for rational treatment decision and improved clinical outcome in the individual patient.
Microarray-based gene expression profiling of human cancers rapidly emerged as a new powerful screening technique generating hundreds of novel diagnostic, prognostic, and therapeutic targets (8, 9, 10, 11, 12, 13). Recently, breast cancer gene expression signatures have been identified that are associated with the estrogen receptor (ER) and LN status of patients and can aid in classification of breast cancer patients into subgroups with different clinical outcome after therapy (14, 15, 16, 17, 18, 19, 20, 21, 22).
One of the significant limitations of these array-based studies is that they generated vast data sets comprising many attractive targets with diagnostic and prognostic potential. Design and performance of meaningful follow-up experiments such as translation of the array-generated hits into quantitative reverse transcription-PCR (Q-RT-PCR)-based analytical assays would require a significant data reduction. Furthermore, clinical implementation of novel prognostic tests would require integration of genomic data and the best-established conventional markers of outcome.
Here, we move forward toward these goals by translating a large microarray-based breast cancer outcome predictor signature into Q-RT-PCR-based assays of mRNA abundance levels of small gene clusters performing with similar classification accuracy. We demonstrate that identified molecular signatures provide additional predictive values over well-established conventional prognostic markers for breast cancer such as hormone receptor status and LN involvement. These data suggest that quantitative laboratory tests measuring expression profiles of identified small gene clusters may be useful in stratification of breast cancer patients into subgroups with distinct likelihood of positive outcome after therapy and assisting in selection of optimal treatment strategies.
Materials and Methods
Clinical Samples.
We used in our experiments two independent sets of clinical samples for signature discovery (training outcome set of 78 samples) and validation (validation outcome set of 295 samples). Original gene expression profiles of the training set of 78 clinical samples analyzed in this study were reported recently (18). Primary gene expression data files of clinical samples as well as associated clinical information can be found online.1
Breast tumor tissues comprising the validation data set were obtained from 295 breast cancer patients undergoing therapeutic and diagnostic procedures performed as part of routine clinical management at the Netherlands Cancer Institute. Clinical and pathological features of 295 breast cancer cases as well as original gene expression analysis of corresponding tumor samples comprising validation outcome set were reported elsewhere (21). Expression data for 70-gene breast cancer survival predictor cluster for 295 clinical samples as well as associated clinical information can be found online.2
Cell Culture.
Human breast carcinoma cell lines used in this study were described previously (23, 24, 25). Measurements of mRNA expression levels of 70 genes comprising the breast cancer survival predictor cluster in established human breast carcinoma cell lines (MCF7, MDA-MB-435, MDA-MB-468, MDA-MB-231, MDA-MB-435Lung2, MDA-MB-435Br1, and MDA-MB-435BL3) and primary cultures of normal human breast epithelial cells (Clonetics/BioWhittaker, San Diego, CA) were performed using the Q-RT-PCR method. The MDA-MB-435Lung2 and MDA-MB-435Br1 cell lines were derived from lung and brain metastases, respectively, of parental MDA-MB-435 cells growing orthotopically in nude mice (26). The MDA-MB-435BL3 cell line was established after three consecutive rounds of 30-min circulation in blood of rats and recovery and expansion in culture of the parental MDA-MB-435 cells. Except where noted, cell lines were grown in RPMI 1640 supplemented with 10% fetal bovine serum and gentamicin (Life Technologies, Inc.) to 70–80% confluence and subjected to serum starvation as described previously (23, 24, 25) or maintained in fresh complete media supplemented with 10% fetal bovine serum.
RNA and mRNA Extraction.
For gene expression analysis, cells were harvested in lysis buffer 2 h after the last media change at 70–80% confluence, and total RNA or mRNA was extracted using the RNeasy (Qiagen, Chatsworth, CA) or FastTract kits (Invitrogen, Carlsbad, CA). Cell lines were not split more than five times before RNA extraction, except where noted.
Affymetrix Arrays.
The protocol for mRNA quality control and gene expression analysis was that recommended by Affymetrix.3 In brief, approximately 1 μg of mRNA was reverse transcribed with an oligo(dT) primer that has a T7 RNA polymerase promoter at the 5′ end. Second-strand synthesis was followed by cRNA production incorporating a biotinylated base. Hybridization to Affymetrix U95Av2 arrays representing 12,625 transcripts overnight for 16 h was followed by washing and labeling using a fluorescence-labeled antibody. The arrays were read, and data were processed using Affymetrix equipment and software as reported previously (27, 28, 29).
Data Analysis.
Detailed protocols for data analysis and documentation of the sensitivity, reproducibility, and other aspects of the quantitative statistical microarray analysis using Affymetrix technology have been reported previously (27, 28, 29). A total of 40–50% of the surveyed genes were called present by the Affymetrix Microarray Suite version 5.0 software in these experiments. We processed the microarray data using the Affymetrix Microarray Suite version 5.0 software and performed statistical analysis of the expression data sets using the Affymetrix MicroDB and Affymetrix DMT software. The concordance analysis of differential gene expression across the clinical and experimental data sets was performed using Affymetrix MicroDB version 3.0 and DMT version 3.0 software as described previously (27, 28, 29). The Pearson correlation coefficient for individual test samples and the appropriate reference standard was determined using Microsoft Excel software as described in the signature discovery protocol.
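The correlation step can be illustrated with a minimal Python sketch. It is not the Excel procedure used in the study; the function name `pearson_r`, the choice of log-transformed fold changes as input, and all numerical values are assumptions made for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log10 fold changes for a 4-gene cluster (not study data):
# one vector for the reference standard, one for a test sample.
reference = [0.9, -0.4, 0.6, -0.7]
sample = [0.7, -0.2, 0.5, -0.9]
r = pearson_r(sample, reference)
```

A positive `r` indicates that the test sample changes in the same direction as the reference standard across the genes of the cluster, and a negative `r` indicates the opposite pattern.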
Q-RT-PCR Analysis.
Q-RT-PCR analysis of transcript abundance levels for genes of the breast cancer survival predictor cluster was performed using an ABI7900 instrument (Applied Biosystems, Foster City, CA). Primer design, assay validation, and Q-PCR analysis were performed as described previously (27, 28, 29) and according to the vendor’s recommended protocols.4 For quantification, a reference curve was generated for each gene by amplifying serial dilution of cDNA, and expression values were normalized using glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and mRNA from normal human breast epithelial cell line (Clonetics/BioWhittaker) as controls. The real-time PCR method measures the accumulation of PCR products by a fluorescence detector system and allows for quantification of the amount of amplified PCR products in the log phase of the reaction.
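As an illustration of reference-curve quantification, the sketch below fits a straight line to threshold-cycle (Ct) values from a serial dilution and then normalizes a target gene to GAPDH and to a control sample. The study does not give its exact formula, so the linear Ct versus log10(amount) model, the function names, and every number here are assumptions.

```python
def fit_standard_curve(log10_amounts, ct_values):
    """Least-squares fit of Ct = slope * log10(amount) + intercept."""
    n = len(log10_amounts)
    mean_x = sum(log10_amounts) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_amounts, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def amount_from_ct(ct, slope, intercept):
    """Invert the reference curve to get a relative template amount."""
    return 10 ** ((ct - intercept) / slope)

def relative_expression(target, gapdh, control_target, control_gapdh):
    """GAPDH-normalized target amount relative to the control sample."""
    return (target / gapdh) / (control_target / control_gapdh)

# Hypothetical 10-fold dilution series (relative amounts 1, 0.1, 0.01, 0.001).
slope, intercept = fit_standard_curve([0, -1, -2, -3], [20.0, 23.3, 26.6, 29.9])

# Hypothetical Ct readings for a tumor cell line and the normal control.
tumor_target = amount_from_ct(23.3, slope, intercept)
tumor_gapdh = amount_from_ct(20.0, slope, intercept)
normal_target = amount_from_ct(26.6, slope, intercept)
normal_gapdh = amount_from_ct(20.0, slope, intercept)
fold_change = relative_expression(tumor_target, tumor_gapdh,
                                  normal_target, normal_gapdh)
```

In this made-up example the target gene is about 10-fold more abundant in the tumor cell line than in the control after GAPDH normalization. In practice each gene would have its own reference curve, as described above.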
Survival Analysis.
The Kaplan-Meier survival analysis was carried out using GraphPad Prism 4.0 software.5 Statistical significance of the difference between the survival curves for different groups of patients was assessed using χ2 and log-rank tests.
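For readers unfamiliar with these methods, the following Python sketch shows a textbook Kaplan-Meier estimator and a two-group log-rank test. It is not the GraphPad Prism implementation used in the study; the function names and the toy follow-up times are hypothetical.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square with 1 df, P value)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Toy data in years (hypothetical): group A fails early, group B fails late.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

The survival curve drops only at observed deaths, while censored patients simply leave the risk set. The log-rank statistic compares observed and expected deaths in one group at every event time.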
Results and Discussion
The 70-Gene Breast Cancer Metastasis and Survival Predictor Signature Represents Heterogeneous Set of Small Gene Clusters Independently Performing with High Therapy Outcome Prediction Accuracy.
A recent study on gene expression profiling of breast cancer identified 70 genes whose expression pattern is strongly predictive of a short post-diagnosis and treatment interval to distant metastases (18). The expression pattern of these 70 genes discriminates with 81% (optimized sensitivity threshold) or 83% (optimal accuracy threshold) accuracy the patient’s prognosis in the group of 78 young women diagnosed with sporadic LN− breast cancer (this group comprises 34 patients who developed distant metastases within 5 years and 44 patients who continued to be disease free at least 5 years after therapy; they constitute clinically defined poor prognosis and good prognosis groups, respectively). We sought to reduce the number of genes whose expression patterns represent genetic signatures of breast cancer with poor prognosis or good prognosis. Measurements of mRNA expression levels of 70 genes in established human breast carcinoma cell lines (MCF7, MDA-MB-435, MDA-MB-468, MDA-MB-231, MDA-MB-435Lung2, MDA-MB-435Br1, and MDA-MB-435BL3) and primary cultures of normal human breast epithelial cells were performed using the Q-RT-PCR method, which is generally accepted as the most reliable method of gene expression analysis and unambiguous confirmation of a gene identity. For each breast cancer cell line, concordant sets of genes were identified exhibiting both positive and negative correlation between fold expression changes in cancer cell lines versus control cell line and poor prognosis group versus good prognosis group. Minimum segregation sets were selected from corresponding concordance sets, and individual phenotype association indices (PAIs) were calculated. Four top-performing breast cancer metastasis predictor gene clusters are listed in Table 1.
Breast cancer poor prognosis predictor cluster comprising 6 genes was identified (r = 0.981) using the MDA-MB-468 cell line gene expression profile as a reference standard. Thirty-two of 34 samples from the poor prognosis group had positive PAIs, whereas 29 of 44 samples from the good prognosis group had negative PAIs, yielding 78% overall accuracy in sample classification. Another breast cancer poor prognosis predictor cluster comprising 4 genes was identified (r = 0.944) using the MDA-MB-435BL3 cell line gene expression profile as a reference standard. Using this 4-gene cluster, 28 of 34 samples from the poor prognosis group had positive PAIs, whereas 28 of 44 samples from the good prognosis group had negative PAIs, yielding 72% overall accuracy in sample classification.
Breast cancer good prognosis predictor cluster comprising 14 genes was identified (r = −0.952) using the MDA-MB-435Br1 cell line gene expression profile as a reference standard. Thirty of 34 samples from the poor prognosis group had negative PAIs, whereas 34 of 44 samples from the good prognosis group had positive PAIs, yielding 82% overall accuracy in sample classification. Another breast cancer good prognosis predictor cluster comprising 13 genes (r = −0.992) was identified using the MCF7 cell line gene expression profile as a reference standard. Thirty of 34 samples from the poor prognosis group had negative PAIs, whereas 32 of 44 samples from the good prognosis group had positive PAIs, yielding 79% overall accuracy in sample classification.
To validate the classification accuracy using an independent data set, we tested performance of the 13-gene good prognosis predictor cluster on a set of 19 samples obtained from 11 breast cancer patients who developed distant metastases within 5 years after diagnosis and treatment and 8 patients who remained disease free for at least 5 years (18). Nine of 11 samples from the poor prognosis group had negative PAIs, whereas 6 of 8 samples from the good prognosis group had positive PAIs, yielding 79% overall accuracy in sample classification.
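The overall accuracies quoted above follow directly from the counts of correctly classified samples. The short Python check below recomputes them; the counts are taken from the text, whereas the function name and the dictionary layout are illustrative choices.

```python
def overall_accuracy(correct_poor, correct_good, n_poor, n_good):
    """Percentage of samples whose PAI sign matches the clinical outcome."""
    return 100.0 * (correct_poor + correct_good) / (n_poor + n_good)

# (correctly called poor prognosis, correctly called good prognosis)
# in the training set of 34 poor and 44 good prognosis samples.
clusters = {
    "6-gene (MDA-MB-468)": (32, 29),
    "4-gene (MDA-MB-435BL3)": (28, 28),
    "14-gene (MDA-MB-435Br1)": (30, 34),
    "13-gene (MCF7)": (30, 32),
}
training_accuracy = {
    name: overall_accuracy(poor, good, 34, 44)
    for name, (poor, good) in clusters.items()
}

# Independent test set of 19 samples for the 13-gene cluster.
test_accuracy_13_gene = overall_accuracy(9, 6, 11, 8)

for name, value in training_accuracy.items():
    print(f"{name}: {value:.1f}%")
print(f"13-gene, test set: {test_accuracy_13_gene:.1f}%")
```

Rounded to whole percentages, the computed values agree with the accuracies reported in the text.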
Kaplan-Meier analysis showed that metastasis-free survival after therapy was significantly different in breast cancer patients segregated into good and poor prognosis groups based on relative values of expression signatures defined by all four small gene clusters (Fig. 1). These data suggest that quantitative laboratory tests measuring expression profiles of identified small gene clusters may be useful in stratification of breast cancer patients into subgroups with statistically distinct probabilities to remain disease-free after therapy.
Small Gene Clusters and a Large Parent Signature Perform with Similar Therapy Outcome Prediction Accuracy in an Independent Cohort of 295 Breast Cancer Patients.
Recently, the breast cancer prognosis prediction accuracy of the 70-gene signature was validated in a large cohort of 295 patients with either LN− or LN+ breast cancer (21). The expression profile of the 70-gene breast cancer outcome predictor signature was highly informative in forecasting the probability of remaining free of distant metastasis and predicting the overall survival after therapy (21). We sought to compare the classification accuracy of small gene clusters and a large 70-gene parent signature applied to a cohort of 295 patients.
The identified small gene clusters and a large parent signature perform similarly in identifying subgroups of breast cancer patients with poor and good prognosis defined by differences in the probability of overall survival after therapy. At several classification threshold levels, small gene clusters fully recapitulate or even outperform the 70-gene parent signature in classification accuracy of the 295 breast cancer patients. Taken together, these data are consistent with the idea that the 70-gene breast cancer prognosis signature represents a heterogeneous set of small gene clusters with high therapy outcome prediction potential. Consistent with this idea, the application of the 14-gene survival predictor signature was highly informative in classification of breast cancer patients into subgroups with statistically significant difference in the probability of survival after therapy (Fig. 2). A highly significant difference (P < 0.0001) in the survival probability between poor and good prognosis groups defined by the 14-gene signature was achieved using multiple classification threshold levels providing additional flexibility in selection of a desirable 5- or 10-year survival level defining poor and good prognosis subgroups (Fig. 2). Interestingly, stratification of patients into subgroups using a 10% increment from bottom to top values of the 14-gene expression profile (Fig. 2) appears to be highly informative in predicting the difference in survival probability across the entire cohort of patients.
Stratification of Breast Cancer Patients into Subgroups with Distinct Survival Probability Using an Outcome Predictor Algorithm Based on a Panel of Four Expression Signatures.
Theoretically, the outcome predictor algorithm based on a combination of signatures should be more robust than a single predictor signature, particularly during the validation analysis using an independent test cohort of patients. Next we analyzed whether a combination of the four signatures would perform in the patient classification test with similar efficacy as the individual signatures. For every patient, we considered the individual outcome calls defined by each signature at the best performing cutoff level (highest hazard ratio and lowest P). We segregated all patients into subgroups with same numbers of poor prognosis and good prognosis calls and performed Kaplan-Meier analysis to determine the difference in survival probability between subgroups (Fig. 3). This analysis clearly demonstrates the power of the breast cancer outcome predictor algorithm based on a panel of 4 gene expression signatures (Fig. 3). The Kaplan-Meier survival analysis (Fig. 3, A and C) showed that the median survival after therapy of patients having 4 poor prognosis signatures was 9.7 years, and this group of patients had 67% of all recorded death events in the cohort of 295 patients. Fifty-one percent of patients with 4 poor prognosis signatures died within 10 years after therapy, whereas 98% of patients with 4 good prognosis signatures remained alive at least 10 years. The estimated hazard ratio for death after therapy for these two groups of patients defined by the outcome predictor algorithm was 31.13 (95% confidence interval of ratio, 2.656–8.095; P < 0.0001).
The group of patients having 2 or more poor prognosis signatures had 75 of 79 (95%) of the death events in the cohort of 295 patients, thus comprising a poor prognosis classification category defined by the outcome predictor algorithm based on a panel of 4 survival predictor signatures. The Kaplan-Meier analysis (Fig. 3 B) showed that the median survival after therapy of patients in the poor prognosis group was 14.4 years. Forty percent of patients in the poor prognosis group died within 10 years after therapy, whereas 94% of patients in the good prognosis group remained alive at least 10 years. The estimated hazard ratio for death after therapy in the poor prognosis group of patients as compared with the good prognosis group of patients defined by the outcome predictor algorithm was 10.05 (95% confidence interval of ratio, 2.355–5.956; P < 0.0001).
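The combination rule described above can be sketched in Python as follows. The per-patient calls are hypothetical, and the function names and data layout are assumptions; only the rule of 2 or more poor prognosis calls and the 75 of 79 death events come from the text.

```python
from collections import Counter

SIGNATURES = ("14-gene", "13-gene", "6-gene", "4-gene")

def count_poor_calls(calls):
    """calls maps each signature name to 'poor' or 'good' for one patient."""
    return sum(1 for name in SIGNATURES if calls[name] == "poor")

def combined_class(calls, min_poor_calls=2):
    """Poor prognosis category if at least min_poor_calls signatures agree."""
    return "poor" if count_poor_calls(calls) >= min_poor_calls else "good"

# Hypothetical patients (not study data).
patients = {
    "patient_A": {"14-gene": "poor", "13-gene": "poor",
                  "6-gene": "poor", "4-gene": "poor"},
    "patient_B": {"14-gene": "good", "13-gene": "poor",
                  "6-gene": "good", "4-gene": "good"},
    "patient_C": {"14-gene": "good", "13-gene": "good",
                  "6-gene": "good", "4-gene": "good"},
}

# Subgroups for Kaplan-Meier analysis: patients sharing the same number of calls.
patients_per_call_count = Counter(
    count_poor_calls(calls) for calls in patients.values()
)
classes = {pid: combined_class(calls) for pid, calls in patients.items()}

# Share of all recorded deaths that fall in the poor prognosis category.
share_of_deaths = 100.0 * 75 / 79
```

Grouping by the number of poor prognosis calls yields the five possible subgroups (0 to 4 calls), and the threshold of 2 calls collapses them into the two categories compared in the text.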
The 70-Gene Signature, in Contrast to Small Gene Clusters, Is Not Suitable for Breast Cancer Outcome Prediction in Patients with ER− Tumors.
Consistent with the well-established prognostic value of the ER status of breast tumors (see “Introduction”), 97% of patients in the good prognosis group defined by the 70-gene signature had ER+ tumors (21). Conversely, 96% of breast cancer patients with ER− tumors (66 of 69 patients at the cutoff level of <0.45) had expression profile of the 70 genes predictive of a poor outcome after therapy. Two important conclusions can be drawn from this association. First, breast cancer patients with ER+ tumors and poor prognosis expression profile of the 70 genes may have an as yet unidentified functional defect within an ER response pathway. Second, a 70-gene signature appears to assign rather uniformly a vast majority of the patients with ER− tumors into the poor prognosis category and therefore is not suitable for prognosis prediction in this group of breast cancer patients.
In agreement with many previous observations, patients with ER− tumors had significantly worse survival after therapy compared with the patients with ER+ tumors in the cohort of 295 breast cancer patients (Fig. 4 A). The Kaplan-Meier survival analysis (Fig. 4 A) showed that the median relapse-free survival after therapy of patients with ER− tumors was 9.7 years. Only 47.1% of patients with ER− tumors survived 10 years after therapy compared with 77.4% of patients with ER+ tumors. The estimated hazard ratio for survival after therapy in the poor prognosis group as compared with the good prognosis group of patients defined by the ER status was 3.258 (95% confidence interval of ratio, 2.792–8.651; P < 0.0001).
Next we set out to determine whether the application of the survival predictor algorithm would identify subgroups of patients with distinct clinical outcome after therapy in breast cancer patients with ER− tumors, thus providing additional predictive value to the therapy outcome classification based on ER status alone. We were unable to generate statistically meaningful prognostic stratification of ER− breast cancer patients using a parent 70-gene signature (data not shown). However, we were able to identify two small gene clusters comprising 5 and 3 genes (Table 2) that appear highly informative in classification of breast cancer patients with ER− tumors into good and poor prognosis subgroups with statistically distinct probability of survival after therapy (Fig. 4 B).
In the group of 69 breast cancer patients with ER− tumors (Fig. 4 B), the median survival after therapy of patients in the poor prognosis subgroup defined by the survival predictor algorithm was 5.2 years. Only 30% of patients in the poor prognosis subgroup survived 10 years after therapy compared with 77% of patients in the good prognosis subgroup. The estimated hazard ratio for survival after therapy in the poor prognosis subgroup as compared with the good prognosis subgroup of patients defined by the survival predictor algorithm was 3.609 (95% confidence interval of ratio, 1.477–5.792; P = 0.0021).
Outcome Classification of Breast Cancer Patients with ER+ Tumors Using a 14-Gene Survival Predictor Signature.
To further validate the potential clinical utility of identified signatures, we tested whether an application of a 14-gene survival predictor cluster would be informative in classification of breast cancer patients with ER+ tumors. Kaplan-Meier analysis showed that application of the 14-gene survival predictor signature identified three subgroups of patients with statistically distinct probability of survival after therapy in the cohort of 226 breast cancer patients with ER+ tumors (Fig. 5, A and B). The median survival after therapy of patients in the poor prognosis subgroup defined by the 14-gene survival predictor signature was 7.2 years (Fig. 5 A). Only 41% of patients in the poor prognosis subgroup survived 10 years after therapy compared with 100% of patients in the good prognosis subgroup (P < 0.0001). A large, statistically distinct subgroup of patients with an intermediate expression pattern of the 14-gene signature and an intermediate prognosis was identified by Kaplan-Meier survival analysis (Fig. 5 B). The patients in the subgroup with an intermediate prognosis had 90% 5-year survival and 76% 10-year survival after therapy (Fig. 5 B). Thus, the 14-gene survival predictor signature appears highly informative in classification of breast cancer patients with ER+ tumors into good, intermediate, and poor prognosis subgroups with statistically significant difference in the probability of survival after therapy (Fig. 5, A and B).
Therapy Outcome Prediction in Breast Cancer Patients with LN− Disease Using Survival Predictor Signatures.
Invasion into axillary LNs is considered one of the most important negative prognostic factors in breast cancer, and patients with no detectable LN involvement are classified as having good prognosis (5, 6, 7). Breast cancer patients with LN− disease typically would not be selected for adjuvant systemic therapy and are usually treated with surgery and radiation. Recent data demonstrated that up to 33% of these patients would fail therapy and develop recurrence of the disease after a 10-year follow-up (4). Therefore, we tested whether application of the 14-gene survival predictor signature would aid in identification of breast cancer patients with LN− tumors who are at high risk of treatment failure.
Kaplan-Meier analysis showed that application of the 14-gene survival predictor signature identified two subgroups of patients with statistically distinct probability of survival after therapy in the cohort of 151 breast cancer patients with LN− disease (Fig. 6 A). The median survival after therapy of patients in the poor prognosis subgroup defined by the 14-gene survival predictor signature was 7.7 years (Fig. 6 A). Only 46% of patients in the poor prognosis subgroup survived 10 years after therapy compared with 82% of patients in the good prognosis subgroup (P < 0.0001). The estimated hazard ratio for survival after therapy in the poor prognosis subgroup as compared with the good prognosis subgroup of patients defined by the 14-gene survival predictor signature was 5.067 (95% confidence interval of ratio, 3.174–11.57; P < 0.0001).
Kaplan-Meier analysis demonstrated that application of the 14-gene survival predictor signature identified two subgroups of patients with statistically distinct probability of survival after therapy in the cohort of 109 breast cancer patients with ER+ tumors and LN− disease (Fig. 6 B). The median survival after therapy of patients in the poor prognosis subgroup defined by the 14-gene survival predictor signature was 11.0 years (Fig. 6 B). The 10-year survival after therapy in the poor prognosis subgroup was 57%, compared with 86% patient survival in the good prognosis subgroup (P < 0.0001). The estimated hazard ratio for survival after therapy in the poor prognosis subgroup as compared with the good prognosis subgroup of patients defined by the 14-gene survival predictor signature was 5.314 (95% confidence interval of ratio, 2.775–17.79; P < 0.0001).
Next we sought to determine whether the small gene clusters comprising 5 and 3 genes (Table 2), which appear highly informative in classification of breast cancer patients with ER− tumors into good and poor prognosis subgroups with statistically distinct probability of survival after therapy (Fig. 4 B), would also be informative in classification of the subgroup of ER− patients with LN− disease. In the group of 42 breast cancer patients with ER− tumors and LN− disease (Fig. 6 C), the median survival after therapy of patients in the poor prognosis subgroup defined by the survival predictor algorithm was 5.2 years. Only 34% of patients in the poor prognosis subgroup survived 10 years after therapy compared with 74% of patients in the good prognosis subgroup. The estimated hazard ratio for survival after therapy in the poor prognosis subgroup as compared with the good prognosis subgroup of patients defined by the survival predictor algorithm was 3.237 (95% confidence interval of ratio, 1.139–6.476; P = 0.0243). Thus, application of survival predictor signatures appears highly informative in classification of breast cancer patients with LN− disease and either ER+ or ER− tumors into good and poor prognosis subgroups with statistically significant difference in the probability of survival after therapy (Fig. 6, B and C).
Therapy Outcome Prediction in Breast Cancer Patients with LN+ Disease Using Survival Predictor Signatures.
Breast cancer patients with invasion into axillary LN are considered as having a poor prognosis and are usually treated with adjuvant systemic therapy. The patients with poor prognosis are thought to benefit most from adjuvant systemic therapy (see “Introduction”). In the cohort of 295 breast cancer patients, 10 of 151 (6.6%) patients who had LN− disease and 120 of the 144 (83.3%) patients who had LN+ disease had received adjuvant systemic therapy (21). This treatment strategy was clearly beneficial for patients with LN+ disease because subgroups of patients with distinct LN status in the cohort of 295 patients had statistically indistinguishable survival after therapy (data not shown). Next we analyzed whether therapy outcome prediction using survival predictor signatures would be informative in the breast cancer patients with LN+ disease.
Kaplan-Meier analysis showed that application of the 14-gene survival predictor signature identified three subgroups of patients with statistically distinct probability of survival after therapy in the cohort of 144 breast cancer patients with LN+ disease (Fig. 7,A). The median survival after therapy of patients in the poor prognosis subgroup defined by the 14-gene survival predictor signature was 9.5 years (Fig. 7,A). Only 43% of patients in the poor prognosis subgroup survived 10 years after therapy compared with 98% of patients in the good prognosis subgroup (P < 0.0001). A large, statistically distinct subgroup of patients with an intermediate expression pattern of the 14-gene signature and an intermediate prognosis was identified by Kaplan-Meier survival analysis (Fig. 7,A). The patients in the subgroup with an intermediate prognosis had 86% 5-year survival and 73% 10-year survival after therapy (Fig. 7,A). Thus, 14-gene survival predictor signature appears highly informative in classification of breast cancer patients with LN+ disease into good, intermediate, and poor prognosis subgroups with statistically significant difference in the probability of survival after therapy (Fig. 7 A).
Using the 14-gene survival predictor signature, we identified two subgroups of patients with statistically distinct probability of survival after therapy in the cohort of 117 breast cancer patients with ER+ tumors and LN+ disease (Fig. 7,B). The median survival after therapy of patients in the poor prognosis subgroup defined by the 14-gene survival predictor signature was 11.0 years (Fig. 7 B). The 10-year survival after therapy in the poor prognosis subgroup was 68%, compared with 98% patient survival in the good prognosis subgroup (P < 0.0026). The estimated hazard ratio for survival after therapy in the poor prognosis subgroup as compared with the good prognosis subgroup of patients defined by the 14-gene survival predictor signature was 6.810 (95% confidence interval of ratio, 1.566–8.358; P < 0.0026).
Next we sought to determine whether small gene clusters comprising 5 and 3 genes (Table 2) would also be informative in classification of the subgroup of ER− patients with LN+ disease. In the group of 27 breast cancer patients with ER− tumors and LN+ disease (Fig. 7C), the median survival after therapy of patients in the poor prognosis subgroup defined by the survival predictor algorithm was 4.4 years. Only 24% of patients in the poor prognosis subgroup survived 10 years after therapy, compared with 82% of patients in the good prognosis subgroup. The estimated hazard ratio for survival after therapy in the poor prognosis subgroup as compared with the good prognosis subgroup defined by the survival predictor algorithm was 3.815 (95% confidence interval of ratio, 0.9857–9.660; P = 0.0530). Thus, application of survival predictor signatures also appears informative for classifying breast cancer patients with LN+ disease into good and poor prognosis subgroups with statistically significant differences in the probability of survival after therapy (Fig. 7, A and B).
Estimated Long-Term Survival Benefits of Using Gene Expression Profiling as a Component of Multiparameter Therapy Outcome Classification of Breast Cancer Patients.
Next we attempted to estimate the potential clinical benefits of applying gene expression survival predictor signatures to classify breast cancer patients at the time of diagnosis into subgroups with distinct probabilities of survival after therapy. In our estimate, we treated assignment of a patient to a poor outcome classification subgroup as a criterion of treatment failure and as the reason for prescribing additional cycle(s) of adjuvant systemic therapy. We made the estimate of potential therapeutic benefits in the cohort of 295 breast cancer patients (21), assuming that additional cycle(s) of adjuvant systemic therapy would be prescribed to patients classified into poor prognosis subgroups. In this cohort, 10 of the 151 (6.6%) patients who had LN− disease and 120 of the 144 (83.3%) patients who had LN+ disease had received adjuvant systemic therapy (21), indicating that a major difference in treatment protocols between the LN+ and LN− subgroups was the application of adjuvant systemic therapy in patients with LN+ disease. We accepted the actual 5- and 10-year survival in the corresponding classification categories as the expected therapy outcome for a given subgroup. We assumed that each additional cycle of adjuvant systemic therapy would result in the same therapy outcome as was actually documented in the most relevant subgroups of the 295 patients. Accordingly, among patients classified into poor prognosis subgroups and treated with additional cycle(s) of adjuvant systemic therapy, 37% are expected to fall into the good therapy outcome category for the ER+/LN+ and ER+/LN− poor signature subgroups, and 41% for the ER−/LN+ and ER−/LN− poor signature subgroups.
Finally, we assumed that patients classified into good prognosis subgroups would receive the same treatment and would have the same outcome as in the original cohort of 295 patients (21). Based on these assumptions, we calculated the number of patients who would be expected to have good and poor survival outcomes after therapy and estimated the expected 10-year survival in each classification subgroup.
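The arithmetic implied by these assumptions can be sketched as a simple weighted average. The sketch below reflects one reading of the assumptions and is not the exact calculation performed in this study: the stated fraction of a poor prognosis subgroup (37% or 41%) is assumed to attain the 10-year survival of the matching good prognosis subgroup, and the remainder is assumed to retain the survival originally observed in the poor prognosis subgroup. The function name `expected_survival` is hypothetical; the input values are the 10-year survival figures quoted in the text for the LN+ subgroups.

```python
def expected_survival(poor_survival, good_survival, fraction_shifted):
    """Expected survival fraction in a poor prognosis subgroup after additional therapy.

    Assumption (one reading of the text): `fraction_shifted` of the subgroup attains
    the good-subgroup survival; the rest keeps the originally observed poor survival.
    """
    if not 0.0 <= fraction_shifted <= 1.0:
        raise ValueError("fraction_shifted must be between 0 and 1")
    return fraction_shifted * good_survival + (1.0 - fraction_shifted) * poor_survival


# 10-year survival values quoted in the text for the LN+ subgroups.
er_pos_ln_pos = expected_survival(poor_survival=0.68, good_survival=0.98, fraction_shifted=0.37)
er_neg_ln_pos = expected_survival(poor_survival=0.24, good_survival=0.82, fraction_shifted=0.41)
print(round(er_pos_ln_pos, 3), round(er_neg_ln_pos, 3))
```

Under this reading, the expected 10-year survival in the poor prognosis subgroups would be roughly 79% for ER+/LN+ disease and roughly 48% for ER−/LN+ disease. These figures illustrate the arithmetic only and should not be taken as results of the study.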
One of the most interesting end points of this analysis is the prediction that patients with ER−/LN− and ER−/LN+ breast cancer classified into poor prognosis subgroups would be expected to show the most dramatic increase in 10-year survival after therapy. This prediction is consistent with the generally accepted notion that breast cancer patients with poor prognosis would benefit most from adjuvant systemic therapy (see “Introduction”). The estimated modest increase in overall 10-year survival may translate every year into ∼7000–9000 more breast cancer survivors after 10-year follow-up in the United States alone. Our ability to accurately identify, at the time of diagnosis, breast cancer patients with a low probability of survival after therapy should lead to more rapid development of novel, efficient therapeutic modalities specifically targeting the most aggressive therapy-resistant breast cancers.
Grant support: NIH/National Cancer Institute Grant 5RO1 CA89827 (to G. V. G.) and Metastat, Inc.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Note: Supplementary data for this article are available at Clinical Cancer Research Online (http://clincanres.aacrjournals.org).
Requests for reprints: Gennadi V. Glinsky, Sidney Kimmel Cancer Center, 10835 Altman Row, San Diego, California 92121. Phone: (858) 450-5990; Fax: (858) 623-2740, E-mail: [email protected]
http://www.rii.com/publications/2002/vantveer.htm.
http://www.rii.com/publications/2002/nejm.htm.
http://www.affymetrix.com.
http://appliedbiosystems.com/support/tutorials/.
http://www.graphpad.com.
Gene | Description | Microarray ID | UniGene ID |
---|---|---|---|
6-Gene signature | |||
FLT1 | Fms-related tyrosine kinase 1 | NM_002019 | Hs.381093 |
BBC3 | BCL2 binding component 3 | U82987 | Hs.87246 |
TGFB3 | Transforming growth factor, β 3 | NM_003239 | Hs.2025 |
MS4A7 | Membrane-spanning 4-domains | AF201951 | Hs.11090 |
GSTM3 | Glutathione S-transferase M3 | NM_000849 | Hs.2006 |
FGF18 | Fibroblast growth factor 18 | NM_003862 | Hs.49585 |
4-Gene signature | |||
HEC | Highly expressed in cancer | NM_006101 | Hs.58169 |
MCM6 | Minichromosome maintenance deficient 6 | NM_005915 | Hs.155462 |
GSTM3 | Glutathione S-transferase M3 | NM_000849 | Hs.2006 |
FGF18 | Fibroblast growth factor 18 | NM_003862 | Hs.49585 |
13-Gene signature | |||
CEGP1 | SCUBE2 signal peptide, CUB domain | NM_020974 | Hs.222399 |
FGF18 | Fibroblast growth factor 18 | NM_003862 | Hs.49585 |
GSTM3 | Glutathione S-transferase M3 | NM_000849 | Hs.2006 |
TGFB3 | Transforming growth factor, β 3 | NM_003239 | Hs.2025 |
MS4A7 | Membrane-spanning 4-domains | AF201951 | Hs.11090 |
EST | Hypothetical protein | Contig55377_RC | Hs.218182 |
AP2B1 | Adaptor-related protein complex 2 | NM_001282 | Hs.74626 |
CCNE2 | Cyclin E2 | NM_004702 | Hs.30464 |
KIAA0175 | Maternal embryonic leucine zipper kinase | NM_014791 | Hs.184339 |
EXT1 | Exostoses (multiple) 1 | NM_000127 | Hs.184161 |
LOC341692 | Similar to Diap3 protein | Contig46218_RC | Hs.283127 |
PK428 | CDC42 binding protein kinase α | NM_003607 | Hs.18586 |
14-Gene signature | |||
MS4A7 | Membrane-spanning 4-domains | AF201951 | Hs.11090 |
TGFB3 | Transforming growth factor, β 3 | NM_003239 | Hs.2025 |
BBC3 | BCL2 binding component 3 | U82987 | Hs.87246 |
AP2B1 | Adaptor-related protein complex 2 | NM_001282 | Hs.74626 |
ALDH4A1 | Aldehyde dehydrogenase 4 family, member A1 | NM_003748 | Hs.77448 |
FLJ11190 | Chromosome 20, open reading frame 46 | NM_018354 | Hs.155071 |
DC13 | DC13 protein | NM_020188 | Hs.6879 |
GMPS | Guanine monophosphate synthetase | NM_003875 | Hs.5398 |
AKAP2 | A kinase (PRKA) anchor protein | Contig57258_RC | Hs.42322 |
DCK | Deoxycytidine kinase | NM_000788 | Hs.709 |
ECT2 | Epithelial cell transforming sequence 2 | Contig25991 | Hs.122579 |
EST | ESTs, weakly similar to quiescin | Contig38288_RC | |
OXCT | 3-Oxoacid CoA transferase | NM_000436 | Hs.177584 |
EXT1 | Exostoses (multiple) 1 | NM_000127 | Hs.184161 |
Gene | Description | Microarray ID | UniGene ID |
---|---|---|---|
5-Gene signature | |||
EST | Unknown | Contig63649_RC | |
L2DTL | RA-regulated nuclear matrix-associated protein | NM_016448 | Hs.126774 |
DCK | Deoxycytidine kinase | NM_000788 | Hs.709 |
DKFZP564D0462 | G protein-coupled receptor 126 | AL080079 | Hs.44197 |
LOC286052 | Hypothetical protein LOC286052 | AA555029_RC | Hs.100691 |
3-Gene signature | |||
GNAZ | Guanine nucleotide-binding protein | NM_002073 | Hs.92002 |
PK428 | CDC42-binding protein kinase α | NM_003607 | Hs.18586 |
LYRIC | LYRIC/3D3 | AK000745 | Hs.243901 |
Acknowledgments
We thank Dr. Janet E. Price (The University of Texas M. D. Anderson Cancer Center, Houston, TX) for providing human breast carcinoma cell lines.