We previously reported a multivariate Cox proportional hazards model for breast cancer survival that included pathological nodal status, as well as the following molecular features derived from IHC or FISH analysis: (1) MYC, (2) PGR and BCL2 as interacting factors dependent on ER status, (3) BCL2 and TP53 as interacting factors, and (4) ER dependent on ERBB2 status. The data showed the importance of assessing certain markers in the context of their interactions with and/or dependencies upon other markers to maximize their predictive value. A prognostic model has now been developed in hormone receptor-positive patients based on 5-year overall survival using the features from the multivariate Cox model. Because of the marker interactions, a non-linear regression modeling technique (kernel partial least squares with a third-order polynomial kernel) was used instead of standard logistic regression or simple discriminant analysis. Leave-one-out cross-validation was used to assess model generalization. The model achieved an area under the receiver operating characteristic curve of 0.90, significantly outperforming current standards that categorize patients into low-, medium-, and high-risk categories: the Nottingham Prognostic Index (NPI) (0.70), the 2005 St. Gallen guidelines (0.70), the 2003 St. Gallen guidelines (0.62), and the NIH guidelines (<0.62). Compared with the NPI, the model had a 36% lower false-negative rate (patients predicted to survive who ended up dying) at 81% specificity and a 54% lower false-positive rate (patients predicted to die who ended up surviving) at 81% sensitivity; these two operating points correspond to the NPI risk thresholds. This translated into significantly fewer patients in the ambiguous “medium risk” category. Using 81% sensitivity as a single cut-off, Kaplan-Meier survival analysis also showed the superiority of our model in accurately separating patients into good- and poor-prognosis categories.
Independent data sets have been obtained to validate the model and to explore its utility and expansion to patients treated with chemotherapy, hormone therapy plus chemotherapy, or no systemic therapy. Several prognostic models based on gene expression profiles have been reported. However, all, or nearly all, suffer from data overfitting, lack of validation, and/or a failure to report their performance relative to the more sophisticated NPI or 2005 St. Gallen standards. Beyond these statistical issues, gene expression assays cannot assess functional protein levels or mislocalization, their results can be affected by adjacent non-tumor tissue, and they are relatively complicated and costly. We believe that our advanced informatic analysis of data derived from conventional, cost-effective assays can provide better performance without these drawbacks, positively contributing to quality of life and life expectancy in cancer patients.
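
The evaluation reported above (area under the ROC curve, and error rates compared at fixed sensitivity and specificity operating points) can be illustrated with a minimal sketch. This is not the authors' kernel partial least squares model or their data: the scores and labels below are invented toy values, the function names are hypothetical, and the operating point of 0.75 is chosen only because it is exact on eight toy cases. A label of 1 means the patient died within 5 years and a higher score means higher predicted risk, so a false negative is a death predicted as survival, as in the abstract.

```python
def roc_points(scores, labels):
    """Return ROC points as (false-positive rate, sensitivity) pairs,
    sweeping the risk threshold from the highest score to the lowest."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def fpr_at_sensitivity(points, target):
    """Lowest false-positive rate among ROC points reaching the target sensitivity."""
    return min(fpr for fpr, tpr in points if tpr >= target)


def fnr_at_specificity(points, target):
    """Lowest false-negative rate among ROC points reaching the target specificity."""
    return min(1.0 - tpr for fpr, tpr in points if 1.0 - fpr >= target)


def relative_reduction(baseline_rate, model_rate):
    """Fractional reduction of an error rate relative to a baseline rate."""
    return (baseline_rate - model_rate) / baseline_rate


# Toy data (assumed for illustration only): 1 = died within 5 years.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(toy_scores, toy_labels)
toy_auc = auc(points)
toy_fpr = fpr_at_sensitivity(points, 0.75)
toy_fnr = fnr_at_specificity(points, 0.75)
print(toy_auc, toy_fpr, toy_fnr)
```

In a comparison like the one in the abstract, `fpr_at_sensitivity` and `fnr_at_specificity` would be evaluated for both the model and the baseline index at the same operating point, and `relative_reduction` would express the difference as a percentage of the baseline rate.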

[Proc Amer Assoc Cancer Res, Volume 47, 2006]