Background

Gene expression patterns show promise for estimating prognosis and directing adjuvant therapy, but their value in guiding axillary treatment has rarely been evaluated. We aimed to identify predictors of nodal status based on gene expression patterns alongside clinicopathological characteristics, and to validate the performance and the prognostic importance of these predictors in a population-based context.

Material and Methods

The study included consecutive patients with primary breast cancer enrolled in the SCAN-B study (ID: NCT02306096) in South Sweden between September 2010 and March 2015. Exclusion criteria were prior breast cancer, neoadjuvant therapy, or unknown nodal status after surgical staging. Data on age, tumour size, multifocality, vascular invasion, Nottingham histological grade (NHG), and ER/PR/HER2 status were retrieved. In total, 3026 patients were successfully profiled by RNA sequencing (RNA-seq) and formed the study analysis cohort. Patients enrolled during 2011 (n = 1206) were excluded from the predictor training/test sets and kept as an independent validation set. Seven machine learning algorithms were evaluated for all samples and for each of the molecular subtypes defined by routine analysis: ER+/HER2-, HER2+ and TNBC. The primary outcome was discrimination (AUC) between N0 and N+ based on clinicopathological parameters, RNA-seq data, or both combined (mixed data). The secondary outcome was the prognostic value of the predictors. Kaplan-Meier estimates were used to portray univariate survival data in subgroups stratified by nodal status.
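The primary outcome above is reported as AUC. As a minimal sketch of what that number measures, and not a reproduction of the study's analysis pipeline, the AUC equals the probability that a randomly chosen N+ patient receives a higher predicted score than a randomly chosen N0 patient, with ties counted as one half. The function name `auc_from_scores` and the toy labels and scores are hypothetical and are not study data.

```python
def auc_from_scores(labels, scores):
    """AUC via pairwise comparison (Mann-Whitney formulation).

    labels: 1 for N+ (node-positive), 0 for N0 (node-negative).
    scores: predicted score for N+ from any classifier.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one N+ and one N0 case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # N+ case ranked above N0 case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.4]
toy_auc = auc_from_scores(labels, scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of N0 from N+. The pairwise loop is quadratic in the number of patients, which is acceptable for illustration; a rank-based implementation would be preferable for large cohorts.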

Results

The Swedish National Quality Registry for Breast Cancer identified 5235 patients eligible for study inclusion, of whom 89% were enrolled in the SCAN-B study. The distribution of clinicopathological characteristics for the 3026 RNA-sequenced patients was representative of the catchment region and was similar in the training/test sets (n = 1820) and the validation set (n = 1206). Mean AUCs from 10 iterative assessments in the training/test sets identified Generalized Boosted Regression Models as the best-performing algorithm. In the validation set, AUCs for clinicopathological predictors were 0.73, 0.75, 0.71 and 0.66 for all samples, ER+/HER2-, HER2+ and TNBC, respectively. Corresponding AUCs for gene expression predictors were 0.66, 0.66, 0.62 and 0.57. The best predictive performance was achieved with mixed predictors, with AUCs of 0.75, 0.75, 0.78 and 0.68, respectively. Preliminary results indicated that the predictors carry prognostic value: patients with stated N0 but predicted N+ by the models had worse survival. Conversely, a trend towards better survival was observed for those with stated N+ but predicted N0.
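The survival comparison between concordant and discordant groups can be sketched as follows. This is an illustrative, assumption-laden example: it implements the standard Kaplan-Meier product-limit estimator and applies it to groups keyed by stated versus predicted nodal status. The function names `kaplan_meier` and `km_by_group`, the record layout, and the toy follow-up times are hypothetical and do not come from the study.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival)] at each event time.

    times: follow-up time per patient.
    events: 1 if the event occurred at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censored cases leave the risk set
    return curve


def km_by_group(records):
    """records: iterable of (stated, predicted, time, event) tuples."""
    groups = defaultdict(lambda: ([], []))
    for stated, predicted, time, event in records:
        times, events = groups[(stated, predicted)]
        times.append(time)
        events.append(event)
    return {g: kaplan_meier(t, e) for g, (t, e) in groups.items()}


# Hypothetical toy records (years of follow-up), not study data.
records = [
    ("N0", "N0", 5, 0), ("N0", "N0", 6, 0), ("N0", "N0", 4, 1),
    ("N0", "N+", 2, 1), ("N0", "N+", 3, 1), ("N0", "N+", 5, 0),
    ("N0", "N+", 4, 0),
]
curves = km_by_group(records)
```

Each curve is a step function that drops only at observed event times; censored patients reduce the number at risk without lowering the survival estimate. A formal comparison between groups would need a test such as the log-rank test, which this sketch does not include.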

Conclusion

Subgroup-specific predictors of nodal status based on gene expression data alongside traditional clinicopathological characteristics were developed and independently validated for performance and prognostic value in a population-based breast cancer cohort. Integrating gene expression data in the preoperative setting may improve decision-making on the extent of axillary surgery and systemic therapy required.

Citation Format: Dihge L, Staaf J, Vallon-Christersson J, Hegardt C, Häkkinen J, Borg Å, Rydén L. Predictors of axillary nodal metastasis based on gene expression and clinicopathological characteristics: Data from a population-based prospective study, the Sweden Cancerome Analysis Network – Breast (SCAN-B) [abstract]. In: Proceedings of the 2017 San Antonio Breast Cancer Symposium; 2017 Dec 5-9; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2018;78(4 Suppl):Abstract nr PD2-08.