Purpose: To compare clinical, immunohistochemical (IHC), and gene expression models of prognosis applicable to formalin-fixed, paraffin-embedded blocks in a large series of estrogen receptor (ER)–positive breast cancers from patients uniformly treated with adjuvant tamoxifen.

Experimental Design: Quantitative real-time reverse transcription-PCR (qRT-PCR) assays for 50 genes identifying intrinsic breast cancer subtypes were completed on 786 specimens linked to clinical (median follow-up, 11.7 years) and IHC [ER, progesterone receptor (PR), HER2, and Ki67] data. Performance of predefined intrinsic subtype and risk-of-relapse scores was assessed using multivariable Cox models and Kaplan-Meier analysis. Harrell's C-index was used to compare fixed models trained in independent data sets, including proliferation signatures.

Results: Despite clinical ER positivity, 10% of cases were assigned to nonluminal subtypes. qRT-PCR signatures for proliferation genes gave more prognostic information than clinical assays for hormone receptors or Ki67. In Cox models incorporating standard prognostic variables, hazard ratios for breast cancer disease-specific survival over the first 5 years of follow-up, relative to the most common luminal A subtype, were 1.99 [95% confidence interval (CI), 1.09-3.64] for luminal B, 3.65 (95% CI, 1.64-8.16) for HER2-enriched subtype, and 17.71 (95% CI, 1.71-183.33) for the basal-like subtype. For node-negative disease, PAM50 qRT-PCR–based risk assignment weighted for tumor size and proliferation identified a group with >95% 10-year survival without chemotherapy. In node-positive disease, PAM50-based prognostic models were also superior.

Conclusion: The PAM50 gene expression test for intrinsic biological subtype can be applied to large series of formalin-fixed, paraffin-embedded breast cancers, and gives more prognostic information than clinical factors and IHC using standard cut points. Clin Cancer Res; 16(21); 5222–32. ©2010 AACR.

Translational Relevance

Molecular intrinsic subtyping reveals the major biological categories of breast cancer. Herein, we show adaptation of a 50-gene intrinsic subtyping signature for testing standard paraffin blocks. Using a large, homogeneously treated cohort of breast cancer patients, we directly compare gene expression results with high-quality clinical and central immunohistochemical data. We show the PAM50 approach to be superior as a prognostic test, specifically able to identify an ultralow-risk group who may not need chemotherapy. Based on these results, intrinsic subtyping tests are now being applied to randomized clinical trial series in Canada and the United States to assess predictive capacity (already under way for response to endocrine therapy, anthracyclines, and taxanes, with further studies under consideration). Should such studies show a predictive value for intrinsic subtyping, this test could be clinically implemented in a similar form, as it has been designed for application on standard laboratory specimens.

Several studies using gene expression technologies and statistical models have reported methodologies to identify breast cancer patients with estrogen receptor–positive (ER+), node-negative (N0) disease who may be adequately managed with 5 years of tamoxifen monotherapy (1–5). However, these studies often included patients with tumors already associated with established low-risk biomarkers, for example, low-grade histology, low Ki67-based proliferation index, and favorable surgical stage. It therefore remains controversial whether genomic assays should be applied routinely, or whether surgical stage and a limited number of immunohistochemical (IHC) markers will, in most cases, be adequate and less costly (6).

Continued efforts in this area are clinically relevant for decisions about both chemotherapy and endocrine agents, as patients at low risk after 5 years of tamoxifen monotherapy could be spared the morbidity associated with extended aromatase inhibitor therapy (7). Studies that address this issue are few because extremely long follow-up and information on breast cancer–specific mortality are required. Furthermore, because frozen tumor archives are unavailable from suitably large patient populations, gene expression technologies must be applicable to degraded RNA extracted from formalin-fixed, paraffin-embedded tissues that are necessarily more than a decade old.

Our group has assembled and published several technological and statistical approaches to address prognosis in ER+ breast cancer. We therefore sought to compare clinicopathologic, IHC, and molecular methodologies in a single independent test set to identify the best approach. Importantly, we focused on fixed statistical models that were previously trained on independent data sets to avoid overoptimistic results. The models we report in this article include the use of standard pathologic factors, such as centrally reviewed histologic grade, as incorporated into Adjuvant! Online (8), models based on IHC for biomarkers of intrinsic subtypes (6), and a gene expression assay using 50 genes (PAM50). The latter represents a reduced gene set, amenable to assay by techniques such as quantitative real-time reverse transcription-PCR (qRT-PCR), which accurately identifies the major intrinsic biological subtypes of breast cancer and generates risk-of-relapse (ROR) scores (9). The investigation used a large independent cohort of formalin-fixed, paraffin-embedded pathology specimens from patients with ER+ breast cancer, all M0 but otherwise representing a spectrum of T and N stages including a large fraction of node-positive (N+) patients. All patients received adequate local treatment, 5 years of tamoxifen therapy but no adjuvant chemotherapy, and were followed for relapse-free survival (RFS) and disease-specific survival (DSS) for over a decade.

Patient and sample characteristics

The study cohort was accrued from female patients with invasive breast cancer, diagnosed in British Columbia between 1986 and 1992. Cancer tissue from these patients had been frozen and shipped to Vancouver Hospital for central ER and progesterone receptor (PR) testing by dextran-charcoal–coated (DCC) ligand-binding assay. The PAM50 assay was conducted on the portion of this tissue that was formalin fixed and paraffin embedded for histologic correlation. Characteristics of this cohort have been previously described (6), and the same source blocks were used to assemble tissue microarrays for previously published studies on ER (10), HER2 (11), PR (12), Ki67, cytokeratin 5/6, and epidermal growth factor receptor (6, 13). Quantitative ER was determined using the Ariol automated digital imaging system (14), and the same method was applied for PR. For this study, we selected samples from patients with ER+ tumors by IHC who had received tamoxifen as their only adjuvant systemic therapy. Provincial guidelines from that time period recommended tamoxifen for women >50 years of age, whose ER status was positive or unknown, and who were either node positive or had lymphovascular invasion. Cohort identification and sample selection for this study are summarized as per REMARK criteria (15) in Supplementary Table S1.

RNA preparation, qRT-PCR, and assignment of biological subtype and ROR score

H&E sections from each block were reviewed by a pathologist (T.O.N.). Areas containing representative invasive breast carcinoma were selected and circled on the source block. Using a 1.0-mm punch needle, at least two tumor cores were extracted from the circled area. Details of RNA preparation from paraffin cores, the qRT-PCR assay for the PAM50 panel and reference genes, and how these results allow assignment into luminal A, luminal B, HER2-enriched, and basal-like subtypes, and the independently trained ROR-S (ROR based on subtype), ROR-T (ROR based on tumor size weighted model), ROR-P (ROR based on proliferation weighted model), and ROR-PT (ROR based on proliferation and tumor size weighted) risk score assignments are presented in Supplementary Materials and Methods. For clarity, the term ROR-T is now used for the same model described in our earlier publication as ROR-C (“clinical”; ref. 9).

Relation of clinicopathologic factors, intrinsic subtypes, and ROR scores to clinical outcome

Statistical analyses were conducted using SPSS v16.0 and R v2.8.0. Univariable analyses of tumor subtype against breast cancer RFS and DSS were done by Kaplan-Meier analysis with log-rank test. Multivariable analyses were done against the standard clinical parameters of tumor size, nodal status, histologic grade, patient age, and HER2 status. HER2 scores were centrally determined based on assay of adjacent cores from the same source blocks, assembled into tissue microarrays, and subjected to IHC and fluorescence in situ hybridization (FISH) analysis using clinical-equivalent protocols (11). Cox regression models (16) were built to estimate the adjusted hazard ratios of the qRT-PCR–assigned breast cancer subtypes, as well as ROR scores categorized by published cut points and as a continuous variable. IHC-based subtypes were assigned as previously defined (6). The online decision-making tool Adjuvant! Online (http://www.adjuvantonline.com), previously validated on the British Columbia population cohort (8), was used to generate breast cancer RFS and DSS estimates for each patient in this cohort. Only cases with information for all the covariates were included in the analyses. Smoothed plots of weighted Schoenfeld residuals were used to assess proportional hazard assumptions (17), and time stratifications were used where hazards were not proportional over the entire follow-up period.
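Time stratification of this kind is often implemented by splitting each patient's follow-up into episodes at the chosen cut point, so that separate hazard ratios can be estimated for each period. The following is a minimal sketch of that data preparation step only, assuming a single cut point in years; the function name `split_at` and the period labels are hypothetical and are not taken from the study's SPSS or R code.

```python
def split_at(time, event, cut=5.0):
    """Split one patient's follow-up into (start, stop, event, period) episodes.

    time  : total follow-up in years
    event : 1 if the endpoint occurred at `time`, 0 if censored
    cut   : time point (years) at which the time axis is split (assumed value)
    """
    if time <= cut:
        # All follow-up falls in the early period.
        return [(0.0, time, event, "early")]
    # Patient was event-free through the early period, so that episode is censored;
    # the actual outcome is attached to the late episode.
    return [(0.0, cut, 0, "early"), (cut, time, event, "late")]


# Hypothetical patients: (follow-up in years, event indicator)
patients = [(3.2, 1), (11.7, 0), (7.4, 1)]
episodes = [row for t, e in patients for row in split_at(t, e)]
```

Each resulting episode can then be passed to a Cox model fitted separately for the early and late periods, which is one way to obtain period-specific hazard ratios.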

The C-index (concordance index; ref. 18) is defined as the probability that risk assignments to members of a random pair are accurately ranked according to their prognosis. The number of concordant pairs (order of failure and risk assignment agree), discordant pairs (order of failure and risk assignment disagree), and uninformative pairs are tabulated to calculate the measure. C-index values of 0.5 indicate random prediction, and higher values indicate increasing prediction accuracy. Variability in the C-index for each predictor and P values from comparisons were estimated from 1,000 bootstrap samples of the risk assignments. Calculation was done using the rcorr.cens function implemented in the Hmisc (19) library for R statistical software version 2.8.1 (http://www.R-project.org).
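The pair-counting definition above can be illustrated with a short sketch. This is not the Hmisc `rcorr.cens` implementation; it is a simplified version that assumes a higher risk score means a worse prognosis, treats pairs with equal follow-up times as uninformative, and counts tied risk scores as half-concordant. All function names and the toy data are hypothetical.

```python
import random
from itertools import combinations


def c_index(times, events, risks):
    """Concordance index: share of usable pairs whose risk ranking matches outcome order."""
    concordant = discordant = tied = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # uninformative: we cannot tell which patient failed first
        if risks[a] > risks[b]:
            concordant += 1
        elif risks[a] < risks[b]:
            discordant += 1
        else:
            tied += 1
    usable = concordant + discordant + tied
    if usable == 0:
        raise ValueError("no informative pairs")
    return (concordant + 0.5 * tied) / usable


def bootstrap_c_index(times, events, risks, n_boot=1000, seed=0):
    """C-index estimates from patients resampled with replacement."""
    rng = random.Random(seed)
    n = len(times)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(c_index([times[k] for k in idx],
                                     [events[k] for k in idx],
                                     [risks[k] for k in idx]))
        except ValueError:
            continue  # skip resamples that contain no informative pairs
    return estimates


# Toy data (hypothetical): follow-up in years, event indicator, risk score
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.7, 0.3, 0.5, 0.1]
estimate = c_index(times, events, risks)  # 1.0: every usable pair is ranked correctly
spread = bootstrap_c_index(times, events, risks, n_boot=200)
```

The spread of the bootstrap estimates gives a rough picture of the variability of the C-index for a predictor; comparing two predictors would use the same resampled patients for both.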

Intrinsic subtyping of ER+, tamoxifen-treated breast cancer using the PAM50 assay

RNA was extracted from pathologist-guided tissue cores from 991 formalin-fixed, paraffin-embedded breast cancer specimens. Eight hundred and eleven samples yielded sufficient RNA for analysis (at least 1.2 μg total RNA at a concentration of ≥25 ng/μL). Template was technically sufficient in 786 cases, based on all internal housekeeper gene controls being expressed in the sample above background. Clinical characteristics for the patients included in the PAM50 analysis are presented in Table 1 (Supplementary Tables S2 and S3 provide details stratified by node status). Based on the nearest PAM50 centroid algorithm, intrinsic breast cancer subtypes were assigned using gene expression as follows: 372 samples (47.3%) were luminal A, 329 (41.9%) luminal B, 64 (8.1%) HER2 enriched, 5 (0.6%) basal-like, and 16 (2.0%) normal-like. Thus, although all cases in this study were positive for ER by centrally assessed IHC analysis on a tissue microarray (10), and 98.8% were also positive by DCC biochemical assay (Table 1), the gene expression panel nevertheless assigned 9% of cases into nonluminal subtypes, mostly HER2 enriched. This phenomenon has been previously observed when interrogating published data sets for expression of the PAM50 genes (9). For the 16 cases assigned as normal-like, histology was reviewed from adjacent tissue cores, and in 14 of 16 cases, invasive cancer cells were absent or rare. Normal-like cases were therefore excluded from outcome analyses, as a breast cancer subtype could not be confidently assigned due to insufficient tumor content.
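Nearest-centroid assignment can be sketched as follows: a sample's expression profile is compared with one reference profile (centroid) per subtype, and the sample takes the label of the most similar centroid. The sketch assumes Spearman rank correlation as the similarity measure and uses five made-up gene values and two made-up centroids; the actual PAM50 centroids, gene list, and normalization are described in the Supplementary Materials and Methods and are not reproduced here.

```python
def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def nearest_centroid(profile, centroids):
    """Return the subtype whose centroid correlates best with the profile."""
    return max(centroids, key=lambda name: spearman(profile, centroids[name]))


# Hypothetical centroids over five illustrative genes (not the PAM50 values)
CENTROIDS = {
    "Luminal A": [2.0, 1.5, -1.0, -1.5, 0.0],
    "Basal-like": [-2.0, -1.5, 1.0, 2.0, 0.5],
}
sample = [1.8, 1.0, -0.5, -2.0, 0.1]
subtype = nearest_centroid(sample, CENTROIDS)
```

Because only the rank order of genes within a sample matters under this assumption, the assignment is insensitive to monotone shifts in overall signal, which is one reason rank-based similarity is attractive for degraded RNA.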

Table 1.

Clinical characteristics of the whole cohort

Clinical parameter  Total  PAM50 subtype (all N = 786)
Luminal A  Luminal B  HER2  Basal  Normal
Sample size n 786 372 329 64 16 
Follow-up times in recurrence-free patients (y) Median (min-max) 9.7 (0.12-18) 12 (0.25-18) 7.6 (0.12-18) 7.3 (0.47-18) 2.3 (0.6-4.1) 13 (3.2-18) 
Follow-up times in disease-specific surviving patients (y) Median (min-max) 12 (0.55-18) 13 (0.57-18) 10 (0.64-18) 8.8 (0.55-18) 5 (1.6-16) 14 (3.2-18) 
Age (y) Median 67 67 68 66.5 65 68.5 
Premenopausal Yes 20 10 
No 752 358 314 62 14 
Unknown/pregnant 14 
Surgery Complete mastectomy 468 210 203 39 11 
Partial mastectomy 306 159 119 23 
Other 12 
Axillary node dissection Yes 745 349 313 62 16 
No 41 23 16 
Radiation therapy Yes 419 207 164 40 
No 367 165 165 24 
Tumor size (cm) Median 2.1 2.0 2.5 2.5 3.5 2.3 
T stage (clinical) T0/IS 
T1 331 180 118 27 
T2 380 165 179 28 
T3 18 
T4 34 10 17 
TX 22 10 
No. positive nodes 0 222 95 97 19 10 
1-3 360 182 148 26 
4-9 125 55 53 12 
10+ 26 10 14 
Unknown 53 30 17 
Grade Grade 1: well differentiated 34 25 
Grade 2: moderately differentiated 338 186 129 14 
Grade 3: poorly differentiated 370 135 179 48 
Unknown 44 26 16 
Histologic subtype Ductal NOS 708 329 302 60 13 
Lobular 61 32 21 
Mucinous 
Tubular 
Medullary 
Apocrine 
Lymphovascular invasion Yes 485 210 220 44 
No 262 139 94 19 
Unknown 39 23 15 
Clinical ER status (DCC ligand-binding assay) Missing 
Negative (0-9 fmol/mg) 
Positive (>10 fmol/mg) 768 364 324 60 15 
Median (fmol/mg) 254.5 255.5 327.0 74.0 32.0 54.0 
Clinical PR status (DCC ligand-binding assay) Missing 161 84 53 15 
Negative (0-9 fmol/mg) 72 15 39 18 
Positive (>10 fmol/mg) 553 273 237 31 
Median (fmol/mg) 129 202 84.5 17 153 239 
IHC HER2 with FISH correction on 2+ cases Negative 696 348 294 34 16 
Positive 75 15 30 29 
Unknown 15 

Abbreviation: NOS, not otherwise specified.

The intrinsic biological subtypes were strongly prognostic by Kaplan-Meier analysis (Fig. 1A and B). In the British Columbia population at the time these samples were originally acquired, many patients with a clinically low-risk profile received no adjuvant systemic therapy (8). In contrast, those receiving adjuvant tamoxifen (the subjects of this study) had tumors that were mostly node positive and high grade, exhibited lymphovascular invasion, and therefore constitute a higher-risk group with overall 10-year RFS of 62% and DSS of 72%. Those assigned by the PAM50 assay to luminal A status had a significantly better outcome (10-year RFS, 74%; DSS, 83%) than luminal B, HER2-enriched, or basal-like tumors (Fig. 1A for RFS and Fig. 1B for DSS). The ROR algorithms (9) were originally trained on microarray data from N0 patients who received no adjuvant systemic therapy, and have not previously been applied to a population homogeneously treated with adjuvant tamoxifen, nor to a series containing large numbers of N+ cases, nor to the endpoint of DSS. In this data set, ROR-S (a model based solely on gene expression) nevertheless showed performance consistent with our previous report (Fig. 1C and D). Multivariable Cox models were constructed to test the independent value of PAM50 subtyping against standard clinical and pathologic factors including age, histologic grade, lymphovascular invasion, HER2 expression, nodal status, and tumor size. To meet proportional hazard assumptions, multivariable models were assessed with the time axis split at 5 years (20), as HER2-enriched and basal-like tumors (Fig. 1A and B) and ROR-S high category tumors (Fig. 1C and D) had a much higher event rate in the first 5 years than subsequently. The intrinsic biological subtype and ROR-S remained significant in the multivariable models for DSS (Table 2) and RFS (Supplementary Table S4), particularly in the first 5 years, as did pathologic staging variables (tumor size and node status). 
However, histologic grade, lymphovascular invasion, and clinical HER2 status, significant in univariable analysis in this cohort, no longer contributed significant independent prognostic information when the multivariable analysis included the PAM50 assignments.

Fig. 1.

Kaplan-Meier survival analysis of intrinsic subtype (A and B) and ROR-S (C and D), as determined by PAM50 gene expression measurement by qRT-PCR done on paraffin blocks from women with invasive breast carcinoma, treated with adjuvant tamoxifen. The number of events and total number of patients in each group are shown beside the description of each curve. B and D, breast cancer DSS (excludes two cases with unknown cause of death).

Table 2.

Cox model multivariable analysis of breast cancer DSS among ER+, tamoxifen-treated women, incorporating standard clinicopathologic factors and (A) intrinsic subtype or (B) ROR-S, as determined by PAM50 qRT-PCR measurements

Clinical endpoint  Multivariable DSS (0-5 y of follow-up)  Multivariable DSS (5 y to end of follow-up)
Hazard ratio (95% CI)  P  Hazard ratio (95% CI)  P
A. Intrinsic subtype 
    Age 1.02 (0.99-1.05) 0.2665 1.00 (0.98-1.02) 0.9786 
    Grade (1-2 vs 3) 1.51 (0.87-2.60) 0.1405 1.05 (0.71-1.56) 0.8109 
    Lymphovascular invasion 1.02 (0.58-1.81) 0.9421 1.16 (0.77-1.75) 0.4852 
    HER2 (IHC) 1.50 (0.77-2.91) 0.2314 0.82 (0.40-1.69) 0.5968 
    Node status (0 as reference group)  <0.0001  0.0012 
        1-3 2.07 (0.95-4.54)  1.54 (0.96-2.47)  
        4+ 5.80 (2.64-12.71)  2.78 (1.60-4.82)  
    Tumor size (T1 as reference group)  0.049  0.0002 
        T2 1.22 (0.71-2.09)  1.62 (1.08-2.42)  
        T3 3.92 (1.50-10.22)  5.11 (1.78-14.62)  
        T4 1.31 (0.38-4.50)  4.02 (1.85-8.74)  
    Subtype (luminal A as reference group)  0.0018  0.0381 
        Luminal B 1.99 (1.09-3.64)  1.70 (1.13-2.55)  
        HER2 enriched 3.65 (1.64-8.16)  1.52 (0.72-3.18)  
        Basal-like 17.71 (1.71-183.33)  NA  
B. ROR-S 
    Age 1.02 (0.99-1.05) 0.2676 1.00 (0.98-1.02) 0.9089 
    Grade (1-2 vs 3) 1.36 (0.79-2.36) 0.2674 1.01 (0.68-1.51) 0.9588 
    Lymphovascular invasion 0.95 (0.54-1.66) 0.852 1.18 (0.78-1.79) 0.4299 
    HER2 (IHC) 1.46 (0.77-2.77) 0.2467 0.87 (0.43-1.77) 0.6964 
    Node status (0 as reference group)  <0.0001  0.0014 
        1-3 2.14 (1.00-4.60)  1.55 (0.97-2.48)  
        4+ 6.03 (2.79-13.05)  2.78 (1.59-4.86)  
    Tumor size (T1 as reference group)  0.0647  0.0003 
        T2 1.19 (0.70-2.05)  1.64 (1.10-2.45)  
        T3 3.34 (1.32-8.43)  3.69 (1.30-10.46)  
        T4 0.90 (0.25-3.19)  4.44 (2.01-9.78)  
    ROR-S (low as reference group)  <0.0001  0.0388 
        Med 2.04 (0.89-4.66)  1.86 (1.15-3.00)  
        High 6.48 (2.56-16.40)  1.57 (0.71-3.46)  

NOTE: P values for multilevel categorical variables are derived from likelihood ratio tests between models with and without each of these variables.

Abbreviation: 95% CI, 95% confidence interval.

Comparisons between gene expression and clinical assays for hormone receptors and proliferation

In a case that is ER+ by IHC, additional information about hormone receptor expression can be obtained in several ways, including DCC ligand-binding assay, quantitative IHC for ER, or equivalent measures of PR. Most published assays for breast cancer prognosis in ER+ disease include tumor growth rate as one of the parameters in the statistical model, and this data set was previously assessed in detail for IHC Ki67 index (6). The PAM50 qRT-PCR data allow detailed quantitative assessment of the functionality of the estrogen response pathway (8-gene luminal signature) as well as a proliferation signature based on the mean expression of 11 genes linked to cell cycle progression (trained on published data, as per Supplementary Materials and Methods). The availability of all these measurements (10) provides an opportunity to determine which approach most accurately captures the prognostic effect of estrogen pathway biomarkers and tumor growth rate in a direct comparison (Fig. 2). Given a randomly selected pair of subjects, C-index is the probability that the patient assigned the more extreme risk score actually has a worse prognosis. A value of 0.5 indicates discrimination that is no better than chance prediction, and a value of 1 indicates perfect discrimination of samples. Using the C-index to compare prognostic capacity in this uniformly tamoxifen-treated cohort, the combination of luminal genes measured by the PAM50 yields more prognostic information than other methods of hormone receptor analysis, but the differences are not significant. Although Ki67 index by IHC seems to outdo quantitative ER, the proliferation signature provides the most robust approach for the prediction of both RFS and DSS (Fig. 2; Supplementary Table S5). 
Multivariable analysis indicated that the Ki67 IHC assay did not contribute significant independent information to prognostic models for either N0 or N+ breast cancer patients when information on the proliferation signature is included (Supplementary Table S6).

Fig. 2.

C-index estimates of RFS and DSS for different measures of hormone receptors and proliferation. The luminal and proliferation measures are the means of normalized qRT-PCR values across 8- and 11-signature genes, respectively, as described in Supplementary Materials and Methods. P values were estimated from 1,000 bootstrap samples. Single asterisk (*) designates significant improvement (P < 0.05) in C-index relative to clinical quantitative ER by DCC ligand-binding assay. Double asterisk (**) designates significant improvement (P < 0.05) in C-index relative to visual quantitative Ki67 index.

Comparison of fixed models of prognosis in N0 breast cancer

For formal model comparisons, data were generated on four fixed approaches, without any element of training within the test set: (a) clinical model based on Adjuvant! Online, (b) IHC-based (incorporating data on Ki67 and HER2), (c) the ROR-S approach based on PAM50 gene expression alone, and (d) the proliferation signature alone and as incorporated into the ROR-P risk model using a β coefficient weighting for proliferation (described in Supplementary Materials and Methods). Adjuvant! Online incorporates full tumor size staging information; to account for the influence of tumor size, the biomarker models were also weighted by a β coefficient (T) that incorporated the prognostic information associated with T1 status versus higher T stage (the level of detail available in the independent training sets). This approach created IHC-T, ROR-T, and ROR-PT models. In addition, the strong independent influence of N stage was accounted for by conducting the analysis separately in the N0 and N+ populations. C-index assessments showed superiority of the biomarker models over the purely clinical Adjuvant! Online model in the N0 population, with the ROR-PT approach providing the most prognostic information (Fig. 3A). In multivariable analysis, the addition of ROR-P to a model of ROR-S results in a significant increase in explained prognostic variation (RFS, P = 0.0032; DSS, P = 0.0015); ROR-PT is also significant after conditioning on ROR-S (RFS, P = 0.0023; DSS, P = 0.0015) but not ROR-P (RFS, P = 0.12; DSS, P = 0.13). A continuous score based on ROR-PT was generated to translate the data into an individual RFS and DSS risk assessment tool (Fig. 3B). Kaplan-Meier analysis illustrates the ability of the ROR-PT model to identify patients who have an extremely high chance (>95%) of remaining disease-free (Fig. 3C) and alive beyond 10 years (Fig. 3D). 
In contrast, our previously published IHC model (6) could not identify a group with sufficiently favorable outcomes that 5 years of tamoxifen might be considered adequate treatment (i.e., <90% 10-year RFS; Fig. 3E and F).

Fig. 3.

Comparison of prognostic classifiers in N0 subjects. A, the C-index is used to compare accuracy of the prognostic classifiers (Supplementary Table S5). Asterisks denote significant improvement (P < 0.05) in C-index relative to the clinical model (Adjuvant!) (*), or relative to the IHC-T model (**). B, taking the best-performing model, ROR-PT values are related to actual 10-year event probabilities using a Cox proportional hazard model (dotted lines are 95% confidence interval). C and D, Kaplan-Meier survival analysis of the size and proliferation weighted ROR (ROR-PT) assignments. E and F, comparable information provided by a model of IHC subtype and tumor size. D and F, breast cancer DSS (excludes two cases with unknown cause of death).

Comparison of fixed models of prognosis in N+ breast cancer

For N+ disease, C-index analysis (Fig. 4A) supports the conclusion that the ROR-T score produces the best prognostic model; in contrast to N0 disease, the proliferation signature added relatively little information and proliferation weighting (ROR-PT) did not yield a superior model. Adjuvant! Online performed almost as well, but had the advantage of incorporating the actual number of involved lymph nodes. This information was not available in the independent training sets used to build the ROR models, and so could not be used in the current analysis (which can, however, serve to train future models incorporating number of involved lymph nodes). The continuous score model for N+ disease (Fig. 4B) produces a very broad range of prognosis, similar to N0 disease, although few patients have a prognosis in the range where tamoxifen monotherapy for 5 years would be considered sufficient treatment. Although there were large and highly significant differences in survival in ROR-defined risk groups, Kaplan-Meier analysis (Fig. 4C and D) illustrates that even patients in the lowest risk ROR group are still subject to relapses and late deaths from breast cancer, particularly after the 5th year of follow-up. The IHC-based risk model incorporating Ki67 and HER2 also produces a statistically significant prognostic effect for RFS (Fig. 4E) and DSS (Fig. 4F), although these differences are narrower than those achieved by the gene expression–based model.

Fig. 4.

Comparison of prognostic classifiers in N+ subjects. A, C-index comparison of the accuracy of prognostic classifiers as described in Fig. 3. B, Cox proportional hazard model relating the best-performing model (ROR-T) to actual 10-year event probabilities. C and D, Kaplan-Meier survival analysis of the ROR-T assignments for RFS (C) and DSS (D). E and F, comparable information as provided by a model of IHC subtype and tumor size.


Previous studies have established that intrinsic biological signatures are present and have prognostic significance in breast cancer cohorts from multiple different institutions, profiled with several gene expression microarray platforms (21–24). To identify these subtypes on standard formalin-fixed, paraffin-embedded pathology specimens, we developed a qRT-PCR test based on a panel of 50 genes (9). The analysis reported here applied this test to a series of paraffin blocks with >15-year detailed follow-up.

Whereas previously assessed cohorts consisted mainly of low-risk women receiving no adjuvant systemic therapy, or were heterogeneously treated, the cases in the current study are all women with ER+ breast cancer who received endocrine therapy as their sole adjuvant treatment, a group of particular clinical importance and contemporary relevance. In this analysis, we sought to compare different technologies for predicting long-term outcomes for such patients. In this study cohort, patients were diagnosed with N+ or higher-risk N0 disease. Only 8% of the N0 population had grade 1 disease and 55% exhibited lymphovascular invasion (Table S2). Under the current standard of care in most countries, the majority of these patients would now be treated with adjuvant chemotherapy (25) and extended endocrine therapy. Using a series of fixed models trained in independent data sets, we compared a standard approach using clinicopathologic information (Adjuvant! Online) with our published luminal B discriminator based on Ki67 and HER2 IHC additionally weighted for T stage (IHC-T), and with PAM50 gene expression–based ROR models weighted for T stage (ROR-T and ROR-PT). In N0 patients, the ROR-PT approach was the most accurate and was able to identify patients in whom 5 years of tamoxifen may be adequate treatment based on the very low late relapse rate in the 5- to 10-year window (Fig. 3C). In N+ disease, the PAM50 approach represents an advance in prognostication, but late relapses and deaths were seen even in the lowest risk group identified using the best ROR model. Unlike in N0 disease, proliferation signature weighting did not improve the C-index in N+ disease.
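The late relapse rate in the 5- to 10-year window can be read off a Kaplan-Meier curve as a conditional probability, 1 − S(10)/S(5), where S(t) is the estimated event-free fraction at t years. The following Python is a minimal sketch under stated assumptions: the function names (`kaplan_meier`, `survival_at`, `conditional_survival`) and the follow-up values are hypothetical and are not data from this cohort.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve


def survival_at(curve, t):
    """Step-function lookup of S(t) on a curve from kaplan_meier()."""
    s = 1.0
    for event_time, surv in curve:
        if event_time > t:
            break
        s = surv
    return s


def conditional_survival(curve, start, end):
    """P(event-free at `end` | event-free at `start`) = S(end) / S(start)."""
    s_start = survival_at(curve, start)
    if s_start == 0.0:
        raise ValueError("no patients event-free at the start of the window")
    return survival_at(curve, end) / s_start


# Hypothetical follow-up (years) for eight patients; 1 = relapse, 0 = censored.
toy_times = [1, 3, 3, 6, 7, 9, 12, 12]
toy_events = [1, 1, 0, 0, 1, 0, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
late_relapse_rate = 1.0 - conditional_survival(toy_curve, 5, 10)
print(toy_curve, late_relapse_rate)
```

Conditioning on being event-free at 5 years separates late relapses from early ones, which is the comparison relevant to whether 5 years of tamoxifen may be adequate.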

On this cohort, detailed centrally determined IHC analyses have previously been done and published (6, 10–13, 26). C-index, Kaplan-Meier, and Cox model analyses show that IHC approaches do work and provide significant prognostic information. However, the PAM50-based models are superior in terms of adding significant additional information and in their capacity to identify a particularly low-risk group of women.

We view these PAM50 models, derived from archival formalin-fixed RNA, as a potential replacement for grade-, hormone receptor–, Ki67-, and HER2-based prognostic models, but not as a replacement for pathologic stage (as tumor size and nodal status remain independent predictors in multivariable models that include PAM50-based prognostic information). One weakness of our approach is that our current accounting for pathologic stage is oversimplified due to the limited stage distributions and clinical information in our training sets. We analyzed the data as either N0 or N+, and accounted for T stage by categorizing the samples as either T1 or greater. A future aim is to integrate the PAM50 data into the Adjuvant! Online approach (27) to more completely account for the prognostic influence of pathologic stage. To achieve this, we would need to construct a training set that adequately includes all five categories of T size and all four categories of N stage used in Adjuvant! Online to gauge the prognostic weight of these pathologic stage categories in the setting of PAM50 information. Additionally, incorporation of all IHC data as continuous variables in a combined model may improve its prognostic value. The current series contains sufficiently detailed clinical and IHC information to contribute to such detailed comparisons as a training set requiring further validation.

An additional caveat to our study is that the population was strongly biased toward higher-risk breast cancers and so likely underestimates the number of patients in the broader N0 population for whom adjuvant tamoxifen would represent adequate treatment. The current generation of adjuvant aromatase inhibitor trials would be an appropriate setting to address the value of our approach further. We accept the possibility that a better model using Ki67 at a different cut point could be developed. However, because we were focused on comparing fixed models, we used our published approach. Further work on the Ki67 model and cut-point optimization will require independent data sets.

In comparison with other signatures such as the recurrence score and genomic grade index (1, 28, 29), the PAM50 has the potential advantage of discriminating high-risk patients into luminal B, HER2-enriched, and basal-like subtypes, who are likely to respond differently to the main systemic therapy options (endocrine, anti-HER2, and anthracycline versus nonanthracycline versus taxane chemotherapy regimens). The assay requires neither frozen tissue (30) nor manual microdissection of cut sections (1), and can be readily applied to standard paraffin blocks including archival tissues from clinical trials. Currently available assays such as Mammaprint (31) and Oncotype DX (32) were optimized to recognize particularly low-risk patients from among a N0 early-stage population who did not receive chemotherapy. Because intrinsic subtyping is designed to identify discriminative biological features of breast cancer, rather than being derived around clinical outcome in a specific population, this approach is particularly likely to extrapolate well onto other patient cohorts (33). The current study shows the ability of PAM50 to recognize a very low-risk prognostic group among women receiving tamoxifen and no chemotherapy, similar to the Oncotype DX assay (34, 35). A direct comparison of different expression profile approaches may become possible in the future through a reanalysis of cohorts with the PAM50 that have already been analyzed by Oncotype DX, because both assays can be applied to the same source material.

Our inability to identify a group of patients with N+ disease in whom 5 years of tamoxifen is adequate is reminiscent of the recent findings from the Southwest Oncology Group, who also found that a molecular signature for good outcome in N0 disease failed in N+ disease in this regard (35). It would be relevant to study a series of patients treated with extended adjuvant aromatase inhibitor therapy, who will have even lower residual risk, as some of the patients in the low-risk N+ group may simply require longer treatment with modern endocrine therapy rather than chemotherapy. The development of new approaches for defining prognosis in N+ disease is also warranted. We have already established the preoperative endocrine prognostic index, which showed that the “on endocrine treatment” Ki67 value is more effective than baseline Ki67 for the identification of patients with clinical stage II and III disease who have excellent long-term outcomes after neoadjuvant endocrine therapy (36). A comparison between Ki67 and the PAM50-based proliferation signature in the neoadjuvant endocrine therapy setting is therefore one logical next step. The applicability of this test to formalin-fixed, paraffin-embedded tissues will make possible its use on large clinical trial archives that address this issue (37). The results of our study highlight the feasibility of measuring multigene expression panels on such series as a means for showing clinical utility using a method readily applicable to prospective clinical samples that provides more prognostic information than clinical or standard IHC approaches.

T.O. Nielsen, C.M. Perou, M.J. Ellis, P.S. Bernard: ownership interest, Bioclassifier LLC; U.S. Patent No. 61/057,508.

We thank current and former members of the British Columbia Cancer Agency's Breast Cancer Outcomes Unit, including S. Chia, K. Gelmon, H. Kennecke, I. Olivotto, and C. Speers, for maintaining the clinical database.

Grant Support: T. Nielsen is a Senior Scholar of the Michael Smith Foundation for Health Research. Grant support was provided by National Cancer Institute (NCI) Strategic Partnering to Evaluate Cancer Signatures grant U01 CA114722-01, Canadian Cancer Society, Huntsman Cancer Institute/Foundation (P.S. Bernard), ARUP Institute for Clinical and Experimental Pathology (P.S. Bernard), NCI Breast Specialized Program of Research Excellence grant P50-CA58223-09A1 (C.M. Perou), St. Louis Affiliate of the Susan G. Komen Foundation CRAFT (M.J. Ellis), Breast Cancer Research Foundation (C.M. Perou and M.J. Ellis), and Sanofi-Aventis Canada unrestricted educational grant. Additional support provided by the TRAC facility and Informatics at the Huntsman Cancer Center, supported in part by NCI Cancer Center Support grant P30 CA42014-19, and the tissue procurement facility at the Alvin J. Siteman Cancer Center at Washington University School of Medicine, which is funded in part by the NCI Cancer Center Support grant P30 CA91842.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. Paik S, Shak S, Tang G, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 2004;351:2817–26.
2. Desmedt C, Piette F, Loi S, et al. Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res 2007;13:3207–14.
3. Goetz MP, Suman VJ, Ingle JN, et al. A two-gene expression ratio of homeobox 13 and interleukin-17B receptor for prediction of recurrence and survival in women receiving adjuvant tamoxifen. Clin Cancer Res 2006;12:2080–7.
4. Ross JS. Multigene classifiers, prognostic factors, and predictors of breast cancer clinical outcome. Adv Anat Pathol 2009;16:204–15.
5. Tutt A, Wang A, Rowland C, et al. Risk estimation of distant metastasis in node-negative, estrogen receptor-positive breast cancer patients using an RT-PCR based prognostic expression signature. BMC Cancer 2008;8:339.
6. Cheang MC, Chia SK, Voduc D, et al. Ki67 index, HER2 status, and prognosis of patients with luminal B breast cancer. J Natl Cancer Inst 2009;101:736–50.
7. Goss PE, Ingle JN, Martino S, et al. A randomized trial of letrozole in postmenopausal women after five years of tamoxifen therapy for early-stage breast cancer. N Engl J Med 2003;349:1793–802.
8. Olivotto IA, Bajdik CD, Ravdin PM, et al. Population-based validation of the prognostic model ADJUVANT! for early breast cancer. J Clin Oncol 2005;23:2716–25.
9. Parker JS, Mullins M, Cheang MC, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 2009;27:1160–7.
10. Cheang MC, Treaba DO, Speers CH, et al. Immunohistochemical detection using the new rabbit monoclonal antibody SP1 of estrogen receptor in breast cancer is superior to mouse monoclonal antibody 1D5 in predicting survival. J Clin Oncol 2006;24:5637–44.
11. Chia S, Norris B, Speers C, et al. Human epidermal growth factor receptor 2 overexpression as a prognostic factor in a large tissue microarray series of node-negative breast cancers. J Clin Oncol 2008;26:5697–704.
12. Liu S, Chia SK, Mehl E, et al. Progesterone receptor is a significant factor associated with clinical outcomes and effect of adjuvant tamoxifen therapy in breast cancer patients. Breast Cancer Res Treat 2009;119:53–61.
13. Cheang MC, Voduc D, Bajdik C, et al. Basal-like breast cancer defined by five biomarkers has superior prognostic value than triple-negative phenotype. Clin Cancer Res 2008;14:1368–76.
14. Turbin DA, Leung S, Cheang MC, et al. Automated quantitative analysis of estrogen receptor expression in breast carcinoma does not differ from expert pathologist scoring: a tissue microarray study of 3,484 cases. Breast Cancer Res Treat 2008;110:417–26.
15. McShane LM, Altman DG, Sauerbrei W, Taube SE, Gion M, Clark GM. REporting recommendations for tumor MARKer prognostic studies (REMARK). Nat Clin Pract Oncol 2005;2:416–22.
16. Cox D, Oakes D. Analysis of survival data. Monographs on statistics and probability. London (United Kingdom): Chapman and Hall; 1984.
17. Grambsch P, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994;81:515–26.
18. Harrell FE Jr., Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361–87.
19. Harrell FE Jr. Design Package, in R package version 2.3-0. 2009.
20. Schemper M. Cox analysis of survival data with non-proportional hazard functions. The Statistician 1992;41:455–65.
21. Calza S, Hall P, Auer G, et al. Intrinsic molecular signature of breast cancer in a population-based cohort of 412 patients. Breast Cancer Res 2006;8:R34.
22. Fan C, Oh DS, Wessels L, et al. Concordance among gene-expression-based predictors for breast cancer. N Engl J Med 2006;355:560–9.
23. Hu Z, Fan C, Oh DS, et al. The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics 2006;7:96.
24. Kapp AV, Jeffrey SS, Langerod A, et al. Discovery and validation of breast cancer subtypes. BMC Genomics 2006;7:231.
25. Goldhirsch A, Wood WC, Gelber RD, Coates AS, Thurlimann B, Senn HJ. Progress and promise: highlights of the international expert consensus on the primary therapy of early breast cancer 2007. Ann Oncol 2007;18:1133–44.
26. Jensen KC, Turbin DA, Leung S, et al. New cutpoints to identify increased HER2 copy number: analysis of a large, population-based cohort with long-term follow-up. Breast Cancer Res Treat 2008;112:453–9.
27. Ravdin PM, Siminoff LA, Davis GJ, et al. Computer program to assist in making decisions about adjuvant therapy for women with early breast cancer. J Clin Oncol 2001;19:980–91.
28. Ivshina AV, George J, Senko O, et al. Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res 2006;66:10292–301.
29. Sotiriou C, Wirapati P, Loi S, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 2006;98:262–72.
30. Glas AM, Floore A, Delahaye LJ, et al. Converting a breast cancer microarray signature into a high-throughput diagnostic test. BMC Genomics 2006;7:278.
31. van 't Veer LJ, Dai H, van de Vijver MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415:530–6.
32. Paik S. Development and clinical utility of a 21-gene recurrence score prognostic assay in patients with early breast cancer treated with tamoxifen. Oncologist 2007;12:631–5.
33. Rouzier R, Perou CM, Symmans WF, et al. Breast cancer molecular subtypes respond differently to preoperative chemotherapy. Clin Cancer Res 2005;11:5678–85.
34. Paik S, Tang G, Shak S, et al. Gene expression and benefit of chemotherapy in women with node-negative, estrogen receptor-positive breast cancer. J Clin Oncol 2006;24:3726–34.
35. Albain KS, Barlow WE, Shak S, et al. Prognostic and predictive value of the 21-gene recurrence score assay in postmenopausal women with node-positive, oestrogen-receptor-positive breast cancer on chemotherapy: a retrospective analysis of a randomised trial. Lancet Oncol 2009;11:55–65.
36. Ellis MJ, Tao Y, Luo J, et al. Outcome prediction for estrogen receptor-positive breast cancer based on postneoadjuvant endocrine therapy tumor characteristics. J Natl Cancer Inst 2008;100:1380–8.
37. Simon RM, Paik S, Hayes DF. Use of archived specimens in evaluation of prognostic and predictive biomarkers. J Natl Cancer Inst 2009;101:1446–52.

Supplementary data