Background:

Modified median and subgroup-specific gene centering are two essential preprocessing methods to assign breast cancer molecular subtypes by PAM50. We evaluated the PAM50 subtypes derived from both methods in a subset of Nurses' Health Study (NHS) and NHSII participants; correlated tumor subtypes by PAM50 with IHC surrogates; and characterized the PAM50 subtype distribution, proliferation scores, and risk of relapse with proliferation and tumor size weighted (ROR-PT) scores in the NHS/NHSII.

Methods:

PAM50 subtypes, proliferation scores, and ROR-PT scores were calculated for 882 invasive breast tumors and 695 histologically normal tumor-adjacent tissues. Cox proportional hazards models evaluated the relationship between PAM50 subtypes or ROR-PT scores/groups with recurrence-free survival (RFS) or distant RFS.

Results:

PAM50 subtypes were highly comparable between the two methods. Agreement between tumor subtypes by PAM50 and IHC surrogates improved to fair when Luminal subtypes were grouped together. Using the modified median method, our study consisted of 46% Luminal A, 18% Luminal B, 14% HER2-enriched, 15% Basal-like, and 8% Normal-like tumors; 53% of tumor-adjacent tissues were Normal-like. Women with the Basal-like subtype had a higher rate of relapse within 5 years. Women with HER2-enriched tumors had poorer outcomes when diagnosed before 1999.

Conclusions:

Either preprocessing method may be utilized to derive PAM50 subtypes for future studies. The majority of NHS/NHSII tumor and tumor-adjacent tissues were classified as Luminal A and Normal-like, respectively.

Impact:

Preprocessing methods are important for the accurate assignment of PAM50 subtypes. These data provide evidence that either preprocessing method can be used in epidemiologic studies.

Breast cancer is a heterogeneous disease at both morphologic and molecular levels (1, 2). Given this diversity, many approaches, such as MammaPrint (3), Oncotype DX (4), and PAM50 (5), have been developed to classify breast tumors to inform prognosis and guide treatment. PAM50 is a 50-gene signature that classifies breast cancer into five molecular intrinsic subtypes: Luminal A, Luminal B, HER2-enriched, Basal-like, and Normal-like (1, 5). The five molecular subtypes differ in their biological properties and prognoses (6, 7). Luminal A generally has the best prognosis, whereas HER2-enriched and Basal-like are considered more aggressive diseases. Less common subtypes, such as Claudin-low, Interferon-rich, and Molecular Apocrine, have also been identified using other gene expression profiling assays (8–11).

Molecular subtyping using the PAM50 gene signature can be performed using gene expression derived from microarrays, RNASeq, or qRT-PCR. Until the recent development of Prosigna, a rapid PAM50-based molecular subtype classifier using the NanoString nCounter Dx Analysis System (12), the complexities of using PAM50 and other gene signature assays for molecular subtyping have limited their use in clinical practice and led to the development of IHC surrogate definitions to classify tumors into molecular subtypes (13, 14). For example, the immunophenotypic surrogate profile for classifying a tumor as Basal-like is one that is estrogen receptor (ER), progesterone receptor (PR), and HER2 negative, with positive expression of cytokeratin 5/6 (CK 5/6) and/or EGFR (15). However, studies have reported differences in tumor classification when comparing molecular assays and IHC (16, 17). There are ongoing efforts to refine the IHC definitions to more closely approximate molecular subtypes (7, 18–21).

In addition to the discrepancies between molecular subtyping by PAM50 and IHC, inaccurate preprocessing of gene expression data, as well as the use of nonstandard PAM50 algorithms, can result in inconsistent and/or erroneous assignment of molecular subtypes (22–25). In particular, PAM50 subtype assignment may be affected when the clinicopathologic distribution (e.g., ER status) of the intended research cohort differs from that of the original cohort used by Parker and colleagues to derive the PAM50 algorithm, which had an equal distribution of ER+ and ER− tumors (i.e., 50% ER+/50% ER−; ref. 5). To address this problem, a modified median gene centering (MMGC) preprocessing method was developed (1, 2). Later, Zhao and colleagues proposed a subgroup-specific gene centering (SSGC) preprocessing method (26).
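To make the ER-imbalance problem concrete, the sketch below contrasts ordinary median centering with centering on an ER-balanced reference subset. It is an illustration only: the exact MMGC procedure is given in the Supplementary Methods, and here we simply assume that MMGC subtracts, for each gene, the median computed on a subset containing equal numbers of ER+ and ER− tumors. The function name, the random downsampling, and the toy gene are hypothetical.

```python
import random
from statistics import median

def er_balanced_median_center(expr, er_status, seed=0):
    """Hypothetical helper: center each gene on the median of an ER-balanced subset.

    Assumption (illustrative, not the published MMGC code): the reference median
    comes from a 50% ER+ / 50% ER- subset and is subtracted from ALL samples.
    expr      -- dict mapping gene name -> list of log2 values, one per sample
    er_status -- list of "pos"/"neg" labels in the same sample order
    """
    pos = [i for i, s in enumerate(er_status) if s == "pos"]
    neg = [i for i, s in enumerate(er_status) if s == "neg"]
    k = min(len(pos), len(neg))
    if k == 0:
        raise ValueError("need at least one ER+ and one ER- sample")
    rng = random.Random(seed)
    # Randomly downsample the larger ER group so the reference subset is balanced.
    subset = rng.sample(pos, k) + rng.sample(neg, k)
    centered = {}
    for gene, values in expr.items():
        ref_median = median([values[i] for i in subset])
        centered[gene] = [v - ref_median for v in values]
    return centered

# Toy ER+-heavy cohort (3 ER+, 1 ER-) with one ER-associated gene.
expr = {"gene_x": [10.0, 10.0, 10.0, 2.0]}
status = ["pos", "pos", "pos", "neg"]
print(er_balanced_median_center(expr, status))  # {'gene_x': [4.0, 4.0, 4.0, -4.0]}
print(median(expr["gene_x"]))                   # 10.0 (ordinary median over all samples)
```

With ordinary median centering, the ER+ excess pulls the reference to 10.0, so ER+ tumors sit at zero and the ER− tumor looks strongly negative; the balanced reference (6.0) places both groups symmetrically, which is closer to the 50/50 setting in which PAM50 was trained.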

Although PAM50 subtypes were initially developed to classify breast cancer, molecular subtypes can also be reflected in histologically normal tumor-adjacent tissues (henceforth referred to as “tumor-adjacent”). Each tumor subtype is associated with a distinct physiologic response in the tumor-adjacent tissue, and specific gene expression patterns in these tumor-adjacent regions may be associated with differing risks of recurrence and prognosis (27–30). These prior studies suggest that tumor-adjacent tissues merit study in breast cancer.

We have previously reported the tumor molecular subtypes using IHC surrogates for 5561 Nurses' Health Study (NHS) and NHSII participants diagnosed with breast cancer (31). In this study, we describe the tumor and tumor-adjacent PAM50 molecular subtypes in a subset of 954 NHS/NHSII participants with gene expression data. Specifically, we:

  • (i) computed and compared breast cancer PAM50 molecular subtypes, proliferation scores, and risk of relapse with proliferation and tumor size weighted (ROR-PT) scores derived from both the MMGC and SSGC preprocessing methods;

  • (ii) determined the concordance of tumor molecular subtypes using PAM50 and IHC surrogates; and

  • (iii) described the tumor PAM50 subtype distribution, proliferation scores, and ROR-PT scores in the NHS/NHSII.

Study population

The Human Subjects Committee at Partners Healthcare System and Brigham and Women's Hospital in Boston, MA approved this study. The NHS and NHSII are ongoing prospective cohort studies of U.S. female registered nurses, who complete biennial questionnaires that query exposures and identify newly diagnosed diseases. The NHS was established in 1976 with 121,700 participants aged 30–55 years, and the NHSII was established in 1989 with 116,429 participants aged 25–42 years. Written permission was obtained from participants diagnosed with invasive breast cancer, or from their next of kin, to review medical records to confirm the diagnosis and retrieve cancer details, and to collect archival tissue specimens. Archival formalin-fixed, paraffin-embedded (FFPE) breast cancer tissue blocks were requested from the respective hospitals (32).

Breast cancer recurrence

Local and distant recurrences were self-reported by NHS/NHSII participants; no medical record review was conducted for recurrences. Recurrence-free survival (RFS) is defined as time from diagnosis to reported breast cancer recurrence, diagnosis of cancer in common sites of recurrence (i.e., liver, lung, brain, or bone) or death from breast cancer without reported recurrence. Distant recurrence-free survival (DRFS) is defined as time from initial diagnosis to diagnosis of cancer in common sites of recurrence (i.e., liver, lung, brain, or bone) or death from breast cancer without reported recurrence.
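The RFS and DRFS definitions above are composite endpoints: follow-up ends at the first qualifying event, or is censored otherwise. A minimal sketch, assuming each woman's record has been reduced to a few hypothetical fields (times in years since diagnosis, `None` when an event was not reported), and simplifying self-reported recurrence into a local field and a distant-site field. We also assume breast cancer death counts as a DRFS event even after a local recurrence; the source wording leaves that case open.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FollowUp:
    """Hypothetical per-woman record; times are years since diagnosis, None = not reported."""
    local_recurrence: Optional[float]
    distant_site_cancer: Optional[float]   # cancer diagnosed in liver, lung, brain, or bone
    breast_cancer_death: Optional[float]
    censor_time: float                     # death from other causes or end of follow-up

def _first_event(times, censor_time) -> Tuple[float, int]:
    observed = [t for t in times if t is not None]
    if observed:
        return min(observed), 1   # the earliest qualifying event ends follow-up
    return censor_time, 0         # no qualifying event: censored

def rfs(f: FollowUp) -> Tuple[float, int]:
    """(time, event) for recurrence-free survival."""
    return _first_event(
        [f.local_recurrence, f.distant_site_cancer, f.breast_cancer_death], f.censor_time)

def drfs(f: FollowUp) -> Tuple[float, int]:
    """(time, event) for distant recurrence-free survival; local recurrence is ignored."""
    return _first_event([f.distant_site_cancer, f.breast_cancer_death], f.censor_time)

woman = FollowUp(local_recurrence=3.0, distant_site_cancer=6.5,
                 breast_cancer_death=None, censor_time=12.0)
print(rfs(woman), drfs(woman))  # (3.0, 1) (6.5, 1)
```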

Gene expression data

The protocol used to obtain RNA from FFPE tissues has been published previously (33). Gene expression data were generated by microarray in two batches by the Molecular Biology Core Facilities, Dana-Farber Cancer Institute (Boston, MA): in 2012–2014 using the Glue Grant Human Transcriptome Array (HTA) 3.0 prerelease version (Affymetrix, Santa Clara; ref. 33), and in 2015–2018 using the HTA 2.0 (Affymetrix). Gene expression data were normalized and summarized into log2 values using Robust Multi-array Average (RMA), then annotated. All microarray data and sample information are available at the National Center for Biotechnology Information Gene Expression Omnibus (accession number: GSE115577).
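RMA, as generally described, combines background correction, quantile normalization across arrays, and median-polish summarization of probe-level log2 values. As a minimal illustration of the quantile-normalization step alone (not the full RMA procedure and not the pipeline used in this study; in practice a dedicated Bioconductor implementation would be used), the hypothetical function below forces every array onto a common distribution: the mean of the k-th smallest value across arrays.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length arrays (one list per array/sample).

    Each value is replaced by the mean, across arrays, of the values sharing its rank.
    Ties are broken by position for simplicity.
    """
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all arrays must have the same number of probes")
    sorted_cols = [sorted(c) for c in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    result = []
    for c in columns:
        order = sorted(range(n), key=lambda i: c[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result

print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain exactly the same set of values; only the within-array ordering of probes is preserved.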

Molecular subtyping by PAM50, proliferation scores, and ROR-PT scores

Molecular subtyping by PAM50 was carried out separately for tumor and tumor-adjacent samples; the gene adjustment factors applied to tumor-adjacent samples were estimated from the tumors. After the gene expression dataset was adjusted using the MMGC or SSGC method, research-based PAM50 classification was performed. Proliferation scores and ROR-PT scores are additional measures that were subsequently developed to further characterize breast tumors and are generated automatically by the PAM50 algorithm (34); because they were designed for tumors, they were reported only for tumor tissues. Proliferation scores were computed using three approaches: log2 expression (no centering), MMGC-adjusted expression, and SSGC-adjusted expression. The ROR-PT score is calculated from the PAM50 subtype, the proliferation score, and pathologic tumor size.
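Research-based PAM50 classification is a nearest-centroid rule: each centered sample is assigned to the subtype whose centroid has the highest Spearman correlation with the sample's expression of the 50 genes. The sketch below shows that rule in plain Python. The 4-gene centroids are toy values invented for illustration, not the published PAM50 centroids, and the helper names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def nearest_centroid(sample, centroids):
    """Assign the subtype whose centroid is most Spearman-correlated with the sample."""
    rhos = {name: spearman(sample, centroid) for name, centroid in centroids.items()}
    return max(rhos, key=rhos.get), rhos

# Toy 4-gene centroids (NOT the real PAM50 centroids) and one centered sample.
toy_centroids = {
    "LumA": [1.0, 0.5, -1.0, -0.5],
    "Basal": [-1.0, -0.5, 1.0, 0.5],
}
subtype, rhos = nearest_centroid([0.8, 0.9, -0.7, -1.2], toy_centroids)
print(subtype, {k: round(v, 2) for k, v in rhos.items()})  # LumA {'LumA': 0.6, 'Basal': -0.6}
```

Because the rule compares ranks against centroids, any shift in a gene's centering (for example, MMGC versus SSGC) can reorder genes within a sample and move borderline samples to a neighboring subtype.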

Molecular subtyping using IHC surrogates

IHC data were obtained from tissue microarrays (31, 32, 35). Missing IHC results for ER, PR, and HER2 (n = 144) were replaced with data from medical records. Tumors were classified into Luminal A, Luminal B, HER2-enriched, and Basal-like as defined previously (14, 31, 36). For tumors missing Ki-67 IHC data (n = 545), histologic grade was used as a proxy in classification.
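The IHC surrogate rules (as listed in the Figure 2 caption) can be written as a small decision function, shown below as a sketch. The function and argument names are hypothetical; grade 3 standing in for high Ki-67 and grades 1 to 2 for low Ki-67 follows the text, while returning "Unclassified" when both Ki-67 and grade are missing, or for ER−/PR−/HER2− tumors without CK 5/6 or EGFR staining, is our assumption.

```python
from typing import Optional

def ihc_surrogate(er: bool, pr: bool, her2: bool,
                  ki67_high: Optional[bool] = None, grade: Optional[int] = None,
                  ck56: bool = False, egfr: bool = False) -> str:
    """Hypothetical helper applying the IHC surrogate subtype rules.

    ki67_high=None means Ki-67 is missing; histologic grade is then used as a proxy
    (grade 3 ~ Ki-67 high, grades 1-2 ~ Ki-67 low).
    """
    hormone_receptor_pos = er or pr
    if hormone_receptor_pos and her2:
        return "Luminal B"
    if hormone_receptor_pos:
        if ki67_high is not None:
            high_proliferation = ki67_high
        elif grade is not None:
            high_proliferation = grade == 3
        else:
            return "Unclassified"  # assumption: neither Ki-67 nor grade available
        return "Luminal B" if high_proliferation else "Luminal A"
    if her2:
        return "HER2-enriched"
    if ck56 or egfr:
        return "Basal-like"
    return "Unclassified"  # ER-, PR-, HER2-, but CK 5/6- and EGFR-

print(ihc_surrogate(er=True, pr=False, her2=False, grade=2))              # Luminal A
print(ihc_surrogate(er=False, pr=False, her2=False, grade=3, ck56=True))  # Basal-like
```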

Please refer to the Supplementary Methods for gene expression, preprocessing methods, and IHC surrogate details.

Statistical analysis

Confusion matrices were used to assess the concordance of PAM50 subtypes derived from gene expression data preprocessed using the MMGC or SSGC method, the concordance of tumor molecular subtypes classified by PAM50 and by IHC, and the concordance of subtypes in paired tumor and tumor-adjacent tissues (37). From each confusion matrix, we computed accuracy (the proportion of samples with agreeing classifications) and Cohen κ (a measure of agreement that accounts for the agreement expected by chance). Spearman ρ was used to assess the correlation of proliferation scores, and of ROR-PT scores, derived by the different methods in tumor tissues.
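Both summaries can be computed directly from paired labels. Accuracy is the observed agreement p_o (the diagonal of the confusion matrix divided by n), and Cohen κ = (p_o − p_e)/(1 − p_e), where p_e is the agreement expected by chance from the two sets of marginal totals. A minimal sketch with hypothetical toy labels:

```python
from collections import Counter

def accuracy_and_kappa(labels_a, labels_b):
    """Observed agreement (accuracy) and Cohen's kappa for two paired label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # p_o: proportion of samples on the diagonal of the confusion matrix
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: agreement expected by chance from the two sets of marginal totals
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when both methods use one identical label")
    return p_o, (p_o - p_e) / (1 - p_e)

mmgc = ["LumA", "LumA", "Basal", "Basal"]
ssgc = ["LumA", "LumB", "Basal", "Basal"]
print(accuracy_and_kappa(mmgc, ssgc))  # (0.75, 0.6)
```

Here p_e = (2·1 + 2·2)/16 = 0.375, so κ = (0.75 − 0.375)/(1 − 0.375) = 0.6, noticeably lower than the raw 75% agreement.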

RFS and DRFS were evaluated at 5 and 10 years, the time points generally used in clinical studies. Crude and adjusted Cox proportional hazards models were used to evaluate the associations of PAM50 subtypes or ROR-PT scores/groups with RFS or DRFS within 5 and 10 years in the NHS/NHSII. Women were censored for RFS or DRFS at death from other causes or at the end of follow-up. Adjusted models included age and year of diagnosis, clinical grade, stage, type of surgery (lumpectomy, mastectomy, none, or unknown), and type of treatment (chemotherapy, hormone therapy, radiotherapy, two or more types of therapy, none, or unknown). Luminal A was the reference group in models of tumor PAM50 subtypes. The proportional hazards assumption was tested by evaluating scaled Schoenfeld residuals (38). All statistical tests were two-sided, and statistical significance was defined as P < 0.05. All analyses were conducted using R version 3.4.0, and Kaplan–Meier curves were plotted using the survminer package (version 0.4.0) in R.
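Analyses "within 5 and 10 years" amount to administrative censoring at the horizon: events after the horizon are treated as censored at it. The sketch below shows that step together with a plain Kaplan–Meier estimator. It is a hypothetical illustration in Python, not the study's R code, and the toy follow-up data are invented.

```python
def truncate_at_horizon(time, event, horizon):
    """Administrative censoring: events after `horizon` years count as censored at `horizon`."""
    if time > horizon:
        return horizon, 0
    return time, event

def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(t, S(t))] at each distinct event time.

    Observations censored at an event time stay in the risk set for that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Toy follow-up (years, event indicator), truncated at a 5-year horizon.
raw = [(1.0, 1), (2.0, 1), (2.0, 0), (3.0, 1), (7.0, 1)]
five_year = [truncate_at_horizon(t, e, 5.0) for t, e in raw]
times, events = zip(*five_year)
print([(t, round(s, 3)) for t, s in kaplan_meier(times, events)])
# [(1.0, 0.8), (2.0, 0.6), (3.0, 0.3)]
```

The event at 7 years does not contribute to the 5-year curve; it is counted only once the horizon is extended to 10 years.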

This analysis included gene expression data from 954 women, who contributed 882 tumor and 695 histologically normal tumor-adjacent samples; 623 women contributed paired tumor and tumor-adjacent samples. This subset of 954 women with gene expression data was generally representative of the NHS/NHSII population diagnosed with breast cancer (Supplementary Table S1). The majority of participants in this study had stage I disease and clinical grade 2, ER+, PR+, and HER2− tumors. NHS women had more IHC HER2+ cases than NHSII women (Supplementary Table S2). Among the 882 women who contributed tumor samples, RFS and DRFS data were unavailable for 6. Within 10 years of follow-up, there were 112 recurrence events and 85 distant recurrence events. ROR-PT scores were computed for 863 cases; they could not be computed for the remaining 19 because tumor size was missing. Thus, only 857 women were included in the ROR-PT and RFS/DRFS analyses.

Comparing PAM50 molecular subtypes, proliferation scores, and ROR-PT scores derived from the two preprocessing methods

PAM50 subtypes derived by the two preprocessing methods were highly concordant in both tumor (accuracy = 0.86, κ = 0.81) and tumor-adjacent tissue (accuracy = 0.82, κ = 0.74; Fig. 1). Most tumors were classified as Luminal A (46% using MMGC and 40% using SSGC; Fig. 1A). Of the 695 tumor-adjacent tissues, 53% and 39% were classified as Normal-like by MMGC and SSGC, respectively (Fig. 1B). SSGC assigned more tumors to Luminal B or HER2-enriched than MMGC, whereas MMGC assigned more tumors to Normal-like. Further investigation into why MMGC shifted tumors from Luminal B to Luminal A (n = 44) and classified more tumors as Normal-like (n = 36) revealed that proliferation scores computed using SSGC were slightly higher than those computed using MMGC, so these cases were classified into more aggressive molecular subtypes under SSGC (Supplementary Fig. S1A–S1C). Overall, tumor proliferation scores computed from simple log2 expression (no centering) were highly correlated with those from each preprocessing method (both P < 0.01; Supplementary Fig. S2). ROR-PT scores for tumors were also highly correlated between MMGC and SSGC (Spearman ρ = 0.99, P < 0.01).

Figure 1.

A, Concordance of molecular subtypes by PAM50 in tumor (accuracy = 0.86, κ = 0.81) between the modified median and subgroup-specific gene centering preprocessing methods. B, Concordance of molecular subtypes by PAM50 in tumor-adjacent tissue (accuracy = 0.82, κ = 0.74) between the modified median and subgroup-specific gene centering preprocessing methods.


Comparing PAM50 molecular subtypes derived from the two preprocessing methods and IHC surrogates

Figure 2A and B show the agreement between Luminal A, Luminal B, HER2-enriched, and Basal-like subtypes as classified by PAM50 and by IHC surrogates (MMGC: accuracy = 0.54, κ = 0.32; SSGC: accuracy = 0.53, κ = 0.32). With κ = 0.32, agreement between PAM50 and IHC was poor. When the Luminal subtypes were grouped together, agreement improved to fair (MMGC: accuracy = 0.81, κ = 0.53, Fig. 2C; SSGC: accuracy = 0.79, κ = 0.49, Fig. 2D). Results were very similar when analyses were restricted to women with Ki-67 IHC data (n = 337; Supplementary Data).

Figure 2.

Agreement between the molecular subtypes (Luminal A, Luminal B, HER2-enriched, and Basal-like) assigned by PAM50 and by IHC surrogates using the modified median (A) and subgroup-specific (B) gene centering methods. When the Luminal subtypes were grouped together, agreement between PAM50 and IHC improved to fair using the modified median (C; accuracy = 0.81, κ = 0.53) and subgroup-specific (D; accuracy = 0.79, κ = 0.49) gene centering methods. IHC surrogates were defined as follows: Luminal A, ER+ and/or PR+, HER2−, and Ki-67 low (or histologic grade 1 or 2); Luminal B, ER+ and/or PR+ and HER2+, or ER+ and/or PR+, HER2−, and Ki-67 high (or histologic grade 3); HER2-enriched, ER−, PR−, and HER2+; Basal-like, ER−, PR−, HER2−, and CK 5/6+ and/or EGFR+.


Molecular subtypes in tumor and tumor-adjacent tissues

Among the 623 paired samples, the most common pairing was a Luminal A tumor with a Normal-like tumor-adjacent tissue, using either preprocessing method (Fig. 3A and B). Women with Luminal A or B tumors were more likely to have Normal-like tumor-adjacent tissue than women with HER2-enriched or Basal-like tumors. The subtypes of paired tumor and tumor-adjacent tissues agreed for 30% of pairs using MMGC and 32% using SSGC.

Figure 3.

A, PAM50 subtypes of paired samples using modified median gene centering preprocessing method. B, PAM50 subtypes of paired samples using subgroup-specific gene centering preprocessing method.


Tumor PAM50 subtypes, ROR-PT scores, and prognosis in the NHS/NHSII

Because PAM50 subtypes were highly concordant between the two preprocessing methods, the main tables report results derived using the MMGC method, and the supplementary tables report results derived using SSGC. Luminal A, Luminal B, and HER2-enriched tumors were generally of clinical grade 2, whereas 38% of grade 3 tumors were Basal-like (Table 1). Of IHC ER+ tumors, 76% were Luminal A or B, whereas 54% of ER− tumors were Basal-like. Similarly, 77% of IHC PR+ tumors were Luminal subtypes, and 49% of PR− tumors were Basal-like. Only 42% of tumors classified as HER2-enriched were IHC HER2+. Associations between PAM50 subtypes computed using the SSGC method and participant characteristics are shown in Supplementary Table S3.
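Table 1 reports percentages within each PAM50 subtype (column percentages), whereas the ER percentages quoted above are shares of each IHC-defined group (row percentages). The short sketch below recomputes two of them from the Table 1 counts; the helper name is hypothetical, and tumors with missing ER status are excluded, as in the table.

```python
# (ER-positive, ER-negative) tumor counts per PAM50 subtype, taken from Table 1 (MMGC).
ER_COUNTS = {
    "Luminal A": (389, 14),
    "Luminal B": (150, 6),
    "HER2-enriched": (82, 41),
    "Basal-like": (36, 91),
    "Normal-like": (52, 16),
}

def share_within_status(counts, subtypes, status_index):
    """Percentage of all tumors with a given IHC status (0 = positive, 1 = negative)
    that fall into the listed PAM50 subtypes, i.e., a row rather than column percentage."""
    total = sum(pair[status_index] for pair in counts.values())
    selected = sum(counts[s][status_index] for s in subtypes)
    return 100.0 * selected / total

print(round(share_within_status(ER_COUNTS, ["Luminal A", "Luminal B"], 0)))  # 76
print(round(share_within_status(ER_COUNTS, ["Basal-like"], 1)))              # 54
```

That is, 539 of 709 ER+ tumors were Luminal, and 91 of 168 ER− tumors were Basal-like.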

Table 1.

Tumor PAM50 subtypes in the NHS/NHSII cohorts

| Characteristic | Luminal A | Luminal B | HER2-enriched | Basal-like | Normal-like |
|---|---|---|---|---|---|
| N | 405 | 157 | 124 | 128 | 68 |
| Cohort, n (%) | | | | | |
| NHS | 234 (57.8) | 93 (59.2) | 81 (65.3) | 84 (65.6) | 45 (66.2) |
| NHSII | 171 (42.2) | 64 (40.8) | 43 (34.7) | 44 (34.4) | 23 (33.8) |
| Tumor grade, n (%) | | | | | |
| 1: Predominantly well-differentiated | 143 (36.4) | 21 (13.5) | 14 (11.8) | 11 (9.4) | 22 (36.1) |
| 2: Moderately differentiated | 216 (55.0) | 86 (55.1) | 72 (60.5) | 33 (28.2) | 35 (57.4) |
| 3: Poorly differentiated | 34 (8.7) | 49 (31.4) | 33 (27.7) | 73 (62.4) | 4 (6.6) |
| Stage, n (%) | | | | | |
| I | 265 (65.6) | 89 (56.7) | 65 (52.4) | 62 (48.8) | 47 (69.1) |
| II | 104 (25.7) | 53 (33.8) | 40 (32.3) | 59 (46.5) | 15 (22.1) |
| III | 32 (7.9) | 14 (8.9) | 18 (14.5) | 6 (4.7) | 5 (7.4) |
| IV | 3 (0.7) | 1 (0.6) | 1 (0.8) | 0 (0.0) | 1 (1.5) |
| Tumor ER, n (%) | | | | | |
| Positive | 389 (96.5) | 150 (96.2) | 82 (66.7) | 36 (28.3) | 52 (76.5) |
| Negative | 14 (3.5) | 6 (3.8) | 41 (33.3) | 91 (71.7) | 16 (23.5) |
| Tumor PR, n (%) | | | | | |
| Positive | 382 (95.3) | 141 (90.4) | 76 (62.3) | 33 (26.2) | 51 (75.0) |
| Negative | 19 (4.7) | 15 (9.6) | 46 (37.7) | 93 (73.8) | 17 (25.0) |
| Tumor HER2, n (%) | | | | | |
| Positive | 101 (26.5) | 40 (27.8) | 48 (42.1) | 21 (17.8) | 14 (23.0) |
| Negative | 280 (73.5) | 104 (72.2) | 66 (57.9) | 97 (82.2) | 47 (77.0) |
| Tumor Ki-67, n (%) | | | | | |
| High | 36 (25.2) | 27 (44.3) | 19 (38.0) | 26 (49.1) | 5 (16.7) |
| Low | 107 (74.8) | 34 (55.7) | 31 (62.0) | 27 (50.9) | 25 (83.3) |

Women with the Basal-like subtype were significantly more likely to have poorer RFS outcomes within 5 years (Table 2A). Although women with HER2-enriched subtypes appear to have significantly poorer RFS and DRFS outcomes at both 5 and 10 years compared with women with Luminal A subtypes (Table 2A and B), further analyses showed that this finding is generally reflective of women diagnosed prior to the introduction of targeted therapy for HER2 (i.e., trastuzumab) in 1999 (Supplementary Table S4A). After 1999, there was no difference in RFS or DRFS rates among women with HER2-enriched subtypes compared with women with Luminal A subtypes (Supplementary Table S4B). The relationships between PAM50 subtypes and RFS or DRFS in the NHS/NHSII are illustrated in Supplementary Fig. S3A–S3D.

Table 2A.

The association of PAM50 subtypes and risk of relapse with proliferation and tumor size weighted (ROR-PT) scores with RFS within 5 and 10 years in the NHS/NHSII cohorts

| Subtype or score | 5-year events/total | 5-year HR (95% CI) | 5-year P | 10-year events/total | 10-year HR (95% CI) | 10-year P |
|---|---|---|---|---|---|---|
| A. Crude | | | | | | |
| Luminal A | 19/402 | 1.00 (ref) | — | 36/402 | 1.00 (ref) | — |
| Luminal B | 13/157 | 1.81 (0.90–3.67) | 0.10 | 22/157 | 1.63 (0.96–2.76) | 0.07 |
| HER2-enriched | 18/122 | 3.32 (1.74–6.33) | <0.01 | 22/122 | 2.15 (1.27–3.66) | <0.01 |
| Basal-like | 13/127 | 2.26 (1.12–4.58) | 0.02 | 21/127 | 1.93 (1.13–3.31) | 0.02 |
| Normal-like | 7/68 | 2.24 (0.94–5.33) | 0.07 | 11/68 | 1.85 (0.94–3.64) | 0.07 |
| ROR-PT score^a | 70/857 | 1.24 (1.11–1.39) | <0.01 | 112/857 | 1.19 (1.09–1.30) | <0.01 |
| B. Adjusted model^b | | | | | | |
| Luminal A | 19/402 | 1.00 (ref) | — | 36/402 | 1.00 (ref) | — |
| Luminal B | 13/157 | 1.61 (0.77–3.38) | 0.21 | 22/157 | 1.46 (0.83–2.56) | 0.19 |
| HER2-enriched | 18/122 | 2.80 (1.42–5.55) | <0.01 | 22/122 | 1.87 (1.06–3.27) | 0.03 |
| Basal-like | 13/127 | 2.42 (1.06–5.49) | 0.03 | 21/127 | 1.80 (0.96–3.38) | 0.07 |
| Normal-like | 7/68 | 1.99 (0.78–5.09) | 0.15 | 11/68 | 1.75 (0.86–3.58) | 0.13 |
| ROR-PT score^a | 70/857 | 1.09 (0.95–1.25) | 0.21 | 112/857 | 1.06 (0.95–1.18) | 0.27 |

^a ROR-PT was evaluated as a continuous variable, per 10-unit change, in the 857 women with ROR-PT scores.

^b The adjusted model included age and year of diagnosis, clinical grade, stage, type of surgery, and type of treatment.

Table 2B.

The association of PAM50 subtypes and risk of relapse with proliferation and tumor size weighted (ROR-PT) scores with DRFS within 5 and 10 years in the NHS/NHSII cohorts

| Subtype or score | 5-year events/total | 5-year HR (95% CI) | 5-year P | 10-year events/total | 10-year HR (95% CI) | 10-year P |
|---|---|---|---|---|---|---|
| A. Crude | | | | | | |
| Luminal A | 17/402 | 1.00 (ref) | — | 25/402 | 1.00 (ref) | — |
| Luminal B | 9/157 | 1.38 (0.61–3.09) | 0.44 | 18/157 | 1.88 (1.03–3.45) | 0.04 |
| HER2-enriched | 16/122 | 3.29 (1.66–6.51) | <0.01 | 18/122 | 2.52 (1.38–4.62) | <0.01 |
| Basal-like | 9/127 | 1.72 (0.77–3.86) | 0.19 | 15/127 | 1.95 (1.03–3.69) | 0.04 |
| Normal-like | 6/68 | 2.14 (0.84–5.43) | 0.11 | 9/68 | 2.17 (1.02–4.66) | <0.05 |
| ROR-PT score^a | 57/857 | 1.29 (1.14–1.46) | <0.01 | 85/857 | 1.23 (1.11–1.36) | <0.01 |
| B. Adjusted model^b | | | | | | |
| Luminal A | 17/402 | 1.00 (ref) | — | 25/402 | 1.00 (ref) | — |
| Luminal B | 9/157 | 1.19 (0.51–2.79) | 0.69 | 18/157 | 1.68 (0.88–3.18) | 0.11 |
| HER2-enriched | 16/122 | 2.74 (1.31–5.70) | <0.01 | 18/122 | 2.18 (1.15–4.14) | 0.02 |
| Basal-like | 9/127 | 1.96 (0.76–5.04) | 0.16 | 15/127 | 2.07 (0.98–4.37) | 0.06 |
| Normal-like | 6/68 | 1.85 (0.66–5.16) | 0.24 | 9/68 | 1.93 (0.86–4.34) | 0.11 |
| ROR-PT score^a | 57/857 | 1.15 (0.98–1.34) | 0.08 | 85/857 | 1.10 (0.97–1.23) | 0.13 |

^a ROR-PT was evaluated as a continuous variable, per 10-unit change, in the 857 women with ROR-PT scores.

^b The adjusted model included age and year of diagnosis, clinical grade, stage, type of surgery, and type of treatment.

ROR-PT risk categories (low, medium, and high), assigned automatically by the PAM50 algorithm, showed the expected relationships with both preprocessing methods: women classified as low risk had the best RFS and DRFS outcomes (Fig. 4A–D). ROR-PT scores were also analyzed as a continuous variable, per 10-unit change, in Cox proportional hazards models. In crude MMGC models, every 10-unit increase in ROR-PT score corresponded to a 24% higher hazard of recurrence within 5 years [HR, 1.24; 95% confidence interval (CI), 1.11–1.39] and a 19% higher hazard within 10 years (HR, 1.19; 95% CI, 1.09–1.30; Table 2A), and to a 29% higher hazard of distant recurrence within 5 years (HR, 1.29; 95% CI, 1.14–1.46) and a 23% higher hazard within 10 years (HR, 1.23; 95% CI, 1.11–1.36; Table 2B). In adjusted models, these associations were attenuated and no longer statistically significant, although the point estimates remained above 1. Results were very similar when PAM50 subtypes and ROR-PT scores were computed using SSGC (Supplementary Table S5A–S5D).
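Because a Cox model is linear in a continuous covariate on the log-hazard scale, an HR reported per 10 units equals exp(10β), where β is the per-unit log HR; an HR converts to a percentage change in hazard as (HR − 1) × 100. A minimal sketch of these conversions, using the crude 5-year RFS estimate from Table 2A (function names are hypothetical):

```python
import math

def rescale_hr(hr, from_units, to_units):
    """Rescale an HR for a continuous covariate from a per-`from_units` to a
    per-`to_units` change, using HR(d units) = exp(d * beta)."""
    beta = math.log(hr) / from_units   # log-HR per 1 unit
    return math.exp(beta * to_units)

def percent_higher_hazard(hr):
    """Express an HR as a percentage change in the hazard."""
    return (hr - 1.0) * 100.0

# Crude MMGC estimate for recurrence within 5 years (Table 2A): HR 1.24 per 10 units.
print(round(percent_higher_hazard(1.24)))  # 24
print(round(rescale_hr(1.24, 10, 1), 4))   # 1.0217 (per 1-unit HR)
print(round(rescale_hr(1.24, 10, 20), 4))  # 1.5376 (per 20-unit HR, i.e., 1.24 ** 2)
```

The same transformation applies to each confidence limit, so the per-unit 95% CI for this estimate is obtained by rescaling 1.11 and 1.39 in the same way.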

Figure 4.

These Kaplan–Meier curves display the relationships between risk of relapse with proliferation and tumor size weighted (ROR-PT) categories [low risk (green lines), medium risk (blue lines), and high risk (red lines)] and RFS or DRFS in the NHS. ROR-PT was computed separately using the modified median (A and B) and subgroup-specific (C and D) gene centering methods. Crude HRs are reported in the figures.


The discovery of molecular subtypes has given clinicians and researchers a new tool to better understand breast cancer biology (39), etiology, and risk factors (40), and to evaluate response to treatment (34, 41, 42). Accurate assignment of molecular subtypes is therefore important. The distributions of PAM50 subtypes, proliferation scores, and ROR-PT scores were highly comparable whether computed using the MMGC or the SSGC preprocessing method. Agreement between PAM50 classification by gene expression and IHC was fair when Luminal A and B were considered as a single group. Most NHS/NHSII participants had Luminal A tumors. Women with Basal-like tumors had a higher rate of recurrence than women with Luminal A tumors. ROR-PT scores were prognostic only in crude analyses.

Applying a preprocessing step to the gene expression data, and choosing a specific preprocessing method (i.e., MMGC or SSGC) before subtyping, are critical components of a reproducible informatics workflow for PAM50 classification. MMGC and SSGC generally yielded concordant subtypes and highly correlated proliferation scores, and the associations of PAM50 subtypes and ROR-PT scores with prognosis were similar whichever method was used. It remains unclear whether either method is superior, because there is no gold standard against which to compare them. Both methods have practical utility, and either may be used to classify breast tumors. We report our main results using MMGC because this method is widely used by The Cancer Genome Atlas breast cancer study team (1, 2). SSGC is an elegant alternative to MMGC that is useful as an additional check when performing PAM50 subtyping. Future analyses should note that, compared with MMGC, SSGC generally yields higher proliferation estimates, classifies more tumors into the more aggressive molecular subtypes, and classifies fewer tumor-adjacent tissues as Normal-like.

The report of the 2015 St Gallen International Expert Conference recommended IHC definitions that more accurately reflect molecular subtypes (7). Our IHC definitions for the Luminal subtypes differ slightly from the St Gallen recommendations. In our study, PR staining was manually graded as 0, >1%, or >10%, whereas St Gallen suggests using >20% to classify PR+. We graded Ki-67 as low (<14%) or high (>14%), whereas St Gallen categorizes Ki-67 as low (<14%), intermediate (14%–19%), or high (≥20%). Ki-67 staining information was unavailable for about 60% of women, for whom histologic grade was used as a proxy. This may have resulted in misclassification of IHC subtypes for some individuals and may partly explain the low agreement between molecular subtyping by PAM50 and by IHC in our study. If IHC surrogate definitions continue to be used, further refinement is needed so that IHC-based classification more closely approximates the PAM50 subtypes.

With technological advances in RNA extraction and the availability of the NanoString nCounter Dx Analysis System (12), more future studies should be able to derive molecular subtypes from gene expression rather than relying on IHC surrogates. This study further demonstrates the differences between molecular subtyping by PAM50 and by IHC. The PAM50 distribution of NHS/NHSII participants in this study was 46% Luminal A, 18% Luminal B, 14% HER2-enriched, 15% Basal-like, and 8% Normal-like, whereas our previous study, which used IHC surrogates to classify 5,561 tumors, reported higher percentages of Luminal A (55%) and Luminal B (27%) tumors, lower percentages of HER2-enriched (6%) and Basal-like (10%) tumors, and 2.9% unclassified tumors (31).

Gene expression data are available for only a subset of the breast cancer cases in the NHS/NHSII, although this subset is generally representative of the overall NHS/NHSII breast cancer population. Most NHS participants are white postmenopausal women, whereas most NHSII participants are white premenopausal women. In our data, the prevalence of each PAM50 subtype did not differ by participant menopausal status. Because the distribution of IHC-defined subtypes has been shown to differ by race, future studies should investigate potential differences in the distribution of PAM50 subtypes in minority populations (43).

As expected, women with Basal-like tumors had poorer RFS than women with Luminal A tumors. Women with HER2-enriched tumors had significantly poorer RFS at both 5 and 10 years compared with women with Luminal A tumors, but this association was significant only before 1999. In contrast to other studies (41, 44), we did not observe a poorer 10-year prognosis for Luminal B tumors. This may be attributable to the small number of events among women with Luminal B tumors or to differences in the preprocessing methods used for PAM50 subtyping.

PAM50 molecular subtyping was developed specifically to classify breast cancers. Here, we applied the PAM50 algorithm to classify histologically normal tumor-adjacent tissue into molecular subtypes. Depending on the preprocessing method used, the histologically normal tissue was classified as Normal-like for only 40%–50% of women, with the remainder assigned to one of the tumor subtypes. This suggests that histologically normal tumor-adjacent tissue may not be biologically normal in all women. Expression of an estrogen response signature and an in vivo triple-negative signature in tumor-adjacent tissue has been found to differ across tumor PAM50 subtypes (30). Future work could identify novel molecular subtypes unique to tumor-adjacent tissue and determine whether these subtypes offer additional insights into therapy response and prognosis (28, 29).

In summary, we used two preprocessing methods (MMGC and SSGC) to characterize the PAM50 breast cancer molecular subtypes of tumor and histologically normal tumor-adjacent samples. We have shown that either preprocessing method may be used to derive PAM50 subtypes in future studies. In the NHS/NHSII, the majority of tumor and tumor-adjacent tissues were classified as Luminal A and Normal-like, respectively. Women with Luminal A or B tumors were more likely to have Normal-like tumor-adjacent tissue than women with HER2-enriched or Basal-like tumors. Women with Basal-like tumors had a poorer prognosis than women with Luminal A tumors. The future identification of novel tumor-adjacent molecular subtypes may provide new insights into breast cancer therapy response and prognosis.

J.S. Parker has ownership interest in and is a consultant/advisory board member for NanoString. No potential conflicts of interest were disclosed by the other authors.

Conception and design: K.H. Kensler, V.N. Sankar, L.C. Collins, R.M. Tamimi, Y.J. Heng

Development of methodology: V.N. Sankar, Y.J. Heng

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C.A. Rubadue, G.M. Baker, M.E. Pyle, D.J. Hunter, A.H. Eliassen, S.E. Hankinson, R.M. Tamimi, Y.J. Heng

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K.H. Kensler, V.N. Sankar, J. Wang, X. Zhang, C.A. Rubadue, J.S. Parker, K.A. Hoadley, L.C. Collins, S.E. Hankinson, Y.J. Heng

Writing, review, and/or revision of the manuscript: K.H. Kensler, V.N. Sankar, J. Wang, X. Zhang, C.A. Rubadue, G.M. Baker, J.S. Parker, K.A. Hoadley, M.E. Pyle, L.C. Collins, D.J. Hunter, A.H. Eliassen, S.E. Hankinson, R.M. Tamimi, Y.J. Heng

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): V.N. Sankar, C.A. Rubadue, G.M. Baker, A.L. Stancu, M.E. Pyle, D.J. Hunter, A.H. Eliassen, R.M. Tamimi, Y.J. Heng

Study supervision: Y.J. Heng

We thank the participants and staff of the NHS and the NHSII for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY. Funding for this project was provided by NIH grants U19 CA148065 (to D.J. Hunter and R.M. Tamimi), UM1 CA186107 (to R.M. Tamimi), P01 CA87969 (to R.M. Tamimi, A.H. Eliassen, Y.J. Heng, A.L. Stancu, M.E. Pyle, and G.M. Baker), UM1 CA176726 (to A.H. Eliassen, R.M. Tamimi, Y.J. Heng, A.L. Stancu, M.E. Pyle, and G.M. Baker), and R01 CA166666 (to S.E. Hankinson, Y.J. Heng, A.L. Stancu, and M.E. Pyle); Komen grant SAC110014 (to S.E. Hankinson, Y.J. Heng, and A.L. Stancu); National Cancer Institute Predoctoral National Research Service Award F31CA192462 (to K.H. Kensler); National Cancer Institute Institutional National Research Service Award T32CA009001 (to K.H. Kensler); and the Klarman Family Foundation (to Y.J. Heng).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1. The Cancer Genome Atlas. Comprehensive molecular portraits of human breast tumours. Nature 2012;490:61–70.
2. Heng YJ, Lester SC, Tse GMK, Factor RE, Allison KH, Collins LC, et al. The molecular basis of breast cancer pathological phenotypes. J Pathol 2017;241:375–91.
3. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 2004;351:2817–26.
4. van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 2002;347:1999–2009.
5. Parker JS, Mullins M, Cheang MCU, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 2009;27:1160–7.
6. Caan BJ, Sweeney C, Habel LA, Kwan ML, Kroenke CH, Weltzien EK, et al. Intrinsic subtypes from the PAM50 gene expression assay in a population-based breast cancer survivor cohort: prognostication of short- and long-term outcomes. Cancer Epidemiol Biomarkers Prev 2014;23:725–34.
7. Coates AS, Winer EP, Goldhirsch A, Gelber RD, Gnant M, Piccart-Gebhart M, et al. Tailoring therapies-improving the management of early breast cancer: St. Gallen International Expert Consensus on the primary therapy of early breast cancer 2015. Ann Oncol 2015;26:1533–46.
8. Prat A, Parker JS, Karginova O, Fan C, Livasy C, Herschkowitz JI, et al. Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res 2010;12:R68.
9. Sabatier R, Finetti P, Guille A, Adelaide J, Chaffanet M, Viens P, et al. Claudin-low breast cancers: clinical, pathological, molecular and prognostic characterization. Mol Cancer 2014;13:228.
10. Lehmann-Che J, Hamy AS, Porcher R, Barritault M, Bouhidel F, Habuellelah H, et al. Molecular apocrine breast cancers are aggressive estrogen receptor negative tumors overexpressing either HER2 or GCDFP15. Breast Cancer Res 2013;15:R37.
11. Hu Z, Fan C, Oh DS, Marron JS, He X, Qaqish BF, et al. The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics 2006;7:96.
12. Wallden B, Storhoff J, Nielsen T, Dowidar N, Schaper C, Ferree S, et al. Development and verification of the PAM50-based Prosigna breast cancer gene signature assay. BMC Med Genomics 2015;8:54.
13. Guiu S, Michiels S, André F, Cortes J, Denkert C, Di Leo A, et al. Molecular subclasses of breast cancer: how do we define them? The IMPAKT 2012 working group statement. Ann Oncol 2012;23:2997–3006.
14. Tamimi RM, Colditz GA, Hazra A, Baer HJ, Hankinson SE, Rosner B, et al. Traditional breast cancer risk factors in relation to molecular subtypes of breast cancer. Breast Cancer Res Treat 2012;131:159–67.
15. Cheang MC, Voduc D, Bajdik C, Leung S, McKinney S, Chia SK, et al. Basal-like breast cancer defined by five biomarkers has superior prognostic value than triple-negative phenotype. Clin Cancer Res 2008;14:1368–76.
16. de Ronde JJ, Hannemann J, Halfwerk H, Mulder L, Straver ME, Vrancken Peeters MJ, et al. Concordance of clinical and molecular breast cancer subtyping in the context of preoperative chemotherapy response. Breast Cancer Res Treat 2010;119:119–26.
17. Bastien RR, Rodríguez-Lescure Á, Ebbert MTW, Prat A, Munárriz B, Rowe L, et al. PAM50 breast cancer subtyping by RT-qPCR and concordance with standard clinical molecular markers. BMC Med Genomics 2012;5:44.
18. Maisonneuve P, Disalvatore D, Rotmensz N, Curigliano G, Colleoni M, Dellapasqua S, et al. A revised clinico-pathological surrogate definition of luminal A intrinsic breast cancer subtype. Breast Cancer Res 2014;16:R65.
19. Cheang MCU, Chia SK, Voduc D, Gao D, Leung S, Snider J, et al. Ki67 index, HER2 status, and prognosis of patients with luminal B breast cancer. J Natl Cancer Inst 2009;101:736–50.
20. Prat A, Cheang MCU, Martín M, Parker JS, Carrasco E, Caballero R, et al. Prognostic significance of progesterone receptor-positive tumor cells within immunohistochemically defined luminal A breast cancer. J Clin Oncol 2013;31:203–9.
21. Allott EH, Geradts J, Cohen SM, Khoury T, Zirpoli GR, Bshara W, et al. Frequency of breast cancer subtypes among African American women in the AMBER consortium. Breast Cancer Res 2018;20:12.
22. Lusa L, McShane LM, Reid JF, De Cecco L, Ambrogi F, Biganzoli E, et al. Challenges in projecting clustering results across gene expression-profiling datasets. J Natl Cancer Inst 2007;99:1715–23.
23. Gendoo DMA, Ratanasirigulchai N, Schröder MS, Paré L, Parker JS, Prat A, et al. Genefu: an R/Bioconductor package for computation of gene expression-based signatures in breast cancer. Bioinformatics 2016;32:1097–9.
24. Patil P, Bachant-Winner PO, Haibe-Kains B, Leek JT. Test set bias affects reproducibility of gene signatures. Bioinformatics 2015;31:2318–23.
25. Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 2012;486:346–52.
26. Zhao X, Rødland EA, Tibshirani R, Plevritis S. Molecular subtyping for clinically defined breast cancer subgroups. Breast Cancer Res 2015;17:29.
27. Huang X, Stern DF, Zhao H. Transcriptional profiles from paired normal samples offer complementary information on cancer patient survival – evidence from TCGA pan-cancer data. Sci Rep 2016;6:20567.
28. Roman-Perez E, Casbas-Hernandez P, Pirone JR, Rein J, Carey LA, Lubet RA, et al. Gene expression in extratumoral microenvironment predicts clinical outcome in breast cancer patients. Breast Cancer Res 2012;14:R51.
29. Troester MA, Hoadley KA, D'Arcy M, Cherniack AD, Stewart C, Koboldt DC, et al. DNA defects, epigenetics, and gene expression in cancer-adjacent breast: a study from The Cancer Genome Atlas. NPJ Breast Cancer 2016;2:16007.
30. Casbas-Hernandez P, Sun X, Roman-Perez E, D'Arcy M, Sandhu R, Hishida A, et al. Tumor intrinsic subtype is reflected in cancer-adjacent tissue. Cancer Epidemiol Biomarkers Prev 2015;24:406–14.
31. Sisti JS, Collins LC, Beck AH, Tamimi RM, Rosner BA, Eliassen AH. Reproductive risk factors in relation to molecular subtypes of breast cancer: results from the Nurses' Health Studies. Int J Cancer 2016;138:2346–56.
32. Tamimi RM, Baer HJ, Marotti J, Galan M, Galaburda L, Fu Y, et al. Comparison of molecular phenotypes of ductal carcinoma in situ and invasive breast cancer. Breast Cancer Res 2008;10:R67.
33. Wang J, Heng YJ, Eliassen AH, Tamimi RM, Hazra A, Carey VJ, et al. Alcohol consumption and breast tumor gene expression. Breast Cancer Res 2017;19:108.
34. Nielsen TO, Parker JS, Leung S, Voduc D, Ebbert M, Vickery T, et al. A comparison of PAM50 intrinsic subtyping with immunohistochemistry and clinical prognostic factors in tamoxifen-treated estrogen receptor-positive breast cancer. Clin Cancer Res 2010;16:5222–32.
35. Collins LC, Marotti JD, Baer HJ, Tamimi RM. Comparison of estrogen receptor results from pathology reports with results from central laboratory testing. J Natl Cancer Inst 2008;100:218–21.
36. Hirko KA, Chen WY, Willett WC, Rosner BA, Hankinson SE, Beck AH, et al. Alcohol consumption and risk of breast cancer by molecular subtype: prospective analysis of the Nurses' Health Study after 26 years of follow-up. Int J Cancer 2016;138:1094–101.
37. Kuhn M. Building predictive models in R using the caret package. J Stat Softw 2008;28:1–26.
38. Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994;81:515–26.
39. Kwan ML, Kroenke CH, Sweeney C, Bernard PS, Weltzien EK, Castillo A, et al. Association of high obesity with PAM50 breast cancer intrinsic subtypes and gene expression. BMC Cancer 2015;15:278.
40. Barnard ME, Boeke CE, Tamimi RM. Established breast cancer risk factors and risk of intrinsic tumor subtypes. Biochim Biophys Acta 2015;1856:73–85.
41. Liu MC, Pitcher BN, Mardis ER, Davies SR, Friedman PN, Snider JE, et al. PAM50 gene signatures and breast cancer prognosis with adjuvant anthracycline- and taxane-based chemotherapy: correlative analysis of C9741 (Alliance). NPJ Breast Cancer 2016;2:15023.
42. Prat A, Cheang MCU, Galván P, Nuciforo P, Paré L, Adamo B, et al. Prognostic value of intrinsic subtypes in hormone receptor-positive metastatic breast cancer treated with letrozole with or without lapatinib. JAMA Oncol 2016;2:1287–94.
43. Carey LA, Perou CM, Livasy CA, Dressler LG, Cowan D, Conway K, et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA 2006;295:2492–502.
44. Chia SK, Bramwell VH, Tu D, Shepherd LE, Jiang S, Vickery T, et al. A 50-gene intrinsic subtype classifier for prognosis and prediction of benefit from adjuvant tamoxifen. Clin Cancer Res 2012;18:4465–72.
