Abstract
There is substantial variation in breast cancer survival rates, even among patients with similar clinical and genomic profiles. New biomarkers are needed to improve risk stratification and inform treatment options. Our aim was to identify novel miRNAs associated with breast cancer survival and quantify their prognostic value after adjusting for established clinical factors and genomic markers.
Using the Women's Healthy Eating and Living (WHEL) breast cancer cohort with >15 years of follow-up and archived tumor specimens, we assayed PAM50 mRNAs and 25 miRNAs using the Nanostring nCounter platform.
We obtained high-quality reads on 1,253 samples (75% of available specimens) and used an existing research-use algorithm to ascertain PAM50 subtypes and risk scores (ROR-PT). We identified miRNAs significantly associated with breast cancer outcomes and then tested these in independent TCGA samples. miRNAs that were also prognostic in TCGA samples were further evaluated in multiple regression Cox models. We also used penalized regression for unbiased discovery.
Two miRNAs, miR-210 and miR-29c, were associated with breast cancer outcomes in both the WHEL and TCGA studies and further improved risk stratification within PAM50 risk groups: among node-negative patients, 10-year disease-free survival was 62% in the high miR-210/high ROR-PT group versus 75% in the low miR-210/high ROR-PT group. Similar results were obtained for miR-29c. We identified three additional miRNAs, miR-187-3p, miR-143-3p, and miR-205-5p, via penalized regression.
Our findings suggest that miRNAs might be prognostic for long-term breast cancer survival and might improve risk stratification. Further research to incorporate miRNAs into existing clinicogenomic signatures is needed.
Introduction
miRNAs are small noncoding RNAs that posttranscriptionally regulate the expression of their target genes (1). miRNAs are involved in cell differentiation, regulation, and apoptosis, and miRNA dysregulation impacts development and progression of many cancers, including breast cancer (2–9). miRNAs can have tumor-suppressing or tumor-promoting roles, and these functions may differ depending on tumor histopathology (8, 9).
In recent years, the potential of dozens of miRNAs for diagnosis, treatment response prediction, and prognosis of breast cancer has been studied (5, 8, 10–14); however, few have been examined in cohorts with long-term follow-up. Currently, there are several well-established clinical prognostic factors for breast cancer survival, such as tumor size, nodal status, grade, and histopathology (15), and a few genomic signatures, for example, Oncotype DX, Mammaprint (16–19), ProSigna (20, 21), and the research-based PAM50 signature (22–24). However, even with the existing clinicogenomic tools, an estimated 25%–35% of patients with breast cancer are classified as intermediate risk (20, 21, 25–27), and for these patients, the long-term risk of relapse remains uncertain. Thus, identifying novel biomarkers such as miRNAs that independently predict breast cancer outcomes could lead to better risk stratification and possibly improve treatment management strategies, potentially sparing aggressive chemotherapy for some, while offering targeted therapies for others.
In this study, we used >1,200 archived tumor samples from the Women's Healthy Eating and Living (WHEL) breast cancer cohort with 15+ years of follow-up (28), to investigate 25 miRNA predictors of long-term disease-free survival and breast cancer mortality. We developed models that included standard clinical factors and a research-use published PAM50 signature (29), as well as new miRNA features. We adopted two statistically rigorous complementary paradigms. First, we used an external validation approach in which we tested the prognostic value of the 25 miRNAs using the publicly available TCGA database (30); miRNAs that were prognostic in the WHEL samples and independently prognostic in the TCGA database were considered hits, and evaluated for prognostic value over and above clinical factors and PAM50 subtypes. Second, we implemented an unbiased approach via penalized regression to select the most prognostic features from among clinical factors, PAM50 subtypes, and the 25 individual miRNAs. In summary, the first approach is similar to the usual model building methods to evaluate the incremental effects of new candidate predictors, after adjusting for known prognosticators. However, we imposed stringent criteria for inclusion in multivariate models, namely requiring independent validation via TCGA. The second penalized regression method, by including all variables simultaneously rather than incrementally, has the potential to discover new features but avoids overfitting via a penalty term.
Materials and Methods
Study sample
The Women's Healthy Eating and Living (WHEL) Study was a randomized-controlled dietary secondary prevention trial of 3,088 early-stage breast cancer survivors (28, 31). Women were recruited between 1995 and 2000, and were eligible provided they were within 4 years of diagnosis of their primary operable invasive stage I (≥1 cm), stage II, or stage III breast carcinoma (28), were aged 18 to 70 years at diagnosis, and had completed primary treatment for breast cancer. Details on inclusion and exclusion criteria are provided in Pierce and colleagues (2002; ref. 28).
An archival, formalin-fixed, paraffin-embedded (FFPE) block of breast tumor tissue was solicited from local hospitals for each WHEL participant. Blocks were sent to the WHEL Study Coordinating Center, where the Histology Core Lab at the UC San Diego Cancer Center (La Jolla, CA) cut ten unstained slides (5 μm each) and one hematoxylin and eosin (H&E)-stained slide. Each unstained slide was dipped in paraffin for preservation before storage. The study pathologist reviewed each pathology report with the accompanying H&E slide to confirm that invasive tumor was present on the slide that corresponded to the tumor described in the participant's pathology report. On occasion, the block sent to the coordinating center did not contain invasive tumor. In such cases, the unsuitable block was returned, the pathology report was reviewed and another block was requested. FFPE tissue from the primary tumor was available for 56% (n = 1,723) of the WHEL cohort. The final analysis for this investigation was based on 1,253 participants. As the dietary intervention produced no treatment effect (31), we treated the study population as a single cohort. We obtained IRB approval from participating institutions, and written informed consent from all participants, that included consent for genomic analysis.
Study endpoints
In this study, we evaluated two primary outcomes: (i) a breast cancer event (locoregional recurrence, metastasis, or contralateral breast cancer), and (ii) death from breast cancer. We also considered overall survival, that is, death from any cause, as a secondary endpoint. Events were independently adjudicated by two breast oncologists. Carcinoma in situ was not counted as a breast cancer event. The WHEL study ceased active surveillance for cancer events in 2010; since then, deaths were ascertained via annual searches of the National Death Index. Time from diagnosis to a second breast cancer event defined the disease-free survival outcome; time from diagnosis to breast cancer death defined the breast cancer survival outcome. Time-to-event was censored at death (from non–breast cancer causes), last contact, or end of follow-up (2010 for breast cancer events, 2015 for death).
Nucleic acid extraction
The H&E-stained slide was used for histopathologic review and to guide tumor macrodissection of sections from four unstained slides. The unstained slides from samples with ≥40% tumor cellularity were incubated at 65°C for 30 minutes and deparaffinized using Citrisolv (Thermo Fisher Scientific) followed by ethanol wash. Tumor tissues were macrodissected from the slides into RNase-free microfuge tubes, and nucleic acids were isolated using the Qiagen AllPrep FFPE kit (#80234). Manufacturer's instructions were followed with the exception that the proteinase K digestion step was extended to an overnight incubation for the DNA isolation. Total RNA and DNA were quantified using the Invitrogen Qubit and corresponding quantification kits. Total RNA was used for the miRGE assay (see below). DNA pellets were stored at −80°C for future use.
mRNA and miRNA quantification
Transcript expression was quantified with 250 ng of total RNA using the NanoString nCounter analysis system with a custom miRGE CodeSet, which included probes for 55 PAM50 genes (including 5 housekeepers) and 25 miRNA targets (Supplementary Table S1). The choice of miRNA targets was based on a review of the literature for promising targets at the time of initiation of the WHEL-genomic substudy. In particular, 25 miRNAs (Supplementary Table S1) were identified as prognostic for breast cancer in studies with moderate-to-large sample sizes from published reports (1, 2, 11–14). Probe selection (miRNA) for the custom miRGE CodeSet was determined through screening a set of well-characterized breast cancer tumor samples using the NanoString Human miRNA assay V2.1 panel that included 800 targets. Positive and negative controls were also included, and assay reactions were assembled per manufacturer's specifications (NanoString Technologies).
Statistical approach
Normalization and data processing.
miRNA expression values were normalized using geometric means of the two miRNA positive controls to eliminate assay technical variation, and then subtracting the maximum value of eight mRNA negative controls to eliminate background effect. Expression values were log2-transformed to reduce skewness and quantile normalized to reduce batch effects. As a sensitivity analysis, normalization was also conducted using three different sets of housekeepers: the five mRNA housekeepers used to normalize the mRNA values, the five most highly expressed miRNAs, and three putative housekeeper miRNAs with very high expression values and low SDs; results were consistent across normalization methods (32, 33).
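The normalization steps described above can be sketched as follows. This is an illustrative sketch only, not the pipeline used in the study: the function names, the choice to rescale each sample to the cohort-average positive-control level, the flooring of background-corrected counts at 1 before the log2 transform, and the row-wise (per-sample) quantile normalization are all assumptions made for the example.

```python
import numpy as np

def normalize_counts(counts, pos_controls, neg_controls):
    """Positive-control scaling, background subtraction, then log2.

    counts:       (n_samples, n_mirnas) raw counts
    pos_controls: (n_samples, n_pos) positive-control counts
    neg_controls: (n_samples, n_neg) negative-control counts
    """
    counts = np.asarray(counts, dtype=float)
    pos = np.asarray(pos_controls, dtype=float)
    neg = np.asarray(neg_controls, dtype=float)
    # Per-sample geometric mean of the positive controls
    geo_mean = np.exp(np.log(pos).mean(axis=1))
    # Assumption: rescale every sample to the cohort-average control level
    scale = geo_mean.mean() / geo_mean
    scaled = counts * scale[:, None]
    # Background = per-sample maximum of the (equally scaled) negative controls
    background = (neg * scale[:, None]).max(axis=1)
    # Assumption: floor at 1 so the log2 transform is defined
    corrected = np.maximum(scaled - background[:, None], 1.0)
    return np.log2(corrected)

def quantile_normalize(x):
    """Give every sample (row) the same empirical distribution."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=1), axis=1)
    reference = np.sort(x, axis=1).mean(axis=0)  # mean of the k-th smallest values
    return reference[ranks]

# Toy data: sample 2 was assayed at half the efficiency of sample 1
raw = np.array([[100.0, 200.0, 400.0], [50.0, 100.0, 200.0]])
pos = np.array([[1000.0, 4000.0], [500.0, 2000.0]])
neg = np.array([[5.0, 10.0], [2.0, 5.0]])
log_expr = normalize_counts(raw, pos, neg)
normalized = quantile_normalize(log_expr)
```

In the toy data the two samples differ only by a technical scaling factor, so positive-control scaling brings them to the same values before the log2 transform.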
Expression of the 50 PAM50 mRNAs was normalized to negative and positive controls, and standardized to five housekeepers, per standard practice (29). The published PAM50 algorithm (29) was used to classify each subject into an intrinsic subtype: Luminal A, Luminal B, basal-like, Her2-enriched, or normal-like. Prior to implementing this algorithm, mRNA values were platform-adjusted (34). Risk-of-recurrence (ROR-PT) scores were calculated (29) and categorized into low-, medium-, and high-risk strata.
Individual miRNA and breast cancer outcomes.
Associations between individual miRNAs and breast cancer outcomes were investigated in Cox models with and without adjustment for tumor characteristics (i.e., tumor stage, tumor grade, and age at diagnosis). Cox models with delayed entry were used to account for varying times from cancer diagnosis to study entry. HRs and 95% confidence intervals (CI) were computed; each HR corresponds to the multiplicative change in hazard (equivalently, the additive change in log-hazard) per 1-unit increase in log2 of the miRNA value, that is, per doubling of the miRNA level. Proportional hazards assumptions were assessed using scaled Schoenfeld residuals. We also used Bonferroni correction for multiple comparisons when evaluating 25 miRNAs, and present the corrected and uncorrected results.
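To make the interpretation of these quantities concrete, the short sketch below works through three pieces of arithmetic implied by the paragraph above: the HR for an arbitrary fold change under the log2-linear model, the Bonferroni threshold for 25 tests, and the delayed-entry rule for membership in a risk set. The function names are hypothetical, and the example HR of 1.23 per doubling is taken from Table 2 (miR-143-3p, disease-free survival); the entry and exit times are illustrative values, not patient data.

```python
import math

def hazard_ratio_for_fold_change(hr_per_doubling, fold_change):
    """HR implied by a given fold change when the log-hazard is linear in log2 expression."""
    return hr_per_doubling ** math.log2(fold_change)

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def in_risk_set(entry_time, exit_time, t):
    """Delayed entry: a subject counts as at risk at time t (years since diagnosis)
    only after study entry and up to the time of event or censoring."""
    return entry_time < t <= exit_time

# A 4-fold higher miRNA level is two doublings, so the HR is squared
hr_4fold = hazard_ratio_for_fold_change(1.23, 4.0)
# Threshold applied to each of the 25 miRNA tests at an overall alpha of 0.05
threshold = bonferroni_threshold(0.05, 25)
# A woman enrolled 1.8 years after diagnosis is not at risk at year 1
at_risk_year_1 = in_risk_set(1.8, 9.5, 1.0)
at_risk_year_5 = in_risk_set(1.8, 9.5, 5.0)
```

The risk-set rule is what distinguishes a delayed-entry Cox model from a standard one: follow-up time before enrollment contributes no person-time, which avoids counting the period during which a participant could not, by design, have been observed to fail.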
External validation of prognostic miRNA.
To assess reproducibility of the WHEL miRNA findings, we downloaded TCGA miRNA data for patients with primary breast cancer through the Bioconductor package TCGAbiolinks (30). As with the WHEL samples, TCGA miRNAs were log2-transformed and normalized; delayed-entry Cox models were used to assess the association between miRNAs and disease-free survival, time to breast cancer–related death, or overall survival within the TCGA samples.
Additional prognostic value of TCGA-validated miRNAs in clinicogenomic models.
After adjusting for standard clinical features (i.e., tumor stage, tumor grade, and age at diagnosis), we identified miRNAs that were prognostic in both the WHEL and TCGA analysis. These miRNAs were then added to models that included PAM50 subtypes and standard clinical features in the WHEL cohort. To assess potential clinical utility, Kaplan–Meier curves and score tests were used to examine whether these miRNAs further delineated PAM50 ROR-PT risk categories, using the median miRNA values to further divide the ROR-PT risk strata. Thus, rather than adjustment for multiple comparisons and the attendant loss in power, we implemented a rigorous external validation framework for testing predictive value of the miRNAs over and above PAM50 subtypes.
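The median split and Kaplan–Meier comparison described above can be illustrated with a minimal sketch. It assumes that the cut point is the overall cohort median of the miRNA (the text does not state whether the median was computed within strata), and the estimator shown ignores delayed entry for brevity; the function names and toy data are hypothetical.

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimate; returns (distinct event times, survival probabilities)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    survival = []
    s = 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)          # still under observation just before t
        n_events = np.sum((time == t) & event)
        s *= 1.0 - n_events / n_at_risk
        survival.append(s)
    return event_times, np.array(survival)

def split_by_median(mirna, ror_category):
    """Label each subject '<ROR-PT category>/<low|high>' by the cohort median miRNA value."""
    mirna = np.asarray(mirna, dtype=float)
    cut = np.median(mirna)
    half = np.where(mirna > cut, "high", "low")
    return np.array([f"{r}/{h}" for r, h in zip(ror_category, half)])

# Toy follow-up data (years) with event indicators (1 = event, 0 = censored)
times, surv = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
groups = split_by_median([1.0, 2.0, 3.0, 4.0], ["high", "high", "medium", "medium"])
```

Applying `kaplan_meier` separately to each label returned by `split_by_median` gives one curve per ROR-PT by miRNA group, which is the kind of comparison the score test then evaluates.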
Unbiased selection and internal validation.
To examine all the prognostic factors on an equal footing and implement unbiased variable selection, we used penalized regression. We included all variables: clinical factors, PAM50 subtypes, and 25 individual miRNAs in the initial model and used a lasso method for variable selection within the Cox regression (35). The tuning parameter λ was chosen by 10-fold cross-validation to minimize model deviance.
In summary, given the moderately large set of miRNAs, we used two complementary methodologic approaches for selecting miRNAs for further multivariate analysis. The first method used external validation: only those miRNAs that were also prognostic in TCGA samples were evaluated for their incremental prognostic value, after adjusting for clinical features and PAM50 subtypes. The second approach used penalized regression (lasso) with all variables included in the model and used cross-validation for feature selection, thus implementing stringent variable selection within the WHEL cohort. Furthermore, the penalized approach examines all predictors simultaneously and could potentially discover new features.
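To illustrate how a lasso penalty performs variable selection within a Cox model, the sketch below fits an L1-penalized Cox partial likelihood by proximal gradient descent. It is a teaching example under several assumptions, not the software used in the study: it ignores delayed entry, uses the Breslow form for ties, takes the tuning parameter `lam` as given rather than choosing it by 10-fold cross-validation, and all function names and the simulated data are hypothetical.

```python
import numpy as np

def cox_loss_and_gradient(beta, X, time, event):
    """Average negative log partial likelihood (Breslow form) and its gradient."""
    eta = X @ beta
    shift = eta.max()                          # stabilizes the exponentials
    w = np.exp(eta - shift)
    # at_risk[i, j] is True when subject j is still under observation at time[i]
    at_risk = time[None, :] >= time[:, None]
    denom = at_risk @ w
    weighted_mean_x = (at_risk * w) @ X / denom[:, None]
    n = len(time)
    loss = -np.sum(event * (eta - shift - np.log(denom))) / n
    grad = -np.sum(event[:, None] * (X - weighted_mean_x), axis=0) / n
    return loss, grad

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink toward zero, set small values to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cox(X, time, event, lam, n_iter=200, step=1.0):
    """Minimize loss(beta) + lam * ||beta||_1 by proximal gradient with backtracking."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        loss, grad = cox_loss_and_gradient(beta, X, time, event)
        while True:
            candidate = soft_threshold(beta - step * grad, step * lam)
            diff = candidate - beta
            new_loss, _ = cox_loss_and_gradient(candidate, X, time, event)
            # Accept the step once the quadratic upper bound holds; otherwise halve it
            if new_loss <= loss + grad @ diff + (diff @ diff) / (2 * step) + 1e-12:
                break
            step *= 0.5
        beta = candidate
    return beta

# Simulated data: only the first of three predictors affects the hazard
rng = np.random.default_rng(0)
X_sim = rng.normal(size=(200, 3))
time_sim = rng.exponential(1.0 / np.exp(X_sim[:, 0]))
event_sim = np.ones(200)
beta_hat = lasso_cox(X_sim, time_sim, event_sim, lam=0.01)
```

Coefficients that are shrunk exactly to zero correspond to predictors that are not selected. A larger `lam` selects fewer predictors, which is why the choice of `lam` by cross-validation governs how parsimonious the final model is.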
Results
FFPE samples
Of the 1,723 FFPE samples, 25% had low tumor cellularity or low RNA content prohibiting further analysis. Gene expression was obtained on 1,291 samples; of these, 38 were eliminated because of outliers or poor quality reads, resulting in a final sample of N = 1,253 for statistical analysis.
Clinical and demographic characteristics
The distribution of demographic and clinical characteristics in the WHEL genomic substudy was similar to that in the parent study (N = 3,088; ref. 31). Women in the substudy had a median age of 50 years at cancer diagnosis; 85% were White, 36% had stage I and 46% had stage II tumors, three-quarters had ER+ histopathology, and 16% had triple-negative histopathology (Table 1). There were 303 breast cancer events (locoregional recurrence, metastasis, or contralateral breast cancer), 219 deaths due to breast cancer, and 316 total deaths.
Age at breast cancer diagnosis | |
Median (range) | 50 (27–70) |
Race/ethnicity N (%) | |
White | 1,060 (84.6%) |
Black | 45 (3.6%) |
Hispanic | 85 (6.8%) |
Asian | 31 (2.5%) |
Other | 32 (2.6%) |
Stage N (%) | |
I | 453 (36.2%) |
IIA | 432 (34.5%) |
IIB | 144 (11.5%) |
IIIA | 166 (13.2%) |
IIIC | 58 (4.6%) |
Nodal status N (%) | |
Negative | 702 (56%) |
Positive | 551 (44%) |
Tumor size (cm) | |
Mean (SD) | 2.3 (1.44) |
Grade N (%) | |
Poorly differentiated | 497 (39.7%) |
Moderately differentiated | 496 (39.6%) |
Well differentiated | 159 (12.7%) |
Unspecified | 101 (8.1%) |
Histopathology N (%) | |
ER+ | 909 (73.7%) |
PR+ | 809 (66.4%) |
Her2+ | 217 (17.3%) |
Triple negative | 199 (15.9%) |
Years diagnosis to study entry | |
Median (25th, 75th)%-iles | 1.8 (1.03–2.8) |
Chemotherapy and antiestrogen therapy N (%) | |
Yes; yes | 590 (47.1%) |
Yes; no | 314 (25.1%) |
No; yes | 258 (20.6%) |
No; no | 76 (6.1%) |
Yes; unknown | 5 (0.4%) |
No; unknown | 9 (0.7%) |
Outcomes | |
Breast cancer events (N) | 303 |
Disease-free survival (years) | |
Median (25th, 75th)%-iles | 9.5 (6.7–11.3) |
Breast cancer deaths (N) | 219 |
Breast cancer survival (years) | |
Median (25th, 75th)%-iles | 16.8 (15.3–18.2) |
Prognostic value of individual miRNAs for disease-free survival, breast cancer survival, and overall survival
We examined associations between each of the 25 miRNAs and outcomes, after adjusting for stage, grade, and age at diagnosis (Table 2). The statistically significant results with HR (95% CI) were as follows: three miRNAs were significantly associated with disease-free survival, namely, miRNA 27b-3p: 1.13 (1.01–1.27), miRNA 210-3p: 1.1 (1.01–1.2), and miRNA 143-3p: 1.23 (1.08–1.41); six miRNAs were associated with breast cancer survival, namely, miRNA 150-5p: 0.91 (0.84–0.99), miRNA 16-5p: 1.18 (1.02–1.37), miRNA 205-5p: 0.93 (0.88–0.99), miRNA 29c-3p: 1.2 (1.05–1.37), miRNA 27b-3p: 1.2 (1.04–1.38), and miRNA 143-3p: 1.28 (1.1–1.51); and four miRNAs were associated with overall survival, namely, miRNA 150-5p: 0.93 (0.87–0.99), miRNA 29c-3p: 1.19 (1.07–1.33), miRNA 187-3p: 1.09 (1.01–1.17), and miRNA 143-3p: 1.14 (1–1.3). Of these, the associations of miRNA 143-3p with disease-free and breast cancer survival, and of miRNA 29c-3p with overall survival, were still significant after Bonferroni correction for 25 miRNA tests. Unadjusted associations between each of the 25 miRNAs and each breast cancer outcome were similar to the adjusted analysis and are displayed in Supplementary Fig. S1.
miRNA | Disease-free survival (303 events): HR (95% CI) | Pb | Breast cancer survival (219 events): HR (95% CI) | Pb | Overall survival (316 events): HR (95% CI) | Pb |
---|---|---|---|---|---|---|
93.5p | 1.07 (0.96–1.19) | 0.202 | 1.1 (0.96–1.25) | 0.169 | 1.04 (0.95–1.15) | 0.381 |
150.5p | 0.94 (0.87–1.01) | 0.072 | 0.91 (0.84–0.99) | 0.025 | 0.93 (0.87–0.99) | 0.033 |
128.3p | 1 (0.94–1.08) | 0.903 | 1.03 (0.94–1.13) | 0.531 | 1.01 (0.94–1.08) | 0.850 |
141.3p | 1.07 (0.97–1.19) | 0.160 | 1.11 (0.99–1.26) | 0.079 | 1.08 (0.98–1.19) | 0.127 |
21.5p | 1.04 (0.93–1.17) | 0.486 | 1.14 (0.98–1.31) | 0.083 | 1.06 (0.94–1.18) | 0.336 |
1246 | 1.12 (0.98–1.28) | 0.101 | 1.03 (0.88–1.2) | 0.722 | 1.06 (0.93–1.22) | 0.353 |
16.5p | 1.11 (0.99–1.25) | 0.082 | 1.18 (1.02–1.37) | 0.028 | 1.12 (1–1.26) | 0.057 |
205.5p | 0.96 (0.91–1.02) | 0.173 | 0.93 (0.88–0.99) | 0.021 | 0.95 (0.9–1) | 0.056 |
342.3p | 0.98 (0.89–1.07) | 0.649 | 1 (0.89–1.12) | 0.949 | 1.01 (0.92–1.1) | 0.862 |
29c.3p | 1.09 (0.97–1.21) | 0.139 | 1.2 (1.05–1.37) | 0.009 | 1.19 (1.07–1.33) | 0.002c |
27b.3p | 1.13 (1.01–1.27) | 0.037 | 1.2 (1.04–1.38) | 0.012 | 1.05 (0.95–1.17) | 0.326 |
187.3p | 1.03 (0.96–1.11) | 0.348 | 1.08 (0.99–1.17) | 0.085 | 1.09 (1.01–1.17) | 0.021 |
26b.5p | 1.03 (0.89–1.2) | 0.695 | 1.08 (0.9–1.3) | 0.393 | 1.05 (0.91–1.22) | 0.508 |
15a.5p | 1.06 (0.92–1.22) | 0.449 | 1.08 (0.9–1.29) | 0.423 | 1.08 (0.93–1.24) | 0.301 |
221.3p | 1.09 (1–1.19) | 0.062 | 1.1 (0.99–1.23) | 0.071 | 1.01 (0.93–1.09) | 0.809 |
494.3p | 0.95 (0.84–1.09) | 0.497 | 0.93 (0.8–1.09) | 0.390 | 0.99 (0.87–1.12) | 0.845 |
30c.5p | 1.05 (0.95–1.16) | 0.374 | 1.01 (0.9–1.14) | 0.868 | 0.99 (0.9–1.08) | 0.743 |
769.5p | 0.97 (0.88–1.06) | 0.489 | 1.01 (0.9–1.13) | 0.873 | 1.01 (0.92–1.11) | 0.830 |
210.3p | 1.1 (1.01–1.2) | 0.027 | 1.06 (0.96–1.18) | 0.226 | 1.04 (0.96–1.13) | 0.375 |
10b.5p | 1.07 (0.98–1.16) | 0.138 | 1.1 (0.99–1.23) | 0.065 | 1.05 (0.96–1.14) | 0.286 |
143.3p | 1.23 (1.08–1.41) | 0.002c | 1.28 (1.1–1.51) | 0.002c | 1.14 (1–1.3) | 0.046 |
7.5p | 1.03 (0.96–1.11) | 0.392 | 0.99 (0.91–1.08) | 0.772 | 1.02 (0.96–1.1) | 0.490 |
200b.3p | 1.04 (0.94–1.15) | 0.426 | 1.08 (0.95–1.22) | 0.250 | 1.08 (0.97–1.2) | 0.142 |
135a.5p | 0.98 (0.94–1.02) | 0.315 | 0.99 (0.94–1.04) | 0.595 | 0.98 (0.94–1.03) | 0.480 |
126.3p | 1.06 (0.95–1.2) | 0.307 | 1.11 (0.96–1.3) | 0.166 | 1.01 (0.9–1.12) | 0.922 |
NOTE: The bold-faced P values are statistically significant at the P <0.05 level.
aAdjusted for tumor stage, tumor grade, age at diagnosis.
bP values are not adjusted for multiple comparisons.
cStatistically significant after Bonferroni adjustment for 25 miRNAs.
External validation with TCGA
We next evaluated the prognostic value of each of the 25 miRNAs for each breast cancer outcome using TCGA database (30). To ensure comparability with WHEL samples, we only included early stage (I, II, III) TCGA breast cancer samples, resulting in a total TCGA sample size of 1,034. The median age at diagnosis of TCGA participants was 58 (range: 26–90) years; 76.4% had stages I and II, 73.8% were ER+, 64% were PR+, 14.9% were Her2+, and 10.4% were triple negative. These characteristics were comparable to the WHEL genomics substudy (Table 1). In TCGA samples, there were 171 breast cancer recurrences, and 123 deaths with 63 known breast cancer–specific deaths.
In TCGA analysis, after adjustment for age and stage, four miRNAs, namely 1246, 135a.2, 210, and 29c, were significantly associated with the three survival outcomes, and miR-342 was associated with breast cancer and overall mortality at the 5% significance level (Supplementary Table S2). Of these, miR-210 and miR-29c were also significantly prognostic in the WHEL cohort (Table 2). Complete results for all miRNAs are presented in Supplementary Table S2 and Supplementary Fig. S1.
Added prognostic value of TCGA-validated miRNAs in the WHEL study
Using the WHEL cohort, we next examined the TCGA-validated miRNAs, miR-210 and miR-29c, in multiple regression Cox models, after adjusting for clinical factors and PAM50 subtypes; separate multiple regression models were fit for miR-210 and miR-29c (Table 3). As expected, tumor stage was strongly associated with survival outcomes with HRs increasing from 1.26 to 6.26 as stage increased. After adjusting for stage, grade, and age at diagnosis, the luminal B subtype remained a significant prognostic factor with HR >1.5 in all models. The HR (95% CI) for miR-210 was 1.09 [(1–1.2), P = 0.05] for disease-free survival, 1.07 [(0.96–1.19), P = 0.22] for breast cancer survival, and 1.04 [(0.95–1.13), P = 0.4] for overall survival after adjusting for clinical variables and PAM50 subtypes. The HR (95% CI) for miR-29c was 1.08 [(0.95–1.22), P = 0.22] for disease-free survival, 1.17 [(1.01–1.37), P = 0.04] for breast cancer survival, and 1.16 [(1.02–1.31), P = 0.02] for overall survival after adjusting for clinical variables and PAM50 subtypes. Thus, even if not consistently significant at the 5% level, the HRs for these miRNAs were similar to their HRs in the models that did not include PAM50 subtypes (Table 2), suggesting that these miRNAs are likely independent predictors of breast cancer outcomes.
Predictors | Disease-free survival (N = 295 relapse events)a: HR (95% CI) | Breast cancer survival (N = 212 breast cancer deaths)a: HR (95% CI) | Overall survival (N = 306 deaths)a: HR (95% CI) |
---|---|---|---|
Model for miR-210 | |||
Tumor stage | |||
I (Ref) | 1.0 | 1.0 | 1.0 |
IIA | 1.72 (1.24–2.38) | 1.90 (1.25–2.89) | 1.26 (0.93–1.71) |
IIB | 2.47 (1.67–3.65) | 3.09 (1.92–4.97) | 1.83 (1.26–2.65) |
IIIA | 3.52 (2.46–5.02) | 4.52 (2.92–7) | 2.36 (1.67–3.33) |
IIIC | 4.00 (2.52–6.36) | 6.26 (3.7–10.6) | 3.75 (2.44–5.77) |
PAM50 subtype | |||
Luminal A (Ref) | 1.0 | 1.0 | 1.0 |
Basal | 1.14 (0.79–1.65) | 0.94 (0.6–1.47) | 0.9 (0.61–1.32) |
Her2 | 0.91 (0.60–1.39) | 0.86 (0.52–1.42) | 0.97 (0.64–1.45) |
LuminalB | 1.55 (1.16–2.08) | 1.64 (1.17–2.31) | 1.58 (1.2–2.09) |
MiRNA 210 | 1.09 (1–1.2) | 1.07 (0.96–1.19) | 1.04 (0.95–1.13) |
Model for miR-29c | |||
Tumor stage | |||
I (Ref) | 1.0 | 1.0 | 1.0 |
IIA | 1.73 (1.25–2.4) | 1.87 (1.23–2.85) | 1.24 (0.92–1.68) |
IIB | 2.47 (1.67–3.65) | 3.05 (1.89–4.92) | 1.8 (1.24–2.61) |
IIIA | 3.37 (2.35–4.82) | 4.27 (2.75–6.62) | 2.25 (1.59–3.17) |
IIIC | 3.95 (2.48–6.3) | 6.05 (3.57–10.26) | 3.61 (2.35–5.56) |
PAM50 subtype | |||
LuminalA (Ref) | 1.0 | 1.0 | 1.0 |
Basal | 1.36 (0.92–2.0) | 1.23 (0.77–1.97) | 1.13 (0.75–1.7) |
Her2 | 1.01 (0.66–1.54) | 0.99 (0.6–1.62) | 1.08 (0.72–1.63) |
LuminalB | 1.58 (1.18–2.12) | 1.66 (1.18–2.32) | 1.58 (1.2–2.09) |
MiRNA 29c | 1.08 (0.95–1.22) | 1.17 (1.01–1.37) | 1.16 (1.02–1.31) |
aModels also adjusted for age at diagnosis and grade; patients with a normal PAM50 subtype were excluded.
ROR-PT categories, externally validated miRNA, and survival
To assess potential clinical utility, we tested whether TCGA-validated miRNAs could further discriminate PAM50 ROR-PT risk categories. For this analysis, we dichotomized each miRNA at the median value, and used Kaplan–Meier curves and score statistics to compare survival rates in the ROR-PT by miRNA categories. On the basis of the model results (Table 3), we evaluated miR-210 for disease-free survival, and miR-29c for breast cancer survival. Figure 1 gives the Kaplan–Meier curves for ROR-PT*miRNA categories stratified by nodal status. In the node-negative stratum (Fig. 1, top), miR-210*ROR-PT groups had significantly different survival rates (P < 0.001): women with high miR-210 levels and high ROR-PT scores had notably worse disease-free survival (10-year rate, 62%) compared with those with low miR-210 levels and high ROR-PT scores (10-year rate, 75%). Similar results were observed for the node-positive group (Fig. 1B), although survival differences across ROR-PT*miR-210 strata were only marginally significant (P = 0.06). Interestingly, in the node-positive group, the ROR-PT-high + miR-210-low subgroup had similar survival to the ROR-PT medium-risk group. Thus, miR-210 expression delineated risk for the ROR-PT high-risk category, identifying a subgroup with very poor prognosis.
ROR-PT*miR-29c categories were significantly associated with breast cancer survival for node-negative (P < 0.001) and node-positive strata (P = 0.006). For the node-negative high ROR-PT group (Fig. 1, bottom), low levels of miR-29c were associated with higher survival rates compared with those with high miR-29c levels (10-year survival rate, 91% vs. 72%). Interestingly, for the node-positive ROR-PT medium-risk group, those with low miR-29c levels had similar survival rates to the node-positive low-ROR-PT risk group, suggesting that miR-29c was able to identify a low-risk phenotype within the ROR-PT medium-risk category.
Unbiased selection with internal validation results
Variable selection models were built via lasso-penalized regression to predict disease-free survival, breast cancer–related survival, and overall survival. Candidate predictors included age at diagnosis, tumor stage, tumor grade, PAM50 subtypes, and 25 miRNAs. Tumor stage and PAM50 subtypes were consistently selected across the outcomes. No miRNAs were selected for disease-free survival. For breast cancer survival, the selected miRNAs with HR (95% CI) after adjusting for stage, grade, and PAM50 subtype were as follows: miR-143, 1.31 (1.12–1.54), P < 0.001; miR-205, 0.95 (0.89–1.01), P = 0.09. For overall survival, miR-29c, 1.16 (1.02–1.32), P = 0.02; miR-187, 1.09 (1.01–1.17), P = 0.02 were selected. We note that the 95% CIs do not account for the variable selection and should be interpreted with caution. The estimated HRs, however, do accurately reflect the predictive ability of the miRNA for breast cancer outcomes.
Discussion
Despite major advances in clinicogenomic risk classification tools, there is still substantial variation in breast cancer relapse and survival rates in subgroups with similar risk profiles. New biomarkers are needed to improve risk stratification and inform treatment options, especially for women who are currently classified as having an intermediate-to-high risk of relapse. In this work, we identified two miRNAs, miR-210 and miR-29c, that were associated with breast cancer outcomes, after adjusting for tumor characteristics and the PAM50 molecular subtypes, in a large breast cancer cohort with >15 years of follow-up.
miR-210-3p has been extensively studied and is involved in breast cancer cell migration, proliferation, and invasion (36). Concordant with our findings, miR-210 overexpression was associated with worse prognosis in multiple studies and systematic reviews (4, 5, 37–39) and likely characterizes an aggressive phenotype. The miR-29 family is reported to exhibit both tumor-suppressing and tumor-promoting roles in breast cancer (40–43). In our study, higher levels of miR-29c were associated with worse prognosis, in contrast to published findings in triple-negative breast cancer, where higher miR-29c levels were associated with better prognosis (44–47). However, it is notable that <16% of our sample had triple-negative tumors, which could explain the discrepancy. Importantly, we found that both miR-210 and miR-29c were able to improve risk stratification within ROR-PT categories, with miR-210 identifying a very high-risk subgroup within the ROR-PT high-risk stratum, and miR-29c delineating a low-risk group in the ROR-PT medium-risk category. These findings could inform treatment guidelines and research, for example, by motivating new and targeted therapies for women with high miR-210-high ROR-PT disease, while possibly sparing women with low miR-29c-medium ROR-PT disease from aggressive therapies.
We also implemented an unbiased approach to discover novel markers. To maintain rigor and reduce overfitting, we used internally cross-validated, penalized regression in which all clinical variables, miRNAs, and PAM50 subtypes were included as predictors. This analysis identified a parsimonious prognostic set that included three additional miRNAs: miR-187-3p, miR-143-3p, and miR-205-5p. Higher levels of miR-187-3p and miR-143-3p, and lower levels of miR-205-5p, were associated with shorter survival times. miR-205, an oncosuppressor, was previously reported to reduce invasion and was associated with better prognosis (5, 10). This is consistent with the marginally significant protective effect (HR = 0.94) for breast cancer survival observed in our study. Also, similar to our results, miR-187 has been associated with breast cancer progression and worse survival (5, 48). Reports on the role of miR-143 in breast cancer are mixed (49), with several laboratory studies indicating a tumor-suppressive effect (50–54), while others suggest tumorigenic effects (55). To our knowledge, our finding that miR-143 is associated with worse prognosis in long-term breast cancer survival after adjusting for clinical factors and PAM50 subtypes has not been reported previously.
We note that miRNA–outcome associations were not consistently statistically significant across the three outcomes in our WHEL analysis: for example, miR-150 was significantly associated with breast cancer and overall mortality, but not disease-free survival, whereas miR-27b was significantly associated with disease-free and breast cancer survival but not overall survival. While determining reasons for these discrepancies is beyond our scope, we note that the estimated HRs for all three outcomes were in the same direction, with similar effect sizes in most cases. Also, there were discrepancies between the lists of prognostic miRNAs in the WHEL and TCGA samples. We conjecture that these could be due to different cohort characteristics (although we tried to match on key variables) and assay methodologies. More importantly, by focusing only on features that were prognostic in both cohorts, we required a higher level of replication, which should reduce spurious cohort-specific findings. On the other hand, the penalized regression approach allowed us to discover new features in the WHEL cohort, which would need to be validated in the future. Given that few studies with long-term follow-up have evaluated miRNAs, we believe that both approaches provide useful and important information and add to the body of literature in this emerging field.
Our study has many strengths. The study sample comprised a large, well-characterized clinical cohort with over 15 years of follow-up. We obtained high-quality assays using the validated Nanostring platform. In addition, we used rigorous statistical approaches for identifying miRNA hits, model development, and comparison. First, rather than using multiple testing correction, we used stringent external validation, and only considered miRNAs significant if they were also prognostic in the TCGA dataset. This should enhance reproducibility of our results and reduce the chance of spurious findings. Second, we used cross-validated penalized regression methods for unbiased variable selection in our statistical models, which allows discovery of new features while at the same time reducing overfitting.
There are also limitations in our study. Our study cohort was diagnosed with breast cancer between 1991 and 2000, and did not receive the current standard of care: women with HER2+ tumors did not receive adjuvant trastuzumab, and few postmenopausal women received adjuvant aromatase inhibitors. In addition, the average interval from diagnosis to entry into the WHEL Study was 2 years, suggesting that women susceptible to early relapse, for example, those with basal tumors, may have been under-represented in the WHEL sample. Selection against the basal subtype, which comprises primarily triple-negative breast cancer, could partially explain why we did not observe a protective effect for miR-29c, a finding reported primarily for triple-negative breast cancer (44).
In summary, using a large breast cancer cohort with >15 years of follow-up, we identified five miRNAs that might be prognostic for breast cancer survival. In addition, our results suggest that miRNAs might identify high (or low) risk groups within PAM50 clinical risk score categories. If replicated in future studies, adding these miRNA targets to existing prognostic tools could lead to improved risk stratification, and ultimately, to better informed treatment decisions and clinical management for patients with breast cancer.
Disclosure of Potential Conflicts of Interest
S.R. Davies has ownership interest (including stocks and patents) in NSTG. B.A. Parker reports receiving a commercial research grant from Genentech, Novartis, Pfizer, and Oncternal; has ownership interest (including stocks and patents) in Merck; and is a consultant/advisory board member for Bioalta and EMD Serono. M.J. Ellis is the founder and has ownership interest (including stocks and patents) in Bioclassifier LLC, and is a consultant/advisory board member for AstraZeneca, Novartis, Abbvie, Pfizer, and Sermonex. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: L. Natarajan, S.R. Davies, B.A. Parker, M.J. Ellis, E.R. Mardis, C.R. Marinac, K. Messer
Development of methodology: L. Natarajan, M. Pu, S.R. Davies, T.L. Vickery, B.A. Parker, E.R. Mardis, K. Messer
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): S.R. Davies, E.R. Mardis, J.P. Pierce, K. Messer
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): L. Natarajan, M. Pu, S.H. Nelson, M.J. Ellis, S.W. Flatt, K. Messer
Writing, review, and/or revision of the manuscript: L. Natarajan, S.R. Davies, T.L. Vickery, S.H. Nelson, E. Pittman, B.A. Parker, M.J. Ellis, E.R. Mardis, C.R. Marinac, J.P. Pierce, K. Messer
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): T.L. Vickery, E. Pittman, M.J. Ellis, S.W. Flatt
Study supervision: L. Natarajan, S.R. Davies, M.J. Ellis, K. Messer
Acknowledgments
This study was supported by NCI, NIH, Award Nos. R01CA166293 (to K. Messer, L. Natarajan, M. Pu, J.P. Pierce, B.A. Parker, S.W. Flatt, E.R. Mardis, M.J. Ellis, S.R. Davies, T.L. Vickery), P30CA023100 (to K. Messer, L. Natarajan, M. Pu, E. Pittman), and F32CA220859 (to C.R. Marinac).