Purpose: To identify novel breast cancer subtypes by extracting quantitative imaging phenotypes of the tumor and surrounding parenchyma and to elucidate the underlying biologic underpinnings and evaluate the prognostic capacity for predicting recurrence-free survival (RFS).

Experimental Design: We retrospectively analyzed dynamic contrast–enhanced MRI data of patients from a single-center discovery cohort (n = 60) and an independent multicenter validation cohort (n = 96). Quantitative image features were extracted to characterize tumor morphology, intratumor heterogeneity of contrast agent wash-in/wash-out patterns, and tumor-surrounding parenchyma enhancement. On the basis of these image features, we used unsupervised consensus clustering to identify robust imaging subtypes and evaluated their clinical and biologic relevance. We built a gene expression–based classifier of imaging subtypes and tested their prognostic significance in five additional cohorts with publically available gene expression data but without imaging data (n = 1,160).

Results: Three distinct imaging subtypes, that is, homogeneous intratumoral enhancing, minimal parenchymal enhancing, and prominent parenchymal enhancing, were identified and validated. In the discovery cohort, imaging subtypes stratified patients with significantly different 5-year RFS rates of 79.6%, 65.2%, 52.5% (log-rank P = 0.025) and remained as an independent predictor after adjusting for clinicopathologic factors (HR, 2.79; P = 0.016). The prognostic value of imaging subtypes was further validated in five independent gene expression cohorts, with average 5-year RFS rates of 88.1%, 74.0%, 59.5% (log-rank P from <0.0001 to 0.008). Each imaging subtype was associated with specific dysregulated molecular pathways that can be therapeutically targeted.

Conclusions: Imaging subtypes provide complimentary value to established histopathologic or molecular subtypes and may help stratify patients with breast cancer. Clin Cancer Res; 23(13); 3334–42. ©2017 AACR.

Translational Relevance

Breast cancer is a heterogeneous disease. Biomarkers that stratify patients with clinical relevance are critically needed for precision medicine. Molecular profiling is currently used to stratify breast cancer but is limited by the requirement for invasive biopsy and confounded by intratumor genetic heterogeneity. Conversely, imaging provides a global, unbiased picture of the entire tumor. Using unsupervised consensus clustering of quantitative imaging phenotypes of the tumor and parenchyma, we identify three imaging subtypes in a single-institution cohort and validate them in an independent multi-institutional cohort. Each imaging subtype is associated with distinct molecular pathways with therapeutic implications. Furthermore, we show that imaging subtypes and their gene expression–based classifiers predict patient survival in a discovery cohort and five external validation cohorts, respectively. Our method can potentially stratify patients noninvasively for personalized management of breast cancer.

Breast cancer is routinely divided into several subtypes on the basis of clinical and pathologic factors such as hormone receptor and HER2 status, which are used to determine appropriate therapy and guide clinical decision-making (1, 2). Gene expression profiling of breast cancer has identified 4 intrinsic molecular subtypes (luminal A, luminal B, HER2-enriched, and basal-like), each associated with distinct gene expression patterns, prognosis, and response to treatment (3, 4). However, molecular tests are limited by the requirement for invasive biopsy or surgery, and surrogate subtypes based on routinely measured tumor markers [estrogen receptor (ER), progesterone receptor (PR), HER2] do not fully approximate the intrinsic subtypes. Moreover, recent studies have shown profound intratumor genetic heterogeneity in breast cancer (5, 6), which poses a significant challenge for molecular profiling based on a single biopsy. There is a critical need for reliable, noninvasive prognostic or predictive biomarkers that can aid in patient stratification.

MRI plays an important role in breast cancer management, from the initial diagnosis to the evaluation of therapy response (7–10). MRI is exquisitely sensitive to physiological changes (e.g., blood flow) in the underlying tissue and is well suited for noninvasive characterization of the tumor. One important advantage of MRI over biopsy-derived molecular data is that imaging provides a global, unbiased view of the entire tumor as well as its surrounding tissue. Beyond visual assessment by radiologists, quantitative image analysis may reveal additional useful biomarkers in cancer (11–14). Recent studies have shown that intratumor imaging heterogeneity, which might reflect underlying genetic heterogeneity, is associated with aggressive disease, resistance to chemotherapy, and poor prognosis (15–21). There is also emerging evidence suggesting that imaging characteristics of breast parenchyma are associated with the risk of developing breast cancer, treatment response, long-term disease recurrence, and patient survival (22–24).

The availability of large-scale curated image and gene expression datasets has spurred a significant interest in linking tumor phenotypes at the molecular and tissue (imaging) level (14, 25–34). These studies used a similar study design to identify the correlation of individual imaging features with specific molecular features, such as gene expression, mutation, or predefined molecular subtypes. Significant associations have been identified in breast cancer (31, 32, 34), leading to an improved understanding of the molecular mechanisms behind imaging phenotypes.

In this study, we aim to discover novel breast cancer subtypes by extracting quantitative image phenotypes of the tumor as well as the breast parenchyma for a detailed characterization of the tumor and its invasion into surrounding tissue. By creating an imaging–genomic association map, we show that the imaging subtypes are associated with differing molecular pathways and that patients stratified by the imaging subtypes have distinct prognoses in multiple independent cohorts.

Study design

We aimed to discover novel breast cancer subtypes defined by quantitative imaging features, investigate the prognostic relevance of these imaging subtypes, and explore their underlying biological mechanism. To do this, we carried out this study in 3 phases, as shown in Fig. 1. In phase 1, we independently explored intrinsic imaging subtypes of breast cancer in discovery (single institutional) and validation [(TCGA)] cohorts on the basis of unsupervised clustering of quantitative imaging features and further validated the similarity and reproducibility of the imaging subtypes across these 2 cohorts. See Supplementary Methods for details of the 2 patients' cohorts. In phase 2, the prognostic value of the imaging subtypes was investigated in its ability to stratify recurrence-free survival (RFS) in (i) discovery cohort and (ii) 5 additional gene expression datasets. In phase 3, we explored the biologic mechanism associated with the discovered imaging subtypes through 2 types of pathway analyses. The details of each phase are elaborated in the following sections.

Figure 1.

Flowchart of the overall design of the study. This study contains 3 phases.

Figure 1.

Flowchart of the overall design of the study. This study contains 3 phases.

Close modal

Overview of patient cohorts

The overall study design is shown in Fig. 1. We retrospectively analyzed 2 publicly accessible breast cancer imaging cohorts from The Cancer Imaging Archive (TCIA). Sixty patients from a single institution were used as the imaging subtype discovery cohort, for which both imaging and RFS data were available. Another 96 patients from TCGA were used as the imaging subtype validation cohort, for which both imaging and gene expression data were available. Patient characteristics of these 2 cohorts are summarized in Supplementary Table S1 (see Supplementary Methods for patient selection and imaging protocols). To validate the prognostic relevance of the imaging subtypes, we identified 5 publicly available datasets comprising microarray gene expression profiles for a total of 1,160 patients with breast cancer, for whom the RFS information was available. Among them, the Netherlands Cancer Institute (NKI) dataset (295 patients) was obtained from the NKI homepage (http://ccb.nki.nl/data, accessed April 1, 2016). The other 4 datasets were retrieved from the Gene Expression Omnibus [series accession numbers: GSE1456 (159 patients), GSE25055 (310 patients), GSE25065 (198 patients), and GSE7390 (198 patients)].

Image analysis and feature extraction

We analyzed each patient's dynamic contrast–enhanced (DCE) MR images in 3 steps: (i) tumor and background parenchyma segmentation, (ii) image preprocessing and harmonization, and (iii) quantitative image feature extraction. In step 1, 2 radiologists (G. Cao and X. Sun with 17 and 11 years of experience in breast imaging) blinded to the patients' clinical outcome, that is, RFS, manually delineated the tumor in a slice-by-slice manner on the DCE MRI and reached consensus regarding the tumor contours. The background parenchyma ipsilateral to tumor was automatically segmented using fuzzy c-means (35) and confirmed by both radiologists in consensus (see details in Supplementary Methods). In step 2, the image data were harmonized via a series of imaging processing algorithms, which allows subsequent quantitative image analysis. First, the temporal sequences for the DCE MRI were extracted at 3 time points, including the pre-contrast, early post-contrast (around 2.5 minutes), and late post-contrast (around 7.5 minutes). The N4 bias correction (36) was used to remove the shading artifacts in the MR images. Then each of the 3 sequential MR images was normalized by the average pixel value of breast parenchyma in pre-contrast images. The DCE MR images were resized to an isotropic voxel resolution of 0.8 mm to allow for consistent calculation of image features. In step 3, a set of 110 quantitative image features were extracted to characterize the phenotypes of each tumor and its parenchymal enhancement as well as intratumor heterogeneity. The feature set contains 8 tumor morphologic features, 66 tumor texture features of kinetic maps including wash-in/wash-out and signal enhancement ratio (SER) maps, 4 functional tumor volume features, 10 background parenchymal enhancement features, and 22 tumor-surrounding parenchymal enhancement features. The details of the image features are explained in Supplementary Table S2. The image analysis and feature extraction were implemented with MATLAB (MathWorks).

Imaging subtype discovery and validation

We used unsupervised consensus clustering (37) to discover intrinsic imaging subtypes for the discovery and validation cohorts, respectively. Compared with traditional k-means and hierarchical clustering algorithms, consensus clustering is shown to be more robust and insensitive to random starts and has been widely used to identify biologically meaningful clusters (37). In detail, we used the partition-around-medoids clustering algorithm (38) with the Euclidean distance metric and performed 10,000 bootstraps with 80% item resampling of the quantitative imaging features. We varied the cluster number from 2 to 10 and selected the optimal cluster number that produced the most stable consensus matrices and the most unambiguous cluster assignments across permuted clustering runs (37). The final clusters identified as such correspond to imaging subtypes of breast cancer. Furthermore, the same procedure was independently implemented in the validation cohort to determine and validate the imaging subtypes. The significance analysis of microarrays (SAM) algorithm (39) was used to identify the quantitative image features significantly associated with the identified imaging subtypes. SAM is a permutation-based nonparametric statistical algorithm and designed to identify significantly different variables (imaging features) that are associated with a given trait (imaging subtype). The in-group proportion (IGP) statistic (40) was used to test whether similar imaging subtypes from the discovery cohort existed in the validation cohort. The IGP quantitatively measure the similarity of clusters when defined using training and testing data. If the clusters are identical between 2 datasets, the IGP approaches to 100%. The consensus clustering, SAM algorithm, and IGP statistic were performed in R.

Prognostic significance of the imaging subtypes

We first evaluated the imaging subtypes in terms of their prognostic capacity for predicting RFS in the discovery cohort. Then, we tested the prognostic relevance of the imaging subtypes on 5 additional breast cancer cohorts with gene expression data publically available but no imaging data available. To do this, we first built a gene expression–based imaging subtype classifier using clustered imaging subtype labels and gene expression data (RNA-seq) from the TCGA cohort. Specifically, we performed one-way ANOVA with fixed-effect (41) to first identify genes significantly associated with imaging subtypes (P < 0.05). Then, we trained a nearest shrunken centroid classifier (42) with the preselected genes and validated it using 10-fold cross validation with stratified sampling. We subsequently applied this classifier to the microarray datasets to classify each patient into 1 of the 3 imaging subtypes and evaluated their prognostic value in stratifying RFS. To account for the different dynamic ranges of RNA-seq and DNA microarray data, we performed copula transformation (43) to each gene, respectively, and independently for each dataset so that they were comparable in the classification model (Supplementary Methods).

Identifying molecular pathways associated with the imaging subtypes

We performed 2 types of pathway analyses to elucidate the biologic mechanisms of the imaging findings. First, we used the Gene Set Enrichment Analysis (GSEA) to identify enriched biologic pathways associated with each imaging subtype within the TCGA cohort. The gene expression data of normal breast tissue available for 113 patients were set as the baseline. Then, the samples from the tumor tissue were compared with the paired normal tissue, respectively in each imaging subtype. The KEGG pathway database was used for GSEA. Second, we integrated gene expression and copy number variation (CNV) data with Pathway Recognition Algorithm Using Data Integration on Genomic Models (PARADIGM; ref. 44). The NCI Pathway Interaction Database was used for PARADIGM analysis. See details in Supplementary Methods.

Statistical analysis

We used the Cox proportional hazard model to build survival model to predict RFS. Kaplan–Meier analysis and log-rank test were used to evaluate patient stratification into different survival groups. To adjust for multiple statistical testing, the Benjamini–Hochberg method was used to control the false discovery rate (FDR) on univariate analysis. All statistical tests were 2-sided, with a P < 0.05 or FDR < 0.25 considered statistically significant. The rationale of using a larger threshold for FDR is to increase the likelihood of positive findings. All statistical analyses were performed in R.

Three imaging subtypes emerge in the discovery cohort

On the basis of consensus clustering of the patients' quantitative imaging features characterizing both tumor and surrounding parenchyma, we determined the optimal cluster number to be 3. The 3-cluster solution corresponded to the largest cluster number that induced the smallest incremental change in the area under the under cumulative distribution function (CDF) curves while maximizing consensus within clusters and minimizing the rate of ambiguity in cluster assignments, as shown in Fig. 2A and B. This resulted in 18 patients (30%) in cluster 1, 19 patients (32%) in cluster 2, and 23 patients (38%) in cluster 3 for the discovery cohort.

Figure 2.

Unsupervised consensus clustering of quantitative imaging phenotypes. A and C, Consensus matrices represented as heatmaps for the chosen optimal cluster number (k = 3) for discovery and validation cohorts, respectively. Patient samples are both rows and columns, and consensus values range from 0 (never grouped together) to 1 (always clustered together). The dendrogram above the heatmap illustrates the ordering of patient samples in 3 clusters. B and D, Corresponding relative change in area under the cumulative distribution function (CDF) curves when cluster number changing from k to k + 1. The range of k changed from 2 to 10, and the optimal k = 3.

Figure 2.

Unsupervised consensus clustering of quantitative imaging phenotypes. A and C, Consensus matrices represented as heatmaps for the chosen optimal cluster number (k = 3) for discovery and validation cohorts, respectively. Patient samples are both rows and columns, and consensus values range from 0 (never grouped together) to 1 (always clustered together). The dendrogram above the heatmap illustrates the ordering of patient samples in 3 clusters. B and D, Corresponding relative change in area under the cumulative distribution function (CDF) curves when cluster number changing from k to k + 1. The range of k changed from 2 to 10, and the optimal k = 3.

Close modal

SAM (39) identified quantitative image features significantly associated with each imaging subtype (Supplementary Fig. S1). The boxplot of 4 representative imaging features is shown in Fig. 3A, for which there were significant differences (ANOVA: P < 0.05) across the subtypes. In particular, imaging subtype 1 was characterized by the lowest intratumoral heterogeneity compared with others (Wilcoxon: P < 2.2E-16) and hence was named “homogeneous intratumoral enhancing subtype.” Imaging subtype 2 was characterized by the lowest amount of background parenchymal enhancement (BPE) compared with others (Wilcoxon: P = 0.0002) and was named “minimal parenchymal enhancing subtype,” Compared with subtype 2, subtype 3 was characterized by a higher amount of BPE (Wilcoxon: P = 5.49E-16) and was named “prominent parenchymal enhancing subtype.” These patterns were consistent in the validation cohort (Supplementary Fig. S2). Images of typical patients from 3 subtypes are shown in Fig. 3B.

Figure 3.

A, Selected 4 quantitative imaging features significantly associated with 3 imaging subtypes, including tumor volume, tumor sphericity, tumor homogeneity measured at early enhancement phase, and background parenchymal enhancement (BPE) fraction with percentage enhancement > 0.6. B, Details of analyzing the tumor and BPE for a representative patient from each imaging subtype. The tumor active function was measured and color-coded with signal enhancement ratio (SER). BPE was measured and color coded with percentage enhancement at early enhancement phase.

Figure 3.

A, Selected 4 quantitative imaging features significantly associated with 3 imaging subtypes, including tumor volume, tumor sphericity, tumor homogeneity measured at early enhancement phase, and background parenchymal enhancement (BPE) fraction with percentage enhancement > 0.6. B, Details of analyzing the tumor and BPE for a representative patient from each imaging subtype. The tumor active function was measured and color-coded with signal enhancement ratio (SER). BPE was measured and color coded with percentage enhancement at early enhancement phase.

Close modal

Imaging subtypes are validated in an external multi-institutional cohort

We independently applied the same consensus clustering analysis to the multi-institutional TCGA cohort and determined the optimal cluster number to be 3, as shown in Fig. 2C and D. The in-group proportion (IGP) statistic (40) was used to evaluate the reproducibility of the imaging subtypes across the discovery and validation cohorts. Imaging subtypes 1 and 2 showed a high consistency between the 2 cohorts, with the corresponding IGP values at 82.4% and 92.3%, respectively. On the other hand, imaging subtype 3 was associated with a lower IGP of 60.0%, suggesting a large degree of intertumor phenotypic heterogeneity among this group. All 3 imaging subtypes were statistically significant, with P < 0.01, P < 0.001, and P < 0.05 for each subtype, respectively.

Imaging subtypes are distinct from established breast cancer subtypes

The 3 imaging subtypes were not associated with intrinsic molecular subtypes (Pearson χ2: P = 0.865) or immunohistochemistry (IHC)-based measurements such as ER, PR, HER2 (Pearson χ2: P > 0.05). Every imaging subtype contained all molecular subtypes and all IHC-based subtypes and vice versa (Supplementary Tables S3 and S4). In addition, the imaging subtypes were not associated with the BI-RADS–based characteristics including tumor shape, margin, and internal enhancement patterns (Pearson χ2: P = 0.344, 0.769, 0.432, respectively).

Imaging subtypes stratify patients in terms of RFS in the discovery cohort

We observed significant differences in RFS (log-rank P = 0.025) in the discovery cohort (Fig. 4A), with 5-year RFS rates of 79.6%, 65.2%, 52.5% for subtypes 1, 2, and 3, respectively. On univariate analysis, imaging subtype was a strong predictor of RFS [HR, 2.11; 95% confidence interval (CI), (1.19–3.71); P = 0.01]. On multivariate analysis, imaging subtype was the only independent predictor of RFS (P = 0.016) after adjusting for clinical and pathologic factors, including patients' age, receptor status, histologic type, and lymph node involvement (Table 1).

Figure 4.

Kaplan–Meier curves of RFS stratified by the imaging subtypes. The plots are for (A) the discovery cohort and (B–F) 5 independent validation cohorts, with predicted imaging subtypes via gene expression–based imaging subtype classifiers built in TCGA cohort.

Figure 4.

Kaplan–Meier curves of RFS stratified by the imaging subtypes. The plots are for (A) the discovery cohort and (B–F) 5 independent validation cohorts, with predicted imaging subtypes via gene expression–based imaging subtype classifiers built in TCGA cohort.

Close modal
Table 1.

Univariate and multivariate analyses of the imaging subtype and clinical risk factors for predicting RFS in the discovery cohort

UnivariateMultivariate
PredictorsHR (95% CI)PHR (95% CI)P
Imaging subtypea 2.11 (1.19–3.71) 0.010b 2.79 (1.21–6.44) 0.016b 
Age 0.79 (0.53–1.19) 0.260 0.57 (0.30–1.09) 0.087 
ER 0.60 (0.23–1.55) 0.289 0.57 (0.11–2.95) 0.499 
PR 0.67 (0.25–1.75) 0.409 0.30 (0.06–1.50) 0.143 
HER2 0.75 (0.24–2.29) 0.610 0.49 (0.10–2.34) 0.370 
Histologic typec 0.97 (0.42–2.24) 0.940 0.73 (0.23–2.34) 0.602 
Lymph node status 3.43 (1.16–10.14) 0.026b 3.42 (0.64–18.20) 0.149 
Molecular subtyped 1.49 (0.84–2.65) 0.177 — — 
UnivariateMultivariate
PredictorsHR (95% CI)PHR (95% CI)P
Imaging subtypea 2.11 (1.19–3.71) 0.010b 2.79 (1.21–6.44) 0.016b 
Age 0.79 (0.53–1.19) 0.260 0.57 (0.30–1.09) 0.087 
ER 0.60 (0.23–1.55) 0.289 0.57 (0.11–2.95) 0.499 
PR 0.67 (0.25–1.75) 0.409 0.30 (0.06–1.50) 0.143 
HER2 0.75 (0.24–2.29) 0.610 0.49 (0.10–2.34) 0.370 
Histologic typec 0.97 (0.42–2.24) 0.940 0.73 (0.23–2.34) 0.602 
Lymph node status 3.43 (1.16–10.14) 0.026b 3.42 (0.64–18.20) 0.149 
Molecular subtyped 1.49 (0.84–2.65) 0.177 — — 

aImage subtype was treated as a continuous variable, that is, subtype 1 was coded as 1, subtype 2 was coded as 2, subtype 3 was coded as 3.

bP < 0.05.

cThe invasive ductal carcinoma was coded as 1 and other types as 0.

dMolecular subtype includes luminal, HER2+, and triple-negative, and it was not adjusted in multivariate analysis.

Gene expression–based imaging subtype classifier predicted RFS in 5 independent cohorts

We identified 692 genes whose expression was significantly associated with the imaging subtypes using one-way ANOVA and trained a nearest shrunken centroid classifier (42) to predict imaging subtypes based on the selected genes in the TCGA cohort. This gene expression–based classifier had an accuracy of 90.6%, 75.0%, and 82.3% in predicting each imaging subtype based on 10-fold cross validation. We then applied the classifier to 5 independent microarray datasets, respectively, to assign each patient into 1 of the 3 imaging subtypes. Patient stratification based on the predicted imaging subtypes showed significantly different RFS in all 5 datasets (Fig. 4B–F, log-rank P = 1.16e-6 to 7.92e-3), with average 5-year RFS rates of 88.1%, 74.0%, 59.5% for subtypes 1 to 3, respectively. Furthermore, the patterns of RFS were consistent with those in the discovery cohort (subtypes 1–3 corresponding to favorable, intermediate, and poor prognosis).

Imaging subtypes are associated with distinct molecular pathways

Figure 5A shows the molecular pathways significantly enriched in each imaging subtype (FDR < 0.25) based on GSEA. The number of enriched pathways progressively increased from subtype 1 through subtype 3 (Fig. 5A; Supplementary Table S5). In addition, we combined gene expression and CNV data and computed the pathway activity scores for each imaging subtype using the PARADIGM (44). Again, we observed a clear trend: the number and activity (or inactivity) of dysregulated pathways (FDR < 0.25) progressively increased from subtype 1 through subtype 3 (Fig. 5B).

Figure 5.

A, Stacked Venn plots of the significantly associated (FDR < 25%) KEGG pathways for 3 imaging subtypes with GSEA. B, Pathway activity scores for 3 imaging subtypes. The pathways are from NCI Pathway Interaction Database Pathways, which are significantly (FDR < 25%) associated with 3 imaging subtypes with PARADIGM analysis. The bar length indicates the magnitude of activity score. From 2 independent pathway analyses, we observed consistent pathway dysregulation patterns across imaging subtypes, which might explain the differential prognoses associated with the 3 imaging subtypes.

Figure 5.

A, Stacked Venn plots of the significantly associated (FDR < 25%) KEGG pathways for 3 imaging subtypes with GSEA. B, Pathway activity scores for 3 imaging subtypes. The pathways are from NCI Pathway Interaction Database Pathways, which are significantly (FDR < 25%) associated with 3 imaging subtypes with PARADIGM analysis. The bar length indicates the magnitude of activity score. From 2 independent pathway analyses, we observed consistent pathway dysregulation patterns across imaging subtypes, which might explain the differential prognoses associated with the 3 imaging subtypes.

Close modal

We discovered 3 subtypes of breast cancer on the basis of quantitative imaging phenotypes of the tumor and surrounding parenchymal tissue and validated these imaging subtypes in an independent multi-institutional cohort. We showed that the imaging subtypes were associated with distinct molecular pathways and provided independent prognostic value beyond conventional clinicopathologic factors. Furthermore, by building a gene expression–based imaging subtype classifier, we showed that the imaging subtypes stratified RFS in 5 independent cohorts totaling more than 1,000 patients. These newly identified imaging subtypes and associated findings warrant further validation in large prospective studies, and if successful, may serve as useful biomarkers for personalized management of breast cancer.

Our findings offer an intriguing perspective on the biology of breast cancer. The 3 imaging subtypes shared several disturbances in biologic pathways that are implicated in breast cancer, including DNA damage repair, pyrimidine metabolism, oocyte meiosis, and spliceosome (Fig. 5A). Malfunction of DNA damage repair can lead to mutation and chromosomal instability, a hallmark of oncogenesis and tumor progression (45). Pyrimidine metabolism is a limiting step for DNA replication during mitosis (46). The disturbance of pyrimidine metabolism is consistent with the fact that the antimetabolite drugs such as fluorouracil and capecitabine are often effective treatments for breast cancer (47). In addition, there is evidence showing that the extent of dysregulation of genes involved in the spliceosome correlates with the malignant behavior of breast cancer (48). In subtypes 2 and 3 (but not in subtype 1), the immune pathway of systemic lupus erythematosus was found to be disturbed, which might correspond to the enhanced intratumoral angiogenesis observed in these 2 subtypes (49). In subtype 3 (but not in subtypes 1 or 2), the protein export pathway was disturbed. During tumor progression, cancer cells activate immune infiltrate and endothelial cells to increase the secretion and export of proteins into the tumor microenvironment, such as cytokines and angiogenesis factors that promote tumor growth (50). The disturbed pathways in subtypes 2 and/or 3 could be the biologic reasons for their distinct imaging phenotypes and might explain their differential prognoses. Our study thus provides novel insight into the biologic processes associated with the aggressiveness of breast cancers.

Our results show that imaging phenotypes may be used to infer dysregulated molecular pathways that can be targeted (Fig. 5B). For example, imaging subtype 3 was characterized by the upregulation of the c-Met, PI3K, and mTOR pathways. Targeted therapies inhibiting these pathways are being actively tested in clinical trials (51–53). Such therapies might prove to be most effective in subtype 3, which had the worst prognosis, whereas in subtype 2, they might prove not be effective due to inactivation of the corresponding pathways. Compared with subtypes 1 and 2, subtype 3 had reduced reproducibility across discovery and validation cohorts. This might reflect greater phenotypical diversity or intertumor heterogeneity among this group, implying that combinatory therapies targeting multiple pathways may be needed for subtype 3.

On the basis of 2 independent pathway analyses (Fig. 5A and B), we showed that the number and activity of dysregulated biologic pathways had similar patterns across imaging subtypes, which was consistent with their differential prognoses. Our findings support the potential of imaging analysis to inform clinical trial design and ultimately to help guide precision therapy of breast cancer.

The proposed image-based subtyping overcomes several key challenges of current approaches for breast cancer classification. Traditional analysis requires molecular profiling of the tumor sampled in a small biopsy, which is limited by intratumor genetic heterogeneity (5, 6, 54, 55) and other confounders such as normal tissue contamination, making downstream class discovery susceptible to sampling errors. In contrast, imaging provides a complete, unbiased picture of the tumor. Thus, subtypes defined by quantitative imaging phenotypes may be more reliable. Our results show that imaging subtypes were distinct from established IHC-based or molecular subtypes, suggesting that imaging could provide complementary prognostic information. One key issue with image-based markers is that the scanner and acquisition protocols such as MR field strengths are often quite heterogeneous, and these variations could limit the study power especially in multi-institutional retrospective studies. Appropriate standardization and harmonization of imaging data are critical to ensure more reliable results. Another important advantage of our approach is that imaging is often used clinically for treatment response evaluation and long-term follow-up (9, 13). This opens the door to noninvasive disease monitoring using imaging subtypes as surrogate markers of underlying molecular activity, which would be far more tolerable to patients than frequent invasive biopsies.

Our work represents a major shift in direction from traditional imaging genomics studies. Instead of finding imaging features associated with predefined genomic properties, here we started with an extensive characterization of imaging phenotypes and applied unsupervised clustering for subtype discovery. A previous study used a similar approach to identify imaging subtypes for glioblastoma multiforme (GBM; ref. 56). Beside the apparent differences in cancer types (GBM vs. breast cancer) and imaging modalities (anatomical vs. functional MRI), there are several important strengths of our study: (i) our image analysis extended beyond the tumor and included background breast parenchyma, thus providing a more detailed imaging characterization of tumor invasion into surrounding tissue and (ii) we built gene expression–based imaging subtype classifiers and validated the prognostic significance of these subtypes in multiple independent cohorts. One recent breast cancer radiogenomic study (32) aimed to identify the genomic underpinnings associated with individual MRI-based radiomic features. On the other hand, our study focused on discovering clinically relevant breast cancer subtypes based on imaging phenotypes, which can be directly used to stratify patients. While our radiogenomic study included molecular features such as gene expression and CNV, it may be beneficial to incorporate other types of -omic data (32) such as genetic mutation, miRNA expression, and protein expression, which could provide a more complete picture of the molecular characteristics. Similar to previous radiogenomic studies, our work identified correlative but not necessarily causative relations between imaging phenotypes and molecular features. To mechanistically validate these imaging genomic associations, experimental validation using a preclinical knock-out model will be required, and this warrants further investigation in future studies.

We have identified 3 breast cancer subtypes on the basis of quantitative imaging phenotypes of the tumor and surrounding tissue. The 3 imaging subtypes reflect distinct underlying molecular pathways and are associated with significantly different survival. This work may serve as the basis for future prospective studies to evaluate the imaging subtypes as potential biomarkers for precision medicine.

No potential conflicts of interest were disclosed.

Conception and design: J. Wu, Y. Cui, R. Li

Development of methodology: J. Wu, Y. Cui, R. Li

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): J. Wu, Y. Cui

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J. Wu, Y. Cui, X. Sun, G. Cao, B. Li, D.M. Ikeda

Writing, review, and/or revision of the manuscript: J. Wu, Y. Cui, D.M. Ikeda, A.W. Kurian

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J. Wu, Y. Cui, D.M. Ikeda

Study supervision: J. Wu, R. Li

The authors thank TCIA for providing the breast cancer cases enrolled in TCGA study and the UCSF study.

This research was partially supported by the NIH grant number R01 CA193730.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Hammond
ME
,
Hayes
DF
,
Dowsett
M
,
Allred
DC
,
Hagerty
KL
,
Badve
S
, et al
American Society of Clinical Oncology/College Of American Pathologists guideline recommendations for immunohistochemical testing of estrogen and progesterone receptors in breast cancer
.
J Clin Oncol
2010
;
28
:
2784
95
.
2.
Harris
LN
,
Ismaila
N
,
McShane
LM
,
Andre
F
,
Collyar
DE
,
Gonzalez-Angulo
AM
, et al
Use of biomarkers to guide decisions on adjuvant systemic therapy for women with early-stage invasive breast cancer: American Society of Clinical Oncology clinical practice guideline
.
J Clin Oncol
2016
;
34
:
1134
50
.
3.
Sotiriou
C
,
Pusztai
L
. 
Gene-expression signatures in breast cancer
.
New Engl J Med
2009
;
360
:
790
800
.
4.
Voduc
KD
,
Cheang
MC
,
Tyldesley
S
,
Gelmon
K
,
Nielsen
TO
,
Kennecke
H
. 
Breast cancer subtypes and the risk of local and regional relapse
.
J Clin Oncol
2010
;
28
:
1684
91
.
5.
Barry
WT
,
Kernagis
DN
,
Dressman
HK
,
Griffis
RJ
,
J'Vonne
DH
,
Olson
JA
, et al
Intratumor heterogeneity and precision of microarray-based predictors of breast cancer biology and clinical outcome
.
J Clin Oncol
2010
;
28
:
2198
206
.
6.
Zardavas
D
,
Irrthum
A
,
Swanton
C
,
Piccart
M
. 
Clinical management of breast cancer heterogeneity
.
Nat Rev Clin Oncol
2015
;
12
:
381
94
.
7.
Hylton
N
. 
Dynamic contrast-enhanced magnetic resonance imaging as an imaging biomarker
.
J Clin Oncol
2006
;
24
:
3293
8
.
8.
Bhooshan
N
,
Giger
ML
,
Jansen
SA
,
Li
H
,
Lan
L
,
Newstead
GM
. 
Cancerous breast lesions on dynamic contrast-enhanced MR images: computerized characterization for image-based prognostic markers
.
Radiology
2010
;
254
:
680
90
.
9.
Hylton
NM
,
Blume
JD
,
Bernreuter
WK
,
Pisano
ED
,
Rosen
MA
,
Morris
EA
, et al
Locally advanced breast cancer: MR imaging for prediction of response to neoadjuvant chemotherapy-results from ACRIN 6657/I-SPY TRIAL
.
Radiology
2012
;
263
:
663
72
.
10.
Li
X
,
Abramson
RG
,
Arlinghaus
LR
,
Kang
H
,
Chakravarthy
AB
,
Abramson
VG
, et al
Multiparametric magnetic resonance imaging for predicting pathological response after the first cycle of neoadjuvant chemotherapy in breast cancer
.
Invest Radiol
2015
;
50
:
195
204
.
11.
Yankeelov
TE
,
Mankoff
DA
,
Schwartz
LH
,
Lieberman
FS
,
Buatti
JM
,
Mountz
JM
, et al
Quantitative imaging in cancer clinical trials
.
Clin Cancer Res
2016
;
22
:
284
90
.
12.
Gillies
RJ
,
Kinahan
PE
,
Hricak
H
. 
Radiomics: images are more than pictures, they are data
.
Radiology
2016
;
278
:
563
77
.
13.
Hylton
NM
,
Gatsonis
CA
,
Rosen
MA
,
Lehman
CD
,
Newitt
DC
,
Partridge
SC
, et al
Neoadjuvant chemotherapy for breast cancer: functional tumor volume by MR imaging predicts recurrence-free survival—results from the ACRIN 6657/CALGB 150007 I-SPY 1 TRIAL
.
Radiology
2016
;
279
:
44
55
.
14.
Aerts
HJ
,
Velazquez
ER
,
Leijenaar
RT
,
Parmar
C
,
Grossmann
P
,
Cavalho
S
, et al
Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach
.
Nat Commun
2014
;
5
:
4006
.
15.
O'Connor
JP
,
Rose
CJ
,
Waterton
JC
,
Carano
RA
,
Parker
GJ
,
Jackson
A
. 
Imaging intratumor heterogeneity: role in therapy response, resistance, and clinical outcome
.
Clin Cancer Res
2015
;
21
:
249
57
.
16.
Fehr
D
,
Veeraraghavan
H
,
Wibmer
A
,
Gondo
T
,
Matsumoto
K
,
Vargas
HA
, et al
Automatic classification of prostate cancer Gleason scores from multiparametric magnetic resonance images
.
Proc Natl Acad Sci
2015
;
112
:
E6265
E73
.
17.
Agner
SC
,
Rosen
MA
,
Englander
S
,
Tomaszewski
JE
,
Feldman
MD
,
Zhang
P
, et al
Computerized image analysis for identifying triple-negative breast cancers and differentiating them from other molecular subtypes of breast cancer on dynamic contrast-enhanced MR images: a feasibility study
.
Radiology
2014
;
272
:
91
9
.
18.
Sutton
EJ
,
Oh
JH
,
Dashevsky
BZ
,
Veeraraghavan
H
,
Apte
AP
,
Thakur
SB
, et al
Breast cancer subtype intertumor heterogeneity: MRI‐based features predict results of a genomic assay
.
J Magn Reson Imaging
2015
;
42
:
1398
406
.
19.
Parikh
J
,
Selmi
M
,
Charles-Edwards
G
,
Glendenning
J
,
Ganeshan
B
,
Verma
H
, et al
Changes in primary breast cancer heterogeneity may augment midtreatment mr imaging assessment of response to neoadjuvant chemotherapy
.
Radiology
2014
;
272
:
100
12
.
20.
Win
T
,
Miles
KA
,
Janes
SM
,
Ganeshan
B
,
Shastry
M
,
Endozo
R
, et al
Tumor heterogeneity and permeability as measured on the CT component of PET/CT predict survival in patients with non-small cell lung cancer
.
Clin Cancer Res
2013
;
19
:
3591
9
.
21.
Wu
J
,
Gong
G
,
Cui
Y
,
Li
R
. 
Intratumor partitioning and texture analysis of dynamic contrast‐enhanced (DCE)‐MRI identifies relevant tumor subregions to predict pathological response of breast cancer to neoadjuvant chemotherapy
.
J Magn Reson Imaging
2016
;
44
:
1107
1115
.
22.
King
V
,
Brooks
JD
,
Bernstein
JL
,
Reiner
AS
,
Pike
MC
,
Morris
EA
. 
Background parenchymal enhancement at breast MR imaging and breast cancer risk
.
Radiology
2011
;
260
:
50
60
.
23.
Dontchos
BN
,
Rahbar
H
,
Partridge
SC
,
Korde
LA
,
Lam
DL
,
Scheel
JR
, et al
Are qualitative assessments of background parenchymal enhancement, amount of fibroglandular tissue on MR images, and mammographic density associated with breast cancer risk?
Radiology
2015
;
276
:
371
80
.
24.
van der Velden
BHM
,
Dmitriev
I
,
Loo
CE
,
Pijnappel
RM
,
Gilhuijs
KGA
. 
Association between parenchymal enhancement of the contralateral breast in dynamic contrast-enhanced MR imaging and outcome of patients with unilateral invasive breast cancer
.
Radiology
2015
;
276
:
675
85
.
25.
Burnside
ES
,
Drukker
K
,
Li
H
,
Bonaccio
E
,
Zuley
M
,
Ganott
M
, et al
Using computer-extracted image phenotypes from tumors on breast magnetic resonance imaging to predict breast cancer pathologic stage
.
Cancer
2016
;
122
:
748
57
.
26.
Li
H
,
Zhu
Y
,
Burnside
ES
,
Drukker
K
,
Hoadley
KA
,
Fan
C
, et al
MR imaging radiomics signatures for predicting the risk of breast cancer recurrence as given by research versions of mammaprint, oncotype DX, and PAM50 gene assays
.
Radiology
2016
;
281
:
382
91
.
27.
Segal
E
,
Sirlin
CB
,
Ooi
C
,
Adler
AS
,
Gollub
J
,
Chen
X
, et al
Decoding global gene expression programs in liver cancer by noninvasive imaging
.
Nat Biotechnol
2007
;
25
:
675
80
.
28.
Diehn
M
,
Nardini
C
,
Wang
DS
,
McGovern
S
,
Jayaraman
M
,
Liang
Y
, et al
Identification of noninvasive imaging surrogates for brain tumor gene-expression modules
.
P Natl Acad Sci USA
2008
;
105
:
5213
8
.
29.
Gevaert
O
,
Xu
JJ
,
Hoang
CD
,
Leung
AN
,
Xu
Y
,
Quon
A
, et al
Non-Small cell lung cancer: identifying prognostic imaging biomarkers by leveraging public gene expression microarray data-methods and preliminary results
.
Radiology
2012
;
264
:
387
96
.
30.
Nair
VS
,
Gevaert
O
,
Davidzon
G
,
Napel
S
,
Graves
EE
,
Hoang
CD
. 
Prognostic PET F-18-FDG uptake imaging features are associated with major oncogenomic alterations in patients with resected non-small cell lung cancer (vol 72, pg 3725, 2012)
.
Cancer Res
2012
;
72
:
4870
1
.
31.
Yamamoto
S
,
Han
W
,
Kim
Y
,
Du
LT
,
Jamshidi
N
,
Huang
DS
, et al
Breast cancer: radiogenomic biomarker reveals associations among dynamic contrast-enhanced MR imaging, long noncoding RNA, and metastasis
.
Radiology
2015
;
275
:
384
92
.
32.
Zhu
Y
,
Li
H
,
Guo
W
,
Drukker
K
,
Lan
L
,
Giger
ML
, et al
Deciphering genomic underpinnings of quantitative MRI-based radiomic phenotypes of invasive breast carcinoma
.
Sci Rep
2015
;
5:17787
.
33.
Yamamoto
S
,
Huang
D
,
Du
L
,
Korn
RL
,
Jamshidi
N
,
Burnette
BL
, et al
Radiogenomic analysis demonstrates associations between 18F-fluoro-2-deoxyglucose PET, prognosis, and epithelial-mesenchymal transition in non–small cell lung cancer
.
Radiology
2016
;
280
:
261
70
.
34.
Yamamoto
S
,
Maki
DD
,
Korn
RL
,
Kuo
MD
. 
Radiogenomic analysis of breast cancer using MRI: a preliminary study to define the landscape
.
Am J Roentgenol
2012
;
199
:
654
63
.
35.
Chen
W
,
Giger
ML
,
Bick
U
. 
A fuzzy C-means (FCM)-based approach for computerized segmentation of breast lesions in dynamic contrast-enhanced MR images 1
.
Acad Radiol
2006
;
13
:
63
72
.
36.
Tustison
NJ
,
Avants
BB
,
Cook
PA
,
Zheng
Y
,
Egan
A
,
Yushkevich
PA
, et al
N4ITK: improved N3 bias correction
.
IEEE Transact Med Imaging
2010
;
29
:
1310
20
.
37.
Monti
S
,
Tamayo
P
,
Mesirov
J
,
Golub
T
. 
Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data
.
Machine learning
2003
;
52
:
91
118
.
38.
Reynolds
AP
,
Richards
G
,
de la Iglesia
B
,
Rayward-Smith
VJ
. 
Clustering rules: a comparison of partitioning and hierarchical clustering algorithms
.
J Math Model Algorithms
2006
;
5
:
475
504
.
39.
Tusher
VG
,
Tibshirani
R
,
Chu
G
. 
Significance analysis of microarrays applied to the ionizing radiation response
.
Proc Natl Acad Sci
2001
;
98
:
5116
21
.
40.
Kapp
AV
,
Tibshirani
R
. 
Are clusters found in one dataset present in another dataset?
Biostatistics
2007
;
8
:
9
31
.
41.
Wilcox
RR
. 
Adjusting for unequal variances when comparing means in one-way and two-way fixed effects ANOVA models
.
J Educ Behav Stat
1989
;
14
:
269
78
.
42.
Tibshirani
R
,
Hastie
T
,
Narasimhan
B
,
Chu
G
. 
Diagnosis of multiple cancer types by shrunken centroids of gene expression
.
Proc Natl Acad Sci
2002
;
99
:
6567
72
.
43.
Liu
H
,
Lafferty
J
,
Wasserman
L
. 
The nonparanormal: Semiparametric estimation of high dimensional undirected graphs
.
J Mach Learn Res
2009
;
10
:
2295
328
.
44.
Vaske
CJ
,
Benz
SC
,
Sanborn
JZ
,
Earl
D
,
Szeto
C
,
Zhu
J
, et al
Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM
.
Bioinformatics
2010
;
26
:
i237
ii45
.
45.
Hanahan
D
,
Weinberg
RA
. 
Hallmarks of cancer: the next generation
.
Cell
2011
;
144
:
646
74
.
46.
Foekens
JA
,
Romain
S
,
Look
MP
,
Martin
P-M
,
Klijn
JG
. 
Thymidine kinase and thymidylate synthase in advanced breast cancer: response to tamoxifen and chemotherapy
.
Cancer Res
2001
;
61
:
1421
5
.
47.
Longley
DB
,
Harkin
DP
,
Johnston
PG
. 
5-fluorouracil: mechanisms of action and clinical strategies
.
Nat Rev Cancer
2003
;
3
:
330
8
.
48.
André
F
,
Michiels
S
,
Dessen
P
,
Scott
V
,
Suciu
V
,
Uzan
C
, et al
Exonic expression profiling of breast cancer and benign lesions: a retrospective analysis
.
Lancet Oncol
2009
;
10
:
381
90
.
49.
Barbera-Guillem
E
,
Nyhus
JK
,
Wolford
CC
,
Friece
CR
,
Sampsel
JW
. 
Vascular endothelial growth factor secretion by tumor-infiltrating macrophages essentially supports tumor angiogenesis, and IgG immune complexes potentiate the process
.
Cancer Res
2002
;
62
:
7042
9
.
50.
Wang
M
,
Kaufman
RJ
. 
The impact of the endoplasmic reticulum protein-folding environment on cancer development
.
Nat Rev Cancer
2014
;
14
:
581
97
.
51.
Baselga
J
,
Campone
M
,
Piccart
M
,
Burris
HA
 III
,
Rugo
HS
,
Sahmoud
T
, et al
Everolimus in postmenopausal hormone-receptor–positive advanced breast cancer
.
New Engl J Med
2012
;
366
:
520
9
.
52.
Bachelot
T
,
Bourgier
C
,
Cropet
C
,
Ray-Coquard
I
,
Ferrero
J-M
,
Freyer
G
, et al
Randomized phase II trial of everolimus in combination with tamoxifen in patients with hormone receptor–positive, human epidermal growth factor receptor 2–negative metastatic breast cancer with prior exposure to aromatase inhibitors: A GINECO study
.
J Clin Oncol
2012
;
30
:
2718
24
.
53.
Gherardi
E
,
Birchmeier
W
,
Birchmeier
C
,
Woude
GV
. 
Targeting MET in cancer: rationale and progress
.
Nat Rev Cancer
2012
;
12
:
89
103
.
54.
Martelotto
LG
,
Ng
CK
,
Piscuoglio
S
,
Weigelt
B
,
Reis-Filho
JS
. 
Breast cancer intra-tumor heterogeneity
.
Breast Cancer Res
2014
;
16
:
1
.
55.
Shipitsin
M
,
Campbell
LL
,
Argani
P
,
Weremowicz
S
,
Bloushtain-Qimron
N
,
Yao
J
, et al
Molecular definition of breast tumor heterogeneity
.
Cancer Cell
2007
;
11
:
259
73
.
56.
Itakura
H
,
Achrol
AS
,
Mitchell
LA
,
Loya
JJ
,
Liu
T
,
Westbroek
EM
, et al
Magnetic resonance image features identify glioblastoma phenotypic subtypes with distinct molecular pathway activities
.
Sci Transl Med
2015
;
7
:
303ra138
.