Abstract
Purpose: To identify MRI-based radiomics as prognostic factors in patients with advanced nasopharyngeal carcinoma (NPC).
Experimental Design: One-hundred and eighteen patients (training cohort: n = 88; validation cohort: n = 30) with advanced NPC were enrolled. A total of 970 radiomics features were extracted from T2-weighted (T2-w) and contrast-enhanced T1-weighted (CET1-w) MRI. Least absolute shrinkage and selection operator (LASSO) regression was applied to select features for progression-free survival (PFS) nomograms. Nomogram discrimination and calibration were evaluated. Associations between radiomics features and clinical data were investigated using heatmaps.
Results: The radiomics signatures were significantly associated with PFS. A radiomics signature derived from joint CET1-w and T2-w images showed better prognostic performance than signatures derived from CET1-w or T2-w images alone. One radiomics nomogram combined a radiomics signature from joint CET1-w and T2-w images with the TNM staging system. This nomogram showed a significant improvement over the TNM staging system in terms of evaluating PFS in the training cohort (C-index, 0.761 vs. 0.514; P < 2.68 × 10⁻⁹). Another radiomics nomogram integrated the radiomics signature with all clinical data, and thereby outperformed a nomogram based on clinical data alone (C-index, 0.776 vs. 0.649; P < 1.60 × 10⁻⁷). Calibration curves showed good agreement. Findings were confirmed in the validation cohort. Heatmaps revealed associations between radiomics features and tumor stages.
Conclusions: Multiparametric MRI-based radiomics nomograms provided improved prognostic ability in advanced NPC. These results provide an illustrative example of precision medicine and may affect treatment strategies. Clin Cancer Res; 23(15); 4259–69. ©2017 AACR.
Radiomics, an emerging field, uses quantitative features of medical imaging to provide information on tumor physiology. Here, we used features of MR images as prognostic factors in patients with advanced nasopharyngeal carcinoma. Radiomics-based prognostic nomograms were developed based on a training cohort (n = 88) using machine learning methods. The nomograms were then tested in a validation cohort (n = 30). We found that prognostic ability became higher when we added radiomics features to the TNM staging system and to clinical factors. Practically speaking, radiomics-based nomograms are an especially promising approach to personalized medicine because they are noninvasive and relatively low cost. Our study provides an illustrative example of this promise, and may affect treatment strategies for patients with advanced nasopharyngeal carcinoma.
Introduction
Nasopharyngeal carcinoma (NPC) is a rather common malignant tumor among Asians, especially in South China (1). Radiotherapy is regarded as the standard treatment for patients with NPC. Although various improvements have been achieved in radiotherapy technology and equipment, the 5-year survival rate of patients with NPC remains around 50% (2). Patients with advanced NPC have a poorer prognosis, with 5-year survival rates ranging from 10% to 40% (2, 3). Unfortunately, approximately 70%–80% of patients with NPC have locoregionally advanced disease at diagnosis (4, 5). The main causes of treatment failure are locoregional recurrences and distant metastasis (6). Pretreatment identification of recurrence and distant metastasis in patients with advanced NPC is crucial to identify the prognosis and make decisions regarding treatment. If poor survival can be identified before treatment, then this can help to determine whether more aggressive treatments should be administered, for example by increasing cycles, by using adjuvant and/or induction chemotherapy, or by using gemcitabine plus cisplatin instead of fluorouracil plus cisplatin as the standard first-line treatment option (7, 8).
Although the tumor-node-metastasis (TNM) staging system for NPC plays a vital role in predicting prognosis and facilitating treatment stratification, it may not be sufficiently precise (9). Beyond traditional prediction strategies, some recent studies reported that various clinical risk factors, such as hemoglobin, lactate dehydrogenase level, neutrophil-to-lymphocyte ratio, and platelet counts, were associated with poor survival (10–13). However, the clinical utility of these factors was limited and unclear. Therefore, new tools are urgently needed to identify patients who are at risk of having a poor prognosis.
Radiomics, an emerging and promising field, hypothesizes that medical imaging can provide crucial information regarding tumor physiology (14–16). Increasing numbers of pattern recognition tools and dataset sizes have facilitated the development of processes for radiomics (17). By converting medical images into high-dimensional, mineable, and quantitative imaging features via high-throughput extraction of data-characterization algorithms, radiomics methods provide an unprecedented opportunity to improve decision-support in oncology at low cost and noninvasively (14, 17). Some previous studies have shown that biomarkers based on quantitative radiomics features are associated with clinical prognosis and underlying genomic patterns across a range of cancer types (18–22).
Recently, the most widely used imaging modality in radiomics research has been CT, which can quantify tissue density (14). However, unlike CT, MRI can detect tumor density and reflect physiologic characteristics of tumors (23). In addition, MRI provides better tissue contrast, has multiplanar capacity, and exhibits fewer artifacts from radiation and bone beam hardening, which allows tumor margins to be delineated more accurately (24). To our knowledge, no published study has determined whether the prognosis of NPC could be evaluated by a radiomics approach based on multiparametric MRI.
Thus, in this study, we developed and validated multiparametric MRI-based radiomics as a novel approach for providing individualized, pretreatment evaluation of progression-free survival in patients with advanced NPC (stage III–IVb). In addition, we sought to reveal associations between radiomics features and clinical data.
Patients and Methods
A predefined hypothesis and rationale for sample size
Predefined hypothesis.
The Cox proportional hazards regression model can be used to improve the prognostic evaluation of PFS in patients with advanced NPC.
Sample size.
A small sample size increases both the type-I (incorrectly detecting a difference) and type-II (not detecting an actual difference) error rates. To generate accurate estimates of the impact of the dependent variables, an adequate number of events per variable is required. For the training sample size, Chalkidou and colleagues proposed that for linear models, such as multiple regression, at least 10 to 15 observations per predictor variable are required to produce reasonably stable estimates (25). In our study, eight features were selected for the final model, so the minimum training sample size was 80. For the validation sample size, we performed a power calculation (26) and found that the minimum sample size was 24. The estimation process can be found in the Supplementary Information. Our study involved 118 patients (88 in the training cohort and 30 in the validation cohort), which was sufficient.
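The training-size arithmetic can be illustrated with a minimal sketch. The function name and the default of 10 observations per predictor are illustrative assumptions; the validation power calculation is described only in the Supplementary Information and is not reproduced here.

```python
def min_training_size(n_predictors: int, obs_per_predictor: int = 10) -> int:
    """Smallest training-set size under an observations-per-predictor rule of thumb."""
    if n_predictors < 0 or obs_per_predictor <= 0:
        raise ValueError("n_predictors must be >= 0 and obs_per_predictor > 0")
    return n_predictors * obs_per_predictor


n_features = 8        # features in the final model, as stated in the text
training_cohort = 88  # training cohort size, as stated in the text

lower_bound = min_training_size(n_features, 10)  # 10 observations per predictor
upper_bound = min_training_size(n_features, 15)  # 15 observations per predictor

print(lower_bound, upper_bound, training_cohort >= lower_bound)
```

With eight predictors, the 10-per-predictor end of the range gives 80 patients, which the training cohort of 88 exceeds.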
Patients
Our Institutional Review Board approved this retrospective study and waived the need to obtain informed consent from the patients. The entire cohort of this study was acquired from the January 2007 to August 2013 records of the Institutional Picture Archiving and Communication System (PACS, Carestream), which was used to identify patients who had histologically confirmed NPC (TNM stage: III–IVb) without evidence of recurrence or distant metastases at diagnosis. All patients underwent pretreatment 1.5 T MRI scans (Signa EXCITE HD, TwinSpeed, GE Healthcare). Supplementary Figure S1 provides the patient recruitment pathway, along with the inclusion and exclusion criteria. A total of 118 consecutive patients who met the criteria (92 men and 26 women; mean age, 43 years ± 10.98) were identified and divided into two cohorts at a ratio of 3:1 using computer-generated random numbers. Eighty-eight patients were allocated to the training cohort (65 men and 23 women; mean age, 44 years ± 10.73), while 30 patients were allocated to the independent validation cohort (27 men and three women; mean age, 43 years ± 11.85).
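A random allocation of this kind can be sketched as follows. This is an illustration only: the function name, the seed, and the use of sequential integers as patient identifiers are assumptions, and the study's actual random-number procedure is not described in the text.

```python
import random


def split_cohort(patient_ids, n_validation, seed=0):
    """Randomly allocate patients to training and validation cohorts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    validation = sorted(shuffled[:n_validation])
    training = sorted(shuffled[n_validation:])
    return training, validation


# 118 hypothetical patient identifiers, with 30 held out for validation
training, validation = split_cohort(range(1, 119), n_validation=30, seed=2017)
print(len(training), len(validation))
```

Fixing the seed makes the allocation repeatable, which is useful when the same split must be reused across several analyses.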
Demographic and pretreatment clinical characteristics were collected from PACS, including age, gender, histology, T-stage, N-stage, overall stage, hemoglobin, and platelet counts. The dates of baseline MRI were also recorded. Tumor staging was performed on the basis of the American Joint Committee on Cancer TNM Staging System Manual, 7th Edition (27).
Follow-up and clinical endpoint
All patients were followed up every 1–3 months during the first 2 years, every 6 months in years 2–5, and annually thereafter. To avoid extended follow-up and provide an efficient tool that would allow earlier personalized treatment, we chose PFS as the endpoint (28). We calculated PFS from the first day of treatment to the date of disease progression (locoregional recurrences or distant metastases), death from any cause, or the date of the last follow-up visit (censored). The minimum follow-up time to ascertain the PFS was 36 months. All local recurrences were diagnosed by flexible nasopharyngoscopy and biopsy and/or MRI scanning of the nasopharynx and skull base that showed progressive bone erosion and/or soft tissue swelling. Regional recurrences were diagnosed by clinical examination of the neck and, in doubtful cases, by fine-needle aspiration or an MRI scan of the neck. Distant metastases were diagnosed on the basis of clinical symptoms, physical examination, and imaging methods that included chest X-ray, whole-body bone scan, MRI/CT, positron emission tomography (PET)/CT, and abdominal sonography.
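The PFS definition above can be expressed as a short sketch that returns a duration and an event indicator for each patient. The function name, argument names, and the days-to-months conversion factor are assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 365.25 / 12  # assumed conversion from days to months


def pfs_months(
    treatment_start: date,
    last_follow_up: date,
    progression: Optional[date] = None,
    death: Optional[date] = None,
) -> Tuple[float, bool]:
    """Return (PFS in months, event observed?) for one patient.

    The event is the earlier of disease progression or death from any cause.
    Patients with neither are censored at the last follow-up visit.
    """
    event_dates = [d for d in (progression, death) if d is not None]
    if event_dates:
        end, observed = min(event_dates), True
    else:
        end, observed = last_follow_up, False
    if end < treatment_start:
        raise ValueError("end date precedes the first day of treatment")
    return (end - treatment_start).days / DAYS_PER_MONTH, observed


# Hypothetical patient with no progression or death: censored at last follow-up
months, observed = pfs_months(date(2010, 1, 1), date(2013, 1, 1))
print(round(months, 1), observed)
```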
MRI acquisition and segmentation
All patients underwent a pretreatment 1.5 T MRI scan. For feature selection, we used axial T2-weighted Digital Imaging and Communications in Medicine (DICOM) images and contrast-enhanced T1-weighted DICOM images that had been archived in the PACS, without applying any preprocessing or normalization. The Supplementary Information describes the magnetic resonance image acquisition parameters.
Note that segmentation is required before the extraction of quantitative radiomics features. We used ITK-SNAP software for three-dimensional manual segmentation (open source software; www.itk-snap.org). All manual segmentations of the tumor were performed by a radiologist who had 10 years of experience, and each segmentation was validated by a senior radiologist, who had 20 years of experience (largely with NPC). The region of interest covered the whole tumor and was delineated on both the axial T2-weighted images and contrast-enhanced T1-weighted images on each slice.
Data analysis
Radiomics feature extraction/selection and radiomics signature building.
Radiomics features capture phenotypic differences between tumors by extracting a large set of quantitative features (Fig. 1). The feature extraction methodology is described in the Supplementary Information. All feature extraction methods were implemented using MATLAB 2014a (MathWorks). We used the least absolute shrinkage and selection operator (LASSO) method to select the features that were most significant and then built a Cox model that included the selected variables. The LASSO is a data analysis method that may be applied for biomarker selection in high-dimensional data. Originally proposed for the linear regression model, this method minimizes the residual sum of squares, subject to the sum of the absolute values of the coefficients being less than a tuning parameter (λ). For the binary logistic regression model, the residual sum of squares is replaced by the negative log-likelihood. If λ is large, there is no effect on the estimated regression parameters, but as λ gets smaller, some coefficients may be shrunk towards zero (29, 30). We selected the λ for which the cross-validation error was smallest. Finally, the model was refit using all of the available observations and the selected λ. In this way, most of the coefficients of the covariates were reduced to zero, and the features with non-zero coefficients were the ones selected by LASSO. The Rad-score was calculated for each patient as a linear combination of the selected features weighted by their respective non-zero coefficients. Radiomics signatures were built using the Rad-score.
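The Rad-score step can be sketched as below. The feature names, coefficient values, and patient values are hypothetical placeholders, not the fitted coefficients from this study, and the LASSO fit itself (performed with glmnet in R) is assumed to have been done already.

```python
from typing import Dict


def selected_features(coefs: Dict[str, float], tol: float = 0.0) -> Dict[str, float]:
    """Keep only the features whose LASSO coefficient is non-zero."""
    return {name: beta for name, beta in coefs.items() if abs(beta) > tol}


def rad_score(features: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Linear combination of the selected features weighted by their coefficients."""
    return sum(beta * features[name] for name, beta in selected_features(coefs).items())


# Hypothetical coefficients after a LASSO fit: f2 has been shrunk to exactly zero
coefs = {"f1": 0.5, "f2": 0.0, "f3": -0.25}
# Hypothetical feature values for one patient
patient = {"f1": 2.0, "f2": 10.0, "f3": 2.0}

print(sorted(selected_features(coefs)))  # features retained by the fit
print(rad_score(patient, coefs))         # 0.5 * 2.0 + (-0.25) * 2.0
```

Features with a zero coefficient drop out of the sum, so the value of `f2` has no influence on the score regardless of its magnitude.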
Prognostic validation of radiomics signature.
The potential association of radiomics signature with PFS was first assessed in the training cohort, and then validated in the validation cohort. Kaplan–Meier survival analysis was used in both cases. The patients were divided into high-risk and low-risk groups based on the median Rad-score. Patients with median scores were placed in high-risk groups. We performed stratified analyses to determine the PFS in various subgroups, comparing high-risk and low-risk patients. Univariate Cox proportional hazards models were applied to calculate the C-index of the radiomics signature that was based on CET1-w and T2-w images.
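The median split and the Kaplan–Meier estimate can be sketched in plain Python. This is a simplified illustration, not the survival package implementation used in the study. It assumes that a higher Rad-score corresponds to higher risk, and the example data are hypothetical.

```python
from statistics import median


def risk_groups(rad_scores):
    """Split patients at the median Rad-score; scores equal to the median are high risk."""
    cutoff = median(rad_scores)
    return ["high" if score >= cutoff else "low" for score in rad_scores]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving  # events and censored patients both leave the risk set
    return curve


print(risk_groups([0.1, 0.5, 0.9]))
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

In practice the estimator would be applied separately to the high-risk and low-risk groups, and the two curves compared.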
Performance of the TNM staging system, clinical nomogram, and radiomics nomogram in the training cohort.
The TNM staging system and nomogram performance were measured quantitatively using the C-index. The C-index is commonly used to evaluate the discriminative ability of prognostic models in survival analysis. The value of the C-index can range from 0.5, which indicates no discriminative ability, to 1.0, which indicates perfect ability to distinguish between the patients who experience disease progression or death and those who do not. Bootstrap analyses with 1,000 resamples were used to obtain C-index statistics that were corrected for potential overfitting. The nomogram calibration curves were assessed by plotting the observed survival fraction against the nomogram-assessed probabilities.
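A minimal sketch of Harrell's concordance index shows what the C-index measures. It is a simplified version that skips pairs with tied follow-up times and omits the bootstrap correction. The study itself used R packages for these calculations, and the example data are hypothetical.

```python
def harrell_c_index(times, events, risk):
    """Fraction of comparable patient pairs in which the higher-risk patient fails first.

    A pair is comparable when the patient with the shorter follow-up time had an
    observed event. Ties in the risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier failure in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3, 4]
events = [True, True, True, True]
print(harrell_c_index(times, events, [4, 3, 2, 1]))  # risk ordering matches failure order
print(harrell_c_index(times, events, [1, 1, 1, 1]))  # uninformative risk score
```

A risk score that orders every comparable pair correctly gives 1.0, and a constant score gives 0.5, matching the range described above.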
Validation of the TNM staging system and nomograms
The prognostic performance of the TNM staging system, clinical nomogram, and radiomics nomogram was evaluated in the training cohort and then tested in the validation cohort. The C-index and calibration curves were obtained from multivariable Cox proportional hazards regression analyses.
Association of radiomics features with clinical data
We performed a heatmap analysis to evaluate associations between radiomics features and clinical data.
Statistical analysis
The statistical analyses were performed with R software (R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; http://www.R-project.org, 2016). The following R packages were used. The glmnet package was used for LASSO logistic regression. A detailed description of the LASSO method is provided in the Supplementary Information. The survival package was used for Kaplan–Meier survival analyses. The rms package was used for Cox proportional hazards regression, nomograms, and calibration curves. The Hmisc package was used for comparisons between C-indices. The ResourceSelection package was used to apply Hosmer–Lemeshow tests. The gplots and pheatmap packages were used for heatmaps. Bonferroni correction was used for multiple hypothesis correction if necessary. All statistical tests were two-sided, and P values of <0.05 were considered significant.
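The Bonferroni correction mentioned above can be sketched in a few lines. The function name and the example P values are illustrative assumptions, and the 0.05 threshold follows the significance level stated in the text.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted P values and whether each remains below alpha."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]  # multiply by number of tests, cap at 1
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Three hypothetical P values from three simultaneous tests
adjusted, significant = bonferroni([0.01, 0.04, 0.5])
print(adjusted, significant)
```

Only the first hypothetical test stays below 0.05 after adjustment, because each raw P value is multiplied by the number of tests.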
Results
Clinical characteristics of the patients
The clinical characteristics of the training and validation cohorts are summarized in Table 1. No differences were found between the training and validation cohorts in terms of age, gender, overall stage, T-stage, N-stage, histology, or follow-up time (P = 0.270–0.687). However, hemoglobin and platelet counts differed significantly between the two cohorts (P < 0.001). The median PFS was 40 months (range, 3–84 months).
Characteristic | Training cohort (N = 88) | Validation cohort (N = 30) |
---|---|---|
Gender | ||
Male | 65 (73.9%) | 27 (90%) |
Female | 23 (26.1%) | 3 (10%) |
Age (years) | ||
Median (IQR) | 43 (37.75–51.00) | 44 (36.25–50.75) |
≤40 | 37 (42%) | 13 (43.4%) |
40–50 | 26 (30%) | 9 (30%) |
>50 | 25 (28%) | 8 (26.6%) |
Overall stage | ||
III | 55 (62.5%) | 22 (73.4%) |
IV | 33 (37.5%) | 8 (26.6%) |
T stage | ||
T1 | 3 (3.4%) | 4 (13.4%) |
T2 | 23 (26.1%) | 3 (10%) |
T3 | 41 (46.6%) | 17 (56.6%) |
T4 | 21 (23.9%) | 6 (20%) |
N stage | ||
N0 | 8 (9%) | 1 (3.3%) |
N1 | 19 (21.6%) | 5 (16.7%) |
N2 | 47 (53.4%) | 22 (73.3%) |
N3 | 14 (16%) | 2 (6.7%) |
Histologyᵃ | ||
WHO type I | 0 | 0 |
WHO type II | 3 (3.5%) | 3 (10%) |
WHO type III | 83 (96.5%) | 27 (90%) |
Hemoglobin before treatment (g/L) | ||
Median (IQR) | 173.5 (142–232.5) | 143.5 (134–152.5) |
≤182 | 43 (49%) | 6 (20%) |
>182 | 45 (51%) | 24 (80%) |
Platelet counts before treatment (10⁹/L) | ||
Median (IQR) | 141.5 (126.75–185.25) | 227.5 (184–296.5) |
≤184 | 23 (26.1%) | 9 (30%) |
>184 | 65 (73.9%) | 21 (70%) |
Follow-up time (mo) | ||
Median (IQR) | 40.5 (26–58.5) | 38.5 (29–49.5) |
NOTE: Data are n (%) unless otherwise indicated. No significant differences were found between the training cohort and the validation cohort in terms of age, gender, overall stage, T-stage, N-stage, histology, or follow-up time (P = 0.270–0.687). Hemoglobin and platelet counts were significantly different between the two cohorts (P = 6.49 × 10⁻⁵ and 0.0001, respectively).
Abbreviations: IQR, inter-quartile range; type I, keratinizing; type II, nonkeratinizing differentiated; type III, nonkeratinizing undifferentiated.
ᵃHistology was categorized according to the WHO Classification.
Radiomics feature extraction/selection and radiomics signature building
A total of 970 features were extracted from magnetic resonance images (485 features from T2-w images and the remaining 485 from CET1-w images). Of these, we selected three features from CET1-w images and four features from T2-w images that were most strongly associated with PFS in the training cohort (Table 2). To build the radiomics signature, eight features were selected for inclusion in the Rad-score prognostic model, including five features derived from CET1-w images and three features derived from T2-w images (Supplementary Fig. S2A–S2B; Supplementary Information, Rad-score calculation formula). The Rad-score for each patient in the training cohort is shown in Fig. 2.
Result category | CET1-w | T2-w | CET1-w+T2-w |
---|---|---|---|
Number of selected features | 3 | 4 | 8 |
Individual features | CET1-w_5_fos_mean (P = 1.022864e-05) | T2-w_Max3D (P = 4.527813e-05) | CET1-w_5_fos_median (P = 1.022864e-05) |
 | CET1-w_5_GLCM_correlation (P = 3.550793e-05) | T2-w_3_fos_mean (P = 1.990964e-03) | CET1-w_1_GLCM_energy (P = 1.022216e-03) |
 | CET1-w_5_GLRLM_RP (P = 3.438792e-04) | T2-w_6_GLCM_IMC1 (P = 5.970507e-06) | CET1-w_5_GLCM_correlation (P = 3.550793e-05) |
 | | T2-w_1_GLRLM_SRLGLE (P = 3.898521e-05) | CET1-w_4_GLRLM_LRHGLE (P = 1.651304e-04) |
 | | | CET1-w_5_GLRLM_RP (P = 3.438792e-04) |
 | | | T2-w_Max3D (P = 4.527813e-05) |
 | | | T2-w_3_fos_mean (P = 1.990964e-03) |
 | | | T2-w_4_fos_mean (P = 4.369881e-03) |
The best-performance feature | CET1-w_5_fos_mean | T2-w_6_GLCM_IMC1 | CET1-w_5_GLRLM_RP |
NOTE: The P value for the association of each radiomic feature with outcome was calculated using Cox proportional hazards regression. CET1-w = contrast-enhanced T1-weighted; T2-w = T2-weighted; GLCM = gray-level co-occurrence matrix; GLRLM = gray-level run-length matrix. CET1-w_4_GLRLM_LRHGLE: the Long Run High Gray Level Emphasis in the GLRLM of textural features; CET1-w_5_fos_median: the first-order statistics feature that describes the median value of the intensity levels; CET1-w_5_GLCM_correlation: the correlation in the GLCM that describes the degree of similarity of the matrix elements in a row or column direction; CET1-w_5_GLRLM_RP: the Run Percentage in the GLRLM of textural features; T2-w_Max3D: the shape and size feature that describes the maximum three-dimensional tumor diameter in the original image; T2-w_3_fos_mean and T2-w_4_fos_mean: the first-order statistics feature that describes the mean value of the intensity levels; CET1-w_1_GLCM_energy: the energy of the whole element in the GLCM; T2-w_6_GLCM_IMC1: the informational measure of correlation 1 in the GLCM of textural features; T2-w_1_GLRLM_SRLGLE: the Short Run Low Gray Level Emphasis in the GLRLM of textural features.
Validation of radiomics signature
In the training cohort, the radiomics signature derived from CET1-w images yielded a C-index of 0.690 [95% confidence interval (CI): 0.593–0.787]. The radiomics signature from T2-w images yielded a C-index of 0.648 (95% CI: 0.551–0.745). The radiomics signature from joint CET1-w and T2-w images yielded the highest C-index, which was 0.758 (95% CI: 0.661–0.856).
In the validation cohort, the radiomics signature from CET1-w images yielded a C-index of 0.724 (95% CI: 0.544–0.904). The radiomics signature from T2-w images yielded a C-index of 0.682 (95% CI: 0.500–0.860). The radiomics signature from joint CET1-w and T2-w images yielded the highest C-index, which was 0.737 (95% CI: 0.549–0.924).
Significant discrimination between the PFS of high-risk and low-risk patients was observed when subgroup analyses were performed (Fig. 3).
The value of the joint radiomics signature is complementary to the TNM staging system in the training cohort
The traditional TNM staging system yielded a C-index of 0.514 (95% CI: 0.432–0.596). We developed a radiomics nomogram that integrated the radiomics signature from the joint CET1-w and T2-w images with the TNM staging system. This nomogram showed a significant improvement over the TNM staging system in terms of evaluating PFS (C-index: 0.761; 95% CI: 0.664–0.858), with a P value < 2.68 × 10⁻⁹ (Fig. 4A). The nomogram also showed good calibration (Fig. 4B).
The incremental value of the radiomics signature when added to the clinical data in the training cohort
The clinical nomogram yielded a C-index of 0.649 (95% CI: 0.552–0.746). We created a radiomics nomogram that integrated the radiomics signature from the joint CET1-w and T2-w images with all clinical data, and found that it provided a C-index of 0.776 (95% CI: 0.678–0.873) and good calibration (Fig. 4C and D). Hence, the radiomics nomogram appeared to be more accurate than the clinical nomogram for evaluating PFS (P < 1.60 × 10⁻⁷).
Validation of the TNM staging system and nomograms
When tested in the validation cohort, the traditional TNM staging system yielded a C-index of 0.634 (95% CI: 0.498–0.769). The radiomics nomogram that integrated the radiomics signature from the joint CET1-w and T2-w images with the TNM staging system showed an improvement over the TNM staging system alone (C-index: 0.728; 95% CI: 0.541–0.916). The calibration curve for probability of PFS showed good agreement between evaluation by nomogram and actual observation (figure not shown).
The clinical nomogram yielded a C-index of 0.626 (95% CI: 0.438–0.813) in the validation cohort, which was improved by adding the radiomics signature (C-index: 0.724; 95% CI: 0.537–0.912). The calibration curves showed good agreement between nomogram-evaluated and actual survival (figure not shown).
Association of radiomics features with clinical data
Unsupervised clustering revealed clusters of NPC patients with similar radiomics expression patterns (Supplementary Fig. S3).
We used a heatmap to determine the association between radiomics features and clinical data (Fig. 5). The results showed significant associations of the signature features CET1-w_5_fos_median, T2-w_Max3D, and T2-w_3_fos_mean with overall stage (P = 0.002–0.007) as well as T-stage (P = 0.001–0.004). CET1-w_5_fos_median was associated with N-stage (P = 0.048). In contrast, no radiomics feature was significantly associated with hemoglobin or platelet counts (for all, P > 0.05). Interestingly, further analysis suggested that higher mean values of CET1-w_5_fos_median, T2-w_Max3D, and T2-w_3_fos_mean were associated with higher overall stage and T-stage (P = 3.16 × 10⁻⁹ to 0.03). However, the mean value of CET1-w_5_fos_median was lower in the cases with node metastasis than in the cases without node metastasis (P = 7.49 × 10⁻⁵; Table 3).
Stage | CET1-w_5_fos_median | P | T2-w_3_fos_mean | P | T2-w_Max3D | P |
---|---|---|---|---|---|---|
Overall stage | | | | | | |
III | 130.67 ± 23.01 | 0.0319 | 122.36 ± 1.01 | 0.0002 | 117.53 ± 4.52 | 0.0019 |
IV | 144.49 ± 26.40 | | 122.91 ± 0.90 | | 119.98 ± 3.92 | |
T stage | | | | | | |
T1 | 114.57 ± 5.26 | 2.53 × 10⁻⁸ | 121.66 ± 1.17 | 3.16 × 10⁻⁹ | 112.27 ± 15.18 | 4.21 × 10⁻⁴ |
T2 | 117.13 ± 4.79 | | 122.25 ± 1.03 | | 127.84 ± 18.50 | |
T3 | 118.39 ± 3.84 | | 122.55 ± 0.92 | | 135.52 ± 23.14 | |
T4 | 120.54 ± 4.29 | | 123.08 ± 0.86 | | 148.73 ± 29.58 | |
N stage | | | | | | |
N0 | 120.03 ± 4.59 | 0.0014 | — | — | — | — |
N1–3 | 117.74 ± 4.53 | | — | | — | |
NOTE: Values are expressed as means ± SDs.
Discussion
In the current study, we identified multiparametric MRI-based radiomics as a new approach for individualized evaluation of PFS before treatment in advanced NPC (stage III–IVb). To our knowledge, this is the first study of MRI-based radiomics for evaluating prognosis in advanced NPC. The radiomics signature from joint CET1-w and T2-w images demonstrated better prognostic performance than the radiomics signature from either CET1-w or T2-w images alone. The radiomics signature successfully stratified patients into high-risk and low-risk groups, which were separated on the basis of the median Rad-score. The two groups had significantly different 3-year PFS. The radiomics nomogram outperformed both the traditional TNM staging system and a clinical nomogram.
To develop the radiomics signature, a total of 970 candidate features were reduced to a set of only eight potential descriptors by using a LASSO logistic regression model. LASSO is suitable for analyzing large sets of radiomics features with a relatively small sample size, and it is designed to avoid overfitting (31, 32). The radiomics features obtained from LASSO are generally accurate, and the regression coefficients of most features are shrunk towards zero during model fitting, making the model easier to interpret and allowing the identification of features that are most strongly associated with PFS (33). Moreover, LASSO allows a radiomics signature to be constructed by combining the selected features. The radiomics signature from joint CET1-w and T2-w images revealed adequate discrimination in both the training cohort (C-index, 0.758) and the validation cohort (C-index, 0.701). The field of radiomics aims to develop decision support tools. Therefore, it involves combining radiomics data with other patient characteristics, as available, to increase the power of the decision support models (17, 34). We showed that radiomics features complemented the TNM staging system, helping to provide better prognostic ability for pretreatment PFS. This complementary ability illustrates the clinical importance of our findings, as TNM staging is routinely used in clinical practice (35, 36). Currently, the TNM staging system is used for risk stratification and treatment decision making. However, when patients were stratified by clinical disease stage, differences in PFS were evident within the individual stages, which suggests that heterogeneity was present in the survival outcomes.
Patients with advanced nasopharyngeal carcinoma (stage III–IVb) and shorter PFS may benefit greatly if we can make an accurate prognosis and predict their response to an aggressive treatment plan. For this reason, our study focused only on patients with stage III–IVb tumors. In a future study, we will develop a new model that includes patients with low-stage NPC. Moreover, N-staging showed poor prognostic value in our study. Note that patients with advanced NPC often experienced lymph node metastasis. In contrast, relatively few patients with low-stage tumors had lymph node metastasis. Therefore, the N-stage did not appear to be an adverse prognostic factor for patients with advanced disease. In addition, only 9% and 3% of the stage III–IVb patients in the training cohort and validation cohort, respectively, had N0 disease. The small number of N0 patients would also limit the performance of N-staging.
Our results showed that the radiomics signature performed better than the TNM classification, not only in the training cohort but also in the validation cohort. There might be three reasons for this. (i) Our study focused on cohorts with advanced nasopharyngeal carcinoma. As shown in Table 1, all of the included patients had clinical stage III or IV disease. It was therefore difficult for clinical staging to predict PFS, because all patients had similar stage information. (ii) Related to the first reason, our cohorts were imbalanced in T and N staging. Only 9% of the patients in the training set had N0 disease and only 3.4% had T1 disease. Even if clinical staging works, the imbalanced cohort would introduce a large deviation when only clinical staging is used for prognosis. (iii) Clinical staging reflects tumor size (T stage), lymph node status (N stage), and metastasis status (M stage), all of which are based on gross anatomic information and have been shown to have prognostic value in clinical practice. Intratumor heterogeneity has recently been reported to have pronounced effects on diagnosis and prognosis, and it is thus considered to be a potential prognostic factor. This view fits our current knowledge of cancer, in which malignant lesions consist of heterogeneous cell populations with distinct molecular and microenvironmental differences. In contrast to traditional clinical staging, which barely reflects intratumor heterogeneity, the radiomic approach extracts features from the imaging characteristics of the entire tumor on medical images, thus providing a robust way to characterize intratumor heterogeneity noninvasively. This may explain why the radiomics signatures performed better than clinical staging in the patient population selected for the current study. Therefore, radiomics signatures can have prognostic value that is complementary to clinical staging.
Age, gender, body mass index, serum lactate dehydrogenase level, hemoglobin, platelet count, and various other prognostic factors have been identified and evaluated retrospectively in previous studies (37–39). We therefore devised a clinical nomogram that combined the available risk factors (age, gender, pretreatment hemoglobin, and platelet count) with overall stage. Our results demonstrated that the performance of the clinical nomogram could be improved by adding a radiomics signature to the model. If additional clinical variables were included in the radiomics nomogram, its performance might improve further.
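The comparisons between nomograms discussed above are expressed as the concordance index (C-index). As a reading aid, the following is a minimal sketch of Harrell's C-index for right-censored data. It is not the implementation used in this study: the function name, the toy follow-up times, and the risk scores are all hypothetical, and pairs with tied follow-up times are simply skipped.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  : follow-up time for each patient
    events : 1 if progression was observed, 0 if censored
    risks  : model risk score (higher = earlier progression expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the patient
        # with the shorter time actually progressed (was not censored).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0      # higher risk progressed first
        elif risks[a] == risks[b]:
            concordant += 0.5      # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (not from the study): PFS in months, event flags,
# and two illustrative risk scores.
pfs_months = [6, 12, 18, 24, 30, 36]
progressed = [1, 1, 0, 1, 0, 0]
risk_informative = [0.9, 0.7, 0.6, 0.5, 0.2, 0.1]
risk_flat = [0.5] * 6

c_informative = harrell_c_index(pfs_months, progressed, risk_informative)
c_flat = harrell_c_index(pfs_months, progressed, risk_flat)
print(c_informative, c_flat)
```

In this toy example a score that ranks every comparable pair correctly gives a C-index of 1.0, whereas a score that cannot separate patients gives 0.5, the value expected by chance.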
Segmentation is the most critical and challenging component of radiomics because all subsequent feature data are generated from the segmented volumes, yet many tumors have indistinct borders (40). Compared with CT or PET/CT, MRI provides better tissue contrast, has multiplanar capacity, and exhibits fewer artifacts from radiation and bone beam hardening, allowing tumor borders to be delineated more accurately (24). Another especially challenging component of radiomics is variability in the imaging data (17). Variations in acquisition and image reconstruction parameters can introduce changes that are not due to underlying biologic effects (17). To date, CT has been the most widely used imaging modality in radiomics because it is routinely acquired as part of standard of care (41). However, CT is also limited by the evolution of hardware and progress in informatics. To reduce bias and variance, we extracted all radiomics features from images acquired on the same MRI unit at our institution. The use of multiparametric MR images would be expected to improve performance. We therefore analyzed CET1-w and T2-w images together, and the results confirmed this expectation by providing the best performance.
T2-w images can detect tumor density, and CET1-w images may reflect intratumoral heterogeneity and architecture (e.g., tumor angiogenesis). We therefore analyzed the relationships between radiomics features and tumor-associated characteristics. We observed that the radiomics features CET1-w_5_fos_median, T2-w_3_fos_mean, and T2-w_Max3D were significantly associated with both overall stage and T-stage. In many previous studies that used IHC methods, dynamic contrast-enhanced MRI (DCE-MRI), or PET/CT, researchers have found that angiogenesis is closely related to tumor invasion and metastasis, which can be staged by TNM (42–47). Unlike these traditional methods, radiomics offers a noninvasive and low-cost way of providing new insights into the associations between tumor angiogenesis and biological behavior. In addition, our radiomics evaluations showed that tumors with increased cell density (T2-w_3_fos_mean) and greater maximum diameter (T2-w_Max3D) may be more likely to invade the surrounding tissues. However, we unexpectedly found that cases with lymph node metastasis had a lower mean value of CET1-w_5_fos_median than cases without lymph node metastasis. Almost all previous studies have reported the opposite result, namely that angiogenesis is positively correlated with lymph node metastasis (48–50). One reason for this discrepancy may be that patients with NPC often have lymph node metastasis even at low stage (stage I–II). Only 9% and 3.4% of the stage III–IVb patients in the training and validation cohorts, respectively, had N0 disease, which limited the ability of radiomic features to uncover the association between intensity levels in contrast-enhanced T1-w images and lymphatic metastasis. Nevertheless, the current study may provide some different insights into the mechanisms of lymphatic metastasis in NPC, which warrant future investigation.
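To make the feature names above more concrete, the following sketch computes first-order statistics (mean and median intensity) and a maximum 3D diameter for a toy region of interest. It is only an illustration under stated assumptions: the voxel values, the isotropic spacing, and the helper names are hypothetical, the diameter is taken between voxel centers, and dedicated radiomics software may define these features differently. The numeric indices in the feature names reported here (e.g., the 3 and 5) are not modeled.

```python
from itertools import combinations
from math import dist
from statistics import mean, median

# Hypothetical toy region of interest: voxel index (z, y, x) -> intensity.
roi = {
    (0, 0, 0): 100.0,
    (0, 0, 1): 120.0,
    (0, 1, 0): 110.0,
    (1, 0, 0): 140.0,
    (2, 2, 2): 130.0,
}
spacing_mm = (1.0, 1.0, 1.0)  # assumed isotropic voxel spacing


def first_order_stats(intensities):
    """Mean and median of the intensities inside the segmented region."""
    values = list(intensities)
    return {"mean": mean(values), "median": median(values)}


def max_3d_diameter(voxels, spacing):
    """Largest Euclidean distance (in mm) between any two voxel centers."""
    points = [tuple(i * s for i, s in zip(v, spacing)) for v in voxels]
    return max(dist(p, q) for p, q in combinations(points, 2))


stats = first_order_stats(roi.values())
diameter = max_3d_diameter(roi.keys(), spacing_mm)
print(stats, diameter)
```

Because every voxel inside the segmented volume contributes to these values, they depend directly on the quality of the tumor segmentation.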
This study has several limitations. First, our analysis did not account for two-way or higher order interactions between features. If such interactions had been identified, the interaction terms most strongly associated with the outcome could have been selected when we constructed the radiomics signature, and this could have improved prognostic performance. However, uncovering the interactions of multiple attributes is a challenging problem. On one hand, the ability to detect multiway interactions would be underpowered unless all multiway interactions were prespecified and explicitly formulated in the model. On the other hand, when the number of possible configurations becomes very large and the sample size is limited, it is difficult to obtain reliable statistical inferences for two-way or higher order interactions. Second, we used a validation cohort that was drawn from the same institution as the training cohort, which prevented us from investigating the generalizability of the results to other institutions and settings. Finally, selection bias arises when strict inclusion criteria are used (the randomization assumption is compromised), which may affect model training. We used the following strict inclusion criteria to select the 118 patients in our study: (i) all patients had to have clinical stage III or IV disease, which limits the application of our method to low-stage patients; (ii) all patients had to have regular follow-up [every 1–3 months during the first 2 years, every 6 months in years 2–5, and annually thereafter]; (iii) all patients had to be imaged with the same scanner and parameters to ensure the reproducibility and stability of the radiomics features. These criteria introduced selection bias by removing patients with the best prognosis (i.e., those with low-stage disease) as well as those with the worst prognosis (i.e., those who were lost to follow-up).
Because of this selection bias, our model may be accurate only for patients who are in good condition and have clinical stage III or IV disease. Even so, our results clearly showed the potential of the radiomics approach for prognosis in patients with NPC. In future work, we will increase the sample size by including patients who were not included in the current study. Moreover, we should include patients imaged on different scanners and with various imaging parameters, and develop a normalization method to improve the radiomics model.
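The difficulty of modeling feature interactions noted among the limitations can be made concrete with a short count. This sketch uses only the feature count (970) and training cohort size (n = 88) reported in the abstract. The variable names are illustrative, and it assumes every distinct pair or triple of features is a candidate interaction term.

```python
from math import comb

n_features = 970   # radiomics features extracted (reported in the abstract)
n_training = 88    # training cohort size (reported in the abstract)

# Number of distinct two-way and three-way interaction terms.
pairwise_terms = comb(n_features, 2)
three_way_terms = comb(n_features, 3)

print(pairwise_terms, three_way_terms, pairwise_terms / n_training)
```

With roughly 470,000 candidate two-way terms and only 88 training patients, there are thousands of candidate terms per patient, which is why reliable inference on interactions is hard to obtain at this sample size.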
In summary, the current study developed and validated multiparametric MRI-based radiomics as a convenient approach to the pretreatment evaluation of PFS in patients with advanced NPC (stage III–IVb). The radiomics signature that we presented added value to both the TNM staging system and clinical data as a method of providing individualized evaluation of PFS. Prognostic models based on quantitative radiomics could potentially be useful for precision medicine and may affect the treatment strategies used for patients with NPC.
Radiomics can be complementary to other omics, such as proteomics and genomics. Radiomics focuses on medical imaging of the entire tumor and supports diagnosis and prognosis using large numbers of quantitative imaging features. Proteomics studies proteins in tumor tissue or other organs to identify disease-related changes in protein structure and function. Genomics identifies and annotates gene sequences to study the structure and function of genomes in disease. In contrast to radiomics, which characterizes tumor features at the macroscopic scale, proteomics and genomics characterize tumor features at the microscopic scale. It can therefore be anticipated that, in the future, the combination of several omics will be the best scheme for disease diagnosis and treatment.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: S.X. Zhang, B. Zhang, J. Tian, D. Dong, C.H. Liang
Development of methodology: B. Zhang, J. Tian, D. Dong, D.S. Gu
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): B. Zhang, Y.H. Dong, L. Zhang, Z.Y. Lian, J. Liu, X.N. Luo, S.F. Pei, X.K. Mo, W.H. Huang, F.S. Ouyang, B.L. Guo, L. Liang, W.B. Chen
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): B. Zhang, D.S. Gu, Y.H. Dong, Z.Y. Lian, J. Liu, X.N. Luo, X.K. Mo, F.S. Ouyang, B.L. Guo, L. Liang, W.B. Chen
Writing, review, and/or revision of the manuscript: S.X. Zhang, B. Zhang
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): B. Zhang, D. Dong, Z.Y. Lian, S.F. Pei, X.K. Mo, L. Liang
Study supervision: S.X. Zhang, J. Tian, C.H. Liang
Acknowledgments
We acknowledge financial support from the National Scientific Foundation of China and the Science and Technology Planning Project of Guangdong Province.
Grant Support
We acknowledge financial support from the National Scientific Foundation of China (81571664) and the Science and Technology Planning Project of Guangdong Province (2014A020212244, 2016A020216020).