Abstract
Purpose: The current tumor–node–metastasis (TNM) staging system is inadequate at identifying patients with high-risk colorectal cancer. Using a systematic and comprehensive biomarker discovery and validation approach, we aimed to identify an miRNA recurrence classifier (MRC) that improves upon current TNM staging and is superior to currently offered molecular assays.
Experimental Design: Three independent genome-wide miRNA expression profiling datasets were used for biomarker discovery (N = 158) and in silico validation (N = 109 and N = 40) to identify an miRNA signature for predicting tumor recurrence in patients with colorectal cancer. Subsequently, this signature was analytically trained and validated in retrospectively collected independent patient cohorts of fresh-frozen (N = 127, cohort 1) and formalin-fixed paraffin-embedded (FFPE; N = 165, cohort 2 and N = 139, cohort 3) specimens.
Results: We identified an 8-miRNA signature that significantly predicted recurrence-free interval (RFI) in the discovery dataset (P = 0.002) and in two independent publicly available datasets (P = 0.00006 and P = 0.002). The RT-PCR–based validation in independent clinical cohorts revealed that MRC-derived high-risk patients with stage II and III colorectal cancer had significantly poorer RFI [cohort 1: hazard ratio (HR), 3.44 (1.56–7.45), P = 0.001; cohort 2: HR, 6.15 (3.33–11.35), P = 0.001; and cohort 3: HR, 4.23 (2.26–7.92), P = 0.0003]. In multivariate analyses, MRC emerged as an independent predictor of tumor recurrence and achieved superior predictive accuracy over the currently available molecular assays. The RT-PCR–based MRC risk score = (−0.1218 × miR-744) + (−3.7142 × miR-429) + (−2.2051 × miR-362) + (3.0564 × miR-200b) + (2.4997 × miR-191) + (−0.0065 × miR-30c2) + (2.2224 × miR-30b) + (−1.1162 × miR-33a).
Conclusions: This novel MRC is superior to currently used clinicopathologic features, as well as National Comprehensive Cancer Network (NCCN) criteria, and works regardless of adjuvant chemotherapy status in identifying patients with high-risk stage II and III colorectal cancer. This can be readily deployed in clinical practice with FFPE specimens for decision-making pending further model testing and validation. Clin Cancer Res; 24(16); 3867–77. ©2018 AACR.
See related commentary by Rodriguez et al., p. 3787
Current staging and clinicopathologic risk factors are inadequate at identifying high-risk stage II and III colorectal cancers. Here, we performed a genome-wide unbiased screening and identified a clinically applicable miRNA signature that can robustly stratify patients with low- and high-risk stage II and III colorectal cancer independent of adjuvant chemotherapy and clinicopathologic risk factors. The miRNA prognostic classifier was independently validated in multiple patient cohorts consisting of 736 patients with stage II and III colorectal cancer. Although we did not perform a direct comparison, the recurrence prediction values and hazard ratios for our miRNA classifier were superior to other commercial gene expression–based assays. As we developed a “risk prediction model” using our 8-miRNA signature from the formalin-fixed paraffin-embedded (FFPE) specimens, these scores can be readily applied to independent, prospective cohorts to further evaluate the potential of our newly identified miRNA classifier in identifying the appropriate patient population for adjuvant chemotherapy, and thereby improve the survival of patients with colorectal cancer.
Introduction
Colorectal cancer is one of the leading causes of cancer-related mortality worldwide, with an estimated 49,190 deaths recorded in the United States alone in 2016 (1). Survival in patients with colorectal cancer is primarily associated with the tumor stage at diagnosis; 5-year relative survival rates range from 65% for all stages to approximately 93.2% for stage I, 82.5% for stage II, 59.5% for stage III, and 8.1% for stage IV (2).
Postsurgery, 30% of stage II and 50% to 60% of stage III colorectal cancer patients develop a recurrence within 5 years (3). Although there is general agreement that adjuvant chemotherapy in patients with stage III colorectal cancer improves patient survival (4–6), the use of such treatments in stage II cancers remains debatable due to lack of risk stratification for identifying true high-risk patients (7, 8). Current National Comprehensive Cancer Network (NCCN) guidelines recommend adjuvant chemotherapy for patients with high-risk stage II colorectal cancer, where the risk is primarily defined by the clinicopathologic features such as tumor size, number of lymph nodes investigated, degree of differentiation, tumor perforation, bowel obstruction, and lymphovascular invasion (7, 9). However, several studies have highlighted the inadequacy of these pathologic features in identifying such high-risk patients, providing a potential explanation for the lack of clinical benefit from adjuvant therapy in these patients (10–12). Furthermore, a significant proportion of stage III patients suffer from adverse effects of adjuvant chemotherapy (2). Collectively, these data highlight the imperative need to develop molecular markers for identifying true high-risk populations of patients with stage II and III colorectal cancer to facilitate optimal treatment modalities.
With regard to other potential prognostic biomarkers in colorectal cancer, BRAF and KRAS mutations, CpG island methylator phenotype (CIMP) status, and microsatellite instability (MSI) status have been studied extensively. It has been shown that MSI patients demonstrate an inherently better survival, and that MSI patients with stage II colorectal cancer do not benefit from 5-fluorouracil (5FU)–based adjuvant chemotherapy (13–18). A more recent effort in this context, involving a gene expression–based consensus molecular subtyping (CMS), has identified that the CMS4 subtype of colorectal cancer is associated with poor prognosis (19). Although gene expression–based biomarkers may be promising, technical concerns involving specimen preservation and mRNA integrity, particularly in formalin-fixed paraffin-embedded (FFPE) specimens, limit their clinical translation. In contrast, owing to their short length, noncoding RNAs such as miRNAs are emerging as important biomarker candidates by virtue of their ability to resist RNase-mediated degradation and their intact expression in a variety of bodily fluids as well as FFPE tissues.
Previously, we discovered that the miR-200 family is an important driver for the CMS4 subtype in patients with colorectal cancer (20). Building upon this evidence, herein we have performed an unbiased, systematic, and comprehensive genome-wide discovery to identify a novel and robust miRNA-based classifier that can predict tumor recurrence in patients with stage II and III colorectal cancer. By analyzing multiple clinical cohorts that included a total of 736 patients with stage II and III colorectal cancer, we demonstrate that this miRNA recurrence classifier (MRC) has superior predictive power over clinicopathologic risk determinants and currently available commercial assays, and this combined with its robust performance even in FFPE tissues makes it attractive for relatively immediate clinical translation.
Materials and Methods
Patient cohorts
This study included multiple clinical cohorts with a total of 736 patients. These cohorts included patients from the publicly available dataset from The Cancer Genome Atlas (TCGA; N = 158 and 107), the GSE29623 dataset (N = 40), as well as two clinical validation cohorts of 431 patients with stage II and III colorectal cancer who underwent surgery without neoadjuvant chemotherapy. Clinicopathologic parameters of the clinical validation cohorts are provided in Table 1. The first cohort (cohort 1) comprised fresh-frozen tissues from 127 patients who were enrolled at the National Cancer Center Hospital (NCCH), Tokyo, Japan, from 2004 to 2006, and included 28 recurrences with a median follow-up of 67 months. The second cohort included FFPE tissues from 304 patients enrolled at the Tokyo Medical and Dental University Hospital (TMDU), Tokyo, Japan, between 2007 and 2011, and included 82 recurrences with a median follow-up of 47 months. Based upon the year of enrollment, this cohort was subdivided into a training (cohort 2) and a validation (cohort 3) cohort. Random splitting of this cohort into training and validation sets also resulted in similar outcomes (data not shown). Our study was conducted in accordance with the Declaration of Helsinki. Written informed consent was obtained from all the subjects, and the respective institutional review boards approved the study. A reporting recommendations for tumor marker prognostic studies (REMARK; ref. 21) compliance checklist is provided in Supplementary Table S1.
Patient characteristics
| Characteristics | TCGA-HiSeq (in silico discovery, N = 158) | TCGA-GA (in silico validation 1, N = 109) | GSE29623 (in silico validation 2, N = 40) | Cohort 1 (clinical, fresh frozen, N = 127) | Cohort 2 (clinical FFPE training, N = 165) | Cohort 3 (clinical FFPE validation, N = 139) |
|---|---|---|---|---|---|---|
| Age | ||||||
| Median | 68 | 69 | 69 | NA | 69 | 69 |
| Gender | ||||||
| Male | 88 | 51 | 22 | 73 | 90 | 86 |
| Female | 70 | 58 | 18 | 54 | 75 | 53 |
| Localization | ||||||
| Left | NA | NA | NA | 107 | 112 | 94 |
| Right | NA | NA | NA | 20 | 53 | 45 |
| Grade | ||||||
| Well/moderate | NA | NA | 32 | 122 | 152 | 126 |
| Poor | NA | NA | 8 | 5 | 13 | 13 |
| Stage | ||||||
| II | 88 | 65 | 22 | 53 | 92 | 70 |
| III | 70 | 44 | 18 | 74 | 73 | 69 |
| Relapse | ||||||
| Yes | 34 | 16 | 7 | 28 | 42 | 40 |
| No | 124 | 93 | 33 | 99 | 123 | 99 |
| Lymphatic invasion | ||||||
| Yes | NA | NA | NA | 81 | 91 | 72 |
| No | NA | NA | NA | 46 | 74 | 67 |
| Venous invasion | ||||||
| Yes | NA | NA | NA | 45 | 147 | 122 |
| No | NA | NA | NA | 82 | 18 | 17 |
| Lymph nodes investigated | ||||||
| ≥12 | NA | NA | NA | 110 | 133 | 118 |
| ≤12 | NA | NA | NA | 17 | 32 | 21 |
| Adjuvant therapy | ||||||
| Yes | NA | NA | 23 | 56 | 60 | 52 |
| No | NA | NA | 17 | 71 | 105 | 87 |
| MSI | ||||||
| Yes | 59 | 32 | NA | 5 | 15 | 10 |
| No | 99 | 76 | NA | 111 | 147 | 125 |
| NA | | | 40 | 11 | 3 | 4 |
| CEA | ||||||
| ≤5 | NA | NA | NA | 84 | 96 | 86 |
| ≥5 | NA | NA | NA | 43 | 69 | 53 |
| Non-CMS4 | 100 | 80 | NA | NA | NA | NA |
| CMS4 | 50 | 24 | NA | NA | NA | NA |
| NA | 8 | 5 | 40 | 127 | 165 | 139 |
Abbreviations: CEA, carcinoembryonic antigen; NA, not available.
Identification of the miRNA signature from genome-wide small RNA sequencing data
Two public datasets (three cohorts) were analyzed in the discovery phase: colorectal cancer miRNA sequencing data from TCGA (22) and GSE29623 (23) from Gene Expression Omnibus (GEO). The TCGA colorectal cancer dataset includes 265 stage II and III patients with corresponding miRNA sequencing data derived from two different platforms: Illumina HiSeq (TCGA-HiSeq, N = 158) and Genome Analyzer (TCGA-GA, N = 109). More specifically, level 3 miRNA expression data were downloaded from the Firehose Broad GDAC portal (http://gdac.broadinstitute.org/, accessed on November 1, 2015). The miRNA expression levels, measured as reads per million miRNA mapped (RPM), were first log2 transformed. A total of 680 miRNAs common to the two platforms were retained for subsequent analysis. Differential miRNA expression analysis was then performed between patients with and without recurrence within 3 years using the Wilcoxon signed-rank test. For in silico validation of the identified miRNAs, we analyzed one additional independent cohort (GSE29623). The GSE29623 set includes expression levels of 664 miRNAs for 65 tumor tissue samples based on the NIH TaqMan human microRNA array v.2 microarray platform, of which 40 samples were from stage II and III patients. The miRNA expression profiles were normalized using the robust multiarray average (RMA) algorithm in R (23). We downloaded preprocessed data from GEO using the Bioconductor package “GEOquery.” Using multivariate Cox regression analysis, we calculated risk scores and assessed the prognostic performance of the miRNA signature by survival analysis, using the median value of the predicted risk scores in each dataset as the cutoff.
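The median-based dichotomization described for the in silico datasets can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the patient identifiers, and the rule that scores strictly above the median are labeled high risk are assumptions, because the text does not state how scores equal to the median were handled.

```python
from statistics import median


def split_by_median(risk_scores):
    """Label each patient 'high' or 'low' risk around the cohort median.

    risk_scores: dict mapping a patient ID (hypothetical) to a Cox-derived
    risk score. Assumption: scores strictly above the median are 'high'.
    """
    cutoff = median(risk_scores.values())
    return {
        patient: ("high" if score > cutoff else "low")
        for patient, score in risk_scores.items()
    }


# Toy scores for illustration only; these are not study data.
toy_scores = {"pt1": 0.1, "pt2": 0.5, "pt3": 0.9, "pt4": 1.3}
toy_groups = split_by_median(toy_scores)
```

Because the cutoff is recomputed per dataset, each cohort is split into roughly equal halves regardless of the absolute scale of its risk scores.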
Nucleic acid isolation and miRNA expression analysis
Total RNA from the fresh-frozen tissues was isolated using the RNeasy mini kit (QIAGEN), and both RNA and DNA from the FFPE cohort were isolated using the AllPrep FFPE kit (QIAGEN). The miRNA expression analysis was performed using the QuantStudio 7 flex real-time PCR system (Applied Biosystems). All miRNA TaqMan probes were purchased from Thermo Fisher Scientific. The qRT-PCR assays were conducted using the TaqMan microRNA reverse transcription kit (Applied Biosystems) and the SensiFAST probe Lo-ROX kit. The relative expression of miRNAs was determined by the 2^(−ΔCt) method using snRNA U6 as a normalizer, as described previously (24), and we observed no difference in snRNA U6 between recurrent and nonrecurrent patients.
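As a small worked example of the relative-expression calculation with a U6 normalizer, the sketch below assumes the conventional definition ΔCt = Ct(target miRNA) − Ct(U6). The function name and the Ct values are illustrative and do not come from the study.

```python
def relative_expression(ct_target, ct_reference):
    """Relative expression by the 2^(-dCt) method.

    Assumes dCt = Ct(target miRNA) - Ct(reference, here snRNA U6).
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Illustrative Ct values: a target that crosses threshold one cycle
# later than U6 is half as abundant relative to U6.
example_value = relative_expression(ct_target=26.0, ct_reference=25.0)
```

A lower Ct means earlier amplification and therefore higher abundance, which is why the exponent carries a negative sign.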
MSI analysis
The MSI analysis was performed using the five mononucleotide repeat microsatellite markers (BAT-25, BAT-26, NR-21, NR-24, and NR-27) in a pentaplex PCR system, as described previously (25–27).
Statistical analysis
Statistical analyses were performed using IBM SPSS version 23, GraphPad Prism version 6.0, and R 3.2.4. Statistical differences between miRNAs and various clinicopathologic factors were determined by the χ² test. The Benjamini–Hochberg method was used to correct for multiple hypothesis testing wherever applicable. All statistical tests were two sided, and a P value of less than 0.05 was considered significant. Recurrence-free interval (RFI) was defined as the time from the day of surgery to recurrence or the end of follow-up and was analyzed by the log-rank test. We performed receiver operating characteristic (ROC) curve analysis to evaluate the predictive power of the MRC. The expression values of all 8 miRNAs derived from RT-PCR were used to build the MRC using Cox proportional hazard regression. The risk scores derived from the 8-miRNA MRC Cox model were used to plot the ROC curves and calculate the area under the curve (AUC). The risk scores were calculated using the formula derived from the Cox model as follows: The RT-PCR–based MRC risk score = (−0.1218 × miR-744) + (−3.7142 × miR-429) + (−2.2051 × miR-362) + (3.0564 × miR-200b) + (2.4997 × miR-191) + (−0.0065 × miR-30c2) + (2.2224 × miR-30b) + (−1.1162 × miR-33a). To plot the Kaplan–Meier curves, we dichotomized the patients into low or high risk, based on X-tile–derived cutoff values (X-tile software 3.6.1, Yale School of Medicine). Additionally, we fitted univariate and multivariate Cox proportional hazard regression models using clinicopathologic variables and MRC to estimate hazard ratios (HR). Only the variables that were significant in the univariate model were included in the multivariate analysis.
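To make the AUC concrete, the following sketch computes it directly from risk scores and recurrence labels. It uses the standard rank-based identity: the AUC equals the probability that a randomly chosen patient with recurrence has a higher score than a randomly chosen patient without, with ties counted as one half. This is a generic illustration and not the software used in the study; the function name and toy inputs are assumptions.

```python
def roc_auc(scores, recurred):
    """Area under the ROC curve via pairwise comparison.

    scores:   risk scores, one per patient
    recurred: booleans, True if the patient had a recurrence
    """
    positives = [s for s, r in zip(scores, recurred) if r]
    negatives = [s for s, r in zip(scores, recurred) if not r]
    if not positives or not negatives:
        raise ValueError("need at least one patient in each outcome group")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5  # ties contribute half a concordant pair
    return concordant / (len(positives) * len(negatives))


# Toy data for illustration only: perfectly separated groups give AUC 1.0.
toy_auc = roc_auc([0.9, 0.8, 0.2, 0.1], [True, True, False, False])
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, which provides the reference point for the values reported in the Results.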
Results
Discovery and validation of an 8-gene miRNA classifier for predicting recurrence in patients with stage II and III colorectal cancer
Based upon the study design illustrated in Fig. 1, we performed a genome-wide, unbiased biomarker discovery to identify an miRNA signature that allowed stratification of patients with low- and high-risk stage II and III colorectal cancer. As the TCGA dataset consisted of miRNA sequencing profiles from two different platforms, we used one of these for biomarker discovery (TCGA-HiSeq, N = 158) and the other for validation (TCGA-GA, N = 109). In the TCGA-HiSeq discovery cohort, we compared miRNA expression profiles between high- and low-risk groups who had at least 3 years of follow-up and identified 25 targets with an absolute log2 fold change difference of 0.2, a P value less than 0.05 (Wilcoxon signed-rank test), and an average expression level of greater than three transcripts per million. Based on multivariate Cox regression analysis using all 25 miRNAs, eight candidates with top statistical significance (P value < 0.2) were further selected: hsa-mir-191, hsa-mir-200b, hsa-mir-30b, hsa-mir-30c2, hsa-mir-33a, hsa-mir-362, hsa-mir-429, and hsa-mir-744 (Fig. 2A, Supplementary Table S2).
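The three discovery filters described in this paragraph can be expressed as a simple screen. The sketch below is illustrative only: the record type, field names, and toy values are hypothetical, and reading the fold-change criterion as a minimum absolute log2 fold change of 0.2 is an assumption about the wording in the text.

```python
from typing import NamedTuple


class MirnaStats(NamedTuple):
    """Per-miRNA summary statistics (hypothetical container)."""
    name: str
    log2_fold_change: float   # recurrence vs. no recurrence
    p_value: float            # from the differential expression test
    mean_expression: float    # average expression across samples


def select_candidates(stats, min_abs_log2fc=0.2, max_p=0.05,
                      min_mean_expression=3.0):
    """Return names of miRNAs passing all three discovery filters.

    Assumption: the fold-change threshold is treated as a lower bound
    on the absolute log2 fold change.
    """
    return [
        s.name
        for s in stats
        if abs(s.log2_fold_change) >= min_abs_log2fc
        and s.p_value < max_p
        and s.mean_expression > min_mean_expression
    ]


# Toy records with made-up names and values, not study data.
toy_stats = [
    MirnaStats("mir-A", 0.35, 0.01, 12.0),   # passes all filters
    MirnaStats("mir-B", 0.10, 0.01, 12.0),   # fold change too small
    MirnaStats("mir-C", -0.40, 0.20, 12.0),  # P value too large
    MirnaStats("mir-D", -0.40, 0.01, 1.5),   # expression too low
    MirnaStats("mir-E", -0.25, 0.04, 5.0),   # passes (downregulated)
]
toy_selected = select_candidates(toy_stats)
```

Using the absolute value keeps both up- and downregulated miRNAs, consistent with the final signature carrying both positive and negative Cox coefficients.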
An miRNA classifier volcano plot and Kaplan–Meier curves predicting RFI in the TCGA discovery, TCGA validation, and GSE validation cohorts. A, Volcano plot showing the significant and differentially regulated miRNAs selected in the TCGA discovery cohort. Selected miRNAs are depicted in the figure. The Kaplan–Meier survival plots for RFI stratified by MRC scores in the (B) TCGA discovery cohort (N = 158), (C) TCGA validation cohort (N = 109), and (D) GSE29623 validation cohort (N = 40). E and F, The Kaplan–Meier plots illustrating that both CMS4 and non-CMS4 patients with high miRNA risk scores exhibited shorter RFI in the TCGA discovery and validation cohorts, respectively.
In Kaplan–Meier and log-rank analyses, the MRC significantly predicted RFI in all three cohorts: the TCGA-HiSeq cohort [HR, 2.72; 95% confidence interval (CI), 1.48–5.00; P = 0.002; Fig. 2B], the TCGA-GA cohort (HR, 2.71; 95% CI, 1.64–4.51; P = 0.00006; Fig. 2C), and the GSE29623 cohort (HR, 2.73; 95% CI, 1.42–5.22; P = 0.002; Fig. 2D). The AUC values for predicting tumor recurrence in the two validation cohorts were 0.79 (95% CI, 0.67–0.89) and 0.88 (95% CI, 0.78–0.99), respectively, supporting the validity of the miRNA classifier.
miRNA classifier predicted cancer recurrence independent of gene expression–based CMS status of patients with colorectal cancer
We next investigated whether our MRC could predict recurrence irrespective of the CMS subtype in patients with colorectal cancer. The CMS labels of the TCGA cohort were obtained from the Colorectal Cancer Subtyping Consortium (CRCSC) repository (19). The log-rank analysis demonstrated that regardless of the gene expression subtype, our MRC was able to significantly predict RFI in both TCGA cohorts (Fig. 2E and F). For recurrence prediction, our MRC significantly outperformed the CMS classifier in both TCGA cohorts (P = 0.00975 and 0.0000187, respectively, DeLong test; Supplementary Fig. S1).
Validation of the miRNA classifier in fresh-frozen tissues from patients with stage II and III colorectal cancer
Supplementary Fig. S2 illustrates the Consolidated Standards of Reporting Trials (CONSORT) diagram for all clinical validation cohorts.
To determine whether our MRC derived from the in silico datasets was robust, we first evaluated its performance in cohort 1, which comprised 127 fresh-frozen tissues from patients with stage II and III colorectal cancer. We measured expression levels of all 8 miRNAs in colorectal cancer tissues and used Cox proportional hazard models to build a prognostic classifier. As depicted in Fig. 3A, the 5-year RFI dropped significantly from 87% in MRC-derived low-risk patients to 63% in high-risk patients (P < 0.001), with an HR of 3.44 (1.56–7.45). The HR in stage II patients was 7.54 (1.71–33.2; Fig. 3B), whereas in stage III, it was 4.23 (1.77–10.11; Fig. 3C). Furthermore, the MRC achieved an AUC of 0.70 in both stages combined, with superior recurrence prediction in patients with stage II (AUC, 0.89) versus stage III (AUC, 0.72) colorectal cancer, which is clinically encouraging.
Stage-wise survival curves predicting RFI in fresh-frozen specimens in clinical cohort 1. The Kaplan–Meier survival plots for RFI stratified by MRC scores in (A) combined stage II and III colorectal cancer patients, (B) stage II patients, and (C) stage III colorectal cancer patients and (D) HRs of the miRNA classifier, NCCN risk classification, and other clinicopathologic variables presented for patients with stage II colorectal cancer. Adj chemo, adjuvant chemotherapy; LN, lymph node.
In the univariate analysis, among the three significant variables, MRC emerged as a stronger predictor of recurrence than tumor stage and lymphatic invasion (Table 2). Moreover, in the multivariate analysis, MRC remained the only significant predictor of recurrence (HR, 2.54; 95% CI, 1.29–4.99). From a clinical viewpoint, we were enthused to observe that our MRC-based risk stratification significantly predicted recurrence in patients with stage II colorectal cancer, whereas the NCCN criteria did not (Fig. 3D).
Univariate and multivariate analysis of MRC and clinicopathologic factors in all three clinical cohorts
| Variable | Cohort 1 univariate P | Cohort 1 univariate HR (95% CI) | Cohort 1 multivariate P | Cohort 1 multivariate HR (95% CI) | Cohort 2 univariate P | Cohort 2 univariate HR (95% CI) | Cohort 2 multivariate P | Cohort 2 multivariate HR (95% CI) | Cohort 3 univariate P | Cohort 3 univariate HR (95% CI) | Cohort 3 multivariate P | Cohort 3 multivariate HR (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age (<65 vs. >65) | NA | NA | NA | NA | 0.40 | 1.32 (0.69–2.51) | | | 0.35 | 0.74 (0.39–1.4) | | |
| Gender (M vs. F) | 0.06 | 2.32 (0.99–5.45) | | | 0.61 | 1.17 (0.64–2.16) | | | 0.53 | 0.82 (0.44–1.53) | | |
| Location | 0.19 | 2.62 (0.62–11.06) | | | 0.64 | 0.85 (0.43–1.69) | | | 0.11 | 0.49 (0.23–1.06) | | |
| Differentiation | 0.89 | 0.87 (0.12–6.4) | | | 0.80 | 0.86 (0.26–2.77) | | | 0.87 | 1.09 (0.39–3.07) | | |
| T4 vs. T2 + 3 | 0.06 | 2.10 (0.97–4.53) | | | 0.08 | 1.74 (0.94–3.25) | | | 0.19 | 1.61 (0.79–3.3) | | |
| Tumor size | 0.84 | 1 (0.98–1.02) | | | 0.72 | 1 (0.99–1.02) | | | 0.46 | 1 (0.99–1.01) | | |
| Venous invasion | 0.19 | 1.78 (0.76–4.18) | | | 0.37 | 1.71 (0.53–5.53) | | | 0.16 | 2.79 (0.67–11.56) | | |
| Lymphatic invasion | 0.01 | 2.58 (1.22–5.46) | 0.14 | 1.82 (0.82–4.07) | 0.68 | 1.14 (0.62–2.1) | | | 0.01 | 2.43 (1.24–4.79) | 0.12 | 1.83 (0.84–3.99) |
| Stage (III vs. II) | 0.03 | 2.56 (1.09–6.02) | 0.23 | 1.75 (0.7–4.39) | 0.02 | 2.16 (1.16–4.03) | 0.03 | 2.01 (1.07–3.74) | 0.06 | 1.85 (0.97–3.51) | 0.44 | 1.33 (0.64–2.78) |
| CEA (low vs. high) | 0.90 | 0.95 (0.43–2.1) | | | 0.10 | 1.67 (0.91–3.06) | | | 0.60 | 1.19 (0.63–2.23) | | |
| Lymph node number | 0.18 | 0.54 (0.22–1.33) | | | 0.64 | 1.22 (0.54–2.74) | | | 0.13 | 0.55 (0.25–1.2) | | |
| Adjuvant chemotherapy | 0.35 | 1.43 (0.68–2.99) | | | 0.61 | 0.85 (0.45–1.6) | | | 0.22 | 1.48 (0.79–2.75) | | |
| MSI | 0.81 | 0.78 (0.10–5.79) | | | 0.16 | 0.04 (0–3.65) | | | 0.53 | 0.63 (0.15–2.62) | | |
| miRNA classifier | 0.001 | 3.44 (1.56–7.45) | 0.01 | 2.54 (1.29–4.99) | 0.0001 | 6.15 (3.33–11.35) | 0.00004 | 5.11 (2.26–11.53) | 0.0003 | 4.23 (2.26–7.92) | 0.00001 | 3.94 (1.64–9.43) |
NOTE: Boldface represents significant variables.
Abbreviations: CEA, carcinoembryonic antigen; F, female; M, male; NA, not available.
Training and validation of the miRNA classifier in independent FFPE cohorts to evaluate its translational potential
To evaluate the translational potential of our MRC in identifying high-risk patients, we deliberately examined its performance in FFPE tissues, which are routinely available in clinical settings. To this end, we divided our large FFPE cohort into a training (cohort 2) and a validation set (cohort 3). Using the Cox proportional hazards model, we initially trained a classifier on the 8-miRNA signature and subsequently applied the coefficients derived from this model to the validation cohort.
The risk score for each patient in the training cohort was calculated as follows: MRC risk score = (−0.1218 × miR-744) + (−3.7142 × miR-429) + (−2.2051 × miR-362) + (3.0564 × miR-200b) + (2.4997 × miR-191) + (−0.0065 × miR-30c2) + (2.2224 × miR-30b) + (−1.1162 × miR-33a). Based upon the Cox model–derived risk scores (Fig. 4A and G), patients from both the training and validation cohorts were stratified into MRC low- and high-risk groups, using a cutoff threshold of −0.04. When we assessed the distribution of risk scores and recurrence status, we observed that the high-risk patients had a significantly shorter RFI than low-risk patients in both cohorts, with HRs of 6.15 (3.33–11.35) and 4.23 (2.26–7.92), respectively. Likewise, the 5-year recurrence-free probability in high-risk stage II and III patients was 56% and 57%, whereas it was 91% and 88% in low-risk patients in the training and validation cohorts, respectively (Fig. 4B and H). To verify the stage-wise tumor recurrence risk, we performed the log-rank analysis separately for each stage. In line with the results from cohort 1, in addition to excellent risk stratification for both stages, HRs for stage II patients were significantly higher in both cohorts (Fig. 4C, D and I, J).
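The published formula and cutoff translate directly into a few lines of code. The sketch below uses the coefficients and the −0.04 threshold exactly as printed above; everything else is an assumption made for illustration, including the function names, the requirement that inputs be normalized RT-PCR expression values on the same scale used to fit the Cox model, and the convention that scores above the cutoff are labeled high risk.

```python
# Cox coefficients as printed in the text for the RT-PCR-based MRC.
MRC_COEFFICIENTS = {
    "miR-744": -0.1218,
    "miR-429": -3.7142,
    "miR-362": -2.2051,
    "miR-200b": 3.0564,
    "miR-191": 2.4997,
    "miR-30c2": -0.0065,
    "miR-30b": 2.2224,
    "miR-33a": -1.1162,
}
MRC_CUTOFF = -0.04  # threshold reported for the FFPE cohorts


def mrc_risk_score(expression):
    """Weighted sum of the eight miRNA expression values.

    expression: dict mapping miRNA name to its normalized expression
    (assumed to be on the scale used when the model was fitted).
    """
    missing = sorted(set(MRC_COEFFICIENTS) - set(expression))
    if missing:
        raise ValueError(f"missing miRNA values: {missing}")
    return sum(coef * expression[name]
               for name, coef in MRC_COEFFICIENTS.items())


def mrc_risk_group(expression, cutoff=MRC_CUTOFF):
    """Assumption: a score above the cutoff is labeled 'high' risk."""
    return "high" if mrc_risk_score(expression) > cutoff else "low"


# Toy input (not patient data): every miRNA set to 1.0, so the score
# is simply the sum of the eight coefficients.
toy_expression = {name: 1.0 for name in MRC_COEFFICIENTS}
toy_score = mrc_risk_score(toy_expression)
toy_group = mrc_risk_group(toy_expression)
```

Because a Cox linear predictor is only meaningful on the scale of its training data, applying these coefficients to expression values normalized differently would call for recalibrating the cutoff.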
Training and validation of the MRC in the FFPE specimens in clinical cohorts 2 and 3. A and G, MRC risk score violin plots from Cox regression model of the 8-miRNA signature in the training and validation cohorts, respectively. B, C, D and H, I, J, Stage-wise Kaplan–Meier plots for the RFI in the training (N = 165) and validation (N = 139) cohorts—stratified based on the MRC risk scores and (E and K) ROC curves achieved with MRC risk scores as well as its combination with the tumor stage and lymphatic invasion in the training and validation cohorts, respectively. F and L, HRs of the miRNA classifier, NCCN risk classification, and other clinicopathologic variables presented for stage II colorectal cancer from both cohorts. Adj chemo, adjuvant chemotherapy; LN, lymph node; Ly, lymphatic invasion.
We next assessed the accuracy of our miRNA classifier in recurrence prediction by performing ROC analysis. As illustrated in Fig. 4E and K, our MRC achieved an AUC of 0.71 (0.63–0.80) and 0.77 (0.68–0.85) in predicting recurrence in stage II and III patients in the training and validation cohorts, respectively. Univariate analysis revealed that, together with MRC, tumor stage and lymphatic invasion were significantly associated with recurrence in patients with stage II and III colorectal cancer. However, in multivariate analysis, MRC emerged as the only significant predictor of tumor recurrence in both cohorts (Table 2). In view of these findings, we combined MRC risk scores with tumor stage and lymphatic invasion, which further improved the AUC for recurrence prediction in both stages of colorectal cancer patients to 0.76 (0.68–0.84) and 0.80 (0.72–0.87) in the training and validation cohorts, respectively (Fig. 4E and K). As was the case in cohort 1, compared with NCCN criteria and other clinicopathologic variables that failed, our MRC was successful in identifying patients with true, high-risk, stage II colorectal cancer with excellent accuracy, and it was superior to the predictive accuracies of the currently available ColoPrint and Oncotype DX assays, especially in FFPE specimens (Fig. 4F and L). We also calculated Spearman rho correlations between MRC and clinicopathologic risk factors for all the clinical cohorts; the MRC was highly correlated with recurrence (Supplementary Table S3).
The MRC predicts tumor recurrence independent of adjuvant chemotherapy status in patients with colorectal cancer
A large subset of patients in our clinical validation cohorts were treated with adjuvant chemotherapy, which could potentially affect tumor recurrence prediction. To assess any such potential confounding effects, we analyzed the associations between MRC-derived risk subgroups and tumor recurrence separately in stage II and III patients who did and did not receive 5FU-based chemotherapy. In untreated patients, our MRC was still significantly associated with poor RFI in patients with stage II and III colorectal cancer (Supplementary Fig. S3A–S3D). In contrast, in patients who received 5FU, although we did not see any significant associations in stage II patients (probably due to small sample size), we noted that our MRC was robust in identifying high-risk stage III patients (Supplementary Fig. S3E–S3H), highlighting its recurrence prediction potential regardless of the adjuvant chemotherapy status in patients with colorectal cancer.
MRC low-risk microsatellite stable patients benefited from 5FU adjuvant therapy alone, whereas the high-risk group did not
In our FFPE cohorts (cohorts 2 and 3), only fluoropyrimidine was administered for adjuvant chemotherapy. To estimate whether MRC could predict the benefit of fluoropyrimidine adjuvant chemotherapy in stage III patients, we investigated the association between MRC risk and RFI among patients who did and did not receive fluoropyrimidine adjuvant therapy. Although there was no significant difference when MSI patients were included in the analysis (Supplementary Fig. S4A–S4C and S4E–S4G), the analysis in microsatellite-stable patients revealed that treatment with fluoropyrimidine adjuvant therapy was associated with a significant gain in 5-year recurrence-free probability [5-year RFI of 87% with chemotherapy vs. 67% with no chemotherapy, HR, 3.57 (0.96–13.25)] in the stage III MRC low-risk patient population (Supplementary Fig. S4H). On the other hand, there was no significant difference between patients who did and did not receive fluoropyrimidine adjuvant therapy in the stage III MRC high-risk patient population (Supplementary Fig. S4D).
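The 5-year recurrence-free probabilities quoted above are the kind of quantity read off a Kaplan–Meier curve. As a rough illustration of how such an estimate is formed, the sketch below implements the product-limit estimator in plain Python. The helper names `kaplan_meier` and `survival_at`, and the follow-up times, are hypothetical and chosen only for illustration; they are not the study's data or analysis code.

```python
def kaplan_meier(times, events):
    """Product-limit estimate. `events` uses 1 = recurrence, 0 = censored.
    Returns a list of (time, recurrence-free probability) at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # recurrences observed exactly at time t
        n_leaving = 0  # everyone (events and censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


def survival_at(curve, t):
    """Recurrence-free probability at time t (step function, 1.0 before the first event)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Made-up follow-up times in years; 1 = recurrence, 0 = censored.
toy_times = [1, 2, 3, 4, 6, 7]
toy_events = [1, 0, 1, 0, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
print(f"toy 5-year recurrence-free probability = {survival_at(toy_curve, 5):.3f}")
```

Comparing two such curves (for example, treated vs. untreated patients within one MRC risk group) would additionally require a log-rank test or a Cox model, which this sketch does not include.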
Discussion
In our quest to develop a robust colorectal cancer prognostic signature, we have successfully developed an 8-miRNA signature that achieved excellent predictive values in tumor recurrence, in patients with both stage II and III colorectal cancer, which were validated in two independent clinical cohorts. Furthermore, when compared with the other clinicopathologic risk factors, our miRNA classifier remained the strongest prognostic indicator regardless of the adjuvant chemotherapy status. To further highlight the clinical significance of our findings, although the NCCN criteria failed to identify patients with high-risk stage II colorectal cancer, our MRC significantly stratified patients from all clinical cohorts into high- and low-risk subgroups rather robustly.
Previously, a 6-miRNA–based classifier was reported to predict tumor recurrence in patients with stage II colorectal cancer (28); however, in this study, our miRNA classifier performed significantly better and illustrated its ability to predict recurrence not only in patients with stage II colorectal cancer but also in patients with stage III colorectal cancer. A recent gene expression–based CMS (19) identified a CMS4 mesenchymal subtype of colorectal cancer patients associated with poor prognosis. However, an eventual clinical translation of such a gene expression panel–based approach is challenging, primarily for two reasons: (i) the number of genes involved in CMS4 subtyping is quite large, and (ii) any gene expression–based assay will require high-quality, fresh-frozen, RNA-preserved tissues, which is not always practical in routine clinical practice. Based upon our recent findings that the miR-200 family plays a central role in orchestrating the CMS4 subtype (20), and in view of the relative stability of miRNAs in a variety of biological fluids and FFPE tissues, these short noncoding RNAs are attractive targets for biomarker development in patients with colorectal cancer.
Although we did not perform a direct comparison, the recurrence prediction values for our MRC were superior to those reported for the gene expression–based signatures offered by the ColoPrint and Oncotype DX assays (29). It would also be interesting to validate our markers, or to stratify patients, based on CDX2 as well as recently published immune scores in the future (30, 31). An ideal prognostic classifier for colorectal cancer risk prediction should be robust, reproducible, and, most importantly, feasible in FFPE materials, which would eliminate the need to plan and invest in methodologies to collect and preserve fresh-frozen tumor specimens. Our miRNA classifier successfully overcomes these barriers, as evidenced by its superior performance and independent validation in a large cohort of FFPE specimens. The availability of ideal prognostic and predictive biomarkers is essential for achieving the clinical goals of refining therapeutic decisions and thereby improving the survival and quality of life of patients with colorectal cancer. Although we did not have access to blood specimens in this study, we feel encouraged that, given the stability and relative abundance of miRNAs in circulation, our miRNA signature may eventually be translated into a blood-based, predictive, surveillance assay.
A recent study published by Cantini and colleagues (32) reported miRNAs differentially expressed across gene expression–based colorectal cancer subtypes. They showed that miRNAs 200b, 33a, 362, and 429, which are present in our MRC, are associated with the poor-prognosis CMS4 subtype of colorectal cancer. Furthermore, we and others have shown previously that miR-200 and miR-429 are associated with epithelial–mesenchymal transition (EMT) and stemness (20, 33, 34). The role of miRNAs 362, 33a, 30c2, 30b, 744, and 191 in cancer progression, EMT, and chemotherapy resistance has been reported earlier (35–41). This further underscores the importance of the miRNAs that we found, using an unbiased and systematic approach, to be associated with poor prognosis in colorectal cancer.
With regard to potential limitations, our current study is retrospective in nature, and our results must be validated in future, prospective, multicenter clinical trials.
In conclusion, we provide novel evidence that our MRC can effectively stratify patients with stage II and III colorectal cancer into high- and low-risk groups based upon clinical outcomes, thereby offering significantly improved prognostic biomarker potential compared with the currently used clinicopathologic risk factors. Notably, our study has several strengths related to the study design and analytical methods. The miRNA classifier was validated in independent in silico datasets as well as two independent population-based clinical cohorts. As we developed a “risk prediction model” using our 8-miRNA signature, the scores can be readily applied to independent, future prospective cohorts. Although our assay also demonstrated effectiveness in FFPE tissues, we noted that the expression of three of the miRNAs was discrepant (not significant) in our validation cohort compared with the TCGA dataset used in the initial discovery. This effect may be due to the following two reasons: (i) the biological differences between fresh-frozen and FFPE tissues and (ii) the fairly common issue of “collinearity” in biomarker studies, that is, some of these miRNAs may have correlated expression levels, which might confound each other in the linear regression model and would explain the observed differences in model coefficients between the two cohorts. Nonetheless, pending further optimization and validation in future studies, such an miRNA classifier potentially offers tremendous clinical value in directing personalized treatment regimens and clinical management of patients with stage II and III colorectal cancer.
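To make concrete what applying the risk prediction model to a new cohort involves, the sketch below evaluates the RT-PCR–based MRC risk score as the weighted sum given in the Abstract. The coefficients are taken from that formula; everything else is an assumption made for illustration: the function name `mrc_risk_score`, the dictionary keys, the example expression values, and the `cutoff` argument are hypothetical, and the normalization of expression values and the high/low-risk threshold actually used in the study are not specified in this section.

```python
# Coefficients of the RT-PCR-based MRC risk score, as stated in the Abstract.
MRC_COEFFICIENTS = {
    "miR-744": -0.1218,
    "miR-429": -3.7142,
    "miR-362": -2.2051,
    "miR-200b": 3.0564,
    "miR-191": 2.4997,
    "miR-30c2": -0.0065,
    "miR-30b": 2.2224,
    "miR-33a": -1.1162,
}


def mrc_risk_score(expression):
    """Weighted sum of the eight miRNA expression values.
    `expression` maps each miRNA name to its (already normalized) expression;
    a missing miRNA raises KeyError rather than being silently treated as zero."""
    return sum(coef * expression[name] for name, coef in MRC_COEFFICIENTS.items())


def classify(score, cutoff):
    """Assign a risk group. The cutoff is a hypothetical parameter here;
    the study's own threshold is not given in this section."""
    return "high-risk" if score > cutoff else "low-risk"


# Made-up expression values for a single hypothetical patient.
example_patient = {name: 1.0 for name in MRC_COEFFICIENTS}
example_score = mrc_risk_score(example_patient)
print(f"example score = {example_score:.4f}, group = {classify(example_score, cutoff=0.0)}")
```

Because the score is linear, correlated miRNAs can trade weight against one another when the model is refit, which is one way to picture the collinearity issue raised above.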
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: R. Kandimalla, F. Gao, X. Wang, A. Goel
Development of methodology: R. Kandimalla, F. Gao, X. Wang, A. Goel
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): R. Kandimalla, F. Gao, T. Matsuyama, T. Ishikawa, N. Takahashi, Y. Yamada, C.R. Becerra, S. Kopetz
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): R. Kandimalla, F. Gao, S. Kopetz, X. Wang, A. Goel
Writing, review, and/or revision of the manuscript: R. Kandimalla, F. Gao, Y. Yamada, C.R. Becerra, S. Kopetz, X. Wang, A. Goel
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): T. Ishikawa, A. Goel
Study supervision: H. Uetake, X. Wang
Acknowledgments
This work was supported by the CA72851, CA181572, CA184792, CA187956, and CA202797 grants from the National Cancer Institute, NIH; RP140784 from the Cancer Prevention Research Institute of Texas; grants from the Baylor Foundation and Baylor Scott & White Research Institute awarded to A. Goel; and a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 21101115); as well as a grant from The Science Technology and Innovation Committee of Shenzhen (JCYJ20170307091256048) awarded to X. Wang.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.



