The differential gene expression patterns between normal colonic fibroblasts (NCF), carcinoma-associated fibroblasts from primary tumors (CAF-PT), and CAFs from hepatic metastasis (CAF-LM) are hypothesized to be useful for predicting relapse in primary tumors. A transcriptomic profile of NCF (n = 9), CAF-PT (n = 14), and CAF-LM (n = 11) was derived. Prediction Analysis of Microarrays (PAM) was used to obtain molecular details for each fibroblast class, and differentially expressed transcripts were used to classify patients according to recurrence status. A number of transcripts (n = 277) were common to all three types of fibroblasts and whose expression level was sequentially deregulated according to the transition: NCF→CAF-PT→CAF-LM. Importantly, the gene signature was able to accurately classify patients with primary tumors according to their prognosis. This capacity was exploited to obtain a refined 19-gene classifier that predicted recurrence with high accuracy in two independent datasets of patients with colorectal cancer and correlates with fibroblast migratory potential. The prognostic power of this genomic signature is strong evidence of the link between the tumor-stroma microenvironment and cancer progression. Furthermore, the 19-gene classifier was able to identify low-risk patients very accurately, which is of particular importance for stage II patients, who would benefit from the omission of chemotherapy, especially T4N0 patients, who are clinically classified as being at high risk.

Implications: A defined stromal gene expression signature predicts relapse in patients with colorectal cancer. Mol Cancer Res; 12(9); 1254–66. ©2014 AACR.

The huge outcome heterogeneity among patients with cancer, which hinders their clinical management, cannot be explained solely by the genetic background of the cancer cells. Some of this variation is due to intratumoral heterogeneity, whereby tumors contain multiple cell populations, each with distinct characteristics of karyotype, receptors, antigenicity, etc. The relative abundance of each cell type varies considerably between tumors and even between parts of the same tumor (1). However, the predominant cell type in desmoplastic tumors is the carcinoma-associated fibroblast (CAF; ref. 2).

To date, several genetic classifiers of colorectal cancer have been identified (3–6) but none of these is of relevance in relation to the stroma. In this article, we provide information about the importance of the transcriptomic status of fibroblasts to colorectal cancer prognosis.

Fibroblasts are ubiquitous mesenchymal cells with different vital functions during development, wound healing, and disease. Depending on their anatomic location, fibroblasts should be considered as different cell types as they have particular gene expression programs and different features according to their tissue specialization (7). However, a stereotyped gene expression program in response to serum was identified in fibroblasts from 10 different anatomic sites, revealing that in response to a wound many of the normal differentiated particularities of the fibroblasts in the wounded tissue are temporarily set aside in favor of a common response (8). In addition, this common serum response gene expression program was identified in many human tumors and was a powerful predictor of metastasis in different carcinomas. The transcriptomic program of serum response genes was used as a prototype of the fibroblasts' activation pattern, and the correlation of this expression in fibroblasts with a centroid of the differential expression in tumor samples displayed a powerful prognostic information, generating a continuous score that can be scaled (9).

It is already known that as tumor evolves, malignant cells educate surrounding stroma to create an adequate microenvironment for their convenience (10). This education can be understood through the degree of activation of fibroblasts as a surrogate marker. In other words, as the fibroblasts interact with other cells in the tumor, fibroblasts progressively evolve, and this evolution could provide new biomarkers and valuable information for patient's outcome.

Thus, according to the concept that there is a common serum response program in different fibroblasts, despite being considered different cells, we hypothesize that since the earliest start of tumorigenic process, when an oncogenic stress transforms an epithelial cell until the cell metastasizes, there is a common and monotonic response of the surrounding fibroblasts that can be measured and associated to prognosis. We call this response as the fibroblasts progression genes (FPG). We used this common response signature (277 genes) to develop a 25-gene classifier that characterized each fibroblast type with respect to location and that was of prognostic value for classifying patients with primary tumors. We also defined and validated a 19-gene classifier, being the best from the 277 genes predicting recurrence in patients with colorectal cancer. This simple signature score has remarkable predictive capabilities that confirm the biologic relevance of CAFs and their gene programs to cancer progression.

Fibroblast isolation

We obtained fibroblast cultures from fresh surgical specimens resected from patients with primary colorectal carcinoma: normal colonic fibroblasts (NCF, 9) from the normal colonic mucosa (NCM) at least 5 to 10 cm from the surgical margin, carcinoma-associated fibroblasts from the primary tumor (CAF-PT, 14), and carcinoma-associated fibroblasts (CAF-LM, 11) from fresh surgical specimens of liver metastases. All samples were collected under the supervision of the Ethics Committee of the Hospital Universitari de Bellvitge. These 34 samples comprise the MB dataset. Clinical baseline data of patients whose fibroblasts were isolated are in Supplementary Table S1. To establish primary cultures, we homogenized surgical specimens with collagenase IV and dispase in a MACS Separation Unit (Miltenyi Biotec), depleted homogenates from epithelial cells using anti-EPCAM Dynabeads (Invitrogen), and positively selected fibroblasts using anti-fibroblast microbeads (reference 130-050-601 Miltenyi Biotec). Cells were then cultured in DMEMF12 + 10% FBS (Gibco) with added penicillin/streptomycin.

To confirm that cultures derived from tumor specimens were pure fibroblast cultures, we used a large panel of markers (vinculin, α-smooth muscle actin, vimentin, E-cadherin, N-cadherin, and VE-cadherin) to exclude epithelial and endothelial cell contamination (Supplementary Fig. S1).

Microarray procedures and signature definition

RNA from fibroblast cultures in passage #3 was used to hybridize with the Affymetrix GeneChip Human Gene 1.0 ST Array. The microarray data were read, normalized, and transformed to numerical expression values using the justRMA function in the simpleaffy package (11). Differentially expressed genes were identified using the multi-class Significance Analysis of Microarrays (SAM) technique, available in the samr package (12).

We selected those probes that simultaneously displayed a robust multi-array average (RMA) normalized expression value of >4 in the three fibroblast populations (MB dataset). This yielded 10,779 probes common to NCF, CAF-PT, and CAF-LM. This rule was intended to exclude genes that had not normalized expression in at least one category, as well as background noise (Supplementary Fig. S2). The signature was defined as follows: fibroblast progression genes (FPG; NCF>CAF-PT>CAF-LM and NCF<CAF-PT<CAF-LM) contained genes with a log n-fold change (logFC) NCF versus CAF-PT of >1.5 and CAF-PT versus CAF-LM of >1.5 (179 genes), and CAF-PT versus NCF of >1.5 and CAF-LM versus CAF-PT of >1.5 (98 genes); q < 0.05. Therefore, the FPG signature comprises genes whose expression level increased or decreased during the NCM–primary tumor–liver metastasis sequence.

Definition of FPG class prediction

The Prediction Analysis of Microarrays (PAM; ref. 13) nearest shrunken centroid classifier was applied to the MB dataset to refine the 277 differentially expressed genes (FPGs) to derive a classifier that better characterized each class (normal colonic fibroblasts, NCFs; CAFs from the primary tumor, CAF-PTs; and CAFs from the liver metastasis, CAF-LMs). PAM estimates the predicted error rate based on misclassification error, which is calculated by averaging the errors from each of the 10-fold cross-validations (CV).

The classifier was then validated in external datasets. GSE31279 (14) comprised 24 samples obtained by laser capture microscopy (LCM): 10 normal colonic stroma, 10 carcinoma-associated stroma, and 4 metastasis-stroma; GSE22598 (15) comprised 17 whole-tissue samples of NCM and 17 paired colorectal carcinomas; the LP dataset comprised 19 colorectal carcinomas and 21 paired liver metastases from our own institution. Public datasets from GEO are detailed in Supplementary Table S2.

Finally, we computed a signature score for the genes of the classifier to assess its power in predicting patient survival in two independent datasets, GSE14333 (ref. 6; 31, stage I; 64, stage II; and 71, stage III patients) and GSE33133 (16). The signature score was obtained by summarizing centered gene expression means and computing a z-score for each patient. The expression score for each patient was classified as high or low according to the cutoff provided by Euclidian distance in the ROC curve analysis.

Recurrence-prediction classifier development

We developed a recurrence signature from the 277 genes comprising the FPG signature. PAM analysis identified relevant genes for the classifier from GSE14333 (RMA-normalized) defining the recurrent and nonrecurrent classes. Internal validation was done by the leave-one-out CV method. Validation was carried out with GSE17538 (ref. 5; RMA-normalized).

We also computed the z-score (signature score normalized by SD) and classified patients as high or low expression according to the cutoff obtained by Euclidean distance in the ROC curve analysis for recurrence prediction (cutoff, 0.187). Patients with a score >0.187 were classified as high risk and score <0.187 as low-risk. The same cutoff value (0.187) was applied to the validation datasets. Results were validated in the independent datasets GSE17538 (144 patients; 33, stage I; 50, stage II; and 61, stage III) and GSE33113 (ref. 16; 90 stage II patients; RMA-normalized) taking disease-free survival (DFS) as a primary endpoint. GSE14333 and GSE17538 cohorts partially overlapped. Cases duplicated in GSE14333 and GSE17538 were excluded from the validation.

We chose these datasets for training and validation because data came from whole-tumor samples without any enrichment in tumor cells, ensuring a minimum stromal percentage between 30% and 50%.

Wound-healing assay

Fibroblasts were grown on P60 dishes until confluence. We made a small scratch with a yellow pipette tip. After several PBS washes to remove floating cells, we began the assay by adding fresh medium. We obtained images of a predefined area of the scratch under a Leica microscope every 2 hours. The experiment was done twice, with three replicates on each occasion.

Ontology analysis

The Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 was used to determine likely biologic processes in which differentially expressed genes (FPGs) were involved. Gene ontology (GO) attributes of biologic processes were listed, ordered by their FDR-adjusted q value of <0.05.

Determination of the 19-gene signature cell-type specificity

We used data from GSE39396 (17) to check the stromal specificity of the genes in our CAF signature. GSE39396 consisted of four cell type populations isolated by FACS from six different colorectal tumors: EPCAM+ epithelial cells, CD45+ inflammatory cells, CD31+ endothelial cells, and FAPα+ CAFs.

Internal microarray validation

We internally validated the genes comprising the 25-gene signature by means of RT-PCR, confirming the results by Western blotting. The antibodies used for this purpose were rabbit anti-human SLC7A2 (HPA009169; Sigma-Aldrich), rabbit anti-human TGFB2 (V) (sc-90; Santa Cruz Biotechnology, Inc.), and rabbit anti-human ARHGDIB (Ab15198; Abcam).

Statistical analysis

Kaplan–Meier and Cox regression analysis estimates for DFS and disease-specific survival (DSS) were calculated and compared by the likelihood ratio tests. Recurrence and death from disease were considered as events.

Positive and negative posterior probabilities were chosen to measure the power of the classifier when considering clinical decision making.

Differential gene expression of fibroblasts along cancer progression

According to the overall design (Supplementary Fig. S2), we isolated fibroblasts (n = 34) from the three microenvironments in which colorectal epithelial cells reside (NCM, primary tumor, and liver metastasis). Because fibroblasts from different locations seem to have particular transcriptomic programs but share a stereotyped gene expression pattern in response to serum (8) and this response can be scaled to predict prognosis (9), we aimed to establish whether there is a set of fibroblastic genes whose expression arises gradually according to their anatomic demarcation involved in the steps of colorectal cancer progression (NCF, CAF-PT, and CAF-LM). After filtering the data, 10,779 probes met the characteristics imposed (cartoon, Supplementary Fig. S3A), most of which corresponded to development-related ontologies (Table 1). Interestingly, overexpressed genes in the sequence NCF-to-CAF-LM (low in NCF and high in CAF-LM) were also associated with wound-healing response, whereas in the opposite direction the biologic processes seem to be enriched in inflammatory response. From these probes, we selected those that fulfilled the criterion of a >1.5-fold change (normalized log2 value) in expression at each step at the same time (as illustrated in the cartoon, Supplementary Fig. S3B). We refer to this as monotonic gene expression during cancer progression, meaning that gene expression follows a decreasing or increasing monotonic function. By this means, we identified 98 and 179 genes that were upregulated and downregulated, respectively, from NCF to CAF-LM (Supplementary Table S3). Together, these 277 genes were designated as the FPG signature (fibroblastic genes relevant to cancer progression; Supplementary Fig. S3C).

Table 1.

Gene ontology

Overexpressed along the sequence NCF, CAF-PT, CAF-LM
CategoryTermCountPFold enrichmentFDR
GOTERM_BP_ALL GO:0009611∼response to wounding 31 4.86E−07 2.841 8.42E−04 
GOTERM_BP_ALL GO:0042060∼wound healing 17 2.04E−06 4.319 3.54E−03 
GOTERM_BP_ALL GO:0009605∼response to external stimulus 41 4.62E−06 2.173 8.00E−03 
GOTERM_BP_ALL GO:0008544∼epidermis development 16 5.95E−06 4.222 1.03E−02 
GOTERM_BP_ALL GO:0007398∼ectoderm development 16 1.36E−05 3.94 2.36E−02 
GOTERM_BP_ALL GO:0048856∼anatomical structure development 82 1.59E−05 1.568 2.75E−02 
GOTERM_BP_ALL GO:0048513∼organ development 62 1.80E−05 1.723 3.11E−02 
GOTERM_BP_ALL GO:0048731∼system development 77 1.81E−05 1.595 3.13E−02 
GOTERM_BP_ALL GO:0006928∼cell motion 26 1.81E−05 2.64 3.14E−02 
GOTERM_BP_ALL GO:0050878∼regulation of body fluid levels 13 3.24E−05 4.491 5.61E−02 
GOTERM_BP_ALL GO:0001568∼blood vessel development 17 5.09E−05 3.346 8.81E−02 
Infraexpressed along the sequence NCF, CAF-PT, CAF-LM
CategoryTermCountPFold enrichmentFDR
GOTERM_BP_ALL GO:0050727∼regulation of inflammatory response 12 9.41E−07 7.021 1.64E−03 
GOTERM_BP_ALL GO:0051270∼regulation of cell motion 18 1.62E−06 4.147 2.83E−03 
GOTERM_BP_ALL GO:0042127∼regulation of cell proliferation 40 2.51E−06 2.263 4.38E−03 
GOTERM_BP_ALL GO:0009611∼response to wounding 30 7.01E−06 2.546 1.22E−02 
GOTERM_BP_ALL GO:0002526∼acute inflammatory response 12 9.74E−06 5.559 1.70E−02 
GOTERM_BP_ALL GO:0045595∼regulation of cell differentiation 28 1.60E−05 2.541 2.80E−02 
GOTERM_BP_ALL GO:0045597∼positive regulation of cell differentiation 18 1.62E−05 3.495 2.82E−02 
GOTERM_BP_ALL GO:0030334∼regulation of cell migration 15 2.72E−05 3.947 4.75E−02 
GOTERM_BP_ALL GO:0040012∼regulation of locomotion 16 2.82E−05 3.706 4.92E−02 
GOTERM_BP_ALL GO:0009605∼response to external stimulus 41 2.93E−05 2.012 5.10E−02 
GOTERM_BP_ALL GO:0006950∼response to stress 63 3.49E−05 1.682 6.10E−02 
GOTERM_BP_ALL GO:0006954∼inflammatory response 21 3.98E−05 2.900 6.94E−02 
GOTERM_BP_ALL GO:0050793∼regulation of developmental process 33 4.75E−05 2.184 8.29E−02 
GOTERM_BP_ALL GO:0032101∼regulation of response to external stimulus 14 5.65E−05 3.940 9.85E−02 
Overexpressed along the sequence NCF, CAF-PT, CAF-LM
CategoryTermCountPFold enrichmentFDR
GOTERM_BP_ALL GO:0009611∼response to wounding 31 4.86E−07 2.841 8.42E−04 
GOTERM_BP_ALL GO:0042060∼wound healing 17 2.04E−06 4.319 3.54E−03 
GOTERM_BP_ALL GO:0009605∼response to external stimulus 41 4.62E−06 2.173 8.00E−03 
GOTERM_BP_ALL GO:0008544∼epidermis development 16 5.95E−06 4.222 1.03E−02 
GOTERM_BP_ALL GO:0007398∼ectoderm development 16 1.36E−05 3.94 2.36E−02 
GOTERM_BP_ALL GO:0048856∼anatomical structure development 82 1.59E−05 1.568 2.75E−02 
GOTERM_BP_ALL GO:0048513∼organ development 62 1.80E−05 1.723 3.11E−02 
GOTERM_BP_ALL GO:0048731∼system development 77 1.81E−05 1.595 3.13E−02 
GOTERM_BP_ALL GO:0006928∼cell motion 26 1.81E−05 2.64 3.14E−02 
GOTERM_BP_ALL GO:0050878∼regulation of body fluid levels 13 3.24E−05 4.491 5.61E−02 
GOTERM_BP_ALL GO:0001568∼blood vessel development 17 5.09E−05 3.346 8.81E−02 
Infraexpressed along the sequence NCF, CAF-PT, CAF-LM
CategoryTermCountPFold enrichmentFDR
GOTERM_BP_ALL GO:0050727∼regulation of inflammatory response 12 9.41E−07 7.021 1.64E−03 
GOTERM_BP_ALL GO:0051270∼regulation of cell motion 18 1.62E−06 4.147 2.83E−03 
GOTERM_BP_ALL GO:0042127∼regulation of cell proliferation 40 2.51E−06 2.263 4.38E−03 
GOTERM_BP_ALL GO:0009611∼response to wounding 30 7.01E−06 2.546 1.22E−02 
GOTERM_BP_ALL GO:0002526∼acute inflammatory response 12 9.74E−06 5.559 1.70E−02 
GOTERM_BP_ALL GO:0045595∼regulation of cell differentiation 28 1.60E−05 2.541 2.80E−02 
GOTERM_BP_ALL GO:0045597∼positive regulation of cell differentiation 18 1.62E−05 3.495 2.82E−02 
GOTERM_BP_ALL GO:0030334∼regulation of cell migration 15 2.72E−05 3.947 4.75E−02 
GOTERM_BP_ALL GO:0040012∼regulation of locomotion 16 2.82E−05 3.706 4.92E−02 
GOTERM_BP_ALL GO:0009605∼response to external stimulus 41 2.93E−05 2.012 5.10E−02 
GOTERM_BP_ALL GO:0006950∼response to stress 63 3.49E−05 1.682 6.10E−02 
GOTERM_BP_ALL GO:0006954∼inflammatory response 21 3.98E−05 2.900 6.94E−02 
GOTERM_BP_ALL GO:0050793∼regulation of developmental process 33 4.75E−05 2.184 8.29E−02 
GOTERM_BP_ALL GO:0032101∼regulation of response to external stimulus 14 5.65E−05 3.940 9.85E−02 

Refining the FPG signature: developing a classifier

We aimed to refine the FPG signature to determine the minimum number of genes whose expression levels characterize each class of fibroblast involved in colorectal cancer progression. Our 34 samples of fibroblasts (MB Dataset) were subjected to PAM using the nearest shrunken centroid method to define the classifier. Ten-fold cross-validation was used, randomly dividing the samples into 10 approximately equally sized parts. All parts were balanced to ensure that the three classes (NCF, CAF-PT, and CAF-LM) were distributed proportionally among each of the 10 parts. A minimum of 25 estimated genes predicted the lowest misclassification error (Supplementary Fig. S3D and S3E). Genes are listed in Table 2 and Supplementary Fig. S4A. Nine of these genes have been validated by means of RT-PCR (Supplementary Fig. S4B). Eleven of the 25 genes were downregulated, most of them [EYA4 (18), FBLN1 (19), (20), FBN2 (21–23), ADH1B (24), CYP39A1 (25), CHRDL1 (26), and TCF21 (27, 28)] by methylation in tumorigenic processes. The other 14, most of which are already known to be involved in cancer (TGFB2, NTF3 (29), UACA (30), SFRP4 (31), and ULBP2 (32), were overexpressed. Another gene like ARHGDIB has been associated with prognosis (33, 34) and cisplatin resistance (35). Selected genes were validated at the protein level (Supplementary Fig. S3F), and were found to have a greater level of expression in fibroblasts as the cancer progressed.

Table 2.

Twenty-five–gene signature and prediction in training dataset MB and validation datasets LP, GSE22598, and GSE31279

Average expression (log ratio)
Gene symbolGroupNCFCAF-PTCAF-LM
ARHGDIB UP −0.0179 0.9268 
C5orf46 UP 0.8287 
TMEM176B DW −0.5635 
TSPAN2 DW −0.5229 
EYA4 DW −0.4271 
CYP39A1 DW −0.327 
ADH1B DW 0.3203 
SRGN UP 0.3089 
CHRDL1 DW −0.304 
PTX3 UP 0.1956 
SLC7A2 UP 0.1849 
TGFB2 UP −0.172 
SFRP4 UP 0.1665 
FBLN1 DW −0.1621 
ULBP2 UP −0.1598 0.0049 
FBN2 DW −0.1571 
CCDC81 UP 0.1485 
GPR1 UP 0.1256 
TNFSF4 UP −0.0732 
CHL1 DW −0.0609 
UACA UP 0.056 
TCF21 DW −0.0406 
NTF3 UP −0.0191 
HECW2 UP 0.0072 
SLC38A4 DW −0.0004 
Dataset MBPrediction
Real statusNCFCAF-PTCAF-LMClass error rate
NCF 
CAF-PT 0.428 
CAF-LM 11 
Overall error rate = 0.174 (cross-validation) 
Validation
Dataset LPPrediction
Real statusPrimary tumorLiver metastasisClass error rate
Primary tumor 16 0.158 
Liver metastasis 21 
Overall error rate = 0.075 (cross-validation) 
GSE22598Prediction
Real statusNormal mucosaPrimary tumorClass error rate
Normal mucosa 15 0.118 
Primary tumor 17 
Overall error rate = 0.059 (cross-validation) 
GSE31279Prediction
Real statusNormal stromaPrimary tumor stromaLiver metastasisClass error rate
Normal stroma 0.10 
Primary tumor stroma 0.30 
Liver metastasis 0.25 
Overall error rate = 0.203 (cross-validation) 
Average expression (log ratio)
Gene symbolGroupNCFCAF-PTCAF-LM
ARHGDIB UP −0.0179 0.9268 
C5orf46 UP 0.8287 
TMEM176B DW −0.5635 
TSPAN2 DW −0.5229 
EYA4 DW −0.4271 
CYP39A1 DW −0.327 
ADH1B DW 0.3203 
SRGN UP 0.3089 
CHRDL1 DW −0.304 
PTX3 UP 0.1956 
SLC7A2 UP 0.1849 
TGFB2 UP −0.172 
SFRP4 UP 0.1665 
FBLN1 DW −0.1621 
ULBP2 UP −0.1598 0.0049 
FBN2 DW −0.1571 
CCDC81 UP 0.1485 
GPR1 UP 0.1256 
TNFSF4 UP −0.0732 
CHL1 DW −0.0609 
UACA UP 0.056 
TCF21 DW −0.0406 
NTF3 UP −0.0191 
HECW2 UP 0.0072 
SLC38A4 DW −0.0004 
Dataset MBPrediction
Real statusNCFCAF-PTCAF-LMClass error rate
NCF 
CAF-PT 0.428 
CAF-LM 11 
Overall error rate = 0.174 (cross-validation) 
Validation
Dataset LPPrediction
Real statusPrimary tumorLiver metastasisClass error rate
Primary tumor 16 0.158 
Liver metastasis 21 
Overall error rate = 0.075 (cross-validation) 
GSE22598Prediction
Real statusNormal mucosaPrimary tumorClass error rate
Normal mucosa 15 0.118 
Primary tumor 17 
Overall error rate = 0.059 (cross-validation) 
GSE31279Prediction
Real statusNormal stromaPrimary tumor stromaLiver metastasisClass error rate
Normal stroma 0.10 
Primary tumor stroma 0.30 
Liver metastasis 0.25 
Overall error rate = 0.203 (cross-validation) 

NOTE: Shrunken centroids for each class of fibroblast (n = 34). The column “group” describes the direction of the expression in the sequence NCF-to-CAF-LM.

The method computes a standardized centroid for each class. Centroids are the average gene expression for each gene in each class divided by the within-class SD for that gene. Nearest centroid classification takes the gene expression profile of a new sample, and compares it with each of these class centroids. The class whose centroid that it is closest to, in squared distance, is the predicted class for that new sample. Nearest shrunken centroid classification shrinks each of the class centroids toward the overall centroid for all classes by a threshold. This shrinkage consists of moving the centroid toward zero by threshold, setting it equal to zero if it hits zero.

Methylation status of these 25 genes has been evaluated in 258 samples of primary tumors from the TCGA (The Cancer Genome Atlas). Results are depicted in Supplementary Table S5.

Abbreviations: UP, CAF-LM > CAF-PT > NCF; DW, NCF > CAF-PT > CAF-LM.

The overall error rate after 10-fold cross-validation was 0.174 in the training samples (Table 2). The CAF-PT class was the biggest contributor to the misclassification, some samples being mistaken for CAF-LM and for NCF (Fig. 1A). We validated the 25-gene signature in three independent datasets, two whole-tumor samples (Dataset LP and GSE22598) and one microdissected sample (GSE31279). The probability of correctly classifying a sample was very high (Fig. 1B, Table 2), enabling us to conclude that these 25 genes were those whose expression levels best characterize each fibroblast class. Therefore, depending on the average gene expression value we could discriminate whether a given sample was an NCM, a primary tumor, or a liver metastasis.

Figure 1.

A, estimated probabilities for the fibroblast training set. Samples are partitioned by the true (top) and predicted (bottom) classes. NCF and CAF-LM are always well classified and some samples of CAF-PT have higher probabilities of being misclassified. B, validation of PAM analysis in the GSE22598 dataset consisting of 17 pairs of homogenized samples of whole-tumor or whole normal mucosa. The 25-gene signature can correctly classify normal mucosa samples and tumor samples with an accuracy of 94.1% after cross-validation. C, 25-gene signature score for the 34 samples of fibroblasts. Green, NCFs; orange, CAF-PT; red, CAF-LM. CAF-PT samples are those with a widest range of values, some samples having values similar to those of the CAF-LM and some being closer to those of the NCF. Red dotted line, 50th percentile of the signature score. We used this cutoff to classify patients in F and G. D, ROC curve for recurrence prediction according to the 25-gene signature score of the dataset GSE14333 (166 cases of colorectal cancer patients; AUC, 0.69). E, DFS Kaplan–Meier curves for the performance of the 25-gene signature in the GSE14333 dataset, categorizing the signature expression as high or low according to the cutoff obtained from the isolated fibroblasts (HR, 3.5; P = 0.00029). F, ROC curve for recurrence prediction according to the 25-gene signature score of the dataset GSE33113 (90 cases of patients with stage II colorectal cancer; AUC, 0.63). G, DFS Kaplan–Meier curves for the performance of the 25-gene signature in the GSE33113 dataset, categorizing the signature expression as high or low according to the cutoff obtained from the isolated fibroblasts (HR, 2.35; P = 0.12).

Figure 1.

A, estimated probabilities for the fibroblast training set. Samples are partitioned by the true (top) and predicted (bottom) classes. NCF and CAF-LM are always well classified and some samples of CAF-PT have higher probabilities of being misclassified. B, validation of PAM analysis in the GSE22598 dataset consisting of 17 pairs of homogenized samples of whole-tumor or whole normal mucosa. The 25-gene signature can correctly classify normal mucosa samples and tumor samples with an accuracy of 94.1% after cross-validation. C, 25-gene signature score for the 34 samples of fibroblasts. Green, NCFs; orange, CAF-PT; red, CAF-LM. CAF-PT samples are those with a widest range of values, some samples having values similar to those of the CAF-LM and some being closer to those of the NCF. Red dotted line, 50th percentile of the signature score. We used this cutoff to classify patients in F and G. D, ROC curve for recurrence prediction according to the 25-gene signature score of the dataset GSE14333 (166 cases of colorectal cancer patients; AUC, 0.69). E, DFS Kaplan–Meier curves for the performance of the 25-gene signature in the GSE14333 dataset, categorizing the signature expression as high or low according to the cutoff obtained from the isolated fibroblasts (HR, 3.5; P = 0.00029). F, ROC curve for recurrence prediction according to the 25-gene signature score of the dataset GSE33113 (90 cases of patients with stage II colorectal cancer; AUC, 0.63). G, DFS Kaplan–Meier curves for the performance of the 25-gene signature in the GSE33113 dataset, categorizing the signature expression as high or low according to the cutoff obtained from the isolated fibroblasts (HR, 2.35; P = 0.12).

Close modal

Gene expression associated with risk of progression

The CAF-PT class exhibited the greatest variation of those considered. We attempted to check that misclassified samples could be associated with prognosis, on the assumption that CAF-PTs with a higher signature score (reaching a value typical of CAF-LM) were at greater risk (i.e., had a poorer prognosis) than CAF-PTs with a lower signature score (Fig. 1C). We evaluated the prognostic power of the 25-gene classifier in GSE14333, which consisted of 166 colorectal primary tumors (stages I to III) and GSE33113 (90 patients, stage II). Assigning patients to high or low expression score (according to the 50th percentile of the 25-gene scores obtained from isolated fibroblasts; cutoff, −0.43), ROC curve (Fig. 1D), and DFS Kaplan–Meier analysis (Fig. 1E) indicated that these genes are associated with prognosis (HR, 3.5; P = 0.00029). An additional validation in GSE33113 (90 patients stage II) confirmed the tendency associated to prognosis (Fig. 1F and G). Therefore, this 25-gene score not only classified each fibroblast sample correctly but also classified patients with primary tumors according to their prognosis. The FPG and the 25-gene signature were obtained regardless of the gene expression data of the tumor specimens and patients' clinical data, so the prognostic power of this signature is strong evidence of the link between the tumor stroma and cancer progression.

Best genes associated with recurrence in patients with colorectal cancer

The 25-gene classifier was defined as the best genes whose monotonic expression characterized each fibroblast type. However, it was not specifically designed for predicting outcome. Nevertheless, given the prognostic power of the 25-gene classifier obtained from the FPG stereotyped gene expression pattern of fibroblasts, we aimed to define the best signature (from the 277 deregulated genes) for classifying patients with colorectal cancer, using recurrence as a primary endpoint. A PAM-optimized process yielded a 19-gene signature from GSE14333 (Supplementary Table S4). The score for each gene derived by PAM was associated with the status of the patient (recurrent or nonrecurrent) and the level of expression of the gene during cancer progression (UP or DW in Supplementary Table S4). Validation of PAM in the independent dataset GSE17538 showed a specificity of 79.7% and sensitivity of 54.9% after cross-validation. Of these 19 genes, 15 were overexpressed in the sequence NCF<CAF-PT<CAF-LM, and were thereby overexpressed during cancer progression; four genes were underexpressed.

To analyze the cell type–specific expression of the 19-gene signature further, we used data from GSE39396 (17), consisting of FACS-purified cell types from 6 patients with colorectal cancer. Relative levels of cell type–specific marker genes confirmed that most of the 19 genes are representative of the microenvironment compartment, especially from FAPα+ CAFs, with the exception of CFTR, which was expressed by EPCAM+ cells (Fig. 2A). In addition, because recruitment of fibroblasts is one of the characteristics of desmoplastic tumors, we performed a wound-healing assay to compare the migration of fibroblasts with high and low 19-gene signature scores. Fibroblasts with high signature scores had better migratory abilities, as illustrated in Fig. 2B.

Figure 2.

A, unsupervised hierarchical clustering of the 19-gene signature with data from GSE39396, consisting of four cell type populations isolated by FACS from six different colorectal tumors: EPCAM+ epithelial cells (green), CD45+ inflammatory cells (gray), CD31+ endothelial cells (yellow), and FAPα+ CAFs (orange). The heatmap shows that CAFs (orange) cluster in a separate group with respect to the expression values of the 19-gene signature. With the exception of CFTR, which is mainly expressed by epithelial cells, the other 18 genes are basically of stromal origin. B, representative wound-healing assay images of replicates of 19-gene signature low-score fibroblasts and 19-gene signature high-score fibroblasts. The micrographs show that high signature score fibroblasts (right) have a higher migratory capacity than those with a low signature score (left).

Figure 2.

A, unsupervised hierarchical clustering of the 19-gene signature with data from GSE39396, consisting of four cell type populations isolated by FACS from six different colorectal tumors: EPCAM+ epithelial cells (green), CD45+ inflammatory cells (gray), CD31+ endothelial cells (yellow), and FAPα+ CAFs (orange). The heatmap shows that CAFs (orange) cluster in a separate group with respect to the expression values of the 19-gene signature. With the exception of CFTR, which is mainly expressed by epithelial cells, the other 18 genes are basically of stromal origin. B, representative wound-healing assay images of replicates of 19-gene signature low-score fibroblasts and 19-gene signature high-score fibroblasts. The micrographs show that high signature score fibroblasts (right) have a higher migratory capacity than those with a low signature score (left).

Close modal

To rank patients by their level of expression of the 19 genes, and assuming equal biologic relevance for each of the 19 genes of the signature, we derived a score by calculating the average of the z-scores, applying a weight of +1 to overexpressed genes and of −1 to underexpressed genes. For patient's categorization, we used the Euclidian distance between sensitivity and specificity (ROC curve analysis; AUC, 0.79; cutoff, 0.187; Fig. 3A and B). The recurrence classifier identified 97 low-expression (58.4%) and 69 high-expression patients (41.6%; OR, 3.47; Fig. 3C). Low-risk and high-risk patients had significantly different survival outcomes, measured as DFS [HR, 5.49; 95% confidence intervals (CI), 2.86–10.53; P < 0.0001; Fig. 3D]. An independent cohort of 141 patients (GSE17538) was used to evaluate the performance of the signature score of the 19-gene classifier. In the validation set, 55 patients were defined as high risk (39%) and 86 as low risk (61%; Fig. 3E; AUC, 0.81). The prevalence of this validation dataset was 24.1% and the positive and negative posterior probabilities were 50% (95% CI, 36%–64%) and 8% (95% CI, 3.5%–16.4%), respectively. Stratifying patients by stage, the prevalence for stage II patients was 18%, with positive and negative posterior probabilities of 37% (95% CI, 17%–61%) and 6% (95% CI, 1%–22%), respectively. Interestingly, the clinical decision margin was wider for stage III, with a prevalence of 39% and positive and negative posterior probabilities of 61% (95% CI, 42%–77%) and 16% (95% CI, 6%–34%), respectively, thus with a difference of 45 points. Univariant Cox regression analysis revealed that high-risk patients had an HR of 7.43 (P < 0.0001) for DFS (Fig. 3G; statistically significant also stratifying by stages; Fig. 3H and I) and of 7.44 (P < 0.0001) for DSS (Fig. 3J). In multivariate analysis, after adjusting for significant variables in univariate analysis (stage and grade) the classifier score remained statistically significant for DFS (HR, 5.52; 95% CI, 2.23–13.65; P < 0.0001; Table 3). The 19-gene signature was also independently associated with DSS (P = 0.004; Table 3). In addition, the signature score as a continuous variable was also significant, increasing the risk of relapse (HR) 1.93 times per unit increase of the score (P < 0.0001; 95% CI, 1.47–2.53).

Figure 3.

A, ROC curve for the 19-gene signature performance in the GSE14333 dataset (AUC, 0.79). The AUC corresponds to the probability of the 19-gene signature ranking a randomly selected positive example higher than a randomly selected negative example, where 1 is the perfect test value and the dotted diagonal line shows the performance of a random predictor. B, histogram for 19-gene classifier score values for the patients of the training set. The red line corresponds to the cutoff (0.187) obtained from the Euclidian distance between sensitivity and specificity for recurrence prediction. The same cutoff value (0.187) is applied to the validation datasets. C, waterfall plot for recurrence risk of 166 patients from GSE14333 according to the 19-gene signature score (OR, 3.47). D, DFS Kaplan–Meier curves for the performance of the 19-gene signature in the GSE14333 dataset for signatures classified as high (red) or low (black) expression. HR, 5.49; likelihood ratio test, P < 0.0001. Five-year DFS rates were 36% (high-expression patients) and 83% (low-expression patients). E, ROC curve for the 19-gene signature performance in the GSE17538 validation dataset (AUC, 0.81). F, waterfall plot for recurrence risk of 141 patients from validation dataset GSE17538 according to the 19-gene signature score (OR, 3.29; sensitivity, 78.8%; and specificity, 73.6%). G, DFS Kaplan–Meier curves for the performance of the 19-gene signature in the GSE17538 dataset for scores classified as high (red) or low (black) expression. HR, 7.43; likelihood ratio test, P < 0.0001. Five-year DFS rates were 32% (high-expression patients) and 83% (low-expression patients). H and I, stratifying for stages, patients with high expression scores have a risk of shorter DFS 5.71-times higher than low-expression patients for stage II (P = 0.015) and 4.71 times for stage III. J, DSS Kaplan–Meier curves for the performance of the 19-gene signature in the GSE17538 validation dataset for scores classified as high (red) or low (black) expression. HR, 7.44; P < 0.0001. Five-year DSS rates were 41% (high-expression patients) and 87% (low-expression patients).

Figure 3.

A, ROC curve for the 19-gene signature performance in the GSE14333 dataset (AUC, 0.79). The AUC corresponds to the probability of the 19-gene signature ranking a randomly selected positive example higher than a randomly selected negative example, where 1 is the perfect test value and the dotted diagonal line shows the performance of a random predictor. B, histogram for 19-gene classifier score values for the patients of the training set. The red line corresponds to the cutoff (0.187) obtained from the Euclidian distance between sensitivity and specificity for recurrence prediction. The same cutoff value (0.187) is applied to the validation datasets. C, waterfall plot for recurrence risk of 166 patients from GSE14333 according to the 19-gene signature score (OR, 3.47). D, DFS Kaplan–Meier curves for the performance of the 19-gene signature in the GSE14333 dataset for signatures classified as high (red) or low (black) expression. HR, 5.49; likelihood ratio test, P < 0.0001. Five-year DFS rates were 36% (high-expression patients) and 83% (low-expression patients). E, ROC curve for the 19-gene signature performance in the GSE17538 validation dataset (AUC, 0.81). F, waterfall plot for recurrence risk of 141 patients from validation dataset GSE17538 according to the 19-gene signature score (OR, 3.29; sensitivity, 78.8%; and specificity, 73.6%). G, DFS Kaplan–Meier curves for the performance of the 19-gene signature in the GSE17538 dataset for scores classified as high (red) or low (black) expression. HR, 7.43; likelihood ratio test, P < 0.0001. Five-year DFS rates were 32% (high-expression patients) and 83% (low-expression patients). H and I, stratifying for stages, patients with high expression scores have a risk of shorter DFS 5.71-times higher than low-expression patients for stage II (P = 0.015) and 4.71 times for stage III. J, DSS Kaplan–Meier curves for the performance of the 19-gene signature in the GSE17538 validation dataset for scores classified as high (red) or low (black) expression. HR, 7.44; P < 0.0001. Five-year DSS rates were 41% (high-expression patients) and 87% (low-expression patients).

Close modal
Table 3.

Multivariate Cox proportional hazard analysis for DFS and DSS in the validation set GSE17538

Multivariate analysis 19-gene signature GSE17538
DFSDSS
HR (95% CI)PHR (95% CI)P
Grade 1.59 (0.66–3.79) 0.3 1.63 (0.64–4.12) 0.3 
Stage 2.06 (1.1–3.84) 0.023 1.455 (1.07–5.51) 0.033 
19-gene classifier (H vs. L) 5.52 (2.23–13.65) <0.0001 6.029 (1.76–20.65) 0.004 
Multivariate analysis 19-gene signature GSE17538
DFSDSS
HR (95% CI)PHR (95% CI)P
Grade 1.59 (0.66–3.79) 0.3 1.63 (0.64–4.12) 0.3 
Stage 2.06 (1.1–3.84) 0.023 1.455 (1.07–5.51) 0.033 
19-gene classifier (H vs. L) 5.52 (2.23–13.65) <0.0001 6.029 (1.76–20.65) 0.004 

NOTE: We included in the analysis only variables that were statistically significant in univariate analysis, then gender and age >55 years were excluded (n more clinical variables were available for the dataset at GEO).

To confirm the prognostic value in stage II patients, we used a third independent cohort of 90 patients (GSE33113). Using the cutoff obtained from the training dataset (Fig. 4A), the classifier also performed well in this third validation cohort (AUC, 0.71; Fig. 4B and C). A high level of expression of the 19-gene signature was clearly associated with poor prognosis in this cohort (HR, 3.84; P = 0.0034; Fig. 4D). Interestingly, the prevalence in this cohort was 21% and the positive and negative posterior probabilities were 37% (95% CI, 22%–55%) and 10% (95% CI, 4%–23%). Moreover, the signature score as a continuous variable was also significant, increasing the risk of relapse (HR) 2.06 times per unit increase of the score (P = 0.003; 95% CI, 1.29–3.29).

Figure 4.

A, histogram for 19-gene classifier score values for the patients of the validation dataset GSE33113 (90 patients stage II). The red line corresponds to the cutoff (0.187) obtained from the Euclidian distance between sensitivity and specificity for recurrence prediction in the training dataset. B, ROC curve for 19-gene signature performance in the GSE33113 (AUC, 0.71) validation dataset. C, waterfall plot for recurrence risk of 90 patients from validation dataset GSE33113 according to the 19-gene signature score (OR = 2.21; Sensitivity 68.4% and specificity 69%). D, DFS Kaplan–Meier curves for the performance of the 19-gene signature in the GSE33113 validation dataset (stage II) for signatures classified as high (red) or low (black) expression. HR, 3.84; likelihood ratio test, P = 0.0034. Five-year DFS rates were 60% and 87% for high- and low-expression patients, respectively.

Figure 4.

A, histogram for 19-gene classifier score values for the patients of the validation dataset GSE33113 (90 patients stage II). The red line corresponds to the cutoff (0.187) obtained from the Euclidian distance between sensitivity and specificity for recurrence prediction in the training dataset. B, ROC curve for 19-gene signature performance in the GSE33113 (AUC, 0.71) validation dataset. C, waterfall plot for recurrence risk of 90 patients from validation dataset GSE33113 according to the 19-gene signature score (OR = 2.21; Sensitivity 68.4% and specificity 69%). D, DFS Kaplan–Meier curves for the performance of the 19-gene signature in the GSE33113 validation dataset (stage II) for signatures classified as high (red) or low (black) expression. HR, 3.84; likelihood ratio test, P = 0.0034. Five-year DFS rates were 60% and 87% for high- and low-expression patients, respectively.

Close modal

Thus, for both validation datasets, the classifier clearly identified low-risk patients, particularly in the case of stage II patients, who would benefit from the omission of chemotherapy, especially those who are T4N0 and thus clinically classified as being at higher risk.

Unfortunately, we only know the age, gender, recurrence status, grade (only for GSE17538), stage, and survival time of the patients in these cohorts, as the authors who uploaded the data to the GEO did not provide information about other characteristics.

Molecular biomarkers for predicting relapse of colorectal cancer are needed to improve the selection of patients who would benefit from an adjuvant treatment. Recently, the great importance of the stroma to the prognosis of various types of cancer, especially those with a high degree of desmoplasia, has become recognized. Our hypothesis is that differential gene expression between NCF, CAF-PT, and CAF-LM would be useful for predicting relapse in primary tumors. We identified a short list of genes of stromal fibroblasts from anatomic sites involved in colorectal carcinogenesis, whose transcriptional level was correlated with cancer progression. These genes display a gradual increase/decrease in expression during cancer progression, from NCM to liver metastasis. The scalability observed for this response may link the regenerative capacities of different organs after an injury (i.e., liver) and the greater ability of CAF-LM to induce epithelial–mesenchymal transition (EMT) phenotypes in epithelial cells more efficiently than in other types of myofibroblasts (36). Many of these genes are associated with wound healing and correlate with the activation status of fibroblasts. As an example, SERPINE1, POSTN, HBEGF, PTX3 or TGFβ2, IL1β, and TGFβ1 responsive genes, increase from NCF to CAF-LM, with IL1β and TGFβ1 being the main inductors of fibroblast activation. This transcriptomic program for wound response seems to be more effective in the more aggressive liver and primary tumors, and could explain the greater degree of desmoplasia that is associated with poor outcome (37). About downregulated genes, there is an enrichment of genes involved in inflammatory processes. This fact might explain the better outcome of patients with an intense inflammation-related component infiltration. Probably the lack of such CAF-derived molecules in the more aggressive tumors would avoid the recruitment of inflammatory cells to the tumor. In summary, this result strongly suggests that changes in colorectal tumor stroma have a crucial role in disease progression and outcome. Obviously, other fibroblast-specific genes could have prognostic value, although their level of expression would not be associated with the transition from normal fibroblasts to highly activated fibroblasts, as those genes would be differentially expressed in a particular microenvironment (e.g., they could be expressed in hepatic metastases but be absent from primary tumors).

Although fibroblasts from different demarcations display particular transcriptomic programs (38), even considering each fibroblast type as a different cell, it is interesting to observe that the transcriptome is dynamic and plastic when confronted with a malignant cell, probably as a result of the cross-talk between the two compartments. This is particularly relevant if we consider that CAFs from the primary tumor are derived from local NCF when oncogenically stimulated (same anatomic demarcation), but CAF-LM are cells with a different origin.

Previous studies have not focused on the role of the three microenvironments in which colorectal cancer cells interact during tumor progression. In addition, different stromal classifiers exist for breast (39), prostate (40), lung (41), and hepatocellular carcinoma (42), but this issue has not so far been addressed with respect to colorectal cancer; instead, the focus has been on CAFs, the principal component of the stroma. We also identified a 19-gene signature that predicts disease outcome with remarkable accuracy in whole-tumor samples.

The scalability of the stereotyped gene expression pattern was obtained from isolated fibroblasts of colorectal tumor-associated demarcations, so the prognostic power of the expression pattern is strong evidence of the importance of the stroma in the oncogenesis of colorectal tumors. This was observed implicitly in a signature previously obtained from whole-tumor samples, in which many of the 48 most relevant genes were of stromal origin (4). Because the method has been performed and validated in whole-tumor samples, in which the stroma/epithelium ratio was not known (37), the expression signal detected emphasizes the role of stroma. Most of the 19 genes identified in the present study are strongly expressed by CAFs (with the exception of ULBP2 and CFTR) and to a lesser extent by other cells of the tumor stroma. In addition, many of these genes have been associated previously with tumorigenic processes. Stromal SERPINE1 (codifying for PAI-1) has been involved in cell migration and angiogenesis promotion in different desmoplastic tumors (43, 44). INHBA (a TGFB family member) has been recently observed to be expressed in the sequence from NCM–colonic adenoma–colorectal adenocarcinoma (45) and has been associated with prognosis (46). Elevated expression of POSTN is reported to be associated with the invasion and anchorage-independent growth of the metastatic process of head and neck squamous cell carcinoma (47) and the invasive neoplasm of the pancreas (48). Furthermore, secreted POSTN was reported to trigger EMT through coactivation of EGFR and alpha 5 integrin (49). Other genes such as NTM (neurotrimin) and DENND2A have not been yet involved in cancer processes neither in the stroma nor epithelium.

According to previously reported gene signatures in colorectal cancer, we found overlap only between two genes (PDLIM3 and ULBP2) of our classifier and a recent molecular classification (50), contributing also these genes to characterize the group of worst outcome.

We present different, complementary ways of interpreting the results that could be of great relevance, depending on the clinical scenario. First, PAM analysis identifies the genes of relevance in terms of binary events (recurrence/nonrecurrence) that are highly specific after cross-validation. Second, the signature score (equal weight for all genes) reveals the biologic relevance of the genes selected by PAM, clearly demonstrating that this signature is an independent predictor of outcome, even for stage III patients. This step is crucial before trying to develop a model that is more mathematically accurate albeit without a biologic rationale. This reinforces the role of the stroma, and particularly fibroblasts, as important components of cancer progression, rather than as mere companions of malignant cells.

On the basis of our results, we provide accurate information for clinical decision making, as our classifier performs very well in identifying low-risk patients who would benefit from an omission of adjuvant chemotherapy if validated in independent clinical samples. This could be very significant for those stage II T4N0, perforated, obstructed, poorly differentiated, or microsatellite stable patients, who would otherwise be offered toxic adjuvant chemotherapy. In stage III patients, our signature also differentiates low-risk patients, but these results need to be corroborated in a larger series and in trials to convince oncologists to withhold chemotherapy completely from these patients, and even those of the T2N1 subgroup, whose members have a better clinical prognosis.

In conclusion, using the transcriptomic signature of fibroblasts that we defined and specifically the 19-gene classifier derived from it in a clinical setting, we can provide accurate information about the risk of recurrence, and may facilitate the selection of patients at risk of recurrence, especially for high-risk stage II patients who would benefit from adjuvant therapy.

The findings from the stratification of patients based exclusively on the CAF transcriptomic program indicate the need to develop therapeutic strategies focused on these cells, as they are the main components of desmoplastic tumors. Prospective studies are needed to determine whether treatment decisions based on our stromal 19-gene classifier could benefit patients.

No potential conflicts of interest were disclosed.

Conception and design: D.G. Molleví

Development of methodology: N.G. Díaz-Maroto, D.G. Molleví

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): M. Berdiel-Acer, X. Sanjuan, T. Serrano, V. Moreno, S. Gonçalves-Ribeiro, A. Villanueva, D.G. Molleví

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): M. Berdiel-Acer, D. Cuadras, A. Berenguer, V. Moreno, A. Villanueva, D.G. Molleví

Writing, review, and/or revision of the manuscript: M. Berdiel-Acer, X. Sanjuan, A. Berenguer, V. Moreno, S. Gonçalves-Ribeiro, D.G. Molleví

Study supervision: R. Salazar, D.G. Molleví

This work was supported by projects of the Instituto de Salud Carlos III (grants PI07/0657 and PI10/1604) and the Agència de Gestió d'Ajuts Universitaris i de Recerca (grant number AGAUR SGR-2009-681).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Lee
AH
,
Dublin
EA
,
Bobrow
LG
. 
Angiogenesis and expression of thymidine phosphorylase by inflammatory and carcinoma cells in ductal carcinoma in situ of the breast
.
J Pathol
1999
;
187
:
285
90
.
2.
Elenbaas
B
,
Weinberg
RA
. 
Heterotypic signaling between epithelial tumor cells and fibroblasts in carcinoma formation
.
Exp Cell Res
2001
;
264
:
169
84
.
3.
Salazar
R
,
Roepman
P
,
Capella
G
,
Moreno
V
,
Simon
I
,
Dreezen
C
, et al
Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer
.
J Clin Oncol
2011
;
29
:
17
24
.
4.
O'Connell
MJ
,
Lavery
I
,
Yothers
G
,
Paik
S
,
Clark-Langone
KM
,
Lopatin
M
, et al
Relationship between tumor gene expression and recurrence in four independent studies of patients with stage II/III colon cancer treated with surgery alone or surgery plus adjuvant fluorouracil plus leucovorin
.
J Clin Oncol
2010
;
28
:
3937
44
.
5.
Smith
JJ
,
Deane
NG
,
Wu
F
,
Merchant
NB
,
Zhang
B
,
Jiang
A
, et al
Experimentally derived metastasis gene expression profile predicts recurrence and death in patients with colon cancer
.
Gastroenterology
2010
;
138
:
958
68
.
6.
Jorissen
RN
,
Gibbs
P
,
Christie
M
,
Prakash
S
,
Lipton
L
,
Desai
J
, et al
Metastasis-associated gene expression changes predict poor outcomes in patients with dukes stage B and C colorectal cancer
.
Clin Cancer Res
2009
;
15
:
7642
51
.
7.
Rinn
JL
,
Bondre
C
,
Gladstone
HB
,
Brown
PO
,
Chang
HY
. 
Anatomic demarcation by positional variation in fibroblast gene expression programs
.
PLoS Genet
2006
;
2
:
e119
.
8.
Chang
HY
,
Sneddon
JB
,
Alizadeh
AA
,
Sood
R
,
West
RB
,
Montgomery
K
, et al
Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds
.
PLoS Biol
2004
;
2
:
E7
.
9.
Chang
HY
,
Nuyten
DS
,
Sneddon
JB
,
Hastie
T
,
Tibshirani
R
,
Sorlie
T
, et al
Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival
.
Proc Natl Acad Sci U S A
2005
;
102
:
3738
43
.
10.
Bissell
MJ
,
Hines
WC
. 
Why don't we get more cancer? A proposed role of the microenvironment in restraining cancer progression
.
Nat Med
2011
;
17
:
320
9
.
11.
Wilson
CL
,
Miller
CJ
. 
Simpleaffy: a BioConductor package for Affymetrix Quality Control and data analysis
.
Bioinformatics
2005
;
21
:
3683
5
.
12.
Tusher
VG
,
Tibshirani
R
,
Chu
G
. 
Significance analysis of microarrays applied to the ionizing radiation response
.
Proc Natl Acad Sci U S A
2001
;
98
:
5116
21
.
13.
Tibshirani
R
,
Hastie
T
,
Narasimhan
B
,
Chu
G
. 
Diagnosis of multiple cancer types by shrunken centroids of gene expression
.
Proc Natl Acad Sci U S A
2002
;
99
:
6567
72
.
14.
Abba
M
,
Laufs
S
,
Aghajany
M
,
Korn
B
,
Benner
A
,
Allgayer
H
. 
Look who's talking: deregulated signaling in colorectal cancer
.
Cancer Genomics Proteomics
2012
;
9
:
15
25
.
15.
Okazaki
S
,
Ishikawa
T
,
Iida
S
,
Ishiguro
M
,
Kobayashi
H
,
Higuchi
T
, et al
Clinical significance of UNC5B expression in colorectal cancer
.
Int J Oncol
2012
;
40
:
209
16
.
16.
de Sousa
EMF
,
Colak
S
,
Buikhuisen
J
,
Koster
J
,
Cameron
K
,
de Jong
JH
, et al
Methylation of cancer-stem-cell-associated Wnt target genes predicts poor prognosis in colorectal cancer patients
.
Cell Stem Cell
2011
;
9
:
476
85
.
17.
Calon
A
,
Espinet
E
,
Palomo-Ponce
S
,
Tauriello
DV
,
Iglesias
M
,
Cespedes
MV
, et al
Dependency of colorectal cancer on a TGF-beta-driven program in stromal cells for metastasis initiation
.
Cancer Cell
2012
;
22
:
571
84
.
18.
Kim
YH
,
Lee
HC
,
Kim
SY
,
Yeom
YI
,
Ryu
KJ
,
Min
BH
, et al
Epigenomic analysis of aberrantly methylated genes in colorectal cancer identifies genes commonly affected by epigenetic alterations
.
Ann Surg Oncol
2011
;
18
:
2338
47
.
19.
Xiao
W
,
Wang
J
,
Li
H
,
Guan
W
,
Xia
D
,
Yu
G
, et al
Fibulin-1 is down-regulated through promoter hypermethylation and suppresses renal cell carcinoma progression
.
J Urol
2013
;
190
:
291
30
20.
Cheng
YY
,
Jin
H
,
Liu
X
,
Siu
JM
,
Wong
YP
,
Ng
EK
, et al
Fibulin 1 is downregulated through promoter hypermethylation in gastric cancer
.
Br J Cancer
2008
;
99
:
2083
7
.
21.
Hibi
K
,
Mizukami
H
,
Saito
M
,
Kigawa
G
,
Nemoto
H
,
Sanada
Y
. 
FBN2 methylation is detected in the serum of colorectal cancer patients with hepatic metastasis
.
Anticancer Res
2012
;
32
:
4371
4
.
22.
Tsunoda
S
,
Smith
E
,
De Young
NJ
,
Wang
X
,
Tian
ZQ
,
Liu
JF
, et al
Methylation of CLDN6, FBN2, RBP1, RBP4, TFPI2, and TMEFF2 in esophageal squamous cell carcinoma
.
Oncol Rep
2009
;
21
:
1067
73
.
23.
Kim
TO
,
Park
J
,
Kang
MJ
,
Lee
SH
,
Jee
SR
,
Ryu
DY
, et al
DNA hypermethylation of a selective gene panel as a risk marker for colon cancer in patients with ulcerative colitis
.
Int J Mol Med
2013
;
31
:
1255
61
.
24.
Dannenberg
LO
,
Chen
HJ
,
Tian
H
,
Edenberg
HJ
. 
Differential regulation of the alcohol dehydrogenase 1B (ADH1B) and ADH1C genes by DNA methylation and histone deacetylation
.
Alcohol Clin Exp Res
2006
;
30
:
928
37
.
25.
Huang
YW
,
Jansen
RA
,
Fabbri
E
,
Potter
D
,
Liyanarachchi
S
,
Chan
MW
, et al
Identification of candidate epigenetic biomarkers for ovarian cancer detection
.
Oncol Rep
2009
;
22
:
853
61
.
26.
Mithani
SK
,
Smith
IM
,
Califano
JA
. 
Use of integrative epigenetic and cytogenetic analyses to identify novel tumor-suppressor genes in malignant melanoma
.
Melanoma Res
2011
;
21
:
298
307
.
27.
Richards
KL
,
Zhang
B
,
Sun
M
,
Dong
W
,
Churchill
J
,
Bachinski
LL
, et al
Methylation of the candidate biomarker TCF21 is very frequent across a spectrum of early-stage nonsmall cell lung cancers
.
Cancer
2011
;
117
:
606
17
.
28.
Weiss
D
,
Stockmann
C
,
Schrodter
K
,
Rudack
C
. 
Protein expression and promoter methylation of the candidate biomarker TCF21 in head and neck squamous cell carcinoma
.
Cell Oncol
2013
;
36
:
213
24
29.
Louie
E
,
Chen
XF
,
Coomes
A
,
Ji
K
,
Tsirka
S
,
Chen
EI
. 
Neurotrophin-3 modulates breast cancer cells and the microenvironment to promote the growth of breast cancer brain metastasis
.
Oncogene
2013
;
32
:
4064
77
.
30.
Moravcikova
E
,
Krepela
E
,
Prochazka
J
,
Rousalova
I
,
Cermak
J
,
Benkova
K
. 
Down-regulated expression of apoptosis-associated genes APIP and UACA in non-small cell lung carcinoma
.
Int J Oncol
2012
;
40
:
2111
21
.
31.
Ford
CE
,
Jary
E
,
Ma
SS
,
Nixdorf
S
,
Heinzelmann-Schwarz
VA
,
Ward
RL
. 
The Wnt gatekeeper SFRP4 modulates EMT, cell migration and downstream Wnt signalling in serous ovarian cancer cells
.
PLoS ONE
2013
;
8
:
e54362
.
32.
Chang
YT
,
Wu
CC
,
Shyr
YM
,
Chen
TC
,
Hwang
TL
,
Yeh
TS
, et al
Secretome-based identification of ULBP2 as a novel serum marker for pancreatic cancer detection
.
PLoS ONE
2011
;
6
:
e20029
.
33.
Fujita
A
,
Shida
A
,
Fujioka
S
,
Kurihara
H
,
Okamoto
T
,
Yanaga
K
. 
Clinical significance of Rho GDP dissociation inhibitor 2 in colorectal carcinoma
.
Int J Clin Oncol
2012
;
17
:
137
42
.
34.
Li
X
,
Wang
J
,
Zhang
X
,
Zeng
Y
,
Liang
L
,
Ding
Y
. 
Overexpression of RhoGDI2 correlates with tumor progression and poor prognosis in colorectal carcinoma
.
Ann Surg Oncol
2012
;
19
:
145
53
.
35.
Cho
HJ
,
Baek
KE
,
Kim
IK
,
Park
SM
,
Choi
YL
,
Nam
IK
, et al
Proteomics-based strategy to delineate the molecular mechanisms of RhoGDI2-induced metastasis and drug resistance in gastric cancer
.
J Proteome Res
2012
;
11
:
2355
64
.
36.
Berdiel-Acer
M
,
Bohem
ME
,
Lopez-Doriga
A
,
Vidal
A
,
Salazar
R
,
Martinez-Iniesta
M
, et al
Hepatic carcinoma-associated fibroblasts promote an adaptative response in colorectal cancer cells that inhibit proliferation and apoptosis: nonresistant cells die by nonapoptotic cell death
.
Neoplasia
2011
;
13
:
931
46
.
37.
Huijbers
A
,
Tollenaar
RA
,
GW
VP
,
Zeestraten
EC
,
Dutton
S
,
McConkey
CC
, et al
The proportion of tumor-stroma as a strong prognosticator for stage II and III colon cancer patients: validation in the VICTOR trial
.
Ann Oncol
2013
;
24
:
179
85
38.
Chang
HY
,
Chi
JT
,
Dudoit
S
,
Bondre
C
,
van de Rijn
M
,
Botstein
D
, et al
Diversity, topographic differentiation, and positional memory in human fibroblasts
.
Proc Natl Acad Sci U S A
2002
;
99
:
12877
82
.
39.
Finak
G
,
Bertos
N
,
Pepin
F
,
Sadekova
S
,
Souleimanova
M
,
Zhao
H
, et al
Stromal gene expression predicts clinical outcome in breast cancer
.
Nat Med
2008
;
14
:
518
27
.
40.
Jia
Z
,
Rahmatpanah
FB
,
Chen
X
,
Lernhardt
W
,
Wang
Y
,
Xia
XQ
, et al
Expression changes in the stroma of prostate cancer predict subsequent relapse
.
PLoS ONE
2012
;
7
:
e41371
.
41.
Navab
R
,
Strumpf
D
,
Bandarchi
B
,
Zhu
CQ
,
Pintilie
M
,
Ramnarine
VR
, et al
Prognostic gene-expression signature of carcinoma-associated fibroblasts in non-small cell lung cancer
.
Proc Natl Acad Sci U S A
2011
;
108
:
7160
5
.
42.
Gao
Q
,
Wang
XY
,
Qiu
SJ
,
Zhou
J
,
Shi
YH
,
Zhang
BH
, et al
Tumor stroma reaction-related gene signature predicts clinical outcome in human hepatocellular carcinoma
.
Cancer Sci
2011
;
102
:
1522
31
.
43.
Illemann
M
,
Hansen
U
,
Nielsen
HJ
,
Andreasen
PA
,
Hoyer-Hansen
G
,
Lund
LR
, et al
Leading-edge myofibroblasts in human colon cancer express plasminogen activator inhibitor-1
.
Am J Clin Pathol
2004
;
122
:
256
65
.
44.
Sieuwerts
AM
,
Martens
JW
,
Dorssers
LC
,
Klijn
JG
,
Foekens
JA
. 
Differential effects of fibroblast growth factors on expression of genes of the plasminogen activator and insulin-like growth factor systems by human breast fibroblasts
.
Thromb Haemost
2002
;
87
:
674
83
.
45.
Pesson
M
,
Volant
A
,
Uguen
A
,
Trillet
K
,
De La Grange
P
,
Aubry
M
, et al
A gene expression and pre-mRNA splicing signature that marks the adenoma-adenocarcinoma progression in colorectal cancer
.
PLoS ONE
2014
;
9
:
e87761
.
46.
Okano
M
,
Yamamoto
H
,
Ohkuma
H
,
Kano
Y
,
Kim
H
,
Nishikawa
S
, et al
Significance of INHBA expression in human colorectal cancer
.
Oncol Rep
2013
;
30
:
2903
8
.
47.
Kudo
Y
,
Ogawa
I
,
Kitajima
S
,
Kitagawa
M
,
Kawai
H
,
Gaffney
PM
, et al
Periostin promotes invasion and anchorage-independent growth in the metastatic process of head and neck cancer
.
Cancer Res
2006
;
66
:
6928
35
.
48.
Fukushima
N
,
Kikuchi
Y
,
Nishiyama
T
,
Kudo
A
,
Fukayama
M
. 
Periostin deposition in the stroma of invasive and intraductal neoplasms of the pancreas
.
Mod Pathol
2008
;
21
:
1044
53
.
49.
Yan
W
,
Shao
R
. 
Transduction of a mesenchyme-specific gene periostin into 293T cells induces cell invasive activity through epithelial-mesenchymal transformation
.
J Biol Chem
2006
;
281
:
19700
8
.
50.
De Sousa
EMF
,
Wang
X
,
Jansen
M
,
Fessler
E
,
Trinh
A
,
de Rooij
LP
, et al
Poor-prognosis colon cancer is defined by a molecularly distinct subtype and develops from serrated precursor lesions
.
Nat Med
2013
;
19
:
614
8
.

Supplementary data