Purpose: Gene expression profile (GEP)–based classification of colonic diseases is a new method for diagnostic purposes. Our aim was to develop diagnostic mRNA expression patterns that may establish the basis of a new molecular biological diagnostic method.

Experimental Design: Total RNA was extracted, amplified, and biotinylated from frozen colonic biopsies of patients with colorectal cancer (n = 22), adenoma (n = 20), hyperplastic polyp (n = 11), inflammatory bowel disease (n = 21), and healthy normal controls (n = 11), as well as peripheral blood samples of 19 colorectal cancer and 11 healthy patients. Genome-wide gene expression profile was evaluated by HGU133plus2 microarrays. To identify the differentially expressed features, the significance analysis of microarrays and, for classification, the prediction analysis of microarrays were used. Expression patterns were validated by real-time PCR. Tissue microarray immunohistochemistries were done on tissue samples of 121 patients.

Results: Adenoma samples could be distinguished from hyperplastic polyps by the expression levels of nine genes including ATP-binding cassette family A, member 8, insulin-like growth factor 1 and glucagon (sensitivity, 100%; specificity, 90.91%). Between low-grade and high-grade dysplastic adenomas, 65 classifier probesets such as aquaporin 1, CXCL10, and APOD (90.91/100) were identified; between colorectal cancer and adenoma, 61 classifier probesets including axin 2, von Willebrand factor, tensin 1, and gremlin 1 (90.91/100) were identified. Early- and advanced-stage colorectal carcinomas could be distinguished using 34 discriminatory transcripts (100/66.67).

Conclusions: Whole genomic microarray analysis using routine biopsy samples is suitable for the identification of discriminative signatures for differential diagnostic purposes. Our results may be the basis for new GEP-based diagnostic methods. (Cancer Epidemiol Biomarkers Prev 2008;17(10):2835–45)

Colorectal cancer is one of the most frequent cancers in the world with very high mortality. According to WHO data, ∼945,000 new colorectal cancer cases are registered worldwide, and almost 492,000 colorectal cancer–related deaths occur every year (1). Hence, the early diagnosis, the discrimination between genetically and expressionally different tumors, and in view of these, the enhancement of therapies, become necessary. The 5-year survival data also emphasize the importance of an early diagnosis of colorectal cancer. The 5-year survival rate is 80% to 90% in early colorectal cancer, 60% in case of nodal involvement, and <10% in metastatic colorectal cancer.

According to the widely accepted adenoma-dysplasia-carcinoma sequence, most colorectal cancers develop from villous adenomas (2, 3). More recently, however, the concept of a “serrated neoplasia pathway” was published, referring to a pattern of colorectal cancer progression that involves hyperplastic polyps and serrated adenomas (4). The serrated pathway culminates in colorectal cancers with DNA microsatellite instability, mutation of BRAF, and extensive DNA methylation (5-7). Iino et al. (8) suggested that MSI-L hyperplastic polyps may be precursors of the subset (10%) of colorectal cancers showing the MSI-L phenotype.

Gene expression analysis of colon biopsies using high-density oligonucleotide microarrays may help to detect gene expression patterns that could establish the basis for new molecular biological diagnostic methods. Utilization of mRNA expression microarray data for diagnostic purposes has already begun. A growing number of studies focus on the gene expression background of colorectal cancer progression and metastasis development (9-18), characterization of colorectal cancer subtypes according to mRNA expression (12, 18, 19), the correlation of gene expression profile with clinicopathologic variables (12, 18, 20, 21), and mRNA expression–based prognosis (22). In addition to the surgical and biopsy tissue samples, mRNA expression analysis of peripheral blood samples may also play a crucial role in the establishment of early molecular-based diagnostics and prognostics of tumorous diseases (23-27). The handling and evaluation of the large amount of data collected by microarray analyses require an extensive bioinformatics background. Multivariate statistical analysis is needed for the development of automatic diagnostic disease classification methods.

We have previously reported the discriminative mRNA expression signatures between colorectal cancer versus normal, adenoma versus normal, inflammatory bowel disease (IBD) versus normal samples, and between the early and advanced stages of colorectal cancer (28). However, the gene expression profile–based classification of colonic diseases for diagnostic purposes has not yet been solved. The results of the HGU133 Plus 2.0 whole genomic microarrays—which were also used in our study—in colorectal diseases have been published by only five research groups (29-33), and only two of them used biopsy samples (29, 30). Using Affymetrix microarrays, high-throughput disease-specific marker screening can be done. Our aims in this study were to develop diagnostic mRNA expression patterns for the objective classification of inflammatory, benign, and malignant colorectal diseases, and to compare the gene expression background of adenomas and hyperplastic polyps as the possible points of origin of colorectal cancer. Furthermore, we analyzed the presence of certain local colorectal cancer markers in peripheral blood that had been identified while using biopsy samples. This is necessary for the development of blood-based, disease-specific diagnostic screening.

Patients and Samples

After informed consent was obtained from untreated patients, colon biopsy samples were taken during endoscopic intervention and stored in RNALater Reagent (Qiagen, Inc.) at −80°C. Additionally, 9 mL of peripheral blood was taken from untreated patients into Paxgene Blood RNA Tubes (Qiagen) before colonoscopy. The blood samples were also stored at −80°C. Altogether, 377 tissue samples (85 fresh frozen and 292 formalin-fixed paraffin-embedded tissue samples) and peripheral blood samples of 19 colorectal cancer and 11 healthy patients were analyzed in our study, as well as the blood smears of 10 healthy and 10 colorectal cancer patients. Total RNA was extracted, and Affymetrix microarray analysis was done on the biopsies of patients with tubulovillous/villous adenomas (n = 20; 11 with high-grade and 9 with low-grade dysplasia), colorectal adenocarcinoma (n = 22), hyperplastic polyps (n = 11), IBD (n = 21), and healthy normal controls (n = 11), as well as on peripheral blood samples of 19 patients with colorectal cancer and 11 healthy patients. Fifty-two microarrays (8 normal, 15 adenoma, 15 colorectal cancer, 14 IBD) had been hybridized earlier; their data files were used in a previously published study using different comparisons (28) and are available in the Gene Expression Omnibus database (series accession number: GSE4183). The data sets of the 63 newly hybridized microarrays are registered under series accession numbers GSE10714 (33 microarrays from biopsy samples: 3 normal, 11 hyperplastic polyps, 5 adenoma, 7 colorectal cancer, 7 IBD) and GSE10715 (30 microarrays from blood samples: 19 colorectal cancer and 11 normal). The diagnostic groups and the number of patients in each group are shown in Table 1. Detailed patient data are given in Supplementary Table S1.

Table 1.

Number of patients in the different disease groups

Group | Biopsy samples, n = 85, original set: Affymetrix microarray | Biopsy samples, original set: Taqman RT-PCR | Biopsy samples, original set: tissue microarray | Biopsy samples, n = 92, independent set: tissue microarray | Blood samples, n = 31: Affymetrix microarray
Adenoma with low-grade dysplasia — — — 
Adenoma with high-grade dysplasia 11 — — — 
CRC Dukes A-B 10 20 
CRC Dukes C-D 12 21 12 
Normal 11 21 11 
Hyperplastic polyp 11 — — — — 
Ulcerative colitis 12 10 — 
Crohn's disease — 16 — 
Undeterminate IBD — — — — 
Total patient numbers 85 34 29 93 30 

Abbreviation: CRC, colorectal cancer.

Methods

mRNA Expression Microarray Analysis. Total RNA was extracted using the RNeasy Mini Kit (Qiagen) for biopsy samples and the Paxgene Blood RNA Kit (Qiagen) for peripheral blood samples according to the manufacturers' instructions. The isolated peripheral blood RNA samples were concentrated using the GeneChip Blood RNA Concentration Kit (Affymetrix, Inc.). The quantity and quality of the isolated RNA were tested by measuring the absorbance and by agarose gel electrophoresis or capillary gel electrophoresis using the 2100 Bioanalyzer and RNA 6000 Pico Kit (Agilent, Inc.). Biotinylated cRNA probes were synthesized from 5 to 8 μg total RNA and fragmented using the One-Cycle Target Labeling and Control Kit according to the Affymetrix description. In case of peripheral blood RNA samples, 5 μg total RNA was used for cRNA probe synthesis, and during reverse transcription Globin Reduction PNA oligomers (Applied Biosystems) were applied to reduce the amount of globin transcripts. Ten micrograms of each fragmented cRNA sample were hybridized to an HGU133 Plus 2.0 array (Affymetrix) at 45°C for 16 h. The slides were washed and stained using Fluidics Station 450 and an antibody amplification staining method according to the manufacturer's instructions. The fluorescent signals were detected by a GeneChip Scanner 3000.

Statistical Evaluation of mRNA Expression Profiles

Preprocessing and Quality Control. Quality control analyses were done according to the suggestions of The Tumour Analysis Best Practices Working Group (34). Scanned images were inspected for artifacts, and the percentage of present calls (>25%) and control of the RNA degradation were evaluated. Based on the evaluation criteria, all biopsy measurements fulfilled the minimal quality requirements. The Affymetrix expression arrays were preprocessed by gcRMA with quantile normalization and median polish summarization. The data sets are available in the Gene Expression Omnibus databank for further analysis (series accession numbers GSE4183, GSE10714, and GSE10715).
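The quantile normalization step named above can be illustrated with a minimal sketch. This is not the authors' pipeline, which used gcRMA in R/Bioconductor and also includes background correction and median polish summarization; the function name `quantile_normalize` and the toy intensities are assumptions made only for illustration.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity vectors (one list per array).

    After normalization every array has the same set of values (the mean of
    the sorted intensities at each rank), while the rank order of probes
    within each array is preserved. Ties are broken by probe index, which is
    a simplification of what production implementations do.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Mean intensity at each rank across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(col[r] for col in sorted_arrays) / len(arrays)
        for r in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy intensities for three arrays and four probes.
toy = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
norm = quantile_normalize(toy)
```

After this step the three toy arrays share one distribution, so remaining differences between arrays reflect probe ranks rather than overall array brightness.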

Further Analyses. To identify differentially expressed features, significance analysis of microarrays was used. The nearest shrunken centroid method (prediction analysis of microarrays) was applied for sample classification from gene expression data. For gene selection, the random forest classification algorithm was used (35), whereas the .632+ bootstrap method was applied to estimate the prediction error rate (36). The confusion matrix of the true and the predicted classes was visualized on agreement plots (37). The preprocessing, data mining, and statistical steps were done using R-environment with Bioconductor libraries.
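The idea behind the nearest shrunken centroid method can be sketched in a few lines. The sketch below is a simplified illustration, not the pamr implementation used in the study: class priors are ignored, the offset `s0` is taken as a median-like value of the pooled standard deviations, and all function names and toy expression values are hypothetical.

```python
import math


def soft_threshold(d, delta):
    """Move d toward zero by delta; anything within delta of zero becomes 0."""
    return math.copysign(max(abs(d) - delta, 0.0), d)


def fit_shrunken_centroids(X, y, delta):
    """X: samples (lists of gene values); y: class labels; delta: shrinkage."""
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    means = {k: [sum(row[i] for row in rows) / len(rows) for i in range(p)]
             for k, rows in members.items()}

    # Pooled within-class standard deviation for each gene.
    pooled_sd = []
    for i in range(p):
        ss = sum((row[i] - means[lab][i]) ** 2 for row, lab in zip(X, y))
        pooled_sd.append(math.sqrt(ss / (n - len(classes))))
    s0 = sorted(pooled_sd)[p // 2]  # guards against very small variances
    scale = [s + s0 for s in pooled_sd]

    centroids = {}
    for k in classes:
        m_k = math.sqrt(1.0 / len(members[k]) - 1.0 / n)
        centroids[k] = [
            overall[i] + m_k * scale[i]
            * soft_threshold((means[k][i] - overall[i]) / (m_k * scale[i]), delta)
            for i in range(p)
        ]
    return centroids, overall, scale


def selected_genes(centroids, overall):
    """Genes whose shrunken centroid still differs from the overall mean."""
    return [i for i in range(len(overall))
            if any(c[i] != overall[i] for c in centroids.values())]


def predict(x, centroids, scale):
    """Assign x to the class with the nearest standardized shrunken centroid."""
    def distance(k):
        return sum(((xi - ci) / si) ** 2
                   for xi, ci, si in zip(x, centroids[k], scale))
    return min(centroids, key=distance)


# Hypothetical toy data: gene 0 separates the classes, genes 1 and 2 do not.
X = [[8.0, 5.1, 3.0], [8.4, 4.9, 3.2], [7.9, 5.0, 2.9],
     [2.1, 5.0, 3.1], [1.8, 5.2, 3.0], [2.2, 4.8, 3.1]]
y = ["adenoma"] * 3 + ["polyp"] * 3
centroids, overall, scale = fit_shrunken_centroids(X, y, delta=2.0)
kept = selected_genes(centroids, overall)
```

Raising `delta` removes more genes from the classifier, which is how a minimal set of discriminatory transcripts can be arrived at for a given comparison.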

Taqman Real-time PCR. TaqMan real-time PCR (RT-PCR) was used to measure the expression of 26 selected genes using an Applied Biosystems Micro Fluidic Card System. The selected genes were among the top 200 prediction analysis of microarrays genes in the colorectal cancer versus normal, adenoma versus normal, and IBD versus normal comparisons, and validated Taqman assays were available for them. The measurements were done using an ABI PRISM 7900HT Sequence Detection System as described in the product's user guide. The data analysis was described earlier (28). For data analysis, the SDS 2.2 software was used.

Tissue Microarray Analysis and Blood Smear Immunocytochemistry. Cores of 1-mm diameter were collected from selected areas of formalin-fixed, paraffin-embedded tissue blocks made from 89 early colorectal cancer (stage Dukes B), 57 advanced colorectal cancer (stage Dukes C and D), 84 IBD (32 Crohn's disease, 40 ulcerative colitis, and 12 undeterminate IBD), and 62 normal colon samples of 122 patients and placed into recipient blocks. Tissue sections of 5-μm thickness were cut from the blocks and immunostained using the following antibodies: rabbit anti-human osteopontin (1:2,000 dilution; Chemicon), anti-osteonectin antibody (1:1,000 dilution; Chemicon), rabbit antihuman biglycan (1:200 dilution; Atlas), mouse anti-human collagen type IVα1 (1:300 dilution; Abcam, clone: COL-94), mouse anti-human vascular endothelial growth factor (1:2,000 dilution; Zymed, clone: VG 1), mouse anti-human von Willebrand factor (1:20; Dako, clone: F8/86), and mouse anti-human platelet-endothelial cell adhesion molecule 1 (1:40; Dako, clone: JC70A). Signal conversion was achieved using the EnVision+ kit (Dako) followed by 3,3′-diaminobenzidine–hydrogen peroxidase chromogen-substrate kit (Dako). Immunostained tissue microarray (TMA) slides were digitalized using a high-resolution Mirax Desk instrument (Zeiss) and analyzed with the Mirax TMA Module software (Zeiss). Protein expression was evaluated using an empirical scale considering intensity and occupied subcellular compartments of epithelial/carcinoma cells or lamina propria cells. For statistical analysis, Pearson's χ2 test and Fisher's exact test were done.

Blood smears of 10 healthy and 10 colorectal cancer patients were also immunostained using an anti-osteonectin antibody (1:1,000 dilution; Chemicon) and Alexa Fluor 488 F(ab′)2 fragment of goat anti-mouse IgG. The total and osteonectin-positive cells in 50 fields of view with 30× magnification were counted in each sample. For statistical analysis, a t test was done to evaluate the difference of osteonectin-positive/total cell number ratios between colorectal cancer and normal blood smears.

Classifiers between the Main Diagnostic Groups

The minimal number of discriminatory transcripts with high specificity and sensitivity values was determined using prediction analysis of microarrays in each comparison. Adenoma samples were distinguished from hyperplastic polyps with 100% sensitivity and 90.91% specificity, according to the expression level of a minimum of nine genes including ATP-binding cassette family A, member 8, insulin-like growth factor 1, and glucagon. Sixty-one classifier probesets were identified between colorectal cancer and adenoma, including axin 2, von Willebrand factor, tensin 1, and gremlin 1 (sensitivity, 90.91%; specificity, 100%). IBD and normal biopsies could be distinguished with 100% sensitivity and specificity using only three classifiers (REG1A, MMP3, and CHI3L1). According to the expression of 20 transcripts (such as INDO, CXCL9, CCR2, CD38, RARRES3, and CXCL10), IBD and colorectal cancer samples could be separated with 100% sensitivity and 95.24% specificity. Further details can be seen in Table 2.
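The sensitivity and specificity values follow directly from the confusion counts of a two-class comparison. The sketch below uses counts that are one plausible reading of the adenoma versus hyperplastic polyp result (20 adenomas, 11 polyps); the per-sample confusion matrix is not given in the text, and treating adenoma as the positive class is an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) in percent from confusion counts."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return sensitivity, specificity


# Assumed counts: all 20 adenomas classified correctly,
# 10 of 11 hyperplastic polyps classified correctly.
sens, spec = sensitivity_specificity(tp=20, fn=0, tn=10, fp=1)
print(f"sensitivity {sens:.2f}%, specificity {spec:.2f}%")
```

With groups of about 10 to 20 samples, a single misclassified sample shifts either figure by several percentage points, which is worth keeping in mind when comparing the rows of Table 2.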

Table 2.

Discriminatory PAM transcripts between the diagnostic groups

Group vs group | Minimum no. of discriminatory transcripts | Sensitivity (%) | Specificity (%) | Including transcripts:
Adenoma vs hyperplastic polyp 9 100 90.91 ABCA8, KIAA1199, GCG, MAMDC2, C2orf32, 229670_at, IGF1, PCDH7, PRDX6 
IBD vs normal 3 100 100 REG1A, MMP3, CHI3L1 
Adenoma vs CRC 61 90.91 100 GREM1*, DDR2*, GUCY1A3*, TNS1, ADAMTS1, FBLN1, FLJ38028, RDX, FAM129A, ASPN, FRMD6, MCC, RBMS1*, SNAI2, MEIS1, DOCK10, PLEKHC1, FAM126A, TBC1D9, VWF, DCN, ROBO1, MSRB3, LATS2, MEF2C*, IGFBP3*, GNB4, RCN3, AKAP12, RFTN1, 226834_at, COL5A1, GNG2, NR3C1*, SPARCL1, MAB21L2, AXIN2, 236894_at, AEBP1, AP1S2, C10orf56, LPHN2, AKT3, FRMD6, COL15A1, CRYAB, COL14A1, LOC286167, QKI, WWTR1, GNG11, PAPPA, ELDT1 
IBD vs CRC 20 100 95.24 227458_at, INDO, CXCL9, CCR2, CD38, RARRES3, CXCL10, FAM26F*, TNIP3, NOS2A, CCRL1, TLR8, IL18BP, FCRL5, SAMD9L, ECGF1, TNFSF13B, GBP5, GBP1 
CRC-B vs CRC-CD 34 100 66.67 TMEM37*, IL33, CA4*, CCDC58, CLIC6, VSNL1*, ESPN, APCDD1, C13orf18, CYP4X1, ATP2A3, LOC646627, MUPCDH, ANPEP, C1orf115, HSD3B2, GBA3, GABRB2, GYLTL1B, LYZ, SPC25, CDKN2B, FAM89A, MOGAT2, SEMA6D, 229376_at, TSPAN5, IL6R, SLC26A2 
Adenoma with low-grade dysplasia vs adenoma with high-grade dysplasia 65 90.91 100 SI, DMBT1, CFI*, AQP1, APOD, TNFRSF17, CXCL10, CTSE, IGHA1, SLC9A3, SLC7A1, BATF2, SOCS1, DOCK2, NOS2A, HK2, CXCL2, IL15RA, POU2AF1, CLEC3B, ANI3BP, MGC13057, LCK*, C4BPA, HOXC6, GOLT1A, C2orf32, IL10RA, 240856_at, SOCS3, MEIS3P1, HIPK1, GLS, CPLX1, 236045_x_at, GALC, AMN, CCDC69, CCL28, CPA3, TRIB2, HMGA2, PLCL2, NR3C1, EIF5A, LARP4, RP5-1022P6.2, PHLDB2, FKBP1B, INDO, CLDN8, CNTN3, PBEF1, SLC16A9, CDC25B, TPSB2, PBEF1, ID4, GJB5, CHN2, LIMCH1, CXCL9, MFAP4 
UC vs CD 58 77.78 100 CCNG2, SLC44A4, DDAH1, TOB1, 231152_at, MKNK1, CEACAM7*, 1562836_at, CDC42SE2, PSD3, 231169_at, IGL@*, GSN, GPM6B, CDV3*, PDPK1, ANP32E, ADAM9, CDH1, NLRP2, 215777_at, OSBPL1, VNN1, RABGAP1L, PHACTR2, ASH1L, 213710_s_at, ZNF3, FUT2, IGHA1, EDEM1, GPR171, 229713_at, LOC643187, FLVCR1, SNAP23*, ETNK1, LOC728411, POSTN, MUC12, HOXA5, SIGLEC1, LARP5, PIGR, SPTBN1, UFM1, C6orf62, WDR90, ALDH1A3, F2RL1, IGHV1-69, DUOX2, RAB5A, CP 
Hyperplastic polyp vs normal 15 100 100 SLC6A14, ARHGEF10, ALS2, IL1RN, SPRY4, PTGER3, TRIM29, SERPINB5, 1560327_at, ZAK, BAG4, TRIB3, TTL, FOXQ1, UGT2A3 
Adenoma with low-grade dysplasia vs normal 100 100 KLK11, KIAA1199, FOXQ1 
Adenoma with high-grade dysplasia vs normal 100 100 CLDN8, ABCA8, PYY 
Adenoma vs normal 100 100 KIAA1199, FOXQ1, CA7 
CRC vs normal 100 100 VWF, IL8, CHI3L1, S100A8, GREM1 

Abbreviations: CD, Crohn's disease; UC, ulcerative colitis.

*Transcripts represented by more than one probeset.

Besides the pairwise comparisons, random forest classification was also done to distinguish between the above-mentioned diagnostic groups (Fig. 1). The estimated prediction error was 12.9%. The main diagnostic groups could be distinguished according to the mRNA expression levels of 18 genes, including cell cycle and cell proliferation regulatory genes (retinoic acid responder 3, LATS large tumor suppressor homologue 2, mutated in colorectal cancers, WARS), the COP1 apoptosis gene, the inflammatory response-related genes HLA-DMA, APOL3, GBP2, and SLAMF8, the SPARC-like 1 calcium-binding extracellular matrix gene, the SLC15A3 oligopeptide transporter, as well as the transcription and mRNA processing genes IFN regulatory factor 1 and quaking homologue. The exact functions of several classifier genes (FAM26F, SAMD9L, GBP4, GIMAP5) have not yet been identified.

Figure 1.

Random forest. A. Heat map of the diagnostic groups separated using the random forest classification method. The heat map visualizes the expression level of genes (rows) that were selected as classifiers using the random forest supervised machine learning method. The differences in gene expression between the diagnostic groups (columns) are apparent. B. Agreement plot for visualization of the confusion matrix of the true and the predicted classes. The agreement plot represents the strength of agreement in the confusion matrix of the observed (true) and predicted classes. The prediction of each sample was based on the classifier using the genes presented on the heat map. Black areas show the observed agreement positioned within larger rectangles representing the maximum possible agreement, given the marginal totals. Gray areas represent the degree of disagreement. AD, adenoma; CRC, colorectal cancer; IBD, inflammatory bowel disease.


Identification of Subclassifier Transcripts

IBD was successfully subdivided into ulcerative colitis and Crohn's disease by the expression of 58 genes such as cyclin G2, dual oxidase 2, and CEACAM7 (sensitivity, 77.78%; specificity, 100%). Adenomas with low-grade and high-grade dysplasia could be distinguished using 65 classifier probesets such as aquaporin 1, CXCL10, and complement factor I (sensitivity, 90.91%; specificity, 100%). Early- and advanced-stage colorectal carcinomas were differentiated by 34 discriminatory transcripts including transmembrane protein 37, interleukin 33, carbonic anhydrase 4, visinin-like 1, ubiquitous calcium-transporting ATPase, and CDK inhibitor 2B, with high sensitivity (100%) and somewhat lower specificity (66.67%; Table 2).

Expression of the Colorectal Cancer–Associated Tissue Markers in Peripheral Blood

The differentially expressed genes were determined by significance analysis of microarrays between colorectal cancer samples and healthy normal controls. The presence of these local tissue-specific mRNA expression markers in peripheral blood samples was also analyzed using the blood samples of 19 colorectal cancer and 11 healthy patients. Fifty-two transcripts were significantly up-regulated in both the biopsy specimens and the peripheral blood of colorectal cancer patients compared with healthy normal controls. Three genes (SLC26A2 sulfate transporter, 227682_at, and UDP-glucose dehydrogenase) showed significantly decreased mRNA levels in both colorectal cancer biopsy and blood samples compared with normals. For some colorectal cancer–related transcripts, mRNA expression in blood changed in the opposite direction compared with their levels in cancer tissue. Seventeen genes showing elevated mRNA expression in colorectal cancer biopsy samples were down-regulated in the peripheral blood of colorectal cancer patients, whereas 12 genes underexpressed in colorectal cancer tissue were found to be overexpressed in colorectal cancer blood samples (Table 3).
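The four-way grouping described above amounts to comparing the direction of change in tissue with the direction of change in blood. The following sketch shows that bookkeeping; the helper name `direction_category` is hypothetical, only the sign of each change is used (taken from the groupings in Table 3, with no magnitudes implied), and the group sizes are the counts quoted in the text.

```python
from collections import Counter


def direction_category(biopsy_sign, blood_sign):
    """Label a transcript by its direction of change in biopsy and in blood."""
    if biopsy_sign == 0 or blood_sign == 0:
        raise ValueError("only significantly changed transcripts are categorized")
    tissue = "up" if biopsy_sign > 0 else "down"
    blood = "up" if blood_sign > 0 else "down"
    return f"{tissue} in biopsy, {blood} in blood"


# One example transcript per group; +1 means up-regulated in CRC, -1 down-regulated.
examples = {"TIMP1": (+1, +1), "SLC26A2": (-1, -1), "IL8": (+1, -1), "TRPM6": (-1, +1)}
categories = {gene: direction_category(*signs) for gene, signs in examples.items()}

# Group sizes as stated in the text (counted per probeset in Table 3).
group_sizes = Counter({
    "up in biopsy, up in blood": 52,
    "down in biopsy, down in blood": 3,
    "up in biopsy, down in blood": 17,
    "down in biopsy, up in blood": 12,
})
total = sum(group_sizes.values())
concordant = group_sizes["up in biopsy, up in blood"] + group_sizes["down in biopsy, down in blood"]
```

Under these counts, 55 of the 84 listed probesets change in the same direction in tissue and blood, while 29 change in opposite directions.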

Table 3.

Correlation between colorectal cancer versus normal biopsy and peripheral blood results

Gene symbol | Probeset ID
Up-regulated in CRC compared with normal in both biopsy and blood samples  
    TPM4 212481_s_at 
    SESTD1 226763_at 
    TTYH3 224674_at 
    TIMP1 201666_at 
    CD44 212014_x_at 
    TM9SF4 212194_s_at 
    PIM3 224739_at 
    PELO 218472_s_at 
    C6orf145 212923_s_at 
    SFXN3 217226_s_at 
    MYL9 201058_s_at 
    CD44 210916_s_at 
    CD44 204490_s_at 
    VCAN 221731_x_at 
    CD44 209835_x_at 
    VCAN 204620_s_at 
    VCAN 211571_s_at 
    TGFBI 201506_at 
    PLXND1 38671_at 
    TKT 208700_s_at 
    VCAN 215646_s_at 
    PF4 206390_x_at 
    CD44 1557905_s_at 
    IFITM3 212203_x_at 
    S100A11 200660_at 
    NA 228910_at 
    G6PD 202275_at 
    AP1M1 223025_s_at 
    ZC3H12A 218810_at 
    FSCN1 210933_s_at 
    NDE1 218414_s_at 
    IER3 201631_s_at 
    PEA15 200787_s_at 
    PTP4A3 206574_s_at 
    IMPDH1 204169_at 
    PRKCDBP 213010_at 
    DDEF1 224796_at 
    ESAM 225369_at 
    CCDC85B 204610_s_at 
    MGC7036 227983_at 
    IFITM2 201315_x_at 
    IFITM1 201601_x_at 
    COL18A1 209082_s_at 
    RAB31 217762_s_at 
    FLNA 214752_x_at 
    TMEM158 213338_at 
    CTSK 202450_s_at 
    ENC1 201340_s_at 
    ICAM1 202638_s_at 
    INTS1 212212_s_at 
    PI3 203691_at 
    NA 227041_at 
Down-regulated in CRC compared with normal in both biopsy and blood samples  
    SLC26A2 224959_at 
    NA 227682_at 
    UGDH 203343_at 
Up-regulated in CRC compared with normal in biopsy, down-regulated in blood samples  
    RANBP2 201712_s_at 
    DNAJC10 225174_at 
    CRKRS 225694_at 
    SLC39A6 202088_at 
    SLC39A6 202089_s_at 
    DIS3 222607_s_at 
    ELK3 221773_at 
    DNAJC10 229588_at 
    RANBP5 211953_s_at 
    IL8 202859_x_at 
    RANBP2 226922_at 
    SACS 213262_at 
    DNAJC10 221782_at 
    POT1 204354_at 
    GALNACT-2 218871_x_at 
    HS2ST1 203284_s_at 
    XPOT 212160_at 
Down-regulated in CRC compared with normal in biopsy, up-regulated in blood samples  
    MTMR11 205076_s_at 
    ETHE1 204034_at 
    SULT1A3 209607_x_at 
    C9orf19 225604_s_at 
    AGXT2L2 226519_s_at 
    SULT1A2 207122_x_at 
    FCGRT 218831_s_at 
    TRPM6 240389_at 
    SULT1A2 211385_x_at 
    SULT1A1 203615_x_at 
    ACADVL 200710_at 
    C22orf16 224932_at 

Taqman RT-PCR Validation of 26 Selected Genes

The expression of all 11 adenoma-associated genes (6 up-regulated and 5 down-regulated in microarray analysis), 15 of the 18 colorectal cancer–related genes (15 overexpressed and 3 underexpressed), and all 14 ulcerative colitis–associated genes (13 up-regulated and 1 down-regulated) correlated significantly with the Affymetrix results (P < 0.05). Overall, the mRNA expression of 93% of the selected genes was verified by Taqman RT-PCR (Table 4).
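The 93% figure can be reproduced from the per-group counts given above. The short check below pools the gene-by-comparison results; the dictionary layout is only an illustrative choice, and the counts are those stated in the text.

```python
# (validated, tested) gene-by-comparison counts as stated in the text.
validation = {
    "adenoma vs normal": (11, 11),
    "colorectal cancer vs normal": (15, 18),
    "ulcerative colitis vs normal": (14, 14),
}

validated = sum(v for v, _ in validation.values())
tested = sum(t for _, t in validation.values())
pooled_rate = 100.0 * validated / tested

# For comparison: the unweighted mean of the three per-group rates.
mean_of_rates = sum(100.0 * v / t for v, t in validation.values()) / len(validation)

print(f"pooled: {validated}/{tested} = {pooled_rate:.1f}%")
print(f"mean of group rates: {mean_of_rates:.1f}%")
```

The pooled rate (40 of 43 comparisons) matches the reported 93%. Because several of the 26 selected genes were tested in more than one comparison, the denominator here is the number of gene-by-comparison tests rather than the number of distinct genes.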

Table 4.

Taqman RT-PCR confirmation of the Affymetrix microarray results

Taqman ID | Gene symbol | Gene name | Affymetrix ID | Sample groups | P < 0.05 | ddCt
Hs00153304_m1 CD44 CD44 antigen 212014_x_at AD vs normal 1.82E−07 1.90 
Hs00171022_m1 CXCL12 Chemokine (C-X-C motif) ligand 12 209687_at AD vs normal 0.00305 −2.04 
    CRC vs normal 0.00735 −1.95 
Hs00179845_m1 MET Met proto-oncogene 203510_at AD vs normal 1.41E−06 2.17 
    CRC vs normal 0.00002 1.53 
Hs00200350_m1 ABCA8 ATP-binding cassette, subfamily A (ABC1), member 8 204719_at AD vs normal 0.000610 −3.35 
    CRC vs normal 0.00143 −3.20 
Hs00205545_m1 ADAMDEC1 ADAM-like, decysin 1 206134_at AD vs normal 1.16E−05 −3.69 
    CRC vs normal 9.18E−05 −2.74 
Hs00214306_m1 TRPM6 Transient receptor potential cation channel, subfamily M, member 6 224412_s_at AD vs normal 5.79E−05 −4.73 
    UC vs normal 0.000385 −4.63 
Hs00153408_m1 MYC v-myc myelocytomatosis viral oncogene homologue (avian) 202431_s_at AD vs normal 5.99E−06 2.35 
Hs00171558_m1 TIMP1 Tissue inhibitor of metalloproteinase 1 201666_at AD vs normal 3.90E−07 2.58 
    CRC vs normal 0.00153 2.74 
    UC vs normal 0.000219 2.36 
Hs00236937_m1 CXCL1 Chemokine (C-X-C motif) ligand 1 204470_at CRC vs normal 0.0114 3.84 
    UC vs normal 1.11E−05 4.04 
Hs00236966_m1 CXCL2 Chemokine (C-X-C motif) ligand 2 209774_x_at CRC vs normal 0.00204 3.70 
    UC vs normal 0.000592 3.68 
Hs00266139_m1 CA1 Carbonic anhydrase I 205950_s_at AD vs normal 0.000930 −6.13 
Hs00194353_m1 LCN2 Lipocalin 2 212531_at AD vs normal 2.67E−07 6.13 
    CRC vs normal 0.000509 4.83 
    UC vs normal 2.15E−06 5.06 
Hs00154230_m1 CALU Calumenin 214845_s_at CRC vs normal 0.0145 1.60 
Hs00169795_m1 VWF von Willebrand factor 202112_at CRC vs normal 0.55142  
    UC vs normal 0.000112 2.44 
Hs00266237_m1 COL4A1 Collagen, type IV, α 1 211980_at CRC vs normal 0.0283 3.38 
Hs00156076_m1 BGN Biglycan 213905_x_at CRC vs normal 0.12042  
Hs00169777_m1 PECAM1 Platelet/endothelial cell adhesion molecule 208983_s_at CRC vs normal 0.76378  
Hs00174103_m1 IL8 Interleukin 8 202859_x_at CRC vs normal 0.0283 7.21 
    UC vs normal 6.80E−06 5.77 
Hs00204187_m1 DUOX2 Dual oxidase 2 219727_at UC vs normal 7.84E−05 6.35 
Hs00195812_m1 LIPG Lipase, endothelial 219181_at AD vs normal 0.000588 1.35 
    CRC vs normal 0.00711 1.08 
    UC vs normal 0.000588 1.35 
Hs00829485_sH IFITM2 IFN induced transmembrane protein 2 (1-8D) 201315_x_at CRC vs normal 0.00114 2.26 
Hs00171061_m1 CXCL3 Chemokine (C-X-C motif) ligand 3 207850_at CRC vs normal 0.00384 3.22 
    UC vs normal 7.48E−05 3.58 
Hs00277299_m1 IL1RN Interleukin 1 receptor antagonist 212657_s_at CRC vs normal 0.00714 4.66 
    UC vs normal 1.10E−05 5.30 
Hs00234579_m1 MMP9 Matrix metalloproteinase 9 203936_s_at UC vs normal 0.00724 1.85 
Hs00160066_m1 PI3 Protease inhibitor 3, skin-derived (SKALP) 203691_at UC vs normal 0.000257 4.26 
Hs00197374_m1 UBD Ubiquitin D 205890_s_at UC vs normal 0.000261 3.20 

Abbreviation: AD, adenoma.
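
The ddCt values in Table 4 can be read as expression differences on a log2 scale. The following is a minimal sketch of that conversion, assuming the sign convention visible in the table (a positive ddCt means higher expression in the disease group) and an idealized doubling of product per PCR cycle. The function name is illustrative and is not taken from the original analysis.

```python
def ddct_to_fold_change(ddct: float) -> float:
    """Convert a ddCt value to a linear fold change.

    Assumptions (not stated in the source): positive ddCt means
    up-regulation, as in Table 4, and amplification efficiency is 100%,
    so each cycle of difference corresponds to a factor of 2.
    """
    return 2.0 ** ddct


# ddCt values copied from Table 4 (AD vs normal)
examples = {"CD44": 1.90, "CA1": -6.13, "LCN2": 6.13}

for gene, ddct in examples.items():
    fold = ddct_to_fold_change(ddct)
    direction = "up" if ddct > 0 else "down"
    print(f"{gene}: ddCt={ddct:+.2f} -> {fold:.3g}-fold ({direction})")
```

Under these assumptions, a ddCt of 1.90 for CD44 corresponds to a little under a fourfold increase, whereas values of equal magnitude and opposite sign (CA1 and LCN2) give reciprocal fold changes.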

TMA Analysis and Blood Smear Immunocytochemistry Results

In accordance with the mRNA expression results, elevated protein levels of osteonectin, osteopontin, biglycan, collagen 4α1, von Willebrand factor, MMP9, and vascular endothelial growth factor were detected in colorectal cancer compared with healthy controls. Moderate osteopontin and osteonectin staining was found in the apical cytoplasm of epithelial cells in healthy colon tissue. Both osteonectin and osteopontin showed moderate to strong diffuse cytoplasmic staining in colorectal cancer samples. Osteonectin protein expression was also significantly increased in blood smears of colorectal cancer patients (osteonectin-positive mononuclear cells, 20.89% ± 2.16%) compared with normal controls (6.72% ± 2.65%; P = 6.35 × 10⁻⁹; Supplementary Fig. S1). In colorectal cancer cases, strong subepithelial BGN immunostaining was found in myofibroblast-like cells and leukocytes of the lamina propria. No epithelial BGN immunoreactivity was detected. Most of the normal samples were negative for BGN, but in some cases weak apical epithelial BGN immunostaining was found, and no subepithelial labeling was seen. Whereas all normal samples were negative for Col4A1, certain carcinomatous cells showed moderate to strong epithelial Col4A1 immunostaining in colorectal cancer samples. There was no lamina propria immunoreactivity. Regarding vWF, there was moderate epithelial immunostaining in carcinomatous cells in colorectal cancer samples, and some vWF immunoreactivity was also seen scattered in the lamina propria, whereas in normal cases no mucosal immunostaining was seen. Subepithelial MMP9 immunostaining was moderate to strong in lamina propria leukocytes in colorectal cancer cases but was absent from the carcinomatous epithelium. A diffuse weak intracytoplasmic epithelial immunoreactivity was seen in normal samples.
For vascular endothelial growth factor, moderate to strong diffuse epithelial immunoreactivity was found in carcinomatous cells of colorectal cancer samples. The subepithelium showed a moderate reaction. Weak to moderate subepithelial and luminal epithelial vascular endothelial growth factor immunoreactivity was found in almost all normal samples (Fig. 2).

Figure 2.

Immunostainings in TMA sections. A. Osteonectin immunostaining in CRC (A1) and healthy colonic mucosa (A2). B. Osteopontin immunostaining in CRC (B1) and healthy colonic mucosa (B2). C. Biglycan immunostaining in CRC (C1) and healthy colonic mucosa (C2). D. Collagen 4A1 immunostaining in CRC (D1) and healthy colonic mucosa (D2). E. von Willebrand factor immunostaining in CRC (E1) and healthy colonic mucosa (E2). F. MMP9 immunostaining in CRC (F1) and healthy colonic mucosa (F2). G. VEGF immunostaining in CRC (G1) and healthy colonic mucosa (G2). H. PECAM1 protein expression in active IBD (H1) and normal colonic tissue (H2). I. Collagen 4A1 protein expression in active IBD (I1) and healthy colonic mucosa (I2). The white arrows show the colonic epithelial cells. Elevated protein levels of osteonectin, osteopontin, biglycan, collagen 4α1, von Willebrand factor, MMP9, and vascular endothelial growth factor were detected in CRC compared with healthy controls. Compared with normal tissue, overexpression of PECAM1 and collagen 4α1 proteins was found in IBD.


In comparison with normal tissue, PECAM1 and collagen 4α1 proteins were overexpressed in IBD, in accordance with the up-regulated mRNA levels detected by microarrays. In IBD samples there was a strong subepithelial PECAM1 immunoreaction in lamina propria leukocytoid cells. There was no epithelial immunoreaction in any of the normal samples. In several IBD samples a weak Col4α1 immunoreaction was found compared with normal samples. No subepithelial immunostaining could be detected (Fig. 2).

In this study, 85 colonic biopsy samples and 30 peripheral blood samples were analyzed in total by whole genomic expression microarrays to identify local tissue classifiers between the diagnostic groups and to analyze the presence of the tissue expression markers in peripheral blood.

In daily routine, biopsy samples taken during endoscopy relatively frequently cannot be evaluated adequately by conventional histology. A diagnostic expression profile obtained from the whole biopsy specimen can overcome sampling errors in histology.

For the objective, molecular-based classification of the biopsy samples into main diagnostic groups, classifier transcript sets were determined. Functional analysis of significant genes can provide important information, because with the identification of the main signaling pathways, the key genes characterizing the given pathomechanism can be found and used for diagnostic analysis.

Because IBD, especially long-standing ulcerative colitis, is a precancerous condition, the analysis of IBD specimens is important for the early identification of genes related to the adenoma-dysplasia-carcinoma sequence.

All three IBD classifiers have been hypothesized to show increased expression in IBD. In case of a tissue injury associated with IBD, REG1A (regenerating islet-derived 1α) mRNA was observed to be highly expressed in colonic mucosa (38). The protein product of this gene has a positive regulatory effect on cell proliferation (39), and may contribute to reduce epithelial apoptosis in inflammation (38). Matrix metalloproteinase 3 (MMP3), involved in wound repair and tumor initiation, was also up-regulated in IBD (40). Microarray analysis done by Mizoguchi et al. indicated that the third classifier, chitinase 3-like 1 (CHI3L1) is overexpressed specifically in inflamed mucosa. CHI3L1 plays a pathogenic role in colitis, presumably by enhancing the adhesion and invasion of bacteria on/into colonic epithelial cells (41). Dysregulated host/microbial interactions seem to play a central role in the pathogenesis of IBD.

Analyzed by function, most of the colorectal cancer versus adenoma discriminatory genes are involved in intracellular signal transduction (GNG11, latrophilin, AKAP12, ELTD1, tensin 1, axin 2, GNB4), cell proliferation (IGFBP3, MCC, LATS2), cell adhesion (ROBO1, AEBP1, VWF, collagen 15A1, DDR2, PLEKHC1), and transcription regulation (such as NR3C1, WWTR1, MEIS1, MEF2C, SNAI2). However, the functions of several discriminatory transcripts are still unknown. For instance, gremlin 1 (GREM1), which is represented among the classifiers by two probesets, is an antagonist of BMP and may play a role in regulating organogenesis, body patterning, and tissue differentiation. It was overexpressed in various human tumors, including carcinomas of the lung, ovary, kidney, breast, colon, and pancreas, as well as sarcoma (42).

Polyps could be classified into adenomatous and hyperplastic polyps according to the expression levels of nine transcripts. The ABCA8 ABC transporter, which was previously found to be underexpressed in colorectal cancer (28, 43), showed decreased expression in adenoma compared with hyperplastic polyp samples. Lower glucagon mRNA levels in adenomas may reflect altered intestinal barrier function (44) and disordered regulation of cell proliferation. Interestingly, IGF1, overexpression of which is closely associated with the early stage of colorectal carcinogenesis (45), was found to be more intensely expressed in hyperplastic polyps than in adenomas. The lower peroxiredoxin 6 expression may indicate weaker protection against oxidative stress in adenomas. The exact functions of the MAMDC2, C2orf32, 229670_at, and KIAA1199 discriminatory transcripts have not yet been clarified.

The colorectal cancer versus IBD discriminatory genes are mainly immune and defense response–related genes (like CXCL9, CXCL10 chemokine ligands, CCR2, CCRL1 chemokine receptors, interleukin 18 binding protein, GBP1, GBP5, NOS2A, INDO, TNFSF13B, toll-like receptor 8, 227458_at) which showed decreased mRNA levels in colorectal cancer compared with IBD samples. CD38 expressed mainly in leukocytes is involved in cell adhesion and signal transduction, RARRES3 is a negative regulator of cell proliferation, whereas ECGF1 is a growth factor with angiogenic effects. RARRES3 has been reported to act as a tumor suppressor or growth regulator (46). Its decreased expression in colorectal cancer seems to support this assumption. Autocrine production of ECGF1 by endothelial cells may be a mechanism of inflammatory angiogenesis but not tumor angiogenesis and might be particularly important for the maintenance of damaged vasculature in IBD (47). The functions of some newly identified expression markers (FAM26F, FCRL5, SAMD9L, TNIP3) are unclarified.

The main diagnostic groups (colorectal cancer, IBD, adenomas, hyperplastic polyps) can be distinguished according to the mRNA expression levels of 18 genes determined by the random forest classification method with a 12.9% prediction error.
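
A prediction error of this kind is the fraction of samples assigned to the wrong diagnostic group. The sketch below shows only that final bookkeeping step, using made-up labels: it does not reproduce the random forest model or the way the 12.9% figure was estimated in this study, and all function names and example labels are hypothetical.

```python
from collections import Counter


def prediction_error(true_labels, predicted_labels):
    """Overall misclassification rate: wrong predictions / all samples."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def per_class_error(true_labels, predicted_labels):
    """Misclassification rate within each true diagnostic group."""
    totals = Counter(true_labels)
    wrong = Counter(t for t, p in zip(true_labels, predicted_labels) if t != p)
    return {group: wrong[group] / n for group, n in totals.items()}


# Hypothetical toy labels, NOT data from the study
true = ["CRC", "CRC", "IBD", "IBD", "AD", "AD", "HP", "HP"]
pred = ["CRC", "AD", "IBD", "IBD", "AD", "AD", "HP", "AD"]

print(f"overall error: {prediction_error(true, pred):.1%}")
print(per_class_error(true, pred))
```

Reporting the per-group rates next to the overall rate shows whether the errors are spread evenly or concentrated in one diagnostic group, which a single overall percentage cannot reveal.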

Besides the objective classification of the samples into main diagnostic groups, the differentiation among disease subtypes is also important for the improvement of the molecular-based diagnostics.

A relatively high number of classifiers is required for differentiation between high-grade and low-grade dysplastic villous adenomas. Several tumorigenesis-related discriminatory transcripts (such as HIPK1, CDC25B, CXCL2, and HMGA2) were found to be overexpressed in high-grade dysplastic adenoma, which may reflect the high risk of colorectal cancer development (13, 14, 43, 48, 49). Homeodomain interacting protein kinase 1 (HIPK1) may play a role in tumorigenesis, perhaps by regulating the expression of p53 and/or Mdm2 (48). A correlation has previously been shown between the presence of HMGI proteins and the expression of a highly malignant phenotype in epithelial and fibroblastic rat thyroid cells. Moreover, HMGA2 seems to be involved in colorectal carcinogenesis (49).

Most of the colorectal cancer subtype classifiers are involved in transport processes (calcium ion transport: transmembrane protein 37, ubiquitous calcium-transporting ATPase, CLIC6 chloride transporter, CYP4X1 electron transporter, GABRB2 chloride channel, SLC26A2 sulfate transporter), in metabolic processes (carbonic anhydrase 4, UDP glucuronosyltransferase 2 A3 polypeptide, glycosyltransferase-like 1B, monoacylglycerol O-acyltransferase 2), in cell adhesion and motility (espin, mucin-like protocadherin, tetraspanin 5), in signal transduction (visinin-like 1, C13orf18), and in cell cycle regulation (SPC25 kinetochore complex component, CDK inhibitor 2B). Visinin-like 1 (VSNL1) was overexpressed in neuroblastoma tumor specimens from patients with distant organ metastases compared with those without metastases (50). Decreased expression of the CDKN2B (alias p15) tumor suppressor gene is also typical in advanced colorectal cancer.

A future aim is to establish the diagnosis and to perform screening using a more easily available sample source such as peripheral blood, which could then guide the further diagnostic and therapeutic steps. However, WBC circulating in the peripheral blood pass through all tissues of the body, and their gene expression is affected by more conditions than the expression patterns of local tissue alterations. It is therefore important to find tissue markers that also appear in peripheral blood and are specific for a given organ alteration.

Several colorectal cancer–associated tissue markers changed in peripheral blood in parallel with the locally measured expression levels. Genes showing up-regulation in both biopsy and peripheral blood samples of colorectal cancer patients compared with normal controls are mainly involved in cell adhesion (like CD44, TGFβ1, ICAM1, versican, collagen 18A1, pelota homologue endothelial cell adhesion molecule), cell proliferation (such as IFITM1, IFITM2, TIMP1, fascin homologue 1), and intracellular signal transduction (including S100A11, filamin A, and DDEF1), whereas the functions of nine transcripts (like CCDC85B, TM9SF4, C6orf145, and TMEM158) have not yet been identified. The gene signals may come from peripheral blood mononuclear cells, as well as from circulating tumor cells. Previously, we reported a significantly positive correlation between the number of circulating tumor cells and clinical properties of colorectal cancer (51). The underexpressed genes in both biopsy and blood samples are involved in metabolism (UGDH) and sulfate transport (SLC26A2), whereas the function of 227682_at is unknown. For some colorectal cancer–related transcripts, mRNA expression in blood changed in the opposite direction compared with their levels in cancer tissue. This phenomenon may relate to secondary immunologic processes including tumor-infiltrating lymphocytes rather than circulating tumor cells.
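
The comparison described above amounts to checking, gene by gene, whether the change in blood has the same sign as the change in tissue. A minimal sketch of that check follows. The directions for CD44, TIMP1, and SLC26A2 follow the text, but every magnitude is invented for illustration, GENE_X is a hypothetical placeholder for an oppositely regulated transcript, and the function name is not from the original analysis.

```python
def split_by_concordance(tissue_change, blood_change):
    """Group genes by whether tissue and blood changes share a sign.

    Both arguments map gene symbol -> log2 fold change (disease vs normal).
    Genes missing from either mapping, or with a zero change, are skipped.
    """
    concordant, discordant = [], []
    for gene in sorted(tissue_change.keys() & blood_change.keys()):
        t, b = tissue_change[gene], blood_change[gene]
        if t == 0 or b == 0:
            continue  # no direction to compare
        if (t > 0) == (b > 0):
            concordant.append(gene)
        else:
            discordant.append(gene)
    return concordant, discordant


# Hypothetical log2 fold changes (CRC vs normal); magnitudes are invented
tissue = {"CD44": 1.2, "TIMP1": 2.0, "SLC26A2": -1.5, "GENE_X": 0.8}
blood = {"CD44": 0.6, "TIMP1": 0.9, "SLC26A2": -0.7, "GENE_X": -0.4}

same, opposite = split_by_concordance(tissue, blood)
print("same direction:", same)
print("opposite direction:", opposite)
```

Only the sign is compared here, because the magnitudes in blood and tissue are not expected to be on the same scale.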

The expression of selected IBD- and colorectal cancer–associated genes was also measured at the protein level, on 292 tissue sections of 29 overlapping and 93 independent sets of patients. TMA technology allowed the standardized analysis of a large number of samples within a short time and the validation of some of the mRNA expression results. In accordance with the mRNA expression results, elevated protein levels of osteonectin, osteopontin, biglycan, collagen 4A1, von Willebrand factor, MMP9, and vascular endothelial growth factor were detected in colorectal cancer compared with healthy controls. Osteonectin protein expression in blood smears of colorectal cancer patients was also significantly elevated compared with normal controls. Overexpression of PECAM1 and collagen 4α1 proteins was detected in IBD compared with normal tissue, in accordance with the up-regulated mRNA levels detected by microarray.

In conclusion, whole genomic microarray analysis using routine biopsy samples may be suitable for the identification of discriminative signatures for differential diagnostic purposes. Our results may serve as a basis for new gene expression pattern–based diagnostic methods such as Taqman and/or LightCycler 480 real-time PCR cards. As the mRNA expression results showed a strong correlation with protein-level expression, simultaneous analysis of protein marker sets is also feasible. Antibodies recognizing a wide range of proteins in formalin-fixed, paraffin-embedded tissue sections are now available, offering immunostaining of disease-specific markers as a simple test for daily diagnostic use.

No potential conflicts of interest were disclosed.

Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

We thank László Nagy, M.D., Ph.D., and Beáta Scholtz, Ph.D., for their help with the Taqman real-time PCR analysis, and Gabriella Kónya for her work in preparing the TMA immunostainings.

1. O'Connell JB, Maggard MA, Ko CY. Colon cancer survival rates with the new American Joint Committee on Cancer sixth edition staging. J Natl Cancer Inst 2004;96:1420–5.
2. Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell 1990;61:759–67.
3. Leslie A, Carey FA, Pratt NR, Steele RJ. The colorectal adenoma-carcinoma sequence. Br J Surg 2002;89:845–60.
4. Hawkins NJ, Bariol C, Ward RL. The serrated neoplasia pathway. Pathology 2002;34:548–55.
5. Jass JR, Baker K, Zlobec I, et al. Advanced colorectal polyps with the molecular and morphological features of serrated polyps and adenomas: concept of a ‘fusion’ pathway to colorectal cancer. Histopathology 2006;49:121–31.
6. Kambara T, Simms LA, Whitehall VL, et al. BRAF mutation is associated with DNA methylation in serrated polyps and cancers of the colorectum. Gut 2004;53:1137–44.
7. Jass JR. Serrated adenoma of the colorectum and the DNA-methylator phenotype. Nat Clin Pract Oncol 2005;2:398–405.
8. Iino H, Jass JR, Simms LA, et al. DNA microsatellite instability in hyperplastic polyps, serrated adenomas, and mixed polyps: a mild mutator pathway for colorectal cancer? J Clin Pathol 1999;52:5–9.
9. Kwon HC, Kim SH, Roh MS, et al. Gene expression profiling in lymph node-positive and lymph node-negative colorectal cancer. Dis Colon Rectum 2004;47:141–52.
10. Agrawal D, Chen T, Irby R, et al. Osteopontin identified as colon cancer tumor progression marker. C R Biol 2003;326:1041–3.
11. Bandres E, Catalan V, Sola I, et al. Dysregulation of apoptosis is a major mechanism in the lymph node involvement in colorectal carcinoma. J Oncol Rep 2004;12:287–92.
12. Bertucci F, Salas S, Eysteries S, et al. Gene expression profiling of colon cancer by DNA microarrays and correlation with histoclinical parameters. Oncogene 2004;23:1377–91.
13. Birkenkamp-Demtroder K, Christensen LL, Olesen SH, et al. Gene expression in colorectal cancer. Cancer Res 2002;62:4352–63.
14. Williams NS, Gaynor RB, Scoggin S, et al. Identification and validation of genes involved in the pathogenesis of colorectal cancer using cDNA microarrays and RNA interference. Clin Cancer Res 2003;9:931–46.
15. Li M, Lin YM, Hasegawa S, et al. Genes associated with liver metastasis of colon cancer, identified by genome-wide cDNA microarray. Int J Oncol 2004;24:305–12.
16. Lin YM, Furukawa Y, Tsunoda T, Yue CT, Yang KC, Nakamura Y. Molecular diagnosis of colorectal tumors by expression profiles of 50 genes expressed differentially in adenomas and carcinomas. Oncogene 2002;21:4120–8.
17. Yanagawa R, Furukawa Y, Tsunoda T, et al. Genome-wide screening of genes showing altered expression in liver metastases of human colorectal cancers by cDNA microarray. Neoplasia 2001;3:395–401.
18. Frederiksen CM, Knudsen S, Laurberg S, Orntoft TF. Classification of Dukes' B and C colorectal cancers using expression arrays. J Cancer Res Clin Oncol 2003;129:263–71.
19. Zou TT, Selaru FM, Xu Y, Shustova V, Yin J, Mori Y. Application of cDNA microarrays to generate a molecular taxonomy capable of distinguishing between colon cancer and normal colon. Oncogene 2002;21:4855–62.
20. Birkenkamp-Demtroder K, Olesen SH, Sorensen FB, et al. Differential gene expression in colon cancer of the caecum versus the sigmoid and rectosigmoid. Gut 2005;54:374–84.
21. Chiu ST, Hsieh FJ, Chen SW, Chen CL, Shu HF, Li H. Clinicopathologic correlation of up-regulated genes identified using cDNA microarray and real-time reverse transcription-PCR in human colorectal cancer. Cancer Epidemiol Biomarkers Prev 2005;14:437–43.
22. Wang Y, Jatkoe T, Zhang Y, et al. Gene expression profiles and molecular markers to predict recurrence of Dukes' B colon cancer. J Clin Oncol 2004;22:1564–71.
23. Burczynski ME, Twine NC, Dukart G, et al. Transcriptional profiles in peripheral blood mononuclear cells prognostic of clinical outcomes in patients with advanced renal cell carcinoma. Clin Cancer Res 2005;11:1181–9.
24. Twine NC, Stover JA, Marshall B, et al. Disease-associated expression profiles in peripheral blood mononuclear cells from patients with advanced renal cell carcinoma. Cancer Res 2003;63:6069–75.
25. Martin KJ, Graner E, Li Y, et al. High-sensitivity array analysis of gene expression for the early detection of disseminated breast tumor cells in peripheral blood. Proc Natl Acad Sci U S A 2001;98:2646–51.
26. Smirnov DA, Zweitzig DR, Foulk BW, et al. Global gene expression profiling of circulating tumor cells. Cancer Res 2005;65:4993–7.
27. Toth K, Galamb O, Solymosi N, et al. Peripheral blood gene expression markers of colorectal diseases. Magy Belorv Arch 2007;60:531–9.
28. Galamb O, Gyorffy B, Sipos F, et al. Inflammation, adenoma and cancer: objective classification of colon biopsy specimens with gene expression signature. Dis Markers 2008;25:1–16.
29. Van der Flier LG, Sabates-Bellver J, Oving I, et al. The intestinal Wnt/TCF signature. Gastroenterology 2007;132:628–32.
30. Csillag C, Nielsen OH, Vainer B, et al. Expression of the genes dual oxidase 2, lipocalin 2 and regenerating islet-derived 1 α in Crohn's disease. Scand J Gastroenterol 2007;42:454–63.
31. Gardina PJ, Clark TA, Shimada B, et al. Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array. BMC Genomics 2006;7:325.
32. Borthakur A, Bhattacharyya S, Dudeja PK, Tobacman JK. Carrageenan induces interleukin-8 production through distinct Bcl10 pathway in normal human colonic epithelial cells. Am J Physiol Gastrointest Liver Physiol 2007;292:G829–38.
33. Mansilla F, Birkenkamp-Demtroder K, Kruhoffer M, et al. Differential expression of DHHC9 in microsatellite stable and instable human colorectal cancer subgroups. Br J Cancer 2007;96:1896–903.
34. Tumor Analysis Best Practices Working Group. Expression profiling-best practices for data generation and interpretation in clinical trials. Nat Rev Genet 2004;5:229–37.
35. Díaz-Uriarte R, Alvarez de Andrés S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics 2006;7:3.
36. Efron B, Tibshirani RJ. Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 1997;92:548–60.
37. Friendly M. Visualizing Categorical Data. Cary (NC): SAS Institute; 2000.
38. Dieckgraefe BK, Crimmins DL, Landt V, et al. Expression of the regenerating gene family in inflammatory bowel disease mucosa: Reg Iα upregulation, processing, and antiapoptotic activity. J Investig Med 2002;50:421–34.
39. Watanabe T, Yonekura H, Terazono K, Yamamoto H, Okamoto H. Complete nucleotide sequence of human reg gene and its expression in normal and tumoral tissues. The reg protein, pancreatic stone protein, and pancreatic thread protein are one and the same product of the gene. J Biol Chem 1990;265:7432–9.
40. Lawrance IC, Fiocchi C, Chakravarti S. Ulcerative colitis and Crohn's disease: distinctive gene expression profiles and novel susceptibility candidate genes. Hum Mol Genet 2001;10:445–56.
41. Mizoguchi E. Chitinase 3-like-1 exacerbates intestinal inflammation by enhancing bacterial adhesion and invasion in colonic epithelial cells. Gastroenterology 2006;130:398–411.
42. Namkoong H, Shin SM, Kim HK, et al. The bone morphogenetic protein antagonist gremlin 1 is overexpressed in human cancers and interacts with YWHAH protein. BMC Cancer 2006;6:74.
43. Croner RS, Foertsch T, Brueckl WM, et al. Common denominator genes that distinguish colorectal carcinoma from normal mucosa. Int J Colorectal Dis 2005;20:353–62.
44. Benjamin MA, McKay DM, Yang PC, Cameron H, Perdue MH. Glucagon-like peptide-2 enhances intestinal epithelial barrier function of both transcellular and paracellular pathways in the mouse. Gut 2000;47:112–9.
45. Nosho K, Yamamoto H, Taniguchi H, et al. Interplay of insulin-like growth factor-II, insulin-like growth factor-I, insulin-like growth factor-I receptor, COX-2, and matrix metalloproteinase-7, play key roles in the early stage of colorectal carcinogenesis. Clin Cancer Res 2004;10:7950–7.
46. Jiang SY, Chou JM, Leu FJ, et al. Decreased expression of type II tumor suppressor gene RARRES3 in tissues of hepatocellular carcinoma and cholangiocarcinoma. World J Gastroenterol 2005;11:948–53.
47. Saito S, Tsuno NH, Sunami E, et al. Expression of platelet-derived endothelial cell growth factor in inflammatory bowel disease. J Gastroenterol 2003;38:229–37.
48. Kondo S, Lu Y, Debbas M, et al. Characterization of cells and gene-targeted mice deficient for the p53-binding kinase homeodomain-interacting protein kinase 1 (HIPK1). Proc Natl Acad Sci U S A 2003;100:5431–6.
49. Fedele M, Bandiera A, Chiappetta G, et al. Human colorectal carcinomas express high levels of high mobility group HMGI(Y) proteins. Cancer Res 1996;56:1896–901.
50. Xie Y, Chan H, Fan J, et al. Involvement of visinin-like protein-1 (VSNL-1) in regulating proliferative and invasive properties of neuroblastoma. Carcinogenesis 2007;28:2122–30.
51. Molnar B, Floro L, Sipos F, Toth B, Sreter L, Tulassay Z. Elevation in peripheral blood circulating tumor cell number correlates with macroscopic progression in UICC stage IV colorectal cancer patients. Dis Markers 2008;24:141–50.

Supplementary data