Abstract
Purpose: Gene expression profile (GEP)–based classification of colonic diseases is a new method for diagnostic purposes. Our aim was to develop diagnostic mRNA expression patterns that may establish the basis of a new molecular biological diagnostic method.
Experimental Design: Total RNA was extracted, amplified, and biotinylated from frozen colonic biopsies of patients with colorectal cancer (n = 22), adenoma (n = 20), hyperplastic polyp (n = 11), inflammatory bowel disease (n = 21), and healthy normal controls (n = 11), as well as peripheral blood samples of 19 colorectal cancer and 11 healthy patients. Genome-wide gene expression profile was evaluated by HGU133plus2 microarrays. To identify the differentially expressed features, the significance analysis of microarrays and, for classification, the prediction analysis of microarrays were used. Expression patterns were validated by real-time PCR. Tissue microarray immunohistochemistries were done on tissue samples of 121 patients.
Results: Adenoma samples could be distinguished from hyperplastic polyps by the expression levels of nine genes including ATP-binding cassette family A, member 8, insulin-like growth factor 1 and glucagon (sensitivity, 100%; specificity, 90.91%). Between low-grade and high-grade dysplastic adenomas, 65 classifier probesets such as aquaporin 1, CXCL10, and APOD (90.91/100) were identified; between colorectal cancer and adenoma, 61 classifier probesets including axin 2, von Willebrand factor, tensin 1, and gremlin 1 (90.91/100) were identified. Early- and advanced-stage colorectal carcinomas could be distinguished using 34 discriminatory transcripts (100/66.67).
Conclusions: Whole genomic microarray analysis using routine biopsy samples is suitable for the identification of discriminative signatures for differential diagnostic purposes. Our results may be the basis for new GEP-based diagnostic methods. (Cancer Epidemiol Biomarkers Prev 2008;17(10):2835–45)
Introduction
Colorectal cancer is one of the most frequent cancers in the world and has very high mortality. According to WHO data, ∼945,000 new colorectal cancer cases are registered worldwide, and almost 492,000 colorectal cancer–related deaths occur every year (1). Early diagnosis, discrimination between tumors that differ genetically and in their expression profiles, and, building on these, improved therapies are therefore needed. The 5-year survival data also emphasize the importance of early diagnosis: the 5-year survival rate is 80% to 90% in early colorectal cancer, 60% in case of nodal involvement, and <10% in metastatic colorectal cancer.
According to the widely accepted adenoma-dysplasia-carcinoma sequence, most colorectal cancers develop from villous adenomas (2, 3). More recently, however, the concept of a “serrated neoplasia pathway” has been published, referring to a pattern of colorectal cancer progression that involves hyperplastic polyps and serrated adenomas (4). The serrated pathway culminates in colorectal cancers with DNA microsatellite instability, mutation of BRAF, and extensive DNA methylation (5-7). Iino et al. (8) suggested that MSI-L hyperplastic polyps may be precursors of the subset (10%) of colorectal cancers showing the MSI-L phenotype.
Gene expression analysis of colon biopsies using high-density oligonucleotide microarrays may help to detect gene expression patterns that could form the basis of new molecular biological diagnostic methods. mRNA expression microarray data are already being used for diagnostic purposes. A growing number of studies focus on the gene expression background of colorectal cancer progression and metastasis development (9-18), the characterization of colorectal cancer subtypes according to mRNA expression (12, 18, 19), the correlation of gene expression profiles with clinicopathologic variables (12, 18, 20, 21), and mRNA expression–based prognosis (22). In addition to surgical and biopsy tissue samples, mRNA expression analysis of peripheral blood samples may also play a crucial role in establishing early molecular diagnostics and prognostics of tumorous diseases (23-27). Handling and evaluating the large amount of data collected by microarray analyses requires extensive bioinformatics support, and multivariate statistical analysis is needed to develop automatic diagnostic disease classification methods.
We have previously reported the discriminative mRNA expression signatures between colorectal cancer versus normal, adenoma versus normal, inflammatory bowel disease (IBD) versus normal samples, and between the early and advanced stages of colorectal cancer (28). However, the gene expression profile–based classification of colonic diseases for diagnostic purposes has not yet been solved. The results of the HGU133 Plus 2.0 whole genomic microarrays—which were also used in our study—in colorectal diseases have been published by only five research groups (29-33), and only two of them used biopsy samples (29, 30). Using Affymetrix microarrays, high-throughput disease-specific marker screening can be done. Our aims in this study were to develop diagnostic mRNA expression patterns for the objective classification of inflammatory, benign, and malignant colorectal diseases, and to compare the gene expression background of adenomas and hyperplastic polyps as the possible points of origin of colorectal cancer. Furthermore, we analyzed the presence of certain local colorectal cancer markers in peripheral blood that had been identified while using biopsy samples. This is necessary for the development of blood-based, disease-specific diagnostic screening.
Materials and Methods
Patients and Samples
After informed consent was obtained from untreated patients, colon biopsy samples were taken during endoscopic intervention and stored in RNALater Reagent (Qiagen, Inc.) at −80°C. Additionally, 9 mL of peripheral blood was taken from untreated patients into Paxgene Blood RNA Tubes (Qiagen) before colonoscopy. The blood samples were also stored at −80°C. Altogether, 377 tissue samples (85 fresh frozen and 292 formalin-fixed paraffin-embedded tissue samples) and peripheral blood samples of 19 colorectal cancer and 11 healthy patients were analyzed in our study, as well as the blood smears of 10 healthy and 10 colorectal cancer patients. Total RNA was extracted, and Affymetrix microarray analysis was done on the biopsies of patients with tubulovillous/villous adenomas (n = 20, 11 with high-grade and 9 with low-grade dysplasia), colorectal adenocarcinoma (n = 22), hyperplastic polyps (n = 11), and healthy normal controls (n = 11), as well as on peripheral blood samples of 19 patients with colorectal cancer and 11 healthy patients. Fifty-two microarrays (8 normal, 15 adenoma, 15 colorectal cancer, 14 IBD) had been hybridized earlier; their data files were used in a previously published study using different comparisons (28) and are available in the Gene Expression Omnibus database (series accession number: GSE4183). The data sets of the 63 newly hybridized microarrays are registered under the series accession numbers GSE10714 (33 microarrays from biopsy samples: 3 normal, 11 hyperplastic polyps, 5 adenoma, 7 colorectal cancer, 7 IBD) and GSE10715 (30 microarrays from blood samples: 19 colorectal cancer and 11 normal). The diagnostic groups and the number of patients in each group are shown in Table 1. Detailed patient specification is given in Supplementary Table S1.
Table 1. Number of patients in the different disease groups
Group | Biopsy, original set (n = 85): Affymetrix microarray | Biopsy, original set (n = 85): Taqman RT-PCR | Biopsy, original set (n = 85): tissue microarray | Biopsy, independent set (n = 92): tissue microarray | Blood samples (n = 31): Affymetrix microarray |
---|---|---|---|---|---|
Adenoma with low-grade dysplasia | 9 | 6 | — | — | — |
Adenoma with high-grade dysplasia | 11 | 6 | — | — | — |
CRC Dukes A-B | 10 | 6 | 2 | 20 | 7 |
CRC Dukes C-D | 12 | 4 | 2 | 21 | 12 |
Normal | 11 | 5 | 9 | 21 | 11 |
Hyperplastic polyp | 11 | — | — | — | — |
Ulcerative colitis | 12 | 7 | 10 | 8 | — |
Crohn's disease | 9 | — | 6 | 16 | — |
Indeterminate IBD | — | — | — | 7 | — |
Total patient numbers | 85 | 34 | 29 | 93 | 30 |
Abbreviation: CRC, colorectal cancer.
Methods
mRNA Expression Microarray Analysis. Total RNA was extracted using the RNeasy Mini Kit (Qiagen) for biopsy samples and the Paxgene Blood RNA Kit (Qiagen) for peripheral blood samples according to the manufacturers' instructions. The isolated peripheral blood RNA samples were concentrated using the GeneChip Blood RNA Concentration Kit (Affymetrix, Inc.). The quantity and quality of the isolated RNA were tested by measuring the absorbance and by agarose gel electrophoresis or capillary gel electrophoresis using the 2100 Bioanalyzer and RNA 6000 Pico Kit (Agilent, Inc.). Biotinylated cRNA probes were synthesized from 5 to 8 μg total RNA and fragmented using the One-Cycle Target Labeling and Control Kit according to the Affymetrix description. In the case of peripheral blood RNA samples, 5 μg total RNA was used for cRNA probe synthesis, and Globin Reduction PNA oligomers (Applied Biosystems) were applied during reverse transcription to reduce the amount of globin transcripts. Ten micrograms of each fragmented cRNA sample were hybridized to HGU133 Plus2.0 arrays (Affymetrix) at 45°C for 16 h. The slides were washed and stained using Fluidics Station 450 and an antibody amplification staining method according to the manufacturer's instructions. The fluorescent signals were detected by a GeneChip Scanner 3000.
Statistical Evaluation of mRNA Expression Profiles
Preprocessing and Quality Control. Quality control analyses were done according to the suggestions of The Tumour Analysis Best Practices Working Group (34). Scanned images were inspected for artifacts, and the percentage of present calls (>25%) and the extent of RNA degradation were evaluated. Based on the evaluation criteria, all biopsy measurements fulfilled the minimal quality requirements. The Affymetrix expression arrays were preprocessed by gcRMA with quantile normalization and median polish summarization. The data sets are available in the Gene Expression Omnibus databank for further analysis under series accession numbers GSE4183, GSE10714, and GSE10715.
Further Analyses. To identify differentially expressed features, significance analysis of microarrays was used. The nearest shrunken centroid method (prediction analysis of microarrays) was applied for sample classification from gene expression data. For gene selection, the random forest classification algorithm was used (35), whereas the .632+ bootstrap method was applied to estimate the prediction error rate (36). The confusion matrix of the true and the predicted classes was visualized on agreement plots (37). The preprocessing, data mining, and statistical steps were done in the R environment with Bioconductor libraries.
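The core step of the nearest shrunken centroid method can be illustrated with a minimal Python sketch. It is not the R code used in this study: the function name `shrunken_centroids`, the toy data, the choice of the median standard deviation as the offset `s0`, and the scaling `m_k = sqrt(1/n_k - 1/n)` are assumptions made for illustration. Each class centroid is expressed as a standardized difference from the overall centroid, the difference is soft-thresholded by `delta`, and only features with a nonzero shrunken difference in at least one class remain as classifiers.

```python
import math
from statistics import mean, median

def shrunken_centroids(X, y, delta):
    """X: list of samples (lists of expression values); y: class labels.
    Returns the shrunken standardized differences per class and the
    indices of the features that survive the threshold delta."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    overall = [mean(row[i] for row in X) for i in range(p)]
    rows_by_class = {k: [r for r, lab in zip(X, y) if lab == k] for k in classes}
    centroid = {k: [mean(r[i] for r in rows) for i in range(p)]
                for k, rows in rows_by_class.items()}
    # Pooled within-class standard deviation of each feature.
    s = []
    for i in range(p):
        ss = sum((r[i] - centroid[lab][i]) ** 2 for r, lab in zip(X, y))
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)  # assumed offset that stabilizes low-variance features
    d_shrunk = {}
    for k in classes:
        m_k = math.sqrt(1 / len(rows_by_class[k]) - 1 / n)
        d_shrunk[k] = []
        for i in range(p):
            d = (centroid[k][i] - overall[i]) / (m_k * (s[i] + s0))
            # Soft thresholding: move d toward zero by delta, clipping at zero.
            d_shrunk[k].append(math.copysign(max(abs(d) - delta, 0.0), d))
    active = [i for i in range(p) if any(d_shrunk[k][i] != 0 for k in classes)]
    return d_shrunk, active

# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [[5.0, 1.0, 2.0], [5.2, 1.2, 2.1], [4.8, 0.8, 1.9],
     [1.0, 1.1, 2.0], [1.2, 0.9, 2.2], [0.8, 1.0, 1.8]]
y = ["A", "A", "A", "B", "B", "B"]
d_shrunk, active = shrunken_centroids(X, y, delta=2.0)
print(active)
```

Raising `delta` removes more features, which is how a minimal set of discriminatory transcripts can be chosen for each comparison.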
Taqman Real-time PCR. TaqMan real-time PCR (RT-PCR) was used to measure the expression of 26 selected genes using an Applied Biosystems Micro Fluidic Card System. The selected genes were among the top 200 prediction analysis of microarrays genes in the colorectal cancer versus normal, adenoma versus normal, and IBD versus normal comparisons and had validated Taqman assays available. The measurements were done using an ABI PRISM 7900HT Sequence Detection System as described in the product's user guide. Data were analyzed with the SDS 2.2 software as described earlier (28).
Tissue Microarray Analysis and Blood Smear Immunocytochemistry. Cores of 1-mm diameter were collected from selected areas of formalin-fixed, paraffin-embedded tissue blocks made from 89 early colorectal cancer (stage Dukes B), 57 advanced colorectal cancer (stage Dukes C and D), 84 IBD (32 Crohn's disease, 40 ulcerative colitis, and 12 indeterminate IBD), and 62 normal colon samples of 122 patients and placed into recipient blocks. Tissue sections of 5-μm thickness were cut from the blocks and immunostained using the following antibodies: rabbit anti-human osteopontin (1:2,000 dilution; Chemicon), anti-osteonectin antibody (1:1,000 dilution; Chemicon), rabbit anti-human biglycan (1:200 dilution; Atlas), mouse anti-human collagen type IVα1 (1:300 dilution; Abcam, clone: COL-94), mouse anti-human vascular endothelial growth factor (1:2,000 dilution; Zymed, clone: VG 1), mouse anti-human von Willebrand factor (1:20; Dako, clone: F8/86), and mouse anti-human platelet-endothelial cell adhesion molecule 1 (1:40; Dako, clone: JC70A). Signal conversion was achieved using the EnVision+ kit (Dako) followed by 3,3′-diaminobenzidine–hydrogen peroxidase chromogen-substrate kit (Dako). Immunostained tissue microarray (TMA) slides were digitalized using a high-resolution Mirax Desk instrument (Zeiss) and analyzed with the Mirax TMA Module software (Zeiss). Protein expression was evaluated using an empirical scale considering intensity and occupied subcellular compartments of epithelial/carcinoma cells or lamina propria cells. For statistical analysis, Pearson's χ2 test and Fisher's exact test were done.
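As an illustration of Fisher's exact test on a 2 × 2 table (for example, positive versus negative staining in two diagnostic groups), the following Python sketch computes a two-sided P value from the hypergeometric distribution. The function name `fisher_exact_two_sided` and the counts are hypothetical and are not taken from the study data; the two-sided rule used here (summing all tables at least as unlikely as the observed one) is one common convention and is assumed.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    p_obs = prob(a)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# Hypothetical counts: 3 of 3 positive in one group, 0 of 3 in the other.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(round(p_value, 3))
```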
Blood smears of 10 healthy and 10 colorectal cancer patients were also immunostained using an anti-osteonectin antibody (1:1,000 dilution; Chemicon) and Alexa Fluor 488 F(ab′)2 fragment of goat anti-mouse IgG. The total and osteonectin-positive cells in 50 fields of view with 30× magnification were counted in each sample. For statistical analysis, a t test was done to evaluate the difference of osteonectin-positive/total cell number ratios between colorectal cancer and normal blood smears.
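The comparison of osteonectin-positive/total cell ratios between two groups can be sketched in Python as follows. The text does not specify which variant of the t test was used; Welch's unequal-variance form is assumed here, and the function name `welch_t` and the ratio values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples a and b."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical positive/total cell ratios per blood smear.
crc_ratios = [0.12, 0.15, 0.11, 0.18, 0.14]
normal_ratios = [0.05, 0.07, 0.06, 0.04, 0.08]
t_stat, dof = welch_t(crc_ratios, normal_ratios)
print(t_stat > 0)
```

The P value would then be read from a t distribution with `dof` degrees of freedom, for example with a statistics package.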
Results
Classifiers between the Main Diagnostic Groups
The minimal number of discriminatory transcripts giving high specificity and sensitivity was determined using prediction analysis of microarrays in each comparison. Adenoma samples were distinguished from hyperplastic polyps with 100% sensitivity and 90.91% specificity according to the expression level of a minimum of nine genes, including ATP-binding cassette family A, member 8, insulin-like growth factor 1, and glucagon. Sixty-one classifier probesets were identified between colorectal cancer and adenoma, including axin 2, von Willebrand factor, tensin 1, and gremlin 1 (sensitivity, 90.91%; specificity, 100%). IBD and normal biopsies could be distinguished with 100% sensitivity and specificity using only three classifiers (REG1A, MMP3, and CHI3L1). According to the expression of 20 transcripts (such as INDO, CXCL9, CCR2, CD38, RARRES3, and CXCL10), IBD and colorectal cancer samples could be separated with 100% sensitivity and 95.24% specificity. Further details can be seen in Table 2.
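The reported percentages follow from simple ratios of classified samples. The Python sketch below shows the arithmetic; the function name `sensitivity_specificity` is hypothetical, and the confusion counts are not reported in the text but are assumed here so that they match the group sizes in Table 1 (for example, 20 adenomas all classified correctly and 10 of 11 hyperplastic polyps classified correctly).

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) in percent from confusion counts."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return sensitivity, specificity

# Assumed counts for adenoma (positive class) vs hyperplastic polyp.
sens, spec = sensitivity_specificity(tp=20, fn=0, tn=10, fp=1)
print(round(sens, 2), round(spec, 2))
```

With groups of about 10 to 20 samples, a single misclassified sample changes these values by roughly 5 to 10 percentage points, which should be kept in mind when comparing classifiers.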
Table 2. Discriminatory PAM transcripts between the diagnostic groups
Group vs group | Minimum no. of discriminatory transcripts | Sensitivity (%) | Specificity (%) | Including transcripts |
---|---|---|---|---|
Adenoma vs hyperplastic polyp | 9 | 100 | 90.91 | ABCA8, KIAA1199, GCG, MAMDC2, C2orf32, 229670_at, IGF1, PCDH7, PRDX6 |
IBD vs normal | 3 | 100 | 100 | REG1A, MMP3, CHI3L1 |
Adenoma vs CRC | 61 | 90.91 | 100 | GREM1*, DDR2*, GUCY1A3*, TNS1, ADAMTS1, FBLN1, FLJ38028, RDX, FAM129A, ASPN, FRMD6, MCC, RBMS1*, SNAI2, MEIS1, DOCK10, PLEKHC1, FAM126A, TBC1D9, VWF, DCN, ROBO1, MSRB3, LATS2, MEF2C*, IGFBP3*, GNB4, RCN3, AKAP12, RFTN1, 226834_at, COL5A1, GNG2, NR3C1*, SPARCL1, MAB21L2, AXIN2, 236894_at, AEBP1, AP1S2, C10orf56, LPHN2, AKT3, FRMD6, COL15A1, CRYAB, COL14A1, LOC286167, QKI, WWTR1, GNG11, PAPPA, ELDT1 |
IBD vs CRC | 20 | 100 | 95.24 | 227458_at, INDO, CXCL9, CCR2, CD38, RARRES3, CXCL10, FAM26F*, TNIP3, NOS2A, CCRL1, TLR8, IL18BP, FCRL5, SAMD9L, ECGF1, TNFSF13B, GBP5, GBP1 |
CRC-B vs CRC-CD | 34 | 100 | 66.67 | TMEM37*, IL33, CA4*, CCDC58, CLIC6, VSNL1*, ESPN, APCDD1, C13orf18, CYP4X1, ATP2A3, LOC646627, MUPCDH, ANPEP, C1orf115, HSD3B2, GBA3, GABRB2, GYLTL1B, LYZ, SPC25, CDKN2B, FAM89A, MOGAT2, SEMA6D, 229376_at, TSPAN5, IL6R, SLC26A2 |
Adenoma with low-grade dysplasia vs adenoma with high-grade dysplasia | 65 | 90.91 | 100 | SI, DMBT1, CFI*, AQP1, APOD, TNFRSF17, CXCL10, CTSE, IGHA1, SLC9A3, SLC7A1, BATF2, SOCS1, DOCK2, NOS2A, HK2, CXCL2, IL15RA, POU2AF1, CLEC3B, ANI3BP, MGC13057, LCK*, C4BPA, HOXC6, GOLT1A, C2orf32, IL10RA, 240856_at, SOCS3, MEIS3P1, HIPK1, GLS, CPLX1, 236045_x_at, GALC, AMN, CCDC69, CCL28, CPA3, TRIB2, HMGA2, PLCL2, NR3C1, EIF5A, LARP4, RP5-1022P6.2, PHLDB2, FKBP1B, INDO, CLDN8, CNTN3, PBEF1, SLC16A9, CDC25B, TPSB2, PBEF1, ID4, GJB5, CHN2, LIMCH1, CXCL9, MFAP4 |
UC vs CD | 58 | 77.78 | 100 | CCNG2, SLC44A4, DDAH1, TOB1, 231152_at, MKNK1, CEACAM7*, 1562836_at, CDC42SE2, PSD3, 231169_at, IGL@*, GSN, GPM6B, CDV3*, PDPK1, ANP32E, ADAM9, CDH1, NLRP2, 215777_at, OSBPL1, VNN1, RABGAP1L, PHACTR2, ASH1L, 213710_s_at, ZNF3, FUT2, IGHA1, EDEM1, GPR171, 229713_at, LOC643187, FLVCR1, SNAP23*, ETNK1, LOC728411, POSTN, MUC12, HOXA5, SIGLEC1, LARP5, PIGR, SPTBN1, UFM1, C6orf62, WDR90, ALDH1A3, F2RL1, IGHV1-69, DUOX2, RAB5A, CP |
Hyperplastic polyp vs normal | 15 | 100 | 100 | SLC6A14, ARHGEF10, ALS2, IL1RN, SPRY4, PTGER3, TRIM29, SERPINB5, 1560327_at, ZAK, BAG4, TRIB3, TTL, FOXQ1, UGT2A3 |
Adenoma with low-grade dysplasia vs normal | 3 | 100 | 100 | KLK11, KIAA1199, FOXQ1 |
Adenoma with high-grade dysplasia vs normal | 3 | 100 | 100 | CLDN8, ABCA8, PYY |
Adenoma vs normal | 3 | 100 | 100 | KIAA1199, FOXQ1, CA7 |
CRC vs normal | 5 | 100 | 100 | VWF, IL8, CHI3L1, S100A8, GREM1 |
Abbreviations: CD, Crohn's disease; UC, ulcerative colitis.
*Transcripts represented by more than one probeset.
In addition to the pairwise comparisons, random forest classification was done to distinguish between the above-mentioned diagnostic groups (Fig. 1). The estimated prediction error was 12.9%. The main diagnostic groups could be distinguished according to the mRNA expression levels of 18 genes, including cell cycle and cell proliferation regulatory genes (retinoic acid responder 3, LATS large tumor suppressor homologue 2, mutated in colorectal cancers, WARS), COP1 apoptosis gene, HLA-DMA, APOL3, GBP2, SLAMF8 inflammatory response related genes, SPARC-like 1 calcium-binding extracellular matrix gene, SLC15A3 oligopeptide transporter, as well as IFN regulatory factor 1, and quaking homologue transcription and mRNA processing genes. The exact functions of several classifier genes (FAM26F, SAMD9L, GBP4, GIMAP5) have not yet been identified.
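The .632+ bootstrap estimate combines the resubstitution error, the leave-one-out bootstrap error, and a no-information error rate. The Python sketch below shows the combination step only, under the usual formulation of the estimator; the function names and the input values are hypothetical, and the inputs are not the error rates of this study.

```python
from collections import Counter

def no_information_rate(y_true, y_pred):
    """Expected error if the predictions were independent of the true classes."""
    n = len(y_true)
    p = Counter(y_true)  # observed class counts
    q = Counter(y_pred)  # predicted class counts
    return sum((p[c] / n) * (1 - q[c] / n) for c in p)

def err_632_plus(resub_err, loo_boot_err, gamma):
    """Combine resubstitution and leave-one-out bootstrap error (.632+ rule)."""
    loo = min(loo_boot_err, gamma)  # cap at the no-information rate
    if loo > resub_err and gamma > resub_err:
        r = (loo - resub_err) / (gamma - resub_err)  # relative overfitting rate
    else:
        r = 0.0
    w = 0.632 / (1 - 0.368 * r)  # weight grows from 0.632 to 1 as r goes 0 to 1
    return (1 - w) * resub_err + w * loo

# Hypothetical inputs: no resubstitution error, 20% out-of-sample error.
gamma = no_information_rate(["a", "a", "b", "b"], ["a", "b", "a", "b"])
estimate = err_632_plus(0.0, 0.2, gamma)
print(0.632 * 0.2 < estimate < 0.2)
```

With no overfitting (`r = 0`) the result equals the plain .632 estimate, and with maximal overfitting (`r = 1`) it equals the leave-one-out bootstrap error.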
Figure 1. Random forest. A. Heat map of the diagnostic groups separated using the random forest classification method. The heat map visualizes the expression level of genes (rows) that were selected as classifiers using the random forest supervised machine learning method. The differences in gene expression between the diagnostic groups (columns) are visible. B. Agreement plot for visualization of the confusion matrix of the true and the predicted classes. The agreement plot represents the strength of agreement in the confusion matrix of the observed (true) and predicted classes. The prediction of each sample was based on the classifier using the genes presented on the heat map. Black areas show the observed agreement positioned within larger rectangles representing the maximum possible agreement, given the marginal totals. Gray areas represent the degree of disagreement. AD, adenoma; CRC, colorectal cancer; IBD, inflammatory bowel disease.
Identification of Subclassifier Transcripts
IBD was successfully subdivided into ulcerative colitis and Crohn's disease by the expression of 58 genes such as cyclin G2, dual oxidase 2, and CEACAM7 (sensitivity, 77.78%; specificity, 100%). Adenomas with low-grade and high-grade dysplasia could be distinguished using 65 classifier probesets such as aquaporin 1, CXCL10, and complement factor I (sensitivity, 90.91%; specificity, 100%). Early- and advanced-stage colorectal carcinomas were differentiated by 34 discriminatory transcripts including transmembrane protein 37, interleukin 33, carbonic anhydrase 4, visinin-like 1, ubiquitous calcium-transporting ATPase, and CDK inhibitor 2B, with high sensitivity (100%) and somewhat lower specificity (66.67%; Table 2).
Expression of the Colorectal Cancer–Associated Tissue Markers in Peripheral Blood
The differentially expressed genes between colorectal cancer samples and healthy normal controls were determined by significance analysis of microarrays. The presence of these local tissue-specific mRNA expression markers in peripheral blood was also analyzed using the blood samples of 19 colorectal cancer and 11 healthy patients. Fifty-two transcripts were significantly up-regulated in both the biopsy specimens and the peripheral blood of colorectal cancer patients compared with healthy normal controls. Three genes (SLC26A2 sulfate transporter, 227682_at, and UDP-glucose dehydrogenase) showed significantly decreased mRNA levels in both colorectal cancer biopsy and blood samples compared with normal samples. For some colorectal cancer–related transcripts, mRNA expression in blood changed in the opposite direction to that in cancer tissue. Seventeen genes showing elevated mRNA expression in colorectal cancer biopsy samples were down-regulated in the peripheral blood of colorectal cancer patients, whereas 12 genes underexpressed in colorectal cancer tissue were overexpressed in colorectal cancer blood samples (Table 3).
Table 3. Correlation between colorectal cancer versus normal biopsy and peripheral blood results
Gene symbol | Probeset ID | |
---|---|---|
Up-regulated in CRC compared with normal in both biopsy and blood samples | ||
TPM4 | 212481_s_at | |
SESTD1 | 226763_at | |
TTYH3 | 224674_at | |
TIMP1 | 201666_at | |
CD44 | 212014_x_at | |
TM9SF4 | 212194_s_at | |
PIM3 | 224739_at | |
PELO | 218472_s_at | |
C6orf145 | 212923_s_at | |
SFXN3 | 217226_s_at | |
MYL9 | 201058_s_at | |
CD44 | 210916_s_at | |
CD44 | 204490_s_at | |
VCAN | 221731_x_at | |
CD44 | 209835_x_at | |
VCAN | 204620_s_at | |
VCAN | 211571_s_at | |
TGFBI | 201506_at | |
PLXND1 | 38671_at | |
TKT | 208700_s_at | |
VCAN | 215646_s_at | |
PF4 | 206390_x_at | |
CD44 | 1557905_s_at | |
IFITM3 | 212203_x_at | |
S100A11 | 200660_at | |
NA | 228910_at | |
G6PD | 202275_at | |
AP1M1 | 223025_s_at | |
ZC3H12A | 218810_at | |
FSCN1 | 210933_s_at | |
NDE1 | 218414_s_at | |
IER3 | 201631_s_at | |
PEA15 | 200787_s_at | |
PTP4A3 | 206574_s_at | |
IMPDH1 | 204169_at | |
PRKCDBP | 213010_at | |
DDEF1 | 224796_at | |
ESAM | 225369_at | |
CCDC85B | 204610_s_at | |
MGC7036 | 227983_at | |
IFITM2 | 201315_x_at | |
IFITM1 | 201601_x_at | |
COL18A1 | 209082_s_at | |
RAB31 | 217762_s_at | |
FLNA | 214752_x_at | |
TMEM158 | 213338_at | |
CTSK | 202450_s_at | |
ENC1 | 201340_s_at | |
ICAM1 | 202638_s_at | |
INTS1 | 212212_s_at | |
PI3 | 203691_at | |
NA | 227041_at | |
Down-regulated in CRC compared with normal in both biopsy and blood samples | ||
SLC26A2 | 224959_at | |
NA | 227682_at | |
UGDH | 203343_at | |
Up-regulated in CRC compared with normal in biopsy, down-regulated in blood samples | ||
RANBP2 | 201712_s_at | |
DNAJC10 | 225174_at | |
CRKRS | 225694_at | |
SLC39A6 | 202088_at | |
SLC39A6 | 202089_s_at | |
DIS3 | 222607_s_at | |
ELK3 | 221773_at | |
DNAJC10 | 229588_at | |
RANBP5 | 211953_s_at | |
IL8 | 202859_x_at | |
RANBP2 | 226922_at | |
SACS | 213262_at | |
DNAJC10 | 221782_at | |
POT1 | 204354_at | |
GALNACT-2 | 218871_x_at | |
HS2ST1 | 203284_s_at | |
XPOT | 212160_at | |
Down-regulated in CRC compared with normal in biopsy, up-regulated in blood samples | ||
MTMR11 | 205076_s_at | |
ETHE1 | 204034_at | |
SULT1A3 | 209607_x_at | |
C9orf19 | 225604_s_at | |
AGXT2L2 | 226519_s_at | |
SULT1A2 | 207122_x_at | |
FCGRT | 218831_s_at | |
TRPM6 | 240389_at | |
SULT1A2 | 211385_x_at | |
SULT1A1 | 203615_x_at | |
ACADVL | 200710_at | |
C22orf16 | 224932_at |
Taqman RT-PCR Validation of 26 Selected Genes
The expression of all 11 adenoma-associated genes (6 up-regulated and 5 down-regulated in microarray analysis), 15 of the 18 colorectal cancer–related genes (15 overexpressed and 3 underexpressed), and all 14 ulcerative colitis–associated genes (13 up-regulated and 1 down-regulated) correlated significantly with the Affymetrix results (P < 0.05). On average, the mRNA expression of 93% of the selected genes was verified by Taqman RT-PCR (Table 4).
Taqman RT-PCR confirmation of the Affymetrix microarray results
Taqman ID | Gene symbol | Gene name | Affymetrix ID | Sample groups | P value | ddCt |
---|---|---|---|---|---|---|
Hs00153304_m1 | CD44 | CD44 antigen | 212014_x_at | AD vs normal | 1.82E−07 | 1.90 |
Hs00171022_m1 | CXCL12 | Chemokine (C-X-C motif) ligand 12 | 209687_at | AD vs normal | 0.00305 | −2.04 |
 | | | | CRC vs normal | 0.00735 | −1.95 |
Hs00179845_m1 | MET | Met proto-oncogene | 203510_at | AD vs normal | 1.41E−06 | 2.17 |
 | | | | CRC vs normal | 0.00002 | 1.53 |
Hs00200350_m1 | ABCA8 | ATP-binding cassette, subfamily A (ABC1), member 8 | 204719_at | AD vs normal | 0.000610 | −3.35 |
 | | | | CRC vs normal | 0.00143 | −3.20 |
Hs00205545_m1 | ADAMDEC1 | ADAM-like, decysin 1 | 206134_at | AD vs normal | 1.16E−05 | −3.69 |
 | | | | CRC vs normal | 9.18E−05 | −2.74 |
Hs00214306_m1 | TRPM6 | Transient receptor potential cation channel, subfamily M, member 6 | 224412_s_at | AD vs normal | 5.79E−05 | −4.73 |
 | | | | UC vs normal | 0.000385 | −4.63 |
Hs00153408_m1 | MYC | v-myc myelocytomatosis viral oncogene homologue (avian) | 202431_s_at | AD vs normal | 5.99E−06 | 2.35 |
Hs00171558_m1 | TIMP1 | Tissue inhibitor of metalloproteinase 1 | 201666_at | AD vs normal | 3.90E−07 | 2.58 |
 | | | | CRC vs normal | 0.00153 | 2.74 |
 | | | | UC vs normal | 0.000219 | 2.36 |
Hs00236937_m1 | CXCL1 | Chemokine (C-X-C motif) ligand 1 | 204470_at | CRC vs normal | 0.0114 | 3.84 |
 | | | | UC vs normal | 1.11E−05 | 4.04 |
Hs00236966_m1 | CXCL2 | Chemokine (C-X-C motif) ligand 2 | 209774_x_at | CRC vs normal | 0.00204 | 3.70 |
 | | | | UC vs normal | 0.000592 | 3.68 |
Hs00266139_m1 | CA1 | Carbonic anhydrase I | 205950_s_at | AD vs normal | 0.000930 | −6.13 |
Hs00194353_m1 | LCN2 | Lipocalin 2 | 212531_at | AD vs normal | 2.67E−07 | 6.13 |
 | | | | CRC vs normal | 0.000509 | 4.83 |
 | | | | UC vs normal | 2.15E−06 | 5.06 |
Hs00154230_m1 | CALU | Calumenin | 214845_s_at | CRC vs normal | 0.0145 | 1.60 |
Hs00169795_m1 | VWF | von Willebrand factor | 202112_at | CRC vs normal | 0.55142 | |
 | | | | UC vs normal | 0.000112 | 2.44 |
Hs00266237_m1 | COL4A1 | Collagen, type IV, α 1 | 211980_at | CRC vs normal | 0.0283 | 3.38 |
Hs00156076_m1 | BGN | Biglycan | 213905_x_at | CRC vs normal | 0.12042 | |
Hs00169777_m1 | PECAM1 | Platelet/endothelial cell adhesion molecule | 208983_s_at | CRC vs normal | 0.76378 | |
Hs00174103_m1 | IL8 | Interleukin 8 | 202859_x_at | CRC vs normal | 0.0283 | 7.21 |
 | | | | UC vs normal | 6.80E−06 | 5.77 |
Hs00204187_m1 | DUOX2 | Dual oxidase 2 | 219727_at | UC vs normal | 7.84E−05 | 6.35 |
Hs00195812_m1 | LIPG | Lipase, endothelial | 219181_at | AD vs normal | 0.000588 | 1.35 |
 | | | | CRC vs normal | 0.00711 | 1.08 |
 | | | | UC vs normal | 0.000588 | 1.35 |
Hs00829485_sH | IFITM2 | IFN induced transmembrane protein 2 (1-8D) | 201315_x_at | CRC vs normal | 0.00114 | 2.26 |
Hs00171061_m1 | CXCL3 | Chemokine (C-X-C motif) ligand 3 | 207850_at | CRC vs normal | 0.00384 | 3.22 |
 | | | | UC vs normal | 7.48E−05 | 3.58 |
Hs00277299_m1 | IL1RN | Interleukin 1 receptor antagonist | 212657_s_at | CRC vs normal | 0.00714 | 4.66 |
 | | | | UC vs normal | 1.10E−05 | 5.30 |
Hs00234579_m1 | MMP9 | Matrix metalloproteinase 9 | 203936_s_at | UC vs normal | 0.00724 | 1.85 |
Hs00160066_m1 | PI3 | Protease inhibitor 3, skin-derived (SKALP) | 203691_at | UC vs normal | 0.000257 | 4.26 |
Hs00197374_m1 | UBD | Ubiquitin D | 205890_s_at | UC vs normal | 0.000261 | 3.20 |
Abbreviation: AD, adenoma.
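The ddCt values in the table are on a log2 scale and can be read as approximate fold changes. The Python sketch below makes two assumptions that are not stated in the text: that amplification efficiency is close to 100% (one cycle corresponds to a twofold change), and that a positive ddCt denotes higher expression in the disease group. The sign convention is inferred from the table itself, where the adenoma rows contain 6 positive and 5 negative values, matching the 6 up-regulated and 5 down-regulated adenoma genes reported above. The function name is illustrative only.

```python
def fold_change(ddct):
    """Approximate fold change for a ddCt value.

    Assumes ~100% PCR efficiency and that a positive ddCt means
    higher expression in the disease group than in normal tissue.
    """
    return 2.0 ** ddct

# A few ddCt values copied from the table (adenoma vs normal).
examples = {"CD44": 1.90, "LCN2": 6.13, "CA1": -6.13}

for gene, ddct in examples.items():
    fc = fold_change(ddct)
    direction = "up" if ddct > 0 else "down"
    print(f"{gene}: ddCt {ddct:+.2f} -> {fc:.3g}-fold ({direction}-regulated)")
```

Under these assumptions, values below 1 indicate down-regulation; taking the reciprocal gives the corresponding fold decrease.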
TMA Analysis and Blood Smear Immunocytochemistry Results
In accordance with the mRNA expression results, elevated protein levels of osteonectin, osteopontin, biglycan, collagen 4α1, von Willebrand factor, MMP9, and vascular endothelial growth factor were detected in colorectal cancer compared with healthy controls. In healthy colon tissue, moderate osteopontin and osteonectin staining was found in the apical cytoplasm of epithelial cells. Both osteonectin and osteopontin showed moderate to strong diffuse cytoplasmic staining in colorectal cancer samples. Osteonectin protein expression was also significantly increased in blood smears of colorectal cancer patients (osteonectin-positive mononuclear cells, 20.89% ± 2.16%) compared with normal controls (6.72% ± 2.65%; P = 6.35 × 10⁻⁹; Supplementary Fig. S1). In colorectal cancer cases, strong subepithelial BGN immunostaining was found in myofibroblast-like cells and leukocytes of the lamina propria, whereas no epithelial BGN immunoreactivity was detected. Most of the normal samples were negative for BGN; in some cases weak apical epithelial BGN immunostaining was found, but no subepithelial labeling was seen. Whereas all normal samples were negative for Col4A1, certain carcinomatous cells showed moderate to strong epithelial Col4A1 immunostaining in colorectal cancer samples, with no immunoreactivity in the lamina propria. Regarding vWF, there was moderate epithelial immunostaining in the carcinomatous cells of colorectal cancer samples, and some vWF immunoreactivity was also scattered in the lamina propria, whereas no mucosal immunostaining was seen in normal cases. In colorectal cancer cases, moderate to strong subepithelial MMP9 immunostaining was found in leukocytes of the lamina propria but not in the carcinomatous epithelium. Normal samples showed weak diffuse intracytoplasmic epithelial immunoreactivity.
For vascular endothelial growth factor, moderate to strong diffuse epithelial immunoreactivity was found in the carcinomatous cells of colorectal cancer samples, and the subepithelium showed a moderate reaction. Weak to moderate subepithelial and luminal epithelial vascular endothelial growth factor immunoreactivity was found in almost all normal samples (Fig. 2).
Fig. 2. Immunostainings in TMA sections. A. Osteonectin immunostaining in CRC (A1) and healthy colonic mucosa (A2). B. Osteopontin immunostaining in CRC (B1) and healthy colonic mucosa (B2). C. Biglycan immunostaining in CRC (C1) and healthy colonic mucosa (C2). D. Collagen 4A1 immunostaining in CRC (D1) and healthy colonic mucosa (D2). E. von Willebrand factor immunostaining in CRC (E1) and healthy colonic mucosa (E2). F. MMP9 immunostaining in CRC (F1) and healthy colonic mucosa (F2). G. VEGF immunostaining in CRC (G1) and healthy colonic mucosa (G2). H. PECAM1 protein expression in active IBD (H1) and normal colonic tissue (H2). I. Collagen 4A1 protein expression in active IBD (I1) and healthy colonic mucosa (I2). The white arrows show the colonic epithelial cells. Elevated protein levels of osteonectin, osteopontin, biglycan, collagen 4α1, von Willebrand factor, MMP9, and vascular endothelial growth factor were detected in CRC compared with healthy controls. Compared with normal tissue, PECAM1 and collagen 4α1 proteins were overexpressed in IBD.
In comparison with normal tissue, PECAM1 and collagen 4α1 proteins were overexpressed in IBD, in accordance with the up-regulated mRNA levels detected by microarrays. In IBD samples there was a strong subepithelial PECAM1 immunoreaction in leukocytoid cells of the lamina propria, whereas none of the normal samples showed an epithelial immunoreaction. In several IBD samples a weak Col4α1 immunoreaction was found compared with normal samples. No subepithelial immunostaining could be detected (Fig. 2).
Discussion
In this study, 85 colonic biopsy samples and 30 peripheral blood samples were analyzed in total by whole genomic expression microarrays to identify local tissue classifiers between the diagnostic groups and to analyze the presence of the tissue expression markers in peripheral blood.
In daily routine, biopsy samples taken during endoscopy relatively often cannot be evaluated adequately by conventional histology. A diagnostic expression profile obtained from the whole biopsy specimen can overcome such sampling errors in histology.
For the objective, molecular-based classification of the biopsy samples into main diagnostic groups, classifier transcript sets were determined. Functional analysis of significant genes can provide important information, because with the identification of the main signaling pathways, the key genes characterizing the given pathomechanism can be found and used for diagnostic analysis.
Because IBD, especially long-standing ulcerative colitis, is a precancerous condition, the analysis of IBD specimens is important for the early detection of genes related to the adenoma-dysplasia-carcinoma sequence.
All three IBD classifiers have been hypothesized to show increased expression in IBD. In tissue injury associated with IBD, REG1A (regenerating islet-derived 1α) mRNA was observed to be highly expressed in colonic mucosa (38). The protein product of this gene has a positive regulatory effect on cell proliferation (39) and may contribute to reducing epithelial apoptosis in inflammation (38). Matrix metalloproteinase 3 (MMP3), involved in wound repair and tumor initiation, was also up-regulated in IBD (40). Microarray analysis by Mizoguchi et al. indicated that the third classifier, chitinase 3-like 1 (CHI3L1), is overexpressed specifically in inflamed mucosa. CHI3L1 plays a pathogenic role in colitis, presumably by enhancing the adhesion of bacteria to colonic epithelial cells and their invasion into these cells (41). Dysregulated host/microbial interactions seem to play a central role in the pathogenesis of IBD.
Analyzed by function, most of the colorectal cancer versus adenoma discriminatory genes are involved in intracellular signal transduction (GNG11, latrophilin, AKAP12, ELTD1, tensin 1, axin 2, GNB4), cell proliferation (IGFBP3, MCC, LATS2), cell adhesion (ROBO1, AEBP1, VWF, collagen 15A1, DDR2, PLEKHC1), and transcription regulation (such as NR3C1, WWTR1, MEIS1, MEF2C, SNAI2). However, the functions of several discriminatory transcripts are still unknown. For instance, gremlin 1 (GREM1), an antagonist of BMP that is represented among the classifiers by two probesets, may play a role in regulating organogenesis, body patterning, and tissue differentiation. It has been found to be overexpressed in various human tumors, including carcinomas of the lung, ovary, kidney, breast, colon, and pancreas, as well as sarcoma (42).
Polyps could be classified into adenomatous and hyperplastic polyps according to the expression levels of nine transcripts. The ABCA8 ABC transporter, which was previously found to be underexpressed in colorectal cancer (28, 43), showed decreased expression in adenoma compared with hyperplastic polyp samples. Lower glucagon mRNA levels in adenomas may reflect altered intestinal barrier function (44) and disordered regulation of cell proliferation. Interestingly, IGF1, overexpression of which is closely associated with the early stage of colorectal carcinogenesis (45), was found to be more intensely expressed in hyperplastic polyps than in adenomas. The lower peroxiredoxin 6 expression may indicate weaker protection against oxidative stress in adenomas. The exact functions of the MAMDC2, C2orf32, 229670_at, and KIAA1199 discriminatory transcripts have not yet been clarified.
The colorectal cancer versus IBD discriminatory genes are mainly immune and defense response–related genes (such as the CXCL9 and CXCL10 chemokine ligands, the CCR2 and CCRL1 chemokine receptors, interleukin 18 binding protein, GBP1, GBP5, NOS2A, INDO, TNFSF13B, toll-like receptor 8, and 227458_at), which showed decreased mRNA levels in colorectal cancer compared with IBD samples. CD38, expressed mainly in leukocytes, is involved in cell adhesion and signal transduction; RARRES3 is a negative regulator of cell proliferation; and ECGF1 is a growth factor with angiogenic effects. RARRES3 has been reported to act as a tumor suppressor or growth regulator (46), and its decreased expression in colorectal cancer seems to support this assumption. Autocrine production of ECGF1 by endothelial cells may be a mechanism of inflammatory angiogenesis but not tumor angiogenesis and might be particularly important for the maintenance of damaged vasculature in IBD (47). The functions of some newly identified expression markers (FAM26F, FCRL5, SAMD9L, TNIP3) have not yet been clarified.
The main diagnostic groups (colorectal cancer, IBD, adenomas, hyperplastic polyps) can be distinguished according to the mRNA expression levels of 18 genes determined by the random forest classification method with a 12.9% prediction error.
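For readers less familiar with the term, a prediction error of this kind is the fraction of samples assigned to the wrong diagnostic group. The Python sketch below computes it from a confusion matrix. The counts are hypothetical and are not the study's data, and the sketch does not reproduce the random forest procedure itself; it only illustrates how a misclassification rate is obtained.

```python
def prediction_error(confusion):
    """Misclassification rate from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1.0 - correct / total

# Hypothetical counts for four diagnostic groups (NOT the study's data).
classes = ["CRC", "IBD", "adenoma", "hyperplastic polyp"]
hypothetical = [
    [18, 1, 1, 0],
    [1, 19, 0, 0],
    [2, 0, 17, 1],
    [0, 0, 2, 8],
]

print(f"prediction error: {prediction_error(hypothetical):.1%}")
```

In random forest classifiers the reported error is commonly the out-of-bag estimate, in which each sample is predicted only by trees that did not see it during training; whether that estimate was used here is not stated in this section.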
Besides the objective classification of the samples into main diagnostic groups, the differentiation among disease subtypes is also important for the improvement of the molecular-based diagnostics.
A relatively high number of classifiers is required for differentiation between high-grade and low-grade dysplastic villous adenomas. Several tumorigenesis-related discriminatory transcripts (such as HIPK1, CDC25B, CXCL2, and HMGA2) were found to be overexpressed in high-grade dysplastic adenoma, consistent with the high risk of colorectal cancer development (13, 14, 43, 48, 49). Homeodomain interacting protein kinase 1 (HIPK1) may thus play a role in tumorigenesis, perhaps by regulating the expression of p53 and/or Mdm2 (48). A correlation has previously been shown between the presence of HMGI proteins and the expression of a highly malignant phenotype in epithelial and fibroblastic rat thyroid cells. Moreover, HMGA2 seems to be involved in colorectal carcinogenesis (49).
Most of the colorectal cancer subtype classifiers are involved in transport processes (calcium ion transport: transmembrane protein 37, ubiquitous calcium-transporting ATPase, CLIC6 chloride transporter, CYP4X1 electron transporter, GABRB2 chloride channel, SLC26A2 sulfate transporter), in metabolic processes (carbonic anhydrase 4, UDP glucuronosyltransferase 2 A3 polypeptide, glycosyltransferase-like 1B, monoacylglycerol O-acyltransferase 2), in cell adhesion and motility (espin, mucin-like protocadherin, tetraspanin 5), in signal transduction (visinin-like 1, C13orf18), and in cell cycle regulation (SPC25 kinetochore complex component, CDK inhibitor 2B). Visinin-like 1 (VSNL1) was overexpressed in neuroblastoma tumor specimens from patients with distant organ metastases compared with those without metastases (50). Decreased expression of the CDKN2B (alias p15) tumor suppressor gene is also typical in advanced colorectal cancer.
A future goal is to establish the diagnosis and to perform screening using a more easily accessible sample source such as peripheral blood, the results of which could guide the further diagnostic and therapeutic steps. However, WBC circulating in the peripheral blood pass through all tissues of the body, and their gene expression is affected by more conditions than the expression patterns of local tissue alterations. It is therefore important to find tissue markers that also appear in peripheral blood and can be specific for a given organ alteration.
Several colorectal cancer–associated tissue markers changed in peripheral blood in parallel with the locally measured expression levels. Genes showing up-regulation in both biopsy and peripheral blood samples of colorectal cancer patients compared with normal controls are mainly involved in cell adhesion (such as CD44, TGFβ1, ICAM1, versican, collagen 18A1, pelota homologue endothelial cell adhesion molecule), cell proliferation (such as IFITM1, IFITM2, TIMP1, fascin homologue 1), and intracellular signal transduction (including S100A11, filamin A, and DDEF1), whereas the functions of nine transcripts (such as CCDC85B, TM9SF4, C6orf145, and TMEM158) have not yet been identified. The gene signals may come from peripheral blood mononuclear cells as well as from circulating tumor cells. Previously, we reported a significantly positive correlation between the number of circulating tumor cells and clinical properties of colorectal cancer (51). The underexpressed genes in both biopsy and blood samples are involved in metabolism (UGDH) and sulfate transport (SLC26A2), whereas the function of 227682_at is unknown. For some colorectal cancer–related transcripts, mRNA expression in blood changed in the opposite direction compared with the levels in cancer tissue. This phenomenon may relate to secondary immunologic processes, including tumor-infiltrating lymphocytes, rather than to circulating tumor cells.
The expression of selected IBD- and colorectal cancer–associated genes was also measured at protein level, on 292 tissue sections of 29 overlapping and 93 independent sets of patients. TMA technology allowed the standardized analysis of a large number of samples within a short time and the validation of some of the mRNA expression results. In accordance with the mRNA expression results, elevated protein levels of osteonectin, osteopontin, biglycan, collagen 4α1, von Willebrand factor, MMP9, and vascular endothelial growth factor were detected in colorectal cancer compared with healthy controls. Osteonectin protein expression in blood smears of colorectal cancer patients was also significantly elevated compared with normal controls. Overexpression of PECAM1 and collagen 4α1 proteins was detected in IBD compared with normal tissue, in accordance with the up-regulated mRNA levels detected by microarray.
In conclusion, whole genomic microarray analysis using routine biopsy samples may be suitable for the identification of discriminative signatures for differential diagnostic purposes. Our results may serve as a basis for new gene expression pattern–based diagnostic methods such as Taqman and/or LightCycler 480 real-time PCR cards. Because the mRNA expression results showed a strong correlation with protein-level expression, simultaneous analysis of protein marker sets is also feasible. Antibodies recognizing a wide range of proteins in formalin-fixed, paraffin-embedded tissue sections are now available, offering immunostaining of disease-specific markers as a simple test for daily diagnostic use.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).
Acknowledgments
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We thank László Nagy, M.D., Ph.D., and Beáta Scholtz, Ph.D., for their help with the Taqman real-time PCR analysis, and Gabriella Kónya for her work in preparing the TMA immunostainings.