Abstract
Purpose: Early detection of colorectal cancer (CRC) and its precursor lesions is an effective approach to reduce CRC mortality rates. This study aimed to identify novel protein biomarkers for the early diagnosis of CRC.
Experimental Design: Proximal fluids are a rich source of candidate biomarkers as they contain high concentrations of tissue-derived proteins. The FabplCre;Apc15lox/+ mouse model represents early-stage development of human sporadic CRC. Proximal fluids were collected from normal colon and colon tumors and subjected to in-depth proteome profiling by tandem mass spectrometry. Carcinoembryonic antigen (CEA) and CHI3L1 human serum protein levels were determined by ELISA.
Results: Of the 2,172 proteins identified, quantitative comparison revealed 192 proteins that were significantly (P < 0.05) and abundantly (>5-fold) more excreted by tumors than by controls. Further selection for biomarkers with highest specificity and sensitivity yielded 52 candidates, including S100A9, MCM4, and four other proteins that have been proposed as candidate biomarkers for human CRC screening or surveillance, supporting the validity of our approach. For CHI3L1, we verified that protein levels were significantly increased in sera from patients with adenomas and advanced adenomas compared with control individuals, in contrast to the CRC biomarker CEA.
Conclusion: These data show that proximal fluid proteome profiling with a mouse tumor model is a powerful approach to identify candidate biomarkers for early diagnosis of human cancer, exemplified by increased CHI3L1 protein levels in sera from patients with CRC precursor lesions. Clin Cancer Res; 18(9); 2613–24. ©2012 AACR.
Novel biomarkers are needed to improve the current colorectal cancer (CRC) screening tests. Large-scale protein biomarker discovery by tandem mass spectrometry from human blood is challenging due to sample complexity and interindividual genetic heterogeneity. We here describe an alternative approach in which the influence of confounding factors is strongly reduced by in-depth proteome profiling of proximal fluids from colon (tumor) tissues obtained from a mouse model for human sporadic CRC. The validity of this approach is supported by identification of multiple biomarkers that are known candidates for CRC screening and verification of increased serum levels of one of these markers (CHI3L1) in patients with CRC precursor lesions. These data indicate that tens of novel candidate biomarkers for early detection of CRC were identified and imply that proteome profiling of proximal fluids using mouse models for human disease offers a powerful and generally applicable strategy to boost cancer protein biomarker discovery.
Introduction
More than one million people are diagnosed with colorectal cancer (CRC) each year, and currently about half of these patients die from this disease (1). Development of CRC is a multistep process that results from accumulation of (epi)genetic changes that affect biologic functions required to maintain tissue homeostasis. Mutations in the adenomatous polyposis coli (APC) tumor suppressor gene play a rate-limiting role in the majority of sporadic CRCs by activation of the Wnt signal transduction pathway that stimulates transformation of normal colon epithelium resulting in formation of adenomas. Moreover, APC mutations increase genetic instability, which promotes accumulation of additional genomic alterations that enhance tumor progression and malignant behavior (2). Importantly, the development and progression of benign lesions into invasive and metastatic carcinomas is a complex process that takes many years, which provides a realistic window of opportunity for detecting colon adenomas and early-stage (curable) CRC by screening of asymptomatic individuals (3, 4). To this end, low-cost, easy-to-apply stool- or serum-based tests with CRC-related biomarkers are either widely used or under investigation. Several randomized trials have shown that CRC screening with the fecal occult blood test (FOBT) reduces CRC incidence by about 20% and CRC mortality by up to 33% (5). However, the test performance of the assays that measure blood proteins in feces leaves room for improvement for which novel biomarkers are urgently needed.
Protein biomarkers are well suited for development of in vitro diagnostic tests. One strategy to identify novel biomarkers for blood-based CRC detection is to compare protein content of serum samples from patients with cancer with that of healthy control subjects. Although the advantage of such an approach is that new biomarkers would be discovered directly in a biofluid that can be used for cancer screening, its discovery rate is seriously hampered by sample complexity. The total dynamic concentration range of blood proteins spans 11 orders of magnitude (6), whereas current high-resolution mass spectrometry methods are only capable of detecting proteins at concentrations that span up to 4 orders of magnitude, typically restricted to the most abundant proteins within a given biologic sample. Tumor-derived proteins are strongly diluted in the blood circulation, and therefore the concentration of the vast majority of these proteins in blood will fall below the detection limits. Other complicating factors concern the diversity of human tissue and biofluid sample collections due to genetic and environmental heterogeneity of the human population. Collectively, these confounding factors cause considerable biologic variation between human samples, which significantly hampers biomarker discovery (7).
We applied a biomarker discovery strategy in which the confounding effects of sample complexity and sample heterogeneity were strongly reduced. Concerning sample complexity, the concentration of tissue-excreted proteins is highest in fluids in close proximity to the tissue source itself, further referred to as “proximal fluids.” Proximal fluids contain proteins that are secreted, shed by membrane vesicles, or externalized because of cell death. Therefore, proximal fluids provide a promising avenue for biomarker discovery (8–10). Concerning sample heterogeneity, the use of inbred mouse models for human disease strongly reduces the biologic variation due to genetic and environmental heterogeneity. Moreover, the initial molecular changes in disease pathogenesis in genetically engineered mouse models are well defined, and the stage of tumor development at the time of tissue or biofluid sampling is well controlled (11–13). We here report identification of 52 promising candidate CRC biomarkers upon in-depth proteome profiling of proximal fluids using a mouse model for colon tumorigenesis and exemplify their relevance for early diagnosis of human CRC by showing increased CHI3L1 protein levels in sera from patients with adenomas, advanced adenomas, and carcinomas compared with control subjects.
Materials and Methods
Materials
All chemicals were obtained from Sigma (Sigma-Aldrich). High-performance liquid chromatography (HPLC) solvents, liquid chromatography/mass spectrometry (LC/MS)-grade water, acetonitrile, and formic acid were obtained from Biosolve (Biosolve B.V.). Porcine sequence grade–modified trypsin was obtained from Promega (Promega Benelux B.V.).
Mice
Animal studies were approved by the Animal Experimentation Ethics Committee of the VU University Medical Center (VUmc; Amsterdam, The Netherlands), according to local and governmental regulations. FabplCre;Apc15lox/+ mice are highly predisposed to colon tumor development due to truncation of one allele of the Apc tumor suppressor gene in gut epithelial cells, whereas Apc15lox/+ control littermates do not exhibit colonic aberrations (14, 15). FabplCre;Apc15lox/+ C57Bl/6 mice and Apc15lox/+ C57Bl/6 littermates were generated by mating FabplCre C57Bl/6 mice with Apc15lox/15lox C57Bl/6 mice. Genotypes were determined by PCR. All mice were housed in individually ventilated cages with drinking water and food available ad libitum.
Collection of colon tissue proximal fluid samples
Mice were sacrificed by asphyxiation in CO2 at 202 days of age and colon tissues were collected immediately. Colon tumors were dissected in one piece from FabplCre;Apc15lox/+ C57Bl/6 mice (2 females, 1 male). Likewise, size-matched normal colon pieces were obtained from age- and gender-matched Apc15lox/+ C57Bl/6 mice. The freshly dissected tissues were briefly rinsed in PBS to remove stool products and transferred to Eppendorf tubes. A volume of 50 to 100 μL of PBS was added, just sufficient to immerse the whole tissue. Tissue samples were incubated at 37°C for 1 hour, followed by gentle centrifugation (2,000 rpm at 4°C for 2 minutes). The soluble fractions were transferred to new Eppendorf tubes and centrifuged at maximum speed to remove remaining cells and debris (13,200 rpm at 4°C for 20 minutes). The soluble fractions, further referred to as “proximal fluids,” were transferred again to new Eppendorf tubes and stored at −80°C until further use. The normal colon and colon tumor tissues were processed for immunohistochemical studies, by standard formalin fixation and paraffin embedding.
GeLC/MS-MS
Several workflows for label-free quantitative secretome proteomics were previously compared and evaluated in our laboratory. We here applied 1-dimensional gel electrophoresis followed by nano-liquid chromatography coupled to tandem mass spectrometry (GeLC/MS-MS) as described by Piersma and colleagues (16), that is, the workflow that yielded the highest number of proteins that could be identified in a reproducible manner. See Supplementary Materials and Methods for a more detailed description.
Database searching
Tandem mass spectrometry (MS-MS) spectra were searched against the mouse IPI database v3.31 (56,555 entries) with Sequest (version 27, rev 12), which is part of the BioWorks 3.3 data analysis package (Thermo Fisher). After database searching, the DTA and OUT files were imported into Scaffold 2_01_01 (Proteome Software). Scaffold was used to organize the gel-band data and to validate peptide identifications by the Peptide Prophet Algorithm (17). Only identifications with a probability of more than 95% were retained. Subsequently, the Protein Prophet algorithm (18) was applied and protein identifications with a probability of more than 99% with 2 peptides or more in at least one of the samples were retained. Proteins that contained similar peptides and could not be differentiated on the basis of MS-MS analysis were grouped. For each protein identified, the number of assigned spectra was exported to Excel. Additional general protein information was retrieved by Ingenuity Pathway Analysis (IPA version 7.5; Ingenuity Systems, Inc.).
Quantitation of data
Spectral counting was used for label-free quantitation of the proteomics data (19, 20). Data were normalized by dividing the number of spectral counts for each protein within a sample by the sum of spectral counts of that particular sample, multiplied by the average of total sample counts. Next, to estimate fold changes in protein abundance between colon tumor proximal fluid and normal colon proximal fluid samples, ratios of spectral counts (RSC values) were calculated by the following formula: RSC = log2[(n2 + f)/(n1 + f)] + log2[(t1 − n1 + f)/(t2 − n2 + f)]. In this formula, RSC is the log2 ratio of protein abundance between tumor and control samples, n1 and n2 equal the sum of spectral counts of one protein in the control or tumor samples, respectively, and t1 and t2 equal the total number of spectral counts of all proteins in the control or tumor samples, respectively. The f-value is a correction factor that prevents division by zero and has been set to 0.5 (19).
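As an illustration only, the normalization and RSC calculation described above can be sketched in Python. The function names and the dictionary-based input layout are hypothetical and are not taken from the original analysis; only the formula and the correction factor f = 0.5 come from the text.

```python
import math

def normalize_counts(samples):
    """Scale each sample so its counts sum to the mean total across samples.

    `samples` is a list of dicts mapping protein ID -> raw spectral count
    (a hypothetical layout chosen for this sketch).
    """
    totals = [sum(sample.values()) for sample in samples]
    mean_total = sum(totals) / len(totals)
    return [
        {protein: count / total * mean_total for protein, count in sample.items()}
        for sample, total in zip(samples, totals)
    ]

def rsc(n1, n2, t1, t2, f=0.5):
    """Log2 ratio of spectral counts, tumor (n2, t2) versus control (n1, t1).

    n1, n2: summed spectral counts of one protein in control / tumor samples.
    t1, t2: total spectral counts of all proteins in control / tumor samples.
    f: correction factor that prevents division by zero (0.5 in the text).
    """
    return math.log2((n2 + f) / (n1 + f)) + math.log2((t1 - n1 + f) / (t2 - n2 + f))
```

With this definition, a protein that is absent from controls but has 20 spectra in tumors, at equal totals of 1,000 spectra per group, gives an RSC of roughly 5.4, well above the log2(5) ≈ 2.32 threshold for a 5-fold difference.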
Statistical evaluation
Statistical evaluation was conducted with the beta-binomial test, which takes into account the discrete nature of spectral counting data and models both within-sample variation and between-sample variation, within a single statistical framework (21). Here, the beta-binomial test was applied to identify proteins that show statistically significant differences in spectral count numbers between the group of colon tumor proximal fluid samples and the group of normal colon proximal fluid samples. An R implementation of the test was used. Subsequently, the Benjamini and Hochberg method was used to adjust the P values for multiple testing (22).
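The multiple-testing adjustment can be illustrated with a minimal Python sketch of the Benjamini and Hochberg step-up procedure. This is not the R implementation used in the study, and the function name is hypothetical; the beta-binomial test itself is not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw P value is multiplied by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.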
Immunohistochemistry
Four-micrometer thick, formalin-fixed, paraffin-embedded sections of normal colon and colon tumor tissues previously used for collection of proximal fluids were deparaffinized and rehydrated, followed by immunohistochemical stainings. Endogenous peroxidases were neutralized with 0.3% hydrogen peroxide in methanol for 30 minutes. Staining for S100A9 was done upon antigen retrieval by microwave heating in citrate buffer (10 mmol/L, pH 6.0). The primary goat polyclonal antibody directed against mouse S100A9 (catalog number AF2065; R&D Systems) was incubated overnight at a 1:50 dilution at 4°C and subsequently detected through a standard streptavidin-biotinylated peroxidase complex with diaminobenzidine (DAKO). Staining for MCM4 was done upon antigen retrieval by autoclave heating in Tris/EDTA buffer (pH 9.0). The primary rabbit polyclonal antibody directed against MCM4 (catalog number NB100-1822; Novus Biologicals) was incubated overnight at a 1:500 dilution at 4°C and subsequently detected by an Envision–horseradish peroxidase system (DAKO). Slides were counterstained with Mayer's hematoxylin and dehydrated in alcohol and xylene before mounting.
Human serum collection
From 2009 to 2010, serum samples were collected from individuals who underwent colonoscopy in a diagnostic setting at the VUmc. Common indications for colonoscopy were irritable bowel syndrome (abdominal pain, change in bowel habits, bloating, diarrhea, and constipation) and gastrointestinal bleeding. Approval of the Institutional Review Board of VUmc was obtained before the start of the study. Informed consent was obtained from all the participants. Blood was collected in BD Vacutainer Plus plastic serum tube red (Becton, Dickinson and Company), allowed to clot at room temperature for a maximum of 1 hour, centrifuged at room temperature for 10 minutes at 1,500 × g, and stored at −80°C. Colonoscopy and histology were considered the gold standard for presence of adenomas, advanced adenomas [defined as an adenoma ≥ 1.0 cm, or an adenoma with a villous or tubulovillous architecture, or with high-grade dysplasia (ref. 23)], and adenocarcinomas. Subjects with an incomplete colonoscopy or in whom bowel preparation was insufficient, as judged by the individual endoscopist, were excluded from further analysis. Hemolytic sera and sera from patients with a history of cancer or inflammatory bowel disease were also excluded from further analysis. In total, sera were collected from 41 females and 45 males, comprising control subjects (n = 36) and patients with adenomas (n = 20), advanced adenomas (n = 22), and CRC (n = 8). Clinical information about the study participants is provided in Supplementary Table S1.
Determination of CHI3L1 and carcinoembryonic antigen serum concentrations
CHI3L1 serum levels were determined with a sandwich-type ELISA (Quidel Corporation) according to the manufacturer's instructions. Color intensity of the samples was measured at 405 nm with a Victor2 plate reader (Perkin-Elmer). Carcinoembryonic antigen (CEA) serum levels were measured on an Advia Centaur platform with an immunometric assay using luminescence detection (Siemens Medical Solutions). Interassay variation at 5, 20, and 54 μg/L was 7%, 5%, and 4%, respectively. Statistical differences in protein levels between each of the patient groups and control subjects were evaluated using the Mann–Whitney U test.
Results
Proximal fluid proteome profiling
Normal colon and colon tumor tissues were obtained using the FabplCre;Apc15lox/+ mouse model for human sporadic CRC (15). Proximal fluids were collected from 3 freshly excised colon tumors obtained from independent FabplCre;Apc15lox/+ mice and from 3 size- and location-matched pieces of normal colon obtained from independent age- and gender-matched Apc15lox/+ mice. Protein content was analyzed by in-depth proteomics using 1-dimensional gel electrophoresis and GeLC/MS-MS. A schematic representation of the workflow is provided in Supplementary Fig. S1. The numbers of spectral counts obtained for colon tumor proximal fluid samples (20,763 ± 2,560) were similar to those obtained for normal colon proximal fluid samples (22,462 ± 2,432). A total of 2,172 proteins were identified, corresponding to 2,075 different mouse genes and 1,958 different known human homologs (Supplementary Table S2). Of these, 318 proteins were uniquely identified in proximal fluids from normal colon samples and 390 proteins were uniquely identified in colon tumor samples (Fig. 1A). Overall, 912 of 1,782 proteins (51%) were identified in all 3 normal colon samples (Fig. 1B) and 975 of 1,854 proteins (53%) in all 3 tumor samples (Fig. 1C), showing reproducible detection of many (>900) proteins in each of these complex biologic triplicates.
Classically secreted proteins are obvious candidates for putative detection in biofluids. On the basis of general protein information retrieved from IPA (Supplementary Table S2), of 1,747 unique genes with known subcellular location, only 187 were annotated as “extracellular space” (10.7%). However, plasma membrane, cytoplasmic, and nuclear proteins can also be excreted, either as nonclassically secreted proteins or through microvesicular transport. Therefore, the proteins identified were analyzed with SecretomeP 2.0 as a computational tool to predict their secretory potential (24), which revealed that about 87% of all proteins were potentially secreted (Supplementary Table S2). To estimate to what extent proteins might be excreted by tumor cells through vesicular transport as cargo of microvesicles (25), proximal fluid protein data were compared with a list of proteins identified by in-depth GeLC/MS-MS proteomics analysis of the microvesicle fraction and the soluble fraction of the human HT29 CRC cell line secretome (Supplementary Table S2 and data not shown). Of 930 HT29-secreted proteins that corresponded to a unique mouse proximal fluid homolog, 671 proteins (72%) were detected in the microvesicle fraction. Collectively, these data indicate that the majority of nonclassically secreted proximal fluid proteins do have the potential to be excreted into biofluids and should be considered as putative targets for blood- or stool-based early detection of CRC.
Selection of candidate CRC biomarkers
Ratios of spectral counts (RSC values) were calculated and revealed 192 CRC candidate biomarker proteins that were excreted more than 5-fold more abundantly by tumors than by controls (RSC > 2.32) with statistical significance (P < 0.05; Fig. 2 and Supplementary Table S2). Biomarker candidates with potentially highest specificity and sensitivity should be excreted abundantly by tumors while not being excreted by normal healthy colon tissue or in nonneoplastic diseases. Therefore, a more stringent selection was applied to these biomarker candidates on the basis of protein identification in each of the 3 proximal fluid tumor samples and complete absence from the 3 normal colon samples, leaving 58 candidates. Proteins belonging to pathways that are generally involved in diverse pathologic conditions such as “acute phase response signaling,” the “coagulation system,” and the “complement system” (Supplementary Table S2) were also excluded, leaving 54 candidates. Of these, 2 different protein IDs referred to one gene (Lmna) and for one protein, a human homolog was not known (Ngp). Altogether, application of these stringent biomarker selection criteria yielded a list of 52 highly promising candidate protein biomarkers for early detection of CRC (Table 1).
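The quantitative part of this selection can be sketched in Python as a simple filter. The record layout and field names below are hypothetical, chosen only for this illustration, and the pathway-based exclusion and homolog checks described above are not modeled.

```python
import math

# A 5-fold difference corresponds to an RSC (log2 ratio) of about 2.32.
FOLD_THRESHOLD = math.log2(5)

def is_stringent_candidate(protein, alpha=0.05):
    """Apply the fold-change, significance, and presence/absence criteria.

    `protein` is a dict with hypothetical keys:
      rsc            log2 ratio of spectral counts (tumor vs. control)
      p_value        P value from the statistical test
      tumor_counts   spectral counts in the 3 tumor proximal fluid samples
      normal_counts  spectral counts in the 3 normal colon samples
    """
    return (
        protein["rsc"] > FOLD_THRESHOLD
        and protein["p_value"] < alpha
        and all(count > 0 for count in protein["tumor_counts"])
        and all(count == 0 for count in protein["normal_counts"])
    )
```

A protein detected in all tumor samples but with even a single spectrum in one normal sample fails the filter, which is what makes this step more stringent than the fold-change threshold alone.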
Accession number | Mus musculus gene symbol | Human homolog gene symbol | Gene description | RSC value (a) | P | BH-corrected P (b) | Adenoma vs. normal (mRNA) (c) |
---|---|---|---|---|---|---|---|
IPI00350772 | Apob | APOB | Apolipoprotein B | 8.49 | 0.00009 | 0.021 | — |
IPI00117914 | Arg1 | ARG1 | Arginase, liver | 6.06 | 0.00005 | 0.019 | ▾ |
IPI00314783 | Avil | AVIL | Advillin | 6.32 | 0.00006 | 0.019 | ▾ |
IPI00123194 | Bgn | BGN | Biglycan | 4.81 | 0.00082 | 0.024 | — |
IPI00387337 | Bzw2 | BZW2 | Basic leucine zipper and W2 domains 2 | 4.65 | 0.00041 | 0.023 | ▴ |
IPI00757359 | Caprin1 | CAPRIN1 | Cell-cycle–associated protein 1 | 5.82 | 0.00007 | 0.019 | ▴ |
IPI00308990 | Cd14 | CD14 | CD14 molecule | 4.78 | 0.00041 | 0.023 | ▾ |
IPI00138180 | Cdh5 | CDH5 | Cadherin 5, type 2 | 5.28 | 0.00033 | 0.023 | ▾ |
IPI00756207 | Cgn | CGN | Cingulin | 4.5 | 0.00051 | 0.023 | ▾ |
IPI00277478 | Chi3l1 | CHI3L1 | Chitinase 3-like 1 | 5.91 | 0.00064 | 0.023 | ▴ |
IPI00329872 | Col1a1 | COL1A1 | Collagen, type I, alpha 1 | 4.39 | 0.00105 | 0.025 | — |
IPI00121430 | Col12a1 | COL12A1 | Collagen, type XII, alpha 1 | 8.23 | 0.00026 | 0.023 | ▴ |
IPI00131476 | Col18a1 | COL18A1 | Collagen, type XVIII, alpha 1 | 5.93 | 0.00024 | 0.023 | ▾ |
IPI00123196 | Dcn | DCN | Decorin | 5.37 | 0.00047 | 0.023 | ▾ |
IPI00623114 | Fat1 | FAT1 | FAT tumor suppressor homolog 1 | 6.14 | 0.00055 | 0.023 | ▴ |
IPI00119581 | Fbl | FBL | Fibrillarin | 5.19 | 0.00061 | 0.023 | ▴ |
IPI00130095 | G3bp1 | G3BP1 | GTPase-activating protein (SH3 domain) binding protein 1 | 5.42 | 0.00028 | 0.023 | ▴ |
IPI00222208 | Hnrnpul2 | HNRNPUL2 | Heterogeneous nuclear ribonucleoprotein U-like 2 | 5.19 | 0.00025 | 0.023 | — |
IPI00120257 | 1500019G21Rik | HSPBP1 | Heat shock 70 kDa–binding protein, cytoplasmic cochaperone 1 | 4.9 | 0.00048 | 0.023 | ▴ |
IPI00113726 | Lama1 | LAMA1 | Laminin, alpha 1 | 5.49 | 0.00025 | 0.023 | ▾ |
IPI00230435 | Lmna | LMNA | Lamin A/C | 6.75 | 0.00086 | 0.024 | ▴ |
IPI00400300 | Lmna | LMNA | Lamin A/C | 5.01 | 0.00064 | 0.023 | |
IPI00134607 | EG243642 | LOC645018 | Ribosomal protein S2 pseudogene 20 | 4.83 | 0.00078 | 0.024 | n.a. |
IPI00107952 | Lyz2 | LYZ | Lysozyme | 4.39 | 0.00105 | 0.025 | ▴ |
IPI00108338 | Mcm3 | MCM3 | Minichromosome maintenance complex component 3 | 5.81 | 0.00012 | 0.023 | ▴ |
IPI00117016 | Mcm4 | MCM4 | Minichromosome maintenance complex component 4 | 6.19 | 0.00007 | 0.019 | ▴ |
IPI00319200 | Mmp9 | MMP9 | Matrix metallopeptidase 9 | 7.87 | 0.00139 | 0.029 | — |
IPI00132578 | Mrto4 | MRTO4 | mRNA turnover 4 homolog | 4.93 | 0.00115 | 0.026 | ▴ |
IPI00120066 | Prom1 | PROM1 | Prominin 1 | 4.42 | 0.00232 | 0.04 | — |
IPI00337844 | Ranbp2 | RANBP2 | RAN-binding protein 2 | 4.97 | 0.00028 | 0.023 | ▴ |
IPI00467338 | Rangap1 | RANGAP1 | Ran GTPase–activating protein 1 | 4.88 | 0.0004 | 0.023 | ▴ |
IPI00133185 | Rpl14 | RPL14 | Ribosomal protein L14 | 4.39 | 0.00198 | 0.036 | ▴ |
IPI00222546 | Rpl22 | RPL22 | Ribosomal protein L22 | 4.39 | 0.00105 | 0.025 | ▴ |
IPI00122421 | Rpl27 | RPL27 | Ribosomal protein L27 | 4.89 | 0.00026 | 0.023 | ▴ |
IPI00420726 | Rps9 | RPS9 | Ribosomal protein S9 | 5.14 | 0.00085 | 0.024 | ▴ |
IPI00315127 | Rrm1 | RRM1 | Ribonucleotide reductase M1 | 5.65 | 0.00093 | 0.024 | ▴ |
IPI00222556 | S100a9 | S100A9 | S100 calcium–binding protein A9 | 5.86 | 0.00099 | 0.025 | ▴ |
IPI00315280 | Sema7a | SEMA7A | Semaphorin 7A, GPI membrane anchor | 4.2 | 0.00113 | 0.026 | — |
IPI00459636 | Sf3b1 | SF3B1 | Splicing factor 3b, subunit 1 | 4.88 | 0.00032 | 0.023 | ▴ |
IPI00349401 | Sf3b2 | SF3B2 | Splicing factor 3b, subunit 2 | 4.98 | 0.0005 | 0.023 | ▴ |
IPI00606586 | Smc2 | SMC2 | Structural maintenance of chromosomes 2 | 4.5 | 0.00051 | 0.023 | ▴ |
IPI00137433 | Smchd1 | SMCHD1 | Structural maintenance of chromosomes flexible hinge domain containing 1 | 4.31 | 0.00136 | 0.029 | ▾ |
IPI00170008 | Snrpa1 | SNRPA1 | Small nuclear ribonucleoprotein polypeptide A' | 5.35 | 0.00023 | 0.023 | ▴ |
IPI00322749 | Snrpd1 | SNRPD1 | Small nuclear ribonucleoprotein D1 polypeptide | 4.39 | 0.00105 | 0.025 | ▴ |
IPI00310907 | Spon1 | SPON1 | Spondin 1 | 5 | 0.00023 | 0.023 | ▾ |
IPI00134344 | Spnb3 | SPTBN2 | Spectrin, beta, non-erythrocytic 2 | 5.13 | 0.00032 | 0.023 | — |
IPI00461781 | Stat1 | STAT1 | Signal transducer and activator of transcription 1 | 4.24 | 0.00131 | 0.028 | — |
IPI00126338 | Tmpo | TMPO | Thymopoietin | 4.86 | 0.00064 | 0.023 | ▴ |
IPI00122223 | Top2a | TOP2A | Topoisomerase (DNA) II alpha | 6.31 | 0.00005 | 0.019 | ▴ |
IPI00130734 | Tyms | TYMS | Thymidylate synthetase | 4.21 | 0.00117 | 0.027 | ▴ |
IPI00172312 | Vill | VILL | Villin-like | 5.43 | 0.00052 | 0.023 | ▾ |
IPI00139957 | Wdr5 | WDR5 | WD repeat domain 5 | 4.66 | 0.00081 | 0.024 | ▴ |
IPI00622283 | Xpo5 | XPO5 | Exportin 5 | 4.52 | 0.00058 | 0.023 | ▴ |
Abbreviations: ▾, downregulated; n.a., not available; —, no significant difference; ▴, upregulated.
(a) RSC value is the log2 ratio of spectral counts (tumors compared with controls).
(b) BH-corrected P value is the P value adjusted for multiple testing (Benjamini–Hochberg).
(c) Differential mRNA expression analysis of 32 human colorectal adenomas compared with patient-matched normal mucosa tissues [data set GSE8671; Sabates-Bellver and colleagues (ref. 30)]. Significant differences based on false discovery rate (FDR) < 0.05, adjusted for multiple testing (Benjamini–Hochberg).
Verification of MCM4 and S100A9 tissue expression
The list of 52 most promising candidate CRC biomarkers included several proteins that have been described as potential biomarkers for CRC screening. MCM4 belongs to the minichromosome maintenance complex that consists of 6 different MCM proteins (MCM2–7), which have been proposed as potential biomarkers for stool-based detection of CRC (26). S100A9 (calgranulin B) has been described as a serum-based as well as a stool-based candidate biomarker for CRC (27, 28). Immunohistochemical stainings were conducted for MCM4 and S100A9 to verify their protein expression within the normal colon mucosa and colon tumor tissues from which the proximal fluids were collected (Fig. 3). MCM4 exhibited strong staining of nuclei of (proliferating) epithelial cells in the lower part of the crypts of normal colon mucosa (Fig. 3F). Within tumors, the vast majority of neoplastic epithelial cells stained positive for MCM4 (Fig. 3D and E). S100A9 exhibited strong staining of nonepithelial cells, presumably myeloid cells, within the tumor stroma (Fig. 3H). Little to no staining was observed for S100A9 in normal mouse colon mucosa (Fig. 3I). These data verify differential tissue expression of proteins in normal colon and colon tumors that were identified by proteomics analysis of tissue proximal fluids.
Expression of mouse-derived candidate CRC biomarkers by human colon adenomas
Increased excretion of protein candidate biomarkers from tumor tissues compared with normal colon tissues can be caused by transcriptomics-dependent and -independent molecular mechanisms. We expected that at least a subset of the list of 52 most promising candidate CRC biomarkers would be regulated at the mRNA level during early stages of colon tumor development and examined their expression in a series of 32 human colorectal adenomas and patient-matched normal mucosa samples making use of a data set retrieved from the Gene Expression Omnibus (GEO) database [www.ncbi.nlm.nih.gov/geo/; ref. (29); data set GSE8671 (30)]. Differential expression analysis using GenePattern (31) revealed that 31 of 51 candidates for which data were available were expressed at significantly higher levels in tumor samples than in control samples (Table 1). Consequently, hierarchical clustering on the basis of mRNA expression of the human homologs of the mouse colon tumor protein biomarker candidates nearly completely separated the human colorectal adenoma samples from the normal mucosa samples (Supplementary Fig. S2). These data indicate that the majority of mouse-derived candidate CRC protein biomarkers were regulated at the mRNA expression level during the early stages of human colon tumor development and verify their potential as candidate biomarkers for diagnosis of early stages of human colon tumorigenesis.
CHI3L1 and CEA serum levels in patients with (advanced) adenomas
CHI3L1 (also known as YKL-40) is one of the highly promising candidate biomarkers for early detection of CRC (Table 1). CHI3L1 has been described as a candidate CRC biomarker, for which increased serum levels have been associated with poor survival (32, 33). Considering that the mouse model represents early rather than late stages of colon tumor development, combined with the observation that CHI3L1 mRNA levels were increased in human adenomas when compared with normal colon tissue (Table 1), we investigated CHI3L1 protein levels in human sera from control subjects and patients with colon tumors (Supplementary Table S1). CHI3L1 protein levels were significantly increased in sera from patients with colorectal adenomas (P < 0.05), advanced adenomas (P < 0.001), and CRC (P < 0.01) compared with control subjects, with median CHI3L1 levels of 99.6, 141.2, and 215.7 ng/mL, respectively, versus 68.4 ng/mL for control subjects (Fig. 4A). In contrast, the CRC biomarker CEA was not increased in sera from patients with CRC precursor lesions (adenomas and advanced adenomas), whereas its level was significantly elevated in sera from patients with CRC (P < 0.001), with median CEA levels of 1.60, 1.30, and 3.60 ng/mL, respectively, versus 1.15 ng/mL for control subjects (Fig. 4B). The sensitivity of CHI3L1 for adenomas, advanced adenomas, and CRC was 25%, 55%, and 75%, respectively, with a specificity of 89% (cutoff value for CHI3L1 at 90th percentile of control subjects, at 129 ng/mL). The sensitivity of CEA for adenomas, advanced adenomas, and CRC was only 5%, 5%, and 37.5%, respectively, with a specificity of 100% (cutoff value for CEA at 5 ng/mL).
Receiver operating characteristic (ROC) curves for advanced adenomas and for CRC versus control subjects illustrate that CHI3L1 is superior to CEA for the detection of advanced adenomas, with area under the ROC curve (AUC) values of 0.79 and 0.60 for CHI3L1 and CEA, respectively, whereas CEA tends to be a better marker for the detection of patients with CRC, with AUC values of 0.81 and 0.86 for CHI3L1 and CEA, respectively (Supplementary Fig. S3). These data lend further support to the notion that our strategy resulted in identification of candidate biomarkers for early diagnosis of CRC.
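As an illustration of how the test characteristics reported above are derived, the following minimal Python sketch computes a percentile-based cutoff from control values, the resulting sensitivity and specificity, and a rank-based AUC. It is not the analysis code used in this study: the function names, the linear-interpolation percentile rule, and the serum values are assumptions chosen for illustration, and the numbers are synthetic rather than patient data.

```python
def percentile(values, q):
    """Percentile with linear interpolation between ranks (q in [0, 100]).
    Assumption: the interpolation rule used in the study is not stated."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def sensitivity(cases, cutoff):
    """Fraction of cases whose marker level exceeds the cutoff."""
    return sum(x > cutoff for x in cases) / len(cases)


def specificity(controls, cutoff):
    """Fraction of controls whose marker level does not exceed the cutoff."""
    return sum(x <= cutoff for x in controls) / len(controls)


def auc(cases, controls):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen case has a higher level than a randomly chosen control
    (ties count as one half)."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Synthetic serum levels in ng/mL (hypothetical, NOT data from this study).
controls = [40, 50, 55, 60, 65, 70, 75, 80, 100, 140]
cases = [60, 90, 110, 150, 200, 120, 95, 300]

cutoff = percentile(controls, 90)  # cutoff at the 90th percentile of controls
print("cutoff:", cutoff)
print("sensitivity:", sensitivity(cases, cutoff))
print("specificity:", specificity(controls, cutoff))
print("AUC:", auc(cases, controls))
```

Fixing the cutoff on the control distribution sets the specificity in advance and leaves sensitivity as the quantity to compare between lesion types, whereas the AUC summarizes performance across all possible cutoffs.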
Discussion
The present study aimed to identify novel protein biomarkers for early diagnosis of CRC. Proteomics-based biomarker discovery using human biofluids such as blood is challenging due to the influence of several major confounding factors, in particular, sample complexity and interindividual sample heterogeneity. By conducting in-depth proteomics analysis of proximal fluids obtained from a mouse tumor model for sporadic CRC, the influence of confounding factors was strongly reduced, thereby increasing the “signal-to-noise ratio” for protein biomarker discovery. We here report identification of 192 CRC candidate biomarkers, that is, proteins that were significantly (P < 0.05) and abundantly (>5-fold) excreted by tumors compared with controls, thereby generating one of the largest colon cancer protein biomarker data sets to date (10). Application of more stringent selection criteria to enrich for candidates with highest specificity and sensitivity revealed 52 biomarker candidates for early detection of CRC (Table 1). The potential relevance of this mouse-derived protein biomarker data set for early diagnosis of human CRC was underscored by several observations. First, at least 6 of the 52 candidate biomarkers have been proposed as biomarkers for stool- or serum-based human CRC screening or surveillance, that is, the MCM proteins MCM3 and MCM4, S100A9, CHI3L1, arginase I, and matrix metalloproteinase (MMP)9 (26–28, 32–36). Second, mRNA expression of the human homologs of these candidates clustered the majority of 32 colorectal adenoma samples together, separate from the patient-matched normal mucosa control samples (Supplementary Fig. S2). Third, we showed that CHI3L1 protein levels were increased in sera from patients with adenomas and advanced adenomas, that is, CRC precursor lesions, whereas CEA levels were not. These data exemplify the potential use of mouse-derived candidate CRC biomarkers for early diagnosis of human CRC and support the validity of our approach.
Immunohistochemical staining was conducted for 2 candidate CRC biomarkers, S100A9 and MCM4, to verify whether the differences in protein abundance in proximal fluids, as measured by mass spectrometry, were mirrored by differences in protein expression levels in normal colon and colon tumor tissues from which these proximal fluids were obtained (Fig. 3). Positive staining for S100A9 was observed in nonneoplastic cells in the tumor stroma, probably of myeloid origin (Fig. 3H), whereas hardly any staining was observed for S100A9 in normal colon tissue (Fig. 3I). Similar differences in S100A9 staining patterns have been observed between human normal colon and colon tumors (27), indicating large quantitative variation for this protein due to the presence of tumor-infiltrating leukocytes. Tumor-induced upregulation of S100A9 is known to lead to accumulation of myeloid-derived suppressor cells and contributes to suppression of the antitumor immune response (37). It is likely that the candidate CRC biomarker list contains more examples of proteins that originate from nonneoplastic cells such as fibroblasts, immune cells, and endothelial cells, whose biologic properties have been altered by their presence in the tumor microenvironment. For instance, arginase I is typically expressed by tumor-associated myeloid cells with immunosuppressive properties (38). In accordance with these data, neither S100A9 nor arginase I was identified in the secretome of the human epithelial CRC cell line HT29 (Supplementary Table S2).
Immunohistochemical staining for MCM4 revealed its abundant expression by both mouse colon tumor tissue and normal colon tissue. However, whereas MCM4 expression in normal colon was restricted to nuclei of epithelial cells in the lower half of the crypts comprising the proliferative compartment (Fig. 3F), MCM4 was expressed by the vast majority of neoplastic cells (Fig. 3D and E). Similar staining patterns were observed for human normal colon mucosa and CRC samples for all members of the MCM complex (MCM2–7), for instance, as shown by the Human Protein Atlas (http://www.proteinatlas.org/; ref. 39). Interestingly, all MCM proteins except MCM6 were included in the list of 192 CRC candidate biomarkers with significant (P < 0.05) and abundant (>5-fold) excretion into proximal fluids from tumor tissues compared with control tissues, whereas MCM6 just barely failed to pass these selection criteria (P < 0.05, and >4-fold excretion by tumors). Clearly, the abundant expression of MCM proteins by normal colon tissues does not lead to high levels of protein excretion into proximal fluids, indicating that there is not necessarily a straightforward correlation between the amount of tissue expression of a protein and its abundance in proximal fluids. These data suggest that MCM proteins may be excreted by tumor tissues through a molecular mechanism that is more active in neoplastic cells than in normal cells.
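The dual selection criterion referred to above (P < 0.05 and >5-fold higher excretion by tumors) can be sketched as a simple filter, which also shows how a protein that is significant but only about 4-fold enriched narrowly misses the list. This is a hypothetical Python illustration, not the pipeline used in this study: the class, the function, the protein names, and all values are invented placeholders.

```python
from dataclasses import dataclass


@dataclass
class ProteinResult:
    name: str           # placeholder identifier, not a real protein
    p_value: float      # significance of the tumor-versus-control difference
    fold_change: float  # tumor abundance divided by control abundance


def passes_selection(result, max_p=0.05, min_fold=5.0):
    """True only if the protein is both significantly and abundantly
    more excreted by tumors; both conditions must hold."""
    return result.p_value < max_p and result.fold_change > min_fold


# Hypothetical results (invented values, NOT measurements from this study).
results = [
    ProteinResult("protein_A", p_value=0.01, fold_change=8.0),   # passes both
    ProteinResult("protein_B", p_value=0.03, fold_change=4.5),   # fold change too low
    ProteinResult("protein_C", p_value=0.20, fold_change=12.0),  # not significant
]

selected = [r.name for r in results if passes_selection(r)]
print(selected)
```

Requiring both conditions favors candidates with a large effect size, at the cost of excluding proteins such as the hypothetical protein_B that differ significantly but fall just below the fold-change threshold.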
Besides MCM proteins, surprisingly many other nonclassically secreted proteins were identified in colon (tumor) proximal fluids. Although the computational tool SecretomeP predicted that about 87% of proximal fluid proteins may have the potential to be secreted, MCM2–5 and MCM7 did not pass the SecretomeP NN-score threshold of 0.5 (Supplementary Table S2). Alternatively, we hypothesized that proteins might be excreted through microvesicular transport because tumor cells are known to secrete microvesicles at an increased rate (25). Comparison of the mouse colon (tumor) proximal fluid proteome to a list of microvesicle-associated and soluble secreted proteins shed from the human CRC cell line HT29 revealed that all MCM proteins (MCM2–7) could be detected in the microvesicle fraction. Similar observations were made for other nuclear CRC candidate biomarkers, such as topoisomerase 2A (TOP2A) and lamin A/C (LMNA; Supplementary Table S2). Collectively, these data support the notion that many nonclassically secreted proteins actually do have the potential to be excreted into proximal fluids and subsequently biofluids such as blood and stool and therefore should be considered candidate targets for development of diagnostic tests. Further research is required to examine the exact molecular mechanisms through which each of these proteins is being excreted.
Several other proteins within the top-candidate biomarker list have been linked to CRC carcinogenesis in various ways. Decorin (DCN) has been described as a colon tumor suppressor gene (40, 41). Although its mRNA expression is downregulated in colorectal adenomas compared with normal mucosa (Table 1), its mRNA expression is known to be significantly increased during adenoma-to-carcinoma progression (42). Likewise, mRNA expression levels of biglycan (BGN) and the collagens COL1A1 and COL18A1 are significantly increased during adenoma-to-carcinoma progression (42). Prominin-1 (PROM1, also known as CD133) has been studied extensively as a marker for colon cancer–initiating cells and has prognostic value to predict patient survival (43, 44). Lamin A/C (LMNA) is a nuclear envelope protein that has been described as a risk biomarker for CRC (45, 46). TOP2A interacts with the β-catenin/TCF-4 nuclear complex of the Wnt signaling pathway (47) and can be targeted by chemotherapeutic drugs such as etoposide and doxorubicin. Thymidylate synthetase (TYMS) is considered to be the primary site of action of the commonly used chemotherapeutic drug 5-fluorouracil, and ribonucleotide reductase M1 (RRM1) can be targeted by gemcitabine. These data suggest that some of the candidate biomarkers for early diagnosis identified in this study may also be applied as prognostic biomarkers or predictive biomarkers.
Although the strategy we applied to discover novel biomarkers for early diagnosis of CRC seems valuable, the study design is accompanied by several limitations. For instance, because we made use of a mouse colon tumor model for biomarker discovery to reduce sample heterogeneity and molecular diversity of the tumors, the mouse model is unlikely to represent the extensive tumor heterogeneity observed among patients with CRC. Consequently, it remains to be determined to what extent the candidate biomarkers can be used to identify molecularly heterogeneous colon tumors in humans. For candidate biomarker verification, we made use of ELISA because this technique allows detection of low concentrations of proteins in human serum samples. To the best of our knowledge, the commercially available ELISAs for candidate biomarkers have all been used to some extent to measure protein levels in patients with CRC, leaving none of the candidates that could readily be verified as truly novel CRC biomarkers. Instead of focusing on patients with CRC, emphasis was put on the analysis of sera from patients with early-stage disease, that is, colon adenomas and advanced adenomas. These sera, however, were collected in a diagnostic setting (Supplementary Table S1), which does not reflect a screening population. Moreover, expression levels of the marker that was verified, CHI3L1, are known to be increased in several types of cancer and during inflammation (32, 33), which limits its potential use as a highly specific marker for early diagnosis of CRC. As such, CHI3L1 and other candidate biomarkers still await thorough validation before they can be considered valid biomarkers for CRC screening.
In conclusion, this study illustrated that comparative analysis of proximal fluid proteome profiles obtained from mouse tumor and control tissues is a powerful strategy to discover novel candidate biomarkers by examination of relatively few biologic samples. We acquired a list of promising mouse-derived candidate biomarkers that appears highly relevant to human colon tumor biology. This list of candidate biomarkers can function as a “frame of reference” to facilitate candidate selection for further biomarker validation studies in humans. Emerging technologies such as selected reaction monitoring (SRM) mass spectrometry allow targeted detection of tens to a hundred biomarker candidates simultaneously in an antibody-independent manner, using human biofluids (7). In this way, it will become feasible to investigate which combinations of markers have optimal test performance to develop better tests for early diagnosis of CRC.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Grant Support
The authors acknowledge the financial support for this study provided by an Aegon International Scholarship in Oncology (R.J.A. Fijneman) and by the VUmc–Cancer Center Amsterdam (C.R. Jimenez and T.V. Pham, and proteomics infrastructure).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.