Colorectal cancer is the second most common malignant tumor worldwide. Analysis of the changes that occur during colorectal cancer progression could provide insights into the molecular mechanisms driving colorectal cancer development and identify improved treatment strategies. In this study, we performed an integrated multiomic analysis of 435 trace tumor samples from 148 patients with colorectal cancer, covering nontumor, intraepithelial neoplasia (IEN), infiltration, and advanced stage colorectal cancer phases. Proteogenomic analyses demonstrated that KRAS and BRAF mutations were mutually exclusive and elevated oxidative phosphorylation in the IEN phase. Chr17q loss and chr20q gain were also mutually exclusive, which occurred predominantly in the IEN and infiltration phases, respectively, and impacted the cell cycle. Mutations in TP53 were frequent in the advanced stage colorectal cancer phase and associated with the tumor microenvironment, including increased extracellular matrix rigidity and stromal infiltration. Analysis of the profiles of colorectal cancer based on consensus molecular subtype and colorectal cancer intrinsic subtype classifications revealed the progression paths of each subtype and indicated that microsatellite instability was associated with specific subtype classifications. Additional comparison of molecular characteristics of colorectal cancer based on location showed that ANKRD22 amplification by chr10q23.31 gain enhanced glycolysis in the right-sided colorectal cancer. The AOM/DSS-induced colorectal cancer carcinogenesis mouse model indicated that DDX5 deletion due to chr17q loss promoted colorectal cancer development, consistent with the findings from the patient samples. Collectively, this study provides an informative resource for understanding the driving events of different stages of colorectal cancer and identifying the potential therapeutic targets.

Significance: Characterization of the proteogenomic landscape of colorectal cancer during progression provides a multiomic map detailing the alterations in each stage of carcinogenesis and suggesting potential diagnostic and therapeutic approaches for patients.

Colorectal cancer is a common malignant tumor whose incidence and mortality rank second and third worldwide, respectively (1). It is highly heterogeneous and is characterized by the accumulation of mutations and a complicated course of carcinogenesis (2), which hinders the standardization of surgical techniques and the development of more effective systemic therapies for early-stage and advanced stage disease.

Generally, colorectal cancers evolve through an adenoma–carcinoma sequence (3) and are classified as early-stage colorectal cancers (e.g., T1 stage) and advanced stage colorectal cancers, including the T2, T3, and T4 stages. Specifically, the T1 stage can be divided into the surface epithelial neoplasia stage [named intraepithelial neoplasia (IEN) phase in this study] and the infiltration carcinoma stage [named infiltration (IFT) phase in this study; ref. 4]. According to invasion depth, the IFT phase is subdivided into the lamina propria (LP), muscularis mucosa (MM), and submucosa (SM) stages. Recent advances have enabled early detection of colorectal cancer, with high quality of life and a significantly improved overall survival rate (>90%; ref. 5). Identifying the dominant mutations and key events may help elucidate the molecular mechanisms of colorectal cancer carcinogenesis and guide precision medicine in the clinic. In practice, however, owing to the complexity of colorectal cancer development and the small quantity of tissue available at different stages, it remains a challenge to portray the multiomic molecular landscape and the key transition events from normal or curable disease to progressively fatal disease.

Large-scale cancer sequencing efforts, including The Cancer Genome Atlas (TCGA; ref. 6) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC; ref. 7), have advanced our understanding of colorectal cancer at the gene level, uncovered frequent mutations (e.g., TP53 and KRAS; ref. 8), and proposed several critical pathways in the advanced stage colorectal cancer phase, including cell cycle and glycolysis (9, 10). However, when these mutations first occur and how they affect colorectal cancer progression are poorly understood. In addition, the landscape of genomic aberrations and their consequences for proteomic and phosphoproteomic alterations during colorectal cancer carcinogenesis remain unknown. Based on transcriptomic datasets, the international consortium and CPTAC have established four consensus molecular subtypes (CMS, CMS1/2/3/4; ref. 11) and five colorectal cancer intrinsic subtypes (CRIS, CRIS-A/B/C/D/E; ref. 12), providing deeper insights into colorectal cancer classification in the advanced stage colorectal cancer phase. However, the progression tracks of colorectal cancer based on the CMS and CRIS classifications have not been revealed. Moreover, microsatellite instability (MSI) and microsatellite stability (MSS) have been described in solid tumors, including colorectal cancer (13), but the proteomic features of MSI and MSS colorectal tumors have not yet been characterized.

Based on tumor location, colorectal cancer can be classified as left-sided or right-sided (14), with different embryologic and anatomic features and drug resistance profiles. Nevertheless, the distinct characteristics of left-sided and right-sided colorectal cancer progression are still unclear. Clinically, patients with bilateral tumors (mix group) have a poorer prognosis and a greater likelihood of invasion than patients with left-sided colorectal cancer alone or right-sided colorectal cancer alone (15). However, the features of the colorectal cancer mix group are not well characterized, and the potential invasion mechanism is undefined.

Here, we present proteogenomic patterns of 435 trace samples dissected from 148 patients with colorectal cancer and depict a comprehensive proteogenomic landscape of the entire process of colorectal cancer carcinogenesis. Furthermore, we identify the key transition events in colorectal cancer progression, delineate the diverse progression paths of colorectal cancer based on the CMS and CRIS classifications, illustrate the features of patients with colorectal cancer based on MSI status, define the characteristics associated with colorectal cancer location, and propose a potential strategy for addressing drug resistance in colorectal cancer. This study presents the first proteomic and genomic landscapes of colorectal cancer across all progression stages and should facilitate future basic and translational research on this deadly disease.

Patient samples of the early-stage colorectal cancer cohort

Construction of the early-stage colorectal cancer cohort

In this study, 600 consecutive patients presumed to have colorectal lesions underwent endoscopic submucosal dissection therapy from January 2018 to December 2018 at Zhongshan Hospital, Fudan University. As described in our previous study (16), there were no biases in selecting patients, and none of the patients had received any prior treatment, such as radiotherapy or chemotherapy. One hundred eighteen early colorectal cancer cases were eligible for the intended study cohort. Of the 482 excluded patients, 107 were diagnosed with nontumor (NT) lesions, 65 had stromal tumors, 110 were excluded because their normal tissue samples were unavailable, and 200 failed the pathologic quality check (e.g., tumor cell ratio <80%). Subsequently, 30 advanced stage colorectal cancer cases were selected after surgical resection without neoadjuvant therapy. Therefore, a total of 148 patients with colorectal cancer were selected to construct the colorectal cancer cohort in this study (Supplementary Table S1A). All cases were staged according to the Eighth Edition of the American Joint Committee on Cancer tumor–node–metastasis staging system. The present study complied with the ethical standards of Helsinki Declaration II and was approved by the Institutional Review Board of Fudan University Zhongshan Hospital (B2019-200R). Written informed consent was provided by all patients before any study-specific investigation was performed.

As described in our previous study (17), all samples in our cohort met the following criteria. First, all samples were well preserved and were systematically evaluated according to the World Health Organization classification by more than two expert gastrointestinal pathologists, who confirmed the histopathologic diagnosis and any variant histology and determined the acceptable tissue segments based on tumor content (>95%), the presence and extent of tumor necrosis (<5%), and signs of invasion into the muscularis propria. Second, the following criteria were applied: (i) successful extraction of DNA and protein and (ii) no tumor cells in the normal tissue.

According to the World Health Organization and Japanese pathology diagnostic criteria (18), all the substages in our early-stage colorectal cancer cohort were assigned to five T categories: T0 (normal epithelial, n = 166), T1 (T1a/b cancer, n = 239), T2 (n = 10), T3 (n = 10), and T4 (n = 10; Supplementary Table S1B). T1 was subclassified as low-grade IEN [LGIN; 2_1 (n = 37), 2_2 (n = 29), 2_3 (n = 20), 2_4 (n = 22), 2_5 (n = 4), and 2_6 (n = 1)], high-grade IEN [HGIN; 3_1 (n = 20), 3_2 (n = 21), 3_3 (n = 20), and 3_4 (n = 16)], LP stage [4_1 (n = 2), 4_2 (n = 18), and 4_3 (n = 1)], MM stage [5_1 (n = 1), 5_2 (n = 11), 5_3 (n = 1), and 5_4 (n = 1)], and submucosal invasion cancer stages, namely SMIA [6 (n = 11)] and SMIB stages [7_1 (n = 1), 7_2 (n = 1), and 7_3 (n = 1); Supplementary Table S1C]. All samples were assigned to four phases: NT phase, IEN phase (including LGIN and HGIN stages), IFT phase (ranging from LP to SMIB stages), and advanced stage colorectal cancer phase (ranging from T2 to T4 stages).
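The stage-to-phase assignment and the reported sample counts can be cross-checked with a short sketch. The dictionary names and the phase label "advanced" are illustrative choices, not identifiers from the study; the counts are taken from the text above.

```python
# Sample counts per stage as reported in the text (Supplementary Tables S1B/S1C).
STAGE_COUNTS = {
    "NT": 166,
    "LGIN": 37 + 29 + 20 + 22 + 4 + 1,
    "HGIN": 20 + 21 + 20 + 16,
    "LP": 2 + 18 + 1,
    "MM": 1 + 11 + 1 + 1,
    "SMIA": 11,
    "SMIB": 1 + 1 + 1,
    "T2": 10,
    "T3": 10,
    "T4": 10,
}

# Stage -> phase assignment described in the text ("advanced" is a label chosen here).
PHASE_OF_STAGE = {
    "NT": "NT",
    "LGIN": "IEN", "HGIN": "IEN",
    "LP": "IFT", "MM": "IFT", "SMIA": "IFT", "SMIB": "IFT",
    "T2": "advanced", "T3": "advanced", "T4": "advanced",
}


def phase_counts():
    """Aggregate the per-stage sample counts into the four phases."""
    totals = {}
    for stage, n in STAGE_COUNTS.items():
        phase = PHASE_OF_STAGE[stage]
        totals[phase] = totals.get(phase, 0) + n
    return totals


if __name__ == "__main__":
    print(phase_counts())
```

Under these counts, the IEN and IFT phases together account for the 239 T1 samples, and all four phases sum to the 435 samples of the cohort.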

Formalin-fixed, paraffin-embedded specimen sampling and processing

All formalin-fixed, paraffin-embedded (FFPE) specimens were prepared and provided by Zhongshan Hospital, Fudan University. As described in our previous studies (16, 17), for clinical sample preparation, slides (10 μm thick) from FFPE blocks were macrodissected, deparaffinized with xylene, and washed with ethanol. FFPE blocks were cut into 3-μm-thick sections for hematoxylin and eosin staining. All substage specimens were scraped, evaluated, and confirmed by more than two experienced and board-certified gastrointestinal pathologists, and materials were aliquoted and stored at –80°C until further processing.

Samples for left-sided colorectal cancer and right-sided colorectal cancer

Based on the location, patients with colorectal cancer were divided into three groups: left-sided colorectal cancer alone group, who had only the left-sided colorectal cancer; right-sided colorectal cancer alone group, who had only the right-sided colorectal cancer; and the mix group, who had both left-sided colorectal cancer and right-sided colorectal cancer. In the main cohort, the mix group had left-sided and right-sided colorectal cancer on the record, and lesion from only one location (left-sided or right-sided) was collected in this study. The normal tissues were collected from the corresponding side of the tumor location. In the validation cohort, the mix group had both left-sided and right-sided colorectal cancer, and the tumor tissues and corresponding normal tissues were collected. All colorectal cancer samples (n = 60) in the validation cohort were dissected into 3-mm-thick sections and were then marked in the hematoxylin and eosin–stained sections. All normal/tumor samples were separately dissected from the FFPE slides and were evaluated by more than two experienced gastrointestinal pathologists.

Whole-exome sequencing

Whole-exome sequencing (WES) was performed by Novogene Co., Ltd. As described in our previous studies (16, 17), DNA from FFPE tumor tissue samples was collected, and matched germline DNA was obtained from NT tissue samples. One hundred six samples from 37 cases were analyzed by WES, and the methodologic details were provided by Novogene Co., Ltd. The resulting sequence libraries (the paired-end sequence and insert DNA between two ends) were quantified using Qubit 2.0 (Thermo Fisher Scientific), and the insert size was determined using an Agilent 2100 Bioanalyzer (RRID: SCR_018043). Base calling was used to obtain the raw data (sequenced reads) from the primary image data.

DNA extraction and DNA qualification

All samples were first dewaxed with dimethylbenzene, and DNA degradation and contamination were then monitored on 1% agarose gels. Subsequently, the DNA concentration was measured using the Qubit DNA assay on a Qubit 2.0 fluorometer (Invitrogen). At least 0.6 μg of genomic DNA per sample was used as input for DNA sample preparation.

Library preparation

The methodologic details were provided by Novogene Co., Ltd. and were also described in our previous studies (16, 17). A total of 0.6 μg genomic DNA per sample was used as input for DNA sample preparation. Sequencing libraries were generated using the Agilent SureSelect Human All Exon kit (Agilent Technologies) following the manufacturer’s recommendations, and index codes were added to each sample.

Fragmentation was carried out using the hydrodynamic shearing system (Covaris) to randomly generate 180 to 280 bp fragments. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of the 3′-ends of DNA fragments, adapter oligonucleotides were ligated. DNA fragments with ligated adapter molecules on both ends were selectively enriched by PCR. After PCR, libraries were hybridized in the liquid phase with biotin-labeled probes, and magnetic beads coated with streptavidin were then used to capture the exons of genes. Captured libraries were enriched by PCR to add index tags in preparation for sequencing. Products were purified using an AMPure XP system (Beckman Coulter) and quantified using the Agilent high-sensitivity DNA assay on the Agilent Bioanalyzer 2100 system.

The clustering of the index-coded samples was performed on a cBot cluster generation system using HiSeq PE cluster kit (Illumina) according to the manufacturer’s instructions. After cluster generation, the DNA libraries were sequenced on an Illumina HiSeq platform, and 150-bp paired-end reads were generated.

Quality control of data processing and analysis

The methodologic details were provided by Novogene Co., Ltd. and were also described in our previous studies (16, 17). Paired-end sequencing (PE150) was performed on Illumina HiSeq (Illumina NovaSeq 6000). The original fluorescence image files obtained from the HiSeq platform were transformed to short reads (raw data) by base calling, and these short reads were recorded in FASTQ format, which contains sequence information and the corresponding sequencing quality information.

Quality control:

  • (i) Discard a read pair if either read contains adapter contamination (>10 nucleotides aligned to the adapter, allowing ≤10% mismatches);

  • (ii) Discard a read pair if more than 10% of bases are uncertain in either read;

  • (iii) Discard a read pair if the proportion of low-quality (Phred quality < 5) bases is more than 50% in either read.
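The three filtering rules can be sketched as a small Python function. This is an illustration under stated assumptions, not the vendor's pipeline: adapter detection (rule i) is assumed to have been done upstream and is passed in as a hypothetical boolean flag, uncertain bases are taken to be `N` calls, and all function and parameter names are invented here.

```python
def read_fails_qc(seq, quals, has_adapter=False,
                  max_n_frac=0.10, low_q=5, max_low_q_frac=0.50):
    """Return True if a single read violates any of the three rules.

    seq         -- base calls as a string
    quals       -- Phred quality scores, one integer per base
    has_adapter -- assumed result of an upstream adapter search (rule i)
    """
    if has_adapter:                       # rule (i)
        return True
    n = len(seq)
    if n == 0:
        return True
    n_frac = sum(1 for base in seq if base in "Nn") / n
    if n_frac > max_n_frac:               # rule (ii): >10% uncertain bases
        return True
    low_frac = sum(1 for q in quals if q < low_q) / n
    return low_frac > max_low_q_frac      # rule (iii): >50% bases with Phred < 5


def keep_pair(read1, read2):
    """A pair is kept only if neither mate fails; reads are (seq, quals[, has_adapter])."""
    return not (read_fails_qc(*read1) or read_fails_qc(*read2))


if __name__ == "__main__":
    good = ("ACGT" * 5, [30] * 20)
    noisy = ("A" * 20, [2] * 11 + [30] * 9)
    print(keep_pair(good, good), keep_pair(good, noisy))
```

Because the rules apply to either mate, one failing read removes the whole pair, which keeps the two FASTQ files synchronized.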

All downstream bioinformatic analyses were based on the retained high-quality clean data. At the same time, quality control statistics, including total read number, raw data, raw depth, sequencing error rate, percentage of reads with Q30 (the percentage of bases with a Phred-scaled quality score >30), and GC content distribution, were calculated and summarized.

Read mapping to the reference sequence

The methodologic details were provided by Novogene Co., Ltd. and were also described in our previous studies (16, 17). Valid sequencing data were mapped to the reference human genome (UCSC hg19) using Burrows–Wheeler Aligner (BWA) software (RRID: SCR_010910) to obtain the original mapping results stored in BAM format. If a read or read pair mapped to multiple positions, BWA chose the most likely placement; if two or more placements were equally likely, BWA picked one at random. SAMtools (RRID: SCR_002105) and Picard (http://broadinstitute.github.io/picard/, RRID: SCR_006525) were then used to sort the BAM files, and duplicate marking, local realignment, and base quality recalibration were performed to generate the final BAM file for computation of sequence coverage and depth. Mapping is complicated by mismatches, which include true mutations and sequencing errors, and by duplicates resulting from PCR amplification. Duplicate reads are uninformative and should not be considered evidence for variants, so we used Picard to mark them for follow-up analysis.

Detection and calling of somatic mutations

The methodologic details were provided by Novogene Co., Ltd. and were also described in our previous studies (16, 17). BWA and Samblaster were used for genome alignment, MuTect software (RRID: SCR_000559) was applied to call somatic single-nucleotide variants, and Strelka was used to detect somatic insertions and deletions (indels). Statistics used in the article include moderated t-statistics and the Fisher exact test.

Gain of neomutations

To investigate mutations at all stages of colorectal cancer carcinogenesis, we counted the mutations in each stage and then focused on the gain of neomutations, as described in our previous studies (16, 17). In our study, a neomutation was defined as a mutation that first occurred at a certain stage and was absent from all earlier stages. For example, if AFP was not mutated in the LGIN stage but was mutated in the HGIN stage, AFP was a neomutation of the HGIN stage. The neomutations at a certain stage could also reflect the functions of specific mutations in colorectal cancer progression.
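The neomutation definition amounts to a running set difference along the ordered stages. The sketch below assumes that mutated genes have already been collected per stage; the function name, the input layout, and the use of APC as a second gene are illustrative assumptions rather than details from the study.

```python
# Stage order used in this study, from earliest to latest tumor stage.
STAGE_ORDER = ["LGIN", "HGIN", "LP", "MM", "SMIA", "SMIB", "T2", "T3", "T4"]


def neomutations(mutated_by_stage, stage_order=STAGE_ORDER):
    """Return, for each stage, the genes mutated there but in no earlier stage.

    mutated_by_stage -- dict mapping a stage name to an iterable of mutated genes
    """
    seen = set()      # genes already mutated at an earlier stage
    result = {}
    for stage in stage_order:
        genes = set(mutated_by_stage.get(stage, ()))
        result[stage] = genes - seen
        seen |= genes
    return result


if __name__ == "__main__":
    # Hypothetical input mirroring the AFP example in the text.
    example = {"LGIN": {"APC"}, "HGIN": {"APC", "AFP"}}
    print(neomutations(example)["HGIN"])
```

In this toy input, AFP is reported as a neomutation of the HGIN stage because it is absent from LGIN, whereas APC is not, because it was already present.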

Impacts of the detected mutations on the protein and phosphoprotein levels

Somatic copy-number alterations and their impacts on proteome

For somatic copy-number alteration (SCNA) analysis, we used the WES-derived BAM files that were processed in the somatic mutation detection pipeline. To investigate the cis-/trans-effects of SCNAs at the chr20q gain and chr17q loss, we focused on the genes detected at both the SCNA and protein levels and calculated Spearman correlation coefficients (FDR < 0.05).
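A minimal sketch of the two numerical ingredients of this step follows: the Spearman coefficient between a gene's copy-number values and its protein abundance, and a multiple-testing adjustment. The text does not name the FDR procedure, so the Benjamini–Hochberg method is assumed here; computing the p-value of each coefficient is left to a statistics library, and all function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)   # undefined if either vector is constant


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):      # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

A gene whose own copy number correlates with its own protein level would count as a cis-effect, while correlations with the levels of other proteins would count as trans-effects.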

Defining cancer-associated genes

Cancer-associated genes (CAG) were compiled from genes defined by Bailey and colleagues (19) and listed by Mertins and colleagues (20). The list of CAGs is provided in Supplementary Table S4A.

Protein extraction and trypsin digestion

As described in our previous studies (16, 17), all samples were dissected with microdissection, collected in 1.5-mL EP tubes, and then stored in a refrigerator at –80°C. The thickness of each FFPE section was 10 μm, and each substage sample contained no more than 10,000 cells.

Fifty microliters of Tris (2-carboxy-ethyl)-phosphin-HCl buffer [2% deoxycholic acid sodium salt (Solarbio, catalog no. D8330), 40 mmol/L 2-chloroacetamide (Aldrich, catalog no. 22790-250G-F), 100 mmol/L Tris-phosphine hydrochloride (Amresco, catalog no. 0497), 10 mmol/L (2-carboxyl)-phosphine hydrochloride (Aldrich, catalog no. 4706-10G), and 1 mmol/L phenylmethylsulfonyl fluoride (Amresco, catalog no. M145-5G) mixed with mass spectrometry (MS)–grade water (J.T. Baker, catalog no. 4218-03, pH 8.8)] was added to the 1.5-mL EP tubes containing the prepared samples, which were then heated in a 99°C metal bath for 30 minutes. After cooling to room temperature, 3 μg trypsin (Promega, catalog no. V528A) was added to each tube, and the samples were digested for 18 hours at 37°C in an incubator. Then, 13 μL of 10% formic acid (FA; Sigma, catalog no. F0507) was added to each tube, which was vortexed for 3 minutes and then centrifuged for 5 minutes (12,000 g). Afterward, a new 1.5-mL tube with 350 μL buffer [0.1% FA in 50% acetonitrile (ACN; J.T. Baker, catalog no. 9830-03)] was used to collect the supernatant for extraction (vortexed for 3 minutes and then centrifuged at 12,000 g for 5 minutes). The supernatant was transferred to a new tube and vacuum-dried at 60°C. After drying, the peptides were dissolved in 100 μL 0.1% FA, vortexed, and centrifuged for 3 minutes (12,000 g). The supernatant was collected in a new tube and then desalted. Before desalting, the pillars packed with two slices of 3M C18 disk were activated as follows: 90 μL 100% ACN twice, 90 μL 50% and 80% ACN once in turn, and then 90 μL 50% ACN once. The supernatant was then loaded onto the pillar twice, followed by washing with 90 μL 0.1% FA twice. Lastly, 90 μL elution buffer (0.1% FA in 50% ACN) was added to the pillar twice for elution, and only the eluate was collected for MS. The collected liquid was vacuum-dried at 60°C (∼1.5 hours).

Proteome analysis using LC-MS/MS

As described in our previous studies (16, 17, 21), for the proteomic profiling of samples, peptides were analyzed on a Q Exactive HF-X Hybrid Quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific) coupled with a high-performance liquid chromatography system (EASY-nLC 1200, Thermo Fisher Scientific). Dried peptide samples re-dissolved in Solvent A (0.1% FA in water) were loaded onto a 2-cm self-packed trap column (100 μm inner diameter, particle size 3 μm ReproSil-Pur C18-AQ beads, home-made, SunChrom) using Solvent A and separated on a 150-μm-inner-diameter column with a length of 15 cm (particle size 1.9 μm ReproSil-Pur C18-AQ beads, home-made, SunChrom) over a 150-minute gradient (Solvent A: 0.1% FA in water; Solvent B: 0.1% FA in 80% ACN) at a constant flow rate of 600 nL/minute (0–150 minutes, 0 minutes, 4% B; 0–10 minutes, 4%–15% B; 10–125 minutes, 15%–30% B; 125–140 minutes, 30%–50% B; 140–141 minutes, 50%–100% B; 141–150 minutes, 100% B). The eluted peptides were ionized under 2.0 kV and introduced into the mass spectrometer. MS was performed in data-dependent acquisition mode. For the MS1 full scan, ions with m/z ranging from 300 to 1,400 were acquired using the Orbitrap mass analyzer at a high resolution of 120,000. The automatic gain control (AGC) target value was set at 3E6. The maximal ion injection time was 80 ms. MS2 spectrum acquisition was performed in the ion trap mode at a rapid speed. Precursor ions were selected and fragmented by higher energy collision dissociation with a normalized collision energy of 27%. Fragment ions were analyzed using the ion trap mass analyzer with the AGC target at 5E4. The maximal ion injection time of MS2 was 20 ms. Peptides that triggered MS/MS scans were dynamically excluded from further MS/MS scans for 12 seconds.

For the phosphoproteomic samples, peptides were analyzed on a Q Exactive HF-X Hybrid Quadrupole-Orbitrap mass spectrometer (Thermo Fisher Scientific) coupled with a high-performance liquid chromatography system (EASY-nLC 1200, Thermo Fisher Scientific). Dried peptide samples re-dissolved in Solvent A (0.1% FA in water) were loaded onto a 2-cm self-packed trap column (100 μm inner diameter, particle size 3 μm ReproSil-Pur C18-AQ beads, home-made, SunChrom) using Solvent A and separated on a 150-μm-inner-diameter column with a length of 30 cm (particle size 1.9 μm ReproSil-Pur C18-AQ beads, home-made, SunChrom) over a 150-minute gradient (Solvent A: 0.1% FA in water; Solvent B: 0.1% FA in 80% ACN) at a constant flow rate of 600 nL/minute (0–150 minutes, 0 minutes, 4% B; 0–10 minutes, 4%–15% B; 10–125 minutes, 15%–30% B; 125–140 minutes, 30%–50% B; 140–141 minutes, 50%–100% B; 141–150 minutes, 100% B). The eluted phosphopeptides were ionized and detected using the Q Exactive HF-X Hybrid Quadrupole-Orbitrap mass spectrometer. Mass spectra were acquired over the scan range of m/z 300 to 1,400 at a resolution of 120,000 (AGC target value of 3E6 and maximum injection time of 80 ms). For the MS2 scan, higher energy collision dissociation fragmentation was performed at a normalized collision energy of 30%. The MS2 AGC target was set to 5E4 with a maximum injection time of 100 ms. The peptide mode was selected for monoisotopic precursor scan, and charge state screening was enabled to reject unassigned 1+, 7+, 8+, and >8+ ions with a dynamic exclusion time of 40 seconds to discriminate against previously analyzed ions between ±10 ppm.
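The gradient program shared by both runs can be written as a list of breakpoints, from which the Solvent B percentage at any time can be looked up. This sketch assumes linear ramps between breakpoints, which the text implies but does not state; the names are illustrative.

```python
# (time in minutes, % Solvent B) breakpoints taken from the gradient in the text.
GRADIENT = [(0, 4), (10, 15), (125, 30), (140, 50), (141, 100), (150, 100)]


def percent_b(t_min):
    """Percentage of Solvent B at time t_min, assuming linear ramps per segment."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-150 minute gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 5, 125, 145):
        print(t, percent_b(t))
```

Most of the run (10 to 125 minutes) is spent on the shallow 15% to 30% B ramp, where the bulk of peptides would be expected to elute.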

Phosphopeptide enrichment and analysis

All qualified profiling data were processed on the Firmiana platform against the human RefSeq protein database (updated on 04-07-2013; RRID: SCR_003496) in the NCBI. Owing to the limited amount of sample material, only 101 samples from 36 patients with colorectal cancer were adequate for phosphoproteome analysis: NT stage (n = 31), LGIN stage (n = 23), HGIN stage (n = 17), LP stage (n = 6), MM stage (n = 7), SMIA stage (n = 6), SMIB stage (n = 3), T2 stage (n = 3), T3 stage (n = 3), and T4 stage (n = 2; Supplementary Table S1D).

As described in our previous studies (16, 17), the phosphoproteome samples were prepared using the Fe-NTA Phosphopeptide Enrichment Kit (Thermo Fisher Scientific, catalog no. A32992) according to the manufacturer’s instructions. Briefly, 2 mg of peptides was resuspended in 200 μL binding/wash buffer and loaded onto the equilibrated spin column. The resin was mixed with the sample by gentle tapping. The mixture was incubated for 30 minutes and centrifuged at 1,000 × g for 30 seconds to discard the flowthrough. The column was then washed three times with 200 μL of binding/wash buffer, with centrifugation at 1,000 × g for 30 seconds each time, and once more with 200 μL of LC-MS grade water. The phosphopeptides were eluted twice with 100 μL of elution buffer, with centrifugation at 1,000 × g for 30 seconds each time. Phosphopeptides were dried down for LC-MS/MS analysis.

Quantification of global proteome data and phosphoproteome data

As described in our previous studies (16, 17), all MS raw files were processed on the Firmiana platform (a one-stop proteomic cloud platform: http://www.firmiana.org; ref. 22) and were searched against the NCBI human RefSeq protein database (updated on April 7, 2013, 32,015 entries) using the Mascot search engine (version 2.3, Matrix Science Inc., RRID: SCR_014322). Trypsin was used as the proteolytic enzyme, allowing up to two missed cleavages. Carbamidomethyl (C) was considered a fixed modification. For the proteome profiling data, variable modifications were oxidation (M) and acetylation (protein N-term). For the phosphoproteome data, variable modifications were oxidation (M), acetylation (protein N-term), and phospho (S/T/Y). All identified peptides were quantified on the Firmiana platform with peak areas derived from their MS1 intensity. The mass tolerances were 20 ppm for precursor ions and 50 mmu for product ions collected by the Q Exactive HF-X. Precursor ion charge states were limited to +2, +3, and +4. The FDRs of the peptide–spectrum matches and proteins were set at a maximum of 1%. Label-free protein quantification was performed using the iBAQ algorithm (23), which divides the protein abundance (derived from the intensities of the identified peptides) by the number of theoretically observable peptides. The fraction of total, defined as a protein’s iBAQ divided by the total iBAQ of all identified proteins within one sample, was then used to represent the normalized abundance of a particular protein across samples.
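The two normalization steps reduce to simple arithmetic, illustrated below. This is a sketch of the definitions given in the text, not the Firmiana implementation; the function names and the toy numbers are illustrative.

```python
def ibaq(peptide_intensities, n_theoretical_peptides):
    """iBAQ: summed intensity of identified peptides divided by the number
    of theoretically observable peptides of the protein."""
    return sum(peptide_intensities) / n_theoretical_peptides


def fraction_of_total(ibaq_by_protein):
    """Divide each protein's iBAQ by the total iBAQ of all proteins in one sample."""
    total = sum(ibaq_by_protein.values())
    return {protein: value / total for protein, value in ibaq_by_protein.items()}


if __name__ == "__main__":
    # Toy sample with two hypothetical proteins.
    sample = {
        "protein_A": ibaq([100.0, 200.0, 300.0], 6),   # 600 / 6 = 100
        "protein_B": ibaq([900.0], 3),                 # 900 / 3 = 300
    }
    print(fraction_of_total(sample))
```

Because the fractions within a sample sum to 1, the normalized values are comparable across samples with different total signal.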

Data imputation

As described in our previous studies (16, 17), missing values were first addressed with the match-between-runs algorithm (24), which has proved to be an effective technique for filling missing values. In brief, we built a dynamic regression function based on the peptides commonly identified across samples. According to the correlation value R², the function chose a linear or quadratic model to calculate the retention time (RT) of the corresponding hidden peptides, and the existence of the extracted ion chromatogram was checked based on the m/z and the calculated RT. The function then evaluated the peak areas of the extracted ion chromatograms that were present, and these peak areas were counted toward the corresponding proteins.
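The model-selection step of the retention time regression can be illustrated as follows. The text does not specify how R² decides between the linear and quadratic forms, so this sketch assumes a simple rule: keep the linear fit if its R² reaches a hypothetical threshold, otherwise fall back to the quadratic fit. The threshold value and all function names are assumptions.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    n = degree + 1
    # Augmented matrix of the normal equations (A^T A | A^T y).
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(coeffs, xs, ys):
    """Coefficient of determination of the fit."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def fit_rt_model(ref_rt, run_rt, r2_threshold=0.99):
    """Return linear coefficients if they fit well enough, else quadratic ones.

    r2_threshold is a hypothetical cutoff, not a value from the study.
    """
    linear = polyfit(ref_rt, run_rt, 1)
    if r_squared(linear, ref_rt, run_rt) >= r2_threshold:
        return linear
    return polyfit(ref_rt, run_rt, 2)


if __name__ == "__main__":
    shared_ref = [10.0, 20.0, 30.0, 40.0, 50.0]        # RTs of shared peptides, run A
    shared_run = [2 * t + 1 for t in shared_ref]        # RTs of the same peptides, run B
    model = fit_rt_model(shared_ref, shared_run)
    print(predict(model, 35.0))                         # predicted RT of a hidden peptide
```

The predicted retention time, together with the peptide's m/z, would then define the window in which the extracted ion chromatogram is searched.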

Hierarchical clustering analysis

As described in our previous studies (16, 17), hierarchical clustering analysis and principal component analysis (PCA) were implemented in R (version 3.5.1) to assess the batch effects in our proteome dataset with respect to the following two variables: batch identity and sample type (substage/subtype/panel). For the hierarchical clustering analysis, the pair-wise Spearman correlation coefficients of the samples in the same substage were first investigated. Samples of the same type exhibited high similarity, whereas samples of different types clearly differed. Furthermore, we used the average linkage algorithm with one minus the Spearman correlation coefficient as the dissimilarity measure.

In the global heatmap in our study, each protein expression value in the global proteomic expression matrix was transformed into a Z-score across all samples. For the sample-wise and protein-wise clustering, the distance was set as “Euclidean” distance, and the clustering (linkage) method was “complete”. The Z-score–transformed matrix was clustered using the R package pheatmap (version 1.0.12, RRID: SCR_016418).
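The Z-score transformation applied before clustering can be sketched in a few lines. The study performed this step in R; the Python version below is only an illustration, and it assumes the sample standard deviation (n − 1 denominator), which is what R's `scale()` uses. Rows are taken to be proteins and columns samples.

```python
import statistics


def zscore_rows(matrix):
    """Transform each row (one protein across all samples) into Z-scores."""
    transformed = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)        # sample SD, n - 1 denominator
        # A protein with identical values in every sample carries no signal.
        transformed.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return transformed


if __name__ == "__main__":
    expression = [
        [1.0, 2.0, 3.0],   # hypothetical protein with increasing abundance
        [5.0, 5.0, 5.0],   # hypothetical protein with constant abundance
    ]
    print(zscore_rows(expression))
```

After this transformation every protein has mean 0 and unit variance across samples, so Euclidean distances in the heatmap reflect expression patterns rather than absolute abundance.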

Differential proteomic analysis and pathway enrichment

To compare the differentially expressed proteins (DEP) of different stages during colorectal cancer progression (Fig. 1), we focused on the average protein abundance at each stage and performed enrichment analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG; RRID: SCR_012773)/Gene Ontology (GO; RRID: SCR_002811) databases and ConsensusPathDB (http://cpdb.molgen.mpg.de/, RRID: SCR_002231). We then annotated the signaling pathways (adjusted FDR < 0.05), manually checked the pathway-associated proteins, and tested whether these were significantly associated with the stages of colorectal cancer (Kruskal–Wallis test, adjusted P < 0.05), as described in our previous studies (16, 17).

To analyze the diverse functions of the KRAS and BRAF mutations (Fig. 2), DEPs were identified between the KRAS mutation group and the wild-type (WT) group, and between the BRAF mutation group and the WT group (Wilcoxon rank-sum test, FDR < 0.05, Mut vs. WT ratio ≥ 2). Then, comparative analysis was performed based on gene set enrichment analysis (GSEA).
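The DEP selection criteria (rank-sum test, FDR < 0.05, Mut vs. WT ratio ≥ 2) can be sketched as below. Several points are assumptions rather than details from the study: the p-value uses the normal approximation without tie or continuity correction, FDR is computed with the Benjamini–Hochberg procedure, the ratio is taken between group means, and all names and numbers are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation)."""
    ranks = average_ranks(list(a) + list(b))
    na, nb = len(a), len(b)
    u = sum(ranks[:na]) - na * (na + 1) / 2
    mu = na * nb / 2
    sigma = math.sqrt(na * nb * (na + nb + 1) / 12)
    return math.erfc(abs(u - mu) / sigma / math.sqrt(2))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_deps(mut, wt, fdr=0.05, min_ratio=2.0):
    """Proteins passing both the FDR and the Mut/WT mean-ratio thresholds.

    mut, wt -- dicts mapping protein -> list of abundances in each group
    """
    proteins = sorted(mut)
    qvalues = bh_adjust([rank_sum_p(mut[p], wt[p]) for p in proteins])
    hits = []
    for protein, q in zip(proteins, qvalues):
        ratio = (sum(mut[protein]) / len(mut[protein])) / (sum(wt[protein]) / len(wt[protein]))
        if q < fdr and ratio >= min_ratio:
            hits.append(protein)
    return hits


if __name__ == "__main__":
    mut = {"P1": [10, 11, 12, 13, 14], "P2": [1, 2, 3, 4, 5]}
    wt = {"P1": [1, 2, 3, 4, 5], "P2": [1.5, 2.5, 3.5, 4.5, 5.5]}
    print(select_deps(mut, wt))
```

For small groups an exact rank-sum test, as provided by standard statistics packages, would be preferable to the normal approximation used here.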

To analyze how KRAS and BRAF mutations regulated diverse functions in colorectal cancer progression at the phosphoprotein level (Fig. 2), DEPs were identified between the KRAS mutation group and the WT group, and between the BRAF mutation group and the WT group (Wilcoxon rank-sum test, FDR < 0.05, Mut vs. WT ratio ≥ 2, Phos vs. Pro ≥ 2). In addition, kinase–substrate enrichment analysis (KSEA) was applied to the phosphoproteome of the KRAS mutation group and the BRAF mutation group, and the KRAS mutation–kinase–substrate network and the BRAF mutation–kinase–substrate network were then established.

To analyze how DDX5 deletion and TOP1 amplification enhanced the cell cycle at the phosphoprotein level (Fig. 3), DEPs were identified between the DDX5 deletion group and the WT group, and between the TOP1 amplification group and the WT group (Wilcoxon rank-sum test, FDR < 0.05, Del/Amp vs. WT ratio ≥ 2, Phos vs. Pro ≥ 2). Then, the KEGG/GO databases were used for pathway enrichment. In addition, KSEA was applied to the phosphoproteome of the DDX5 deletion group and the TOP1 amplification group, and the DDX5 deletion–kinase–substrate network and the TOP1 amplification–kinase–substrate network were then established.

To analyze the functions of co-mutations of KRAS and TP53 (Fig. 4), four groups were established: KRAS WT and TP53 WT (KRAS WT/TP53 WT) group, KRAS WT and TP53 Mut (KRAS WT/TP53 Mut) group, KRAS Mut and TP53 WT (KRAS Mut/TP53 WT) group, and KRAS Mut and TP53 Mut (KRAS Mut/TP53 Mut) group. The DEPs among those four groups were identified (Kruskal–Wallis test, FDR < 0.05), and the KEGG/GO databases were then used for pathway enrichment.
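
As an illustration of the four-group comparison, the following Python sketch assigns a genotype label and computes the Kruskal–Wallis H statistic with the standard tie correction for one protein. The function names and the toy abundances are hypothetical, and the sketch stops at the H statistic; in practice the P value would be taken from a chi-square distribution with (number of groups − 1) degrees of freedom and then adjusted across proteins.

```python
def genotype_group(kras_mut, tp53_mut):
    """Label a sample by its KRAS and TP53 mutation status."""
    kras = "Mut" if kras_mut else "WT"
    tp53 = "Mut" if tp53_mut else "WT"
    return f"KRAS {kras}/TP53 {tp53}"

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction.
    groups is a list of lists of abundances, one list per group."""
    pooled = sorted((v, g) for g, vals in enumerate(groups) for v in vals)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        t = j - i + 1                 # size of the tied block
        tie_term += t ** 3 - t
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += midrank
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))

# Invented abundances of one protein in the four genotype groups.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
h_stat = kruskal_wallis_h(toy_groups)
```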

To analyze the divergence between the left-sided colorectal cancer alone group and the right-sided colorectal cancer alone group (Fig. 5), the main cohort with 435 samples and an independent validation cohort with 60 samples were collected, and DEPs were identified between the left-sided colorectal cancer alone group and the right-sided colorectal cancer alone group (Wilcoxon rank-sum test, FDR < 0.05, left-sided colorectal cancer vs. right-sided colorectal cancer ratio ≥ 2 or ≤ 0.5). In the mix group, DEPs were identified between the normal tissues and tumor tissues (Wilcoxon rank-sum test, FDR < 0.05, normal vs. tumor ratio ≥ 2 or ≤ 0.5). To analyze the divergence between left-sided colorectal cancer/right-sided colorectal cancer and the mix group, DEPs were identified between left-sided colorectal cancer/right-sided colorectal cancer and the mix group (Wilcoxon rank-sum test, FDR < 0.05, left-sided colorectal cancer/right-sided colorectal cancer vs. mix ratio ≥ 2 or ≤ 0.5). Then, the KEGG/GO databases were used for pathway enrichment.

To analyze the progression paths of the CMS- and CRIS-based classifications (Fig. 6), the DEPs of different stages during colorectal cancer progression were evaluated (Kruskal–Wallis test, FDR < 0.05) and then subjected to enrichment analysis using the KEGG/GO databases and ConsensusPathDB (http://cpdb.molgen.mpg.de/). We then annotated the signaling pathways (FDR < 0.05), manually checked the pathway-associated proteins, and estimated whether these were significantly associated with the stages of colorectal cancer (Kruskal–Wallis test, FDR < 0.05).

To analyze the difference between MSI and MSS (Fig. 7), DEPs were identified between the MSI and MSS tumors (Wilcoxon rank-sum test, FDR < 0.05, MSI vs. MSS ratio ≥ 2 or ≤ 0.5), and GSEA was then applied for pathway enrichment.

To analyze the molecular characteristics of the three groups (control group, cycle I group, and cycle II group) of the AOM/DSS-induced colorectal cancer mouse model (Fig. 8), the highly expressed proteins of each group were identified (Kruskal–Wallis test, FDR < 0.05) and then subjected to enrichment analysis using ConsensusPathDB (http://cpdb.molgen.mpg.de/).

Construction and validation of predictive models to distinguish between KRAS Mut/TP53 Mut and others, and between colorectal cancer liver metastasis and WT

Binomial logistic regression analysis was used to construct the predictive models for distinguishing between KRAS Mut/TP53 Mut and others, and between colorectal cancer liver metastasis and WT, based on 20 proteins using R software v3.5.1. The backward stepwise method was used for feature selection. Samples were randomly divided into a training set and a testing set. The diagnostic value of each model was verified using ROC analysis (pROC R package version 1.16.2 and caret R package version 6.0–86). Sensitivity, specificity, accuracy, and AUC were used to determine predictive values. The predictive model was validated in the validation cohort.
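
The evaluation metrics named above can be illustrated with a short Python sketch that takes predicted probabilities from any fitted classifier. It does not reproduce the pROC or caret implementations; the 0.5 decision threshold, the function names, and the toy labels and scores are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample scores higher than a
    negative sample, counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, and accuracy at a fixed probability threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for l, s in pairs if l == 1 and s >= threshold)
    fn = sum(1 for l, s in pairs if l == 1 and s < threshold)
    tn = sum(1 for l, s in pairs if l == 0 and s < threshold)
    fp = sum(1 for l, s in pairs if l == 0 and s >= threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

# Invented labels (1 = KRAS Mut/TP53 Mut, 0 = others) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
metrics = confusion_metrics(labels, scores)
```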

Complement cascade and extracellular matrix signaling score

Single-sample GSEA, implemented in the R package GSVA, was used to obtain complement cascade and extracellular matrix (ECM) signaling scores for each sample based on proteomic data. Correlations between the stroma score and the complement cascade/ECM signaling scores were determined using Pearson correlation.
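
For reference, the Pearson correlation between two per-sample scores can be computed as in the following Python sketch. The function name and the toy scores are hypothetical, and the single-sample GSEA step itself is not reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Invented per-sample scores: stroma score and ECM signaling score.
stroma = [0.10, 0.25, 0.40, 0.55]
ecm = [0.20, 0.50, 0.80, 1.10]
r = pearson_r(stroma, ecm)
```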

Kinase activity prediction and phosphopeptide analysis

The phosphoproteome data of 101 colorectal cancer samples were searched against the same database using MaxQuant (RRID: SCR_014485). As described in our previous studies (16, 17), the phosphorylation of S or T or Y was set as variable modification, in which three miscleavages were allowed, with a minimum Andromeda score of 40 for spectra matches. The ratios of identified phosphorylation sites of all samples were used to estimate the kinase activities by KSEA algorithm. The information of kinase–substrate relationships was obtained from publicly available databases, including PhosphoSite (RRID: SCR_001837), Phospho.ELM (RRID: SCR_001109), and PhosphoPOINT (RRID: SCR_002109). The information of substrate motifs was obtained either from the literature or from an analysis of the KSEA dataset with Motif (sP). The kinase–substrate–motif network analysis was referenced from PhosphoSitePlus (https://www.phosphosite.org/homeAction, RRID: SCR_001837) and NetworKIN 3.0 (RRID: SCR_007818). Statistical analysis was performed in R (version 3.5.1) using the Kruskal–Wallis test.
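
A minimal Python sketch of a KSEA-style kinase score is given below. It uses the z-score form commonly described for KSEA, z = (mean log2 ratio of a kinase's substrates − mean log2 ratio of all sites) × √m / SD of all sites, where m is the number of quantified substrates. The exact implementation used in this study is not specified here, so the formula choice, the sample standard deviation, the minimum of two substrates, and the site and kinase assignments in the toy data are all assumptions.

```python
import math
from statistics import mean, stdev

def ksea_zscores(log2_ratios, kinase_substrates, min_substrates=2):
    """log2_ratios: phosphosite id -> log2 ratio between two conditions.
    kinase_substrates: kinase -> list of phosphosite ids (from a database).
    Returns kinase -> z-score for kinases with enough quantified substrates."""
    all_values = list(log2_ratios.values())
    background_mean = mean(all_values)
    background_sd = stdev(all_values)
    scores = {}
    for kinase, sites in kinase_substrates.items():
        observed = [log2_ratios[s] for s in sites if s in log2_ratios]
        if len(observed) < min_substrates:
            continue  # too few quantified substrates to score this kinase
        m = len(observed)
        scores[kinase] = (mean(observed) - background_mean) * math.sqrt(m) / background_sd
    return scores

# Invented phosphosite ratios and hypothetical kinase-substrate assignments.
toy_ratios = {"site1": 2.0, "site2": 1.0, "site3": -1.0, "site4": -2.0}
toy_map = {
    "CDK1": ["site1", "site2"],
    "CDK2": ["site3", "site4", "site9"],  # site9 was not quantified
    "MET": ["site3"],                     # skipped: only one substrate
}
kinase_scores = ksea_zscores(toy_ratios, toy_map)
```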

Trajectory inference methods and progression path analysis

We used monocle (version 2.10.1) and trajectory inference methods to trace the lineages in 148 patients with early-stage colorectal cancer. As described in our previous studies (16, 17), the proteins with mean expression over 1.0E−1 were highlighted and screened. The dataset was clustered and preprocessed by t-distributed stochastic neighbor embedding using a Barnes–Hut implementation with Rtsne (version 0.15) in R (version 3.5.1). All phases of each patient with early-stage colorectal cancer were considered the pseudotime to construct the trajectory of the patients with colorectal cancer in CMS- and CRIS-based subtype classifications.

Cells

The HEK293T cell line (Cat# CRL-11268, RRID: CVCL_QW54), HCT116 cell line (Cat# CCL-247, RRID: CVCL_0291), SW480 cell line (Cat# CCL-228, RRID: CVCL_0546), and SW620 cell line (Cat# CCL-227, RRID: CVCL_0547) were purchased from ATCC. HEK293T cells, HCT116 cells, SW480 cells, and SW620 cells were cultured in DMEM/high glucose medium (HyClone) supplemented with 10% FBS (BI), 100 U/mL penicillin, and 100 μg/mL streptomycin (Sangon Biotech). All cells were cultivated in a humidified incubator with 5% CO2 at 37°C. The genetic identity of the cell lines was confirmed by short tandem repeat profiling (Cell ID, Promega), most recently repeated in December 2023. Cells were periodically tested for Mycoplasma with Venor GeM Kit (Minerva Biolabs), and all cell lines tested negative for Mycoplasma contamination.

IHC analysis

Human colorectal cancer samples for tissue array were collected from June 2008 to December 2018 at Xinhua Hospital, Shanghai Jiaotong University School of Medicine. Institutional Review Board approval and informed consent were obtained for all sample collections (n = 244). IHC staining was performed on tissue array according to general protocols to analyze the expression of DDX5 using DDX5 rabbit antibody (Abcam, catalog no. ab126730, RRID: AB_11130291, dilution 1:250). Semiquantitative analysis of IHC was scored by staining intensity and percentage of staining.

Retrovirus packaging and infection

To generate the retrovirus for overexpression of human DDX5, the target sequences were cloned into the pBABE-FLAG-puro vector. To generate the retrovirus shRNA constructs against human DDX5, the target sequences were cloned into the pMKO.1 puro vector (Addgene, Cat# 8452, RRID: Addgene_8452). The shRNA sequence was as follows:

shDDX5: 5ʹ-AGGTGGAAACATACAGAAGAA-3ʹ

Cell proliferation

Cells were seeded in 96-well plates at a density of 4,000 cells per well (HCT116 cells, SW480 cells, and SW620 cells). The cell number was determined by Cell Counting Kit-8 (CCK8; Dojindo) according to the manufacturer’s protocol. In brief, the medium was replaced with 100 μL fresh medium containing 10 μL of CCK8. After incubation for 3 hours at 37°C, the culture plates were shaken for 5 minutes, and the optical density values were read at 450 nm.

Mouse xenograft studies in vivo

Nude mice (4–6 weeks old, male, n = 32) were obtained from SLAC Laboratory Animals LLC, Shanghai, China. All mouse procedures were approved by the Xinhua Hospital Animal Care and Use Committee (XHEC-NSFC-2021-326). pMKO-Control, pMKO-shDDX5, pBABE-Control, and pBABE-oeDDX5 cells (1 × 10^6) were suspended in 100 μL of 1 × PBS separately and then injected subcutaneously into the bilateral axillary fat pad. After 21 days, the mice were sacrificed by cervical dislocation. The tumors were dissected, imaged, and weighed. Tumor tissues were collected and fixed in 10% neutral buffered formalin for subsequent analyses.

AOM/DSS-induced colorectal cancer model in mice

Mice (6–8 weeks old) were weighed and intraperitoneally injected with AOM (10 mg kg−1 body weight, dissolved in PBS) on day 1, followed by three cycles of DSS dissolved in drinking water at stepwise increasing concentrations of 1.25%, 1.5%, and 1.75% from day 2. In every DSS cycle, mice drank DSS water for 7 days and then stopped for 14 days before the next cycle. In this study, mice were sacrificed on day 21 (cycle I, n = 4) and day 44 (cycle II, n = 5). Tumors were counted and photographed after longitudinal dissection of the colon. Specifically, the tumors in cycle I and cycle II represented the samples in the early stage and advanced stage, respectively. All mice (n = 13) were housed in ventilated cages in a pathogen‐free animal facility (Xinhua Hospital Animal Care and Use Committee, XHEC-NSFC-2021-326) with ad libitum access to water and standard rodent diet. The mice were maintained under controlled temperature (22°C ± 2°C) and a 12-hour light/dark cycle.

Immunoprecipitation

For immunoprecipitation, cells were lysed with 1% Nonidet P40 (NP40) buffer containing 50 mmol/L Tris-HCl (pH 7.5), 150 mmol/L NaCl, and multiple protease inhibitors (phenylmethylsulfonylfluoride 1 mmol/L, aprotinin 1 mg/mL, leupeptin 1 mg/mL, pepstatin 1 mg/mL, Na3VO4 1 mmol/L, and NaF 1 mmol/L; for acetylation experiments, TSA 2.5 mmol/L and NAM 25 mmol/L were added in addition, all given as final concentrations). Cell lysates were incubated with anti-FLAG M2 magnetic beads (Sigma) at 4°C. The immunocomplexes were washed three times with NP40 buffer. Then both lysates and immunoprecipitates were examined using the indicated primary antibodies.

Websites used and proteins downloaded

In this study, to obtain the macrophage markers (Fig. 4), we accessed the CellMarker website (http://xteam.xbio.top/CellMarker/download.jsp) and downloaded the macrophage-related marker database (n = 285) by filtering with the keyword “macrophage.” To obtain the anchor proteins, we accessed the Human Protein Atlas (HPA; https://www.proteinatlas.org) and downloaded the anchor protein database (n = 49) by filtering with the keyword “anchor protein.”

Quantification and statistical analysis

All data were analyzed and plotted using R and GraphPad Prism 9 software, and Fisher exact test, Bartlett test, Kruskal–Wallis test, Wilcoxon signed-rank test, and Spearman correlation test were performed using R (version 3.5.1). Data in the barplot were presented as mean ± SEM. All statistical tests were two-tailed, and results were considered statistically significant at a (adjusted) P value < 0.05; P values were adjusted using the Benjamini–Hochberg procedure. Kaplan–Meier plots (log-rank test) were used to describe the overall survival. Tissue samples of patients with colorectal cancer were randomly selected. For validating the findings in this study, each experiment was repeated at least three times independently. All experiments were reliably reproduced and indicated in the figure legends, Methods and Materials, and Results. Data in the boxplot were presented as median (central line), upper and lower quartiles (box limits), and 1.5× IQR (whiskers). For sample processing, PCA, and consensus clustering analysis, all investigators were blinded to outcomes.
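
The summary statistics used for the barplots and boxplots can be illustrated with the following Python sketch. It is not the plotting code used in this study: the function names are hypothetical, and the quartile convention (linear interpolation that includes the minimum and maximum) is an assumption that can differ slightly from the hinges drawn by some plotting software.

```python
import math
from statistics import mean, stdev, quantiles

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))

def box_stats(values):
    """Median, quartiles, and whisker ends at the most extreme data points
    lying within 1.5 x IQR of the box limits."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = min(v for v in values if v >= q1 - 1.5 * iqr)
    upper = max(v for v in values if v <= q3 + 1.5 * iqr)
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": median, "q3": q3,
            "whisker_low": lower, "whisker_high": upper, "outliers": outliers}

# Invented measurements.
m, sem = mean_sem([2, 4, 4, 4, 5, 5, 7, 9])
box = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```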

Data availability

All proteomic and phosphoproteomic raw datasets generated in this study have been deposited to the ProteomeXchange Consortium [dataset identifier: PXD052391 (https://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD052391)] via the iProX partner repository (https://www.iprox.cn/) under Project ID IPX0002188000 (https://www.iprox.cn/page/project.html?id=IPX0002188000). The raw WES data have been deposited in the Genome Sequence Archive (https://ngdc.cncb.ac.cn/gsa-human/) under accession number HRA007452 (https://ngdc.cncb.ac.cn/gsa-human/browse/HRA007452). The WES data are under restricted access due to data privacy laws related to patient consent for data sharing, and access can be obtained by following the Request Data steps in the Genome Sequence Archive website or by contacting the corresponding author. The approximate response time for access requests is about 2 weeks. Once access has been granted, the data will be available to download for 3 months. The gene expression profiles of colorectal cancer cell lines analyzed in this study were obtained from the DepMap Public 21Q2 dataset at https://depmap.org/portal. The colorectal cancer datasets analyzed in this study [TCGA 2012 cohort, TCGA 2018 cohort, CPTAC cohort, metastatic colorectal cancer (mCRC) cohort, and Memorial Sloan Kettering Cancer Center (MSKCC) cohort] were obtained from the cBioPortal website at https://www.cbioportal.org/. The dataset of the Zeng cohort was downloaded from its attached supplementary table (https://doi.org/10.1016/j.ccell.2020.08.002). The data generated in this study are available within the article and its Supplementary Data. The final analysis data for all cases were provided in the accompanying Supplementary Tables. All other raw data are available upon request from the corresponding author.

Overview of the proteogenomic landscape in colorectal cancer progression

We performed multiomic-based profiling of 435 trace samples collected from 148 patients with colorectal cancer and established 10 histopathologic stages with high tumor purity, making it possible to portray molecular profiles of colorectal cancer in a time-resolved mode (Supplementary Fig. S1A–C; Supplementary Table S1C; Materials and Methods). We conducted proteomic profiling of 435 samples, WES of 106 samples, and phosphoproteomic profiling of 101 samples (Fig. 1A; Supplementary Fig. S2A; Supplementary Table S1D and S1E).

At the gene level, the top ranked mutations are shown in Fig. 1B. Neomutations peaked in the LGIN stage and T2 stage (Fig. 1C; Supplementary Table S2A and S2B; Materials and Methods), allowing us to explore the key events during colorectal cancer progression. Examination of the tumor mutation burden (TMB) of the TCGA 2012 cohort (13) and CPTAC cohort (9) showed that low TMB was detected in the IEN phase (Fig. 1D).

At the protein level, high Spearman correlation coefficients (mean = 0.90) between the quality control samples (HEK293T cells) showed that MS was robust and consistent (Supplementary Fig. S2B; Supplementary Table S3A). Label-free quantification measurement of 435 samples resulted in a total of 14,597 protein groups (Fig. 1E; Supplementary Fig. S2C and S2D; Supplementary Table S3B; Materials and Methods). Comparably, a total of 47,415 phosphosites corresponding to 6,785 phosphoproteins were detected in 101 samples of the Fudan cohort (Fig. 1F; Supplementary Fig. S2E; Supplementary Table S3C). Overall, we established a comprehensive landscape of colorectal cancer progression at the multiomic level, and provided an integrated dataset in colorectal cancer carcinogenesis.

We presented a panel-based analysis (P1–P5) to elucidate the molecular dynamic models that drove colorectal cancer carcinogenesis (Materials and Methods), which was summarized as metabolic process–oxidative phosphorylation/mitochondrial electron transport–cell cycle/DNA replication–NF-κB/mTOR signaling–ECM signaling/complement cascade (Supplementary Fig. S2F and S2G). Together, we delineated molecular characterizations of different stages in colorectal cancer progression, providing a reference database for corresponding personalized medicine for patients with colorectal cancer.

Mutations of KRAS and BRAF enhanced oxidative phosphorylation in the IEN phase

KRAS, which encodes a guanosine triphosphatase that regulates signal transduction, is frequently mutated in human cancers (25). In our cohort, KRAS and BRAF mutations were mutually exclusive (ME, annotated as KRASBRAF-ME in this cohort) and negatively associated with outcomes of patients with colorectal cancer [log-rank test, P < 1.0E−4, referred in the mCRC cohort (26)]. In addition, KRASBRAF-ME mutations were predominant in the IEN phase (71.4%; Fig. 2A; Supplementary Fig. S3A).

To investigate the impacts of KRASBRAF-ME mutations, we performed GSEA for enriched pathways and found that KRASBRAF-ME mutations showed positive impacts on oxidative phosphorylation, which was also dominant in the IEN phase compared with the IFT phase (Fig. 2B). We integrated the highly expressed proteins in the KRASBRAF-ME mutation group compared with the WT group and overlapped them with the overrepresented proteins in the IEN phase compared with the IFT phase (Fig. 2C). The results showed that only RASAL1 was highly expressed in the KRASBRAF-ME mutation group and overrepresented in the IEN phase compared with the IFT phase (Fig. 2D). Validation in 10 fresh frozen colorectal cancer tissues showed that RASAL1 was highly expressed in the KRAS mutation group compared with the WT group (Fig. 2E).

Generally, RASAL1 inhibits RAS and leads to the activation of mitochondrial respiratory chain (27). In our cohort, KRASBRAF-ME mutations enhanced the expression of mitochondrial respiratory chain–associated proteins, which was also validated in other colorectal cancer cohorts, including CPTAC cohort, TCGA 2012 cohort, and TCGA 2018 cohort (Fig. 2F; Supplementary Fig. S3B and S3C; ref. 28). The mitofusins (MFN), along with optic atrophy 1 (OPA1), mediate mitochondrial fusion (29). In our cohort, OPA1 and RASAL1 both showed a positive association with MFN1 at the protein level (Fig. 2G). Together, these results revealed that KRASBRAF-ME mutations enhanced oxidative phosphorylation in the IEN phase (Fig. 2H).

Compared with KRAS mutation, BRAF mutation was associated with worse prognosis of patients with colorectal cancer (referred in the mCRC cohort, log-rank test, P < 1.0E−4; Supplementary Fig. S3D), indicating diverse functions between KRAS mutation and BRAF mutation. To explore the divergence, we integrated the DEPs between the KRAS mutation group and the WT group and DEPs between the BRAF mutation group and the WT group at different colorectal cancer phases (Supplementary Materials). Comparative analysis showed that the KRAS mutation group was characterized by additional PI3K–AKT–mTOR signaling, and the BRAF mutation group was characterized by additional focal adhesion and ECM signaling at the protein and phosphoprotein levels, which was further validated in TCGA 2012 cohort (Supplementary Fig. S3E–H). We then performed KSEA of the phosphoproteome of the KRAS mutation group and the BRAF mutation group, and the results revealed that the kinases PRKCQ/PRKDC and MET/PDGFRA, along with their counterpart substrates, were overrepresented in the KRAS mutation group and the BRAF mutation group, respectively (Supplementary Fig. S3I and S3J).

Briefly, KRAS and BRAF mutations elevated oxidative phosphorylation in the IEN phase. Additionally, the PRKCQ/PRKDC–substrate regulation network in the KRAS mutation group promoted PI3K–AKT signaling, and the MET/PDGFRA–substrate regulation network was notable in the BRAF mutation group and enhanced ECM signaling and focal adhesion.

The deletion of DDX5 at the chr17q loss led to cell proliferation in colorectal cancer progression

To explore the impacts of SCNAs on the chromosomes in colorectal cancer progression, we performed whole exome–based SCNA analyses of the events at the arm level (Supplementary Table S2C). Chr17q loss and chr20q gain were the top ranked loss event and gain event, respectively (Supplementary Fig. S4A). Specifically, chr17q loss and chr20q gain were also ME and occurred predominantly in the IEN phase and IFT phase, respectively (Fig. 3A).

To explore the biological functions of chr17q loss and chr20q gain, we performed cis-effects analysis of genes within CNA regions at the protein level. For chr20q gain, we found that 13 cis-effect genes were involved in the cell cycle, which was also validated in other colorectal cancer cohorts including TCGA 2012 cohort and CPTAC cohort (Fig. 3B and C; Supplementary Fig. S4B). Based on the annotated outliers for the degree to which RNA interference (RNAi)–mediated depletion reduced the proliferation of colorectal cancer cell lines (30), we found that the RNAi of those 13 genes had negative effects on the proliferation of colorectal cancer cells, in which TOP1 was recorded in CAGs (Fig. 3C; Supplementary Table S4A; ref. 20). In addition, the proteins positively correlated with TOP1 amplification were related to cell cycle and DNA replication (Supplementary Fig. S4C), suggesting that the oncogenic functions of TOP1 at chr20q induced colorectal cancer cell proliferation in the IFT phase.

Among the nine significant cis-effect genes due to chr17q loss, DDX5, a transcriptional coactivator for TP53, is involved in cell-cycle progression (31), and its deletion occurred predominantly in the IEN phase (Supplementary Fig. S4D and S4E). DDX5 showed a positive association with the prognosis of the patients with colorectal cancer [referred in TCGA 2014 cohort (12)], indicating a potential therapeutic role for DDX5 in colorectal cancer progression (Fig. 3D). Although chr17q loss and chr20q gain were ME, the results of the coimmunoprecipitation assay showed that DDX5 interacted with TOP1 (Fig. 3E), indicating that both DDX5 deletion and TOP1 amplification decreased the dragging effects of DDX5 on TOP1, thus promoting cell-cycle progression.

Phosphorylation functions in the cell cycle. We integrated the DEPs between the DDX5 deletion group and WT group, and between the TOP1 amplification group and WT group, at the protein and phosphoprotein levels (Supplementary Materials). The results of comparative analysis showed that cell cycle–related pathways, including M phase and S phase, were overrepresented both in the DDX5 deletion group and in the TOP1 amplification group compared with their corresponding WT groups (Supplementary Fig. S4F and S4G). To investigate how DDX5 deletion and TOP1 amplification regulated phosphorylation and enhanced the cell cycle, we performed KSEA of the phosphoproteome of the DDX5 deletion group and the TOP1 amplification group, and found that the kinases CDK1 and CDK2, along with their substrates, were overrepresented in the DDX5 deletion group and the TOP1 amplification group, respectively (Fig. 3F; Supplementary Fig. S4H). Taken together, DDX5 deletion due to chr17q loss and TOP1 amplification by chr20q gain enhanced the protein levels of kinases (CDK1 and CDK2), and thus activated the cell cycle in the IEN phase and IFT phase, respectively (Fig. 3J), providing a reference database for the corresponding potential therapy of colorectal cancer.

Next, we investigated the role of DDX5 in colorectal cancer development in vitro. We collected another 244 samples (122 tumors and paired normal tissues) as an independent validation cohort and noted that the IHC score of DDX5 in the tumor tissues was significantly lower compared with that in their paired normal tissues (Fig. 3G). Overexpression of DDX5 in HCT116 cells slowed down cell proliferation, and knockdown of DDX5 in HCT116 cells increased cell proliferation (Fig. 3H; Supplementary Fig. S4I; Supplementary Table S5A and S5B). These results indicated that increased DDX5 expression was associated with decreased cell proliferation in colorectal cancer.

Compared with the HCT116 cells, the expression of DDX5 was lower in the SW480 cells and was higher in the SW620 cells (Supplementary Fig. S4J). The results of CCK8 assay showed that overexpression of DDX5 in SW480 cells significantly decreased cell proliferation, and knockdown of DDX5 in SW620 cells notably promoted cell proliferation (Supplementary Fig. S4K–N). In addition, overexpression of DDX5 decelerated the xenograft growth of colorectal cancer cells, and knockdown of DDX5 expression promoted the xenograft growth of colorectal cancer cells (Fig. 3I; Supplementary Fig. S4O; Supplementary Table S5C and S5D). Together, these findings demonstrated that DDX5 loss was positively associated with cell proliferation and tumor growth in colorectal cancer.

Briefly, the integrated findings indicated that chr17q loss functioned in the transition from the NT phase to the IEN phase, demonstrating the potential role of DDX5 in colorectal cancer development (Fig. 3J).

The TP53 mutation was associated with TME in the advanced stage colorectal cancer phase

The TP53 mutation is frequent in colorectal cancer, but its functions in a time-resolved mode during colorectal cancer carcinogenesis are still unclear. In our cohort, TP53 mutation was frequent in the advanced stage colorectal cancer phase, and the results of GSEA showed that TP53 mutation displayed a positive association with ECM signaling and complement cascade (Fig. 4A; Supplementary Fig. S5A). The aberrant ECM functions in immune reprogramming of the tumor microenvironment (TME; ref. 32). The results of xCell (https://xcell.ucsf.edu) showed that TP53 mutation enhanced the cell signatures of fibroblasts and pericytes, which were also positively associated with the elevated stroma score (Supplementary Fig. S5B). Pericytes contribute to complement activation and hamper antitumor T-cell responses, favoring tumor progression (33). Notably, a positive correlation between pericytes and complement cascade was observed in our cohort, as well as between stroma score and ECM signaling (Supplementary Fig. S5C and S5D). PDGFRB, one of the markers of pericytes, increases tumor vessel maturation (34). In our cohort, we found that PDGFRB was associated with the poor prognosis of the patients with colorectal cancer (referred to in TCGA cohort; ref. 12) and displayed positive association with complement cascade and pericytes (Supplementary Fig. S5E and S5F). Collectively, these findings indicated that TP53 mutation enhanced the protein levels of ECM rigidity and complement cascade, elevating stromal infiltration in colorectal cancer progression (Supplementary Fig. S5G).

Furthermore, we found that TP53 and KRAS were co-mutated in colorectal cancer progression, and specifically, the co-mutations of KRAS and TP53 were prominent in the advanced stage colorectal cancer phase, which was also validated in the MSKCC cohort (Supplementary Fig. S5H; ref. 26).

Patterns of co-mutations indicate functional synergies and, importantly, may reflect the prognosis of patients (35). Based on the data from other colorectal cancer cohorts, the co-mutations of KRAS and TP53 were associated with poor prognosis of patients with colorectal cancer (Supplementary Fig. S5I). We divided the samples of our cohort into four groups and performed comparative analysis of the group-based highly expressed proteins (Kruskal–Wallis test, FDR < 0.05; Materials and Methods; Supplementary Materials). In the KRAS Mut/TP53 Mut group, the complement cascade was the predominant pathway, except for ECM signaling–related pathways (Fig. 4B). To investigate the potential links between activation of complement cascade and poor prognosis of patients with colorectal cancer with co-mutations of KRAS and TP53, we applied the molecular complex detection to calculate the maximal clique centrality (MCC) score of the proteins (bioRxiv 2021.06.22.449395). Among the complement cascade pathway–related proteins, the 20 proteins with the highest MCC scores were selected. Based on those 20 proteins, PCA revealed notable separation between the samples with and without co-mutations of KRAS and TP53, and the consistent results were further validated in TCGA 2012 cohort, CPTAC cohort, and Zeng cohort (Supplementary Fig. S5J and S5K; ref. 36). We then employed stepwise logistic regression based on those 20 proteins and applied 10-fold cross-validation, which generated a mean ROC-AUC of 0.968. The high ROC-AUC values were also validated in other colorectal cancer cohorts (Fig. 4C; Supplementary Fig. S5L).

Lin and colleagues (37) pointed out the interactions between complement cascade and the progression of colorectal cancer with liver metastasis. PCA with those 20 proteins showed a clear boundary between the samples of the patients with and without colorectal cancer liver metastasis, and an AUC of 0.947 was detected in the Zeng cohort (Supplementary Fig. S5M and S5L), demonstrating the accuracy and stability of the prediction model with those 20 proteins. Together, these findings implied the positive functions of KRAS and TP53 co-mutations in the complement cascade, which is involved in colorectal cancer liver metastasis.

Generally, the complement cascade presents an immunosuppressive phenotype (38). In this study, the complement cascade showed significant association with the immune signatures of monocytes and macrophages (Fig. 4D). Jiali and colleagues (39) identified that liver metastasis recruited monocyte-derived macrophages to remodel the TME of the liver. Of those 20 proteins, FN1 was associated with poor prognosis of patients with colorectal cancer (data referred in TCGA 2018 cohort). In addition, FN1 was the top ranked protein, which was overrepresented in the KRAS Mut/TP53 Mut group, and presented positive association with immune signatures of monocytes and macrophages (Fig. 4E and F; Supplementary Fig. S5N). FN1 was also highly expressed in the colorectal cancer liver metastasis group (Supplementary Fig. S5O), implying that FN1 might function in colorectal cancer liver metastasis.

To investigate how FN1 affected macrophages and was thereby involved in colorectal cancer liver metastasis, we integrated FN1 positively linked proteins, and found that three proteins (VCAN, ITGAX, and MMP14) were recorded in the macrophage-related marker database (Materials and Methods), in which VCAN displayed positive association with FN1, indicating the recruitment of (VCAN-positive) macrophages (Fig. 4G and H). Therefore, KRAS and TP53 co-mutations enhanced the complement cascade and promoted the recruitment of macrophages and thus were involved in potential colorectal cancer liver metastasis (Fig. 4I), providing valuable insights into the molecular mechanism of colorectal cancer liver metastasis.

The characteristics of colorectal cancer based on locations and the impacts of ANKRD22 amplification on glycolysis in colorectal cancer progression

Generally, colorectal cancers are classified into left-sided colorectal cancer and right-sided colorectal cancer (14). However, the pathologic heterogeneity and origin characteristics are still unclear. Based on colorectal cancer locations, the patients were divided into three groups: left-sided colorectal cancer alone group (n = 80), right-sided colorectal cancer alone group (n = 30), and the mix group (n = 38; Fig. 5A; Materials and Methods).

Comparative analysis of colorectal cancer based on locations showed that the left-sided colorectal cancer alone group was characterized by focal adhesion and ECM signaling, and the highly expressed proteins in the right-sided colorectal cancer alone group were dominant in metabolic processes, including glycolysis and citrate cycle (TCA), which was further validated by TCGA 2012 cohort and the CPTAC cohort (Fig. 5B; Supplementary Fig. S6A and S6B; Supplementary Materials). A similar trend was also shown in the corresponding normal tissues, suggesting that the diversity of colorectal cancer tumors based on locations was associated with their normal tissues of origin.

To further validate the findings, we collected 60 samples from 25 patients with colorectal cancer as an independent cohort: left-sided colorectal cancer group (n = 9), right-sided colorectal cancer group (n = 9), and mix group (n = 7; Supplementary Table S6A and S6B; Materials and Methods). Morphologically, the normal tissues of left-sided colorectal cancer and right-sided colorectal cancer in the mix group retained the morphologic features of the normal tissues in the left-sided colorectal cancer alone group and the right-sided colorectal cancer alone group, respectively, and the morphologic features of the tumor tissues of left-sided colorectal cancer and right-sided colorectal cancer in the mix group were similar to those of left-sided colorectal cancer alone and right-sided colorectal cancer alone. Comparative analysis of the DEPs of the normal tissues of left-sided colorectal cancer and right-sided colorectal cancer in the mix group indicated that their features were consistent with those of the left-sided colorectal cancer alone group and the right-sided colorectal cancer alone group (Supplementary Fig. S6C–E), further supporting that the divergence between left-sided colorectal cancer and right-sided colorectal cancer was associated with their corresponding normal tissues.

The left-sided colorectal cancer and right-sided colorectal cancer of the mix group presented the features of the left-sided colorectal cancer alone group and the right-sided colorectal cancer alone group, respectively; in addition, platelet activation and the complement cascade were both dominant in left-sided colorectal cancer and right-sided colorectal cancer of the mix group (Supplementary Fig. S6F). Platelets are an active component of the TME and are involved in metastasis (40). In this cohort, higher infiltration scores and platelet scores were observed in both left-sided colorectal cancer and right-sided colorectal cancer of the mix group, compared with the left-sided colorectal cancer alone group and the right-sided colorectal cancer alone group (Supplementary Fig. S6G), implying metastatic progression in the mix group. The protein levels of FN1 and VCAN and the xCell-based infiltration score of macrophages were higher in both left-sided colorectal cancer and right-sided colorectal cancer of the mix group than in the left-sided colorectal cancer alone group and the right-sided colorectal cancer alone group, respectively (Supplementary Fig. S6H–J). Positive (Pearson) associations between the complement cascade and platelets and between FN1 and platelets were observed (Supplementary Fig. S6K), indicating interactions between platelets and the complement system in the mix group. Together, these findings further confirmed that left-sided colorectal cancer and right-sided colorectal cancer of the mix group presented the features of the left-sided colorectal cancer alone group and the right-sided colorectal cancer alone group, respectively, and that platelet activation and the complement cascade were both overrepresented in the mix group and potentially involved in colorectal cancer liver metastasis (Supplementary Fig. S6L).

TP53 mutation was predominant in left-sided colorectal cancer (Fig. 5A), which may explain the origin characteristics of left-sided colorectal cancer. Chr10q23.31 gain was the top-ranked arm event in right-sided colorectal cancer (Fig. 5A). Of the three cis-effect genes of chr10q23.31 gain, only ANKRD22 knockdown by RNAi reduced colorectal cancer cell proliferation (Fig. 5C–E). In addition, ANKRD22 was overrepresented in right-sided colorectal cancer at the protein level (Supplementary Fig. S6M).

ANKRD22 is a member of the ankyrin family and promotes energy metabolism (41). Of the trans-effect proteins of ANKRD22 amplification, three (AKAP9, LRBA, and PIGS) were included in the HPA anchor protein database (Fig. 5F; Supplementary Table S4B; Supplementary Materials), suggesting that they function in glucose metabolism (42). ANKRD22 amplification had a positive impact on glycolysis at the protein and phosphoprotein levels (Fig. 5G and H; Supplementary Fig. S6N). In addition, the positive (Pearson) correlation between AKAP9 and PRKACB indicated activation of the conversion of fructose 6-phosphate to fructose 1,6-bisphosphate (Fig. 5I). PIGS exhibited a positive association with GPI, which led to activation of the conversion of fructose 6-phosphate to fructose 1,6-bisphosphate and thus to the enhancement of glycolysis (Supplementary Fig. S6P). These findings were also validated in the TCGA 2012 cohort, the CPTAC cohort, and the Zeng cohort (Supplementary Fig. S6O–R). Taken together, chr10q23.31 gain was a predominant arm event in right-sided colorectal cancer, in which ANKRD22 amplification impacted glycolysis (Fig. 5J).
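
For readers less familiar with the statistic, the Pearson correlation used for protein pairs such as AKAP9 and PRKACB can be computed as sketched below. This is a minimal, standard-library illustration and not the analysis code of the study: the helper name `pearson_r` and the abundance values are invented for the example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Toy abundance values (invented), one entry per sample.
akap9 = [1.0, 2.0, 3.0, 4.0, 5.0]
prkacb = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(akap9, prkacb)  # close to +1 for these toy values
```

A coefficient near +1 indicates that the two proteins rise and fall together across samples; it does not by itself establish a regulatory direction.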

The progression paths of colorectal cancer based on CMS and CRIS classifications

Of note, the CMS and CRIS classifications provide deeper insights into advanced stage colorectal cancer; however, the progression tracks of colorectal cancer based on CMS and CRIS classifications have not yet been revealed. In this study, we collected 435 samples from 148 patients with early-stage colorectal cancer, covering 10 histopathologic stages, which allowed us to explore the progression of the CMS and CRIS subtypes.

We applied CMS-based classification (11) to the 148 patients with colorectal cancer, who were divided into four CMS subtypes: CMS1 (n = 20), CMS2 (n = 28), CMS3 (n = 58), and CMS4 (n = 42). Integrated analysis revealed that the CMS subtypes were closely associated with various clinical features of patients with early-stage colorectal cancer. For example, the patients in the CMS1 and CMS3 subtypes were younger than those in the CMS2 and CMS4 subtypes, and the CMS2 and CMS4 subtypes had a higher proportion of male patients (Supplementary Fig. S7A), which improved our understanding of tumor heterogeneity.

MSI was overrepresented in the CMS1 and CMS3 subtypes, and the CMS2 and CMS4 subtypes were characterized by MSS (43). We evaluated the MSI status of the samples based on gene expression signature scores (17, 44) and classified the samples into two groups. Comparative analysis showed that the CMS1 and CMS3 subtypes were characterized by MSI, with higher MSI/MSS-sig scores, and consistent findings were observed in the TCGA 2012 cohort (Supplementary Fig. S7A–C). In addition, the highest immune infiltration was observed in the CMS1 subtype, as evidenced by the highest immune score and HLAI score, and the highest stromal infiltration was detected in the CMS4 subtype (Supplementary Fig. S7D and S7E).
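
The idea of scoring samples with an expression signature and splitting them into two groups can be illustrated as follows. This sketch does not reproduce the MSI/MSS-sig score of the cited studies (17, 44): the scoring rule (mean z-score of MSI-associated genes minus mean z-score of MSS-associated genes), the zero threshold, the placeholder gene names `GENE_A` to `GENE_C`, and the expression values are all assumptions made for the example.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values across samples (population SD); zeros if constant."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

def signature_scores(expr, up_genes, down_genes):
    """Per-sample score: mean z of up_genes minus mean z of down_genes.

    `expr` maps gene name -> list of expression values, one per sample.
    """
    z = {gene: zscores(vals) for gene, vals in expr.items()}
    n_samples = len(next(iter(expr.values())))
    scores = []
    for i in range(n_samples):
        up = mean(z[g][i] for g in up_genes)
        down = mean(z[g][i] for g in down_genes)
        scores.append(up - down)
    return scores

def classify(scores, threshold=0.0):
    """Split samples into two groups at an assumed threshold."""
    return ["MSI-like" if s > threshold else "MSS-like" for s in scores]

# Placeholder genes and invented expression values (4 samples).
expr = {
    "GENE_A": [5.0, 1.0, 4.0, 1.5],  # assumed MSI-associated
    "GENE_B": [4.0, 0.5, 5.0, 1.0],  # assumed MSI-associated
    "GENE_C": [1.0, 5.0, 1.5, 4.0],  # assumed MSS-associated
}
scores = signature_scores(expr, up_genes=["GENE_A", "GENE_B"], down_genes=["GENE_C"])
labels = classify(scores)
```

In practice the gene lists and the cutoff would come from the published signatures, and the threshold choice would need to be validated against samples of known MSI status.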

We applied trajectory inference methods (45) to the four CMS subtypes and found that the progression paths of the CMS1 and CMS3 subtypes were similar to each other, as were those of the CMS2 and CMS4 subtypes (Fig. 6A; Supplementary Fig. S7F). Phase-based supervised clustering analysis revealed the carcinogenesis paths of colorectal cancer based on CMS classification: (i) CMS1: primary functions (NT)–oxidative phosphorylation (IEN)–immune infiltration (IFT, lasting until advanced stage colorectal cancer); (ii) CMS3: primary functions (NT)–oxidative phosphorylation (IEN)–glycolysis (IFT, lasting until advanced stage colorectal cancer); (iii) CMS2: primary functions (NT)–oxidative phosphorylation (IEN)–cell cycle/DNA replication (IFT, lasting until advanced stage colorectal cancer); (iv) CMS4: primary functions (NT)–oxidative phosphorylation (IEN)–cell cycle/DNA replication (IFT)–ECM signaling/complement cascade (advanced stage colorectal cancer; Fig. 6B; Supplementary Materials).

In our study, the patients were divided into five CRIS subtypes: CRIS-A (n = 43), CRIS-B (n = 19), CRIS-C (n = 29), CRIS-D (n = 29), and CRIS-E (n = 28). The CRIS-A subtype was MSI-like, as evidenced by the highest MSI/MSS-sig score, and the other CRIS subtypes (CRIS-B/C/D/E) were MSS-like (Supplementary Fig. S7G and S7H). In addition, the progression paths of the CRIS-A subtype and the other subtypes contained the features of the MSI-group and MSS-group carcinogenesis paths, respectively (Supplementary Fig. S7I). The carcinogenesis paths of the CRIS-A and CRIS-B/C/D/E subtypes were revealed in a time-resolved mode: (i) CRIS-A: primary functions (NT)–oxidative phosphorylation (IEN)–immune infiltration and glycolysis (IFT); (ii) CRIS-B/C/D/E: primary functions (NT)–oxidative phosphorylation (IEN)–cell cycle (IFT)–ECM signaling and complement cascade (advanced stage colorectal cancer; Supplementary Fig. S7J; Supplementary Materials).

At the gene level, chr10q23.31 gain was frequent in the CMS3 subtype, and the corresponding progression path had the features of the CMS3 subtype progression path (Fig. 6C and D). In addition, the paths of ANKRD22 amplification and of glycolysis based on the GSVA score shared the characteristics of the CMS3 subtype track (Supplementary Fig. S7K and S7L), indicating that ANKRD22 amplification by chr10q23.31 gain functioned in the characteristic dynamic driver pathways of the CMS3 subtype. The TP53 mutation was notably enriched in the CMS4 subtype, and the corresponding path of the TP53 mutation group presented the features of the CMS4 subtype progression path (Fig. 6C and D). The paths of the stroma score and fibroblasts based on the xCell score and of the ECM signaling pathway based on the GSVA score shared the characteristics of the CMS4 subtype progression path (Supplementary Fig. S7L and S7M), indicating that TP53 mutation impacted the progression path of the CMS4 subtype.

Collectively, we profiled the progression paths of colorectal cancer based on CMS and CRIS classifications and elucidated the functions of ANKRD22 amplification by chr10q23.31 gain in glycolysis in the CMS3 subtype (younger and female track) and the effects of TP53 mutation on stromal infiltration in the CMS4 subtype (older and male track; Fig. 6E), providing valuable insights into the clinical stratification and subtype-based targeted interventions of colorectal cancer.

The divergence between MSI and MSS in colorectal cancer progression

To explore the characteristics of tumors with MSI and MSS status, we integrated the datasets of our cohort and other colorectal cancer cohorts, including the TCGA 2012 cohort, the CPTAC cohort, and the Zeng cohort, and performed a comparative analysis of the MSI and MSS colorectal cancer groups. MSI was detected as early as the IEN phase, as evidenced by the highest MSI/MSS-sig score compared with the IFT and advanced stage colorectal cancer phases, and consistent results were obtained in the TCGA 2012 cohort and the Zeng cohort (Fig. 7A; Supplementary Fig. S8A and S8B). These findings suggested that MSI was one of the key events in the early stages of colorectal cancer. GSEA revealed that glycolysis was predominant in the MSI group and that the complement cascade was overrepresented in the MSS group (Fig. 7B).

At the gene level, chr10q23.31 gain and ANKRD22 amplification were both frequent in the MSI group, and ANKRD22 was highly expressed in the MSI group at the protein level, which was also validated in the CPTAC cohort (Fig. 7C, D, and H; Supplementary Fig. S8C). In addition, the three anchor proteins (AKAP9, LRBA, and PIGS) associated with ANKRD22 amplification were also overrepresented in the MSI group (Supplementary Fig. S8D). The co-mutations of TP53 and KRAS functioned in the MSS group, as evidenced by the overrepresentation of key proteins (FN1 and VCAN) and immune signatures (macrophages, monocytes, and hepatocytes; Fig. 7E–H; Supplementary Fig. S8E; Supplementary Materials).

Validation in the AOM/DSS-induced colorectal cancer model revealed that DDX5 loss promoted tumor growth

To validate the findings from the Fudan cohort, we established an AOM/DSS-induced colorectal cancer carcinogenesis mouse model (Materials and Methods) and collected 13 samples (Figs. 1A and 8A). In this model, tumor volume and weight were elevated in the cycle II group (Fig. 8B). More proteins were identified in the AOM/DSS-induced colorectal cancer model group than in the control group (Supplementary Fig. S9A–C; Supplementary Table S3D). We integrated the highly expressed proteins of the three groups (control, cycle I, and cycle II) and found that metabolic processes were overrepresented in the control group, a feature of the NT phase of colorectal cancer (Fig. 8C and D; Supplementary Materials). Consistent with the results for chr17q loss and chr20q gain, DNA replication was dominant in the cycle I group, as evidenced by the overrepresentation of related markers (Fig. 8C–G). In addition, DDX5 protein levels gradually decreased during AOM/DSS-induced carcinogenesis in the mouse model (Fig. 8H and I), further supporting that DDX5 loss promoted tumor growth in colorectal cancer progression. Together, the results on the functions of DDX5 in colorectal cancer cell proliferation and tumor growth in vitro and in the colorectal cancer carcinogenesis mouse model indicated that DDX5 loss promoted colorectal cancer carcinogenesis and development, providing insight for potential therapeutic intervention in colorectal cancer. In the cycle II group, the dominant pathways included focal adhesion and the complement cascade, and the markers of fibroblasts and pericytes were overrepresented (Supplementary Fig. S9D and S9E), consistent with the finding that TP53 mutation impacted TME activity in the advanced stage colorectal cancer phase.

To further validate the findings of our study, we included other cohorts with datasets from AOM/DSS-induced colorectal cancer carcinogenesis models (Supplementary Materials), namely the Oshrat cohort (46) and the Yang cohort (47). In the Oshrat cohort, samples were collected at the colitis stage (representing early-stage colorectal cancer) and the malignant stage (representing advanced stage colorectal cancer). Comparative analysis showed that proteins related to oxidative phosphorylation, cell cycle, and DNA replication were overrepresented in the colitis stage, whereas oncogenic pathways, including ECM signaling and the complement cascade, were predominant in the malignant stage (Supplementary Fig. S9F).

In the Yang cohort (47), proteomic datasets of cycle I, cycle II, and cycle III were available from the AOM/DSS-induced colorectal cancer carcinogenesis mouse model. We integrated the highly expressed proteins of cycles I/II/III and performed pathway enrichment analysis. Mitochondrion activation and cell cycle were notable in the cycle I group, and the highly expressed proteins in cycle II and cycle III participated in stromal infiltration, as evidenced by the activation of ECM signaling and focal adhesion (Supplementary Fig. S9G and S9H).

Integrated findings in our cohort and validation of the AOM/DSS-induced colorectal cancer mouse model further demonstrated the dynamic waves in colorectal cancer progression: oxidative phosphorylation (KRAS and BRAF)–cell cycle (TOP1 and DDX5)–ECM signaling/stromal infiltration (TP53; Supplementary Fig. S10).

This study depicted a comprehensive multiomic map of colorectal cancer progression and presented an in-depth exploration of the multistage carcinogenesis of colorectal cancer (Supplementary Fig. S10). The mutually exclusive (ME) KRAS and BRAF mutations were prominent in the IEN phase and enhanced oxidative phosphorylation; chr20q gain, which was ME with chr17q loss, was a key event in the transition from the IEN phase to the IFT phase, leading to colorectal cancer cell proliferation; and TP53 mutation was frequent in the advanced stage colorectal cancer phase and impacted TME activity. In addition, we illustrated the molecular characteristics of left-sided colorectal cancer and right-sided colorectal cancer and proposed that the divergence of colorectal cancer based on location was associated with the corresponding normal tissues. Furthermore, we found that the mix group presented platelet activation and complement cascade. We also delineated the progression paths of colorectal cancer based on CMS (11) and CRIS (12) classifications and disclosed the molecular characteristics of patients with MSI and MSS.

A core finding is that DDX5 was downregulated during the carcinogenesis of colorectal cancer. Previous studies have reported that DDX5 can act through either inhibition or activation in different cancer types via diverse axes. For example, the DDX5–mTOR (activation) axis activated cell proliferation in non–small cell lung cancer (48), and the DDX5–mTOR (inhibition) axis stimulated autophagy in liver cancer (49). However, the functions of DDX5 in colorectal cancer progression have remained undefined. Previous investigations of DDX5 in cancer were based on advanced stage cancer samples and therefore could not reveal at which stage of carcinogenesis it functions. This study contained 435 samples covering four phases, ranging from the NT phase to the advanced stage colorectal cancer phase, and comprehensive proteogenomics revealed that loss of DDX5, located on chr17q, functioned as early as the IEN phase and that the DDX5–cell cycle (inhibition) axis activated cell proliferation and tumor growth in colorectal cancer, suggesting a candidate therapeutic target for colorectal cancer.

TP53 and KRAS mutations were frequent in colorectal cancer carcinogenesis and were associated with an elevated risk of colorectal cancer (50). Proteogenomics revealed that KRAS and TP53 mutations had diverse functions in a time-resolved manner during colorectal cancer progression. Specifically, KRAS mutation was detected in the IEN phase, and TP53 mutation functioned in the advanced stage colorectal cancer phase. Li and colleagues (36) discovered that KRAS and TP53 mutations were associated with colorectal cancer liver metastasis, one of the major causes of death in patients with colorectal cancer, although the underlying mechanism has been lacking. This study disclosed that co-mutations of KRAS and TP53 promoted the complement cascade and established the link KRAS/TP53–complement cascade–TME–colorectal cancer liver metastasis, providing a data resource and insights into colorectal cancer liver metastasis. Clinically, the mix group is more prone to metastasis than the left-sided colorectal cancer alone or right-sided colorectal cancer alone groups. However, the characteristics and metastasis mechanism of the mix group are not well understood. This study showed that the mix group was characterized by co-mutations of KRAS and TP53 and by additional overrepresentation of platelet activation and the complement cascade, which were potentially involved in colorectal cancer liver metastasis, providing insight into the assessment of and therapeutic strategies for patients with colorectal cancer with bilateral tumors.

Integrated analysis profiled four progression paths for the CMS classification and two for the CRIS classification, and showed that MSI was associated with specific subtypes in both classifications. Specifically, CMS1 (MSI-high) and CMS3 (MSI-low) were characterized by MSI. MSI status is recognized as one of the molecular fingerprints of colorectal cancer (51), and large-scale colorectal cancer clinical trials have shown the therapeutic effects of immunotherapy in patients with MSI-high colorectal cancer (52). On the basis of immune infiltration and the profiles of colorectal cancer based on CMS and CRIS classifications, the association between MSI and subtype-based classifications suggested an immunotherapy strategy for patients with colorectal cancer in the CMS1 and CMS3 subtypes and provided a reference data resource for patients with colorectal cancer in the CMS2 and CMS4 subtypes in the clinic.

In summary, we presented temporal molecular switches promoting colorectal cancer progression at the multiomic level, disclosed the molecular characteristics of colorectal cancer based on location, profiled the progression paths of colorectal cancer based on MSI and CMS/CRIS classifications, and demonstrated the impact of DDX5 loss on colorectal cancer development, suggesting potential therapeutic targets for colorectal cancer. We believe that this study provides insights into the architecture of colorectal cancer progression and offers a useful resource for the potential diagnosis and treatment of colorectal cancer.

No disclosures were reported.

L. Li: Conceptualization, resources, data curation, formal analysis, funding acquisition, validation, investigation, methodology, writing–original draft, writing–review and editing. D. Jiang: Conceptualization, resources, data curation, formal analysis, investigation, methodology. H. Liu: Conceptualization, resources, data curation, formal analysis, validation, methodology. C. Guo: Conceptualization, resources, data curation, formal analysis, validation, methodology. Q. Zhang: Conceptualization, resources, data curation, formal analysis. X. Li: Resources, data curation, formal analysis, validation, methodology, writing–original draft. X. Chen: Resources, data curation, validation, methodology. Z. Chen: Formal analysis, validation. J. Feng: Data curation. S. Tan: Data curation. W. Huang: Investigation. J. Huang: Investigation. C. Xu: Conceptualization, project administration. C. Liu: Conceptualization, project administration. W. Yu: Conceptualization, writing–original draft, project administration. Y. Hou: Conceptualization, writing–original draft, project administration. C. Ding: Conceptualization, funding acquisition, writing–original draft, project administration, writing–review and editing.

This work is supported by the National Key Research and Development Program of China (2022YFA1303200 and 2022YFA1303201 to C. Ding), National Natural Science Foundation of China (32330062 and 31972933 to C. Ding), Program of Shanghai Academic/Technology Research Leader (22XD1420100 to C. Ding), Major Project of Special Development Funds of Zhangjiang National Independent Innovation Demonstration Zone (ZJ2019-ZD-004 to C. Ding), Shanghai Municipal Science and Technology Major Project (2023SHZDZX02 to C. Ding), the Fudan Original Research Personalized Support Project (to C. Ding), and China Postdoctoral Science Foundation (2023M740697 and GZB20230160 to L. Li). This work is supported by Shanghai Municipal Science and Technology Major Project and the Human Phenome Data Center of Fudan University.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

1. Siegel RL, Miller KD, Fuchs HE, Jemal A. Cancer statistics, 2021. CA Cancer J Clin 2021;71:7–33.
2. Lao VV, Grady WM. Epigenetics and colorectal cancer. Nat Rev Gastroenterol Hepatol 2011;8:686–700.
3. Jass JR. Classification of colorectal cancer based on correlation of clinical, morphological and molecular features. Histopathology 2007;50:113–30.
4. Weinberg RA. Dynamics of cancer: incidence, inheritance, and evolution. Nature 2007;449:978–81.
5. Jiang D, Li X, Wang H, Xu C, Li X, Sujie A, et al. A retrospective study of endoscopic resection for 368 patients with early esophageal squamous cell carcinoma or precancerous lesions. Surg Endosc 2017;31:2122–30.
6. Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, Ellrott K, et al; Cancer Genome Atlas Research Network. The cancer genome atlas pan-cancer analysis project. Nat Genet 2013;45:1113–20.
7. Rudnick PA, Markey SP, Roth J, Mirokhin Y, Yan XJ, Tchekhovskoi DV, et al. A description of the clinical proteomic tumor analysis consortium (CPTAC) common data analysis pipeline. J Proteome Res 2016;15:1023–32.
8. Gao B, Li X, Li S, Wang S, Wu J, Li J. Pan-cancer analysis identifies RNA helicase DDX1 as a prognostic marker. Phenomics 2022;2:33–49.
9. Vasaikar S, Huang C, Wang X, Petyuk VA, Savage SR, Wen B, et al. Proteogenomic analysis of human colon cancer reveals new therapeutic opportunities. Cell 2019;177:1035–49.e19.
10. Ying W. Phenomic studies on diseases: potential and challenges. Phenomics 2023;3:285–99.
11. Guinney J, Dienstmann R, Wang X, de Reyniès A, Schlicker A, Soneson C, et al. The consensus molecular subtypes of colorectal cancer. Nat Med 2015;21:1350–6.
12. Zhang B, Wang J, Wang X, Zhu J, Liu Q, Shi Z, et al. Proteogenomic characterization of human colon and rectal cancer. Nature 2014;513:382–7.
13. Muzny DM, Bainbridge MN, Chang K, Dinh HH, Drummond JA, Fowler G, et al; Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012;487:330–7.
14. Elsaleh H, Joseph D, Grieu F, Zeps N, Spry N, Iacopetta B. Association of tumour site and sex with survival benefit from adjuvant chemotherapy in colorectal cancer. Lancet 2000;355:1745–50.
15. Kim HK, Cho JH, Lee HY, Lee J, Kim J. Pulmonary metastasectomy for colorectal cancer: how many nodules, how many times? World J Gastroenterol 2014;20:6133–45.
16. Li L, Jiang D, Zhang Q, Liu H, Xu F, Guo C, et al. Integrative proteogenomic characterization of early esophageal cancer. Nat Commun 2023;14:1666.
17. Li L, Jiang D, Liu H, Guo C, Zhao R, Zhang Q, et al. Comprehensive proteogenomic characterization of early duodenal cancer reveals the carcinogenesis tracks of different subtypes. Nat Commun 2023;14:1751.
18. Schlemper RJ, Itabashi M, Kato Y, Lewin KJ, Riddell RH, Shimoda T, et al. Differences in diagnostic criteria for gastric carcinoma between Japanese and Western pathologists. Lancet 1997;349:1725–9.
19. Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, et al. Comprehensive characterization of cancer driver genes and mutations. Cell 2018;174:1034–5.
20. Mertins P, Mani DR, Ruggles KV, Gillette MA, Clauser KR, Wang P, et al. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 2016;534:55–62.
21. Yao ZM, Xu N, Shang GG, Wang HX, Tao H, Wang YZ, et al. Proteogenomics of different urothelial bladder cancer stages reveals distinct molecular features for papillary cancer and carcinoma in situ. Nat Commun 2023;14:5670.
22. Feng JW, Ding C, Qiu NQ, Ni XT, Zhan DD, Liu WL, et al. Firmiana: towards a one-stop proteomic cloud platform for data processing and analysis. Nat Biotechnol 2017;35:409–12.
23. Schwanhausser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, et al. Corrigendum: global quantification of mammalian gene expression control. Nature 2013;495:126–7.
24. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 2008;26:1367–72.
25. Simanshu DK, Nissley DV, McCormick F. RAS proteins and their regulators in human disease. Cell 2017;170:17–33.
26. Yaeger R, Chatila WK, Lipsyc MD, Hechtman JF, Cercek A, Sanchez-Vega F, et al. Clinical sequencing defines the genomic landscape of metastatic colorectal cancer. Cancer Cell 2018;33:125–36.e3.
27. Janes MR, Zhang J, Li LS, Hansen R, Peters U, Guo X, et al. Targeting KRAS mutant cancers with a covalent G12C-specific inhibitor. Cell 2018;172:578–89.e17.
28. Hoadley KA, Yau C, Hinoue T, Wolf DM, Lazar AJ, Drill E, et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell 2018;173:291–304.e6.
29. Song ZY, Ghochani M, McCaffery JM, Frey TG, Chan DC. Mitofusins and OPA1 mediate sequential steps in mitochondrial membrane fusion. Mol Biol Cell 2009;20:3525–32.
30. Tsherniak A, Vazquez F, Montgomery PG, Weir BA, Kryukov G, Cowley GS, et al. Defining a cancer dependency map. Cell 2017;170:564–76.e16.
31. Iyer RS, Nicol SM, Quinlan PR, Thompson AM, Meek DW, Fuller-Pace FV. The RNA helicase/transcriptional co-regulator, p68 (DDX5), stimulates expression of oncogenic protein kinase, Polo-like kinase-1 (PLK1), and is associated with elevated PLK1 levels in human breast cancers. Cell Cycle 2014;13:1413–23.
32. Kalluri R. The biology and function of fibroblasts in cancer. Nat Rev Cancer 2016;16:582–98.
33. Roumenina LT, Daugan MV, Petitprez F, Sautes-Fridman C, Fridman WH. Context-dependent roles of complement in cancer. Nat Rev Cancer 2019;19:698–715.
34. Greenberg JI, Shields DJ, Barillas SG, Acevedo LM, Murphy E, Huang JH, et al. Erratum: a role for VEGF as a negative regulator of pericyte function and vessel maturation. Nature 2009;457:1168.
35. Nissan MH, Pratilas CA, Jones AM, Ramirez R, Won HL, Liu CL, et al. Loss of NF1 in cutaneous melanoma is associated with RAS activation and MEK dependence. Cancer Res 2014;74:2340–50.
36. Li C, Sun Y-D, Yu G-Y, Cui J-R, Lou Z, Zhang H, et al. Integrated omics of metastatic colorectal cancer. Cancer Cell 2020;38:734–47.e9.
37. Lin L, Zeng X, Liang S, Wang Y, Dai X, Sun Y, et al. Construction of a co-expression network and prediction of metastasis markers in colorectal cancer patients with liver metastasis. J Gastrointest Oncol 2022;13:2426–38.
38. Wu YC, Yang SX, Ma JQ, Chen ZC, Song GH, Rao DN, et al. Spatiotemporal immune landscape of colorectal cancer liver metastasis at single-cell level. Cancer Discov 2022;12:134–53.
39. Yu JL, Green MD, Li SS, Sun YL, Journey SN, Choi JE, et al. Liver metastasis restrains immunotherapy efficacy via macrophage-mediated T cell elimination. Nat Med 2021;27:152–64.
40. Zhang WK, Zhou H, Li HY, Mou HC, Yinwang E, Xue YC, et al. Cancer cells reprogram to metastatic state through the acquisition of platelet mitochondria. Cell Rep 2023;42:113464.
41. Pan T, Liu J, Xu S, Yu Q, Wang H, Sun H, et al. ANKRD22, a novel tumor microenvironment-induced mitochondrial protein promotes metabolic reprogramming of colorectal cancer cells. Theranostics 2020;10:516–36.
42. Lin CX, Tu CW, Ma YK, Ye PC, Shao X, Yang ZA, et al. Nobiletin inhibits cell growth through restraining aerobic glycolysis via PKA-CREB pathway in oral squamous cell carcinoma. Food Sci Nutr 2020;8:3515–24.
43. Joanito I, Wirapati P, Zhao N, Nawaz Z, Yeo G, Lee F, et al. Single-cell and bulk transcriptome sequencing identifies two epithelial tumor cell states and refines the consensus molecular classification of colorectal cancer. Nat Genet 2022;54:963–75.
44. Cristescu R, Lee J, Nebozhyn M, Kim KM, Ting JC, Wong SS, et al. Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes. Nat Med 2015;21:449–56.
45. Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods. Nat Biotechnol 2019;37:547–54.
46. Levi-Galibov O, Lavon H, Wassermann-Dozorets R, Pevsner-Fischer M, Mayer S, Wershof E, et al. Heat Shock Factor 1-dependent extracellular matrix remodeling mediates the transition from chronic intestinal inflammation to colon cancer. Nat Commun 2020;11:6245.
47. Wang Y, Shan Q, Hou G, Zhang J, Bai J, Lv X, et al. Discovery of potential colorectal cancer serum biomarkers through quantitative proteomics on the colonic tissue interstitial fluids from the AOM-DSS mouse model. J Proteomics 2016;132:31–40.
48. Yang ZW, Li GY, Zhao YZ, Zhang L, Yuan XH, Meng LJ, et al. Molecular insights into the recruiting between UCP2 and DDX5/ubap2l in the metabolic plasticity of non-small-cell lung cancer. J Chem Inf Model 2021;61:3978–87.
49. Zhang H, Zhang YQ, Zhu XY, Chen C, Zhang C, Xia YZ, et al. DEAD box protein 5 inhibits liver tumorigenesis by stimulating autophagy via interaction with p62/SQSTM1. Hepatology 2019;69:1046–63.
50. Rajamaki K, Taira A, Katainen R, Valimaki N, Kuosmanen A, Plaketti RM, et al. Genetic and epigenetic characteristics of inflammatory bowel disease-associated colorectal cancer. Gastroenterology 2021;161:592–607.
51. Vilar E, Gruber SB. Microsatellite instability in colorectal cancer-the stable evidence. Nat Rev Clin Oncol 2010;7:153–62.
52. Guo LW, Wang YJ, Yang WX, Wang CC, Guo TA, Yang JC, et al. Molecular profiling provides clinical insights into targeted and immunotherapies as well as colorectal cancer prognosis. Gastroenterology 2023;165:414–28.e7.
This open access article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.