Background: Single-cell RNA sequencing (scRNA-seq) enables the precise identification of distinct cell populations and the measurement of gene expression within each population. Recently, integrating scRNA-seq and genome-wide association study (GWAS) data has shown promise in identifying cell-of-origin populations in human diseases. Colorectal cancer (CRC) progresses through a phenotypic continuum from normal colon to adenoma and, ultimately, to CRC. However, the specific disease cell-of-origin populations underlying each stage of CRC development have not been well studied at single-cell resolution. Additionally, while transcriptome-wide association studies (TWAS) have identified more than 200 putative risk genes in CRC, the precise disease cell-of-origin populations associated with these genes remain elusive.

Materials and methods: We analyzed scRNA-seq datasets from 31 normal colon, 8 colon serrated polyp, and 15 conventional adenoma tissues from individuals who participated in the Colorectal Molecular Atlas Project (COLON MAP). We then conducted an integrative analysis of these scRNA-seq data with extensive epigenomics data and summary statistics from a large CRC GWAS conducted among European-ancestry populations (78,473 CRC cases and 107,143 controls), using the single-cell disease relevance score (scDRS) to investigate cell-of-origin populations underlying CRC development. We conducted correlation analysis between gene expression and individual cell risk scores derived from scDRS. Additionally, we performed differential expression analysis comparing disease cell-of-origin populations with other cell populations at each stage.
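
The correlation step described above can be illustrated with a minimal sketch. It assumes a cells-by-genes expression matrix and a vector of per-cell risk scores, and it uses the squared Pearson correlation as the measure. The function name `gene_score_r2` and the simulated data are hypothetical and for illustration only; this is not the authors' pipeline and does not call scDRS itself.

```python
import numpy as np

def gene_score_r2(expr, risk_scores):
    """Squared Pearson correlation between each gene (column of expr)
    and a vector of per-cell risk scores. Hypothetical helper."""
    expr = np.asarray(expr, dtype=float)
    scores = np.asarray(risk_scores, dtype=float)
    expr_c = expr - expr.mean(axis=0)      # center each gene across cells
    scores_c = scores - scores.mean()      # center the risk scores
    cov = expr_c.T @ scores_c              # per-gene cross-products
    denom = np.sqrt((expr_c ** 2).sum(axis=0) * (scores_c ** 2).sum())
    # Genes with zero variance get a correlation of 0 instead of NaN.
    r = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return r ** 2

# Simulated data only (assumption): 500 cells, 3 genes.
rng = np.random.default_rng(0)
n_cells = 500
risk = rng.normal(size=n_cells)
expr = np.column_stack([
    2.0 * risk + rng.normal(size=n_cells),  # gene that tracks the risk score
    rng.normal(size=n_cells),               # gene unrelated to the risk score
    np.ones(n_cells),                       # constant (uninformative) gene
])
r2 = gene_score_r2(expr, risk)
```

In this sketch, taking the median of `r2` over a set of genes would give a summary comparable in form to a per-stage median R² value.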

Results: We identified multiple cell-of-origin populations associated with CRC, including absorptive cells (ABS) (P < 1 × 10⁻⁴) and goblet cells (GOB) (P < 7 × 10⁻³) in the normal colon; ABS (P < 2 × 10⁻³), enteroendocrine cells (EE) (P < 0.04), and serrated-specific cells (SSC) (P < 1 × 10⁻⁴) in serrated polyps; and ABS (P < 9 × 10⁻³) and SSC (P < 6 × 10⁻⁴) in adenomas. Among the 205 risk genes previously reported from TWAS, a majority showed high correlations between their expression and the risk scores of single cells (median R² = 0.07, 0.14, and 0.137 in normal colon, serrated polyps, and adenomas, respectively). Moreover, 57 of these TWAS-reported genes (27.8% of 205 genes) exhibited evidence of differential expression in disease cell-of-origin populations compared with other cell populations at a nominal P < 0.05 in at least one stage (binomial test, P < 0.01 for all).
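
As a rough illustration of the kind of enrichment check a binomial test provides, the sketch below computes an exact one-sided binomial tail probability for observing at least 57 nominally significant genes out of 205. The null rates are assumptions made here, not values stated in the abstract: 0.05 corresponds to a single test at nominal P < 0.05, and 1 - 0.95³ is one possible null rate for "at least one of three stages" if the stages were independent. The abstract reports per-stage tests, so this pooled calculation is not a reproduction of the authors' analysis.

```python
from math import comb

def binomial_upper_tail(k, n, p0):
    """Exact P(X >= k) for X ~ Binomial(n, p0). Hypothetical helper."""
    return sum(comb(n, i) * p0**i * (1.0 - p0) ** (n - i)
               for i in range(k, n + 1))

n_genes = 205   # TWAS-reported genes (from the abstract)
n_hits = 57     # genes with nominal differential expression (from the abstract)

# Assumed null rates, for illustration only.
null_single_test = 0.05
null_any_of_three = 1.0 - 0.95 ** 3

p_single = binomial_upper_tail(n_hits, n_genes, null_single_test)
p_any = binomial_upper_tail(n_hits, n_genes, null_any_of_three)
```

Under either assumed null rate, the expected number of nominally significant genes (about 10 and about 29, respectively) is well below the 57 observed.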

Conclusion: This study reveals the cell-of-origin populations that influence the development of CRC, and provides evidence of the role of risk genes in these populations. These findings are crucial for identifying causal genes and understanding the cellular mechanisms that drive the biology and etiology of CRC.

Citation Format: Chao Li, Zhishan Chen, Qing Li, Jungyoon Choi, Quanhu Sheng, Wanqing Wen, Xiang Shu, Xiao-ou Shu, Jirong Long, Qiuyin Cai, Bhuminder Singh, Martha J. Shrubsole, Ken Lau, Wei Zheng, Xingyi Guo. Integrating single-cell RNA-seq and large genome-wide association study data to identify colorectal cancer cell-of-origin populations [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 6614.