Single-cell RNA sequencing (scRNA-seq) enables study of the transcriptome at single-cell resolution: populations of cells are annotated based on the expression of marker genes, yielding cell-specific insights into cancer biology. However, precise annotation of cell type remains a challenge and hinders efficient data interpretation. Several tools for cell-type annotation have been developed to improve resolution and reproducibility, yet their performance declines when the reference dataset contains many cell types, subclasses of similar cell types, or malignant cells. Interpatient heterogeneity of malignant cells often reduces accuracy when classifying cancer cells, because most methods rely on correlations to a reference from a different source. Given the challenges in annotating scRNA-seq data from cancer and the importance of such annotation for elucidating mechanisms associated with tumor heterogeneity, pathogenesis, and treatment, we developed a comprehensive, hierarchically organized, multi-layered classifier spanning diverse malignant and normal cells of the tumor microenvironment. We found that performance improves when each layer focuses on a smaller number of classes and each cell moves sequentially down a series of classifiers of increasing cell-type resolution. When applied to an external validation dataset of over 300 primary solid tumor biopsies spanning diverse cancer types, the classifier accurately annotated the tissue of origin of malignant cells and relevant subtypes of stromal and blood cells, with average F1 scores of 0.91, 0.95, and 0.99, respectively. Using confidence thresholds at each layer, the classifier abstains from classifying ambiguous cells.
We applied four existing annotators, provided with the same reference, to the external test dataset and found that cancer cells were misclassified or left unclassified, whereas blood and stromal cells were accurately classified, highlighting our tool’s unique ability to classify cancer cells. Moreover, we applied our classifier to scRNA-seq data derived from breast cancer metastasis to the liver and were able to uncover the tissue of origin, demonstrating its potential use for determining the source of a metastatic tumor. Finally, because our classifier is modular, we leveraged two recently published single-cell breast cancer atlases to add a breast cancer subtype classification layer that consistently identified the correct clinical subtype of single breast cancer cells in external data. This study provides a flexible model for annotating the cells that make up the tumor microenvironment in pan-cancer settings, whereas existing methods require tissue-specific references for every cancer type. Our classifier provides a powerful method for investigating intercellular communication pathways between tumor cells and non-malignant cells of the tumor microenvironment.
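
As an illustration only, the layered design with per-layer confidence thresholds described above might be sketched as follows. This is a toy sketch under stated assumptions, not the authors' implementation: the names `Layer`, `marker_scorer`, `annotate`, and `HIERARCHY`, the marker-sum scoring rule, the gene lists, and the threshold values are all hypothetical placeholders chosen to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Cell = Dict[str, float]                      # gene -> expression (toy values)
Scorer = Callable[[Cell], Dict[str, float]]  # cell -> {label: probability}


@dataclass
class Layer:
    """One classifier in the hierarchy (hypothetical structure)."""
    scorer: Scorer
    threshold: float                          # minimum confidence to accept a label
    children: Dict[str, "Layer"] = field(default_factory=dict)


def marker_scorer(markers: Dict[str, List[str]]) -> Scorer:
    """Toy stand-in for a trained model: normalized sums of marker expression."""
    def score(cell: Cell) -> Dict[str, float]:
        raw = {label: sum(cell.get(g, 0.0) for g in genes)
               for label, genes in markers.items()}
        total = sum(raw.values())
        if total == 0:
            # No marker expressed: fall back to a uniform distribution.
            return {label: 1.0 / len(raw) for label in raw}
        return {label: value / total for label, value in raw.items()}
    return score


def annotate(cell: Cell, root: Layer) -> List[str]:
    """Walk down the hierarchy; stop and abstain when confidence is too low."""
    path: List[str] = []
    current = root
    while current is not None:
        probs = current.scorer(cell)
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence < current.threshold:
            path.append("unclassified")       # abstain at this layer
            break
        path.append(label)
        current = current.children.get(label)  # None when no finer layer exists
    return path


# Illustrative two-layer hierarchy; genes and thresholds are assumptions.
HIERARCHY = Layer(
    scorer=marker_scorer({
        "malignant": ["EPCAM", "KRT8"],
        "blood": ["PTPRC"],
        "stromal": ["COL1A1"],
    }),
    threshold=0.6,
    children={
        "malignant": Layer(
            scorer=marker_scorer({"breast": ["GATA3", "ESR1"], "lung": ["NKX2-1"]}),
            threshold=0.7,
        )
    },
)

if __name__ == "__main__":
    print(annotate({"EPCAM": 5, "KRT8": 4, "GATA3": 6, "ESR1": 3, "PTPRC": 0.5}, HIERARCHY))
    print(annotate({"EPCAM": 2, "PTPRC": 2}, HIERARCHY))
```

In this sketch each layer only has to separate a few classes, and a cell whose top score falls below the layer's threshold is labeled `unclassified` at that depth while keeping any coarser labels already assigned.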

Citation Format: Ido Nofech-Mozes, Philip Awadalla, Sagi Abelson. Comprehensive cell-type classification of tumor and normal cells from single cell RNA sequencing in pan cancer settings [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 1221.