Cellular deconvolution approaches allow virtual reconstruction of tissue composition from bulk RNAseq data. Although the role of the tumor microenvironment (TME) in disease progression and response to therapy is well established, current deconvolution methods that characterize this intratumoral cellular component have yet to be integrated into guiding cancer therapy and understanding the clinical impact of the TME.

Here we present a novel machine learning-based approach for tumor cellular deconvolution. We developed a framework trained on the RNAseq profiles of 11,857 purified cell samples. The algorithm was able to detect minor differences between cell types and correctly reconstruct cell type percentages from the bulk RNAseq of mixtures of different cell types. We used a three-stage hierarchical learning procedure for a LightGBM model that included training on artificial RNAseq mixtures of different purified cell types, including major immune (T, B, NK, macrophage) and stromal (CAF, endothelial) cell populations. The model was then trained to reconstruct proportions of T cell subtypes such as Th, CTL, and Treg, as well as M1 and M2 macrophages and other cell types. To increase the accuracy of deconvolution in specific cancer types, the algorithm was trained on mixtures that included purified malignant cells in proportions specific to their TME. To determine cellular proportions accurately, we accounted for the relative abundance of RNA per cell type.
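The abstract does not specify how the artificial mixtures were built or how RNA abundance per cell type entered the calculation. The following is a minimal sketch of one plausible reading, assuming a bulk profile is a weighted sum of purified profiles and that each cell type contributes RNA in proportion to its cell fraction times a per-cell RNA coefficient. All function names, the toy expression profiles, and the coefficients are hypothetical, and the LightGBM regression step itself is omitted.

```python
def rna_fractions_from_cells(cell_fractions, rna_per_cell):
    """Convert cell fractions to RNA fractions.

    Assumption: a cell type's share of bulk RNA is proportional to
    (fraction of cells) x (relative RNA amount per cell).
    """
    weighted = {ct: frac * rna_per_cell[ct] for ct, frac in cell_fractions.items()}
    total = sum(weighted.values())
    return {ct: w / total for ct, w in weighted.items()}


def cell_fractions_from_rna(rna_fractions, rna_per_cell):
    """Invert the conversion: divide out RNA per cell and renormalize."""
    scaled = {ct: frac / rna_per_cell[ct] for ct, frac in rna_fractions.items()}
    total = sum(scaled.values())
    return {ct: s / total for ct, s in scaled.items()}


def make_mixture(profiles, cell_fractions, rna_per_cell):
    """Build one artificial bulk profile from purified cell-type profiles.

    profiles: cell type -> list of expression values (same gene order).
    """
    rna_fracs = rna_fractions_from_cells(cell_fractions, rna_per_cell)
    n_genes = len(next(iter(profiles.values())))
    return [
        sum(rna_fracs[ct] * profiles[ct][g] for ct in rna_fracs)
        for g in range(n_genes)
    ]


# Hypothetical toy inputs: three genes, two cell types.
profiles = {"T": [10.0, 0.0, 5.0], "Macrophage": [0.0, 8.0, 5.0]}
rna_per_cell = {"T": 1.0, "Macrophage": 3.0}  # illustrative values only
cell_fractions = {"T": 0.5, "Macrophage": 0.5}

bulk = make_mixture(profiles, cell_fractions, rna_per_cell)
rna_fracs = rna_fractions_from_cells(cell_fractions, rna_per_cell)
recovered = cell_fractions_from_rna(rna_fracs, rna_per_cell)
```

In this toy case the two cell types are equally abundant, yet macrophages account for three quarters of the bulk RNA. A model that predicts RNA fractions would therefore overestimate macrophages unless its output is rescaled by RNA per cell, which is the motivation for the correction described above.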

Algorithm performance was validated on 14 datasets from different tissues (PBMC, bone marrow, tumors) by comparing RNAseq deconvolution with flow cytometry or single-cell RNAseq measurements of the same cell suspension, obtaining a median correlation of 0.96 (range 0.917 to 0.986). The presented approach performed better than other cell deconvolution tools, including EPIC, quanTIseq, CIBERSORT, xCell, and MCP-counter. In addition, comparing tumor purity determined by histological examination of TCGA samples with our algorithm's predictions yielded a correlation of 0.75. In fresh non-small cell lung cancer samples, comparing tumor composition measured by CyTOF and by RNAseq from the same specimen resulted in an overall correlation of 0.925 across multiple cell types.
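The validation above summarizes per-dataset agreement as a median correlation. The abstract does not state which correlation coefficient was used, so the sketch below assumes Pearson correlation between predicted and measured cell fractions, computed per dataset and then summarized by the median and range. The dataset names and values are hypothetical placeholders, not data from the study.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def summarize(datasets):
    """Return (median, min, max) of per-dataset correlations.

    datasets: name -> (predicted fractions, measured fractions)
    """
    r_values = [pearson(pred, meas) for pred, meas in datasets.values()]
    return median(r_values), min(r_values), max(r_values)


# Hypothetical placeholder data: predicted (RNAseq) vs measured (cytometry).
datasets = {
    "dataset_A": ([0.10, 0.25, 0.40, 0.25], [0.12, 0.22, 0.41, 0.25]),
    "dataset_B": ([0.30, 0.30, 0.20, 0.20], [0.33, 0.27, 0.18, 0.22]),
    "dataset_C": ([0.05, 0.15, 0.60, 0.20], [0.07, 0.13, 0.58, 0.22]),
}

med_r, min_r, max_r = summarize(datasets)
```

A rank-based coefficient such as Spearman would be a reasonable alternative if the relationship between predicted and measured fractions is monotonic but not linear.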

In summary, this novel machine learning-based technology provides an accurate and robust tool for deconvolving the cellular composition of tumor biopsies (including stromal and immune elements) from bulk RNAseq. Future application of this computational tool could lead to a more comprehensive understanding of the role of the microenvironment in tumor pathogenesis and ultimately support clinical decision making in cancer treatment.

Citation Format: Alexander Zaytcev, Maxim Chelushkin, Katerina Nuzhdina, Alexander Bagaev, Daniyar Dyykanov, Vladimir Zyrin, Susan Raju Paul, Diane L. Davies, Patrick M. Reeves, Michael Lanuti, Mark C. Poznansky, Arthtur Baisangurov, Ravshan Ataullakhanov, Nathan Fowler. Novel machine learning based deconvolution algorithm results in accurate description of tumor microenvironment from bulk RNAseq [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 853.