Our goal is to define multiple informative transcriptome variables that capture sources of heterogeneity in chronic lymphocytic leukemia (CLL) cells, for flexible modeling in epidemiology and clinical studies. CD19+/CD5+ B-cells were sorted from whole blood of 227 CLL patients, and RNA was sequenced on the HiSeq4000 or NovaSeq platforms as part of the ORIEN Avatar Initiative. Transcript-based read counts were generated from FASTQ files using Salmon. High-quality genes were selected, and read counts were internally normalized and corrected for batch effects using ComBat. Preprocessing resulted in a final set of 8,716 quality-controlled, autosomal, protein-coding genes. PCA was performed and, using a scree test, we selected 14 dimensions that represented 55.9% of the total variance across the CLL patients' transcriptomes. Fourteen quantitative, orthogonal CLL dimension variables were calculated for all 227 patients. By design, these CLL dimensions capture transcriptome variance and provide novel multi-gene expression biomarkers. We assessed whether these CLL transcriptome dimensions captured known clinically relevant molecular differences. First, we investigated associations with IGHV mutational status (determined using MiXCR). CLL dimension variables 1, 5, 6, and 8 predicted IGHV mutational status (p=4.6×10⁻¹⁶). Next, we investigated associations with the ZAP70 and CD38 biomarkers, calculated from their expression in the RNA sequencing data using a separate pipeline and corrected for batch effects using ComBat (neither gene was among the 8,716 genes retained for PCA). CLL dimension variables 3, 5, 6, 7, and 8 significantly predicted ZAP70 expression (p=1.6×10⁻³⁸). CLL dimension variables 2, 3, 5, and 6 significantly predicted CD38 expression (p=3.1×10⁻³¹). Transcriptome dimension variables provide a flexible intrinsic framework to describe heterogeneity across CLL patients.
We have shown that our transcriptome dimensions capture IGHV mutational status and ZAP70 and CD38 expression, all of which are prognostic biomarkers. Future work will include exploring the ability of the 14 dimensions to capture other known important molecular markers for CLL, including somatic deletion of 17p, somatic mutational patterns, microsatellite instability, and previously described expression-based subgroups. Transcriptome dimensions are designed for utility as predictor variables, alongside other covariates, in parametric modeling, and have the potential to improve both epidemiology and clinical studies.
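
As an illustration only, the dimension variables described above can be sketched as PCA scores computed from a samples-by-genes expression matrix and then used as predictors in a linear model. This is a minimal sketch, not the authors' pipeline: the data are synthetic random numbers, the function name `pca_dimensions` is hypothetical, the gene count is reduced for speed, and ordinary least squares stands in for whichever parametric model a study would actually use.

```python
import numpy as np

def pca_dimensions(expr, n_dims):
    """Return PCA scores and per-component variance fractions.

    expr: samples x genes matrix, assumed already normalized and
    batch-corrected (hypothetical input, not the study data).
    """
    centered = expr - expr.mean(axis=0)           # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # variance fraction per component
    scores = u[:, :n_dims] * s[:n_dims]           # orthogonal dimension variables
    return scores, explained

# Synthetic placeholder: 227 samples, 500 genes (the abstract used 8,716 genes).
rng = np.random.default_rng(0)
expr = rng.normal(size=(227, 500))
scores, explained = pca_dimensions(expr, n_dims=14)
cumulative = explained[:14].sum()                 # share of variance in 14 dimensions

# Use the dimensions as predictors of a synthetic quantitative marker.
marker = 2.0 * scores[:, 0] + rng.normal(size=227)
design = np.column_stack([np.ones(len(scores)), scores])  # intercept + 14 dimensions
coef, *_ = np.linalg.lstsq(design, marker, rcond=None)
```

Because the scores are mutually orthogonal, the fitted coefficient for each dimension does not depend on which other dimensions are included, which is one reason such variables are convenient alongside other covariates.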

Citation Format: Julie E. Feusier, Rosalie G. Waller, Michael J. Madsen, Brian Avery, Nicola J. Camp. Transcriptional dimensions provide a framework for describing tumor heterogeneity in CLL [abstract]. In: Proceedings of the AACR Virtual Meeting: Advances in Malignant Lymphoma; 2020 Aug 17-19. Philadelphia (PA): AACR; Blood Cancer Discov 2020;1(3_Suppl):Abstract nr PO-01.