We address challenges in longitudinal RNA-seq analysis of patient tumors, which can reveal powerful insights into cancer evolution at a personalized level, and propose methods for unbiased analysis and integration.

Experimental procedures:

214 whole-genome bulk RNA-seq samples from 61 high-grade serous ovarian cancer (HGSC) patients were collected before and after chemotherapy, along with single-cell RNA-seq data from >93,000 cells in 11 patients. 308 TCGA treatment-naive bulk RNA-seq samples from HGSC, 474 from melanoma (SKCM), and 581 from endometrial carcinoma (UCEC) were used for validation.


We show that, in longitudinal analysis, cancer progression and chemotherapy induce both microenvironmental and phenotypic changes. We developed PRISM [1] to factor the expression data at the level of individual bulk samples, and show that this adjustment improves the association between expression profiles and patient survival. Our findings extend to the pathway and expression-derived tumor subtype levels in HGSC and SKCM.

Technical batch effects pose challenges for multi-institute or long-running sample collections. Unbiased correction requires replicates, which are typically unavailable for patient data. We developed POIBM [submitted] to simultaneously infer a suitable reference and factor out the batch effects. We show that POIBM effectively discovers true replicates, that batch effects plague many cancer types in TCGA data, and that batch correction allows more meaningful expression subtyping in UCEC.

Distinct genomic backgrounds of the patients hinder stratification and the discovery of shared functional states. We developed PRIMUS [submitted] to simultaneously factor the patient background (or other confounders) and a sample clustering, which is necessary when the underlying stratification is uneven or shared only by patient subsets. Alongside commonly aberrant cell states, such as EMT, PRIMUS analysis of HGSC scRNA-seq data identified a novel stress signature, enriched by chemotherapy and indicating poor survival, which was validated in PRISM-factored TCGA bulk ovarian data.
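
To illustrate why the patient background and the clustering may need to be estimated together, the following is a minimal toy sketch and not the PRIMUS algorithm itself. It assumes a one-dimensional additive model (value = patient offset + state centroid), uses made-up numbers, and relies on a brute-force search that is feasible only at this size; the names fit_sse and best_labels are hypothetical helpers introduced for illustration. One patient carries all three states and the other only two, so removing each patient's mean before clustering is biased by the uneven composition.

```python
from itertools import product

# Toy 1-D data (all values hypothetical): three cell states with means 0, 4, 10.
# Patient "A" (offset 0) contains all three states; patient "B" (offset 20)
# contains only the last two, so the stratification is uneven across patients.
x = [0.0, 4.0, 10.0, 24.0, 30.0]
patient = ["A", "A", "A", "B", "B"]
K = 3  # number of clusters searched for


def mean(values):
    return sum(values) / len(values)


def fit_sse(values, groups, labels, n_iter=200):
    """Least-squares fit of values[i] ~ offset[groups[i]] + centroid[labels[i]]
    for FIXED labels, using alternating mean updates; returns the residual SSE."""
    rows = list(zip(values, groups, labels))
    offset = {g: 0.0 for g in set(groups)}
    centroid = {k: 0.0 for k in set(labels)}
    for _ in range(n_iter):
        for k in centroid:
            centroid[k] = mean([v - offset[g] for v, g, l in rows if l == k])
        for g in offset:
            offset[g] = mean([v - centroid[l] for v, h, l in rows if h == g])
    return sum((v - offset[g] - centroid[l]) ** 2 for v, g, l in rows)


def best_labels(values, groups):
    """Exhaustive search over all labelings (toy sizes only)."""
    return min(product(range(K), repeat=len(values)),
               key=lambda labels: fit_sse(values, groups, labels))


# Sequential baseline: subtract each patient's mean, then cluster the result
# while ignoring patient identity (a single dummy group).
patient_mean = {p: mean([v for v, q in zip(x, patient) if q == p])
                for p in set(patient)}
centered = [v - patient_mean[p] for v, p in zip(x, patient)]
naive = best_labels(centered, ["all"] * len(x))

# Joint fit: patient offsets and cluster labels are estimated together.
joint = best_labels(x, patient)

print("naive pairs B's first cell with A's state 0:", naive[0] == naive[3])
print("joint pairs B's first cell with A's state 1:", joint[1] == joint[3])
```

In this toy setting the sequential baseline groups patient B's first cell with patient A's lowest state, whereas the joint fit pairs it with the matching state and recovers all three states. The cluster label values themselves are arbitrary; only which cells share a label is meaningful.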


Our methodology facilitates unbiased longitudinal expression analysis and integration. The more accurately resolved phenotypic changes at the gene, pathway, and expression subtype levels may confer sensitivity or resistance to chemotherapy and, as shown, allow enhanced patient survival prediction and the discovery of novel chemotherapy-related subtypes. Consequently, our work aids in ranking putative intervention strategies for overcoming ovarian cancer chemoresistance for future validation experiments.

[1]: Häkkinen et al., Bioinformatics 37: 2882 (2021)

Citation Format: Antti Häkkinen, Kaiyang Zhang, Susanna Holmström, Sanaz Jamaldazeh, Johanna Hynninen, Sakari Hietanen, Kaisa Huhtinen, Sampsa Hautaniemi. Factoring expression data of high-grade serous ovarian cancer tumors for unbiased longitudinal analysis [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 2707.