Significance: Gene expression signatures have been used successfully to guide radiotherapy dosing, to predict the benefit of adjuvant chemotherapy, and to assess prognosis based on tumor hypoxic state. Despite these successes, such tools often fail to validate across datasets and to translate into clinical use. We propose a novel method for deriving more robust gene expression signatures that predict chemotherapeutic response. Although the method can be applied to any drug, here we derive a signature that predicts cisplatin sensitivity in epithelial-based cancer cell lines.
Methodology: We ranked epithelial-based cell lines from the Genomics of Drug Sensitivity in Cancer (GDSC) dataset by their response to cisplatin (IC50). Differential expression (DE) analysis between resistant and sensitive cell lines was performed using the SAM, limma, and multtest methods as implemented in R. The genes called DE by all three methods served as seed genes in a co-expression network built from TCGA gene expression data, and the most highly co-expressed seed genes were extracted and termed “connectivity genes.” We performed this analysis 5 times, each time excluding 20% of the GDSC dataset. The final signature comprises the genes found in at least 3 of the 5 connectivity gene sets.
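The set operations in this procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the DE calls and the per-run connectivity gene sets are taken as given inputs (the SAM/limma/multtest runs and the TCGA co-expression network are not reproduced), the five excluded 20% subsets are assumed to be disjoint folds, and the helper names `intersect_de`, `leave_fifth_out_runs`, and `consensus_signature` are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, Iterator, List, Set


def intersect_de(*de_gene_lists: Iterable[str]) -> Set[str]:
    """Seed genes: those called differentially expressed by every method."""
    sets = [set(genes) for genes in de_gene_lists]
    return set.intersection(*sets) if sets else set()


def leave_fifth_out_runs(cell_lines: Iterable[str], n_runs: int = 5,
                         seed: int = 0) -> Iterator[List[str]]:
    """Yield the ~80% of cell lines kept in each run.

    Assumption (hypothetical): the excluded 20% subsets are disjoint folds,
    so every cell line is held out exactly once across the five runs.
    """
    lines = list(cell_lines)
    random.Random(seed).shuffle(lines)
    folds = [lines[i::n_runs] for i in range(n_runs)]
    for held_out in folds:
        excluded = set(held_out)
        yield [c for c in lines if c not in excluded]


def consensus_signature(connectivity_sets: Iterable[Iterable[str]],
                        min_count: int = 3) -> List[str]:
    """Genes present in at least `min_count` of the per-run connectivity sets."""
    counts = Counter(g for s in connectivity_sets for g in set(s))
    return sorted(g for g, c in counts.items() if c >= min_count)


if __name__ == "__main__":
    # Toy inputs with placeholder gene names, not GDSC or TCGA data.
    print(sorted(intersect_de(["G1", "G2", "G3"], ["G2", "G3"], ["G2", "G3", "G4"])))
    runs = [{"G1", "G2"}, {"G1", "G3"}, {"G1", "G2"}, {"G4"}, {"G2"}]
    print(consensus_signature(runs))  # ['G1', 'G2']
```

Counting each gene at most once per run (via `set(s)`) keeps the 3-of-5 threshold a count of runs rather than of duplicate entries.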
A cell line's median expression of the signature genes is termed its “signature score.” Using 5-fold cross-validation, we built models that predict IC50 from either a cell line's signature score or its expression of all signature genes. For each model evaluating our gene signature, we produced a null distribution of performance metrics by repeating the procedure 1000 times with random gene signatures of the same length as the signature in question.
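The scoring, cross-validation, and null-distribution steps can be sketched with the Python standard library. This is a hedged illustration, not the authors' code: expression is assumed to be a per-cell-line dict mapping gene to expression value, the single-feature model is plain ordinary least squares, folds are assigned by shuffling with a fixed seed, and `metric_fn` is a hypothetical callback that reruns the whole evaluation for a candidate gene list and returns a performance metric (for example, a Spearman coefficient).

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple


def signature_score(expr: Dict[str, float], genes: Sequence[str]) -> float:
    """Median expression of the signature genes in one cell line."""
    return statistics.median(expr[g] for g in genes)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit y = a + b*x; needs at least two distinct x values."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def cv_predictions(xs: Sequence[float], ys: Sequence[float],
                   k: int = 5, seed: int = 0) -> List[float]:
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = a + b * xs[i]
    return preds


def null_distribution(metric_fn: Callable[[List[str]], float],
                      all_genes: Sequence[str], size: int,
                      n_perm: int = 1000, seed: int = 0) -> List[float]:
    """Metric values for `n_perm` random signatures with the same number of genes."""
    rng = random.Random(seed)
    pool = list(all_genes)
    return [metric_fn(rng.sample(pool, size)) for _ in range(n_perm)]


def empirical_percentile(observed: float, null_values: Sequence[float]) -> float:
    """Fraction of null metrics strictly below the observed metric."""
    return sum(v < observed for v in null_values) / len(null_values)
```

Under these assumptions, an observed metric whose `empirical_percentile` exceeds 0.95 lies above the upper tail of its null distribution.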
Results: The final signature contains 13 genes: ADA, NPM3, CSTA, KRT5, KRT14, ATP1B3, LY6K, USP31, BNC1, MAP7D3, LRRC8C, C15orf41, and SLFN11.
Using linear regression to predict a cell line's IC50 from its signature score, the predicted vs. actual rank of IC50 has a Spearman correlation coefficient of 0.408 (p << 0.001). When the dataset is limited to cell lines in the top/bottom quintile of signature expression, the correlation coefficient increases to 0.624. Using L2-penalized linear regression to predict a cell line's IC50 from expression of the 13 signature genes, the predicted vs. actual rank of IC50 has a Spearman correlation coefficient of 0.568. When the dataset is limited to cell lines in the top/bottom quintile of signature expression, the correlation coefficient increases to 0.680. Every coefficient fell above the 95% confidence interval of its respective null distribution.
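The evaluation metric in this paragraph can be sketched with the standard library. Spearman's coefficient is the Pearson correlation of the two rank vectors, so correlating predicted and actual IC50 ranks is the same computation. This is an illustrative sketch, not the authors' code: the count-based quintile cut, the tie handling, and the names `average_ranks`, `spearman`, and `extreme_quintiles` are assumptions, and the L2-penalized (ridge) model is not reproduced here.

```python
import math
import statistics
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def extreme_quintiles(scores: Sequence[float]) -> List[int]:
    """Indices of samples in the bottom or top 20% of signature score.

    Assumption (hypothetical): quintiles are cut by count (n // 5 samples per
    tail), and ties at a cut point are broken by input order.
    """
    n = len(scores)
    k = n // 5
    order = sorted(range(n), key=lambda i: scores[i])
    return sorted(order[:k] + order[n - k:])
```

Applied to out-of-fold predictions, `spearman(predicted, actual)` gives the whole-dataset coefficient, and restricting both lists to the indices returned by `extreme_quintiles(scores)` gives the top/bottom-quintile coefficient. If SciPy is available, `scipy.stats.spearmanr` computes the same tie-averaged coefficient.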
Conclusions: We have demonstrated that our method can derive gene expression signatures predictive of chemotherapeutic response. Because the extracted genes are co-expressed in both cancer cell lines (GDSC) and patient tumor samples (TCGA), we hypothesize that these signatures will be more robust in new datasets and will translate more readily to clinical use.
Citation Format: Jessica Scarborough, Andrew Dhawan, Jacob Scott. Exploiting convergent evolution to derive a cisplatin sensitivity gene expression signature in epithelial based cancer [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 4415.