Abstract
Several methods exist for predicting survival time, including traditional statistical regression and modern neural networks, yet precise prediction remains a challenge. In recent years, algebraic topology has been introduced as a novel way to infer information from high-dimensional data sets: a mapper algorithm extracts geometric or topological features of the data set and associates them with the predicted variables. In this study, the data set contains 329 records with 21 diagnostic variables in total, including gender, age, AFP, GGT, PPT, TB, ALT, tumor size, tumor number, BCLC stage, number of liver nodules, and type of surgical approach. We use these variables to predict overall survival (OS) and disease-free survival (DFS). The topology mapper is run on all the data (training and test sets together); because this step is unsupervised, the target values are not used. From the resulting topology we infer three new features for each record: its location on the main branch of the topology, whether it is a branch node (i.e., has more than 2 branches), and whether it is an endpoint. These three features are added to the data set as additional inputs to a simple neural network with one hidden layer, implemented in TensorFlow. Prediction is evaluated as a binary classification: whether the survival time is greater or less than the mean value. Without the new features, test-set accuracy is 72% for OS and 75% for DFS. With the new features, test-set accuracy is 83% for OS and 84% for DFS. The three features inferred from the topology mapper therefore improve prediction accuracy substantially, by 11 percentage points for OS and 9 for DFS.
The topology-based data analysis method in this study could be applied to other classification problems involving clinical trial data, but it requires further verification.
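The abstract does not state how the mapper output is converted into the three features, so the following is only an illustrative sketch. It assumes the mapper output is a connected, undirected graph stored as an adjacency dictionary, and it assumes the "main branch" can be approximated by a longest shortest path found with two breadth-first searches (exact for tree-shaped graphs, a heuristic otherwise). The function names `bfs_distances`, `main_branch`, and `node_features` are hypothetical and are not taken from the study.

```python
from collections import deque


def bfs_distances(graph, start):
    """Return hop distances and BFS parents from `start` (unweighted graph)."""
    dist = {start: 0}
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                parent[nbr] = node
                queue.append(nbr)
    return dist, parent


def main_branch(graph):
    """Assumption: approximate the main branch by a longest shortest path.

    Uses the double-BFS trick: the farthest node from an arbitrary start
    is one end; the farthest node from that end is the other end.
    """
    start = next(iter(graph))
    dist, _ = bfs_distances(graph, start)
    end_a = max(dist, key=dist.get)
    dist, parent = bfs_distances(graph, end_a)
    end_b = max(dist, key=dist.get)
    path = []
    node = end_b
    while node is not None:  # walk BFS parents back to end_a
        path.append(node)
        node = parent[node]
    return path[::-1]  # ordered from end_a to end_b


def node_features(graph):
    """Compute the three per-node features described in the abstract."""
    path = main_branch(graph)
    # Index of each main-branch node along the path.
    position = {node: i for i, node in enumerate(path)}
    # Multi-source BFS: off-branch nodes inherit the index of the
    # nearest main-branch node.
    queue = deque(path)
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in position:
                position[nbr] = position[node]
                queue.append(nbr)
    scale = max(len(path) - 1, 1)  # normalise positions to [0, 1]
    return {
        node: {
            "position": position[node] / scale,
            "is_branch": int(len(graph[node]) > 2),    # more than 2 branches
            "is_endpoint": int(len(graph[node]) == 1),  # exactly one neighbour
        }
        for node in graph
    }


# Toy graph: a chain a-b-c-e-f with a side twig c-d.
toy_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c", "f"],
    "f": ["e"],
}
toy_features = node_features(toy_graph)
```

In this sketch each record would inherit the features of the mapper node that contains it, and the three values would be appended to the 21 diagnostic variables before training. The direction of the main branch (which end counts as position 0) is arbitrary here.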
Citation Format: Cheng Huang, Hui-Chuan Sun, Junda Chen, Lin Shi, Fengqing Li, Yihui Lin, Hanyan Yang. A topology based data analysis method to predict survival time [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 1634.