Abstract
A number of staging systems have been developed to predict clinical outcomes in hepatocellular carcinoma (HCC). However, no general consensus has been reached regarding the optimal model. New approaches such as machine learning (ML) strategies are powerful tools for incorporating risk factors from multiple platforms. We retrospectively reviewed the baseline information, including clinicopathologic characteristics, laboratory parameters, and peripheral immune features reflecting T-cell function, from three HCC cohorts. A gradient-boosting survival (GBS) classifier was trained with prognosis-related variables in the training dataset and validated in two independent cohorts. We constructed a 20-feature GBS model classifier incorporating one clinical feature, 14 laboratory parameters, and five T-cell function parameters obtained from peripheral blood mononuclear cells. The GBS model–derived risk scores demonstrated high concordance indexes (C-indexes): 0.844, 0.827, and 0.806 in the training set and validation sets 1 and 2, respectively. The GBS classifier could separate patients into high-, medium- and low-risk subgroups with respect to death in all datasets (P < 0.05 for all comparisons). A higher risk score was positively correlated with a higher clinical stage and the presence of portal vein tumor thrombus (PVTT). Subgroup analyses with respect to Child–Pugh class, Barcelona Clinic Liver Cancer stage, and PVTT status supported the prognostic relevance of the GBS-derived risk algorithm independent of the conventional tumor staging system. In summary, a multiparameter ML algorithm incorporating clinical characteristics, laboratory parameters, and peripheral immune signatures offers a different approach to identify patients with the greatest risk of HCC-related death.
Introduction
Liver cancer is a major cause of cancer-related mortality, accounting for 8.2% of all cancer-related deaths worldwide (1). Major risk factors for hepatocellular carcinoma (HCC) include infection with hepatitis B virus (HBV) or hepatitis C virus (HCV) and either alcoholic or nonalcoholic liver disease (2). The clinical outcome of patients with HCC depends greatly on tumor burden, treatment modality, and liver function. Therefore, numerous staging systems and predictive/prognostic models have been developed to evaluate the hepatic functional reserve (3).
To date, several staging systems, such as the Barcelona Clinic Liver Cancer (BCLC) staging classification (4), the Cancer of the Liver Italian Program score (5), the Japan Integrated Staging score (6, 7), and the Tokyo score (8), have been developed to predict clinical outcomes of patients with HCC undergoing different treatments. Liver function parameters, including the Child–Pugh classification system (9), albumin and bilirubin levels (10), the international normalized ratio (11), and the alkaline phosphatase (ALP)-to-platelet ratio (12), have also been shown to be independent prognostic factors for overall survival (OS) in patients with HCC. However, most of these risk stratification scoring algorithms were developed on the basis of clinical and laboratory data and employed conventional strategies, such as systemic literature reviews or multivariable Cox survival analyses (13). However, no consensus has been reached regarding which staging algorithm can be used to develop the best prognostic model. Hence, new approaches to discover survival models are still needed in liver cancer research.
T-cell exhaustion, defined as the impaired capacity to proliferate and secrete cytokines, accompanied by upregulation of immune checkpoint molecules [e.g., programmed cell death protein 1 (PD-1) and its ligand PD-L1], has also been correlated with HCC progression (14–16). The advent of immune checkpoint therapy has revolutionized the treatment of solid malignancies. However, only a subset of patients derives clinical benefit from this therapy (17, 18), indicating that additional inhibitory receptors may be involved in T-cell exhaustion and associated with the clinical outcomes of patients with cancer. Inhibitory molecules that have been shown to contribute to suppression of T-cell responses include lymphocyte-activation gene 3 (LAG-3), B- and T-lymphocyte attenuator, T-cell immunoglobulin domain and mucin domain 3 (TIM-3 or HAVCR2), and T-cell immunoreceptor with immunoglobulin and immunoreceptor tyrosine-based inhibitory motif domains (TIGIT; refs. 19, 20). We previously identified that upregulation of the PD-1+TIGIT+CD8+ T-cell subpopulation during HCC progression is accompanied by features indicating T-cell exhaustion and might be a predictor of poor outcomes (21). Taken together, these findings indicate the apparent prognostic significance of immune characteristics during HCC development and suggest that these characteristics are worthy of evaluation in the prognostic model.
To address the issues outlined above, we employed a machine learning (ML) strategy and developed a multiparametric prognostic model based on conventional clinicopathologic characteristics [tumor size, portal vein tumor thrombus (PVTT), Child–Pugh class, BCLC stage, etc.], laboratory parameters [γ-glutamyl transferase (γ-GGT), C-reactive protein (CRP), ALP, HBV-DNA and alpha fetoprotein (AFP) levels, etc.], and immune features reflecting T-cell function (peripheral CD4+PD-1+, CD8+PD-1+, and CD8+PD-1+TIGIT+TIM-3+ populations, etc.). We derived a ML-based risk scoring system in a cohort of 136 patients with HCC and validated the model in two independent datasets. Our data revealed that peripheral predictors, such as γ-GGT, CRP, ALP, and circulating T-cell populations, are critical determinants of the clinical outcome of liver cancer. This ML-based multiparameter risk scoring model could provide an approach to identify patients with HCC who have the greatest risk for OS.
Materials and Methods
Patients and sample collection
We retrospectively included patients with HCC who were hospitalized in Beijing Ditan Hospital (Beijing, P.R. China) between May 1, 2017, and June 1, 2018, and randomly divided them into a training set (n = 136) and validation set 1 (n = 56). For model verification, we recruited patients from March 1, 2016, to April 1, 2017, as independent validation set 2 (n = 105). The diagnostic criteria for HCC were the same as those used in our previous study (22). The study was approved by the Ethics Committee of Beijing Ditan Hospital, Capital Medical University (Beijing, P.R. China). All patients signed written informed consent. The study was conducted in accordance with the Helsinki Declaration of 1975, as revised in 2008. The inclusion criteria were as follows: (i) age ≥18 years and (ii) patients with HCC diagnosed by imaging or histology evaluation according to the Asian Pacific Association for the Study of the Liver (APSAL) guidelines (23). The exclusion criteria were as follows: (i) patients with cholangiocarcinoma, (ii) patients with metastatic HCC, (iii) patients with other tumors, (iv) patients taking immunosuppressive agents, and (v) patients who were lost to follow-up. The flow chart of the real-world study is shown in Supplementary Fig. S1. In total, we excluded (i) 21 patients with cholangiocarcinoma, (ii) 14 patients with metastatic HCC, (iii) 9 patients with other tumors, and (iv) 18 patients who were lost to follow-up.
All patients were examined by imaging or serum AFP measurement every 3 months, and patient progression was assessed by the mRECIST criteria (24). The progression-free survival time was defined as the time from enrollment to disease progression or death; the OS time was defined as the time from enrollment to death or January 1, 2019, whichever came first. We used electronic medical records and telephone follow-up visits to clarify the survival status and survival time of patients.
Flow cytometry staining and analysis
We collected peripheral blood from all patients before comprehensive treatment, and separated the peripheral blood mononuclear cells (PBMC) by Ficoll-Paque (Amersham Pharmacia Biotech) density gradient centrifugation. PBMCs were incubated with conjugated antibodies for 30 minutes at 4°C in the dark, and then washed and subjected to flow cytometry analysis. To exclude dead cells, we added 7-AAD viability dye (BD Biosciences, #51-68981E) staining solution to each sample prior to analysis. The immune checkpoint molecules were stained on the surface of T cells and the antibodies used for staining were as follows: PD-1-BV711 (clone EH12.1, BD Biosciences, #564017), TIGIT-PE-CY7 (clone MBSA43, eBioscience, #25-9500-42), TIM-3-BV650 (clone 7D3, BD Biosciences, #565564), LAG-3-APC (clone 3DS223H, eBioscience, #17-2239-42), and CTLA-4 (clone BNI3, BD Biosciences, #555853) and the corresponding isotype controls. The CD4 and CD8 T-cell subsets were divided into the naïve T (Tn) cell (CD45RA+CCR7+), central memory T (Tcm) cell (CD45RA−CCR7+), effector memory T (Tem) cell (CD45RA−CCR7−), and terminally differentiated effector memory T (Temra) cell (CD45RA+CCR7−) populations. We used an LSR Fortessa Flow Cytometer (BD Biosciences) for data collection and FlowJo software for data processing and analysis. The representative dot plots and gating strategy are shown in Supplementary Fig. S2.
Data preprocessing and feature selection
We employed a command line pipeline to automatically construct an ML survival model with the following modules: data preprocessing, feature selection, modeling, and prediction.
For data preprocessing, to shrink outliers and approximate a Gaussian distribution for numeric features, the values of all features were log2 transformed after a pseudocount of 1 was added. We removed features and samples for which more than 20% of the values were missing. Missing data were then filled in with the median values of each feature and further standardized by removing the mean and scaling to unit variance. The features for which more than 97% of the values were the same were removed. The low-variance features (variance less than 0.0005) were also removed. We further ranked the features using six unsupervised methods (ANOVA F-test, mutual information, Pearson correlation test, Spearman correlation test, Kendall tau rank correlation coefficient, and logistic regression) and used the features supported by at least one method for further analysis. For each method, we extracted only the top 40% of ranked features.
After data processing in the training set, a univariate Cox proportional hazards (CoxPH) model was first utilized to remove features not meeting the threshold criteria [concordance index (C-index) > 0.5 and log likelihood ratio test P ≤ 0.1]. Recursive feature elimination with cross-validation (RFECV) was then used to fit the data to select features with the highest accuracy scores. To obtain consensus features, we performed 20 iterations of the stratified shuffle-split cross-validation iterator with 10 splitting iterations and a test size range of 0.15 to 0.34 with a step size of 0.01. As a result, in the final model, 20 optimal features were selected by the gradient-boosting survival (GBS) estimator to construct the final survival predictor.
Model construction and performance evaluation
On the basis of the 20 selected features, the GBS model was used for modeling with 13-fold cross-validation (10-split stratified shuffles and 3-split stratified K-folds) in the training set. In each fold, hyperparameter optimization was tuned by an exhaustive cross-validation grid search with a test set of 0.3, and the model was refitted in the training set with the optimal score parameters. The performance of our survival predictor was evaluated by the censored C-index (ref. 25; c-statistic) and mean cumulative/dynamic AUC (26). Predictions for independent validation sets 1 and 2 were performed with the training test set procedure. The analysis process was implemented in Python 3.6, and the analysis code is available at https://github.com/WellJoea/MLSurvival.
Statistical analysis
The data were analyzed using R statistical software version 3.6. We used the χ2 test for categorical data and the Wilcoxon rank-sum test for continuous variables. Kaplan–Meier survival analysis was performed to estimate OS. For the two-subgroup classification, a risk score of 0 was used as the cutoff value. For the three-subgroup classification, the cutoff values for high-, medium-, and low-risk scores were determined by using the X-tile plots for the training set (27). Time-dependent ROC curve analysis was performed to analyze the predictive and prognostic accuracy of each feature examined.
Results
Patient characteristics
We compared the clinical and immunologic characteristics of the training set and the two validation sets. In more than 80% of the patients with HCC, liver function was classified as Child–Pugh A or B, and nearly 70% of these patients had been treated with the minimally invasive therapeutic modalities of transarterial chemoembolization or radiofrequency ablation. No differences were found among the three groups in terms of treatment type or liver function. In training set and validation set 1, 68.4% and 78.6% of patients with HCC, respectively, were in the early stage of HCC (BCLC stage A or B). However, in validation set 2, this percentage was significantly decreased (only 51.4%; P = 0.001; Table 1). Similarly, the percentage of patients with HCC with PVTT in validation set 2 was significantly higher than that in either the training set or validation set 1 (P = 0.006). Regarding etiology, 80.1% and 78.6% of the patients in training set and validation set 1, respectively, had HBV infection compared with 93.3% of the patients in validation set 2 (P = 0.014). No differences in liver function, renal function, coagulation function, or T-cell count were found among the three cohorts (Supplementary Table S1). The percentages of CD4+TIGIT+, CD4+TIM-3+, CD4+TIGIT+TIM-3+, CD4+PD-1+TIGIT+TIM-3+, CD4+ Tcm cells, CD8+PD-1+, and CD8+ Tcm cells were higher in validation set 2 than in either the training set or validation set 1. CD4+ Tn, CD8+PD-1+TIGIT+TIM-3+, and CD8+ Temra cells were lower in validation set 2 (Supplementary Table S2). Overall, these observations may indicate a discrepancy in the baseline characteristics of the three cohorts because the patients were recruited during different time periods.
Clinical characteristics of patients with HCC.
. | Discovery cohort, n = 136 (%) . | Validation cohort 1, n = 56 (%) . | Validation cohort 2, n = 105 (%) . | P . |
---|---|---|---|---|
Age | 0.258 | |||
≥60 | 55 (40.4) | 29 (51.8) | 41 (39.0) | |
<60 | 81 (59.6) | 27 (48.2) | 64 (61.0) | |
Gender (male) | 0.485 | |||
Male | 110 (80.9) | 47 (83.9) | 91 (86.7) | |
Female | 26 (19.1) | 9 (16.1) | 14 (13.3) | |
Alcohol | 0.260 | |||
Alcohol | 65 (47.8) | 31 (55.4) | 61 (58.1) | |
No alcohol | 71 (52.2) | 25 (44.6) | 44 (41.9) | |
Etiology | 0.014 | |||
HBV | 109 (80.1) | 44 (78.6) | 98 (93.3) | |
HCV | 13 (9.6) | 7 (12.5) | 1 (1.0) | |
Child–Pugh stage | 0.483 | |||
A | 82 (60.3) | 32 (57.1) | 58 (55.2) | |
B | 36 (26.5) | 19 (33.9) | 28 (26.7) | |
C | 18 (13.2) | 5 (8.9) | 19 (18.1) | |
Tumor size | 0.990 | |||
≤5 cm | 86 (36.8) | 36 (35.7) | 67 (36.2) | |
>5 cm | 50 (63.2) | 20 (64.3) | 38 (63.8) | |
Tumor multiplicity | 0.190 | |||
Solitary | 63 (46.3) | 18 (32.1) | 43 (41.0) | |
Multiple | 73 (53.7) | 38 (67.9) | 62 (59.0) | |
PVTT at baseline | 0.006 | |||
Yes | 38 (27.9) | 16 (28.6) | 49 (46.7) | |
No | 98 (72.1) | 40 (71.4) | 56 (53.3) | |
BCLC staging | 0.001 | |||
A–B | 93 (68.4) | 44 (78.6) | 54 (51.4) | |
C–D | 43 (31.6) | 12 (21.4) | 51 (48.6) | |
Type of treatment | 0.756 | |||
Resection | 7 (5.1) | 3 (5.4) | 7 (6.7) | |
Minimally invasive | 98 (72.1) | 38 (67.9) | 79 (75.2) | |
Palliative | 31 (22.8) | 15 (26.8) | 19 (18.1) |
. | Discovery cohort, n = 136 (%) . | Validation cohort 1, n = 56 (%) . | Validation cohort 2, n = 105 (%) . | P . |
---|---|---|---|---|
Age | 0.258 | |||
≥60 | 55 (40.4) | 29 (51.8) | 41 (39.0) | |
<60 | 81 (59.6) | 27 (48.2) | 64 (61.0) | |
Gender (male) | 0.485 | |||
Male | 110 (80.9) | 47 (83.9) | 91 (86.7) | |
Female | 26 (19.1) | 9 (16.1) | 14 (13.3) | |
Alcohol | 0.260 | |||
Alcohol | 65 (47.8) | 31 (55.4) | 61 (58.1) | |
No alcohol | 71 (52.2) | 25 (44.6) | 44 (41.9) | |
Etiology | 0.014 | |||
HBV | 109 (80.1) | 44 (78.6) | 98 (93.3) | |
HCV | 13 (9.6) | 7 (12.5) | 1 (1.0) | |
Child–Pugh stage | 0.483 | |||
A | 82 (60.3) | 32 (57.1) | 58 (55.2) | |
B | 36 (26.5) | 19 (33.9) | 28 (26.7) | |
C | 18 (13.2) | 5 (8.9) | 19 (18.1) | |
Tumor size | 0.990 | |||
≤5 cm | 86 (36.8) | 36 (35.7) | 67 (36.2) | |
>5 cm | 50 (63.2) | 20 (64.3) | 38 (63.8) | |
Tumor multiplicity | 0.190 | |||
Solitary | 63 (46.3) | 18 (32.1) | 43 (41.0) | |
Multiple | 73 (53.7) | 38 (67.9) | 62 (59.0) | |
PVTT at baseline | 0.006 | |||
Yes | 38 (27.9) | 16 (28.6) | 49 (46.7) | |
No | 98 (72.1) | 40 (71.4) | 56 (53.3) | |
BCLC staging | 0.001 | |||
A–B | 93 (68.4) | 44 (78.6) | 54 (51.4) | |
C–D | 43 (31.6) | 12 (21.4) | 51 (48.6) | |
Type of treatment | 0.756 | |||
Resection | 7 (5.1) | 3 (5.4) | 7 (6.7) | |
Minimally invasive | 98 (72.1) | 38 (67.9) | 79 (75.2) | |
Palliative | 31 (22.8) | 15 (26.8) | 19 (18.1) |
Feature selection and GBS model construction
Aiming to construct a robust multidimensional survival predictor for OS in patients with HCC, we comprehensively analyzed 91 features, including conventional clinical characteristics, laboratory parameters, and immune parameters reflecting T-cell function (Table 1; Supplementary Tables S1 and S2). We stacked all three types of features from the training set into an ensemble ML framework (Fig. 1). After data preprocessing, we performed univariate CoxPH analysis on all features and identified 48 features as OS-relevant candidate variables (Supplementary Fig. S3; log-rank P < 0.1).
Workflow of ML survival model construction and validation. We retrospectively included three cohorts of patients with HCC as the training set (n = 136), validation set 1 (n = 56), and an independent validation set 2 (n = 105). In the training set, we comprehensively analyzed 91 features by using an ensemble ML framework. After data preprocessing, feature selection, and model construction, we built a risk scoring algorithm based on 20 final features. Two cohorts were applied for verification. The predictive and prognostic value of the risk scoring algorithm was analyzed. Alb, albumin; TBIL, total bilirubin; TS, tumor size.
Workflow of ML survival model construction and validation. We retrospectively included three cohorts of patients with HCC as the training set (n = 136), validation set 1 (n = 56), and an independent validation set 2 (n = 105). In the training set, we comprehensively analyzed 91 features by using an ensemble ML framework. After data preprocessing, feature selection, and model construction, we built a risk scoring algorithm based on 20 final features. Two cohorts were applied for verification. The predictive and prognostic value of the risk scoring algorithm was analyzed. Alb, albumin; TBIL, total bilirubin; TS, tumor size.
These features were further selected on the basis of RFECV ranking and were used to train the ML-GBS model using 13-fold cross-validation. As a result, a GBS model incorporating 20 features was constructed, and a risk score was subsequently derived (Figs. 1 and 2). The final 20 selected features included only one clinical feature (tumor diameter), 14 laboratory parameters [e.g., CRP, γ-GGT, ALP, neutrophil-to-lymphocyte ratio (NLR)], and five T-cell function parameters (e.g., CD4+PD-1+TIGIT+TIM-3+, CD4+TIGIT+TIM-3+, and CD8+ Tem cells) obtained by FACS analysis of PBMCs, indicating that tumor and immune features in the periphery could facilitate the prediction of patient prognoses (Fig. 2).
Key features in the ML-GBS survival model in the training set. A, Weights of 20 important features (13-fold cross-validation) of the GBS model in the training set. B, Univariate Cox analysis of the 20 selected features. A forest plot showing the HR (orange dot) and 95% CI (gray line). C, Selected features in the ML-GBS survival model. %EO, percentage of eosinophils; %Lymph, percentage of lymphocytes; BUN, urea nitrogen; CHE, cholinesterase; Coef, coefficient; Cr, creatinine; GRAN, granulocyte count; HGB, hemoglobin; N/L, neutrophil-to-lymphocyte ratio; PTA, prothrombin activity; RBC, red blood cell; TBIL, total bilirubin.
Key features in the ML-GBS survival model in the training set. A, Weights of 20 important features (13-fold cross-validation) of the GBS model in the training set. B, Univariate Cox analysis of the 20 selected features. A forest plot showing the HR (orange dot) and 95% CI (gray line). C, Selected features in the ML-GBS survival model. %EO, percentage of eosinophils; %Lymph, percentage of lymphocytes; BUN, urea nitrogen; CHE, cholinesterase; Coef, coefficient; Cr, creatinine; GRAN, granulocyte count; HGB, hemoglobin; N/L, neutrophil-to-lymphocyte ratio; PTA, prothrombin activity; RBC, red blood cell; TBIL, total bilirubin.
Performance of the GBS model–derived classifier
We next examined the association between the GBS model–derived risk score and OS time (months) in all three cohorts. A significant linear correlation was observed between the two indicators (Pearson correlation, P < 0.0001; Fig. 3A; Supplementary Fig. S4). Although the median follow-up times of the surviving patients varied among the three cohorts, ranging from 12.4 to 19.3 months, the predictive risk scoring system demonstrated high C-index values of 0.844, 0.827, and 0.806 in the training set and validation sets 1 and 2, respectively (Fig. 3A). The Kaplan–Meier curves showed that the ML classifier could effectively separate patients into subgroups with a higher or lower risk of death. With either a two-group (Fig. 3B–D) or a three-group (Fig. 3E–G) classification, the low-risk and/or medium-risk subgroups exhibited substantially improved OS compared with the reference high-risk subgroup in all three datasets (P < 0.05 for all comparisons). For instance, for the two-subgroup classification in validation set 1, the death rates for patients with low-risk scores were significantly lower than those for patients with high-risk scores [HR, 0.12, 95% confidence interval (CI), 0.04–0.35; P < 0.0001; Fig. 3C]. Similar results were identified in validation set 2.
Associations between the risk score and OS in all datasets. A, C-index and distribution of the ML-GBS–derived risk score and patient OS in all three datasets. B–D, Comparison of Kaplan–Meier OS curves of the high-risk score (risk score ≥ 0) and low-risk score (risk score < 0) subgroups in the three datasets. E–G, Comparison of Kaplan–Meier OS curves between the high-, median-, and low-risk score subgroups in the three datasets. For classification of the three subgroups, the cutoff points for the high-, median-, and low-risk score subgroups were determined by using the X-tile plots from the training set. The log-rank test was used to calculate the P value. Number of patients indicated.
Associations between the risk score and OS in all datasets. A, C-index and distribution of the ML-GBS–derived risk score and patient OS in all three datasets. B–D, Comparison of Kaplan–Meier OS curves of the high-risk score (risk score ≥ 0) and low-risk score (risk score < 0) subgroups in the three datasets. E–G, Comparison of Kaplan–Meier OS curves between the high-, median-, and low-risk score subgroups in the three datasets. For classification of the three subgroups, the cutoff points for the high-, median-, and low-risk score subgroups were determined by using the X-tile plots from the training set. The log-rank test was used to calculate the P value. Number of patients indicated.
Time-dependent ROC analyses were performed to determine the prognostic and predictive accuracy of the GBS risk score classifier (Supplementary Fig. S5), revealing a high AUC for the risk scores in both validation sets (Supplementary Fig. S5A; validation set 1: 0.886, 0.858, and 0.917 and validation set 2: 0.924, 0.868, and 0.884 at 6, 12, and 18 months, respectively). The stratification power of the GBS-derived risk score was superior to that of other traditional modalities, as revealed by univariate Cox model analysis of multiple prognostic markers (Supplementary Fig. S5B). Multivariate Cox regression analysis was performed to validate the independent prognostic value of the GBS-derived risk scores. The risk score was an independent factor for OS when adjusted for other clinical characteristics (nonselected features in the GBS model), including sex, age, Child–Pugh class, PVTT status, BCLC stage, etc., in all three independent datasets and in the combined cohort (P < 0.05 for all comparisons; Supplementary Figs. S6–S9).
Correlations between GBS-derived risk score and essential clinical characteristics
On the basis of the GBS ML analysis of the training set, the risk score model integrated 20 features as critical factors, among which only one clinical characteristic, tumor size (diameter), was selected (Fig. 2). We, therefore, sought to assess the correlations between the GBS-derived risk scores and other prognostic clinical characteristics demonstrated previously to be essential. We first observed a distinct distribution pattern of risk scores in the different subgroups defined by these clinical characteristics. Specifically, patients in the Child–Pugh C or BCLC C–D subgroups or patients who had PVTT had higher risk scores than those in the Child–Pugh A or BCLC A–B subgroups or those who did not have PVTT, respectively, in the combined cohort (Fig. 4A; Table 2). A similar trend was also identified in all three independent cohorts (Table 2; Supplementary Fig. S10), indicating that high-risk scores could also represent advancement along the clinical staging spectrum, even though these factors were not included in the final GBS prognostic model.
Comparison of survival distribution by the risk score in different subpopulations. A, Density distribution plot of the risk score in subpopulations staged by different clinical staging systems (vertical dashed lines: median value of the subpopulation). B, Comparison of Kaplan–Meier OS curves between the high-risk (risk score ≥ 0) and low-risk (risk score < 0) score subgroups in subpopulations with Child–Pugh A, B, or C stage. C, Comparison of Kaplan–Meier OS curves between the high- and low-risk score subgroups of patients with or without PVTT. D, Comparison of Kaplan–Meier OS curves between the high- and low-risk score subgroups of patients within BCLC stage A–B or C–D. The log-rank test was used to calculate the P value. Number of patients indicated.
Comparison of survival distribution by the risk score in different subpopulations. A, Density distribution plot of the risk score in subpopulations staged by different clinical staging systems (vertical dashed lines: median value of the subpopulation). B, Comparison of Kaplan–Meier OS curves between the high-risk (risk score ≥ 0) and low-risk (risk score < 0) score subgroups in subpopulations with Child–Pugh A, B, or C stage. C, Comparison of Kaplan–Meier OS curves between the high- and low-risk score subgroups of patients with or without PVTT. D, Comparison of Kaplan–Meier OS curves between the high- and low-risk score subgroups of patients within BCLC stage A–B or C–D. The log-rank test was used to calculate the P value. Number of patients indicated.
Correlation between GBS-derived risk score and clinic characteristics.
. | Training set . | Validation set 1 . | Validation set 2 . | Combined cohort . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
. | High . | Lowa . | . | High . | Low . | . | High . | Low . | . | High . | Low . | . | |
Characteristics . | n = 54 . | n = 82 . | P . | n = 24 . | n = 32 . | P . | n = 51 . | n = 54 . | P . | n = 129 . | n = 168 . | P . | |
Age | Age < 60 | 36 (66.7) | 45 (54.9)b | 0.233 | 10 (41.7) | 17 (53.1) | 0.563 | 33 (64.7) | 31 (57.4) | 0.571 | 79 (61.2) | 93 (55.4) | 0.368 |
Age ≥ 60 | 18 (33.3) | 37 (45.1) | 14 (58.3) | 15 (46.9) | 18 (35.3) | 23 (42.6) | 50 (38.8) | 75 (44.6) | |||||
Gender | Female | 10 (18.5) | 16 (19.5) | 1 | 5 (20.8) | 4 (12.5) | 0.636 | 3 (5.9) | 11 (20.4) | 0.058 | 18 (14.0) | 31 (18.5) | 0.38 |
Male | 44 (81.5) | 66 (80.5) | 19 (79.2) | 28 (87.5) | 48 (94.1) | 43 (79.6) | 111 (86.0) | 137 (81.5) | |||||
Alcohol | No | 33 (61.1) | 38 (46.3) | 0.131 | 12 (50.0) | 13 (40.6) | 0.67 | 19 (37.3) | 25 (46.3) | 0.459 | 64 (49.6) | 76 (45.2) | 0.528 |
Yes | 21 (38.9) | 44 (53.7) | 12 (50.0) | 19 (59.4) | 32 (62.7) | 29 (53.7) | 65 (50.4) | 92 (54.8) | |||||
Etiology | Alcoholic | 4 (7.4) | 2 (2.4) | 0.52 | 0 (0.0) | 1 (3.1) | 0.094 | 4 (7.8) | 1 (1.9) | 0.275 | 8 (6.2) | 4 (2.4) | 0.324 |
HBV | 41 (75.9) | 68 (82.9) | 17 (70.8) | 27 (84.4) | 47 (92.2) | 51 (94.4) | 105 (81.4) | 146 (86.9) | |||||
HCV | 6 (11.1) | 7 (8.5) | 3 (12.5) | 4 (12.5) | 0 (0.0) | 1 (1.9) | 9 (7.0) | 12 (7.1) | |||||
Others | 3 (5.6) | 5 (6.1) | 4 (16.7) | 0 (0.0) | 0 (0.0) | 1 (1.9) | 7 (5.4) | 6 (3.6) | |||||
Child–Pugh | A | 13 (24.1) | 69 (84.1) | <0.001 | 6 (25.0) | 26 (81.2) | <0.001 | 12 (23.5) | 46 (85.2) | <0.001 | 31 (24.0) | 141 (83.9) | <0.001 |
B | 25 (46.3) | 11 (13.4) | 13 (54.2) | 6 (18.8) | 22 (43.1) | 6 (11.1) | 60 (46.5) | 23 (13.7) | |||||
C | 16 (29.6) | 2 (2.4) | 5 (20.8) | 0 (0.0) | 17 (33.3) | 2 (3.7) | 38 (29.5) | 4 (2.4) | |||||
Tumor sizec | ≤5 cm | 14 (25.9) | 72 (87.8) | <0.001 | 7 (29.2) | 29 (90.6) | <0.001 | 23 (45.1) | 44 (81.5) | <0.001 | 44 (34.1) | 145 (86.3) | <0.001 |
>5 cm | 40 (74.1) | 10 (12.2) | 17 (70.8) | 3 (9.4) | 28 (54.9) | 10 (18.5) | 85 (65.9) | 23 (13.7) | |||||
Multiplicity | ≤2 | 19 (35.2) | 44 (53.7) | 0.053 | 8 (33.3) | 10 (31.2) | 1 | 17 (33.3) | 26 (48.1) | 0.179 | 44 (34.1) | 80 (47.6) | 0.026 |
>2 | 35 (64.8) | 38 (46.3) | 16 (66.7) | 22 (68.8) | 34 (66.7) | 28 (51.9) | 85 (65.9) | 88 (52.4) | |||||
PVTT | No | 26 (48.1) | 72 (87.8) | <0.001 | 13 (54.2) | 27 (84.4) | 0.029 | 14 (27.5) | 42 (77.8) | <0.001 | 53 (41.1) | 141 (83.9) | <0.001 |
Yes | 28 (51.9) | 10 (12.2) | 11 (45.8) | 5 (15.6) | 37 (72.5) | 12 (22.2) | 76 (58.9) | 27 (16.1) | |||||
BCLC | A–B | 20 (37.0) | 73 (89.0) | <0.001 | 16 (66.7) | 28 (87.5) | 0.121 | 13 (25.5) | 41 (75.9) | <0.001 | 49 (38.0) | 142 (84.5) | <0.001 |
C–D | 34 (63.0) | 9 (11.0) | 8 (33.3) | 4 (12.5) | 38 (74.5) | 13 (24.1) | 80 (62.0) | 26 (15.5) | |||||
Treatment | Minimally invasive | 29 (53.7) | 69 (84.1) | <0.001 | 13 (54.2) | 25 (78.1) | 0.093 | 30 (58.8) | 49 (90.7) | <0.001 | 72 (55.8) | 143 (85.1) | <0.001 |
Palliative | 23 (42.6) | 8 (9.8) | 10 (41.7) | 5 (15.6) | 18 (35.3) | 1 (1.9) | 51 (39.5) | 14 (8.3) | |||||
Resection | 2 (3.7) | 5 (6.1) | 1 (4.2) | 2 (6.2) | 3 (5.9) | 4 (7.4) | 6 (4.7) | 11 (6.5) |
. | Training set . | Validation set 1 . | Validation set 2 . | Combined cohort . | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
. | High . | Lowa . | . | High . | Low . | . | High . | Low . | . | High . | Low . | . | |
Characteristics . | n = 54 . | n = 82 . | P . | n = 24 . | n = 32 . | P . | n = 51 . | n = 54 . | P . | n = 129 . | n = 168 . | P . | |
Age | Age < 60 | 36 (66.7) | 45 (54.9)b | 0.233 | 10 (41.7) | 17 (53.1) | 0.563 | 33 (64.7) | 31 (57.4) | 0.571 | 79 (61.2) | 93 (55.4) | 0.368 |
Age ≥ 60 | 18 (33.3) | 37 (45.1) | 14 (58.3) | 15 (46.9) | 18 (35.3) | 23 (42.6) | 50 (38.8) | 75 (44.6) | |||||
Gender | Female | 10 (18.5) | 16 (19.5) | 1 | 5 (20.8) | 4 (12.5) | 0.636 | 3 (5.9) | 11 (20.4) | 0.058 | 18 (14.0) | 31 (18.5) | 0.38 |
Male | 44 (81.5) | 66 (80.5) | 19 (79.2) | 28 (87.5) | 48 (94.1) | 43 (79.6) | 111 (86.0) | 137 (81.5) | |||||
Alcohol | No | 33 (61.1) | 38 (46.3) | 0.131 | 12 (50.0) | 13 (40.6) | 0.67 | 19 (37.3) | 25 (46.3) | 0.459 | 64 (49.6) | 76 (45.2) | 0.528 |
Yes | 21 (38.9) | 44 (53.7) | 12 (50.0) | 19 (59.4) | 32 (62.7) | 29 (53.7) | 65 (50.4) | 92 (54.8) | |||||
Etiology | Alcoholic | 4 (7.4) | 2 (2.4) | 0.52 | 0 (0.0) | 1 (3.1) | 0.094 | 4 (7.8) | 1 (1.9) | 0.275 | 8 (6.2) | 4 (2.4) | 0.324 |
HBV | 41 (75.9) | 68 (82.9) | 17 (70.8) | 27 (84.4) | 47 (92.2) | 51 (94.4) | 105 (81.4) | 146 (86.9) | |||||
HCV | 6 (11.1) | 7 (8.5) | 3 (12.5) | 4 (12.5) | 0 (0.0) | 1 (1.9) | 9 (7.0) | 12 (7.1) | |||||
Others | 3 (5.6) | 5 (6.1) | 4 (16.7) | 0 (0.0) | 0 (0.0) | 1 (1.9) | 7 (5.4) | 6 (3.6) | |||||
Child–Pugh | A | 13 (24.1) | 69 (84.1) | <0.001 | 6 (25.0) | 26 (81.2) | <0.001 | 12 (23.5) | 46 (85.2) | <0.001 | 31 (24.0) | 141 (83.9) | <0.001 |
B | 25 (46.3) | 11 (13.4) | 13 (54.2) | 6 (18.8) | 22 (43.1) | 6 (11.1) | 60 (46.5) | 23 (13.7) | |||||
C | 16 (29.6) | 2 (2.4) | 5 (20.8) | 0 (0.0) | 17 (33.3) | 2 (3.7) | 38 (29.5) | 4 (2.4) | |||||
Tumor sizec | ≤5 cm | 14 (25.9) | 72 (87.8) | <0.001 | 7 (29.2) | 29 (90.6) | <0.001 | 23 (45.1) | 44 (81.5) | <0.001 | 44 (34.1) | 145 (86.3) | <0.001 |
>5 cm | 40 (74.1) | 10 (12.2) | 17 (70.8) | 3 (9.4) | 28 (54.9) | 10 (18.5) | 85 (65.9) | 23 (13.7) | |||||
Multiplicity | ≤2 | 19 (35.2) | 44 (53.7) | 0.053 | 8 (33.3) | 10 (31.2) | 1 | 17 (33.3) | 26 (48.1) | 0.179 | 44 (34.1) | 80 (47.6) | 0.026 |
>2 | 35 (64.8) | 38 (46.3) | 16 (66.7) | 22 (68.8) | 34 (66.7) | 28 (51.9) | 85 (65.9) | 88 (52.4) | |||||
PVTT | No | 26 (48.1) | 72 (87.8) | <0.001 | 13 (54.2) | 27 (84.4) | 0.029 | 14 (27.5) | 42 (77.8) | <0.001 | 53 (41.1) | 141 (83.9) | <0.001 |
Yes | 28 (51.9) | 10 (12.2) | 11 (45.8) | 5 (15.6) | 37 (72.5) | 12 (22.2) | 76 (58.9) | 27 (16.1) | |||||
BCLC | A–B | 20 (37.0) | 73 (89.0) | <0.001 | 16 (66.7) | 28 (87.5) | 0.121 | 13 (25.5) | 41 (75.9) | <0.001 | 49 (38.0) | 142 (84.5) | <0.001 |
C–D | 34 (63.0) | 9 (11.0) | 8 (33.3) | 4 (12.5) | 38 (74.5) | 13 (24.1) | 80 (62.0) | 26 (15.5) | |||||
Treatment | Minimally invasive | 29 (53.7) | 69 (84.1) | <0.001 | 13 (54.2) | 25 (78.1) | 0.093 | 30 (58.8) | 49 (90.7) | <0.001 | 72 (55.8) | 143 (85.1) | <0.001 |
Palliative | 23 (42.6) | 8 (9.8) | 10 (41.7) | 5 (15.6) | 18 (35.3) | 1 (1.9) | 51 (39.5) | 14 (8.3) | |||||
Resection | 2 (3.7) | 5 (6.1) | 1 (4.2) | 2 (6.2) | 3 (5.9) | 4 (7.4) | 6 (4.7) | 11 (6.5) |
Note: Bold text indicates a statistically significant difference with a P value less than 0.05.
aRisk scores of high and low subgroups were classified according to GBS-derived risk score > 0 or < 0.
bValues are presented as numbers of patients followed by percentages in parentheses.
cTumor size is selected as an important feature in the GBS survival model.
Next, we also showed that the risk scoring system could function properly in different classification subgroups of patients with HCC. For instance, within the Child–Pugh A, B, or C subgroups in the combined cohort, the low-risk subgroups exhibited substantially improved OS compared with the reference high-risk subgroups (Fig. 4B; P < 0.05 for all comparisons). Similarly, the GBS-derived risk score could stratify patients with better or worse prognoses, both among patients with HCC with or without PVTT (Fig. 4C; Supplementary Fig. S11) or among patients with different BCLC stages (Fig. 4D; Supplementary Fig. S12). Collectively, these observations support the robustness of our prognostic model, as they indicate that the model can adapt to different distributions of events and clinical staging systems.
A higher GBS-derived risk score correlates with a higher incidence of recurrence
Given that the GBS prognostic model showed excellent performance for OS in all three cohorts (Fig. 3), we then sought to determine whether the established risk scores were correlated with the incidence of HCC recurrence. We performed Kaplan–Meier survival analysis of the recurrence-free survival (RFS) time in all three datasets. This analysis confirmed that the RFS times were longer in the low-risk score subgroup than in the high-risk score subgroup in all cohorts (Supplementary Fig. S13; P < 0.05 for all comparisons). Similar results were obtained when patients were subclassified into the low-, median-, and high-risk score subgroups (Supplementary Fig. S14; P < 0.05 for all comparisons). The 1-year recurrence rates for patients with low- versus high-risk scores were 63.2% (50/79) versus 85.7% (42/49), 52% (13/25) versus 95.2% (20/21), 61.2% (30/49) versus 84% (42/50), and 60.8% (93/153) versus 86.7% (104/120) in the training set, validation set 1, set 2, and combined cohorts, respectively (Supplementary Table S3), indicating a potential role of the risk score in predicting HCC recurrence. Our data also confirmed the significant association between 1-year progression and other clinical characteristics, including the Child–Pugh class or BCLC stage, tumor size, PVTT status, and treatment strategies (Supplementary Table S3).
We also observed that the high-risk score subgroup was characterized by the high expression of inhibitory molecules on T-cell populations in the periphery, including CD4+PD-1+TIGIT+TIM-3+, CD4+TIGIT+TIM-3+, and CD8+PD-1+TIGIT+TIM-3+ T cells (Supplementary Fig. S15). These observations may suggest a functional role for PD-1+TIGIT+ TIM-3+ T cells in the pathogenesis and progression of liver cancer.
Discussion
Liver cancer staging and risk assessment are key steps in treatment planning. To date, the optimal tool with the best prognostic value is still controversial. Here, we examined three well-documented cohorts with adequate follow-up, encompassing a training set and two validation sets. On the basis of three clusters of possible predictors for HCC-related death, namely, conventional clinicopathologic characteristics, laboratory parameters assessing liver function, and immune features reflecting T-cell function, we constructed a multiparameter GBS prognostic model. Our data confirmed the robustness of the GBS-derived scoring algorithm and its prognostic value across various cohorts of patients with HCC, although the baseline clinical characteristics of these three cohorts varied greatly.
In this investigation, the final GBS-derived risk prediction model incorporated 20 features. Tumor size (diameter) was consistently proven to be an essential prognostic marker (28). At the same time, 14 laboratory parameters were selected, among which CRP, ALP, and γ-GGT were the highest ranked features. In a previous study, elevated CRP in patients with HCC at the time of diagnosis was associated with a dismal prognosis and remained an independent prognostic factor for OS in multivariate analysis (29). However, the underlying mechanistic role of CRP in liver cancer remains largely unknown. Similarly, the independent prognostic value of serum ALP (28) and γ-GGT (30) has been demonstrated in previous reports. We also showed that the NLR was incorporated into the risk scoring algorithm. An increased NLR has been correlated with the accumulation of tumor-associated macrophages and a worse prognosis (31). Collectively, laboratory tests provided multiple candidate predictors of liver function, and these parameters may be ideal adjustment factors for constructing risk stratification models for patients with HCC via an ML strategy.
Our data also suggest that markers of T-cell exhaustion obtained from peripheral blood samples may be additional key factors for patient survival. Previously, we found that coexpression of PD-1, TIM3, and TIGIT was significantly upregulated in CD4+ and CD8+ T cells from patients with HBV-related HCC compared with patients with chronic HBV infection or HBV-related liver cirrhosis. Functionally, high expression of these inhibitory molecules is accompanied by features of T-cell exhaustion and is associated with poor clinical outcomes (21). When we stacked these immunologic features reflecting T-cell function along with conventional risk factors for HCC into the ML framework, five immunologic factors were incorporated into the GBS risk model, indicating that different CD4+ and CD8+ T-cell populations were correlated with HCC-related death. Many studies have shown that the density of tumor-infiltrating T cells, including CD4+ and CD8+ T cells, is associated with the RFS and OS of patients with HCC (32–34). Our retrospective study also showed that increased peripheral blood CD4+ T cells prolonged progression-free survival in patients with HCC (35). In addition to total T cells, T-cell subpopulations are also closely related to tumor prognosis. Previous studies have confirmed that effector memory cells are associated with the immune response, tumor metastasis, and survival in patients with colorectal cancer (36, 37). The upregulation of Tem cells has also been observed in patients with HCC after preoperative treatment with nivolumab/ipilimumab (38). Similar to the above results, our GBS risk model also revealed the prognostic value of T-cell subpopulations, especially Tem cells, on the survival of patients with HCC. A publication reveals that immune cell subpopulations become progressively suppressive from circulating blood to nontumor tissues to the tumor microenvironment (TME) in HCCs (39), indicating that the risk scoring system may correlate with the functional status of the immune microenvironment in situ.
Conventionally, the CoxPH model, which assumes a linear relationship between the model parameters and static covariates, is widely applied in survival analysis (40). ML experts have developed a suite of effective algorithms that may either complement or outperform traditional statistical methods, such as support vector machines, random survival forests, and GBS analyses. Examples of potential ML applications in precision oncology include both the identification of novel diagnostic/predictive/prognostic features and model-based assessment of clinical outcomes, subtype classification, and biomarker selection (41–43). An immunomarker-SVM–based prognostic classifier was developed for predicting the survival of patients with nasopharyngeal carcinoma (44). A retrospective multicenter study demonstrates that a deep convolutional neural network can be used to assess the TME and predict colorectal cancer prognosis from histopathologic images (45). The newly developed ML-based classifier, which integrated patient sex, lymph node metastasis, and expression of several proteins, is a powerful predictor of OS and disease-free survival in patients with gastric cancer. In liver cancer, a deep learning–based model that was constructed using RNA sequencing, miRNA sequencing, and methylation data from The Cancer Genome Atlas, predicts prognosis with an average C-index of 0.74 (46).
Although there are many models for predicting the prognosis of HCC, few include immune markers. Studies on the tumor immune response and immunosuppression have received increasing attention. Immunotherapies, such as immune checkpoint therapy, have achieved a new breakthrough in the field of oncology. There are also some HCC prognostic models based on immune factors related mostly to the TME (33, 47, 48). Here, we characterized the functional status of peripheral T cells and constructed an ML-based risk score by incorporating the above peripheral factors. We showed that the effective risk scoring algorithm predicted OS in patients with liver cancer, with an average C-index of 0.815 in the validation sets, emphasizing the significant prognostic value of the peripheral immune system at baseline. The robustness of the GBS prognostic model was confirmed in two independent validation sets, even though the median OS time varied between the two datasets. All findings were reproducible when patients were subclassified according to the BCLC stage, Child–Pugh class, or PVTT status. When adjusted for nonselected clinical features using multivariate Cox proportional analysis, the GBS-derived risk score was an independent risk factor in all datasets. The performance of the GBS-derived risk score could compete with or exceed that of other well-known staging systems, such as the BCLC and Child–Pugh classifications, when all three datasets were comprehensively evaluated.
Our study has some limitations. This was a single-center real-world study with a relatively small sample size and a short follow-up. Although we found that a higher GBS-derived risk score was associated with a shorter RFS time and a higher 1-year recurrence rate in all three datasets, this investigation failed to assess the value of the risk score for predicting late recurrence at 2 or 3 years, due to the lack of clinical information over the longer follow-up period. Therefore, future studies with available relative long-term progression or recurrence information would be helpful to confirm the validity and robustness of the GBS-based classifier. In addition, we observed that patients with high-risk scores showed elevated expression of inhibitory molecules on peripheral T-cell populations. Whether and how the exhausted phenotype of T cells is implicated in the processes of tumor progression and recurrence remain to be determined. Our study may provide some clues for novel combination strategies of immune checkpoint blockade therapy in liver cancer. Another limitation of this study was that we mainly focused on HBV+ HCC. Several studies have confirmed that HBV+ HCC has a larger tumor burden, a higher incidence of vascular invasion, and a worse prognosis than HCV− HCC and nonviral HCC (49, 50). Therefore, whether this model can be applied to patients with other etiologies should be further explored.
In conclusion, we employed a GBS ML strategy and constructed a prognostic model that incorporated tumor size, laboratory parameters, and peripheral immunologic features reflecting T-cell function. Because serum measurements and PBMC analysis are accessible and easy to perform, utilizing the GBS-derived risk scoring system to predict the clinical outcomes of patients with HCC is highly desirable. Future investigations would be helpful to refine the risk algorithm and may ultimately result in the development of a precision and novel strategy to guide personalized treatment.
Authors' Disclosures
No disclosures were reported.
Authors' Contributions
X. Liu: Data curation, formal analysis. J. Lu: Formal analysis, investigation. G. Zhang: Data curation, investigation. J. Han: Data curation, investigation. W. Zhou: Software, methodology. H. Chen: Conceptualization, investigation, writing–original draft. H. Zhang: Conceptualization, writing–review and editing. Z. Yang: Resources, supervision, project administration.
Acknowledgments
This work was supported by the National Key Sci-Tech Special Project of China (grant no. 2018ZX10302207), the National Science Foundation of China (grant no. 81874435), the Dengfeng Talent Support Program of Beijing Municipal Administration of Hospitals (grant no. DFL20191803), the Special Fund of Capital Health Research and Development (grant no. 2020-2-2173), and Beijing Hospitals Authority Clinical Medicine Development of Special Funding Support (grant no. ZYLX202127).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.