Abstract
Prediction models for acute myeloid leukemia (AML) are useful, but have considerable inaccuracy and imprecision. No current model includes covariates related to immune cells in the AML microenvironment. Here, an immune risk score was explored to predict the survival of patients with AML.
We evaluated the predictive accuracy of several in silico algorithms for immune composition in AML based on a reference of multi-parameter flow cytometry. CIBERSORTx was chosen to enumerate immune cells from public datasets and develop an immune risk score for survival in a training cohort using least absolute shrinkage and selection operator Cox regression model.
Six flow cytometry–validated immune cell features were informative. The model had high predictive accuracy in the training and four external validation cohorts. Subjects in the training cohort with low scores had prolonged survival compared with subjects with high scores, with 5-year survival rates of 46% versus 19% (P < 0.001). Parallel survival rates in validation cohorts-1, -2, -3, and -4 were 46% versus 6% (P < 0.001), 44% versus 18% (P = 0.041), 44% versus 24% (P = 0.004), and 62% versus 32% (P < 0.001). Gene set enrichment analysis indicated significant enrichment of immune relation pathways in the low-score cohort. In multivariable analyses, high-risk score independently predicted shorter survival with HRs of 1.45 (P = 0.005), 2.12 (P = 0.004), 2.02 (P = 0.034), 1.66 (P = 0.019), and 1.59 (P = 0.001) in the training and validation cohorts, respectively.
Our immune risk score complements current AML prediction models.
Although there are several predictive scores for outcomes of acute myeloid leukemia (AML), these are based on covariates, such as cytogenetics, mutation topography, and measurable residual disease. None include data on immune state. Here, we evaluated the predictive accuracy of several in silico algorithms for immune composition in AML bone marrow cells based on a reference of multi-parameter flow cytometry. CIBERSORTx was chosen to enumerate immune cells in the bone marrow microenvironment and develop an immune risk score for survival in patients with AML receiving chemotherapy. The risk score was validated in five independent cohorts from Gene Expression Omnibus and The Cancer Genome Atlas, proving replicability. The correlation between the immune risk score and the T-cell exhaustion was also indicated. The immune risk score can serve as a complement for the current AML prediction models. Its potential relation to the T-cell exhaustion may also be explored as an indicator for immunotherapy in the future.
Introduction
Outcomes of therapy of patients with acute myeloid leukemia (AML) are heterogeneous (1). There are several widely used prognostic and predictive models, including the European LeukemiaNet (ELN) 2017 risk classification or CALGB cytogenetics-based risk classification, which focus on subject- and disease-related variables. Adding data on mutation topography and from results of measurable residual disease (MRD) tests has only slightly increased prediction accuracy (2–5). These data indicate unknown (latent) covariates (along with measurement error and chance) influence outcomes (6, 7). A potential covariate not systematically analyzed is the impact of immune cells in the bone marrow microenvironment, which may be important in therapy response of patients with AML.
Recently, several computational methods have emerged to profile the immune cell landscape of the tumor microenvironment based on gene expression data (8). CIBERSORTx is a new, machine learning method to estimate cell type abundance and cell type–specific gene expression in heterogeneous samples through gene expression profiles (9). It is the next-generation version of CIBERSORT (cell type identification by estimating relative subsets of RNA transcripts) algorithm, a method for enumerating cell composition from tissue gene expression profiles by deconvolution (10). It developed the new functionalities with cross-platform data normalization (9). CIBERSORTx also allows transcriptomes of individual cell types to be digitally purified from bulk RNA admixtures without physical isolation (9). Analysis of immune cell phenotypes using CIBERSORT and CIBERSORTx is reported in several cancers, but not AML (11–14). We assessed the prediction efficiency of CIBERSORTx computational algorithm for the immune cell composition of the bone marrow microenvironment in AML. We used these data to develop an immune risk score to predict survival of patients with AML receiving chemotherapy in five clinical trials datasets.
Materials and Methods
Evaluation of prediction efficiency of in silico approaches
To assess the prediction efficiency of CIBERSORTx and other algorithms, we first evaluated the immune cell subsets enumerated by multi-parameter flow cytometry (MPFC) and in silico approaches in the Sun Yat-sen University Cancer Center (SYSUCC)-based network cohort (SYSUCC cohort). Briefly, bone marrow samples were obtained from subjects with AML pretherapy. Diagnosis was confirmed by morphology according to French-American-British classification criteria (15). Bone marrow mononucleated cells were isolated with Ficoll-Paque solution and viably frozen. Viability after recovery was examined by ViaStainAOPI Staining Solutions (Nexcelom). Only samples with ≥80% living cells were studied by flow cytometry. Details of the MPFC are in Supplementary Data S1. RNA sequencing (RNA-seq) was performed on each subject sample. Computational algorithms, including CIBERSORTx (9), EPIC (16), MCP-counter (17), and xCell (18), were used to enumerate the immune cell composition in the bone marrow environment. Correlations between immune cell types inferred by in silico approaches and MPFC were analyzed. Only immune cell types inferred with CIBERSORTx and validated by MPFC were used to construct the immune risk model. The protocol was approved by the Institutional Review Boards of SYSUCC (Guangzhou, P.R. China) and conducted on SYSUCC-based network in accordance with the Declaration of Helsinki. Subjects in the SYSUCC cohort provided written informed consent.
Retrieval of AML datasets
We systematically retrieved the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) to identify eligible gene expression datasets with information on survival in subjects with AML. Eligibility criteria included: (i) >40 subjects, (ii) ≤5 platforms were used to study samples in the dataset, and (iii) available survival data. When there were overlapping datasets, we chose the dataset with most complete data and the largest sample size. RNA-seq data of the AML-LAML cohort from The Cancer Genome Atlas (TCGA) were downloaded from the UCSC Xena database (https://xenabrowser.net/datapages/). Transcripts per million (TPM) normalized values were generated for analysis of RNA-seq data.
Study population
Four independent series [GSE37642 (AMLCG1999), GSE425 (Bullinger), GSE106291 (AMLCG2008), and GSE14468 (HOVON)] of GEO datasets (19–23) and TCGA data were identified using the above selection criteria. The GSE37642 dataset was used as the training cohort for model construction and the GSE425, TCGA, GSE106291, and GSE14468 datasets as validation cohorts-1, -2, -3, and -4. Detailed clinical and laboratory covariates were obtained from the GEO dataset website or from the principle investigator (Supplementary Table S1). The ELN 2017 and 2010 risk classifications were used in training cohort and validation cohort-4, respectively. Subjects in ELN 2010 intermediate-I and intermediate-II risk cohorts were combined as the intermediate cohort for analyses. CALGB cytogenetics risk classification was used in validation cohort-2. Only cytogenetic data were available for validation cohort-1. Consequently, we reclassified cytogenetic risk of this cohort according to ELN 2017 risk classification. Genetic risk category could not be accessed in validation cohort-3. Survival was calculated from study entry to death from any cause. Event-free survival (EFS) was calculated from study entry to treatment failure (i.e., no remission, relapse, or death).
Gene expression data processing
The corresponding raw data were downloaded from GEO. For series in which raw data were unavailable, matrix files were downloaded directly. For microarray data, a robust multiarray quantile method was used to normalize expression between arrays (with the limma package; ref. 24). Only validation cohort-2 (GSE425) had missing data. Median missing rate of each sample in GSE425 was 8% (range, 1.5%–61.7%). After testing the accuracy and robustness of imputation, missing data of GSE425 were imputed by the impute.knn algorithm through the impute package (25). When a gene had more than one probe set, we calculated mean values of probe sets as the expression. We used the sva package to merge the normalized data, which were detected by different platforms in the same datasets by adjusting for batch effects (26). A hierarchical clustering method was conducted for outlier detection and deletion with the “hclust” function. The expression matrix was normalized to the TPM format for RNA-seq data processing. Only genes shared within the training microarray matrix were selected for further analysis to avoid noise from other noncoding RNAs.
CIBERSORTx estimation
The standardized gene expression data were uploaded to the CIBERSORTx online analytic platform (https://cibersortx.stanford.edu/) and used to quantify the immune cell composition of myeloid tissue with the LM22 gene signature and 1,000 permutations (10). The algorithm can estimate the relative proportion of 22 human immune-related cell subsets, including B and T cells, natural killer (NK) cells, macrophages, dendritic cells, and myeloid subsets, from the gene expression profile. An empirical P value for the deconvolution was produced for each sample through Monte Carlo sampling (27). Only outputs with P < 0.05 were eligible for further analysis (28).
Statistical analysis
Correlation between immune cells inferred by MPFC and in silico approaches was evaluated by Spearman correlation. The Pearson correlation test was applied to quantify the association between the immune risk score and continuous variables. ANOVA was used to compare immune risk score cohorts. The least absolute shrinkage and selection operator (LASSO) Cox regression model was employed to select the optimal weighting coefficient and build an immune risk score on the basis of the penalized maximum likelihood estimator with the glmnet package. Optimal values of the penalty parameter λ were determined by 1,000-fold cross-validation via the min criteria (29–31). The immune risk score model was generated using the continuous variables of immune cell proportions in the five cohorts. Predictive performance of the immune risk scores was evaluated using time-dependent ROC curves with the quantification of the AUC using the survivalROC package (32). The confidence interval (CI) of the ROC curve was calculated by the bootstrap method. The generated risk scores were separated into high- and low-immune risk categories by the optimal cut-off determined by the survminer package with the minprop (the minimal proportion of the observations/group) of 20%. Survival was estimated according to the Kaplan–Meier method and compared by the log-rank test. Uni- and multivariable Cox regressions were performed to identify prognostic performance. Gene set enrichment analysis (GSEA) was used to identify the significantly enriched pathways between the high- and low-immune risk groups. A nomogram was used to visualize the model of multivariable Cox regression and integrated the single risk factors to construct the merged risk score. Performance of the nomogram was assessed with calibration and prognostic value for survival. Statistical analyses were performed by R software (version 3.6.0) and SPSS version 22.0 (IBM). Statistical tests were two-sided with P values less than 0.05 considered statistically significant.
Ethical consideration
Subjects gave written informed consent in the SYSUCC cohort and we followed principles of the Declaration of Helsinki.
Data availability and deposition
The raw sequence data of SYSUCC cohort have been deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2017) in National Genomics Data Center (Nucleic Acids Res 2020), Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences (Beijing, China), under accession number HRA000369 that are publicly accessible at http://bigd.big.ac.cn/gsa-human.
Results
Subject selection and covariates
We analyzed immune cells from 25 subjects with AML in the SYSUCC cohort. The immune cell composition in AML bone marrow was analyzed by MPFC (Supplementary Fig. S1A and S1B). Seventeen immune cell subsets were analyzed. Except for leukemia cells, monocytes were the most common immune cell types in AML bone marrow, followed by CD8-positive T cells and resting CD4-positive memory T cells (Supplementary Fig. S1C). We inferred immune cell types based on the computational algorithms, including CIBERSORTx, EPIC, MCP-counter, and xCell, and calculated correlations between predicted and actual immune cell subsets to assess predictive accuracy of the algorithms. CIBERSORTx was the most accurate with significant correlations with 11 immune cell subsets between CIBERSORTx-predicted subsets and MPFC subsets (Fig. 1A; Supplementary Fig. S2). The 11 validated immune cells were used to construct the immune risk model construction.
Public datasets were selected using the search strategy displayed in Supplementary Fig. S3 and Supplementary Table S2. Data from 1,435 subjects from four GEO datasets (GSE37642, GSE425, GSE106291, and GSE14468; refs. 19–23) and TCGA were retrospectively analyzed. Data from GSE37642 (n = 538) were used for immune model estimation and data from GSE425 (n = 116), TCGA (n = 130), GSE106291 (n = 196), and GSE14468 (n = 455) were used as validation cohorts-1, -2, -3, and -4. Subject covariates are displayed in Supplementary Table S1. Significant heterogeneity was observed among the training and validation cohorts in distribution of clinical covariates. For example, the training cohort had higher proportion of subjects >60 years of age compared with the other four validation cohorts. Validation cohort-2 had a higher percentage of subjects with normal cytogenetics compared with the training cohort, validation cohort-1, and validation cohort-4.
Immune risk score model construction
Eleven validated immune cell proportions in the training cohort were analyzed using LASSO Cox regression to construct an immune risk score (Fig. 1B and C). A total of 1,000 bootstrap replicates were conducted using the penalized maximum likelihood estimator (Fig. 1B). The regularization path was computed as a grid of values for the regularization parameter, lambda, using the min criteria and optimal weighting coefficients were selected to generate the prognostic model (Fig. 1C). Six immune cell subsets were identified to construct the model (Fig. 1C). The formula for the immune risk score was calculated for each subject as follows:
Risk score = (4.24 × proportion of activated NK cells) + (2.01 × proportion of naïve B cells) – (0.41 × proportion of resting mast cells) – (0.80 × proportion of CD8-positive T cells) – (1.00 × proportion of M2 macrophages) – (2.87 × proportion of follicular helper T cells).
Predictive value of the selected immune cell subsets was further verified by the log-rank test after classification into high and low levels by the corresponding optimal cut-off value in the training cohort (Supplementary Fig. S4).
Evaluation of the immune risk score
Time-dependent ROC analysis was used to investigate the prognostic accuracy of the immune risk score in the training and validation cohorts (Table 1; Supplementary Fig. S5A–S5F). The score was assessed as a continuous variable. The AUC values of the immune risk score for 3- and 5-year survival were 0.66 (95% CI, 0.62–0.71; P < 0.001) and 0.68 (95% CI, 0.63–0.73; P < 0.001) in the training cohort (Table 1; Supplementary Fig. S5A). Parallel values in validation cohort-1 were 0.74 (95% CI, 0.60–0.85; P = 0.001) and 0.77 (95% CI, 0.63–0.93; P < 0.001; Table 1; Supplementary Fig. S5B), 0.63 (95% CI, 0.50–0.71; P = 0.027) and 0.70 (95% CI, 0.52–0.84; P = 0.016) in validation cohort-2 (Table 1; Supplementary Fig. S5C), 0.62 (95% CI, 0.54–0.70; P = 0.002) and 0.57 (95% CI, 0.50–0.65; P = 0.048) in validation cohort-3 (Table 1; Supplementary Fig. S5D), and 0.58 (95% CI, 0.52–0.63; P < 0.001) and 0.58 (95% CI, 0.53–0.64; P = 0.004) in validation cohort-4 (Table 1; Supplementary Fig. S5E). Similar result was observed for EFS in validation cohort-4 with AUC of 3- and 5-year EFS as 0.60 (95% CI, 0.54–0.65; P < 0.001) and 0.59 (95% CI, 0.53–0.65; P = 0.002; Table 1; Supplementary Fig. S5F).
. | 3-year AUC HR (95% CI) . | P . | 5-year AUC HR (95% CI) . | P . |
---|---|---|---|---|
Survival | ||||
Training cohort | 0.66 (0.62–0.71) | <0.001 | 0.68 (0.63–0.73) | <0.001 |
Validation cohort-1 | 0.74 (0.60–0.85) | 0.001 | 0.77 (0.63–0.93) | <0.001 |
Validation cohort-2 | 0.63 (0.50–0.71) | 0.027 | 0.70 (0.52–0.84) | 0.016 |
Validation cohort-3 | 0.62 (0.54–0.70) | 0.002 | 0.57 (0.50–0.65) | 0.048 |
Validation cohort-4 | 0.58 (0.52–0.63) | <0.001 | 0.58 (0.53–0.64) | 0.004 |
EFS | ||||
Validation cohort-4 | 0.60 (0.54–0.65) | <0.001 | 0.59 (0.53–0.65) | 0.002 |
. | 3-year AUC HR (95% CI) . | P . | 5-year AUC HR (95% CI) . | P . |
---|---|---|---|---|
Survival | ||||
Training cohort | 0.66 (0.62–0.71) | <0.001 | 0.68 (0.63–0.73) | <0.001 |
Validation cohort-1 | 0.74 (0.60–0.85) | 0.001 | 0.77 (0.63–0.93) | <0.001 |
Validation cohort-2 | 0.63 (0.50–0.71) | 0.027 | 0.70 (0.52–0.84) | 0.016 |
Validation cohort-3 | 0.62 (0.54–0.70) | 0.002 | 0.57 (0.50–0.65) | 0.048 |
Validation cohort-4 | 0.58 (0.52–0.63) | <0.001 | 0.58 (0.53–0.64) | 0.004 |
EFS | ||||
Validation cohort-4 | 0.60 (0.54–0.65) | <0.001 | 0.59 (0.53–0.65) | 0.002 |
Note: Bold indicates P < 0.05.
Next, subjects were sorted into low- and high-immune risk score cohorts according to the optimal cut-off value determined in each cohort. Five-year survivals were 46% (39%–54%) versus 19% (15%–24%; P < 0.001; Fig. 2A) in the training cohort, 46% (34%–60%) versus 6% (1%–33%; P < 0.001; Fig. 2B) in validation cohort-1, 44% (27%–72%) versus 18% (10%–32%; P = 0.041; Fig. 2C) in validation cohort-2, 44% (36%–53%) versus 24% (13%–43%; P = 0.004; Fig. 2D) in validation cohort-3, and 62% (43%–61%) versus 32% (27%–37%; P < 0.001; Fig. 2E) in validation cohort-4. Five-year EFSs were 35% (25%–48%) versus 23% (19%–28%; P < 0.001; Fig. 2F) in validation cohort-4.
Uni- and multivariable regression analyses
Because ELN risk category was unavailable in validation cohorts- 1 and -2, we used cytogenetic risk classification in these cohorts and ELN risk category in the training cohort and validation cohort-4. In multivariable analyses, the immune risk score remained an independent predictor of survival in the five cohorts, with HRs of 1.45 (1.12–1.88; P = 0.005), 2.12 (1.27–3.52; P = 0.004), 2.02 (1.06–3.87; P = 0.034), 1.66 (1.08–2.53; P = 0.019), and 1.59 (1.0–2.11; P = 0.001; Table 2). We found no correlation between immune risk score and percentage bone marrow blasts (Supplementary Fig. S6). The independent predictive impact of immune risk score after adjusting for percentage bone marrow blasts was confirmed in multivariable Cox regression analyses (Supplementary Table S3).
. | Univariable . | Multivariable . | ||
---|---|---|---|---|
. | HR (95% CI) . | P . | HR (95% CI) . | P . |
Training cohort (AMLCG1999) | ||||
Age (>60 vs. ≤60 years) | 2.03 (1.66–2.49) | <0.001 | 1.58 (1.26–1.96) | <0.001 |
ELN 2017 risk category | 2.06 (1.81–2.35) | <0.001 | 2.04 (1.69–2.46) | <0.001 |
Gender (female vs. male) | 0.82 (0.67–1.01) | 0.057 | — | — |
Bone marrow blasts (>70 vs. ≤70%) | 1.22 (0.99–1.52) | 0.095 | — | — |
WBC (>10 vs. ≤10 × 10E+9/L) | 1.04 (0.84–1.28) | 0.728 | — | — |
Hemoglobin concentration (>60 vs. ≤60 g/L) | 1.23 (0.78–1.93) | 0.368 | — | — |
Platelets (>40 vs. ≤40 × 10E+9/L) | 1.03 (0.84–1.27) | 0.751 | — | — |
LDH (>300 vs. ≤300 U/L) | 0.99 (0.80–1.24) | 0.963 | — | — |
FLT3-ITD (yes vs. no) | 1.23 (0.97–1.55) | 0.087 | — | — |
RUNX1-RUNX1T1 (yes vs. no) | 0.53 (0.32–0.88) | 0.013 | 0.64 (0.36–1.16) | 0.145 |
RUNX1 mutation (yes vs. no) | 2.10 (1.60–2.69) | <0.001 | 0.93 (0.70–1.24) | 0.629 |
NPM1 mutation (yes vs. no) | 0.71 (0.56–0.90) | 0.005 | 1.28 (0.94–1.75) | 0.122 |
CEBPA mutation (yes vs. no) | 0.42 (0.24–0.73) | 0.002 | 0.82 (0.46–1.47) | 0.509 |
MLL-PTD (yes vs. no) | 1.31 (0.90–1.91) | 0.164 | — | — |
AML subtypes (de novo vs. secondary) | 1.76 (1.36–2.28) | <0.001 | 1.28 (0.97–1.68) | 0.083 |
Immune risk score (high vs. low) | 2.05 (1.61–2.61) | <0.001 | 1.45 (1.12–1.88) | 0.005 |
Validation cohort-1 (Bullinger) | ||||
Age (>60 vs. ≤60 years) | 1.95 (1.19–3.18) | 0.008 | 1.38 (0.81–2.34) | 0.233 |
Cytogenetic risk category | 2.12 (1.13–3.96) | 0.019 | 1.74 (0.89–3.40) | 0.105 |
Gender (female vs. male) | 1.01 (0.62–1.64) | 0.962 | — | — |
Bone marrow blasts (>70 vs. ≤70%) | 0.99 (0.54–1.80) | 0.964 | — | — |
WBC (>10 vs. ≤10 × 10E+9/L) | 1.12 (0.53–2.35) | 0.763 | — | — |
LDH (>300 vs. ≤300 U/L) | 1.03 (0.52–2.02) | 0.930 | — | — |
FLT3 mutation (yes vs. no) | 1.42 (0.86–2.35) | 0.173 | — | — |
MLL-PTD (yes vs. no) | 1.13 (0.45–2.82) | 0.797 | — | — |
Immune risk score (high vs. low) | 2.58 (1.59–4.18) | <0.001 | 2.12 (1.27–3.52) | 0.004 |
Validation cohort-2 (TCGA) | ||||
Age (>60 vs. ≤60 years) | 2.81 (1.79–4.39) | <0.001 | 2.80 (1.72–4.55) | <0.001 |
Cytogenetic risk category | 3.04 (1.49–6.21) | 0.002 | 2.29 (1.10–4.76) | 0.027 |
Gender (female vs. male) | 1.00 (0.64–1.55) | 0.987 | — | — |
WBC (>10 vs. ≤10 × 10E+9/L) | 1.27 (0.80–2.02) | 0.301 | — | — |
Hemoglobin concentration (>60 vs. ≤60 g/L) | 0.40 (0.10–1.65) | 0.206 | — | — |
Platelets (>40 vs. ≤40 × 10E+9/L) | 1.18 (0.75–1.86) | 0.473 | — | — |
NPM1 mutation (yes vs. no) | 1.21 (0.74–1.98) | 0.449 | — | — |
Immune risk score (high vs. low) | 1.86 (1.03–3.38) | 0.041 | 2.02 (1.06–3.87) | 0.034 |
Validation cohort-3 (AMLCG2008) | ||||
Age (>60 vs. ≤60 years) | 2.00 (1.38–2.91) | <0.001 | — | — |
Gender (female vs. male) | 0.93 (0.64–1.35) | 0.710 | — | — |
RUNX1-RUNX1T1 (yes vs. no) | 0.67 (0.21–2.12) | 0.499 | — | — |
RUNX1 mutation (yes vs. no) | 1.83 (1.16–2.88) | 0.009 | 1.43 (0.89–2.29) | 0.137 |
Immune risk score (high vs. low) | 1.84 (1.22–2.80) | 0.004 | 1.66 (1.08–2.53) | 0.019 |
Validation cohort-4 (HOVON) | ||||
Age (>60 vs. ≤60 years) | 1.88 (1.35–2.62) | <0.001 | 1.86 (1.33–2.58) | 0.001 |
ELN 2010 risk category | 2.00 (1.52–2.64) | <0.001 | 1.68 (1.23–2.28) | 0.001 |
Gender (female vs. male) | 1.04 (0.83–1.30) | 0.756 | — | — |
Bone marrow blasts (>70 vs. ≤70%) | 1.19 (0.94–1.49) | 0.146 | — | — |
WBC (>10 vs. ≤10 × 10E+9/L) | 1.18 (0.91–1.53) | 0.216 | — | — |
Platelets (>40 vs. ≤40 × 10E+9/L) | 1.19 (0.93–1.51) | 0.168 | — | — |
FLT3-ITD (yes vs. no) | 1.56 (1.22–1.99) | <0.001 | 1.43 (1.07–1.89) | 0.014 |
NPM1 mutation (yes vs. no) | 0.86 (0.67–1.11) | 0.251 | — | — |
CEBPA mutation (yes vs. no) | 0.64 (0.40–1.04) | 0.070 | — | — |
Immune risk score (high vs. low) | 1.70 (1.29–2.25) | <0.001 | 1.59 (1.20–2.11) | 0.001 |
. | Univariable . | Multivariable . | ||
---|---|---|---|---|
. | HR (95% CI) . | P . | HR (95% CI) . | P . |
Training cohort (AMLCG1999) | ||||
Age (>60 vs. ≤60 years) | 2.03 (1.66–2.49) | <0.001 | 1.58 (1.26–1.96) | <0.001 |
ELN 2017 risk category | 2.06 (1.81–2.35) | <0.001 | 2.04 (1.69–2.46) | <0.001 |
Gender (female vs. male) | 0.82 (0.67–1.01) | 0.057 | — | — |
Bone marrow blasts (>70 vs. ≤70%) | 1.22 (0.99–1.52) | 0.095 | — | — |
WBC (>10 vs. ≤10 × 10E+9/L) | 1.04 (0.84–1.28) | 0.728 | — | — |
Hemoglobin concentration (>60 vs. ≤60 g/L) | 1.23 (0.78–1.93) | 0.368 | — | — |
Platelets (>40 vs. ≤40 × 10E+9/L) | 1.03 (0.84–1.27) | 0.751 | — | — |
LDH (>300 vs. ≤300 U/L) | 0.99 (0.80–1.24) | 0.963 | — | — |
FLT3-ITD (yes vs. no) | 1.23 (0.97–1.55) | 0.087 | — | — |
RUNX1-RUNX1T1 (yes vs. no) | 0.53 (0.32–0.88) | 0.013 | 0.64 (0.36–1.16) | 0.145 |
RUNX1 mutation (yes vs. no) | 2.10 (1.60–2.69) | <0.001 | 0.93 (0.70–1.24) | 0.629 |
NPM1 mutation (yes vs. no) | 0.71 (0.56–0.90) | 0.005 | 1.28 (0.94–1.75) | 0.122 |
CEBPA mutation (yes vs. no) | 0.42 (0.24–0.73) | 0.002 | 0.82 (0.46–1.47) | 0.509 |
MLL-PTD (yes vs. no) | 1.31 (0.90–1.91) | 0.164 | — | — |
AML subtypes (de novo vs. secondary) | 1.76 (1.36–2.28) | <0.001 | 1.28 (0.97–1.68) | 0.083 |
Immune risk score (high vs. low) | 2.05 (1.61–2.61) | <0.001 | 1.45 (1.12–1.88) | 0.005 |
Validation cohort-1 (Bullinger) | ||||
Age (>60 vs. ≤60 years) | 1.95 (1.19–3.18) | 0.008 | 1.38 (0.81–2.34) | 0.233 |
Cytogenetic risk category | 2.12 (1.13–3.96) | 0.019 | 1.74 (0.89–3.40) | 0.105 |
Gender (female vs. male) | 1.01 (0.62–1.64) | 0.962 | — | — |
Bone marrow blasts (>70 vs. ≤70%) | 0.99 (0.54–1.80) | 0.964 | — | — |
WBC (>10 vs. ≤10 × 10E+9/L) | 1.12 (0.53–2.35) | 0.763 | — | — |
LDH (>300 vs. ≤300 U/L) | 1.03 (0.52–2.02) | 0.930 | — | — |
FLT3 mutation (yes vs. no) | 1.42 (0.86–2.35) | 0.173 | — | — |
MLL-PTD (yes vs. no) | 1.13 (0.45–2.82) | 0.797 | — | — |
Immune risk score (high vs. low) | 2.58 (1.59–4.18) | <0.001 | 2.12 (1.27–3.52) | 0.004 |
Validation cohort-2 (TCGA) | ||||
Age (>60 vs. ≤60 years) | 2.81 (1.79–4.39) | <0.001 | 2.80 (1.72–4.55) | <0.001 |
Cytogenetic risk category | 3.04 (1.49–6.21) | 0.002 | 2.29 (1.10–4.76) | 0.027 |
Gender (female vs. male) | 1.00 (0.64–1.55) | 0.987 | — | — |
WBC (>10 vs. ≤10 × 10E+9/L) | 1.27 (0.80–2.02) | 0.301 | — | — |
Hemoglobin concentration (>60 vs. ≤60 g/L) | 0.40 (0.10–1.65) | 0.206 | — | — |
Platelets (>40 vs. ≤40 × 10E+9/L) | 1.18 (0.75–1.86) | 0.473 | — | — |
NPM1 mutation (yes vs. no) | 1.21 (0.74–1.98) | 0.449 | — | — |
Immune risk score (high vs. low) | 1.86 (1.03–3.38) | 0.041 | 2.02 (1.06–3.87) | 0.034 |
Validation cohort-3 (AMLCG2008) | ||||
Age (>60 vs. ≤60 years) | 2.00 (1.38–2.91) | <0.001 | — | — |
Gender (female vs. male) | 0.93 (0.64–1.35) | 0.710 | — | — |
RUNX1-RUNX1T1 (yes vs. no) | 0.67 (0.21–2.12) | 0.499 | — | — |
RUNX1 mutation (yes vs. no) | 1.83 (1.16–2.88) | 0.009 | 1.43 (0.89–2.29) | 0.137 |
Immune risk score (high vs. low) | 1.84 (1.22–2.80) | 0.004 | 1.66 (1.08–2.53) | 0.019 |
Validation cohort-4 (HOVON) | ||||
Age (>60 vs. ≤60 years) | 1.88 (1.35–2.62) | <0.001 | 1.86 (1.33–2.58) | 0.001 |
ELN 2010 risk category | 2.00 (1.52–2.64) | <0.001 | 1.68 (1.23–2.28) | 0.001 |
Gender (female vs. male) | 1.04 (0.83–1.30) | 0.756 | — | — |
Bone marrow blasts (>70 vs. ≤70%) | 1.19 (0.94–1.49) | 0.146 | — | — |
WBC (>10 vs. ≤10 × 10E+9/L) | 1.18 (0.91–1.53) | 0.216 | — | — |
Platelets (>40 vs. ≤40 × 10E+9/L) | 1.19 (0.93–1.51) | 0.168 | — | — |
FLT3-ITD (yes vs. no) | 1.56 (1.22–1.99) | <0.001 | 1.43 (1.07–1.89) | 0.014 |
NPM1 mutation (yes vs. no) | 0.86 (0.67–1.11) | 0.251 | — | — |
CEBPA mutation (yes vs. no) | 0.64 (0.40–1.04) | 0.070 | — | — |
Immune risk score (high vs. low) | 1.70 (1.29–2.25) | <0.001 | 1.59 (1.20–2.11) | 0.001 |
Note: Bold indicates P < 0.05.
Abbreviations: LDH, lactate dehydrogenase; WBC, white blood cell.
Sensitivity and subgroup analyses
The endpoint for our analyses was survival. However, death after starting AML therapy can occur from events not likely correlated with bone marrow immune cell microenvironment, such as early deaths from organ toxicity before complete remission could be achieved and/or evaluated. To reduce this potential confounding, we performed sensitivity analyses focusing on subjects achieving a histologic complete remission. Using the previously adopted cut-off values, subjects in the training cohort achieving a histologic complete remission with high-immune risk score had shorter 5-year survival compared with similar subjects with a low-immune risk score [31% (25%–39%) vs. 57% (48%–67%); P < 0.001; Fig. 3A]. Similar results were observed in the histologic complete remission population in validation cohorts-1, -3, and -4, with 5-year survival rates 50% (22%–100%) versus 94% (87%–100%; P = 0.010 Fig. 3B), 34% (20%–60%) versus 63% (44%–64%; P = 0.053; Fig. 3C), and 40% (34%–46%) versus 58% (50%–68%; P < 0.001; Fig. 3D), respectively. Validation cohort-2 could not be analyzed because of missing data.
Subgroup analysis based on age, ELN 2017, AML subtypes (de novo/secondary/treatment related), cytogenetic risk categories, and several gene mutations was performed in the training cohort (Supplementary Fig. S7A). Immune risk score could discriminate the survival in several specific clinical categories. For example, compared with those with high score, survival of patients with low score was higher in the favorable ELN risk cohort [HR, 2.14 (1.33–3.45); P = 0.002], marginally higher in the intermediate ELN risk cohort [HR, 1.54 (0.98–2.44); P = 0.062], and comparable in the adverse ELN risk cohort [HR, 0.98 (1.25–1.84); P = 0.924; Supplementary Fig. S7B–S7D].
Distribution of the immune risk score in diverse subgroups
Next, we interrogated the distribution of immune risk scores in different clinical and genetic risk cohorts. In the training cohort, subjects >60 years of age had a significantly higher immune risk score compared with those ≤60 years of age (mean ± SD, −0.21 ± 0.25 vs. −0.29 ± 0.23; P < 0.001; Fig. 4A). A similar correlation was observed in validation cohort-1 (0.29 ± 0.39 vs. 0.05 ± 0.48; P = 0.007; Fig. 4A). We found significant differences in immune risk score based on cytogenetic risk in the training cohort (Fig. 4B) and validation cohort-1 (Fig. 4B). The immune risk score increased within the ELN risk levels in the training cohort (−0.18 ± 0.25, −0.24 ± 0.23, and −0.35 ± 0.21 for adverse, intermediate, and favorable ELN risk cohorts; r = 0.29, Spearman correlation; P < 0.001) and validation cohort-4 (−0.18 ± 0.33, −0.20 ± 0.27, and −0.24 ± 0.19 for adverse, intermediate, and favorable ELN risk cohorts; r = 0.09 and P = 0.064 for Spearman correlation) and within the cytogenetic risk levels in validation cohort-2 (0.20 ± 0.29, 0.09 ± 0.25, and 0 ± 0.28 for adverse, intermediate, and favorable cytogenetic risk cohorts; r = 0.25, Spearman correlation, P = 0.004; Fig. 4C).
Immune risk score–related biological phenotypes
Immune risk score–related biological phenotypes were analyzed by mRNA expression profiles in the largest cohort (training cohort). We first derived the correlation between the immune risk score and expression of several selected genes and immune checkpoint markers. Immune risk score was significantly correlated with: (i) immune-related genes, including ADORA2A, B2M, CD27, CD96, and MIF (all Padj < 0.05; detailed P values in Supplementary Table S4; Fig. 5A); (ii) epigenetic and transcriptional regulation genes, including NFE2L2, MAFB, RBPJ, KAM1A, and PRMT5 (all Padj < 0.05; Fig. 5A; Supplementary Table S4); (iii) genes related to cell proliferation, including TGFBR2, PIK3C2A, LYN, SETBP1, NOTCH1, and TP53 (all Padj < 0.05; Fig. 5A; Supplementary Table S4); and (iv) apoptosis genes, including BIRC3, BCL10, BCL2L11, BCL6, and FAF1 (all Padj < 0.05; Fig. 5A; Supplementary Table S4). GSEA data showed a significant enrichment of the highly transcribed genes of the low-immune risk score group in several immune-related biological phenotypes, such as cell activation, immune system development, innate immune response, positive regulation of immune response, and regulation of cytokine production (Fig. 5B and C).
Immune risk score and T-cell exhaustion
Next, we looked for evidence of immune escape and possible correlation with the immune risk score. Expression of PD-1 and CTLA-4 on T cells was assessed by MPFC with the output median fluorescence intensity in the SYSUCC cohort. Immune risk score was calculated on the basis of the CIBERSORTx, as described. Subjects with higher immune risk scores had higher T-cell PD-1 expression (r = 0.50; P = 0.012; Fig. 5D). These data suggest T-cell exhaustion in the high-immune risk score population. We found no similar correlation with CTLA-4 expression.
Construction of risk merged score
Immune risk score, ELN 2017 risk category, and age were integrated into a merged score to predict 5-year survival probability with a nomogram in the training cohort (Fig. 6A). In Cox regression, the immune risk score had a significant interaction with age (P = 0.026), but not with ELN 2017 risk category (favorable, P = 0.120 and intermediate, P = 0.847). Calibration plots showed good agreement between estimated and actual probabilities (Fig. 6B). The merged score with ELN 2017 risk category, age, and immune risk score had a significantly higher AUC [0.83 (0.79–0.87)] compared with the ELN risk category (P = 0.033; Fig. 6C), indicating an advantage for the merged score including the immune risk score.
Discussion
Immune aspects of the bone marrow microenvironment may be important in the biology of AML, including therapy response and survival. However, relatively little is known and much relevant data are contradictory (33–36). We used MPFC to interrogate the immune cell composition of the AML bone marrow microenvironment in the SYSUCC cohort. These data were a reference to evaluate predictive accuracy of several in silico algorithms of immune composition. The CIBERSORTx algorithm was chosen to deconvolute the bulk gene expression profile from subjects with AML in public chemotherapy-based datasets. We used six immune cell subsets to develop an immune risk score which correlated with survival after anti-leukemia therapy in a training and four validation cohorts.
Subjects with low-immune risk score had prolonged EFS and survival. We acknowledge that the bases for this correlation are complex and are cautious to assume it is cause-and-effect. For example, some deaths after intensive chemotherapy are therapy rather than leukemia related. Although we would have preferred to study cumulative incidence of relapse (CIR) or leukemia-free survival (LFS) in histologic complete responders rather than survival, these data were unavailable. Consequently, we used sensitivity analyses of subjects obtaining a histologic complete remission to confirm our findings. Importantly, we found no correlation between the immune risk score and percentage bone marrow blasts, which might have confounded our results. Our score was validated in four independent cohorts.
Several previous studies have used the MPFC to identify specific immune cell subsets and predict therapy- or survival-related outcomes (37, 38). However, these studies had small sample sizes and profiled few immune cell subsets. For example, Galen and colleagues studied several immune cell subsets in patients with AML using single-cell RNA-seq, but did not interrogate the relationship with survival (39).
Consistency and reproducibility of the immune score in four validation cohorts and its independent predictive impact in multivariable analyses suggest the immune score may help predict survival of patients with AML receiving chemotherapy. The GSEA showed significant enrichment of immune-related pathways in the low-risk immune score cohort, including cell activation, immune system development, innate immune response, and regulation of the immune response and cytokine production. An activated immune response in subjects with a low-immune risk score could explain their better prognosis. Moreover, Knaus and colleagues reported that multiple aspects of deranged T-cell function are observed in AML at diagnosis, with exhaustion and senescence being the dominant processes (40). The data in the SYSUCC cohort we presented also indicate that the immune risk score was associated with the PD-1 expression on T cells. Therefore, we speculated T-cell exhaustion may explain, in part, the correlation between the immune risk score and survival we observed.
The proportion of activated NK-cell population had the strongest weight and positive correlation in generating the immune risk score. Others have also found similar data (41). Although some studies report activated NK cells is an effective anti-leukemia therapy (42–44), these cells were allogeneic, not autologous, and were given for relapse rather than at diagnosis
The proportion of follicular helper T cells was the second important value in the formula of immune risk score and was favorably correlated with survival. In contrast to solid cancers (45), we found a high proportion of M2-like macrophages was associated with a lower immune risk score and better survival. Other immune subsets with significant predictive impacts on survival included B cells, CD8-postive T cells, and mast cells.
There are important limitations to our study. Our endpoint was survival, rather than CIR or LFS in patients with histologic complete remission. We also lacked data on some important predictive covariates in some datasets, such as mutation topography and results of MRD testing in subjects achieving a complete remission. The cohorts we studied were heterogeneous in some subject-, disease-, and therapy-related covariates. Especially for therapy, patients in the training cohort received the double induction containing either two courses or one course of high-dose cytarabine plus mitoxantrone and postremission therapy by either autologous stem cell transplantation or by prolonged maintenance. Rare patients in this cohort received an allotransplant. Patients in the validation cohort-1 received one of two treatment protocols (AML HD98A and AML HD98B). Patients in validation cohort-3 received sequential high-dose cytosine arabinoside and mitoxantrone or standard double induction for initial chemotherapy. Validation cohort-4 even included the patients from seven independent clinical trials that enrolled in the Dutch-Belgian Hemato-Oncology Cooperative Group-04, -10, -12, -29, -32, -42, or -43 protocols. The significant treatment heterogeneity in validation cohort-4 mainly contributed the relatively lower AUC of immune risk score in validation cohort-4. Consequently, the cut-offs we used to define low- and high-risk immune score differed between datasets. We acknowledge potential biases in unbalanced covariates between the cohorts.
In conclusion, we constructed an immune risk score on the basis of six immune cell phenotypes identified using CIBERSORTx. This score independently predicted survival in patients with AML receiving chemotherapy. When added, it also improved accuracy of the ELN risk score. Our proposed immune risk model may complement other predictive models, which rely predominately on age, cytogenetics, mutation topography, and results of MRD testing (unavailable pretherapy; ref. 5). Because of our retrospective study design, confirmation in a large clinical trial is needed.
Authors' Disclosures
K.H. Metzeler reports grants and personal fees from Celgene and personal fees from Novartis, Daiichi Sankyo, Otsuka, Astellas, Pfizer, and Jazz Pharmaceuticals outside the submitted work. W. Hiddemann reports grants and personal fees from Roche AG and Janssen outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
Y. Wang: Conceptualization, data curation, writing-original draft, writing-review and editing. Y.-y. Cai: Methodology. T. Herold: Data curation, writing-review and editing. R.-C. Nie: Formal analysis, visualization, methodology, writing-original draft. Y. Zhang: Data curation. R.P. Gale: Conceptualization, writing-review and editing. K.H. Metzeler: Data curation. Y. Zeng: Data curation. S.-q. Wang: Resources, data curation. X.-Y. Pan: Data curation. T.-h. Yang: Data curation. Y.-b. Wu: Data curation. Q. Zhang: Data curation. Z.-j. Wuxiao: Data curation. X. Du: Data curation. Z.-w. Liang: Data curation. Y.-z. Su: Data curation. J.-b. Xu: Data curation. Y.-q. Wang: Data curation. Z.-l. Liu: Data curation. J.-w. Wu: Data curation. X. Zhang: Data curation. B.-y. Wu: Data curation. R.-z. Xiao: Data curation. S.-b. Wang: Data curation. J.-y. Li: Data curation. P.-d. Chi: Data curation. Q.-y. Zhang: Data curation. S.-l. Chen: Data curation. Z.-y. Qin: Data curation. X.-m. Zhang: Software, investigation. N. Zhong: Visualization. W. Hiddemann: Data curation. Q.-f. Liu: Conceptualization, writing-review and editing. B. Zhang: Conceptualization, writing-review and editing. Y. Liang: Conceptualization, project administration, writing-review and editing.
Acknowledgments
Y. Liang was supported, in part, by Sun Yat-sen University Start-up Funding, grant no. 201603, the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2017ZT07S096), and the National Natural Science Foundation of China (grant nos. 81873428 and 81660682). R.P. Gale acknowledges support from the National Institute of Health Research (NIHR) Biomedical Research Centre funding scheme. T. Herold acknowledges support from the Wilhelm-Sander-Stiftung (grant no. 2013.086.2). We thank Zi-chen Wu and Dan-ru Shi from BD Biosciences total solution team for technical assistance.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.