Abstract
Despite improvements in therapeutic regimens, some patients with multiple myeloma (MM) still experience early relapse (ER). This subset of patients currently represents an unmet medical need.
We pooled data from seven European multicenter phase II/III clinical trials enrolling 2,190 patients with newly diagnosed MM from 2003 to 2017. Baseline patient evaluation included 14 clinically relevant features. Patients with complete data (n = 1,218) were split into training (n = 844) and validation (n = 374) sets. In the training set, univariate analysis and multivariate logistic regression modeling of ER within 18 months (ER18) were performed. The most accurate model was selected on the validation set. We also developed a dynamic version of the score by including response to treatment.
The Simplified Early Relapse in Multiple Myeloma (S-ERMM) score was based on six features, each weighted by points: 5 points for high lactate dehydrogenase or t(4;14); 3 for del17p, abnormal albumin, or bone marrow plasma cells >60%; and 2 for λ free light chain. The S-ERMM identified three patient groups with different risks of ER18: Intermediate (Int) versus Low (OR = 2.39, P < 0.001) and High versus Low (OR = 5.59, P < 0.001). S-ERMM High/Int patients had significantly shorter overall survival (High vs. Low: HR = 3.24, P < 0.001; Int vs. Low: HR = 1.86, P < 0.001) and progression-free survival-2 (High vs. Low: HR = 2.89, P < 0.001; Int vs. Low: HR = 1.76, P < 0.001) than S-ERMM Low patients. The Dynamic S-ERMM (DS-ERMM), which incorporates response to treatment, refined the risk stratification provided by the S-ERMM.
On the basis of simple, widely available baseline features, the S-ERMM and DS-ERMM properly identified patients with different risks of ER and survival outcomes.
Despite the extensive literature, there is no consensus on how best to predict early relapse (ER) in patients with multiple myeloma (MM). We pooled data from seven European clinical trials enrolling 2,190 patients with newly diagnosed MM from October 2003 to March 2017 to develop the Simplified Early Relapse in Multiple Myeloma (S-ERMM) score. This analysis provides further evidence that predicting ER, which is strongly associated with poor outcome, is critical in patients with MM. The S-ERMM predicted ER using simple and widely available baseline features. After external validation, future development of this prognostic index may consider combining it with other static risk features (genomic abnormalities, circulating tumor cells) and with dynamic risk evaluation (response to therapy, such as minimal residual disease) for ER prediction. The identification of high-risk patients with dismal prognosis is the first step toward better-designed therapeutic approaches for this patient subgroup.
Introduction
In the past few years, the prognosis of patients with multiple myeloma (MM) has markedly improved thanks to the introduction of new drugs and better therapeutic strategies, both at diagnosis and at relapse (1–4). Traditionally, the maximal benefit in terms of duration of remission has been observed with first-line therapies. With high-dose chemotherapy and autologous stem cell transplantation (ASCT) combined with novel agents, or with the adoption of multi-targeted agents including immunomodulatory (IMiD) agents, proteasome inhibitors (PI), and mAbs, the current median progression-free survival (PFS) of patients with newly diagnosed MM (NDMM) ranges between 41 and 50 months (4, 5). Despite this remarkable improvement, a significant proportion of patients still experiences early relapse (ER), which has been associated with a dismal prognosis. Several studies reported an association between baseline clinical features and ER (6–12), but there is no clear consensus on the most important determinants; in fact, even patients without well-known high-risk features at baseline may relapse early (7, 13). Data published so far mainly come from registries or from retrospective analyses that do not systematically consider updated standard-of-care risk assessments [e.g., cytogenetics, Revised International Staging System (R-ISS); ref. 12].
Unfortunately, a consensus on the appropriate definition of ER is also lacking. So far, it has been defined as relapse within 12 or 18 months from the start of induction treatment (10, 12), or else within 12 or 24 months from transplantation (6–9, 11, 14, 15).
Indeed, patients with ER have an inferior prognosis, as compared with patients who relapse later and, as such, represent a high-risk group and an unmet medical need (12, 14).
The correct identification of patient risk at baseline is the first step toward a risk-adapted therapeutic approach. The aim of our analysis was to develop and validate the Simplified Early Relapse in Multiple Myeloma (S-ERMM), a score that predicts the risk of ER on the basis of widely available clinical and biological features. The S-ERMM score was then remodulated during the patient's clinical course by integrating response to therapy. We also aimed to correlate the S-ERMM with the long-term outcomes of overall survival (OS) and progression-free survival-2 (PFS2).
Materials and Methods
Source of data and participants
Individual patient data from 2,190 patients with NDMM enrolled in seven multicenter European, open-label, phase II/III clinical trials evaluating novel agent-based therapies from October 2003 to March 2017 were pooled together and analyzed: NCT01093196, NCT01346787, NCT01857115, NCT01190787, NCT00551928, NCT01091831, and NCT02203643 (2, 3, 16–20). Each study was approved by ethics committees or institutional review boards at the respective study sites and was conducted in accordance with the Declaration of Helsinki; all patients provided written informed consent. All patients received new drugs (IMiD agents and/or PIs) as upfront treatment, with or without transplantation. Trial details, treatment schedules, and eligibility criteria are reported in Supplementary Tables S1A and S1B.
Prognostic factors and outcomes
Data were retrieved from electronic case-report forms. All the available individual baseline features were analyzed. Age, creatinine levels, albumin, β2-microglobulin (β2m), and monoclonal plasma cells in the bone marrow (BMPC) were evaluated as continuous features. According to the International Myeloma Working Group recommendations, the percentage of BMPCs considered was the highest in case of discrepancy between BM biopsy and BM aspirate (21).
Free light chain (FLC; λ vs. κ), M-component subtype (IgA vs. others), lactate dehydrogenase (LDH) levels >/≤ upper limit of normal (ULN), presence versus absence of plasmacytomas, and presence versus absence of chromosomal abnormalities (CA) detected by interphase fluorescence in situ hybridization [iFISH; del17p, t(4;14), t(14;16), t(11;14)] were evaluated as categorical variables. iFISH analysis was centralized in one laboratory (see the Supplementary Methods). High-risk CAs were defined as the presence of 17p deletion (del17p) and/or t(4;14) translocation and/or t(14;16) translocation (22). The cutoffs for del17p and IgH translocations were 10% and 15%, respectively. Baseline R-ISS stage (II/III vs. I) was also included in the prognostic factor evaluation (23). Patients with complete data were then split into training and validation sets. The validation set included patients treated with more innovative and effective therapies, who also had a shorter median follow-up.
On the basis of the available literature, two cutoffs for ER were evaluated: 18 (ER18) and 24 (ER24) months from diagnosis. In the ER18 analysis, patients who died of causes other than progressive disease (PD) or who withdrew consent within 18 months were excluded, because they were not at risk of progression for the entire first 18 months. Patients experiencing PD within 18 months from diagnosis formed the ER18 population; those not experiencing PD within 18 months formed the reference population. The reference population was further divided into two groups: patients who had experienced PD more than 18 months after diagnosis by the time of their last follow-up (Late Relapse group) and patients who were progression-free at their last follow-up (No PD group).
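The ER18 grouping rules above can be expressed as a small classification function. This is a minimal Python sketch, not the authors' code: the function and argument names are hypothetical, and treating PD at exactly the cutoff as "within" the window is an assumption. The `cutoff` parameter lets the same rule serve the ER24 analysis.

```python
from typing import Optional


def classify_er(pd_month: Optional[float], other_exit_month: Optional[float],
                cutoff: float = 18.0) -> str:
    """Assign one patient to an ER analysis group.

    pd_month: months from diagnosis to first PD (None if no PD observed).
    other_exit_month: months from diagnosis to death from a cause other than PD,
        or to consent withdrawal (None if neither occurred).
    """
    if pd_month is not None and pd_month <= cutoff:
        return "ER"            # PD within the early-relapse window
    if other_exit_month is not None and other_exit_month <= cutoff:
        return "excluded"      # not at risk of PD for the whole window
    if pd_month is not None:
        return "late relapse"  # reference population, PD after the cutoff
    return "no PD"             # reference population, progression-free at last follow-up
```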
Methods for the ER24 analysis were similar but used a cutoff of 24 months after diagnosis (see the Supplementary Appendix). The main text reports the results for the best cutoff; for completeness, the other analyses are included in the Supplementary Appendix.
OS was calculated from the start of treatment until the date of death or the date the patient was last known to be alive. PFS2 was calculated from the start of treatment until the date of PD after the second line of treatment (second PD) or death (regardless of the cause of death), whichever came first. Other clinical endpoints are detailed in the Supplementary Methods.
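As a hedged illustration, the OS and PFS2 definitions translate into (time, event) pairs for survival analysis as sketched below. The helper names are hypothetical, and converting days to months with 365.25/12 days per month is an assumption; the source does not state the conversion.

```python
from datetime import date
from typing import Optional, Tuple


def months_between(start: date, end: date) -> float:
    # Assumption: one month = 365.25 / 12 days.
    return (end - start).days / (365.25 / 12)


def os_endpoint(start: date, last_alive: date,
                death: Optional[date]) -> Tuple[float, int]:
    """OS: (months from start of treatment, 1 if died else 0 = censored)."""
    if death is not None:
        return months_between(start, death), 1
    return months_between(start, last_alive), 0


def pfs2_endpoint(start: date, last_contact: date, second_pd: Optional[date],
                  death: Optional[date]) -> Tuple[float, int]:
    """PFS2: second PD or death from any cause, whichever comes first."""
    events = [d for d in (second_pd, death) if d is not None]
    if events:
        return months_between(start, min(events)), 1
    return months_between(start, last_contact), 0
```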
Statistical analysis
In the training set, a univariate (UV) analysis with ER18 as the outcome was performed using chi-square and Kruskal–Wallis tests, as appropriate. Features with P < 0.1 were then tested in a multivariate (MV) logistic regression model. We compared two MV analyses, one including the R-ISS and the other including the individual features defining the R-ISS (LDH, albumin, β2m, and CAs). To account for potential confounding, each MV analysis was adjusted for age. Each MV model was then refined by backward selection, minimizing the Akaike information criterion, to identify independent prognostic factors. Continuous parameters were not categorized a priori, because this would have reduced the statistical power of the analysis. After the best MV model was selected, the optimal cutoffs for the most significant continuous features were re-evaluated using spline functions. MV models were used to estimate ORs for ER18 risk, with 95% confidence intervals (CI) and P values.
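The selection step (age-adjusted logistic model, backward elimination on the AIC) can be sketched in dependency-free Python. This is an illustration under stated assumptions, not the authors' R code: the model is fitted by Newton-Raphson (IRLS), AIC is computed as 2k − 2 log L, and the age adjustment is represented by a hypothetical `forced` argument naming terms that are never dropped.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def _prob(beta: List[float], row: List[float]) -> float:
    """Inverse logit of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))


def fit_logistic(X: List[List[float]], y: Sequence[int], iters: int = 100) -> List[float]:
    """Maximum-likelihood coefficients by Newton-Raphson (IRLS)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for row, yi in zip(X, y):
            mu = _prob(beta, row)
            for j in range(p):
                grad[j] += row[j] * (yi - mu)
                for k in range(p):
                    info[j][k] += mu * (1.0 - mu) * row[j] * row[k]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


def aic(cols: List[str], data: List[Dict[str, float]], y: Sequence[int]) -> float:
    """AIC = 2k - 2 log L for an intercept-plus-`cols` logistic model."""
    X = [[1.0] + [row[c] for c in cols] for row in data]
    beta = fit_logistic(X, y)
    ll = sum(math.log(_prob(beta, r)) if yi else math.log(1.0 - _prob(beta, r))
             for r, yi in zip(X, y))
    return 2 * len(beta) - 2 * ll


def backward_aic(cols: List[str], data: List[Dict[str, float]], y: Sequence[int],
                 forced: Sequence[str] = ()) -> Tuple[List[str], float]:
    """Drop one term at a time while that lowers the AIC; `forced` terms are kept."""
    current = list(cols)
    best = aic(current, data, y)
    while True:
        trials = [(aic([c for c in current if c != d], data, y), d)
                  for d in current if d not in forced]
        if not trials:
            break
        score, drop = min(trials)
        if score >= best:
            break
        best, current = score, [c for c in current if c != drop]
    return current, best
```

In R, which the authors used, a comparable selection is commonly run with `step()` on a `glm(..., family = binomial)` fit, using a `scope` whose lower formula contains age so that it is never removed.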
Each model was tested on the validation set by assessing the AUC, to select the more accurate of the two models (individual features vs. features aggregated into the R-ISS).
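The AUC used to compare the two models on the validation set is the probability that a randomly chosen patient with ER18 receives a higher predicted risk than a randomly chosen patient without it. For a binary endpoint such as ER18, this equals the concordance (C) index, the statistic used later to compare models. A minimal pairwise sketch (quadratic in sample size, which is adequate for a few hundred patients); the function name is hypothetical:

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random case outranks a random
    non-case; tied scores count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```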
Once the most accurate model was selected, three prognostic groups of patients with Low, Intermediate (Int), and High risk of ER were defined by categorizing the linear predictor of the final MV logistic model; two optimal cut-points were identified by maximizing the ORs defined by the MV model in the training set. A scalar score was then assigned to each predictor in proportion to its coefficient in the final MV model. As for the linear predictor, two optimal cut-points for this score were identified by maximizing the ORs defined by the MV model in both the training and validation sets. Thus, we developed the S-ERMM score, which identified three groups of patients with Low, Int, and High risks of ER18. Other statistical survival analyses are detailed in the Supplementary Methods.
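The text does not state how coefficients were scaled into points. A common approach, assumed here, multiplies every log-odds coefficient by one shared points-per-unit factor and rounds to the nearest integer; groups are then formed with two cut-points. The coefficients below are hypothetical, so this sketch illustrates the technique and does not reproduce the S-ERMM derivation.

```python
from typing import Dict


def assign_points(coefs: Dict[str, float], points_per_unit: float) -> Dict[str, int]:
    """Scale each log-odds coefficient by one common factor and round to an integer."""
    return {name: round(beta * points_per_unit) for name, beta in coefs.items()}


def categorize(value: float, low_max: float, int_max: float) -> str:
    """Three groups from two cut-points: Low <= low_max < Int <= int_max < High."""
    if value <= low_max:
        return "Low"
    if value <= int_max:
        return "Int"
    return "High"


# Hypothetical coefficients, for illustration only (not the S-ERMM model).
example = assign_points({"A": 1.2, "B": 0.7, "C": 0.45}, points_per_unit=4.0)
```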
To integrate baseline prognostic evaluation and response to treatment, we developed the Dynamic S-ERMM (DS-ERMM), a logistic model including the S-ERMM score and the achievement of at least a very good partial response (≥VGPR). Because this score includes response, it should be assessed not at baseline but at a later timepoint during treatment, to remodulate patient risk (dynamic risk score). We therefore analyzed data from a landmark point, set at the median time to achieve ≥VGPR, and included only patients who had not relapsed before the landmark point. We assessed ≥VGPR and the S-ERMM in a MV logistic regression model to predict ER18. The DS-ERMM was modeled on the proportional coefficients obtained from the MV model. To measure prognostic performance in this subcohort, we compared the concordance (C)-index of both models (24).
The statistical analysis was performed using R (v.3.5.2). We followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis Or Diagnosis (TRIPOD) criteria in reporting our methods (25, 26).
Data sharing
After publication of this article, the data collected for this analysis and related documents will be made available to others upon reasonably justified request, which must be written and addressed to the corresponding author, Francesca Gay, at the following e-mail address: fgay[at]cittadellasalute.to.it. The corresponding author is responsible for evaluating and accepting or refusing each request to disclose data and related documents, in compliance with the ethical approval conditions and with applicable laws and regulations, and in conformance with the agreements in place with the involved subjects, the participating institutions, and all other parties directly or indirectly involved in the participation, conduct, development, management, and evaluation of this analysis.
Results
Patient characteristics
Data from 2,190 patients were available; 3 patients were excluded because of screening failure.
In the ER18 analysis, 50 patients who died of causes other than PD and 51 who withdrew consent within 18 months were excluded, leaving 2,086 patients eligible for the analyses. Patients with complete data (n = 1,218) were then split into training (n = 844) and validation (n = 374) sets and included in the logistic regression analysis.
Training set: in the overall population [median follow-up 70 months, interquartile range (IQR) = 48–81 months], the median age was 66 years, 73% of patients presented with R-ISS stage II/III, 10% with LDH>ULN, 14% with del17p, and 14% with t(4;14). Twenty-nine percent of patients had BMPCs > 60% and 36% had λ FLC. A total of 312 of 844 (37%) patients experienced ER18. Compared with the reference population, patients in the ER18 population were significantly older (P = 0.026) and had higher β2m (P < 0.001) and lower albumin (P < 0.001) levels; a higher proportion of them had LDH>ULN (P = 0.001), t(4;14) (P < 0.001), R-ISS stage II/III (P < 0.001), del17p (P = 0.005), and BMPCs > 60% (P = 0.001; Table 1).
Feature | Pooled set: overall | Training setᵃ: overall | Training setᵃ: ER18 | Training setᵃ: reference | Training setᵃ: P | Validation setᵃ: overall | Validation setᵃ: ER18 | Validation setᵃ: reference | Validation setᵃ: P
---|---|---|---|---|---|---|---|---|---
No. of patients (%) | 2,190 | 844 | 312 (37) | 532 (63) |  | 374 | 61 (16) | 313 (84) |
Age, y: median (IQR) | 63.0 (56.0–72.0) | 66.0 (57.0–73.0) | 68.0 (58.0–75.0) | 65.0 (57.0–73.0) | 0.026 | 57.0 (51.0–62.0) | 56.0 (48.0–62.0) | 58.0 (52.0–62.0) | 0.173
Missing, N (%) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
Albumin, g/dL: median (IQR) | 3.8 (3.4–4.2) | 3.8 (3.4–4.2) | 3.7 (3.2–4.1) | 3.9 (3.5–4.2) | <0.001 | 3.9 (3.5–4.3) | 3.7 (3.4–4.1) | 3.9 (3.5–4.3) | 0.046
Missing, N (%) | 10 (0.5) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
β2m, mg/L: median (IQR) | 3.4 (2.4–5.1) | 3.6 (2.5–5.4) | 4.1 (2.8–6.1) | 3.3 (2.4–5.0) | <0.001 | 2.9 (2.0–4.2) | 3.7 (2.2–5.8) | 2.8 (2.0–4.0) | 0.015
Missing, N (%) | 8 (0.4) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
LDH>ULN, N (%) | 211 (9.6) | 83 (10) | 45 (14) | 38 (7) | 0.001 | 54 (14) | 17 (28) | 37 (12) | 0.002
Missing, N (%) | 267 (12.2) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
del17p, N (%) | 246 (11.2) | 121 (14) | 59 (19) | 62 (12) | 0.005 | 52 (14) | 11 (18) | 41 (13) | 0.414
Missing, N (%) | 477 (21.8) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
t(4;14), N (%) | 228 (10.4) | 117 (14) | 63 (20) | 54 (10) | <0.001 | 57 (15) | 17 (28) | 40 (13) | 0.005
Missing, N (%) | 482 (22.0) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
t(11;14), N (%) | 341 (15.5) | 156 (18) | 50 (16) | 106 (20) | 0.188 | 87 (23) | 14 (23) | 73 (23) | 1
Missing, N (%) | 522 (23.8) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
t(14;16), N (%) | 78 (3.6) | 32 (4) | 16 (5) | 16 (3) | 0.171 | 19 (5) | 4 (7) | 15 (5) | 0.798
Missing, N (%) | 506 (23.1) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
R-ISS II/III, N (%) | 1,388 (63.5) | 613 (73) | 255 (82) | 358 (67) | <0.001 | 250 (67) | 53 (87) | 197 (63) | <0.001
Missing, N (%) | 388 (17.7) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
Creatinine, mg/dL: median (IQR) | 0.9 (0.7–1.1) | 0.9 (0.8–1.2) | 1.0 (0.8–1.2) | 0.9 (0.8–1.1) | 0.136 | 0.8 (0.7–1.0) | 0.8 (0.7–1.2) | 0.8 (0.7–1.0) | 0.427
Missing, N (%) | 67 (3.1) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
BMPCs >60%, N (%) | 612 (28) | 243 (29) | 112 (36) | 131 (25) | 0.001 | 139 (37) | 27 (44) | 112 (36) | 0.267
Missing, N (%) | 121 (5.5) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
FLC λ, N (%) | 727 (36) | 301 (36) | 123 (39) | 178 (33) | 0.095 | 143 (38) | 18 (30) | 125 (40) | 0.165
Missing, N (%) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
M component IgA, N (%) | 451 (21) | 186 (22) | 65 (21) | 121 (23) | 0.575 | 57 (15) | 10 (16) | 47 (15) | 0.937
Missing, N (%) | 3 (0.1) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
Plasmacytomas, N (%) | 266 (12) | 78 (9) | 27 (9) | 51 (10) | 0.743 | 49 (13) | 10 (16) | 39 (12) | 0.532
Missing, N (%) | 0 (0) | 0 (0) | 0 (0) | 0 (0) |  | 0 (0) | 0 (0) | 0 (0) |
Abbreviations: BMPCs, bone marrow plasma cells; β2m, β2-microglobulin; del17p, 17p deletion; ER18, early relapse within 18 months from diagnosis; FLC, free light chains; IQR, interquartile range; LDH, lactate dehydrogenase; M, monoclonal; N, number; P, P value; R-ISS, Revised International Staging System stage; t, translocation; ULN, upper limit of normal; y, years.
ᵃPatients with complete data only.
Validation set: in the overall population (median follow-up 35 months; IQR, 29–41), the median age was 57 years, significantly lower (P < 0.001) than in the training set. Sixty-one of 374 patients (16%) experienced ER18. The distribution of baseline features between the ER18 and reference populations was similar to that in the training set, except that the differences in the proportions of patients with BMPCs > 60% and del17p were not significant, possibly because of the smaller sample size (Table 1).
The median time to ≥VGPR was 9 months in the training set and 3 months in the validation set. ≥VGPR at 9 months was achieved in 40% and 81% of patients in the training and validation sets, respectively.
Best model of ER
On the basis of the UV analysis of ER18, 10 of 14 features were included in the MV analysis: age, FLC, BMPCs, del17p, t(4;14), t(14;16), albumin, β2m, LDH, and R-ISS stage. In the MV analysis incorporating the R-ISS, R-ISS II/III versus I and increasing BMPCs increased the risk of ER18, whereas age, included for adjustment, was not significant (P = 0.113). When the MV analysis included the single features defining the R-ISS, increasing BMPCs, lower albumin, LDH>ULN, del17p, and t(4;14) increased the probability of ER18, and λ FLC showed a borderline association (P = 0.076; Table 2).
Feature (outcome: ER18ᵃ) | R-ISS analysis: UV P | R-ISS analysis: MV OR (95% CI) | R-ISS analysis: MV P | Single-feature analysis: UV P | Single-feature analysis: MV OR (95% CI) | Single-feature analysis: MV P
---|---|---|---|---|---|---
Age (increased by 1 y) | 0.026 | 1.01 (1.00–1.02) | 0.113 | 0.026 | 1.01 (1.00–1.03) | 0.094
Albumin (increased by 1 g/dL) |  |  |  | <0.001 | 0.75 (0.60–0.95) | 0.015
β2m (increased by 1 mg/L) |  |  |  | <0.001 | Excludedᵇ | Excludedᵇ
LDH (> vs. ≤ ULN) |  |  |  | 0.001 | 2.03 (1.27–3.26) | 0.003
del17p (presence vs. no) |  |  |  | 0.005 | 1.65 (1.10–2.47) | 0.016
t(4;14) (presence vs. no) |  |  |  | <0.001 | 2.12 (1.40–3.19) | <0.001
R-ISS (II/III vs. I) | <0.001 | 1.91 (1.35–2.71) | <0.001 |  |  |
BMPCs % (increased by 5%) | <0.001 | 1.05 (1.02–1.08) | <0.001 | <0.001 | 1.06 (1.03–1.09) | <0.001
FLC (λ vs. κ) | 0.095 | Excludedᵇ | Excludedᵇ | 0.095 | 1.31 (0.97–1.78) | 0.076
Abbreviations: BMPCs, bone marrow plasma cells; β2m, β2-microglobulin; CI, confidence interval; del17p, 17p deletion; ER18, early relapse within 18 months from diagnosis; FLC, free light chains; LDH, lactate dehydrogenase; M, monoclonal; MV, multivariate; OR, odds ratio; P, P value; R-ISS, Revised International Staging System stage; t, translocation; ULN, upper limit of normal; UV, univariate; y, years.
ᵃOnly significant features (P < 0.1) in the UV analysis and age were included.
ᵇExcluded before the MV analysis by the Akaike information criterion.
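The ORs, 95% CIs, and P values in Table 2 are presumably Wald-type quantities derived from the logistic coefficients; the text does not state the CI method, so this is an assumption. The sketch below shows that conversion and its inverse, which can recover an approximate coefficient and standard error from a reported row. Function names are hypothetical.

```python
import math
from typing import Tuple


def wald_or(beta: float, se: float, z: float = 1.959964) -> Tuple[float, float, float, float]:
    """OR, lower and upper 95% CI bounds, and two-sided Wald P from a logistic coefficient."""
    lo, hi = math.exp(beta - z * se), math.exp(beta + z * se)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # 2 * (1 - Phi(|beta / se|))
    return math.exp(beta), lo, hi, p


def coef_from_or_ci(or_: float, lo: float, hi: float, z: float = 1.959964) -> Tuple[float, float]:
    """Back-calculate the coefficient and its SE from a reported OR (95% CI)."""
    return math.log(or_), (math.log(hi) - math.log(lo)) / (2.0 * z)
```

For example, the t(4;14) row, 2.12 (1.40–3.19), corresponds to a coefficient of about 0.75 with a standard error of about 0.21; rounding in the table limits the precision of such back-calculation.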
Each MV model was then tested on the validation set. The AUC was 0.62 (95% CI = 0.55–0.69) for the ER18 model including the R-ISS and 0.66 (95% CI = 0.58–0.73) for the ER18 model incorporating individual features. The model incorporating individual features had the higher AUC and was therefore selected to develop the S-ERMM score.
UV and MV ER24 analyses and the AUC in the validation set are reported in the Supplementary Results and in Supplementary Table S3.
S-ERMM score
The ER18 linear index was calculated as 0.047 × (BMPCs %/5) + 0.589 × LDH>ULN + 0.459 × del17p + 0.705 × t(4;14) + 0.293 × λ FLC − 0.284 × albumin, where LDH>ULN, del17p, t(4;14), and λ FLC each take the value 1 if present and 0 otherwise.
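A direct transcription of the linear index, useful for checking hand calculations. It assumes, as the binary LDH variable in Table 2 suggests, that the LDH term is an indicator of LDH > ULN, and that albumin is in g/dL. The index has no intercept, so only differences between patients are meaningful. The function name is hypothetical.

```python
def er18_linear_index(bmpc_pct: float, ldh_above_uln: bool, del17p: bool,
                      t4_14: bool, lambda_flc: bool, albumin_g_dl: float) -> float:
    """ER18 linear index with the coefficients reported in the text.

    Assumptions: LDH enters as an indicator (1 if LDH > ULN, else 0);
    albumin is in g/dL. Booleans act as 0/1 in the arithmetic.
    """
    return (0.047 * bmpc_pct / 5
            + 0.589 * ldh_above_uln
            + 0.459 * del17p
            + 0.705 * t4_14
            + 0.293 * lambda_flc
            - 0.284 * albumin_g_dl)
```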
In the training set, the linear index discriminated three patient groups (High, Int, and Low risk) with significantly different risks of ER18 (Fig. 1).
BMPC and albumin levels were dichotomized according to the optimal cutoffs: high BMPC level if >60% and abnormal albumin level if ≤3.5 or ≥5 g/dL (Supplementary Fig. S1).
The S-ERMM score (https://sermm.emnitaly.org/) was mathematically consistent with the linear index and included the six features identified in the MV analysis: 5 points each for LDH>ULN and the presence of t(4;14); 3 points each for the presence of del17p, abnormal albumin, and BMPCs > 60%; and 2 points for the presence of λ FLC (Fig. 1).
The Low-risk group included patients with a total score ≤5 (68% of patients in the training set, 29% of whom experienced ER18); the Int-risk group included patients with a total score between 6 and 10 (25% of patients in the training set, 50% of whom experienced ER18); and the High-risk group included patients with a total score ≥11 (7% of patients in the training set, 70% of whom experienced ER18). In the training set, the S-ERMM discriminated three groups of patients with significantly different risks of ER18: Int versus Low (OR = 2.39, 95% CI = 1.73–3.30, P < 0.001) and High versus Low (OR = 5.59, 95% CI = 3.08–10.16, P < 0.001). The S-ERMM was confirmed in the validation set: Int versus Low (OR = 2.27, 95% CI = 1.23–4.17, P = 0.008) and High versus Low (OR = 4.87, 95% CI = 2.01–11.76, P = 0.001). The impact of the S-ERMM on ER18 risk was greater than that of any single feature.
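The point scheme and group thresholds can be written as a small calculator. This is a hypothetical sketch that follows the rules stated above (albumin assumed in g/dL); it is not the official calculator at https://sermm.emnitaly.org/.

```python
def s_ermm(ldh_above_uln: bool, t4_14: bool, del17p: bool,
           albumin_g_dl: float, bmpc_pct: float, lambda_flc: bool) -> int:
    """S-ERMM total points from the six baseline features (booleans count as 0/1)."""
    abnormal_albumin = albumin_g_dl <= 3.5 or albumin_g_dl >= 5.0
    return (5 * ldh_above_uln + 5 * t4_14
            + 3 * del17p + 3 * abnormal_albumin + 3 * (bmpc_pct > 60)
            + 2 * lambda_flc)


def s_ermm_group(score: int) -> str:
    """Low <= 5, Int 6-10, High >= 11."""
    if score <= 5:
        return "Low"
    if score <= 10:
        return "Int"
    return "High"
```

The maximum possible score is 21 (all six features present).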
In the DS-ERMM analyses, the training population (n = 673) included patients evaluable for response at 9 months (median time to ≥VGPR), 162 (24%) of whom experienced ER18. In this population, both the S-ERMM and the achievement of ≥VGPR were statistically independent predictors of ER in the MV analysis (Supplementary Fig. S2).
The DS-ERMM score was defined as the baseline S-ERMM score minus 4 points if ≥VGPR was achieved. Patients who reached the 9-month landmark (171 patients did not, 150 of whom had relapsed or died before it) were reclassified into three groups: the Low-risk group included 250 patients (37%) with a total score ≤0 (only 12% of whom experienced ER18); the Int-risk group included 271 patients (40%) with a total score between 1 and 5 (only 24% of whom experienced ER18); and the High-risk group included 152 patients (23%) with a total score ≥6 (45% of whom experienced ER18). These three groups had different risks of ER18 (Supplementary Fig. S3): Int versus Low (OR = 2.36, 95% CI = 1.46–3.80, P < 0.001) and High versus Low (OR = 6.34, 95% CI = 3.84–10.46, P < 0.001).
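Likewise, the DS-ERMM reclassification at the 9-month landmark reduces to subtracting 4 points for ≥VGPR and applying new thresholds. In this sketch, landmark eligibility is simplified to "no relapse or death up to 9 months" (an assumption); the remaining patients who did not reach the landmark for other reasons are not modeled. Function names are hypothetical.

```python
from typing import Optional


def ds_ermm(s_ermm_score: int, vgpr_or_better: bool) -> int:
    """Baseline S-ERMM points minus 4 if at least a VGPR was achieved."""
    return s_ermm_score - 4 * vgpr_or_better


def ds_ermm_group(score: int) -> str:
    """DS-ERMM groups: Low <= 0, Int 1-5, High >= 6."""
    if score <= 0:
        return "Low"
    if score <= 5:
        return "Int"
    return "High"


def landmark_eligible(relapse_or_death_month: Optional[float], landmark: float = 9.0) -> bool:
    """Simplified eligibility: no relapse or death up to the landmark."""
    return relapse_or_death_month is None or relapse_or_death_month > landmark
```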
In the validation set, there were no significant differences in terms of risk of ER18 between DS-ERMM Int versus Low, while a trend toward a higher risk was observed in the High versus Low comparison (OR = 2.40, P = 0.09; see the Supplementary Results).
After the S-ERMM score was applied at baseline and patient risk was remodulated at 9 months according to the DS-ERMM (for patients who did not relapse during the first 9 months), 20% of patients in the overall training population were classified as High risk, 39% as Int risk, and 41% as Low risk.
Survival analysis
A landmark analysis was performed with the landmark point at 18 months. OS and PFS2 were significantly shorter in the ER18 population than in the reference population and in the Late Relapse and No PD groups (Supplementary Figs. S4A–S4D). Similarly, ER18 patients showed an inferior outcome after relapse (Supplementary Results; Supplementary Fig. S5).
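A landmark analysis conditions on survival to the landmark, so that ER18 status is known for every patient included. The sketch below builds an 18-month landmark cohort; excluding patients censored before the landmark and measuring residual time from the landmark are common conventions assumed here, not details stated in the source. The dictionary keys are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def landmark_cohort(patients: List[Dict[str, Any]],
                    landmark: float = 18.0) -> List[Tuple[str, float, int]]:
    """Return (group, months after landmark, death flag) for patients still
    alive and under follow-up at the landmark; all others are excluded."""
    cohort = []
    for p in patients:
        if p["os_months"] <= landmark:
            continue  # died or was censored before the landmark
        pd_month = p["pd_month"]
        group = "ER18" if pd_month is not None and pd_month <= landmark else "reference"
        cohort.append((group, p["os_months"] - landmark, int(p["died"])))
    return cohort
```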
The median OS was 31.5 months in patients with S-ERMM High, 59.5 months with S-ERMM Int, and not reached (NR) with S-ERMM Low. Median PFS2 was 19.8 months in patients with S-ERMM High, 40.0 months with S-ERMM Int, and 62.3 months with S-ERMM Low. OS and PFS2 were significantly shorter in S-ERMM Int versus S-ERMM Low patients (OS, HR = 1.86, 95% CI = 1.48–2.33; PFS2, HR = 1.76, 95% CI = 1.45–2.14; both P < 0.001) and in S-ERMM High versus S-ERMM Int patients (OS, HR = 1.74, 95% CI = 1.22–2.50, P = 0.002; PFS2, HR = 1.64, 95% CI = 1.18–2.28, P = 0.003; Fig. 2). The median PFS was 31.6 months in S-ERMM Low, 17.3 months in S-ERMM Int, and 13.2 months in S-ERMM High patients.
Subgroup analyses for OS according to first-line treatment confirmed the prognostic role of the S-ERMM in ASCT-ineligible patients (Int vs. Low, HR = 1.75, 95% CI = 1.30–2.35, P < 0.001; High vs. Int, HR = 1.85, 95% CI = 1.10–3.11, P = 0.020); in ASCT-eligible patients (Int vs. Low, HR = 1.94, 95% CI = 1.36–2.78, P < 0.001; High vs. Int, HR = 1.81, 95% CI = 1.09–3.01, P = 0.022; Supplementary Fig. S6); in patients treated with PIs (Int vs. Low, HR = 1.73, 95% CI = 0.93–3.20, P = 0.083; High vs. Int, HR = 3.13, 95% CI = 1.21–8.08, P = 0.018); and in patients treated with IMiD agents (Int vs. Low, HR = 1.85, 95% CI = 1.45–2.37, P < 0.001; High vs. Int, HR = 1.64, 95% CI = 1.11–2.43, P = 0.013).
According to the DS-ERMM, OS and PFS2 were significantly shorter in DS-ERMM Int versus DS-ERMM Low patients (OS, HR = 1.96, 95% CI = 1.44–2.66; PFS2, HR = 1.86, 95% CI = 1.44–2.38; both P < 0.001) and in DS-ERMM High versus DS-ERMM Low patients (OS, HR = 3.28, 95% CI = 2.37–4.54, P < 0.001; PFS2, HR = 2.91, 95% CI = 2.22–3.82; P < 0.001; Fig. 3).
Discussion
Several studies reported dismal survival outcomes in patients with MM experiencing an ER; however, the definition of ER varies from study to study, and consensus is still lacking. Also, the impact of well-known disease-related risk factors (e.g., albumin, β2m, CAs by iFISH, and LDH) on the risk of ER has not been thoroughly assessed in patients with NDMM. The correct evaluation of baseline ER risk thus remains an unmet medical need.
We confirmed ER with an 18-month cutoff as a strong predictor of long-term outcomes (OS and PFS2) in the context of novel treatment approaches (e.g., PIs, IMiD agents, and ASCT). Our decision to adopt the 18-month cutoff for the definition of ER was also supported by the available literature, in which most studies defined ER as both 18 months from diagnosis and 12 months from ASCT (6–10, 12, 27).
We developed the S-ERMM score by identifying and integrating six features that predicted the risk of ER [presence of t(4;14), del17p, LDH>ULN, BMPCs > 60%, abnormal albumin, and λ FLC]. The S-ERMM is a simple tool enabling the identification of three patient groups with significantly different risks of ER (Fig. 2) and significantly different PFS2 and OS. In particular, S-ERMM High patients had a median OS of 31 months, significantly shorter than that of S-ERMM Int (median, 60 months) and S-ERMM Low patients (median NR after 6 years of follow-up).
Although several studies tried to correlate the ER risk with clinical and biological features (6–9, 11, 12, 14), most of them did not cover all of the recognized MM prognostic factors [ref. 23; e.g., extensive CA analysis (8, 9, 14) and LDH assessment (10)]. In our series, we included widely available and well-recognized baseline features. The S-ERMM score included albumin levels (which reflect the inflammatory state at diagnosis; ref. 28), high-risk CAs (del17p and t(4;14), associated with a biologically aggressive disease), and high LDH (29) and BMPC levels (associated with tumor burden; ref. 30).
The majority of analyses published so far on ER in MM are single-center or retrospective studies. To the best of our knowledge, only Bygrave and colleagues analyzed young ASCT-eligible patients enrolled in a single clinical trial (7). In our analysis, by contrast, all data came from clinical trials and underwent systematic assessment, including baseline feature assessment, centralized laboratory analyses, and uniform evaluation of response and clinical outcomes (31). In this light, the development and validation of the S-ERMM score in a population consisting of both young (transplant-eligible) and elderly (>65 years) patients enrolled in four phase II/III clinical trials and treated with novel agents from different drug classes, with or without ASCT, support the application of this score to patients with NDMM. However, this is a selected population of patients enrolled in European clinical trials, and the score needs validation in real-life settings.
Response to therapy is a strong predictor of better OS and PFS2 (15), and the achievement of a deep response [minimal residual disease (MRD) negativity] may abrogate the poor prognosis conferred by high-risk FISH at diagnosis. To integrate static (baseline) and dynamic (response) prognostic features, we therefore incorporated response to treatment (≥VGPR) into the S-ERMM score. Assessing the S-ERMM score at diagnosis and remodulating patient risk at 9 months (in patients who had not relapsed during the first 9 months) improved our ability to detect patients with ER. In the initial population of our analysis, only 7% of patients were classified in the high-risk group (S-ERMM High), while 68% were classified in the S-ERMM Low group (29% of whom had an ER18) and 25% in the S-ERMM Int group (50% of whom had an ER18). In the DS-ERMM analysis, by contrast, the Low-risk group included 37% of patients (only 12% of whom had an ER18) and the Int-risk group 40% of patients (only 24% of whom had an ER18). Ultimately, when these two scores were used sequentially in the overall population, 20% of patients were classified as High risk, 39% as Int risk, and 41% as Low risk. This improved assessment of ER risk highlights the value of dynamically remodulating the risk estimated at baseline.
Unfortunately, in the validation set, there was no significant difference in the risk of ER18 between DS-ERMM Int and Low patients, whereas a trend toward higher risk was observed in the High versus Low comparison.
Of note, the optimal response (degree and timing) to be incorporated as a dynamic factor should take into account the type of patient population and the available treatment options: these two factors determine the choice of a specific therapy, with different degrees of efficacy and time to best response. In the validation set, the rate of ≥VGPR was considerably higher than in the training set, and the median time to response was shorter. We hypothesize that the assessment of a deeper response, such as the achievement of MRD negativity, could better discriminate patients in the context of novel, highly effective therapies. Unfortunately, MRD evaluation was not available in most of the trials included in the training set and therefore could not be used as the optimal response to recalculate the risk of ER after therapy. Still, our main aim was to identify patients at risk of ER by using risk assessment at diagnosis, and the S-ERMM score was prognostic in the context of both older (training set) and more recent (validation set) drug regimens.
Our analysis has some limitations. First, the risk classification based on the S-ERMM score was designed to better identify patients at high risk of ER and, as a consequence, was unbalanced, with only a small proportion of patients in the S-ERMM High group. Nevertheless, the distribution of patients across risk groups became more balanced after risk remodulation with the DS-ERMM score.
Another limitation was the low number of patients in the training set who were treated upfront with a combination of PIs and IMiD agents, which currently represents a standard of care for both young and elderly patients. Nevertheless, our results were validated in a population that received intensive and effective induction and consolidation therapies, including the second-generation PI carfilzomib with or without IMiD agents and ASCT intensification. In this context, the S-ERMM maintained its prognostic role, but the percentage of patients experiencing ER was considerably lower than in the training set.
In conclusion, by assessing the S-ERMM score at baseline and remodulating patient risk at 9 months with the DS-ERMM score, we correctly classified a substantial proportion of patients who experienced ER. External validation of the S-ERMM and DS-ERMM scores is warranted, especially in patients treated with combinations of PIs, IMiD agents, and anti-CD38 mAbs. Our ability to predict ER could also be improved by including other baseline risk features with known prognostic impact, such as amp(1q21), TP53 mutational status, and circulating plasma cells (27, 32, 33). Unfortunately, these data were not available for this analysis.
The correct identification of patient risk at diagnosis and during therapy is an essential step toward a risk-adapted approach, the cure of patients, and the prevention of overtreatment and undertreatment.
Authors’ Disclosures
M.T. Petrucci reports personal fees from Celgene, Janssen-Cilag, BMS, Amgen, Takeda, Sanofi, and Karyopharm outside the submitted work. M. Offidani reports grants and personal fees from Amgen and Celgene outside the submitted work. A.M. Liberati reports grants and other support from Bristol-Myers Squibb, Takeda, Roche, Celgene, and AbbVie; other support from Sanofi, Novartis, Iqvia, Verastem, and Servier; and grants from Novartis, Janssen, Amgen, Incyte, Beigene, Oncopeptides, Karyopharm, Archigen, Debiopharm, Morphosys, Fibrogen, and Onconova Therapeutics, Inc. outside the submitted work. R. Mina reports personal fees from Sanofi, Celgene, Takeda, and Janssen outside the submitted work. A. Belotti reports other support from Amgen, Janssen, and Celgene outside the submitted work. G. Gaidano reports personal fees from AbbVie, Janssen, and AstraZeneca outside the submitted work. R. Hájek reports personal fees from Janssen, Abbvie, BMS, and Pharma Mar, as well as grants and personal fees from Amgen, Celgene, Takeda, and Novartis outside the submitted work. P. Sonneveld has served on the advisory boards for Amgen, Celgene, Genenta, Janssen, Seattle Genetics, Takeda, and Karyopharm. P. Musto reports personal fees from Celgene, Janssen-Cilag, Takeda, Novartis, Bristol-Myers Squibb, AbbVie, Amgen, Jazz, Gilead, Sanofi, and GlaxoSmithKline outside the submitted work. M. Boccadoro reports personal fees and other support from Sanofi, Celgene, Amgen, Janssen, Novartis, and Bristol-Myers Squibb; personal fees from AbbVie and GSK; and other support from Mundipharma outside the submitted work. F. Gay reports personal fees from Amgen, Celgene, Janssen, Takeda, Bristol-Myers Squibb, AbbVie, GSK, Roche, Adaptive Biotechnologies, Oncopeptides, and Bluebird bio outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
G.M. Zaccaria: Conceptualization, resources, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. L. Bertamini: Conceptualization, resources, data curation, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. M.T. Petrucci: Resources, data curation, validation, investigation, writing–review and editing. M. Offidani: Resources, data curation, validation, investigation, writing–review and editing. P. Corradini: Resources, data curation, validation, investigation, writing–review and editing. A. Capra: Resources, data curation, software, formal analysis, validation, investigation, visualization, writing–original draft, writing–review and editing. A. Romano: Resources, data curation, validation, investigation, writing–review and editing. A.M. Liberati: Resources, data curation, validation, investigation, writing–review and editing. D. Mannina: Resources, data curation, validation, investigation, writing–review and editing. P. de Fabritiis: Resources, data curation, validation, investigation, writing–review and editing. N. Cascavilla: Resources, data curation, validation, investigation, writing–review and editing. M. Ruggeri: Resources, data curation, validation, investigation, writing–review and editing. R. Mina: Resources, data curation, validation, investigation, writing–original draft, writing–review and editing. F. Patriarca: Resources, data curation, validation, investigation, writing–review and editing. G. Benevolo: Resources, data curation, validation, investigation, writing–review and editing. A. Belotti: Resources, data curation, validation, investigation, writing–review and editing. G. Gaidano: Resources, data curation, validation, investigation, writing–review and editing. A. Nagler: Resources, data curation, validation, investigation, writing–review and editing. R. Hájek: Resources, data curation, validation, investigation, writing–review and editing. A. Spencer: Resources, data curation, validation, investigation, writing–review and editing. P. Sonneveld: Resources, data curation, validation, investigation, writing–review and editing. P. Musto: Resources, data curation, validation, investigation, writing–review and editing. M. Boccadoro: Conceptualization, resources, data curation, supervision, validation, investigation, writing–review and editing. F. Gay: Conceptualization, resources, data curation, formal analysis, supervision, validation, investigation, methodology, writing–original draft, writing–review and editing.
Acknowledgments
G.M. Zaccaria acknowledges the support of European Myeloma Network (EMN) Research Italy. The authors wish to thank all the study participants and referring clinicians for their valuable contributions.
No funding was provided for this contribution.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.