Abstract
Proteasome inhibitors are widely used in treating multiple myeloma, but can cause serious side effects and response varies among patients. It is, therefore, important to gain more insight into which patients will benefit from proteasome inhibitors.
We introduce simulated treatment learned signatures (STLsig), a machine learning method to identify predictive gene expression signatures. STLsig uses genetically similar patients who have received an alternative treatment to model which patients will benefit more from proteasome inhibitors than from an alternative treatment. STLsig constructs gene networks by linking genes that are synergistic in their ability to predict benefit.
In a dataset of 910 patients with multiple myeloma, STLsig identified two gene networks that together can predict benefit to the proteasome inhibitor, bortezomib. In class “benefit,” we found an HR of 0.47 (P = 0.04) in favor of bortezomib, while in class “no benefit,” the HR was 0.91 (P = 0.68). Importantly, we observed a similar performance (HR class benefit, 0.46; P = 0.04) in an independent patient cohort. Moreover, this signature also predicts benefit for the proteasome inhibitor, carfilzomib, indicating it is not specific to bortezomib. No equivalent signature can be found when the genes in the signature are excluded from the analysis, indicating that they are essential. Multiple genes in the signature are linked to working mechanisms of proteasome inhibitors or multiple myeloma disease progression.
STLsig can identify gene signatures that could aid in treatment decisions for patients with multiple myeloma and provide insight into the biological mechanism behind treatment benefit.
Proteasome inhibitors play an important part in the treatment of multiple myeloma, but response varies widely among patients. As proteasome inhibitors are associated with serious side effects and alternative treatments are available, there is a clinical need for a tool that can aid in treatment decision-making at the moment of diagnosis. Here, we present a gene expression signature that can identify patients that will benefit more from a proteasome inhibitor than an alternative treatment. We validated the signature in an independent dataset representing current clinical practice in multiple myeloma. This signature could help optimize treatment selection at the moment of diagnosis.
Introduction
For many anticancer drugs, the response varies widely across patients. As many of these drugs are associated with serious side effects, it is essential to identify which drug will maximally benefit the patient. Tools that aid in such decisions, for example, based on patient-derived genetic or transcriptomic profiles, have only been developed for a few treatments and diseases. Most efforts in this direction focus on detecting specific mutations for which it is known that a targeted therapy exists (1). However, many patients do not carry any mutations that are known to be actionable, and in practice only 7% of patients can be matched to a targeted therapy with the highest level of evidence (2). Moreover, a range of efficacious therapies exist that are nontargeted. Consequently, there is a clear clinical utility for methods that can more generically predict, at the time of diagnosis, if a patient will benefit from a certain treatment or not.
Multiple myeloma is characterized by a malignant proliferation of plasma cells, both in the bone marrow and extramedullary sites. Multiple myeloma is considered incurable with a median survival of approximately 6 years (3). Several driver mutations have been identified in multiple myeloma (4), but in most patients, no actionable mutations are observed and targeted therapies are, therefore, not commonly used in multiple myeloma. Currently, proteasome inhibitors are one of the most important components of treatment in multiple myeloma and since their introduction in the clinic, survival has significantly improved (5). Because of higher immunoglobulin production, multiple myeloma cells are thought to be more reliant on proteasomal degradation of proteins, making them vulnerable to proteasome inhibition (6). After bortezomib, which was the first proteasome inhibitor to be introduced in the clinic for multiple myeloma, second-generation proteasome inhibitors, like carfilzomib and ixazomib, have recently been approved.
Despite the success of proteasome inhibitors, there is still wide variability in response across patients. Substantial efforts have been made to discover what distinguishes responders from nonresponders. For instance, several studies have implicated differential expression of genes involved in the unfolded protein response (7). Other studies describe complex changes in the entire energy metabolism as a potential discriminating factor (8). Several chromosomal aberrations have also been found to influence bortezomib response, although this effect is not fully understood (9, 10). Despite all the efforts, there is currently no biomarker capable of determining which patients will benefit most from receiving a proteasome inhibitor.
Most studies investigating proteasome inhibitor response compare gene expression patterns of patients responding well or poorly to a certain treatment (6, 11–13). The identified genes can then be combined in a classifier to predict good or poor response in new patients. However, a clinically more interesting question is whether a patient will benefit more from a proteasome inhibitor than from another treatment. This is a markedly different question than identifying good and poor responders within one homogeneous treatment group. After all, even patients with poor survival may have benefited from their treatment; their outcome could have been even worse on another treatment. Conversely, a patient with a good survival outcome could have experienced an equivalently good or better response on another treatment. It is, therefore, impossible to assign patients to class “benefit” or “no benefit” a priori, because response to another treatment cannot be observed. Standard methods, which rely on the existence of such class labels, are thus unsuitable for predicting treatment benefit.
Here, we propose a novel method, simulated treatment learning signatures (STLsig), to infer gene signatures that can predict treatment benefit for patients at the moment of diagnosis. We applied STLsig to find a gene expression signature capable of identifying patients for whom treatment with proteasome inhibitors results in better survival than an alternative treatment. First, the gene signature should be capable of predicting proteasome inhibitor benefit in an independent patient cohort, which has been shown challenging for prognostic classifiers (14). A second important objective of STLsig was to identify a simple, interpretable model, which contains genes that have biological relevance to the molecular mechanism underlying proteasome inhibitor efficacy. To enable this, we leveraged the core concept of simulated treatment learning (STL), which we have proposed previously (15), that allows training classifiers without having a predefined labeling of patients. While our previous method was successful in identifying a model that can predict treatment benefit, these models rely on large numbers of Gene Ontology sets, making interpretation complex.
We propose a different approach, which identifies small gene networks that can be used to predict proteasome inhibitor benefit. To obtain a signature for treatment benefit, we formed networks of genes that are complementary in their ability to predict benefit. STLsig is fully data driven and does not rely on any biological knowledge or predefined gene networks as input.
We demonstrate the utility of STLsig on a 910-sample dataset combining three different phase III clinical trials with patients with multiple myeloma receiving a treatment either with or without the proteasome inhibitor, bortezomib (the HTT cohort). STLsig enabled the discovery of a 14-gene signature that can accurately identify a subset of patients benefiting from bortezomib. We validated this gene expression signature in an independent data (the CoMMpass cohort), where we predicted benefit for bortezomib or an alternative proteasome inhibitor, carfilzomib, demonstrating that the signature is robust and generalizes to other data. Moreover, we show that no gene expression signature with a similar performance can be found when the signature genes are removed from the dataset. The genes included in the signature are thus, essential for predicting proteasome inhibitor benefit. Several of the genes in the signature are related to multiple myeloma or the working mechanisms of proteasome inhibitors. To our knowledge, this is the first approach capable of discovering treatment benefit–specific gene signatures without predefined labels.
Materials and Methods
Data
To develop the gene network and train the bortezomib benefit signature, we pooled gene expression and survival data from three phase III trials (referred to as the HTT cohort): Total Therapy 2 (TT2, GSE2658), Total Therapy 3 (TT3, GSE2658), and HOVON-65/GMMG-HD4 (H65, GSE19784). The TT2 dataset includes 345 newly diagnosed multiple myeloma (NDMM) samples, treated either with thalidomide and melphalan (n = 173) or melphalan alone (n = 172). The TT3 dataset includes 238 NDMM samples, treated with bortezomib, thalidomide, dexamethasone, cyclophosphamide, cisplatin, and etoposide (VTDPACE). The H65 dataset includes 327 NDMM samples, treated either with vincristine, doxorubicin, and dexamethasone (VAD, n = 158) or bortezomib, doxorubicin, and dexamethasone (PAD, n = 169). In the HTT cohort, we defined a bortezomib arm (n = 407), which comprises the PAD arm from H65 and TT3, and a non-bortezomib arm (n = 503), which comprises the VAD arm from H65 and TT2. We divided the HTT cohort in a train set (n = 606) and a test set (n = 304). We ensured the two treatment arms were distributed evenly between training and test data and that the HR between the treatments was similar.
All samples were profiled with the Affymetrix Human Genome U133 plus 2.0 array. Gene expression was MAS5 and log2 normalized. Batch effects resulting from pooling different datasets were corrected with ComBat (16). Data were scaled to mean 0 and variance 1 per probe set.
For validation of both the bortezomib model and carfilzomib model, we used the CoMMpass trial (NCT0145429) dataset generated by the Multiple Myeloma Research Foundation. For 747 patients, RNA-sequencing (RNA-seq), survival data, and treatment information were available (CoMMpass Interim Analysis 13). Of these patients, 61 did not receive any proteasome inhibitor in first-line treatment, 530 received bortezomib, and 156 received carfilzomib. Sequencing data were processed with the Cufflinks pipeline (for details see: researcher.themmrf.org). For validation, we combined the log2 normalized values from the HTT data and the FPKM values from CoMMpass. We scaled the combined data to mean 0 and variance 1, and then, performed ComBat batch correction, as performing mean–variance scaling before ComBat leads to better overlap between the datasets in the tSNE. In ComBat batch correction, H65, TT2, TT3, and CoMMpass were defined as four separate batches.
For training the signature, we used progression-free survival (PFS) as an endpoint as we consider PFS a more direct measurement of treatment effect than overall survival (OS). Cox proportional hazard models were fitted using the R package “survival” (version 2.44).
Constructing and evaluating gene pairs
We selected only those probe sets that met the following requirements: (i) variance across the samples >2 in the training dataset before mean variance scaling, (ii) unambiguous mapping to one gene, and (iii) matching gene in the CoMMpass dataset. This yielded n = 3,319 genes. We then constructed all possible gene pairs from these 3,319 genes, resulting in 5,506,221 gene pairs.
To train the gene signature, we divided the HTT cohort (n = 910) into four folds, fold A (n = 202), fold B (n = 202), fold C (n = 202), and fold D (n = 304), fold assignment is provided in the Supplementary Data. Folds A, B, and C were used to train the signature as described below, while fold D acted as hold out data to validate the signature and optimize a threshold to be used in independent validation data.
To determine treatment benefit, we followed the core concept of STL laid out in our previous work (15), where for each patient a score zPFS was defined that measured whether the patient survived longer than expected compared with patients with similar gene expression that received another treatment. More specifically, for gene pair {n,m} and patient j, we defined:
where, PFSj is the PFS time of patient j, I = 1 if patient j received the target treatment (a proteasome inhibitor in this work) and I = −1, otherwise. Moreover, |{{\rm{\Pi }}^j}$| is the set of K nearest neighbors to patient j defined in terms of Euclidean distance in the expression space spanned by genes n and m and only considering patients that received another treatment than patient j. Throughout this article K = 10. In the set |{{\rm{\Pi }}^j}$|, we discarded patients for whom we cannot be sure whether they survived longer or not (i.e., if both patients are censored). This led to an average of seven patients being used in the calculation of μPFSj. Subsequently, zPFS was normalized to a z-score by comparing |\mu $|PFS with a background distribution resulting from repeating this procedure M = 1,000 times with a random |{{\rm{\Pi }}^j}$|. The zPFS score describes how much shorter (or longer) the survival of patient j is compared with patients with similar gene expression, but opposite treatment than expected by random chance.
To score gene pairs, a twofold cross-validation was employed using fold A (n = 202) and fold B (n = 202). Within each fold, a kNN-regression model (k = 10) was trained, which was used to predict zPFS on the other fold. The gene pair score was defined as the Spearman correlation coefficient between the predicted zPFS and calculated zPFS across all patients. The score for each gene pair was the mean correlation of the two folds. We repeated this procedure five times with a different split in folds.
Gene network construction
We constructed gene networks separately for all five repeats, and then constructed a consensus network, which only contained the genes and edges found in all five repeats. To construct the gene networks, for each gene, we ranked all gene pairs containing that gene on the mean Spearman correlation coefficient found. We then connected genes that were mutually synergistic. We achieved this by requiring that AB was among the top 5% of pairs including A and among the top 5% of pairs including B. However, if a single gene was informative for treatment benefit, gene pairs containing this gene could be highly ranked even if the second gene was uninformative. Including these gene pairs in our network and subsequent signature would introduce noise, which would both harm biological interpretation of the signature and potentially decrease the predictive performance in independent data. Therefore, we also required the mean correlation of the gene pair to be above the median correlation of all selected gene pairs. We evaluated all gene networks in the consensus network on their ability to predict benefit and selected the best performing combination to construct the signature.
Gene network selection and gene signature construction
After gene network construction, gene networks were selected using forward feature selection. To rank gene networks, we determined the predictive performance for each gene network. To this end, we calculated zPFS for each patient and each gene network separately on fold A and B together (n = 404). The top 25% of patients (in terms of zPFS) were assigned to class “benefit,” while the remaining patients are assigned to class “no benefit.” Subsequently, a Cox proportional hazards regression on the treatment variable was performed within the “benefit” patient group as well as in the “no benefit” patient group. The performance of a gene network was defined by the difference between the Cox regression β's in class “benefit” and class “no benefit.”
To select gene networks to use in the final model, we performed forward feature selection using fold C, which comprised 202 patients not used in folds A and B. Gene networks were added sequentially based on their performance on folds A and B. Ranking of patients across more than one gene network was done based on the sum of the zPFS scores of the individual gene networks.
Validation of gene networks
To validate the signature in independent data, we used all training data (n = 606) as a reference set, where zPFS was known. For each patient in the validation set, we computed the Euclidean distance to all patients in the reference set per gene network. We then used inverse distance weighting to calculate the estimated zPFS of a validation patient j by:
where, T comprises all patients in the reference dataset. Given a certain gene expression vector x, weights wi were given by:
where, d is the Euclidean distance between the expression data of gene |x\ $|of patients i and j.
Availability of data and material
The datasets supporting the conclusions of this article are available on Gene Expression Omnibus (GEO). Gene expression data from the H65 study are available at GSE19784 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE19784). Gene expression data from both TT2 and TT3 are available at GSE2658 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2658). Thirty patients from the TT3 study used in the article are not included in the GSE2658 dataset, these can be found in ArrayExpress dataset E-TABM-1138 (https://www.ebi.ac.uk/arrayexpress/experiments/E-TABM-1138/). The PFS data for all three studies are available at https://github.com/jubels/GESTURE, linked to the GEO and ArrayExpress IDs. All gene expression and survival data for the CoMMpass study are available at: research.themmrf.org.
All codes needed to discover and validate the signature are available at https://github.com/jubels/STLsig. All code requires R and is platform independent.
Results
Overview of the algorithm
STLsig relies on the idea that patients exhibiting similar gene expression profiles who received different treatments, can be used to model response to the treatment they did not receive. Similarity between patients should be defined by genes relevant to treatment benefit. STLsig, therefore, derives treatment-specific gene networks to form a gene expression signature capable of predicting treatment benefit. To train this signature, we divided the HTT cohort in a test set (fold D, n = 304) and a training set (n = 606), which was further subdivided into three equal parts, folds A, B, and C. We then assessed the ability to predict bortezomib benefit for all 5,506,221 gene pairs arising from the high variance genes (n = 3,319) in the HTT training set.
For each patient i in fold A, we determined a z-score (zPFS) per gene pair describing the normalized mean survival difference of patient i with its genetically similar neighbors that received a different treatment than patient i. We then tested the ability of the gene pair to predict the zPFS score for patients in fold B. We also assessed the performance of each gene pair when calculation of zPFS was performed on fold B and predicted on fold A. Performance of each gene pair was defined as the mean Spearman rank correlation coefficient between predicted and calculated zPFS values in both folds. A gene pair was retained if it was synergistic, that is, if the genes in the pair predicted zPFS better together than when they were paired with other genes.
We formed a consensus network by repeating the twofold cross-validation five times. Only gene pairs that were found to be synergistic in all repeats and that exceeded the median correlation across all gene pairs and all repeats were retained. From this consensus network, we extracted gene networks, that is, all connected components.
To evaluated each gene network, we recalculated zPFS for each patient using all genes in the network and classified the top 25% of the patients as class “benefit” and the rest as class “no benefit.” Subsequently, gene networks were ranked on the basis of the difference between the Cox regression β's found in class “benefit” and class “no benefit.” To build the signature, we sequentially added each network based on this ranking and evaluated the performance of the combined networks on fold C. The steps of the algorithm are summarized in Fig. 1.
Gene networks yield a 14-gene signature that can predict bortezomib benefit
The consensus network formed as described previously contained 617 genes connected by 451 edges and consisted of 167 gene networks, which included 104 individual gene pairs. The largest gene network contains 42 genes; the mean number of genes per network is 3.7. The optimal signature was formed by combining the top two ranked gene networks, which are shown in Fig. 2. With this signature, we found an HR of 0.49 [P = 0.09; 95% confidence interval (CI), 0.22–1.11] in class “benefit” (n = 50) and an HR of 0.91 (P = 0.74; 95% CI, 0.54–1.55) in class “no benefit” (n = 152), on fold C of the HTT cohort.
To assign a zPFS score to a new and unseen patient, for whom survival was unknown, we calculated the distance in gene expression space between this patient and every patient in the training data (the reference set). The predicted zPFS score of the new patient was the weighted sum of the zPFS scores of the patients in the reference set. Weights were determined by the inverse distance, that is, the most similar patients in the reference set contributed most to the predicted zPFS (see Materials and Methods). In this manner, we assessed the ability of the 14-gene signature to predict benefit for the 304 patients from the HTT cohort not included in training (fold D). The HR in favor of bortezomib found in fold D was 0.75 (P = 0.11; 95% CI, 0.53–1.06). Figure 3A shows the HR in class benefit found with different zPFS thresholds. A range of thresholds resulted in an HR below the HR observed in the total dataset, indicating that the predicted zPFS was associated with bortezomib benefit. The optimal class “benefit,” that is, the class “benefit” associated with the lowest HR, comprised 30.6% of the patients, which corresponds to a zPFS threshold of 0.3268. With this threshold, we found an HR of 0.47 (P = 0.04; 95% CI, 0.23–0.96) in class “benefit” and an HR of 0.91 (P = 0.68; 95% CI, 0.60–1.39) in class “no benefit” (Fig. 3B). This establishes that our signature can predict bortezomib benefit in unseen data from the same patient cohort, demonstrating that the signature can be used prospectively to inform treatment choice. Our results indicate that, despite the fact that nearly all patients with multiple myeloma receive a treatment regimen that includes a proteasome inhibitor (5), approximately 70% of patients do not see benefit.
The 14-gene signature achieves robust prediction performance in an independent patient cohort
Gene expression signatures often suffer from cohort-specific fitting and cross-validation within one dataset can thus, lead to an overestimation of performance (17). To obtain a more robust estimate of performance it is essential to perform validation on an external and completely independent cohort. Therefore, we validated its performance in the CoMMpass trial, which represents an independent patient cohort, which was profiled on a different platform (RNA-seq). In contrast to the HTT dataset, which is a randomized clinical trial, the CoMMpass dataset is an observational study and thus, represents clinical reality more closely. To bring the CoMMpass RNA-seq data in the same space as the microarray reference dataset, we employed a ComBat batch correction (Materials and Methods; Supplementary Fig. S1). We define a proteasome inhibitor treatment arm (n = 686) and a no proteasome inhibitor treatment arm (n = 61). The proteasome inhibitor treatment arm contained both bortezomib and carfilzomib. Using the threshold optimized on fold D of the HTT cohort, we found an HR of 0.46 (P = 0.04; 95% CI, 0.22–0.97) in class “benefit” (n = 150). We also saw a good performance when we used OS as an endpoint (Table 1). Our signature is thus, capable of predicting benefit in a completely independent cohort and across platforms, indicating the signal picked up by our classifier is robust and generalizes to the broader multiple myeloma patient population.
. | HR whole population (PFS) . | HR benefit (PFS) . | HR no benefit (PFS) . | HR whole population (OS) . | HR benefit (OS) . | HR no benefit (OS) . |
---|---|---|---|---|---|---|
Bortezomib (HTT cohort) | 0.75 (0.53–1.06) | 0.47 (0.23–0.96) | 0.91 (0.60–1.39) | 0.71 (0.45–1.11) | 0.50 (0.20–1.23) | 0.82 (0.47–1.40) |
P = 0.11 | P = 0.04 | P = 0.68 | P = 0.13 | P = 0.13 | P = 0.46 | |
n = 304 | n = 93 | n = 211 | n = 304 | n = 93 | n = 211 | |
PI | 0.70 (0.51–0.97) | 0.46 (0.22–0.97) | 0.79 (0.55–1.13) | 0.55 (0.34–0.85) | 0.18 (0.08–0.42) | 0.74 (0.44–1.23) |
P = 0.04 | P = 0.04 | P = 0.2 | P = 0.007 | P = 8 × 10−5 | P = 0.25 | |
n = 747 | n = 150 | n = 597 | n = 747 | n = 150 | n = 597 | |
Bortezomib (no carfilzomib) | 0.75 (0.54–1.04) | 0.49 (0.23–1.03) | 0.84 (0.58–1.21) | 0.60 (0.39–0.92) | 0.20 (0.09–0.48) | 0.79 (0.48–1.33) |
P = 0.09 | P = 0.06 | P = 0.35 | P = 0.02 | P = 0.0003 | P = 0.038 | |
n = 591 | n = 124 | n = 467 | n = 591 | n = 124 | n = 467 | |
Carfilzomib (no bortezomib) | 0.42 (0.26–0.68) | 0.25 (0.06–0.93) | 0.47 (0.27–0.80) | 0.24 (0.11–0.53) | Infa | 0.37 (0.16–0.85) |
P = 0.0004 | P = 0.04 | P = 0.005 | P = 0.0004 | P = 0.02 | ||
n = 217 | n = 37 | n = 180 | n = 217 | n = 180 | ||
Lenalidomide | 0.72 (0.58– 0.88) | 0.79 (0.50–1.25) | 0.69 (0.54–0.86) | 0.56 (0.41–0.76) | 0.74 (0.36–1.52) | 0.51 (0.37–0.73) |
P = 0.001 | P = 0.31 | P = 0.001 | P = 0.0001 | P = 0.42 | P = 0.0002 | |
n = 747 | n = 149 | n = 598 | n = 747 | n = 149 | n = 598 | |
PI (excluding chemotherapy) | 0.69 (0.49–0.97) | 0.50 (0.23–1.13) | 0.75 (0.51–1.11) | 0.52 (0.33–0.83) | 0.21 (0.08–0.53) | 0.67 (0.39–1.15) |
P = 0.03 | P = 0.10 | P = 0.15 | P = 0.006 | P = 0.001 | P = 0.15 | |
n = 515 | n = 109 | n = 406 | n = 515 | n = 109 | n = 406 | |
PI (no PI/lenalidomide combination) | 0.91 (0.64–1.30) | 0.65 (0.27–1.57) | 1.02 (0.68–1.51) | 0.82 (0.51–1.32) | 0.29 (0.10–0.82) | 1.05 (0.61–1.81) |
P = 0.60 | P = 0.33 | P = 0.93) | P = 0.42 | P = 0.02 | P = 0.86 | |
n = 387 | n = 71 | n = 316 | n = 387 | n = 71 | n = 316 | |
Bortezomib (APEX) | 0.74 (0.54–1.03) | 0.31 (0.11–0.87) | 0.78 (0.55–1.10) | 1.24 (0.82–1.82) | 0.42 (0.15–1.16) | 1.42 (0.93–2.16) |
P = 0.07 | P = 0.03 | P = 0.15 | P = 0.28 | P = 0.09 | P = 0.10 | |
n = 242 | n = 25 | n = 217 | n = 242 | n = 25 | n = 242 | |
Bortezomib/lenalidomide vs. bortezomib | 0.64 (0.51–0.81) | 0.84 (0.51–1.39) | 0.59 (0.45–0.76) | 0.46 (0.32–0.66) | 0.63 (0.28–1.45) | 0.42 (0.28–0.63) |
P = 0.0002 | P = 0.50 | P = 7 × 10−5 | P = 1 × 10−5 | P = 0.28 | P = 3 × 10−5 | |
n = 530 | n = 112 | n = 418 | n = 530 | n = 112 | n = 418 |
. | HR whole population (PFS) . | HR benefit (PFS) . | HR no benefit (PFS) . | HR whole population (OS) . | HR benefit (OS) . | HR no benefit (OS) . |
---|---|---|---|---|---|---|
Bortezomib (HTT cohort) | 0.75 (0.53–1.06) | 0.47 (0.23–0.96) | 0.91 (0.60–1.39) | 0.71 (0.45–1.11) | 0.50 (0.20–1.23) | 0.82 (0.47–1.40) |
P = 0.11 | P = 0.04 | P = 0.68 | P = 0.13 | P = 0.13 | P = 0.46 | |
n = 304 | n = 93 | n = 211 | n = 304 | n = 93 | n = 211 | |
PI | 0.70 (0.51–0.97) | 0.46 (0.22–0.97) | 0.79 (0.55–1.13) | 0.55 (0.34–0.85) | 0.18 (0.08–0.42) | 0.74 (0.44–1.23) |
P = 0.04 | P = 0.04 | P = 0.2 | P = 0.007 | P = 8 × 10−5 | P = 0.25 | |
n = 747 | n = 150 | n = 597 | n = 747 | n = 150 | n = 597 | |
Bortezomib (no carfilzomib) | 0.75 (0.54–1.04) | 0.49 (0.23–1.03) | 0.84 (0.58–1.21) | 0.60 (0.39–0.92) | 0.20 (0.09–0.48) | 0.79 (0.48–1.33) |
P = 0.09 | P = 0.06 | P = 0.35 | P = 0.02 | P = 0.0003 | P = 0.038 | |
n = 591 | n = 124 | n = 467 | n = 591 | n = 124 | n = 467 | |
Carfilzomib (no bortezomib) | 0.42 (0.26–0.68) | 0.25 (0.06–0.93) | 0.47 (0.27–0.80) | 0.24 (0.11–0.53) | Infa | 0.37 (0.16–0.85) |
P = 0.0004 | P = 0.04 | P = 0.005 | P = 0.0004 | P = 0.02 | ||
n = 217 | n = 37 | n = 180 | n = 217 | n = 180 | ||
Lenalidomide | 0.72 (0.58– 0.88) | 0.79 (0.50–1.25) | 0.69 (0.54–0.86) | 0.56 (0.41–0.76) | 0.74 (0.36–1.52) | 0.51 (0.37–0.73) |
P = 0.001 | P = 0.31 | P = 0.001 | P = 0.0001 | P = 0.42 | P = 0.0002 | |
n = 747 | n = 149 | n = 598 | n = 747 | n = 149 | n = 598 | |
PI (excluding chemotherapy) | 0.69 (0.49–0.97) | 0.50 (0.23–1.13) | 0.75 (0.51–1.11) | 0.52 (0.33–0.83) | 0.21 (0.08–0.53) | 0.67 (0.39–1.15) |
P = 0.03 | P = 0.10 | P = 0.15 | P = 0.006 | P = 0.001 | P = 0.15 | |
n = 515 | n = 109 | n = 406 | n = 515 | n = 109 | n = 406 | |
PI (no PI/lenalidomide combination) | 0.91 (0.64–1.30) | 0.65 (0.27–1.57) | 1.02 (0.68–1.51) | 0.82 (0.51–1.32) | 0.29 (0.10–0.82) | 1.05 (0.61–1.81) |
P = 0.60 | P = 0.33 | P = 0.93) | P = 0.42 | P = 0.02 | P = 0.86 | |
n = 387 | n = 71 | n = 316 | n = 387 | n = 71 | n = 316 | |
Bortezomib (APEX) | 0.74 (0.54–1.03) | 0.31 (0.11–0.87) | 0.78 (0.55–1.10) | 1.24 (0.82–1.82) | 0.42 (0.15–1.16) | 1.42 (0.93–2.16) |
P = 0.07 | P = 0.03 | P = 0.15 | P = 0.28 | P = 0.09 | P = 0.10 | |
n = 242 | n = 25 | n = 217 | n = 242 | n = 25 | n = 242 | |
Bortezomib/lenalidomide vs. bortezomib | 0.64 (0.51–0.81) | 0.84 (0.51–1.39) | 0.59 (0.45–0.76) | 0.46 (0.32–0.66) | 0.63 (0.28–1.45) | 0.42 (0.28–0.63) |
P = 0.0002 | P = 0.50 | P = 7 × 10−5 | P = 1 × 10−5 | P = 0.28 | P = 3 × 10−5 | |
n = 530 | n = 112 | n = 418 | n = 530 | n = 112 | n = 418 |
Abbreviation: PI, proteasome inhibitor.
aNo events in carfilzomib arm.
We next assessed the performance for each of the two proteasome inhibitors separately. When we evaluated benefit for the bortezomib patients (excluding carfilzomib patients from the analysis), we found an HR of 0.49 (P = 0.06; 95% CI, 0.23–1.03) in class “benefit” (n = 124). When predicting benefit for the carfilzomib patients, we found an HR of 0.31 (P = 0.06; 95% CI, 0.09 - 1.02) in class “benefit” (n = 38). It should be noted that the carfilzomib “no benefit” group should be considered a “less benefit” group, as there was still a significant HR in favor of carfilzomib in class “no benefit,” likely due to the low overall HR [0.42 (P = 0.0004; 95% CI, 0.26–0.68)]. Nevertheless, the fact that our signature can identify a patient group with substantially reduced HRs for carfilzomib-treated patients indicates that it is more broadly applicable to proteasome inhibitors in general and not only bortezomib. All HRs for PFS and OS are shown in Table 1.
The percentage of patients classified as “benefit” in the CoMMpass dataset was lower than that the HTT dataset. When we calculated the HR on the CoMMpass dataset using different zPFS thresholds to define class “benefit,” we found that for both bortezomib and carfilzomib, the class “benefit” associated with the lowest HR contained approximately 30% of the patients (Fig. 3D and E), similar to what we observed in the HTT data. This shows that in CoMMpass also, different zPFS thresholds were associated with benefit and suggests, approximately 30% of patients with multiple myeloma experience more benefit from proteasome inhibitor treatment than the population as a whole.
Finally, we confirmed that our model is specific for proteasome inhibitor treatment by testing it on the immunomodulatory drug, lenalidomide. We found an HR of 0.79 (95% CI, 0.50–1.25) in class “benefit” (n = 149), clearly showing the signature is specific for proteasome inhibitor treatment.
The predictive performance of the 14-gene signature holds in single-agent proteasome inhibitor treatment
In clinical practice, the majority of patients receive a combination of treatments. To ensure the signal captured in our signature was proteasome inhibitor specific, and not dependent on a specific treatment combination, we tested the performance of our signature on data from the APEX trial (ref. 18; GSE9782). In this trial, a single-agent bortezomib treatment was tested against high-dose dexamethasone in a relapse setting. Unfortunately, two of the genes in our signature (CFAP53 and linc00485) were not measured in this study, but we could apply the signature with the remaining 12 genes. With these genes, 10.3% of the patients were classified as benefit and we found an HR of 0.31 (95% CI, 0.11–0.87; P = 0.03) in favor of bortezomib in class “benefit,” while in class “no benefit,” we found an HR of 0.78 (95% CI, 0.55–1.10; P = 0.15; Supplementary Fig. S2). Second, while there were no single agent–treated patients in CoMMpass, we found that we could still predict benefit when we remove patients who received both a proteasome inhibitor and lenalidomide, albeit with a nonsignificant HR due to lower sample size (Table 1). The signal of our signature is thus, not dependent on a combination of treatments.
The predictive performance of the 14-gene signature is relevant in current clinical practice
Chemotherapy-free regimens play an increasingly important role in multiple myeloma, but chemotherapy was present in the CoMMpass dataset. To test the performance of our model in a more clinically representative setting and show the performance generalizes to a more modern treatment regimen, we excluded all patients who received any type of chemotherapy (vincristine, doxorubicin, cyclophosphamide, or melphalan, n = 232; patient numbers per treatment presented in Supplementary Table S2). We found that, in this chemotherapy-free cohort, we could still predict benefit to proteasome inhibitor treatment with a similar effect size (HR, 0.50; 95% CI, 0.23–1.13; P = 0.10) as found in the whole dataset (Supplementary Fig. S3). However, due to the smaller sample size this HR was not significant at P = 0.05.
Bortezomib and lenalidomide are two of the most used drugs and are often given together. Also, in the CoMMpass data, many patients in the proteasome inhibitor arm also received lenalidomide (see Table 2 for patient numbers per treatment). The combination of bortezomib and lenalidomide is superior to both lenalidomide alone and bortezomib alone. However, in class “benefit,” this combination was not superior to bortezomib alone (HR, 0.95; 95% CI, 0.58–1.55; P = 0.84), suggesting the addition of lenalidomide is not beneficial if the patient already benefits from bortezomib treatment. This shows our signature is also relevant in treatment combinations and could guide when bortezomib alone is sufficient, thus reducing the treatment burden on the patient.
. | Class benefit . | Class no benefit . | Total . |
---|---|---|---|
Bor without len | 53 (35.6%) | 219 (36.6%) | 272 (36.4%) |
Car without len | 8 (5.4%) | 51 (8.5%) | 59 (7.9%) |
Len without car/bor | 10 (6.7%) | 46 (7.7%) | 56 (7.5%) |
Bor + len | 59 (39.6%) | 199 (33.3%) | 258 (34.5%) |
Car + len | 17 (11.4%) | 80 (13.4%) | 97(13.0%) |
No car/bor/len | 2 (1.3%) | 3 (0.5%) | 5 (0.7%) |
149 | 598 | 747 |
. | Class benefit . | Class no benefit . | Total . |
---|---|---|---|
Bor without len | 53 (35.6%) | 219 (36.6%) | 272 (36.4%) |
Car without len | 8 (5.4%) | 51 (8.5%) | 59 (7.9%) |
Len without car/bor | 10 (6.7%) | 46 (7.7%) | 56 (7.5%) |
Bor + len | 59 (39.6%) | 199 (33.3%) | 258 (34.5%) |
Car + len | 17 (11.4%) | 80 (13.4%) | 97(13.0%) |
No car/bor/len | 2 (1.3%) | 3 (0.5%) | 5 (0.7%) |
149 | 598 | 747 |
Abbreviations: Bor, bortezomib (PI arm); car, carfilzomib (PI arm); len, lenalidomide; PI, proteasome inhibitor.
Class “benefit” cannot be characterized by known markers or models
Next, we assessed whether class “benefit” can be characterized by known markers. To this end, we first performed enrichment analysis of the routinely measured chromosomal aberrations (FISH markers), Eastern Cooperative Oncology Group performance status, or revised International Staging System score in both classes. None of these were overrepresented in either class “benefit” or “no benefit” (Fig. 3F). Moreover, none of the markers had a predictive performance for proteasome inhibitor benefit in the CoMMpass study that outperforms our signature (Supplementary Fig. S4). There have been extensive efforts to predict prognosis in multiple myeloma using gene expression, for example, with the GEP70 signature (19, 20). We found no correlation between our score and the GEP70 model (Supplementary Fig. S5). While we observed that the GEP70 low-risk group in CoMMpass had more benefit form proteasome inhibitor treatment (HR, 0.56; 95% CI, 0.38–0.84; P = 0.005), we did not see this effect in the H65/GMMG-HD4 dataset (HR benefit, 0.90; 95% CI, 0.67–1.20; P = 0.45). Our signature could also still distinguish a benefit group (n = 86) within this low-risk group in CoMMpass (HR benefit, 0.33; 95% CI, 0.14–0.76; P = 0.009).
Recently, a seven-gene signature was published to distinguish between standard and good response to bortezomib in the PADIMAC study (19); none of these genes overlapped with our signature genes. When we assessed the ability of our signature to predict bortezomib response in the PADIMAC study, we found an AUC of 0.86 (Supplementary Fig. S6), indicating that our signature is also capable of predicting response. Moreover, the seven-gene signature is reported to be applicable only in a nontransplant setting, while our signature does not have this limitation.
Selected genes and links between them are essential for performance
In prognostic classification it is known that many different signatures with similar performance can be found (21). This casts doubt on the usefulness of biologically interpreting the genes within a signature. We thus, first investigated whether the genes in our signature are essential for performance.
We first permuted the expression vector for each gene in the signature 100 times (while the other 13 genes remained unchanged) and applied this signature to fold D of the HTT cohort. The largest effect was observed for DAB2IP, with a mean difference in validation HR of 0.29 (SE = 0.06). Correlation between genes influenced the decrease in performance, for instance, shuffling SHTN1 had the smallest impact on validation performance and its expression was significantly correlated with more genes than any other gene (with TPD52L1, NES, and ST6GAL2; Supplementary Fig. S7). Therefore, losing its information will have less impact. Nevertheless, we demonstrate that all individual genes are important for the validation performance, as none can be shuffled without decreasing performance (Fig. 4A).
Next, we assessed the importance of the relationship between the genes by shuffling the edges between all genes included in the network 10 times, while ensuring every gene remains linked to at least one other gene. We then inferred a signature with STLsig, meaning a new combination of gene networks was selected to form the predictive signature. The mean HR found in the hold out data in class “benefit” was 0.74 (SE = 0.05), which was approximately equal to the HR found in the dataset without classification. The connection between genes is thus, essential for the performance of the signature.
Finally, we removed all 14 signature genes from the dataset and reran STLsig. The new signature, which contained 312 genes from 85 gene networks, resulted in an HR of 0.56 (P = 0.23; 95% CI, 0.23 - 1.41) in the training data, which was worse than the original signature. The performance on the patients in fold D, which required optimizing a new zPFS threshold, also yielded a worse performance (HR, 0.59; P = 0.06; 95% CI, 0.34 - 1.01; n = 130 in the “benefit” class). Moreover, changing this threshold to yield a differently sized class “benefit” did not yield performances that approached that of the original 14-gene signature (Supplementary Fig. S8).
Together, these results establish that the 14 identified genes are essential to the performance of the model.
Multiple signature genes are associated with multiple myeloma or proteasome inhibition
Having established that the genes in the signature are essential to the performance, we investigated how the genes in the signature may be involved in determining proteasome inhibitor benefit. Interestingly, in addition to having the largest impact when its information was lost, DAB2IP was also the only gene that was significantly differentially expressed between class “benefit” and “no benefit” (P = 0.002). DAB2IP plays an essential role in the IRE1-mediated endoplasmic reticulum stress response and inducing apoptosis via the JNK pathway (22). Apoptosis induced by endoplasmic reticulum stress is one of the main working mechanisms of proteasome inhibitors (5).
While none of the other signature genes were differentially expressed between class benefit and no benefit, several genes do have a clear link to cancer, or multiple myeloma specifically. For instance, NES is a stem cell marker that is not found in healthy plasma cells, but is found specifically in multiple myeloma (23). Moreover, NES has been associated with treatment response in multiple myeloma (23). CLIP1 is involved in microtubule-kinetochore attachment and plays a role in proper chromosome alignment during mitosis (23, 24), and has been associated with cancer progression and chemotherapy resistance (25), although not in relation to multiple myeloma. SNX9 is described to play an important role in trafficking ADAM9 to the cell surface (26). ADAM9 is expressed in multiple myeloma cells and induces IL6 production by osteoblasts, potentially creating a more permissive bone marrow environment for multiple myeloma cell proliferation (27). One of the described working mechanisms of bortezomib is the downregulation of the production of IL6 in the bone marrow environment (27, 28). The gene TPD52L1 is a negative regulator of ATM (29), which is involved in the DNA damage response and activated by bortezomib treatment (29, 30). ST6GAL2 has been described previously to be significantly downregulated in carfilzomib-resistant cell lines (31).
Together, this indicates that our signature is not only capable of predicting benefit, but could also aid in understanding differential response to proteasome inhibitor treatment.
Different cellular response to bortezomib in class benefit
For 142 patients in the HTT cohort, tumor gene expression was measured again 48 hours after receiving bortezomib. To investigate whether the cellular response to bortezomib was different for patients classified as “benefit,” we performed a differential expression analysis before and after treatment separately in class “benefit” and class “no benefit” using SAM (32). Because of the low number of patients in class “benefit” for whom a second measurement was available, we relaxed our definition of benefit and classified patients as “benefit” if the calculated zPFS > 0 (n = 71). We found 12 genes that were significantly differentially expressed before and after treatment in class “benefit,” but not in class “no benefit.” We also found two genes that were significantly differentially expressed only in class “no benefit” (Fig. 4B). To identify the genes that truly represented a different cellular response in class “benefit” and “no benefit,” we computed the difference in fold change between both classes. To ensure that this was not a random difference, we also computed this difference for all genes using 1000 random class labelings. We found four genes, TNS3, PXN, C2CD4A, and PSPC1, where the difference between “benefit” and “no benefit” was larger than expected by random chance (P < 0.05 after Bonferroni correction for multiple testing). None of these genes have been linked to multiple myeloma, although all have been connected to disease progression in other cancer types (33–36). Interestingly, TNS3, PXN, and PSPC1 were all described to play a role in cell adhesion and a migratory phenotype (36, 37). Cell adhesion–mediated drug resistance (CAM-DR) has been described extensively in multiple myeloma (38–40). Moreover, it has been suggested that bortezomib can overcome CAM-DR (41, 42). A different regulation of cell adhesion in class “benefit” could play a role in the observed benefit to proteasome inhibitors.
Discussion
In this work we propose STLsig, a method to identify interpretable signatures that robustly predict patient benefit to proteasome inhibitors from a gene expression measurement at time of diagnosis. The 14-gene signature, derived with our method, validates on an independent patient cohort, which was moreover, measured on a different platform, confirming the robust nature of the signature.
A clinical trial setting is most suitable for training the STLsig model. Here, treatment was randomized and each treatment arm contained roughly the same number of patients. This was important for calculating zPFS and training the signature, as it ensured each patient had sufficiently similar neighbors in the gene expression space. Once the signature is trained, it can be validated on a less balanced dataset. We, therefore, used the HTT cohort as training data, rather than the newer CoMMpass dataset. It should be noted that the treatment combinations used in the HTT cohort are no longer representative of clinical practice; because sufficiently long follow-up is needed to train a model, we necessarily trained on older data. Recently, it was shown that daratumumab, combined with bortezomib, thalidomide, and dexamethasone (VTd), is superior to VTd (43). This will arguably be the new standard, but because daratumumab is relatively new, suitable gene expression datasets are not available. It is clear that proteasome inhibitors continue to play an important role in multiple myeloma treatment. In the CoMMpass dataset, we show that the performance of our signature remained stable in different treatment combinations and that, while it was trained on bortezomib, it can also predict benefit to carfilzomib. Moreover, we show that the addition of lenalidomide to bortezomib-based treatment only leads to better survival in the “no benefit” group. This establishes our model is also relevant in a more modern, chemotherapy-free setting. We also demonstrate our signature can be applied to patients for whom the expression profiling was performed using RNA-seq, demonstrating cross-platform robustness.
We have only considered gene expression patterns in this research because it has been shown that for classifiers aimed at predicting cancer survival, gene expression captures the majority of the signal (44). More specific to multiple myeloma, Chapman and colleagues found bortezomib response could not be reliably predicted from mutation events (19). The mutational landscape in multiple myeloma is quite sparse and we found no difference in mutation burden or in the specific genes that were mutated between class “benefit” and “no benefit” (Supplementary Fig. S9). We also did not see a difference between class “benefit” and “no benefit” in the 63 driver genes that were recently identified (ref. 4; Supplementary Fig. S10). While multiple myeloma is a very complex disease and this complexity can most likely not be captured in only two groups differentiated by gene expression patterns, the signature identified can aid in optimal treatment selection and thus, has direct clinical applicability.
Several of the genes in the signature are already described to be involved in the proteasome system or disease progression in multiple myeloma and we show these genes are essential for the predictive performance, as no equivalent signature could be found without them. These findings reinforce the importance of the selected genes and indicate the power of STLsig to further elucidate proteasome inhibitor–specific mechanisms.
STLsig can readily be applied to other diseases and drugs. A very potent application could be to perform post hoc analysis of clinical trial data for drugs which missed their endpoint. Such analysis could reveal a subset of patients who would still benefit from the drug, thus potentially extracting valuable information from failed clinical trials.
Taken together, we provide a powerful machine learning approach to aid in treatment decisions in the clinic, ensuring a more optimal treatment choice and ultimately improve patient outcomes.
Disclosure of Potential Conflicts of Interest
J. Ubels reports personal fees from SkylineDx (employee) and grants from Van Herk Charity (PhD fellowship) during the conduct of the study, as well as being listed as a coinventor on a patent regarding a method for identifying gene expression signatures owned by SkylineDx. P. Sonneveld reports grants from SkylineDx during the conduct of the study and grants and personal fees from Janssen, Celgene, Karyopharm, and Amgen outside the submitted work. M. H. van Vliet reports other from SkylineDx (employment and stock options) during the conduct of the study, as well as being listed as a coinventor on a patent regarding a method for identifying gene expression signatures owned by SkylineDx. J. de Ridder reports other from Cyclomics BV (founder of company) outside the submitted work, as well as being listed as a coinventor on a patent regarding a method for identifying gene expression signatures owned by SkylineDx. No other potential conflicts of interest were disclosed.
Authors' Contributions
J. Ubels: Conceptualization, software, formal analysis, validation, investigation, visualization, methodology, writing-original draft, writing-review and editing. P. Sonneveld: Supervision, writing-review and editing. M.H. van Vliet: Conceptualization, methodology, writing-review and editing. J. de Ridder: Conceptualization, supervision, methodology, writing-original draft, writing-review and editing.
Acknowledgments
The CoMMpass dataset was generated as part of the Multiple Myeloma Research Foundation Personalized Medicine Initiatives (https://research.themmrf.org and www.themmrf.org). J. Ubels was supported by a PhD fellowship from the Van Herk Charity foundation.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.