Abstract
Treatment of myeloma has benefited from the introduction of more effective and better tolerated agents, improvements in supportive care, better understanding of disease biology, revision of diagnostic criteria, and new sensitive and specific tools for disease prognostication and management. Assessment of minimal residual disease (MRD) in response to therapy is one of these tools, as longer progression-free survival (PFS) is seen consistently among patients who have achieved MRD negativity. Current therapies lead to unprecedented frequency and depth of response, and next-generation flow and sequencing methods to measure MRD in bone marrow are in use and being developed with sensitivities in the range of 10−5 to 10−6 cells. These technologies may be combined with functional imaging to detect MRD outside of bone marrow. Moreover, immune profiling methods are being developed to better understand the immune environment in myeloma and response to immunomodulatory agents while methods for molecular profiling of myeloma cells and circulating DNA in blood are also emerging. With the continued development and standardization of these methodologies, MRD has high potential for use in gaining new drug approvals in myeloma. The FDA has outlined two pathways by which MRD could be qualified as a surrogate endpoint for clinical studies directed at obtaining accelerated approval for new myeloma drugs. Most importantly, better understanding of MRD should also contribute to better treatment monitoring. Potentially, MRD status could be used as a prognostic factor for making treatment decisions and for informing timing of therapeutic interventions. Clin Cancer Res; 23(15); 3980–93. ©2017 AACR.
Foreword
In the words of patient advocate Dr. James Omel, commenting on the practical value of minimal residual disease (MRD) technology in the assessment of treatment for myeloma, “Does MRD negativity mean that I am ‘cured?’ Maybe. Does MRD positivity at a very low rate, such as 10−6, mean that I need more treatment? Maybe. Maybe not. Even though we patients are now able to achieve a fantastic level of response not attainable in the past, our doctors still must learn how to utilize this marker. Can I ever truly feel that I am free of this disease? Can patients safely stop taking expensive maintenance therapy?” With the following descriptions of current evidence surrounding the use and utility of MRD testing, we hope to answer some of these questions.
Introduction
Treatment paradigms in multiple myeloma have undergone a radical transformation in the past decade due to the introduction of novel agents, including immunomodulatory drugs, proteasome inhibitors, and mAbs, which are more effective and better tolerated than conventional chemotherapy. These novel agents, along with improvements in supportive care, have changed the treatment paradigm and prolonged the survival of patients with myeloma 3- to 4-fold (1, 2). Progress has not been limited to the treatment arena but has also included a better understanding of disease biology, revision of diagnostic criteria, as well as development of new sensitive and specific tools for disease prognostication (3–10). Assessing response to therapy is an integral component of disease management, and unlike the majority of other cancers, in myeloma, this has primarily been done indirectly through assessment of a tumor marker—the secreted monoclonal protein. Although this is applicable in the majority of patients, a small proportion of patients have no measurable monoclonal protein secretion, and direct assessment of the tumor burden is utilized (11). Importantly, novel agents can now achieve increased depth and frequency of response, including complete response (CR) and prolonged progression-free survival (PFS), but residual drug-resistant tumor cells lead to inevitable relapses in nearly all patients. Multiple studies using either flow or molecular methods with increased sensitivity to detect tumor cells have consistently shown a superior PFS among patients who have achieved an MRD-negative status (12–18). As current therapies achieve an unprecedented frequency and depth of response, these novel, more sensitive methods for response assessment and detection of MRD are urgently needed (19).
In January 2016, in a collaboration convened under the Foundation for the National Institutes of Health (FNIH) Biomarkers Consortium, a diverse multidisciplinary group of advocacy organizations and patients, research foundations, academia, government (NIH and FDA), and industry was assembled to produce this current perspective piece to document and clarify the role of MRD in improving patient care and enhancing the development of new therapies in myeloma. This article will (i) describe the state of the science and technology supporting the use of MRD in myeloma, (ii) summarize recent meta-analyses of the impact of MRD on PFS and overall survival (OS) in myeloma, and (iii) examine the current evidence and propose studies needed to define MRD as a response biomarker/surrogate endpoint both for new drug approval and for informing clinical practice in myeloma.
Data Science and Technology
MRD assessment in myeloma: Technical considerations
Three technical approaches have been used to assess MRD in myeloma: one cellular technique [multiparametric flow cytometry (MFC)] and two molecular techniques [quantitative allele-specific oligonucleotide PCR (qASO-PCR) and multiplex PCR using standard primers recognizing all V and J segments within a given immunoglobulin (Ig) locus, followed by high-throughput next-generation sequencing (NGS)]. Both qASO-PCR and NGS are based on the measurement of the patient-specific clonal rearrangements of the Ig gene. Both tools have a relatively high sensitivity, but NGS has the added advantage of not requiring patient-specific primers, as noted in the International Myeloma Working Group (IMWG) response criteria guidelines (20).
In myeloma, it is important to use a highly sensitive test for MRD, as the lowest MRD level seems to be associated with the longest PFS (21). Unlike in some other hematologic malignancies, MRD must be evaluated in the bone marrow, where most tumor plasma cells (PC) reside; to date, several attempts to evaluate it in blood have shown much lower sensitivity (a loss of two logs). Measuring MRD in blood, including identifying a trend of MRD over time, using sensitive techniques such as optimized and validated MFC [next-generation flow (NGF)] or NGS, remains an area of active investigation. These cellular and molecular technologies have advantages and disadvantages (see Table 1).
Table 1. Flow versus NGS MRD comparison
| Flow cytometry (MFC/NGF) | NGS |
| --- | --- |
| Rapid (results available within a few days) | <1 week turnaround time possible (10 days–2 weeks currently) |
| Applicable to most patients | Not applicable to all patients (∼95%) |
| Does not require pretreatment immunophenotype | Requires diagnostic sample; does not require patient-specific primers |
| Requires fresh samples | Does not require fresh samples |
| High sensitivity attainable (with NGF, at 10−6 cells) | High sensitivity (10−6 cells) |
| Subjective interpretation | More objective interpretation |
| Standardization in progress, but multiple different fluorochrome panels, protocols, instruments, etc. are in use | Standardization possible |
| Technology widely available | Technology not yet widely available |
MFC and NGF
Tumor PCs are characterized by a significant number of aberrant phenotypes, enabling a clear separation of tumor and normal PCs. Patient-specific aberrant phenotypes are highly stable and distinguish the abnormal PC from the background of all other cells in the sample during the course of the disease (22). Accordingly, optimized panels of mAbs and sample preparation protocols allow the identification of tumor PCs from a background of normal PCs irrespective of patients' phenotypes, and the analysis of the diagnostic sample is not mandatory. Small changes in the phenotype may occur throughout the course of the disease (related to clonal selection of chemoresistant MRD cells), but NGF can always discriminate between tumor and normal PCs, and such small changes do not influence the specificity or sensitivity of the assay. MFC requires fresh cells, ideally processed within 36 hours of bone marrow aspiration, and is potentially applicable to 100% of patients.
MRD monitoring by MFC, and now also by NGF, is widely available, with most cytometry labs equipped with ≥8-color cytometers. In the past, the wide availability of MFC naturally led to the implementation of multiple institutional nonstandardized protocols, with variations in laser settings, calibration, and compensation. Most importantly, large variability exists in the antibody panels as well as the specific fluorochromes and number of analyzed cells (23). Despite the significant variability between institutions and the limited sensitivity of traditional flow cytometry methods, virtually all groups using MFC-based MRD assessment have reported positive results using this method. Nevertheless, the sensitivity of conventional MFC MRD assessment remains systematically lower than that of qASO-PCR or NGS. Owing to recent technical improvements (24), however, sensitivities of NGF protocols are now routinely in the 2 × 10−6 range, and implementation of optimized and validated methods in advanced cytometry labs under strict performance evaluation and periodic quality control may overcome the previous lack of standardization (25).
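The dependence of flow-based sensitivity on the number of analyzed cells can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that a cluster of at least 20 aberrant events is needed to call a sample MRD positive; that threshold, the function name, and the example cell counts are assumptions and are not taken from this article.

```python
def flow_limit_of_detection(total_cells_acquired: int,
                            min_cluster_events: int = 20) -> float:
    """Lowest tumor-cell fraction that could be called positive.

    min_cluster_events is an ASSUMED threshold used only for illustration.
    """
    if total_cells_acquired <= 0:
        raise ValueError("total_cells_acquired must be positive")
    return min_cluster_events / total_cells_acquired


# Hypothetical acquisition sizes, from a small tube to a large NGF acquisition.
for n_cells in (500_000, 2_000_000, 10_000_000):
    lod = flow_limit_of_detection(n_cells)
    print(f"{n_cells:>10,d} cells acquired -> limit of detection {lod:.0e}")
```

Under this assumed threshold, an acquisition of 10 million cells corresponds to a limit of detection of 2 × 10−6, which is why the number of analyzed cells matters as much as the antibody panel.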
NGF incorporates a sample quality check of bone marrow cellularity via simultaneous detection of B-cell precursors, erythroblasts, myeloid precursors, and/or mast cells. This information is critical to ensure sample quality and to identify hemodiluted bone marrow aspirates that may lead to false negative results. Because of the recent advent of immunomodulatory therapies, the possibility of flow-based comprehensive immune profiling [including T, natural killer (NK), and B cells, as well as macrophages and other myeloid cells] at the time of MRD assessment could also be predictive of outcome (26) and may help to identify patients who despite persistent MRD may experience long-term survival due to active immune surveillance.
NGS
Tumor PCs have specific clonal rearrangements of one or more of the three Ig genes (IgH, IgLκ, and IgLλ). These rearrangements are unique to the original progenitor B cell that develops into the malignant PC and its progeny. To track these rearranged Ig gene sequences over time in any given patient, an initial diagnostic or relapse sample must first be analyzed to define the specific dominant sequence or sequences in the proliferating malignant clone. On a practical level, the sequence(s) so identified provide a reliable marker of the transformed clone in a variety of lymphoid malignancies (18, 27, 28). Variation in the VDJ sequences of patients with lymphoid malignancies has been studied to some extent, but such variation is inconsistent and is seen almost entirely in patients with B-precursor acute lymphoblastic leukemia (ALL), whose malignant cells represent an earlier developmental stage. It is not seen where transformation occurs later along the differentiation pathway, as in chronic lymphocytic leukemia (CLL) or, presumably, myeloma, in which the cell has moved beyond VDJ recombination and to or through a phase where somatic hypermutation occurs. Variation is seen primarily on only one of the two IgH alleles in a given clone, and the varying allele has a common Ig “DNJ” sequence string that does not vary over time (29). Thus, in general, these primary nucleotide sequence markers are stable during the course of the disease, and either fresh or frozen samples may be used. The sensitivity of NGS is strictly a function of how many cellular equivalents of DNA are analyzed, with the limit of detection approaching 1 × 10−6. By using chemically synthesized templates that encompass every possible VJ combination for each of the Ig loci, NGS technology has achieved a high level of performance (30); moreover, these templates allow for robust quantitation of this assay, serve as internal controls, and can be used for proficiency testing across sites.
As with NGF methods, NGS methods are able to achieve a high sensitivity, down to 10−6.
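The statement that NGS sensitivity is a function of the cellular equivalents of DNA analyzed can be made concrete with a sampling argument. The following is a minimal sketch, assuming that tumor genomes are sampled independently (a Poisson approximation) and that a diploid cell carries roughly 6.6 pg of DNA; both the model and the mass constant are illustrative assumptions, not parameters reported in this article.

```python
import math

# ASSUMPTION: approximate DNA mass of one diploid human cell, in picograms.
PG_DNA_PER_DIPLOID_CELL = 6.6


def detection_probability(cell_equivalents: float, tumor_fraction: float) -> float:
    """P(at least one tumor genome is sampled), Poisson approximation."""
    return 1.0 - math.exp(-cell_equivalents * tumor_fraction)


def cells_needed(tumor_fraction: float, confidence: float = 0.95) -> float:
    """Cell equivalents needed to sample >=1 tumor genome with the given confidence."""
    return -math.log(1.0 - confidence) / tumor_fraction


for fraction in (1e-4, 1e-5, 1e-6):
    n = cells_needed(fraction)
    micrograms = n * PG_DNA_PER_DIPLOID_CELL / 1e6  # pg -> ug
    print(f"MRD level {fraction:.0e}: ~{n:,.0f} cell equivalents "
          f"(~{micrograms:.1f} ug DNA)")
```

Under these assumptions, roughly three million cell equivalents must be interrogated to have a 95% chance of sampling a single tumor genome at a level of 10−6, which shows why input DNA, rather than the sequencing chemistry alone, bounds the achievable sensitivity.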
Because of the recent advent of immunomodulatory therapies for the treatment of solid and hematologic malignancies, the need to better understand the T-cell repertoire and tumor- or drug-specific T-cell response has become imperative. The same methodology of immunosequencing described above for assessment of the repertoire of the Ig loci has been successfully applied to profiling the T-cell receptor sequences present in any sample of interest. It is possible, using two different aliquots of the same genomic DNA specimen (extracted from a given bone marrow or blood sample from a patient with myeloma), to track residual disease using the Ig loci and also determine the mature T-cell repertoire. Assessment of the diversity present in a sample can predict the likelihood of immune-mediated events in a variety of solid tumors (31). In myeloma, immunosequencing and MFC technology have shown that the anti-CD38 mAb daratumumab increased T-cell number and clonality of the peripheral blood T-cell repertoire in responsive patients (32).
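Repertoire diversity and clonality are usually summarized with a single number per sample. The sketch below uses one common convention, clonality defined as 1 minus the normalized Shannon entropy of clone frequencies; the article does not state which metric the cited studies used, so the formula, the function name, and the clone counts are assumptions for illustration only.

```python
import math
from typing import Sequence


def repertoire_clonality(clone_counts: Sequence[int]) -> float:
    """Clonality = 1 - (Shannon entropy / log(number of clones)).

    0 means a perfectly even repertoire; values near 1 mean that a few
    clones dominate. Requires at least two clones with nonzero counts.
    """
    counts = [c for c in clone_counts if c > 0]
    if len(counts) < 2:
        raise ValueError("need at least two clones with nonzero counts")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(len(counts))


# HYPOTHETICAL template counts per T-cell receptor clone (not real data).
even_repertoire = [10, 10, 10, 10]
skewed_repertoire = [97, 1, 1, 1]

print(f"even:   {repertoire_clonality(even_repertoire):.2f}")
print(f"skewed: {repertoire_clonality(skewed_repertoire):.2f}")
```

With this convention, an expansion of a few dominant T-cell clones after therapy would appear as an increase in the clonality value between paired samples.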
Metabolic response and MRD evaluation outside the bone marrow
The patchy pattern of bone marrow PC infiltration in myeloma increases the likelihood of a false negative assessment of MRD by using NGF or NGS. In addition, bone marrow evaluations fail to detect extramedullary sites of clonal PCs, which are observed more commonly now due to extended survival of myeloma patients and to more sensitive imaging studies (33, 34). Therefore, to ensure complete eradication of tumor, sensitive bone marrow–based assays should be coupled with functional imaging techniques that assess residual disease outside the bone marrow.
18Fluoro-2-deoxyglucose positron emission tomography (FDG-PET)/computed tomography (CT) is employed to evaluate and monitor response to treatment because of its ability to distinguish between metabolically active and inactive sites of disease (35). In several prior studies, posttreatment FDG-PET/CT negativity was defined according to a range of heterogeneous criteria, which included the lack of any increased FDG uptake, decreased metabolism of the tumor below different cutoffs of standardized uptake value (SUV), or the absence of metabolically active focal lesions. Nevertheless, all these studies consistently showed positive outcomes related to negative FDG-PET/CT scans in previously untreated patients after completion of therapy, either before or after autologous stem cell transplantation (ASCT), or even in the absence of high-dose therapy and subsequent ASCT (15, 36–40). Whether or not FDG-PET/CT can detect MRD after highly effective newer therapies will be addressed in future clinical trials. Finally, MRD monitoring with FDG-PET/CT may result in both false positive and false negative results (35).
Data on evaluation of response to therapy with dynamic contrast-enhanced (DCE)-MRI and diffusion-weighted imaging (DWI)-MRI are limited. On the basis of initial experiences, it is likely that whole-body DWI-MRI is highly sensitive for detection of diffuse bone marrow PC infiltration, correlates better than FDG-PET with bone marrow trephine samples, and can identify significant changes in patients in remission after therapy (41, 42). Very recently, DCE-MRI has been shown to have prognostic significance. A retrospective analysis on a limited series of newly diagnosed myeloma patients showed a good relationship between IMWG response criteria and imaging response, as evaluated by combining anatomic information from conventional MRI with functional information from DCE-MRI and DWI-MRI (43).
In the absence of more specific myeloma tracers, FDG-PET/CT remains the preferred imaging technique for monitoring metabolic response to therapy outside the bone marrow. Changes in FDG-PET/CT avidity provide an earlier evaluation of response than is possible with MRI, but future studies will compare FDG-PET/CT with DCE-MRI or whole-body DWI-MRI. In addition, MRI/PET technology is continuing to mature and may have a role in the future. Given the ability of FDG-PET/CT to explore the extramedullary compartment, this technique is complementary with sensitive bone marrow cell– and molecular-based assays for MRD assessment after therapy. Attempts to standardize the interpretation of results are ongoing (35). The combination of a negative bone marrow–based assay, negative FDG-PET/CT scan, and a normal serum-free light/heavy-chain ratio might ultimately reflect the complete eradication of myeloma cells.
Emerging technologies: Comprehensive profiling of myeloma from blood and bone marrow at low tumor burden
Characterization of myeloma requires a method to (i) isolate myeloma-derived genetic or cellular material from the peripheral blood or the bone marrow of myeloma patients with high sensitivity, (ii) enable comprehensive genomic analysis of small numbers of myeloma cells or low concentrations of tumor-derived DNA fragments, and (iii) provide information on genomic aberrations in a quantitative manner. An ideal test would detect the presence and molecular characteristics of residual myeloma subclones at low tumor burden, detect mutations or other drug targets that can guide therapy, and be easily repeated throughout treatment. Although flow cytometric analysis of bone marrow or peripheral blood has the potential to detect recurrence of myeloma and MRD with a sensitivity of 10−5 or better, it does not reveal detailed molecular information to guide therapeutic intervention. The existence of circulating myeloma cells [or using the more general term, circulating tumor cells (CTC)] has been reported (44, 45), but extensive genomic characterization has not been feasible due to the small numbers of these cells. Similarly, cell-free tumor nucleic acids circulating in blood plasma are known in myeloma (46), but technologies have not been available to deeply and broadly profile low concentrations of tumor-derived DNA fragments among a high background of DNA fragments from noncancerous cells. Interrogation of CTCs and cell-free DNA (cfDNA) would have the benefit of not requiring repeated bone marrow biopsies.
Ultra-deep sequencing of cfDNA isolated from blood plasma (Fig. 1) has recently been shown to recapitulate mutational profiles and relative subclonal composition inferred from clinical testing of matched bone marrow aspirates (96% concordance, >98% specificity; ref. 47). In this study, hybrid capture of all exons of five genes in cfDNA followed by sequencing to >20,000× detected mutations that were not present in single bone marrow aspirates but that persisted in serial blood samples at concentrations consistent with clinical course. This approach is highly scalable to encompass hundreds of frequently mutated genes, a broader panel of cytogenetic abnormalities, or whole exome or genome sequencing (48). Detecting MRD will require integration of molecular barcoding techniques to enable error suppression at levels <10−5 (49, 50) as well as expansion of the targeted genomic regions to increase detection of low-frequency mutant alleles (51), including new markers of resistant clones. In its current form, cfDNA sequencing can be a valuable adjunct to bone marrow testing and may complement emerging single-cell assays.
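The need for both error suppression and a broader target panel follows from how few mutant molecules are expected at MRD-level allele fractions. The sketch below is a simplified model, assuming error-free reads, independent sampling (a Poisson approximation), and the same allele fraction for every tracked mutation; the depth of 20,000× and the level of 10−5 come from the text, whereas the function names and the numbers of tracked mutations are illustrative assumptions.

```python
import math


def expected_mutant_reads(depth: float, allele_fraction: float,
                          n_mutations: int = 1) -> float:
    """Expected mutant reads summed over all tracked mutations."""
    return depth * allele_fraction * n_mutations


def prob_at_least_one_mutant_read(depth: float, allele_fraction: float,
                                  n_mutations: int = 1) -> float:
    """P(>=1 mutant read) under a Poisson approximation, ignoring errors."""
    expected = expected_mutant_reads(depth, allele_fraction, n_mutations)
    return 1.0 - math.exp(-expected)


# Depth and allele fraction from the text; mutation counts are hypothetical.
for n in (1, 10, 25):
    p = prob_at_least_one_mutant_read(20_000, 1e-5, n)
    print(f"{n:>2d} tracked mutation(s): P(>=1 mutant read) = {p:.2f}")
```

In this simplified model, a single mutation at 10−5 yields only 0.2 expected mutant reads at 20,000× (about an 18% chance of seeing any), whereas tracking 25 mutations raises the chance above 99%. In practice, the number of genome equivalents present in the plasma sample also caps the usable depth.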
Figure 1. Method for isolation of cfDNA from blood plasma to enable sequencing of tumor mutational profiles.
Flow cytometric analysis can be used to isolate highly purified PC clones (26) or CTCs (52). Comprehensive genomic analysis of myeloma CTCs has been made possible by a recently developed methodology using sensitive serial dilution and single-cell isolation, coupled with whole genome amplification (53, 54). This approach provides robust transcriptomic profiling to detect expression of >3,700 parameters on average, all with single-cell resolution, yielding important information about cancer drivers and therapeutic targets. There are a number of technologies to isolate CTCs (55), ranging from high-speed flow cytometric cell sorting to microfluidic cell isolation. Single myeloma CTCs are isolated with a sensitivity of 10−5, faithfully reproduce the pattern of somatic mutations present in myeloma bone marrow, and accurately identify chromosomal translocations resulting in overexpression of key myeloma-associated oncogenes. Therefore, noninvasive isolation and deep genomic characterization of single myeloma cells from peripheral blood is feasible.
The approaches described above are complex and costly at present, and further development of standards is needed. Nonetheless, technologies that aim to isolate small input materials are developing at a rapid pace, and sequencing costs are likely to decrease, which may allow for incorporation of deep molecular analysis of the peripheral blood and bone marrow to assess MRD in the near future.
Clinical Data Available from Completed Studies
The complexity of the methods used to measure MRD has limited its utility and reduced the ability to correlate MRD status with clinical outcome. To address this challenge and evaluate the impact of MRD on patient outcome, two meta-analyses of published small- to medium-size studies with reported MRD assessment in patients with newly diagnosed myeloma were performed: one from Dana-Farber Cancer Institute (DFCI)/Intergroupe Francophone du Myélome (IFM; ref. 56) and the other by Memorial Sloan Kettering Cancer Center (MSKCC)/NCI (57).
DFCI/IFM meta-analysis
In the recently published meta-analysis conducted by the DFCI/IFM group (56), a comprehensive MEDLINE search was performed of all English-language publications between 1990 and 2016. The authors identified articles that included one of the key terms, those that described controlled or randomized controlled trials, and patient cohort studies with MRD status that reported PFS or OS in 20 or more patients. Any MRD measurement methodology was allowed, as long as the limit of detection was 10−4 or lower. This is important, as studies with detection thresholds >10−4 have often failed to show a strong correlation between MRD levels and classical outcome measures. Of 430 screened records, 50 articles satisfied the initial screen parameters; of these, 18 were excluded due to lack of complete data, and 11 were excluded due to other considerations (one included relapsed myeloma patients, five included ASCT, two assessed MRD in apheresis products, one duplicate, and one methodology with insufficient sensitivity). Therefore, 21 articles met the criteria for integrated meta-analysis. The impact of MRD on PFS was reported in 14 studies (n = 1,273) and OS in 12 studies (n = 1,100). Patients achieving MRD negativity (n = 660) had better PFS compared with those who were MRD positive [n = 613; hazard ratio (HR), 0.41; 95% confidence interval (CI), 0.36–0.48; P < 0.0001], with a median PFS of 26 months for MRD-positive patients and 54 months for MRD-negative patients. Patients achieving MRD negativity had better OS compared with those who were MRD positive (HR, 0.57; 95% CI, 0.46–0.71; P < 0.0001), with a median OS of 82 months and 98 months for MRD-positive and MRD-negative patients, respectively.
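The reported HRs, CIs, and P values can be checked against one another with a short calculation. The sketch below assumes that each 95% CI is a Wald interval that is symmetric on the log scale and that the P value comes from a two-sided normal test; these are standard conventions but are assumptions here, because the article does not describe how the intervals were computed. The function name is hypothetical.

```python
import math


def log_hr_summary(hr: float, ci_low: float, ci_high: float,
                   z_crit: float = 1.96):
    """Return (log HR, standard error, z statistic, two-sided P value)."""
    log_hr = math.log(hr)
    # Width of a symmetric log-scale interval is 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return log_hr, se, z, p_two_sided


# Pooled estimates as reported in the text: (HR, lower 95% CI, upper 95% CI).
reported = {"PFS": (0.41, 0.36, 0.48), "OS": (0.57, 0.46, 0.71)}

for endpoint, (hr, lo, hi) in reported.items():
    log_hr, se, z, p = log_hr_summary(hr, lo, hi)
    print(f"{endpoint}: log HR = {log_hr:.3f}, SE = {se:.3f}, "
          f"z = {z:.1f}, P = {p:.1e}")
```

Under these assumptions, both endpoints give P values far below 0.0001, which is consistent with the values reported above.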
Focusing on MRD status in those patients achieving conventional CR, five studies have reported PFS (n = 574), and six studies have reported OS (n = 616; Supplementary Fig. S1). Achieving MRD-negative status was associated with significantly better PFS (HR, 0.44; 95% CI, 0.34–0.56; P < 0.0001) and OS (HR, 0.47; 95% CI, 0.33–0.67; P < 0.0001), with a median PFS and OS of 56 and 112 months, respectively, for MRD-negative patients and 34 and 82 months, respectively, for MRD-positive patients. Five studies evaluated MRD status before and after ASCT and found that the proportion of patients achieving MRD-negative status increased after ASCT. Similarly, two studies evaluated MRD status after maintenance therapy and found that maintenance therapy increased the proportion of patients achieving and maintaining MRD-negative status. This meta-analysis also showed that patients with favorable cytogenetics who achieved MRD-negative status had the best OS compared with patients who were either high risk or MRD positive; conversely, patients with high-risk cytogenetics who remained MRD positive had the worst outcome.
MSKCC/NCI meta-analysis
The published meta-analysis by MSKCC/NCI (57) was based on a systematic literature search conducted on December 22, 2015, for clinical trials in newly diagnosed myeloma patients with information on MRD and clinical outcomes. A predefined search strategy was applied in MEDLINE (via PubMed), EMBASE, and Cochrane's Central Register of Controlled Trials (CENTRAL). First, 390 potential studies were identified; however, 370 studies were excluded, as they were not clinical trials with MRD assessment in myeloma. Thus, 20 clinical trials of newly diagnosed myeloma patients with information on MRD and clinical outcomes were identified and assessed for inclusion in this meta-analysis. Upon careful review of the 20 identified studies (12–14, 16, 18, 21, 58–71), four studies were excluded because they reported on ASCT (61–63, 71), seven were excluded because they did not evaluate the association between MRD status and PFS and/or OS (16, 21, 64–68), four were excluded because they analyzed overlapping cohorts of patients (duplicates; refs. 13, 14, 18, 69), and one was excluded because the timing of MRD analysis was not specified (70). Four studies with information on MRD status and HR for PFS were included in the final analysis (12, 58–60).
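The study-selection flow described above can be tallied to confirm that the exclusions account for every identified trial. This is a bookkeeping sketch only; the counts are taken from the text, and the variable names and category labels are mine.

```python
# Counts as stated in the text for the MSKCC/NCI literature search.
potential_studies = 390
not_clinical_trials_with_mrd = 370
identified_trials = potential_studies - not_clinical_trials_with_mrd

exclusions = {
    "reported on ASCT": 4,
    "did not evaluate MRD status vs. PFS and/or OS": 7,
    "overlapping patient cohorts (duplicates)": 4,
    "timing of MRD analysis not specified": 1,
}

included = identified_trials - sum(exclusions.values())

print(f"identified: {identified_trials}")          # 20
print(f"excluded:   {sum(exclusions.values())}")   # 16
print(f"included:   {included}")                   # 4
```

The tally reproduces the four studies carried into the final PFS analysis.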
Despite inherent differences across included studies (e.g., eligibility criteria, use of drugs, application of MRD assays), all HRs were in the same direction and favored MRD negativity as a predictor of longer PFS. Overall, the meta-analysis shows that patients who achieved MRD negativity (vs. remained MRD positive) had better PFS (HR, 0.35; 95% CI, 0.27–0.46; P < 0.001; Supplementary Fig. S2).
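How several study-level HRs combine into one pooled estimate can be sketched with fixed-effect inverse-variance weighting on the log scale. The article does not state which pooling model was used, so the method shown is an assumption, and the three input studies below are hypothetical placeholders rather than the values from the included trials.

```python
import math
from typing import Iterable, Tuple


def pool_hazard_ratios(studies: Iterable[Tuple[float, float, float]],
                       z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance pooling of hazard ratios.

    Each study is (HR, lower 95% CI, upper 95% CI).
    Returns the pooled (HR, lower 95% CI, upper 95% CI).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for hr, ci_low, ci_high in studies:
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
        weight = 1.0 / se ** 2  # more precise studies get more weight
        weighted_sum += weight * math.log(hr)
        total_weight += weight
    pooled_log_hr = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(pooled_log_hr),
            math.exp(pooled_log_hr - z_crit * pooled_se),
            math.exp(pooled_log_hr + z_crit * pooled_se))


# HYPOTHETICAL study-level inputs for illustration only (not real data).
example_studies = [(0.30, 0.15, 0.60), (0.40, 0.25, 0.64), (0.35, 0.20, 0.61)]

hr, lo, hi = pool_hazard_ratios(example_studies)
print(f"pooled HR = {hr:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
```

Because the weights are inverse variances, the pooled interval is narrower than that of any single study, which is how a set of small trials pointing in the same direction can yield a precise combined estimate.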
As described above, four studies with information on MRD status and HRs for PFS were included in the final analysis, and three of these had information on OS (12, 58, 59). The study by Korde and colleagues had no deaths during the original follow-up window (up to 30 months) and was therefore not included (58). Thus, the studies by Paiva and colleagues and Mateos and colleagues were the only two that provided HRs for OS (12, 59). The meta-analysis showed that patients who achieved MRD negativity (vs. remained MRD positive) had better OS (HR, 0.48; 95% CI, 0.33–0.70; P < 0.001; Supplementary Fig. S2).
In conclusion, these two comprehensive meta-analyses confirm that MRD-negative status after treatment of newly diagnosed myeloma is associated with significant improvement in survival and support the integration of MRD assessment in clinical trials of myeloma, particularly using next-generation MRD technologies, as both meta-analyses are based on older studies using low-sensitivity MFC methods. These data also supported the incorporation of MRD in the newly revised response criteria. However, additional data are needed to establish the role of MRD for both registration of new agents and to inform clinical practice in myeloma. To address this need, data correlating MRD with clinical outcome from ongoing DFCI/IFM, Multiple Myeloma Research Foundation CoMMpass, International Myeloma Foundation Black Swan, and other industry clinical trials are already, or soon will be, available.
Clinical Trial Design: New Drug Approval Evidence
In the United States, two approval pathways exist for drug and biological products that treat serious and life-threatening diseases and conditions. Regular approval is granted on the basis of a demonstration of clinical benefit, that is, an effect on an endpoint that provides a direct clinical benefit to the patient (such as prolongation of life or better quality of life), or on the basis of an effect on an established surrogate for at least one of these (72–74). Accelerated Approval (AA) can be granted on the basis of a surrogate endpoint that is not considered established but is reasonably likely to predict clinical benefit. Following AA, a postmarketing study is conducted and submitted for FDA review to demonstrate that treatment with the drug or biologic is indeed associated with clinical benefit.
Both approval pathways require substantial evidence for approval from adequate and well-controlled investigations. A key difference between the two regulatory pathways may arise from uncertainty and differences in external supporting data that can be leveraged. The FDA has a long history of accepting an improvement in an established surrogate as evidence of clinical benefit, for example, a reduction in blood pressure or cholesterol. These reductions are considered so well established that they satisfy the agency's criteria for regular approval and are supported by external scientific data. Other endpoints, such as tumor size reduction, are recognized as providing “a reasonable likelihood” that the drug will have a clinical benefit. The FDA was able to grant AA to bortezomib (Velcade; Millennium/Takeda) for the treatment of myeloma in 2003, after analyzing the response data from two studies.
Two pathways exist for the regulatory acceptance of a surrogate endpoint such as MRD. One is through the formal Drug Development Tool Qualification Process. FDA's Critical Path Initiative Drug Development Tool Qualification Program provides a framework for development and regulatory acceptance of drug development tools (DDT; ref. 75). Information about a DDT that has been formally qualified for a specific context of use will be made publicly available to expedite drug development and review of regulatory applications. Within that specific context of use, the DDT can be relied upon to have a specific interpretation and application in drug development, such that the qualified DDT can be included in Investigational New Drug (IND) or New Drug Application/Biologic License Application (NDA/BLA) submissions without the need for the Center for Drug Evaluation and Research (CDER) to reconsider and reconfirm the suitability of the DDT.
Alternatively, regulatory acceptance of a surrogate endpoint can occur through collaboration with a specific review division either within a specific drug development program or outside the context of a specific application. Within a specific drug development program, a pharmaceutical sponsor will meet with the agency to present scientific data supportive of the proposed surrogate endpoint. These discussions include the disease or condition, prior FDA use of the novel endpoint, endpoint definition, whether the endpoint can be reliably measured, whether the assay is validated and standardized, and the extent of the scientific literature. If the endpoint is an assay to be used for treatment decisions, an assay device platform may need to be approved as well. Examples of this surrogate acceptance pathway include the development of the novel endpoint major molecular response in chronic myelogenous leukemia (CML) and pathologic complete response in neoadjuvant breast cancer.
The specific requirements needed to support acceptance of a surrogate endpoint are determined on a case-by-case basis; however, there are common concepts that guide the FDA's determination. When determining whether a potential surrogate endpoint is reasonably likely to predict clinical benefit to support AA, the FDA will evaluate the provided scientific rationale explaining the relationship between the proposed surrogate endpoint, the disease, and the clinical benefit effect (74). There must be empirical evidence to support this relationship, which could include epidemiologic, pathophysiologic, therapeutic, pharmacologic, or other evidence; clinical data should also be provided (74), including the analytic validity of the test that measures the surrogate endpoint biomarker proposed.
Furthermore, FDA acceptance of a surrogate endpoint for regulatory use in a clinical trial and the FDA requirement for an approved in vitro diagnostic (IVD) device for widespread clinical use are two different regulatory processes, and the requirements for both regulatory processes should be considered. Sometimes, the requirements of both regulatory processes must be met. A key element of the DDT program for determining whether a biomarker is suitable under its stated context of use is to understand the assay or tools that will assess that biomarker, and whether the assay used is analytically valid. In evaluating new biomarkers for qualification or within a specific drug development program, CDER works closely with the Office of In Vitro Diagnostics and Radiological Health in the Center for Devices and Radiological Health (CDRH). CDRH evaluates the suitability and specifications of the assay(s) employed to assess the qualified biomarker in clinical trials to obtain robust and reproducible results.
Between 2012 and 2014, the FDA held several workshops on the use of MRD testing in hematologic malignancies (76–79). These workshops identified several issues with the regulatory use of MRD. One such issue is the standardization of MRD testing. The new IMWG criteria are helpful in addressing this issue (20). However, the variable approaches previously used have created uncertainty regarding the relationship between MRD and clinical benefit endpoints of interest and specifically the threshold(s) that corresponds to these clinical benefit endpoints. It is important to ensure that lower levels of detection actually correlate better with OS, especially if the lower level of MRD negativity is achieved only through treatment with drugs that produce more toxicity. Similarly, in settings where there are salvage therapies with significant activity, attaining deeper levels of MRD negativity may not be necessary and may even be harmful, and there is some anecdotal evidence in myeloma that suggests that patients who have MRD-positive disease may still have good clinical outcomes (26, 80). The FDA is agnostic to the specific methodology used for assessment of MRD but requires that adequate evidence exist supporting the methodology, the thresholds used, and the relationship to a clinical benefit endpoint.
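The dependence of an MRD result on the chosen threshold can be made concrete with a small sketch. The function name, the cell counts, and the rule that at least `one_in` cells must be evaluated before a call is made are illustrative assumptions, not IMWG or FDA definitions; validated assays apply their own, stricter limits of detection.

```python
def mrd_call(clonal_cells: int, total_cells: int, one_in: int) -> str:
    """Classify a sample at a threshold of 1 clonal cell per `one_in` cells.

    Hypothetical simplification: a sample is 'positive' when the measured
    clonal fraction is at or above the threshold, and 'not evaluable' when
    too few cells were analyzed to observe even one cell at that frequency.
    """
    if total_cells < one_in:
        return "not evaluable"
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return "positive" if clonal_cells * one_in >= total_cells else "negative"


# Hypothetical sample: 6 clonal cells among 2,000,000 evaluated (3 per million).
at_1e5 = mrd_call(6, 2_000_000, one_in=100_000)    # threshold of 10^-5
at_1e6 = mrd_call(6, 2_000_000, one_in=1_000_000)  # threshold of 10^-6
print(at_1e5, at_1e6)
```

Under these assumptions the same sample is called negative at one threshold and positive at the other, which is why the threshold used must be reported alongside any MRD result.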
Other issues impacting the understanding of MRD in myeloma include the role of other factors in the relationship between MRD and clinical benefit measurements. Multiple studies have demonstrated that factors such as CR status, risk profile, or cytogenetics influence this relationship (14, 16). These other factors, which play a role in the clinical benefit endpoint, are especially important to consider to allow further elucidation of the strength of the relationship between MRD and clinical benefit.
Applicants are encouraged to meet with FDA scientists in presubmission meetings, and throughout the development process, to receive specific advice on design and evidence requirements including patient selection; standardization of protocols and assays across testing sites; analytic validation requirements (e.g., clinical decision cutoff points, limit of detection, linearity, and specificity); commercialization of an IVD assay; assay use in clinical practice (e.g., as aids in patient monitoring); technical methods used to measure the endpoint (e.g., for MRD, NGF or NGS); missing data; and other factors, potentially independent of the surrogate endpoint, that can affect clinical benefit measurement.
If MRD is ultimately accepted as a surrogate reasonably likely to predict clinical benefit of novel therapeutics for myeloma, and is used as the primary endpoint in trials for AA, there may be a requirement to conduct a confirmatory trial to verify the anticipated clinical benefit. At the time of initial AA, the confirmatory trial should already be initiated.
It is important to note that in the area of myeloma, MRD has multiple potential uses. Again, if accepted as a surrogate endpoint, it could be used as a clinical trial endpoint in a variety of different clinical settings. In the relapsed/refractory and newly diagnosed setting, many novel therapies are often evaluated in combination with established therapies. If a randomized trial is conducted comparing a novel therapy with an active regimen, MRD may be useful to distinguish the depth of response obtained with the novel therapy as compared with the established regimen (see Fig. 2). This could be particularly useful when the comparator is an active control, with a relatively high response rate already. The ability to detect an improvement in response may be limited; however, there may be a difference in the depth of response. For this application of MRD, a better understanding of what magnitude of MRD negativity difference is clinically meaningful would be needed.
Schema for a potential randomized trial comparing a novel therapy in relapsed/refractory multiple myeloma (R/R MM) to an active regimen. AB = drug A + drug B combination. MRD may be useful as an endpoint for accelerated approval that can then be converted to regular approval pending confirmation in a follow-up context using endpoints with more well-accepted clinical benefit. ORR, overall response rate.
In the area of smoldering myeloma, MRD could be particularly useful, given the long time to achieve conventional endpoints either in the form of PFS or OS. The same is true for the maintenance setting because of the long times to outcome readout. For trials with registration intention, MRD could also be used for patient selection. Particularly poor clinical outcomes have been observed in patients with high-risk cytogenetics who are MRD positive after treatment. MRD status together with cytogenetic risk could be used as a trial enrichment strategy, such that these patients are selected for additional treatment (14).
Clinical Trial Design: Change Clinical Practice with Existing Drugs
The myeloma response criteria developed over a decade ago by the IMWG are based on serum and urine protein electrophoresis, serum and urine immunofixation, serum free light chain (sFLC), and bone marrow assessment with PC quantitation (81, 82). The consensus criteria were uniformly incorporated into routine clinical practice as well as clinical trials, allowing for better comparison of different drugs, drug combinations, and treatment strategies. Because of the remarkable changes in treatment approaches and improved results, new methods are urgently required to refine current clinical response criteria. The IMWG has, therefore, recently updated the response criteria, incorporating more sensitive approaches for assessment of residual disease both in and outside of the bone marrow (ref. 20; see Supplementary Table S1). The revised criteria now specifically include a new category of MRD negativity in the bone marrow, which is further refined as imaging negative by assessment of the extramedullary compartment using sensitive imaging techniques. Although rapid advances in technology will allow detection of smaller and smaller amounts of disease, it is important to utilize a feasible standard set of criteria that can be applied today both in clinical trials and in clinical practice. These new response criteria allow us to apply such standards presently, and future trials will further define their role in clinical practice. It will be important to begin including these new techniques in clinical trials to gain information on their best uses in clinical settings.
Although achievement of MRD negativity should portend better outcomes, these techniques should also contribute to better treatment monitoring and facilitate decisions in the clinic. Broadly, the potential utility of MRD testing can be related to its use (i) as a prognostic factor, (ii) as a tool for altering treatment approaches/making treatment decisions, (iii) in defining success of a defined set of treatments to allow for comparison of efficacy between approaches, and (iv) for informing timing of treatment interventions.
Use of MRD as a prognostic tool
Multiple studies have shown that achievement of MRD negativity after a variety of different therapies, and using a variety of different techniques, is associated with better survival outcomes. The more sensitive the technique and the deeper the assessed depth of response, the better the survival (21). However, there is significant heterogeneity in the existing literature regarding the value of the depth of response and outcomes. One of the biggest obstacles to interpreting current data is the timing of MRD assessment in the different studies and even within the same study. Bone marrow evaluation has traditionally been performed once the serum and urine measurements turn negative, and, hence, the timing was quite dependent on the kinetics of response. A longer time to the bone marrow and MRD assessment guarantees survival until that time and skews the results, as is often seen in responder analyses. More recent studies, especially where MRD testing has been prospectively incorporated into the study design, have addressed this issue by use of timed bone marrow evaluations and landmark analyses from the time of MRD assessment.
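A landmark analysis of the kind described above can be sketched as follows. The `Patient` fields, the example cohort, and the 9-month landmark are hypothetical and serve only to illustrate the mechanics: patients with an event or censoring before the landmark are excluded, MRD status is fixed as of the landmark, and follow-up is measured from the landmark rather than from the start of therapy.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    pfs_months: float                 # months from start of therapy to event or censoring
    progressed: bool                  # True if progression/death observed, False if censored
    mrd_negative_at: Optional[float]  # month MRD negativity was first documented, or None


def landmark_groups(
    patients: List[Patient], landmark: float
) -> Tuple[List[Tuple[float, bool]], List[Tuple[float, bool]]]:
    """Return (MRD-negative, MRD-positive) groups as (time_from_landmark, event) pairs."""
    negative, positive = [], []
    for p in patients:
        if p.pfs_months <= landmark:
            continue  # not at risk at the landmark, so excluded from both groups
        record = (p.pfs_months - landmark, p.progressed)
        # Only MRD negativity documented by the landmark counts; later
        # conversions stay in the MRD-positive group to avoid guaranteed-time bias.
        is_negative = p.mrd_negative_at is not None and p.mrd_negative_at <= landmark
        (negative if is_negative else positive).append(record)
    return negative, positive


# Hypothetical cohort for illustration only.
cohort = [
    Patient(6.0, True, None),    # progressed before the landmark: excluded
    Patient(30.0, False, 8.0),   # MRD negative before the landmark
    Patient(20.0, True, 12.0),   # became MRD negative only after the landmark
    Patient(15.0, True, None),   # never MRD negative
]
mrd_neg, mrd_pos = landmark_groups(cohort, landmark=9.0)
print(mrd_neg, mrd_pos)
```

The two groups returned here would then be compared with a standard time-to-event method; that step is omitted from the sketch.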
Such timed evaluation also gives insight into a group of patients not well characterized to date. This small fraction of patients may not be negative by serum and urine studies but are negative on bone marrow assessment by the most sensitive MRD detection techniques. This discrepant result may be due to the long half-life of the Ig proteins or an inaccurate estimate of the bone marrow due to sampling issues, with the former being more probable due to the favorable outcome of MRD-negative yet immunofixation-positive patients. It is important to delineate the natural history of this group of patients in prospective trials that define the appropriate timing of MRD assessment in relation to other markers of response.
The lack of uniform assessment across entire cohorts, both in terms of timing and the types of tests used, prevents full assessment of the impact of MRD results on outcomes. Moreover, many patients reported in retrospective studies are highly selected. Prospective studies, or prospectively studied patients from prior studies that uniformly address MRD measurements, will allow for an intention-to-treat analysis and will permit this difficulty to be overcome.
What kind of studies can address these deficiencies in understanding the role of MRD testing in myeloma? Studies could be limited to existing datasets, where the assessment and outcomes are available for all patients initially selected for a particular treatment. These retrospective studies can be used to compare different treatment approaches, which can in turn inform future trials. They can also help to determine whether the type of treatment has an independent effect on outcomes, that is, is MRD negativity the same irrespective of the treatment used to achieve it? MRD assessments done at different time points using the same methodology can be compared to inform the best timing of MRD assessment. Furthermore, analysis of existing data will allow a better determination as to how long MRD negativity needs to be sustained before the notion of cure can be considered in these patients. This last point is particularly relevant, as patients are living longer with myeloma, and current trials could take a decade or longer to provide an answer unless we assess this biomarker response.
Indeed, the proportion of patients attaining MRD negativity in myeloma is probably the strongest determinant of the outcome. Importantly, none of these studies will answer the most important question: whether the ability of any given patient to achieve MRD negativity with a particular therapy is more of a reflection of disease biology, or if MRD-negative state should be a treatment goal that will translate into better outcome for that individual patient.
Using MRD results for making treatment decisions
This is the most important aspect of MRD assessment from a patient and caregiver standpoint: What are you going to do with the result? Unfortunately, few data are currently available to guide us in the use of MRD in routine management of myeloma. Moreover, the same applies for conventional CR, as in most cases this also does not affect the treatment decision process. Future studies should link MRD to decision points as to how to manage the different phases of treatment. As described above, meta-analyses of all available studies do indicate that MRD negativity portends prolonged PFS and OS, but MRD as a goal of therapy should be better defined. MRD testing may be useful to inform clinical decision making in the following clinical contexts: (i) defining the timing of high-dose therapy (HDT) in transplant-eligible patients, (ii) determining the need for consolidation therapy following HDT, (iii) delineating the duration of maintenance therapy, and (iv) identifying the need for changing therapy anywhere along this continuum. The following examples highlight potential designs for each of these decision nodes.
Timing of HDT.
Considerable controversy exists as to the ideal depth of response required for a patient to proceed to HDT. The current approach, partially based on the failure of bone marrow graft purging methods and retrospective studies showing lack of benefit to additional therapy prior to HDT in patients not achieving a PR after induction therapy, is to proceed to HDT irrespective of the depth of pretransplant response. At present, however, the potential of achieving an MRD-negative status before transplant will reopen examination of the importance of graft contamination. More importantly, it will also raise questions as to the benefit of proceeding to transplant early after initial therapy versus delaying HDT to time of relapse. A reasonable study design would be to enroll patients who are MRD positive after four cycles of induction and then randomize them to receive either four more cycles of induction or only induction until MRD negativity is achieved, and then proceed to HDT. Those who are negative for MRD could be randomized to early versus delayed HDT. This trial design will require a concerted approach by several groups in collaboration to obtain large numbers of patients with long follow-up (Fig. 3A).
A, Role of MRD in timing of HDT therapy in multiple myeloma schema 1. B, Role of MRD in post-HDT consolidation schema 2. C, Role of MRD in maintenance therapy in multiple myeloma schema 3. D, Role of MRD in duration of therapy in non-transplant setting schema 4.
Post-HDT consolidation.
Increasingly, sensitive tests demonstrate residual disease after one cycle of HDT, and randomized studies have shown that additional therapy following the initial HDT, such as tandem transplant, additional cycles of induction, or new treatment regimens, can further deepen response. Although randomized trials have not clearly demonstrated a survival advantage for these approaches, it remains a clinical setting in which MRD testing could inform treatment decisions. A simple trial design would take all patients post-single HDT, do MRD testing, and then randomize patients to receive consolidation versus no consolidation therapy, with stratification based on MRD status. Such a trial will determine whether consolidation has benefit depending on the MRD status. Alternatively, one could enroll all patients post-single HDT, do an MRD test, and randomize MRD-negative patients to no consolidation versus specified cycles of consolidation. Patients who are MRD positive can be randomized to the same consolidation therapy or a more intense approach (Fig. 3B).
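Randomization with stratification on MRD status, as in the first design above, can be sketched with permuted blocks within each stratum. The permuted-block method, the block size of 4, the arm labels, and the patient identifiers are all assumptions made for illustration; the text does not specify an allocation method.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def stratified_block_randomize(
    patients: Iterable[Tuple[str, str]],
    arms: Sequence[str] = ("consolidation", "no consolidation"),
    block_size: int = 4,
    seed: int = 0,
) -> Dict[str, str]:
    """Assign each (patient_id, stratum) pair to an arm using permuted blocks per stratum."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)    # fixed seed so the sketch is reproducible
    pending = defaultdict(list)  # stratum -> assignments left in the current block
    assignment = {}
    for patient_id, stratum in patients:
        if not pending[stratum]:
            # Start a new block containing each arm equally often, in random order.
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            pending[stratum] = block
        assignment[patient_id] = pending[stratum].pop()
    return assignment


# Hypothetical post-HDT cohort: 8 MRD-negative and 8 MRD-positive patients.
cohort = [(f"N{i}", "MRD-negative") for i in range(8)]
cohort += [(f"P{i}", "MRD-positive") for i in range(8)]
allocation = stratified_block_randomize(cohort)
```

Because blocks are filled separately within each stratum, the arms stay balanced among MRD-negative and MRD-positive patients alike, which is what allows the effect of consolidation to be examined by MRD status.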
Duration of maintenance.
Maintenance approaches following HDT are increasingly being used on the basis of results of phase III trials. The ideal duration of maintenance therapy is unknown. This is an important question, as prolonged maintenance may be associated with increased cost of care and long-term toxicities, without significant incremental benefit over limited duration of maintenance. How can we address this question? A simple trial design would enroll patients post-single HDT and a specified duration of maintenance (1 year or 2 years), do MRD testing, and then randomize patients to receive continued maintenance versus no maintenance therapy, stratifying based on MRD status. Such a trial will determine whether indefinite maintenance has benefit dependent on the MRD status. Alternatively, all patients could be treated with a specified duration of maintenance (1 year or 2 years) and have an MRD test, followed by randomization of MRD-negative patients to receive either no more maintenance or limited additional cycles of maintenance. Patients who are MRD positive can be randomized to indefinite maintenance or switching therapy to a new class of drug(s) (Fig. 3C).
Duration of therapy in non-transplant setting.
In the setting of non-transplant–eligible patients, there is increasing adoption of prolonged therapy until disease progression rather than a fixed duration of therapy, as had been the prior practice in myeloma. The data from clinical trials continue to be mixed, with no clear message emerging. In the FIRST trial (83) comparing 18 months of lenalidomide to lenalidomide until progression, PFS was superior for indefinite lenalidomide, with no improvement in OS. In a meta-analysis of Italian trials, Palumbo and colleagues (84) demonstrated improved OS and PFS with continuous therapy compared with fixed duration therapy. Given the lack of definitive data, it is important to address this question, and using MRD as a clinical decision point will allow us to clearly identify patient groups who will benefit from such an approach and those who will be better served by a limited duration of therapy, sparing them the toxicity and the expense associated with extended treatment. How can we address this question? A simple trial design would enroll patients who have received a defined duration of therapy (e.g., 18 months), do MRD testing, and then randomize patients to receive continued therapy versus observation, stratifying based on MRD status. Such a trial will determine whether indefinite therapy has benefit dependent on the MRD status. Alternatively, patients could have an MRD test, followed by randomization of MRD-negative patients to observation versus continued therapy. Patients who are MRD positive can be randomized to indefinite therapy or switching therapy to a new class of drug(s) or adding another drug from a new class (Fig. 3D).
Defining success of a treatment approach
A challenge today is developing clinical trials with endpoints that are achievable in a time frame that can be beneficial for patients currently undergoing treatment. To demonstrate the efficacy of a particular drug, a clinical trial design either compares it with a standard-of-care (SOC) drug or compares combinations of the investigational drug with an SOC drug against an SOC drug alone, with PFS as a commonly used endpoint. Although response rates are also used in some situations to get AA of new drugs in patient populations where no alternative treatment options exist, there are multiple new treatment options in myeloma. PFS is considered a better index of efficacy, as it is a composite measure of the depth of response, the response rate, any toxicity that could lead to early discontinuation or death, and durability of response. Moreover, PFS in relapsed disease has steadily climbed to over 2 years in the recent ASPIRE trial (85), making future trials targeting PFS long, complicated, and costly. This is even more of an issue in the newly diagnosed myeloma setting, where clinical trials are not typically examining if a new drug is effective in the disease but whether early incorporation of the drug into the current treatment paradigm leads to better survival outcomes. In this setting, OS is a better index, as eventually patients would have the option of using the drug subsequently (in second-line or greater therapy) if early use of a given drug does not confer benefit. Importantly, with median OS rapidly improving due to the availability of novel treatment options, OS benefit is increasingly difficult to obtain in an expeditious fashion, thereby limiting patient access to novel therapies. Therefore, there is an urgent need for early markers of efficacy that can be reliably linked to these longer term outcomes.
MRD assessment has the potential to be such an early marker of later outcomes, which will require stringent validation in well-designed prospective trials or analysis of existing clinical trials that systematically include MRD assessment. The latter approach would be more expeditious, as many large phase II and III trials initiated in the past 2 to 3 years have integrated MRD testing (see Supplementary Table S2).
Defining the timing of treatment intervention
Using MRD negativity as the goal of initial therapy will raise similar questions as to whether MRD might inform retreatment at time of relapse as well. Currently, our treatment paradigm often delays treatment of relapsing patients until redevelopment of CRAB features [C, calcium (elevated); R, renal failure; A, anemia; B, bone lesions] instead of relying on biochemical markers alone. However, if MRD negativity is a stepping-stone to cure, then reappearance of MRD represents a failure to achieve that goal due to inadequacy of the initial treatment approach and/or a fundamental inability to eradicate disease due to intrinsic disease biology.
A potential clinical trial design will enroll patients reaching an MRD-negative state and then randomize to early intervention as soon as the MRD tests become positive versus intervention at the time of objective criteria for disease progression. Such a study will require regular, timed assessment of MRD as well as myeloma blood protein and bone marrow profiling. Endpoints for such a study will be confounded unless the type of salvage therapy is mandated. With that in place, the study will determine whether reattainment of MRD negativity portends increased OS.
Conclusions and Future Directions
Significant progress has been made in the development of MRD technologies that reliably and reproducibly achieve 10⁻⁵ to 10⁻⁶ sensitivity for tumor cell detection and are predictive of both PFS and OS in myeloma. Specifically, two recently published meta-analyses provide quantitative data to support the integration of MRD assessment as an endpoint in clinical trials of myeloma. Indeed, the inclusion of MRD in the recently revised IMWG response criteria represents a strong recognition by the international myeloma community of the urgent need for MRD monitoring to both facilitate new drug development and inform clinical practice. What is required to achieve these goals? As highlighted earlier, additional data are needed to establish MRD surrogacy. A collaborative process with the FDA, similar to that used to define the role of pathologic CR in neoadjuvant breast cancer, may further advance this goal. The recent effort that led to CR30 acceptance as a surrogate for PFS in first-line follicular lymphoma from the FLASH analysis (86) provides an additional example of a collaborative effort that could be used to define the utility of MRD in myeloma. Such collaborative efforts with the FDA are aligned with the White House Precision Medicine Initiative (PMI) and Vice President Biden's Cancer “Moonshot” priorities. Importantly, the establishment of such a clinically annotated broad database in myeloma could answer many outstanding questions surrounding the use of MRD in myeloma. Should MRD be assessed only in patients who are in CR? Given the impact of risk factors on MRD, does MRD negativity in patients with high-risk disease bear the same impact as in those with standard risk? What is the appropriate timing of MRD assessment? How will novel agents, especially immune-based strategies, affect the MRD assessment?
Moreover, maintenance of such a database will be useful to assess the utility of novel technologies to measure MRD, such as imaging, cfDNA, or single-cell sequencing.
Although progress in myeloma treatment and patient outcome has been remarkable in the recent past, continued progress will require use of MRD for new drug AA. It may not be practical to utilize PFS as an endpoint in future registration trials. Importantly, incorporating MRD into clinical trials is also necessary to define its role to inform clinical practice in myeloma. The time is now for the community of clinical researchers and caregivers, biotechnology and pharmaceuticals, NIH, and FDA to come together and fulfill the promise of MRD to ensure continued improvements in patient outcome in myeloma.
Disclosure of Potential Conflicts of Interest
K.C. Anderson is a consultant/advisory board member for Bristol-Myers Squibb, Celgene, Gilead, and Millennium/Takeda. S.K. Kumar is a consultant/advisory board member for Abbvie, Amgen, Celgene, Janssen, Skyline, and Takeda. M. Cavo is a consultant/advisory board member for Amgen, Bristol-Myers Squibb, Celgene, Janssen, Novartis, and Takeda. R.D. Loberg holds ownership interest (including patents) in Amgen. J.L. Omel is a consultant/advisory board member for Takeda. N. Valente holds ownership interest (including patents) in Roche. E. Zamagni reports receiving speakers bureau honoraria from and is a consultant/advisory board member for Amgen, Bristol-Myers Squibb, Celgene, and Janssen. No potential conflicts of interest were disclosed by the other authors.
Disclaimer
This article reflects the views of the authors and should not be construed to represent the FDA's views or policies.
Authors' Contributions
Conception and design: K.C. Anderson, D. Auclair, G.J. Kelloff, N.J. Gormley, S.K. Kumar, O. Landgren, N.C. Munshi, S.I. Gutman, M.A. Hussein, I.R. Kirsch, R.F. Little, R.D. Loberg, J.L. Omel
Development of methodology: K.C. Anderson, D. Auclair, H. Avet-Loiseau, S.K. Kumar, O. Landgren, N.C. Munshi, I.R. Kirsch, R.F. Little, J.L. Omel, T.J. Pugh
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): K.C. Anderson, C.C. Sigman, O. Landgren, F.E. Davies
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): K.C. Anderson, C.C. Sigman, O. Landgren, N.C. Munshi, M. Cavo, F.E. Davies, M.A. Hussein, R.D. Loberg, G.H. Reaman
Writing, review, and/or revision of the manuscript: K.C. Anderson, D. Auclair, G.J. Kelloff, C.C. Sigman, H. Avet-Loiseau, A.T. Farrell, N.J. Gormley, S.K. Kumar, O. Landgren, N.C. Munshi, M. Cavo, F.E. Davies, A. Di Bacco, J.S. Dickey, S.I. Gutman, H.R. Higley, M.A. Hussein, J.M. Jessup, I.R. Kirsch, R.F. Little, R.D. Loberg, J.G. Lohr, L. Mukundan, J.L. Omel, T.J. Pugh, G.H. Reaman, M.D. Robbins, A.K. Sasser, N. Valente, E. Zamagni
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): K.C. Anderson, J.S. Dickey, H.R. Higley, J.M. Jessup, L. Mukundan
Study supervision: K.C. Anderson, G.J. Kelloff
Acknowledgments
The authors gratefully acknowledge the FNIH Biomarkers Consortium Cancer Steering Committee and the Multiple Myeloma Research Foundation for providing the venue for this collaboration.