Abstract
Drug development can be associated with slow timelines, particularly for rare or difficult-to-treat solid tumors such as glioblastoma. The use of external data in the design and analysis of trials has attracted significant interest because it has the potential to improve the efficiency and precision of drug development. A recurring challenge in the use of external data for clinical trials, however, is the difficulty in accessing high-quality patient-level data. Academic research groups generally do not have access to suitable datasets to effectively leverage external data for planning and analyses of new clinical trials. Given the need for resources to enable investigators to benefit from existing data assets, we have developed the Glioblastoma External (GBM-X) Data Platform, which will allow investigators in neuro-oncology to leverage our data collection and obtain analyses. GBM-X strives to provide an uncomplicated process to use external data, contextualize single-arm trials, and improve inference on treatment effects early in drug development. The platform is designed to welcome interested collaborators and integrate new data, with the expectation that the data collection can continue to grow and remain updated. With such features, GBM-X is designed to help accelerate evaluation of therapies, to grow with collaborations, and to serve as a model to improve drug discovery for rare and difficult-to-treat tumors in oncology.
External data in the design and analysis of trials have the potential to improve the efficiency and precision of drug development. Given a need for mechanisms to enable investigators to benefit from high-quality patient-level data of patients with newly diagnosed glioblastoma, the Glioblastoma External (GBM-X) Data Platform is a tool designed to allow neuro-oncology investigators to leverage our data collection and obtain analyses. GBM-X strives to provide an uncomplicated process to use external data, contextualize single-arm trials, and improve inference on treatment effects early in drug development. The GBM-X Data Platform can serve as a model to improve drug discovery for rare and difficult-to-treat tumors in oncology.
Introduction
In oncology, drug development is associated with slow timelines, particularly for rare or difficult-to-treat solid tumors such as glioblastoma. Randomized controlled trials (RCTs) constitute the gold standard for therapeutic testing, but they can have limitations, particularly in the early stages of development, including high cost and prolonged timelines. For disease settings such as gliomas, single-arm trials are commonly used but are associated with risks of bias and inaccurate decision making in early phase trials (1). There is therefore a need for novel approaches and statistical designs to evaluate candidate therapies and to support decision making throughout the development pathway of novel treatments.
Leveraging external data with patient-level information has been proposed as a potential strategy to accelerate the development of new therapies, particularly for rare or difficult-to-treat cancers (2). Externally augmented clinical trial (EACT) designs incorporate prespecified external data with patient-level information in the analysis plan of a clinical trial. Several feasibility studies have shown that external datasets could accelerate the evaluation of new therapies (2, 3), and this has garnered interest from regulatory agencies (4, 5). The most direct application is to use an external control arm to contextualize the results of single-arm trials, and this has been explored across several indications in oncology (2, 3, 5). The use of statistically valid adjustment methods to remove the influence of confounding covariates, which differ in the trial and external populations, can facilitate the evaluation of novel treatments with greater precision compared with conventional single-arm trial analyses. While other uses of external data are largely unexplored, external control data have the potential to bolster decision making in clinical trials, including possible applications in interim analyses (6), subgroup analyses, hybrid randomized trials, and selection of the sample size and other design characteristics.
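To illustrate why covariate adjustment matters when a single-arm trial is contextualized with an external control, the following minimal sketch contrasts a naive comparison with a standardized one. It assumes a binary outcome (alive at 12 months) and a single binary prognostic covariate. The patient records, the field names `mgmt_methylated` and `alive_12m`, and the function names are hypothetical and invented for illustration. Direct standardization is chosen only because it is easy to follow; it is not presented as the adjustment method used in the cited studies.

```python
from collections import defaultdict


def make_patients(n, mgmt_methylated, n_alive):
    """Build n toy records; the first n_alive patients are alive at 12 months."""
    return [
        {"mgmt_methylated": mgmt_methylated, "alive_12m": i < n_alive}
        for i in range(n)
    ]


def rate(records):
    """Proportion of patients alive at 12 months."""
    return sum(r["alive_12m"] for r in records) / len(records)


def standardized_control_rate(trial, external, covariate):
    """External-control outcome rate reweighted to the trial's covariate mix."""
    strata = defaultdict(list)
    for r in external:
        strata[r[covariate]].append(r)

    total = 0.0
    for level in sorted({r[covariate] for r in trial}):
        if level not in strata:
            raise ValueError(f"no external patients with {covariate}={level!r}")
        trial_share = sum(r[covariate] == level for r in trial) / len(trial)
        total += trial_share * rate(strata[level])
    return total


# Toy data (invented): the trial enrolls mostly MGMT-methylated patients,
# whereas the external control is mostly unmethylated.
trial = make_patients(8, True, 6) + make_patients(2, False, 1)
external = make_patients(10, True, 7) + make_patients(30, False, 12)

naive_difference = rate(trial) - rate(external)
adjusted_difference = rate(trial) - standardized_control_rate(
    trial, external, "mgmt_methylated"
)

print(f"naive difference:    {naive_difference:.3f}")     # 0.225
print(f"adjusted difference: {adjusted_difference:.3f}")  # 0.060
```

In this toy example most of the apparent benefit disappears once the external control is reweighted to the trial's covariate distribution, which is the kind of false-positive signal that adjusted analyses aim to reduce.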
Applications of External Data in Neuro-Oncology
The use of external data has attracted significant interest in neuro-oncology (7). Initial efforts have focused on glioblastoma, a disease with a dismal prognosis and in need of therapeutic advances. Of note, despite a long history of poor decision making leading to repeated failed phase III trials, single-arm studies continue to be common in glioblastoma early phase trials (1). Retrospective analyses evaluating the use of external control arms have demonstrated potential value in glioblastoma, with reduction of false-positive results compared with standard analyses of single-arm trials (2). Other applications of external control data in neuro-oncology clinical trials, ranging from interim decisions to the analysis of randomized trials, have been discussed previously (2, 6, 7). A prospective phase II single-arm trial analyzed with an external control dataset has recently been reported in recurrent glioblastoma, and a subsequent registrational trial that will leverage external data in a hybrid randomized design has been announced (8).
In developing designs that use external controls for newly diagnosed glioblastoma trials, a recurring challenge we have encountered is the difficulty of accessing high-quality patient-level data. These datasets ideally come from previously completed clinical trials or alternatively from real-world repositories such as institutional libraries of well-annotated datasets. Academic research groups generally will not have access to suitable datasets to effectively leverage external data for planning and analyses of new clinical trials. Data sharing has long been difficult, and barriers to data sharing have been discussed at length, including concerns related to patient privacy, academic credit, data-sharing infrastructures, costs, data standards, and inappropriate secondary analyses (9). Datasets of previously completed clinical studies, including trials conducted with support from federal agencies, remain difficult to access for the outlined purposes. In our experience, years can elapse when requesting data from investigators, industry, cooperative groups, and other organizations. Moreover, after pursuing the process to request data, it is difficult to predict whether approval will be granted or whether all requested data elements will be provided. These challenges affect the balance between costs (including the time and dedicated personnel needed to use these data sources) and the potential efficiencies that could be achieved by integrating external patient-level data in future clinical trials. Of note, in the literature (10) and in our discussions with patient advocacy groups in neuro-oncology, there is support for the use of data from completed trials for further clinical research (7).
Development of a Glioblastoma Data Platform
In light of the potential of external data in oncology research and preferences of patients to have their data used outside of a single study, we need practical solutions to reduce barriers raised by industry and academia. Thus, we recognized the need for mechanisms to enable investigators to benefit from existing data assets. With this in mind, we have developed the Glioblastoma External (GBM-X) Data Platform (https://rconnect.dfci.harvard.edu/gbmdata/) that will allow investigators in neuro-oncology to leverage our data collection. Current datasets include deidentified individual patient-level data of over 1,200 patients with newly diagnosed glioblastoma with relevant pretreatment covariates, extracted from six clinical trials and two institutional databases (Table 1). All these datasets include patients who received the standard-of-care therapy of radiotherapy and temozolomide for newly diagnosed glioblastoma.
Characteristic | Total | Cho et al. 2011 (PMID: 22001862) | DFCI GBM | NCT00441142 | NCT00689221 | NCT00813943 | NCT00943826^a | NCT02977780 | UCLA/KP GBM |
---|---|---|---|---|---|---|---|---|---|
n (% of total) | 1,662 | 16 (1) | 625 (38) | 27 (2) | 272 (16) | 88 (5) | 459 (28) | 69 (4) | 106 (6) |
Age | |||||||||
Median (range) | 58 (17–94) | 58 (36–69) | 60 (17–94) | 55 (26–73) | 57 (21–78) | 57 (21–74) | 56 (18–79) | 59 (24–75) | 58 (20–79) |
<65 | 1,277 (77) | 12 (75) | 442 (71) | 25 (93) | 219 (81) | 77 (88) | 378 (82) | 54 (78) | 70 (66) |
≥65 | 385 (23) | 4 (25) | 183 (29) | 2 (7) | 53 (19) | 11 (12) | 81 (18) | 15 (22) | 36 (34) |
Sex | |||||||||
Female | 684 (41) | 8 (50) | 270 (43) | 12 (44) | 129 (47) | 34 (39) | 165 (36) | 28 (41) | 38 (36) |
Male | 978 (59) | 8 (50) | 355 (57) | 15 (56) | 143 (53) | 54 (61) | 294 (64) | 41 (59) | 68 (64) |
KPS | |||||||||
<90 | 481 (29) | 7 (44) | 245 (39) | 5 (19) | 120 (44) | 49 (56) | — | 24 (35) | 31 (29) |
90–100 | 646 (39) | 9 (56) | 304 (49) | 22 (81) | 152 (56) | 39 (44) | — | 45 (65) | 75 (71) |
Unknown | 535 (32) | — | 76 (12) | — | — | — | 459 (100) | — | — |
RPA | |||||||||
3 | 234 (14) | 2 (12) | 65 (10) | — | 46 (17) | 16 (18) | 78 (17) | — | 27 (25) |
4–5 | 1,309 (79) | 14 (88) | 537 (86) | — | 226 (83) | 72 (82) | 381 (83) | — | 79 (75) |
Unknown | 119 (7) | — | 23 (4) | 27 (100) | — | — | — | 69 (100) | — |
MGMT promoter methylation status | |||||||||
Unmethylated | 768 (46) | 7 (44) | 313 (50) | 15 (56) | — | 88 (100) | 236 (51) | 69 (100) | 40 (38) |
Methylated | 687 (41) | 9 (56) | 254 (41) | 6 (22) | 272 (100) | — | 116 (25) | — | 30 (28) |
Unknown | 207 (12) | — | 58 (9) | 6 (22) | — | — | 107 (23) | — | 36 (34) |
Extent of surgical resection | |||||||||
Biopsy | 130 (8) | — | 55 (9) | 5 (19) | — | — | 42 (9) | 6 (9) | 22 (21) |
Gross total resection | 771 (46) | 11 (69) | 293 (47) | 9 (33) | 137 (50) | 46 (52) | 192 (42) | 37 (54) | 46 (43) |
Subtotal resection | 746 (45) | 5 (31) | 277 (44) | 13 (48) | 126 (46) | 36 (41) | 225 (49) | 26 (38) | 38 (36) |
Unknown | 15 (1) | — | — | — | 9 (3) | 6 (7) | — | — | — |
IDH mutation status | |||||||||
Wildtype | 760 (46) | — | 620 (99) | 17 (63) | — | — | — | 69 (100) | 54 (51) |
Mutant | 6 (0) | — | 1 (0) | 4 (15) | — | — | — | — | 1 (1) |
Unknown | 896 (54) | 16 (100) | 4 (1) | 6 (22) | 272 (100) | 88 (100) | 459 (100) | — | 51 (48) |
Time-to-event in months | |||||||||
Follow-up time | |||||||||
Median (95% CI) | 33 (32–35) | NR (NR–NR) | 55 (46–71) | 38 (22–NR) | 30 (28–32) | 22 (21–NR) | 31 (29–32) | 19 (15–23) | 40 (37–47) |
Overall survival | |||||||||
Median (95% CI) | 19 (19–20) | 14 (12–24) | 21 (20–22) | 16 (11–32) | 27 (24–33) | 14 (13–15) | 17 (15–18) | 15 (13–17) | 21 (19–26) |
Progression-free survival | |||||||||
Median (95% CI) | 9 (8–10) | 8 (7–18) | 10 (10–11) | 8 (5–23) | 15 (12–19) | 8 (6–8) | 6 (6–8) | 6 (6–8) | 8 (6–11) |
Abbreviations: CI, confidence interval; DFCI GBM, Dana-Farber Cancer Institute institutional database of glioblastoma patients; IDH, isocitrate dehydrogenase; KPS, Karnofsky performance status; MGMT, O6-methylguanine-DNA-methyltransferase; NR, not reached; RPA, recursive partitioning analysis; UCLA/KP GBM, institutional database of glioblastoma patients treated at University of California Los Angeles/Kaiser Permanente, used in prior analysis (3).
^a Limited data access, used in prior analysis (3).
GBM-X strives to provide an uncomplicated process to use external data, contextualize single-arm trials, and improve inference on treatment effects early in drug development. The proposed workflow for the data platform is illustrated in Fig. 1. Investigators who have an ongoing or completed glioblastoma clinical trial can request an analysis that integrates information from the GBM-X Data Platform. We have structured the platform to allow users at various institutions to obtain these analyses without direct sharing of deidentified patient-level data from our data collection. We offer a library of standardized analyses that integrate data from new trials with patient-level information from our data collection. Once standardized agreements are in place and patient privacy is assured via Institutional Review Board requirements, we will run the analysis and provide results, such as treatment effect estimates, using established statistical methods. Investigators who request analyses will receive (i) data dictionaries of our datasets, (ii) simulated datasets that resemble major characteristics of the actual GBM-X datasets (units of measure, variable names, etc.), and (iii) the R code used for the analyses. These components give users transparent information on the statistical procedures underlying the data analysis, and investigators can ask the GBM-X team for additional details as needed. The platform will initially be geared toward investigator-initiated trials, and we will limit the service to 10 clinical studies for an initial 1-year pilot period. We will then expand beyond these constraints based on the experience during the pilot period and feedback from the community and users.
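As a rough illustration of what a simulated dataset that mirrors variable names and category levels could look like, the sketch below draws synthetic records from the marginal counts in the Total column of Table 1. It is written in Python purely for illustration (the platform itself distributes R code), and the variable names, the function `simulate_patients`, and the choice to sample each variable independently are assumptions rather than a description of how GBM-X builds its simulated datasets. Independent sampling preserves marginal frequencies only and ignores correlations between covariates.

```python
import random

# Marginal counts taken from the "Total" column of Table 1.
TABLE1_TOTALS = {
    "age_group": {"<65": 1277, ">=65": 385},
    "sex": {"Female": 684, "Male": 978},
    "resection": {
        "Biopsy": 130,
        "Gross total resection": 771,
        "Subtotal resection": 746,
        "Unknown": 15,
    },
}


def simulate_patients(n, seed=0):
    """Draw n synthetic records, sampling each variable independently."""
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    columns = {}
    for variable, counts in TABLE1_TOTALS.items():
        levels = list(counts)
        weights = list(counts.values())
        columns[variable] = rng.choices(levels, weights=weights, k=n)
    return [
        {variable: columns[variable][i] for variable in TABLE1_TOTALS}
        for i in range(n)
    ]


simulated = simulate_patients(200, seed=42)
print(simulated[0])
```

A dataset of this kind lets a requesting investigator test analysis code against the expected structure without any access to real patient records.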
The GBM-X Data Platform builds on the experience of existing data-sharing platforms (e.g., Project Data Sphere, YODA, Vivli) that are generally broad in scope and have an assortment of trial datasets across different indications, including common and rare cancers. While these efforts have shown tremendous potential, investigators hoping to leverage external data for trials will generally benefit from as many disease-specific datasets as possible (2). GBM-X seeks to unify data assets for a single disease with the explicit purpose of serving future analyses and interim decisions of early-stage trials in glioblastoma.
Definitions of all variables in the current GBM-X database and the study populations have been compared to identify potential differences across studies. We used data dictionaries and publications to examine the definitions and representations of populations, outcomes, and pretreatment variables across datasets. Consistency of variable formats and definitions is necessary to effectively leverage external datasets in the analysis of clinical trials.
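The kind of harmonization described above can be sketched as a small mapping layer that converts source-specific codings into the common categories reported in Table 1. The raw field names (`age`, `kps`, `mgmt`), the source codings in `MGMT_CODES`, and all function names are hypothetical; the actual GBM-X data dictionaries may define these variables differently.

```python
def harmonize_age(age_years):
    """Map age in years to the Table 1 age groups."""
    return "<65" if age_years < 65 else ">=65"


def harmonize_kps(kps):
    """Map a Karnofsky performance status score to the Table 1 groups."""
    if kps is None:
        return "Unknown"
    return "90-100" if kps >= 90 else "<90"


# Hypothetical source-specific codings for MGMT promoter methylation status.
MGMT_CODES = {
    "m": "Methylated",
    "methylated": "Methylated",
    "1": "Methylated",
    "u": "Unmethylated",
    "unmethylated": "Unmethylated",
    "0": "Unmethylated",
}


def harmonize_mgmt(raw):
    """Map a raw MGMT coding to a common label; unrecognized values become Unknown."""
    if raw is None:
        return "Unknown"
    return MGMT_CODES.get(str(raw).strip().lower(), "Unknown")


def harmonize_record(record):
    """Convert one source record into the common variable definitions."""
    return {
        "age_group": harmonize_age(record["age"]),
        "kps_group": harmonize_kps(record.get("kps")),
        "mgmt": harmonize_mgmt(record.get("mgmt")),
    }


print(harmonize_record({"age": 70, "kps": 80, "mgmt": "M"}))
```

Mapping unrecognized values to an explicit Unknown level, rather than dropping them, keeps missingness visible in the same way Table 1 reports it.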
Future Directions
Having a data hub is essential to effectively leverage external data in future clinical trials, but the data platform must be dynamic and up to date to stay relevant. Indeed, it is necessary to account for changes such as improvements in treatments, supportive care, or the identification of novel biomarkers. Moving forward, GBM-X can potentially expand in multiple directions. While we have deidentified patient-level datasets for newly diagnosed glioblastoma, we envision future growth of the data collection by (i) incorporating additional datasets, (ii) integrating additional patient-level information, such as imaging, next-generation sequencing data, and toxicity data, and (iii) broadening to other disease populations [recurrent glioblastoma, isocitrate dehydrogenase (IDH)-mutant gliomas, and H3K27M-mutated gliomas]. We will prioritize extensions that can translate into more efficient development of new treatments.
The platform is designed to welcome interested collaborators who want to share data, with the expectation that the data collection can continue to grow and remain temporally relevant. We accept data from other research groups through two collaborative data-sharing models. The first is centralized patient-level data sharing, in which deidentified patient-level data are housed on the GBM-X server (see Fig. 1). The second is data-private collaborative learning without direct patient-level data sharing (i.e., contributors submit data summaries to GBM-X). Under this second model, we will incorporate data summaries, such as regression models, which add information from completed studies to GBM-X. This approach facilitates data sharing and provides an option for investigators who want to contribute without the complexities of sharing patient-level records. Standard meta-analytic methods can be used to combine the regression models provided by different study groups into a single regression function (11), and this summary can be used in the analysis of single-arm trials to infer treatment effects. Any data provided to the GBM-X Data Platform will be used only for the purpose prespecified by the data contributors.
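One simple way to combine regression summaries from several contributors, without any patient-level records changing hands, is fixed-effect inverse-variance pooling of each coefficient. The sketch below assumes every site reports the same covariates with an estimate and a standard error (for example, log hazard ratios); the covariate names and all numeric values are invented for illustration. It pools one coefficient at a time and ignores covariances between coefficients, so it is a simplification and not necessarily the approach of reference (11).

```python
import math


def pool_coefficients(site_models):
    """Fixed-effect inverse-variance pooling, one coefficient at a time.

    site_models: list of dicts mapping covariate -> (estimate, standard_error).
    Returns a dict mapping covariate -> (pooled_estimate, pooled_standard_error).
    """
    names = set(site_models[0])
    for model in site_models:
        if set(model) != names:
            raise ValueError("all sites must report the same covariates")

    pooled = {}
    for name in sorted(names):
        # Each site's weight is the inverse of its squared standard error.
        weights = [1.0 / model[name][1] ** 2 for model in site_models]
        weighted_sum = sum(
            w * model[name][0] for w, model in zip(weights, site_models)
        )
        total_weight = sum(weights)
        pooled[name] = (weighted_sum / total_weight, math.sqrt(1.0 / total_weight))
    return pooled


# Invented summaries from two hypothetical contributing sites.
site_a = {"age_ge_65": (0.50, 0.20), "mgmt_methylated": (-0.70, 0.25)}
site_b = {"age_ge_65": (0.30, 0.10), "mgmt_methylated": (-0.90, 0.25)}

pooled = pool_coefficients([site_a, site_b])
for name, (estimate, se) in pooled.items():
    print(f"{name}: {estimate:.3f} (SE {se:.3f})")
```

The pooled estimate leans toward the site with the smaller standard error, so larger or more precise studies contribute more to the combined regression function.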
Conclusion
RCTs will remain the indisputable gold standard for the evaluation of treatments, but external datasets can supplement information gleaned from RCTs and single-arm studies. As data access remains the greatest barrier to studying and implementing EACTs, data sharing platforms such as GBM-X attempt to break down these barriers and allow for more efficient treatment development in neuro-oncology. The GBM-X Data Platform can help accelerate the evaluation of new therapies in neuro-oncology and can serve as a model to improve drug discovery for rare and difficult-to-treat tumors in oncology.
Authors' Disclosures
R. Rahman reports grants from Project Data Sphere during the conduct of the study. S. Ventz reports grants from NCI during the conduct of the study. T. Cloughesy reports personal fees from Katmai, Lista Therapeutics, Stemline, Novartis, Roche, Sonalasense, Sagimet, Clinical Care Options, Ideology Health, Servier, Jubilant, Immvira, Gan & Lee, BrainStorm, Sapience, Inovio, DNATrix, Tyme, Kintara, SDP, Bayer, Merck, Boehringer Ingelheim, VBL, Amgen, Kiyatec, Odonate Therapeutics, QED, Agios, and Novocure and personal fees and other support from Chimerix and Global Coalition for Adaptive Research outside the submitted work; in addition, T. Cloughesy has a patent for UC case no(s). 2017-973, 2019-630, 2021-014, 2021-059, 2021-060, UCLA 2023-067-1, UCLA 2023-049-1, UCLA 2021-232-1, UCLA 2020-446-2, UCLA 2021-083-1, pending, issued, licensed, and with royalties paid from Katmai. B.M. Alexander reports personal fees from Foundation Medicine and Roche outside the submitted work. No disclosures were reported by the other authors.
Acknowledgments
The authors thank Jon McDunn and Bill Louv for input on this article and Project Data Sphere for research support. R. Rahman is supported by the Dana-Farber Cancer Institute Early Career Faculty Innovation Fund and the Joint Center for Radiation Therapy Foundation Grant. L. Trippa is supported by NIH grant R01LM013352.
Related links: GBM External Data Platform website: https://rconnect.dfci.harvard.edu/gbmdata/