Objectives: To explore the feasibility of using real-world data (RWD) with machine learning methods to simulate colorectal cancer (CRC) trials, specifically 6 Phase III randomized clinical trials comparing other treatment regimens with FOLFIRI, an FDA-approved standard-of-care first-line chemotherapy treatment for patients with metastatic CRC.

Methods: We used RWD from the OneFlorida Clinical Research Consortium, a clinical research network that contributes to the national PCORnet and holds longitudinal, linked electronic health records of ~15 million Floridians. We used the study protocols of the original trials, including the eligibility criteria, to define the study populations. We focused on patient safety outcomes, measured as the occurrence of severe adverse events (SAEs) after treatment, and calculated SAE prevalence, mean SAEs per patient, and SAE event rates for each category defined in the CTCAE v5.0. We considered two scenarios: (1) simulating only the control arm (CA) (i.e., the FOLFIRI arm), and (2) simulating both the CA and the experimental arm (EA) (e.g., Panitumumab + FOLFIRI) and calculating the relative risk of SAE between the two arms. Two sampling strategies were used to simulate the study population: random sampling and proportional sampling by gender and race. Among the 6 trials, only 2 had sufficient patients in OneFlorida for the two-arm simulations. We used propensity score matching (PSM) on baseline characteristics such as age, gender, race, and comorbidities to simulate the randomization process. In addition to the traditional logistic regression (LR) model, we considered machine learning (ML) models (such as neural networks) for PSM, because LR-based PSM assumes linearity of the underlying variables. Each trial was simulated 1,000 times.

Results: Consistent with the existing literature, mean SAEs per patient and SAE event rates were higher in all CAs simulated from OneFlorida RWD than in the original trials. The proportional sampling strategy provided estimates of SAE prevalence more comparable to the rates reported by the original trials. In the two-arm simulations, no significant differences were observed in the matched case-control samples using either LR or ML methods. As expected for patients treated in real-world settings, larger mean SAEs and SAE event rates (but similar SAE prevalence) were observed in the simulations than in the original trials. The risk ratios of having an SAE obtained from simulations comparing CA vs. EA were very close to the ratios calculated from the original trials.

Conclusion: Our study showed the feasibility of simulating cancer trials using RWD and obtained estimates comparable to those of the original trials in terms of patient safety outcomes. Despite more SAEs in RWD, the ratios between CAs and EAs were similar to those in the previously published, rigorously conducted trials. Future in-depth investigations are warranted and should consider state-of-the-art AI methods, such as deep learning and causal AI methods, to help tackle issues with using RWD for cancer trial simulation (e.g., data bias, high dimensionality).
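
As an illustration of the sampling and outcome calculations described in the Methods, the following is a minimal sketch and not the authors' implementation. It assumes each patient is a dictionary with hypothetical keys `gender`, `race`, and `n_sae` (number of SAEs), that the target proportions come from a trial's reported demographics, and that SAE prevalence means the fraction of patients with at least one SAE. The abstract does not define the SAE event rate, so it is omitted here.

```python
import math
import random
from collections import defaultdict

def proportional_sample(patients, target_props, n, rng):
    """Draw about n patients so that (gender, race) strata follow target_props.

    target_props maps (gender, race) -> proportion; the keys and the patient
    dictionary layout are assumptions made for this sketch.
    """
    strata = defaultdict(list)
    for p in patients:
        strata[(p["gender"], p["race"])].append(p)
    sample = []
    for key, prop in target_props.items():
        pool = strata.get(key, [])
        k = round(prop * n)
        # Sample without replacement, capped at what the stratum can supply.
        sample.extend(rng.sample(pool, min(k, len(pool))))
    return sample

def sae_summary(sample):
    """Return (SAE prevalence, mean SAEs per patient) for a non-empty arm."""
    n = len(sample)
    prevalence = sum(1 for p in sample if p["n_sae"] > 0) / n
    mean_sae = sum(p["n_sae"] for p in sample) / n
    return prevalence, mean_sae

def risk_ratio(numerator_arm, denominator_arm):
    """Ratio of SAE prevalence between two arms; the caller picks the direction."""
    num = sae_summary(numerator_arm)[0]
    den = sae_summary(denominator_arm)[0]
    return num / den if den > 0 else math.nan

def simulate_prevalence(patients, target_props, n, runs=1000, seed=0):
    """Average SAE prevalence over repeated proportional samples."""
    rng = random.Random(seed)
    values = [
        sae_summary(proportional_sample(patients, target_props, n, rng))[0]
        for _ in range(runs)
    ]
    return sum(values) / runs
```

Random sampling, the other strategy mentioned, would correspond to a single `rng.sample(patients, n)` call without the stratification step.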

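The matching step of PSM can be sketched as follows. The abstract does not state which matching algorithm was used, so greedy 1:1 nearest-neighbor matching without replacement and the `caliper` value are assumptions of this example. The propensity scores are taken as given and could come from either an LR model or an ML model such as a neural network.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: lists of (patient_id, propensity_score) pairs.
    Returns (treated_id, control_id) pairs whose scores differ by at most
    `caliper`; treated patients with no close control stay unmatched.
    """
    available = sorted(control, key=lambda c: c[1])
    pairs = []
    for t_id, t_score in sorted(treated, key=lambda t: t[1]):
        if not available:
            break
        # Closest remaining control by absolute score distance.
        best = min(available, key=lambda c: abs(c[1] - t_score))
        if abs(best[1] - t_score) <= caliper:
            pairs.append((t_id, best[0]))
            available.remove(best)
    return pairs
```

After matching, baseline characteristics of the paired groups can be compared to check balance before outcome ratios are computed.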
Citation Format: Zhaoyi Chen, Hansi Zhang, Thomas George, Mattia Prosperi, Yi Guo, Dejana Braithwaite, Elizabeth Shenkman, Jonathan Licht, Jiang Bian. Simulation of colorectal cancer clinical trials using real-world data and machine learning [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-071.