Abstract
We present a novel generative AI pipeline that integrates molecular, clinical, and functional data to develop explainable predictive models and therapeutic strategies for multiple myeloma (MM). MM is an incurable, biologically complex, and heterogeneous disease characterized by multifactorial progression and therapy-resistant phases, driven by genetic, cytogenetic, epigenetic, and transcriptomic alterations. These alterations vary from patient to patient but converge on conserved biological pathways. The lack of consensus on how to treat pre-malignant, high-risk, or therapy-refractory MM patients underscores the need for new approaches such as the one described here. Our architecture uses the Mistral 7B large language model (Mistral AI), fine-tuned with Low-Rank Adaptation (LoRA), and adds a Retrieval-Augmented Generation (RAG) layer to improve contextual understanding and decision-making. The pipeline was trained on a unique cohort of 2419 MM patient samples (PMRC DB) spanning the disease spectrum from pre-malignant to late-refractory stages, using integrated molecular data (RNA-seq: 1175; whole-exome sequencing: 1232; cytogenetics: 2008), clinical data (treatment history and response: 1413), and ex vivo functional assays (986), and was validated in the CoMMpass cohort (MMRF). The RAG framework draws on a literature database of PubMed abstracts on MM biology and clinical trials, curated with the R package easyPubMed. The documents were embedded into dense vector representations with the sentence-transformers/all-MiniLM-L6-v2 model, and these embeddings were indexed with FAISS's IndexFlatL2 for fast similarity-based retrieval of the documents most relevant to a query run against the structured PMRC DB. For each query, the pipeline retrieves the top-k documents closest to the query's embedding, which supply the context for generation.
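The LoRA fine-tuning step can be illustrated on a single linear layer. The following is a minimal NumPy sketch of the general technique, not the authors' training code: the abstract does not report the adapter rank, scaling factor, or which Mistral layers were adapted, so `r`, `alpha`, and all matrix shapes below are illustrative assumptions. LoRA keeps the pretrained weight W frozen and trains only a low-rank update, so the effective weight is W + (alpha / r) * B @ A.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Linear layer with a frozen weight W plus a LoRA update (alpha / r) * B @ A.

    Shapes: x (batch, d_in), W (d_out, d_in), A (r, d_in), B (d_out, r).
    """
    r = A.shape[0]
    scaling = alpha / r
    # Frozen pretrained path plus the low-rank adapter path; W is never modified.
    return x @ W.T + scaling * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 2, 16  # illustrative sizes, not taken from the abstract

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init
x = rng.standard_normal((4, d_in))

# With B = 0 the adapter is a no-op: output equals the pretrained layer.
y_init = lora_forward(x, W, A, B, alpha)

# Pretend training has updated B; the update can then be merged into W.
B = rng.standard_normal((d_out, r)) * 0.01
y_adapted = lora_forward(x, W, A, B, alpha)
W_merged = W + (alpha / r) * B @ A
y_merged = x @ W_merged.T

n_trainable = A.size + B.size  # 2*16 + 8*2 = 48 adapter parameters
n_full = W.size                # 8*16 = 128 parameters in the frozen weight
print(n_trainable, n_full)
```

Because B starts at zero, the adapted layer initially reproduces the pretrained one, and after training the low-rank update can be folded into W so inference incurs no extra cost. Libraries such as Hugging Face PEFT apply the same update to selected layers of a transformer (commonly the attention projections) rather than to a hand-built layer.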
Retrieved documents and the user query are combined into a structured prompt, which is passed to the model for context-aware text generation. The model uniquely combines an individual MM patient's features with biological and clinical context from published literature to generate patient-specific insights. The AI system provided explainable insights into MM treatment strategies and could potentially aid therapeutic decision-making. Its accuracy as a predictive biomarker and its capacity to elucidate underlying biology require further validation, especially in clinical cohorts with sparse or missing data. This work is a proof of concept for integrating generative AI with domain-specific biological and clinical datasets, laying the foundation for more precise and explainable AI-driven therapeutic strategies in MM.
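The retrieval and prompt-assembly steps can be sketched end to end. The example below is a self-contained NumPy stand-in, not the authors' pipeline: it performs the same exact (brute-force) squared-L2 search that FAISS's IndexFlatL2 performs, while the toy 3-dimensional vectors, placeholder document strings, the hypothetical `build_prompt` helper, and its prompt template are all assumptions for illustration. In the described system the vectors would come from sentence-transformers/all-MiniLM-L6-v2 (384-dimensional) and the search from FAISS.

```python
import numpy as np

def flat_l2_search(index_vectors, query, k):
    """Exact nearest-neighbour search by squared L2 distance.

    Scans every stored vector, as a flat (non-approximate) L2 index does,
    and returns the k smallest squared distances with their row indices.
    """
    diffs = index_vectors - query                   # broadcast query over all rows
    sq_dists = np.einsum("ij,ij->i", diffs, diffs)  # squared L2 distance per row
    top = np.argsort(sq_dists, kind="stable")[:k]   # indices of the k closest rows
    return sq_dists[top], top

def build_prompt(question, docs, hit_ids):
    """Hypothetical template: number the retrieved documents, then append the question."""
    context = "\n".join(f"[{rank + 1}] {docs[i]}" for rank, i in enumerate(hit_ids))
    return (
        "Answer using only the literature excerpts below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Placeholder corpus and toy embeddings (stand-ins for abstract texts and their vectors).
docs = ["Placeholder abstract 1", "Placeholder abstract 2", "Placeholder abstract 3"]
doc_vecs = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.9, 0.1, 0.0]], dtype=np.float32)
query_vec = np.array([1.0, 0.0, 0.0], dtype=np.float32)

dists, hits = flat_l2_search(doc_vecs, query_vec, k=2)
prompt = build_prompt("What does the literature report for this patient profile?", docs, hits)
print(hits.tolist())  # [0, 2]
print(prompt)
```

With the real components, the equivalent calls would be `index = faiss.IndexFlatL2(384)`, `index.add(doc_embeddings)`, and `D, I = index.search(query_embeddings, k)`, where FAISS expects float32 arrays of shape (n, d) (so a single query is passed as shape (1, d)) and returns one row of distances and indices per query. A flat index is exhaustive and exact, which suits a curated literature corpus of modest size; approximate indexes trade some recall for speed on much larger collections.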
Praneeth Reddy Sudalagunta, Rafael Renatino Canevarolo, Daniel DeAvila, Rachel Howard, Maria Coelho Silva, Alexandra Achille, Angel Perez, Robert Gatenby, Mark B. Meads, Kenneth H. Shain, Ariosto Siqueira Silva. A generative AI architecture integrates molecular, clinical and functional data to develop explainable predictive models and therapeutic strategies for multiple myeloma [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 1 (Regular Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_1):Abstract nr 3639.