Abstract
Introduction: A “living” prospective database for tracking patient outcomes and creating clinical trial screening populations would be useful to head and neck multidisciplinary teams and could be adapted to many other populations of interest. Such a database should be rapidly accessible, economical, and accurate, and should outlast the typical 5-year grant cycle. We hypothesized that we could construct such a database and piloted the concept using patients within our head and neck cancer active surveillance population. This population consists of patients at high risk for the development of primary and secondary oral carcinoma over 20 years. A second goal was to identify subjects for oral leukoplakia studies and for the construction of genomic atlases. Methods: Through the University of Minnesota Clinical and Translational Sciences Institute (CTSI), we performed an IRB-approved interrogation of our patient population in the EPIC electronic medical record (EMR), covering 4,496 unique patients seen by the senior author, and identified 1,375 unique deidentified patient records representing individuals with likely preneoplastic lesions. Our approach included 30 International Classification of Diseases (ICD) 9 and 10 codes corresponding to 120 unique mucosal head and neck precancerous lesion descriptors. Information included typical demographic and risk data as well as laboratory, pathology, and possible staging data. All curated data were stored on CTSI servers with direct access to data analysis software. Results: The ICD 9 and 10 codes queried encompass 36 low-risk, 16 moderate/low-risk, 8 moderate/high-risk, and 55 high-risk lesion codes. Twenty-five of the 30 queried ICD 9 and 10 codes (>12,000 instances) were identified among the 4,496 patients queried, encompassing 1,375 unique patients, most of whom harbored multiple ICD codes. We identified 464 patients with diagnosed high-risk lesions, 346 with moderate/high-risk lesions, 692 with moderate/low-risk lesions, and 728 with low-risk lesions.
We identified several research functions in the EMR that can be used to construct screening lists. From IRB approval to obtaining the dataset from the CTSI, this process required fewer than 24 hours of combined labor, incurred minimal cost, and did not require manual chart abstraction. Conclusions: From our preliminary success in curating a dataset of patients diagnosed with high-risk oral lesions, we conclude that using ICD 9 and 10 codes as inclusion criteria is an efficient and simple way for busy practitioners to screen large amounts of clinical data (>1 × 10⁶ EMR) for study populations of interest. We will develop quality assurance and improvement measures to audit data accuracy. Further expansion of this work will apply similar algorithmic dataset curation to medical records acquired from clinicians beyond the current study population. Critically, this process could be used for a variety of conditions to develop comprehensive longitudinal databases to examine both retrospective and prospective outcomes, provided an ICD 9 or 10 code exists for the condition and coding is accurate and consistent.
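Illustrative sketch: the code-based screening described above could be approximated as follows. This is a minimal Python example under stated assumptions, not the query actually run through the CTSI. The code labels (`CODE_A` and so on), the tier assignments, the function names, and the toy records are all hypothetical placeholders; the real inclusion list and risk tiers are not reproduced here.

```python
from collections import defaultdict

# Hypothetical mapping from diagnosis code to risk tier.
# Placeholder labels only; these are NOT the study's actual ICD 9/10 codes.
RISK_TIER_BY_CODE = {
    "CODE_A": "high",
    "CODE_B": "moderate/high",
    "CODE_C": "moderate/low",
    "CODE_D": "low",
}


def build_screening_cohort(diagnoses, tier_by_code):
    """Return {patient_id: set of risk tiers} for patients with any listed code.

    `diagnoses` is an iterable of (patient_id, icd_code) pairs, one pair per
    coded instance, so the same patient may appear many times.
    """
    tiers_by_patient = defaultdict(set)
    for patient_id, icd_code in diagnoses:
        tier = tier_by_code.get(icd_code)
        if tier is not None:  # codes outside the inclusion list are ignored
            tiers_by_patient[patient_id].add(tier)
    return dict(tiers_by_patient)


def count_patients_per_tier(tiers_by_patient):
    """Count unique patients in each tier; a patient can count in several tiers."""
    counts = defaultdict(int)
    for tiers in tiers_by_patient.values():
        for tier in tiers:
            counts[tier] += 1
    return dict(counts)


# Toy, fabricated-for-illustration records (deidentified placeholder IDs).
toy_diagnoses = [
    ("P1", "CODE_A"),
    ("P1", "CODE_D"),
    ("P2", "CODE_D"),
    ("P2", "CODE_D"),  # repeated instance of the same code
    ("P3", "OTHER"),   # not on the inclusion list, so P3 is excluded
]

cohort = build_screening_cohort(toy_diagnoses, RISK_TIER_BY_CODE)
tier_counts = count_patients_per_tier(cohort)
```

Because one patient can carry codes from several tiers, per-tier counts can sum to more than the number of unique patients, as they do in the Results (464 + 346 + 692 + 728 = 2,230 tier assignments across 1,375 unique patients).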
Citation Format: Lindsey Mortensen, Alexander J. Dwyer, Beverly R. Wuertz, Frank G. Ondrey. Operationalizing the health care system electronic medical record to create head and neck cancer research and screening databases [abstract]. In: Proceedings of the AACR-AHNS Head and Neck Cancer Conference: Innovating through Basic, Clinical, and Translational Research; 2023 Jul 7-8; Montreal, QC, Canada. Philadelphia (PA): AACR; Clin Cancer Res 2023;29(18_Suppl):Abstract nr PO-010.