Abstract
Lung cancer screening trials have demonstrated a significant reduction in mortality. Low-dose computed tomography (LDCT) screening frequently detects small nodules in at-risk participants. However, classifying these sub-centimeter nodules as cancerous or benign is challenging even for expert clinicians. In this work, we use machine learning (ML) techniques to differentiate cancerous nodules (clinically confirmed) from benign nodules (confirmed by >5 years of follow-up). Data for this study are drawn from a screening study (PanCan), from which we selected 613 distinct nodules (141 cancerous and 472 approximately size-matched benign nodules). We analyzed approximately 170 texture and shape features, extracted from each nodule both with and without its perimeter transition pixels to control for perimeter effects. Features are also extracted from the ring of parenchyma surrounding the nodule to account for tumor effects on the surrounding tissue. Parenchymal characteristics extracted from the equivalent location in the opposite lung are used to normalize the nodule texture features, in an effort to reduce scanner bias. Our preliminary machine learning classification results show model accuracies of up to ~80%, depending on the feature selection and classification algorithm. Combining the radiomic features with patient demographic variables such as age, sex, and smoking status further improves our models, reaching ~84% classification accuracy. When normalized by the opposite lung, 58% of texture features showed improved classification ability. A single feature from the ring of parenchyma surrounding the nodule achieved 73.5% accuracy. Others have shown that nodule area/volume is a good classifier; for this data set, it gave 68% accuracy. Exploring classification accuracy across different subsets of patients reveals a number of interesting trends.
For instance, we noted that in patients without emphysema, nodules can be classified with much higher accuracy (>90%) than in patients with emphysema. This and other results on classification within specific patient groups may prove relevant to any future clinical implementation of these results.
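To make the three feature regions described above concrete (the nodule with and without its perimeter pixels, the surrounding parenchymal ring, and the contralateral reference location), here is a minimal sketch on a single 2D slice. It is an illustration under stated assumptions, not the authors' pipeline. The nodule is assumed to be given as a set of (row, col) pixel coordinates. All helper names (`erode`, `parenchyma_ring`, `mirror`, `region_features`) are hypothetical. A one-pixel 4-connected erosion stands in for removing perimeter transition pixels. A left-right reflection across the image midline stands in for the "equivalent location in the opposite lung"; a real pipeline would need lung segmentation and registration. Subtracting the contralateral mean is an assumed normalization, because the abstract does not specify the exact scheme. Mean intensity is used only as the simplest first-order texture feature.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]      # (row, col) on a single 2D slice
Image = List[List[float]]    # e.g. CT intensities in Hounsfield units


def neighbors(p: Pixel) -> List[Pixel]:
    """4-connected neighbours of a pixel."""
    r, c = p
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def erode(mask: Set[Pixel]) -> Set[Pixel]:
    """Drop perimeter transition pixels: keep only pixels whose 4 neighbours are all inside."""
    return {p for p in mask if all(q in mask for q in neighbors(p))}


def dilate(mask: Set[Pixel]) -> Set[Pixel]:
    """Grow the mask by one pixel in each 4-connected direction."""
    grown = set(mask)
    for p in mask:
        grown.update(neighbors(p))
    return grown


def parenchyma_ring(mask: Set[Pixel], width: int) -> Set[Pixel]:
    """Band of `width` dilation steps around the nodule, excluding the nodule itself."""
    outer = set(mask)
    for _ in range(width):
        outer = dilate(outer)
    return outer - mask


def mirror(mask: Set[Pixel], n_cols: int) -> Set[Pixel]:
    """ASSUMPTION: approximate the contralateral location by a left-right
    reflection across the image midline (no lung registration)."""
    return {(r, n_cols - 1 - c) for r, c in mask}


def mean_intensity(image: Image, region: Set[Pixel]) -> float:
    """Mean value over the pixels of `region` that fall inside the image."""
    n_rows, n_cols = len(image), len(image[0])
    vals = [image[r][c] for r, c in region if 0 <= r < n_rows and 0 <= c < n_cols]
    if not vals:
        raise ValueError("region has no pixels inside the image")
    return sum(vals) / len(vals)


def region_features(image: Image, nodule: Set[Pixel], ring_width: int = 2) -> Dict[str, float]:
    """Compute one illustrative feature on each region of interest."""
    nodule_mean = mean_intensity(image, nodule)
    # Reference parenchyma at the mirrored (contralateral) location.
    reference = mean_intensity(image, mirror(nodule, len(image[0])))
    return {
        "nodule_mean": nodule_mean,
        # Same feature with perimeter transition pixels removed.
        "core_mean": mean_intensity(image, erode(nodule)),
        # Same feature on the surrounding parenchymal ring.
        "ring_mean": mean_intensity(image, parenchyma_ring(nodule, ring_width)),
        # ASSUMED normalization: subtract the contralateral reference value.
        "normalized_nodule_mean": nodule_mean - reference,
    }
```

For example, a 4×4 nodule at -50 HU centered in the left half of a 12×12 slice of -850 HU parenchyma gives a 4-pixel eroded core, a 16-pixel one-step ring, and a normalized nodule mean of 800. Subtraction is used here rather than a ratio because Hounsfield-unit values are often negative, which makes ratios hard to interpret; other schemes, such as z-scoring against the contralateral region, are equally plausible.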
Citation Format: Rohan Abraham, Ian Janzen, Saeed Seyyedi, Sukhinder Khattra, John Mayo, Ren Yuan, Renelle Meyers, Stephen Lam, Calum MacAulay. Machine learning CADx process for classification of lung nodules below the Lung-RADS 4A threshold in LDCT scans [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-053.