Background: Tumor molecular characterization and genomic analysis are required to choose appropriate therapy, but they depend on tumor biopsies, which are invasive and can cause life-threatening complications. Standard-of-care computed tomography (CT) acquired during lung cancer management holds a largely untapped wealth of information on in situ tumor architecture, heterogeneity, and the peritumoral environment, all of which have prognostic implications. Previously published studies of overall survival (OS) prediction in stage III non-small cell lung cancer (NSCLC) from CT are limited by heterogeneity in tumor histology, therapy, imaging technique, and scanner type, all of which can affect radiomic features and hence potentially obscure a predictive radiomic signature. To address these challenges, we 1) used a well-curated cohort of stage III NSCLC patients, 2) developed radiomic phenotypes predictive of OS, and 3) accounted for differences across image acquisition protocols and scanner vendors.

Methods: We retrospectively analyzed 110 thoracic CT scans (82 non-contrast, 28 contrast-enhanced; from three vendors) of stage III lung adenocarcinoma patients (68 female, 42 male) acquired between April 2012 and October 2018; median age was 66 (range 60-71) years, and 56 deaths were recorded. Images were interpolated to isotropic 3 mm voxels to account for variations in spatial resolution. Tumors were segmented by one of three experienced radiologists using ITK-SNAP. A set of 107 radiomic features, grouped into first-order statistics, shape-based, and texture features, was extracted for each tumor using the PyRadiomics package. Features whose distributions differed across vendors were identified with the Kruskal-Wallis test and discarded. Radiomic features were harmonized across contrast-enhanced and non-contrast scans using ComBat batch-effect correction. Radiomic phenotypes were derived by unsupervised hierarchical clustering of the leading principal components of the radiomic features. A baseline Cox model built on the established clinical covariates of tumor volume and ECOG performance status was compared, using the concordance index (C-index), with a model integrating these covariates with the radiomic phenotypes.
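The vendor screening and contrast harmonization steps can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline, and several details are assumptions: exactly three vendor groups (which lets the chi-square p-value with 2 degrees of freedom be written in closed form as exp(-H/2)), a significance threshold of 0.05 (the abstract does not state one), and a simplified location/scale adjustment that captures ComBat's core idea of aligning per-batch means and variances but omits its empirical Bayes shrinkage and covariate preservation. All function names and vendor labels are hypothetical.

```python
import math
from statistics import fmean


def average_ranks(values):
    """1-based ranks where tied values share their mean rank; also returns tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_3groups(groups):
    """Tie-corrected Kruskal-Wallis H and its asymptotic chi-square (df = 2) p-value."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below assumes exactly 3 groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:  # every value identical: no evidence of a vendor effect
        return 0.0, 1.0
    h /= correction
    # Survival function of a chi-square distribution with 2 df is exp(-x / 2).
    return h, math.exp(-h / 2)


def select_vendor_robust(features, vendor, alpha=0.05):
    """Keep features whose Kruskal-Wallis p-value across vendors is >= alpha."""
    vendors = sorted(set(vendor))
    kept = []
    for name, values in features.items():
        groups = [[x for x, v in zip(values, vendor) if v == vd] for vd in vendors]
        _, p = kruskal_wallis_3groups(groups)
        if p >= alpha:
            kept.append(name)
    return kept


def location_scale_harmonize(values, batch):
    """ComBat-style per-batch mean/variance alignment, without empirical Bayes or covariates."""
    grand_mean = fmean(values)
    batches = set(batch)
    batch_mean = {b: fmean([x for x, c in zip(values, batch) if c == b]) for b in batches}
    residuals = [x - batch_mean[c] for x, c in zip(values, batch)]
    pooled_sd = math.sqrt(fmean([r * r for r in residuals]))
    batch_sd = {b: math.sqrt(fmean([r * r for r, c in zip(residuals, batch) if c == b]))
                for b in batches}
    # Standardize within each batch, then rescale to the pooled SD and grand mean.
    return [r / batch_sd[c] * pooled_sd + grand_mean if batch_sd[c] > 0 else grand_mean
            for r, c in zip(residuals, batch)]
```

In practice, `scipy.stats.kruskal` computes the same tie-corrected H statistic for any number of groups, and a full ComBat implementation would replace `location_scale_harmonize`; the simplified version shows why harmonized contrast-enhanced and non-contrast batches end up with matching means and spreads.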

Results: The Cox model integrating radiomic phenotypes with clinical covariates predicted OS with a C-index of 0.68 (95% CI: 0.61-0.76), an improvement over the baseline model alone (C-index 0.65, 95% CI: 0.58-0.73). Radiomic phenotypes derived from non-harmonized features did not add value to the predictive performance of the baseline model.
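The C-index used to compare the two Cox models can be sketched as Harrell's concordance index, the usual C-statistic for survival models. The abstract does not state which estimator or confidence-interval method was used, so this is an illustrative assumption: the risk scores would be each model's linear predictor (higher means higher hazard), pairs with tied survival times are skipped, and the function name is hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i died (event = 1) before
    subject j's observed time. It is concordant when subject i also has the
    higher risk score; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # tied times are not counted as comparable
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported increase from 0.65 to 0.68 means the integrated model orders a larger fraction of comparable patient pairs correctly.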

Conclusions: Accounting for differences related to image acquisition, vendors, and radiocontrast agents through feature selection and harmonization can improve the predictive performance of well-known clinical covariates using standard-of-care CT acquired during NSCLC management.

Citation Format: Jose M. Luna, Andrew R. Barsky, Russell T. Shinohara, Alexandra D. Dreyfuss, Hannah Horng, Leonid Roshkovan, Michelle Hershman, Babak Haghighi, Peter B. Noel, Keith A. Cengel, Sharyn I. Katz, Eric S. Diffenderfer, Despina Kontos. Robust feature selection and ComBat-based harmonization to improve survival prediction in stage III lung cancer using radiomic phenotypes [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 661.