Abstract
Purpose: Lung cancer is the leading cause of cancer-related deaths worldwide. To address the clinical need for efficacious treatments, genetically engineered mouse models (GEMMs) have become integral to identifying and evaluating unique pathways that may be exploited as therapeutic targets. Manual assessment of GEMM tumor burden on histopathological sections is both time-consuming and prone to subjective bias. Computer-aided detection tools are therefore needed for the accurate and efficient analysis of these histopathology images. Our work demonstrates a simple machine learning approach, a sparse principal component analysis (PCA) network, for automated detection of cancerous lesions on histological lung slides stained with hematoxylin and eosin (H&E).
Methods: Our method comprises four steps: 1) cascaded sparse PCA; 2) graph-based PCA hashing; 3) block-wise histograms; and 4) support vector machine (SVM) classification. In the proposed architecture, sparse PCA is used to learn the filter banks of the multiple stages. Graph-based PCA hashing and block-wise histograms then perform indexing and pooling, and the resulting features are fed to an SVM classifier. We tested the proposed sparse PCA network on H&E slides obtained from an inducible KrasG12D lung cancer mouse model. Our dataset consists of N = 21 whole-slide histopathology lung images: 9 from non-tumor-bearing control mice and 12 from mice with visible lung tumors. Tumor lesions in the 12 tumor-bearing lung images were visually identified by three trained individuals, and these annotations served as ground truth. Each image in the dataset is 2048 × 2048 pixels and was divided into non-overlapping 20 × 20-pixel patches, yielding a total of 12,361 cancer lesion patches and 207,839 non-cancer patches.
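The four steps above can be illustrated with a minimal NumPy sketch of a PCANet-style network. It is a sketch under stated assumptions, not the authors' implementation: sparse filters are learned with a truncated power method (one of several sparse PCA formulations), standard binary sign hashing stands in for the graph-based PCA hashing (which the abstract does not specify), and a linear SVM is trained by plain subgradient descent. The hyperparameters (`K`, `N1`, `N2`, `KEEP`, `BLOCK`), all function names, and the synthetic toy patches are hypothetical.

```python
import numpy as np

# Hypothetical hyperparameters (not reported in the abstract): filter size, number of
# stage-1 / stage-2 filters, non-zeros per sparse filter, and histogram block size.
K, N1, N2, KEEP, BLOCK = 5, 4, 4, 8, 10


def extract_patches(img, k=K):
    """Every k x k patch of a 2-D image (stride 1), each with its own mean removed."""
    rows = np.lib.stride_tricks.sliding_window_view(img, (k, k)).reshape(-1, k * k)
    return rows - rows.mean(axis=1, keepdims=True)


def sparse_pca(X, n_components, keep=KEEP, n_iter=50, seed=0):
    """Sparse PCA by the truncated power method with Hotelling deflation.
    Each returned row is a unit-norm filter with at most `keep` non-zero weights."""
    rng = np.random.default_rng(seed)
    C = X.T @ X / len(X)  # second-moment matrix of the mean-removed patch vectors
    comps = []
    for _ in range(n_components):
        v = rng.standard_normal(C.shape[0])
        for _ in range(n_iter):
            v = C @ v
            v[np.argsort(np.abs(v))[:-keep]] = 0.0  # zero all but the `keep` largest entries
            v /= np.linalg.norm(v) + 1e-12
        comps.append(v)
        C = C - (v @ C @ v) * np.outer(v, v)  # deflate: remove the variance v explains
    return np.array(comps)


def filter_bank(img, filters, k=K):
    """Correlate a 2-D image with each k x k filter (zero padding, same output size)."""
    win = np.lib.stride_tricks.sliding_window_view(np.pad(img, k // 2), (k, k))
    return np.einsum("hwij,fij->fhw", win, filters.reshape(-1, k, k))


def learn_filters(patches):
    """Cascaded sparse PCA: stage-1 filters from raw patches, stage-2 from stage-1 outputs."""
    f1 = sparse_pca(np.vstack([extract_patches(p) for p in patches]), N1)
    stage1 = [out for p in patches for out in filter_bank(p, f1)]
    f2 = sparse_pca(np.vstack([extract_patches(out) for out in stage1]), N2, seed=1)
    return f1, f2


def features(patch, f1, f2, block=BLOCK):
    """Binary hashing of stage-2 outputs, then normalized block-wise histograms."""
    weights = 2 ** np.arange(len(f2))  # bit weights 1, 2, 4, ...
    n_bins = 2 ** len(f2)
    feats = []
    for out1 in filter_bank(patch, f1):
        bits = (filter_bank(out1, f2) > 0).astype(np.int64)  # (N2, H, W) sign bits
        codes = np.tensordot(weights, bits, axes=1)  # (H, W) integer codes in [0, n_bins)
        for r in range(0, codes.shape[0], block):
            for c in range(0, codes.shape[1], block):
                hist = np.bincount(codes[r:r + block, c:c + block].ravel(), minlength=n_bins)
                feats.append(hist / hist.sum())
    return np.concatenate(feats)


def train_linear_svm(X, y, lam=1e-3, lr=0.1, epochs=200):
    """Linear SVM via full-batch subgradient descent on the L2-regularized hinge loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples violating the margin (labels in {-1, +1})
        w = w - lr * (lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(X))
        b = b + lr * y[active].sum() / len(X)
    return w, b


def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)


if __name__ == "__main__":
    # Synthetic toy patches only: smooth gradients (label -1) vs. noisy texture (label +1).
    rng = np.random.default_rng(0)
    smooth = [rng.normal(0, 0.1, (20, 20)) + np.linspace(0, 1, 20) for _ in range(50)]
    rough = [rng.normal(0, 1.0, (20, 20)) for _ in range(50)]
    patches, y = smooth + rough, np.array([-1] * 50 + [1] * 50)
    idx = rng.permutation(len(patches))
    train, test = idx[: len(idx) // 2], idx[len(idx) // 2 :]  # 50/50 split
    f1, f2 = learn_filters([patches[i] for i in train])
    X = np.array([features(p, f1, f2) for p in patches])
    w, b = train_linear_svm(X[train], y[train])
    print("toy test accuracy:", (predict(X[test], w, b) == y[test]).mean())
```

With these settings each 20 × 20 patch yields N1 × (20/BLOCK)² × 2^N2 = 256 histogram features. In practice a library solver (for example, scikit-learn's `SparsePCA` and `LinearSVC`) would normally replace the hand-written sparse PCA and SVM.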
We used 50% of the data for training and the remaining 50% for testing the proposed sparse PCA network. We evaluated the algorithm using metrics conventionally used for classification, namely precision (P), recall (R), and F-score.
Results: The automatic cancer lesion detection results were compared with the manually annotated ground truth. The proposed method achieved a cancer lesion detection accuracy of 97.98%, with P = 0.8624, R = 0.9062, and F-score = 0.8790. Training took on average 17 minutes to learn a representation suitable for accurate and efficient classification of cancerous lesions within the images.
Conclusion: We demonstrated a simple machine learning methodology for detecting cancerous lesions within histopathological lung images. Experimental results show that the proposed method classifies the regions of interest both efficiently and accurately. Future work will focus on extracting features of individual tumors and localizing tumors within the lungs.
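Patch-level metrics can be computed from confusion-matrix counts as in the sketch below, which assumes that F-score means F1 (the harmonic mean of P and R); the abstract does not state the variant or whether scores were averaged per patch or per image. The second call uses the abstract's patch counts to show why P, R, and F-score are reported alongside accuracy: labeling every patch non-cancer already reaches about 94.4% accuracy while detecting no lesions. The function name `detection_metrics` is hypothetical.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Patch-level accuracy, precision (P), recall (R) and F1 score from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy example: 4 lesion patches (label 1) and 6 non-lesion patches (label 0).
print(detection_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                        [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]))
# → {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}

# Trivial "all non-cancer" predictor on the abstract's patch counts.
baseline = detection_metrics([1] * 12361 + [0] * 207839, [0] * 220200)
print(round(baseline["accuracy"], 4), baseline["recall"])  # → 0.9439 0.0
```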
Citation Format: Sundaresh Ram, Wenfei Tang, Alexander J. Bell, Cara Spencer, Alexander Buschhuas, Charles R. Hatt, Marina P. di Magliano, Stefanie Galban, Craig J. Galban. Detection of cancer lesions in histopathological lung images using a sparse PCA network [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-086.