Introduction: In digital pathology, considerable time and money are spent annotating whole slide images (WSIs). An AI algorithm trained on thousands of WSIs makes it possible to remove manual annotation from the workflow whilst simultaneously identifying imaging features that correlate with drug response, molecular phenotypes, histologic features (e.g. tertiary lymphoid structures), inflammation at the tumor-stroma interface, artifacts, and more.
Method: A pipeline was constructed to ingest WSIs from the diagnostic slides available in The Cancer Genome Atlas (TCGA). The WSIs were masked to tissue areas and divided into patches extracted at multiple magnification levels. These patches were used to train a U-Net (initialised with a pre-trained ResNet34 encoder) on a self-supervised in-painting task, using a perceptual loss computed with VGG16. After training, embeddings were extracted from the bottleneck layer of the U-Net and evaluated for their capacity to cluster according to tissue type and magnification level.
Results: The trained embeddings clustered strongly by tissue type and magnification level, and they separated out artifacts. The results were evaluated qualitatively by pathologists, who reached consensus, and the embeddings outperformed a baseline model pre-trained on ImageNet.
Conclusion: These results highlight a novel methodology for AI algorithm development that removes the need for numerous pathologists to annotate pathology images by hand and shifts their role to reviewing and quality-checking images. AI algorithms based on this approach can have a significant impact on the pathologic evaluation of tissue for oncology clinical trials by extracting a richer set of features than glass microscopy alone permits. Furthermore, self-supervised methods such as in-painting allow the training of embeddings that capture a richer understanding of the data and are consequently more readily repurposed than embeddings learned with supervised (labelled) approaches.
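The tissue-masking and multi-magnification patching step can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' pipeline: tissue is detected with a fixed HSV-saturation threshold, lower magnifications are simulated by average-pooling an RGB array that is already in memory, and the function names, threshold, patch size and minimum tissue fraction are all hypothetical. A production pipeline would instead read the pyramid levels directly from the slide file.

```python
import numpy as np


def tissue_mask(rgb, sat_threshold=0.07):
    """Boolean mask of likely tissue pixels in an RGB array (values 0-255).

    Assumption (hypothetical threshold): stained tissue is more saturated
    than the near-white glass background.
    """
    rgb = np.asarray(rgb, dtype=np.float64) / 255.0
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    # HSV saturation; black pixels (cmax == 0) end up with saturation 0.
    sat = (cmax - cmin) / np.maximum(cmax, 1e-8)
    return sat > sat_threshold


def downsample(rgb, factor):
    """Average-pool by an integer factor to mimic a lower magnification level."""
    rgb = np.asarray(rgb, dtype=np.float64)
    if factor == 1:
        return rgb
    h, w, c = rgb.shape
    h, w = h - h % factor, w - w % factor  # crop so the image tiles evenly
    blocks = rgb[:h, :w].reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))


def extract_patches(rgb, patch=256, min_tissue=0.5, factors=(1, 2, 4)):
    """Return (factor, row, col, tile) for non-overlapping tiles that are mostly tissue."""
    kept = []
    for f in factors:
        level = downsample(rgb, f)
        mask = tissue_mask(level)
        h, w = mask.shape
        for r in range(0, h - patch + 1, patch):
            for c in range(0, w - patch + 1, patch):
                # Keep the tile only if enough of it is tissue rather than glass.
                if mask[r:r + patch, c:c + patch].mean() >= min_tissue:
                    kept.append((f, r, c, level[r:r + patch, c:c + patch]))
    return kept
```

For example, on a synthetic 512×512 image whose left half is stained and right half is blank glass, `extract_patches(img, patch=128, factors=(1, 2))` keeps only the tiles covering the stained half at each simulated magnification.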
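The self-supervised in-painting objective can be illustrated in the same spirit. In this sketch a random square hole is blanked out of each patch, a network (not shown) would be trained to reconstruct the original patch, and a perceptual loss compares feature maps of the reconstruction and the target rather than raw pixels. The abstract uses VGG16 activations as those features; here `toy_features` is a hypothetical, dependency-free stand-in (pixels plus image gradients) so the example runs without pre-trained weights. The hole size, zero fill value and all function names are assumptions.

```python
import numpy as np


def make_inpainting_pair(patch, hole_frac=0.25, rng=None):
    """Blank out a random square hole; return (corrupted_input, target, hole_mask)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = patch.shape[:2]
    hh, hw = max(1, int(h * hole_frac)), max(1, int(w * hole_frac))
    r = rng.integers(0, h - hh + 1)  # upper bound is exclusive
    c = rng.integers(0, w - hw + 1)
    mask = np.zeros((h, w), dtype=bool)
    mask[r:r + hh, c:c + hw] = True
    corrupted = patch.copy()
    corrupted[mask] = 0.0  # assumption: the hole is filled with zeros
    return corrupted, patch, mask


def perceptual_loss(pred, target, feature_fn):
    """Mean squared distance between feature maps, averaged over feature layers."""
    feats_pred, feats_target = feature_fn(pred), feature_fn(target)
    return float(np.mean([np.mean((a - b) ** 2) for a, b in zip(feats_pred, feats_target)]))


def toy_features(img):
    """Hypothetical stand-in for VGG16 activations: pixels plus vertical/horizontal gradients."""
    gray = img.mean(axis=-1)
    return [img, np.diff(gray, axis=0), np.diff(gray, axis=1)]
```

In training, the corrupted patch would be fed to the U-Net and `perceptual_loss(unet_output, target, vgg_features)` minimised; because the network must infer the missing region from its surroundings, no human labels are needed.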
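Embedding extraction and a clustering check could look like the sketch below. The following are assumptions, not details from the abstract: the bottleneck feature map (for a ResNet34 encoder, 512 channels at 1/32 of the input resolution) is global-average-pooled into one vector per patch, and clustering by tissue type is scored with a silhouette coefficient. The abstract reports a qualitative evaluation by pathologists, so a silhouette score is only one possible quantitative complement to that review.

```python
import numpy as np


def pool_bottleneck(fmap):
    """Global-average-pool a (N, C, H, W) bottleneck map into (N, C) embeddings."""
    return np.asarray(fmap, dtype=np.float64).mean(axis=(2, 3))


def silhouette(X, labels):
    """Mean silhouette coefficient (Euclidean); needs at least two distinct labels."""
    X = np.asarray(X, dtype=np.float64)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if len(classes) < 2:
        raise ValueError("silhouette needs at least two clusters")
    # Pairwise Euclidean distances, shape (n, n).
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        if not same.any():
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = d[i, same].mean()  # mean distance to its own cluster
        # Mean distance to the nearest other cluster.
        b = min(d[i, labels == k].mean() for k in classes if k != labels[i])
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))
```

Scores near 1 indicate tight, well-separated groups (e.g. by tissue type or magnification level); scores near 0 indicate that the labels do not correspond to structure in the embedding space.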
Citation Format: Jason Hipp, Mona Xu, Lucas Bordeaux, Feng Gu, Carlos Pedrinaci, Khan Baykaner. Unsupervised learning of image embeddings enables new opportunities to extract novel information from digital pathology H&E images [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-076.