Abstract
Despite the crucial role of phenotypic and genetic intratumoral heterogeneity in understanding and predicting clinical outcomes for patients with cancer, computational pathology has yet to make substantial progress in this area. The major limiting factor has been the practice of bulk gene sequencing, which discards spatial information about gene status and makes intratumoral heterogeneity difficult to study. In this issue of Cancer Research, Acosta and colleagues used deep learning to study whether localized gene mutation status can be predicted from localized tumor morphology in clear cell renal cell carcinoma. The algorithm was developed using curated sets of matched hematoxylin and eosin and IHC images, which represent spatially resolved morphology and genotype, respectively. This study confirms a strong link between morphology and underlying genetics at the regional level, paving the way for further investigations into intratumoral heterogeneity.
Increasingly precise genomic and transcriptomic characterizations of cancer cell populations and their individual constituents have revealed intratumoral heterogeneity to be a feature of essentially every tumor. Intratumoral genomic and transcriptomic heterogeneity has emerged as a poorly understood cause of differences in patient prognosis, with wide-ranging effects on progression and on the development of resistance to therapy (1, 2).
In addition, intratumoral histologic heterogeneity is something that pathologists observe and grapple with in nearly every case, creating difficulty in assigning diagnoses and increasing interobserver variability. Clear cell renal cell carcinoma (ccRCC) in particular has been something of a poster child for all types of intratumoral heterogeneity, with studies showing remarkable variability in tumor cell morphology and mutations within the same lesion (3, 4).
The links between morphology, expression, and mutation are relatively unexplored, although lines of research in computational pathology suggest that deep learning approaches may be able to uncover them. For instance, a recent multimodal deep learning framework demonstrated that incorporating both phenotype and genotype improves cancer diagnosis and prognosis (5). In addition, supporting the hypothesis that genetic alterations are reflected in tumor morphology, recent studies have linked cancer phenotype to genotypic features such as microsatellite instability (6) and specific gene mutations (7). Taken together, these studies suggest that (i) histopathology and molecular genetics are two crucial components that should be analyzed in tandem and (ii) the complex, nonlinear relationship between the two modalities can reasonably be captured with deep learning frameworks. Therefore, a unified deep learning framework that incorporates both phenotype and genotype would be a good starting point for probing intratumoral heterogeneity.
Routine evaluation of genomic intratumoral heterogeneity is difficult because most clinically adopted methods of mutation detection rely on bulk sequencing pipelines, which are substantially cheaper and less complex than multiregion or spatially resolved technologies. In computational pathology, this shortcoming has generally been addressed with one of two paradigms: (i) strong supervision, where the same slide-level genomic label is assigned to every region within the slide, or (ii) weak supervision via multiple instance learning (MIL), where each region's contribution toward the slide-level label is quantified and aggregated for slide-level prediction. However, the first approach risks noisy labels, for example, a wild-type label assigned to regions where a mutation is truly present, or a mutant label assigned to regions that are wild-type. The second approach can produce a heatmap of each region's importance for the given gene status, but it does not guarantee that highly weighted regions actually harbor that status, so these heatmaps require further validation.
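The contrast between the two paradigms can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not code from any of the cited studies: tile features are random vectors, the tile-level ground truth is hypothetical, and paradigm (ii) is shown as attention-based multiple instance learning (MIL) pooling in the style of Ilse and colleagues, one common MIL variant, with untrained, randomly initialized parameters. The function names are hypothetical.

```python
import numpy as np

def strong_supervision_labels(n_tiles: int, slide_label: int) -> np.ndarray:
    """Paradigm (i): copy the slide-level label onto every tile (hypothetical helper)."""
    return np.full(n_tiles, slide_label, dtype=int)

def attention_mil_forward(tiles: np.ndarray, V: np.ndarray, w: np.ndarray,
                          clf_w: np.ndarray, clf_b: float):
    """Paradigm (ii): attention-based MIL pooling for one slide (hypothetical helper).

    tiles: (n_tiles, d) tile feature matrix. Returns the slide-level probability
    and one attention weight per tile, which is what a heatmap would display.
    """
    scores = np.tanh(tiles @ V.T) @ w            # (n_tiles,) unnormalized scores
    scores = scores - scores.max()               # stabilize the softmax
    attn = np.exp(scores) / np.exp(scores).sum() # weights are positive, sum to 1
    slide_embedding = attn @ tiles               # (d,) attention-weighted average
    logit = slide_embedding @ clf_w + clf_b      # linear slide-level classifier
    prob = 1.0 / (1.0 + np.exp(-logit))
    return prob, attn

rng = np.random.default_rng(0)
n_tiles, d, h = 6, 8, 4
tiles = rng.normal(size=(n_tiles, d))            # synthetic tile features

# Hypothetical ground truth: the slide is mutant, but only two tiles carry the mutation.
true_tile_status = np.array([1, 1, 0, 0, 0, 0])
noisy = strong_supervision_labels(n_tiles, slide_label=1)
print("mislabeled tiles:", int((noisy != true_tile_status).sum()))  # 4

# Untrained, randomly initialized parameters, for illustration only.
prob, attn = attention_mil_forward(
    tiles, V=rng.normal(size=(h, d)), w=rng.normal(size=h),
    clf_w=rng.normal(size=d), clf_b=0.0)
print(f"slide-level P(mutant) = {prob:.3f}")
print("per-tile attention:", np.round(attn, 3))
```

Under strong supervision, the four wild-type tiles inherit the mutant slide label. Under MIL, the attention weights form a valid importance map, but nothing in the pooling forces the highest-weighted tiles to be the mutated ones; that correspondence has to be checked separately.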
To better understand how heterogeneity of morphology in tumor regions can be linked to heterogeneity of mutations, Acosta and colleagues propose a deep learning framework that predicts local genotype from local morphology (8). They obtain spatially resolved genetic information from IHC staining of tissue sections adjacent to the hematoxylin and eosin (H&E)–stained sections, which keeps morphologic differences between the paired sections to a minimum. Using IHC as a proxy for genotype has several benefits. First, IHC is cheaper than sequencing-based methods and allows intratumoral heterogeneity to be assessed visually. Second, IHC provides gene mutation status at the regional level, allowing the authors to train their models on local regions rather than whole slides.
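How a regional IHC readout can become a local training label is illustrated below with a hypothetical labeling rule; the commentary does not describe the authors' exact procedure. The sketch assumes a boolean mask of pathologist-annotated protein loss that has already been registered to the adjacent H&E section, and it treats loss of staining as the mutation proxy (an assumption here). The function name, tile size, and the 0.1/0.9 thresholds are illustrative choices.

```python
import numpy as np

def tile_labels_from_ihc_mask(loss_mask: np.ndarray, tile: int,
                              lo: float = 0.1, hi: float = 0.9) -> np.ndarray:
    """Hypothetical rule: label each tile from the fraction of its pixels
    annotated as IHC protein loss.

    Returns a (rows, cols) array with 1 (loss, mutation proxy),
    0 (retained, wild-type proxy), or -1 (ambiguous, excluded from training).
    """
    H, W = loss_mask.shape
    rows, cols = H // tile, W // tile
    # Crop to whole tiles, then average the mask within each tile.
    cropped = loss_mask[: rows * tile, : cols * tile].astype(float)
    frac = cropped.reshape(rows, tile, cols, tile).mean(axis=(1, 3))
    labels = np.full((rows, cols), -1, dtype=int)
    labels[frac >= hi] = 1
    labels[frac <= lo] = 0
    return labels

# Toy 4x6-pixel mask split into 2x2-pixel tiles (2 rows x 3 columns of tiles).
mask = np.zeros((4, 6), dtype=bool)
mask[:2, :2] = True   # tile (0, 0): fully annotated as protein loss
mask[2:, 2] = True    # tile (1, 1): half annotated, hence ambiguous
labels = tile_labels_from_ihc_mask(mask, tile=2)
print(labels.tolist())  # [[1, 0, 0], [0, -1, 0]]
```

Discarding ambiguous tiles is one way to keep boundary regions, where adjacent sections may not align perfectly, from adding label noise; the thresholds trade training-set size against label purity.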
Although IHC has been proposed before as a means of assessing intratumoral heterogeneity (9), this study is the first to combine phenotypic and genotypic heterogeneity. IHC has also been used to provide labels for a convolutional neural network (CNN) that predicts gene mutations in lung cancer (7); however, only slide-level protein expression was used, so that approach was subject to the shortcomings described above. In this context, the use of pathologist-annotated IHC mapped to tumor morphology represents an important next step for computational pathology.
Using a curated set of matched whole-slide H&E and IHC images from 1,282 patients with ccRCC at the Mayo Clinic, the authors focus on three of the most commonly mutated driver genes in ccRCC, namely BAP1, PBRM1, and SETD2, which are important prognostic factors (10). In the first part of the study, the authors ask whether CNN models can predict the slide-level mutation status of the three genes from morphology, as in other weakly supervised studies based on bulk-sequencing data. The models achieved high test AUC values for distinguishing mutant from wild-type BAP1, PBRM1, and SETD2, demonstrating a link between phenotype and underlying genetics. The authors then shift to the localized prediction task, leveraging ground-truth labels available at the finer regional resolution. The regional-level prediction task was therefore cast as a fully supervised problem, in contrast to the weakly supervised slide-level prediction. The AUC for regional-level prediction of BAP1, PBRM1, and SETD2 was similar to the slide-level performance, indicating that genetic status can still be predicted from morphology at a finer scale. This implies that the morphologic correlates of genotype remain robust even when tumor subclones differ in morphology or gene mutation status. Furthermore, in settings where clinical decisions are informed by intratumoral heterogeneity, analysis of heterogeneous morphologic features in routinely generated H&E images could serve as a proxy for, or even a replacement for, costly genetic assays.
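The difference between regional-level and slide-level evaluation can be made concrete with a small AUC computation. This is a minimal sketch on synthetic numbers, not data or code from the study. It uses two hypothetical aggregation rules that the commentary does not attribute to Acosta and colleagues: a slide counts as mutant if any of its regions is, and a slide's score is its highest regional probability.

```python
import numpy as np

def auc(y_true, y_score) -> float:
    """ROC AUC as the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, counting ties as 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]  # every positive-negative pair
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Synthetic regions from four hypothetical slides (not data from the study).
slide_id    = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])
region_lab  = np.array([1, 0, 0, 0, 0, 1, 1, 0, 0, 0])  # IHC-derived labels
region_prob = np.array([0.8, 0.3, 0.4, 0.2, 0.35, 0.7, 0.6, 0.65, 0.1, 0.45])

# Regional (fully supervised) evaluation: each region is scored against its own label.
region_auc = auc(region_lab, region_prob)

# Slide-level evaluation under the two hypothetical aggregation rules.
slides = np.unique(slide_id)
slide_lab = np.array([region_lab[slide_id == s].max() for s in slides])
slide_score = np.array([region_prob[slide_id == s].max() for s in slides])
slide_auc = auc(slide_lab, slide_score)

print(f"regional AUC = {region_auc:.3f}, slide-level AUC = {slide_auc:.3f}")
# regional AUC = 0.952, slide-level AUC = 1.000
```

In this toy example the slide-level AUC is perfect, yet in slide 2 a wild-type region outscores a mutant region, a mismatch that only the regional evaluation can detect. The pairwise definition used here equals the Mann-Whitney U statistic divided by the number of positive-negative pairs.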
It is natural to ask whether these observations depend on the specific center at which the dataset was curated. To this end, the authors performed extensive validation of their findings on independent external ccRCC datasets, including The Cancer Genome Atlas, two external patient tissue microarray (TMA) cohorts, and a patient-derived xenograft TMA cohort. Although prediction performance drops slightly, as expected given differences in tissue preparation and mutation-sequencing procedures, the internally trained models still achieve meaningful AUCs in the independent cohorts. This further strengthens the authors' finding that robust connections exist between phenotype and genotype, especially for BAP1.
Despite these promising results, several points need to be addressed before wider adoption in clinical practice. First, despite the benefits of IHC, sequencing-based assays can identify gene mutations with higher sensitivity and specificity than IHC (7). Therefore, it is important to validate whether IHC for a specific gene of interest achieves clinically acceptable sensitivity and specificity before it is used for downstream tasks. Second, because deep learning frameworks are data intensive, sufficient training data must be amassed. When data are limited, lightweight networks combined with data augmentation should be used to reduce overfitting. Finally, other well-known, powerful feature extractors, such as residual networks or vision transformers, should also be evaluated.
By bridging the gap between histopathology and molecular genetics, the study by Acosta and colleagues lays the groundwork for further cancer-specific deep learning approaches to studying intratumoral heterogeneity in a cost-effective manner. Findings from such studies could help elucidate the mechanisms by which certain genotypes give rise to reproducible morphologic phenotypes. Practically speaking, frameworks such as these could help guide multiregion sampling for sequencing or other costly procedures. In addition, the authors set an excellent example for computational pathology research by curating and analyzing multi-institutional datasets, which is especially important for data-intensive deep learning frameworks. As more powerful deep learning architectures, such as residual networks and vision transformers, are combined with massive data collection efforts, we expect an exciting crop of deep learning studies of intratumoral heterogeneity through the lens of morphologic correlates of genotype.
Authors' Disclosures
F. Mahmood reports grants from NIH, Frederick National Lab, BWH President's Fund, and MGH Pathology outside the submitted work. No disclosures were reported by the other authors.