Abstract
Multiplex immunofluorescence (mIF) can detail spatial relationships and complex cell phenotypes in the tumor microenvironment (TME). However, the analysis and visualization of mIF data can be complex and time-consuming. Here, we used tumor specimens from 93 patients with metastatic melanoma to develop and validate a mIF data analysis pipeline using established flow cytometry workflows (image cytometry). Unlike flow cytometry, spatial information from the TME was conserved at single-cell resolution. A spatial uniform manifold approximation and projection (UMAP) was constructed using the image cytometry output. Spatial UMAP subtraction analysis (survivors vs. nonsurvivors at 5 years) was used to identify topographic and coexpression signatures with positive or negative prognostic impact. Cell densities and proportions identified by image cytometry showed strong correlations when compared with those obtained using gold-standard, digital pathology software (R2 > 0.8). The associated spatial UMAP highlighted “immune neighborhoods” and associated topographic immunoactive protein expression patterns. We found that PD-L1 and PD-1 expression intensity was spatially encoded—the highest PD-L1 expression intensity was observed on CD163+ cells in neighborhoods with high CD8+ cell density, and the highest PD-1 expression intensity was observed on CD8+ cells in neighborhoods with dense arrangements of tumor cells. Spatial UMAP subtraction analysis revealed numerous spatial clusters associated with clinical outcome. The variables represented in the key clusters from the unsupervised UMAP analysis were validated using established, supervised approaches. In conclusion, image cytometry and the spatial UMAPs presented herein are powerful tools for the visualization and interpretation of single-cell, spatially resolved mIF data and associated topographic biomarker development.
Introduction
Multiplex immunofluorescence (mIF) is an emerging technology that has great potential for quantifying and mapping immune cells and associated immunoactive protein expression (1). There is great interest in using this platform for prognostic and predictive immuno-oncology biomarker discovery. A recent meta-analysis showed that assays using mIF had higher positive and negative predictive values for response to anti-PD-1/PD-L1 when compared with PD-L1 IHC, an IFNγ gene signature, or mutational burden (2). In that meta-analysis, the typical number of markers studied was only two or three. Multispectral, mIF systems are now available that can study up to eight markers across an entire slide (>1,000 high-power fields).
mIF tissue-based assays allow for marker coexpression and spatial parameters to be studied with single-cell resolution. Signal intensity can also be assessed, rendering these technologies for tissue sections analogous to flow cytometry. Output parameters for mIF assays are typically assessed using image analysis software and exported in machine-readable format on a per-cell basis, facilitating bespoke queries with software such as R. This approach poses multiple challenges and can be time-consuming. Once cells are identified (segmenting), using the image analysis software, manual training is required for the assignment of individual cell types and their expression/coexpression patterns (phenotyping). Tools to explore the complex spatial associations between markers in an unsupervised manner are also lacking. Current supervised approaches include proximity analyses using a predetermined distance between cells or nearest neighbor analyses (3–6), both of which have the potential to oversimplify the data.
A potential alternative is to use methods from the field of flow cytometry, which has a long history of analyzing coexpression and intensity data on a cell-by-cell basis. Dimensionality reduction techniques such as t-distributed stochastic neighborhood embedding and uniform manifold approximation and projection (UMAP) are available for exploring high-dimensionality flow cytometry datasets (7, 8), but lack a spatial component. In this study, we set out to: (i) determine whether we could capitalize on previously defined flow cytometry workflows to analyze mIF data, while maintaining the spatial information provided by this emerging technology, that is, “image cytometry”; and (ii) develop a novel spatial UMAP-based, unsupervised approach to cluster and visualize the resultant data and identify spatial correlations of clinical relevance.
Materials and Methods
Specimen collection and tissue microarray design
This study was conducted in accordance with the Declaration of Helsinki and was performed following Johns Hopkins University Institutional Review Board approval (#NA_00085595). This protocol allows for the retrieval of tissue from archives from patients who signed an informed written consent or with waiver of consent. Ninety-three specimens from unique patients with metastatic melanoma acquired from 1990–2012 were identified in the surgical pathology archives. At the 5-year follow-up time point, 34 of 93 patients (36%) were alive. Additional clinicopathologic characteristics are provided in Supplementary Table S1.
The diagnosis of melanoma was confirmed by a board-certified pathologist (J.M. Taube), and a representative formalin-fixed, paraffin-embedded (FFPE) block was chosen from each specimen for inclusion in tissue microarrays (TMA). Six 1.2-mm cores were taken from each block including central and peripheral tumor areas (Supplementary Fig. S1). The resultant TMAs were reviewed, and cores with <10% surface area occupied by tumor cells were excluded from analysis.
mIF
TMAs were stained for CD8, PD-1, PD-L1, CD163, FoxP3 and melanoma cocktail (Sox10 and S100) as previously described using the antibodies listed in Supplementary Table S2 (9). Briefly, following deparrafinization of 4-μm-thick tissue slides, antigen retrieval was performed through microwave pretreatment with AR9 followed by AR6 (Akoya Biosciences, catalog nos. AR900250ML and AR600125ML). The first primary antibody (“Position 1”) was then applied. Opal polymer HRP Ms+Rb (Akoya Biosciences, catalog no. ARH1001E) or PowerVision Poly-HRP (1:1 dilution, Leica Biosystems, mouse PV catalog no. PV6119 and Rb PV catalog no. PV6114) was used as the secondary antibody. The slides were washed, and the tyramide signal amplification dye (Opal 7 color kit, Akoya Biosciences, catalog no. NEL861001KT) for Position 1 was applied. Slides were then microwaved in AR6 to strip the primary and secondary antibodies, washed, and blocked again using blocking solution. The second primary antibody (“Position 2”) was applied, and the process was repeated through amplification of the sixth primary antibody (Supplementary Table S2). DAPI was applied and slides were coverslipped using ProLong Diamond Antifade Mountant (Life Technologies, catalog no. MAP36961).
Stained slides were scanned using the Vectra 3.0 Quantitative Pathology Imaging System (Akoya Biosystems). InForm 2.3 Image Analysis software (Akoya Biosciences) was used for spectral unmixing, cell segmentation, and identification and quantification of cellular subsets. Sox10/S100 (tumor), CD8, CD163, and FoxP3 were considered “linage markers”; they were scored in a binary fashion (“positive” or “negative”) and the densities of these lineages were determined. Cells that were negative for these markers were considered in the “Other” cellular subset. PD-1 and PD-L1 were considered “expression markers,” and their expression by the identified cellular subsets was determined.
Image cytometry
In addition to the InForm image analysis workflow described above, a flow cytometry–like workflow was performed (Fig. 1A and B). In this approach, the Vectra system was used only for imaging and spectral unmixing. The subsequent image analysis was performed by transferring the resultant, unmixed multilayer-TIFF images into HALO V2.0 (Indica Labs) for cell segmentation. The resultant single-cell data [cell area and location as well as mean fluorescence intensity (MFI) for each marker per cell] was exported to FlowJo (V10, Becton, Dickinson & Company). All cores corresponding to a single tumor sample were grouped using gates. Marker expression for each individual specimen was then gated on the basis of MFI. Analyses of cell density and distance were performed in R software. In addition, the association between CD8 proximity and PD-1 and PD-L1 expression intensity was ascertained. For this latter analysis, only specimens containing at least 150 PD-L1+ or PD-1+ cells, respectively, were included.
Spatial UMAP
We implemented a UMAP dimensionality reduction technique to analyze the multiplex, coordinately mapped, single-cell data. A per-cell feature vector based solely on spatial information was encoded for every cell and input into the UMAP. Specifically, the cell density of each of the five cell lineages (tumor, CD163+, CD8+, FoxP3+, and “Other”) were determined within five concentric spatial bounds (0–25, 26–50, 51–100, 101–150, 151–200 μm) around each individual cell. This resulted in a feature vector of 25 densities [5 (cell types) × 5 (spatial bounds)] for each cell, of which we used 2,500 cells per specimen to train the UMAP embedding. This algorithm was then applied to an independent set of 2,500 cells per specimen for visualization. Information from the “expression” markers (PD-1 and PD-L1) was not used to generate the UMAP. The intensity of PD-1 and PD-L1 expression was overlaid onto the UMAP landscape after it was generated to evaluate for potential spatial encoding.
Survival analyses
Clinical variables were studied, including the age of the patient, tumor stage, the size of the largest tumor deposit, and whether extracapsular extension was present if the tumor was in a lymph node. In addition, n = 552 potential mIF variables were identified, including density for each cell lineage, density of cells expressing PD-1 and PD-L1, and permutations of the densities of each of these cell phenotypes within 25 μm of another cell phenotype. Representative variables included: density of PD-L1+CD163+ cells within 25 μm of a CD8+ cell; or any PD-L1+ cell within 25 μm of a PD1+ cell.
To determine the clinical and mIF features most closely associated with 5-year overall survival, both supervised and unsupervised approaches were used. The supervised approach was performed using R/Bioconductor software (v.4.0.3), and variable selection was performed two ways. In the first, univariate Cox regression analysis was used to estimate the P value of an association of each continuous variable with 5-year survival using the coxph function from the survival package (v.3.2–7; ref. 10). In the second approach, least absolute shrinkage and selection operator (LASSO) was used to reduce closely related covariates. To find the optimal threshold for variable selection, we performed 5-fold cross-validation using the glmnet package (v. 4.1; ref. 11).
The unsupervised approach for identification of features associated with survival was performed using spatial UMAPs. The single-cell data from survivors and nonsurvivors at the 5-year time point were aggregated by outcome groups. Differences between the two resultant spatial UMAP average bin densities were used to identify cell population with positive and negative prognostic impact.
Statistical analysis and data and code availability
Correlation between cell densities obtained by image cytometry and InForm platforms were evaluated with simple linear regression. Comparisons between PD-1 and PD-L1 MFI according to topographic variables, for example, CD8+ proximity used Mann–Whitney (two variables) or Kruskal–Wallis (more than two variables) tests.
The data and spatial UMAP code are available at https://www.sciserver.org and https://github.com/BarasLab/SpatialUMAP, respectively. The supervised survival analysis code is available in Supplementary Computational Methods.
Results
Image cytometry
We compared the densities of CD8+, CD163+, FoxP3+, PD-1+, PD-L1+, and tumor cells using flow cytometry software with those obtained through the current gold standard, that is, the phenotyping software associated with the microscope platform, and showed a strong correlation for all markers (R = 0.79 or above; Fig. 1C). We noted that image cytometry had the added benefit of a 5-fold reduction in run-time.
Spatial UMAP
We used the cell lineage and X,Y coordinate data generated using image cytometry to create spatial UMAPs. These allowed us to cluster and visualize the topographic data and identify cells embedded within similar tumor immune neighborhoods (Fig. 2A). Distinct neighborhoods were delineated when the densities of different cell lineages were highlighted within the UMAP (lineage-specific subset UMAPs). They included those composed predominantly of tumor cells (immune-silent or excluded areas), those enriched for CD8+ cells and FoxP3+ cells (T-cell inflamed microenvironment), and those enriched for CD163+ and “Other” cells (myeloid-predominant or stromal cell rich; Fig. 2B). Although tumors as a whole have previously been characterized as immune-rich or immune-excluded (12), we were able to identify these motifs on a regional scale within individual tumors.
Next, we overlaid the expression intensity of PD-1 and PD-L1 on the lineage-specific subset UMAPs, revealing spatial encoding of PD-L1 and PD-1 expression (Fig. 3A). The highest intensity PD-L1 expression was observed on CD163+ cells in areas with dense CD8+ infiltrates. PD-L1 expression was also observed on all other lineages (tumor, CD8+, FoxP3+, and “Other” cells) in CD8+-dense neighborhoods. A cluster of PD-L1−CD163+ cells in CD8-poor neighborhoods was also evident, further supporting adaptive (IFNγ-driven) PD-L1 expression (13). The highest PD-1 expression was observed on CD8+ cells in tumor cell–rich spatial clusters with low-level PD-L1 expression, consistent with a more exhausted T-cell phenotype following prolonged antigen exposure (14). We then corroborated the PD-L1 and PD-1 spatial cluster UMAP findings. First, we tested the MFI of PD-L1 at different distances from CD8+ cells, and we found that both PD-L1 expression intensity and proportion of PD-L1+ tumor cells, CD163+ macrophages, and “Other” cells increased with proximity to CD8+ cells (Fig. 3B; Supplementary Fig. S2). Next, CD8+PD-1+ cells were tested for PD-1 expression intensity as a function of the density of tumor cells in the immediate neighborhood (Fig. 3C). This analysis confirmed the UMAP finding that PD-1 intensity was greater when CD8+PD-1+ cells were surrounded by a higher density of tumor cells.
Survival analysis
The spatial UMAP was apportioned into survivors and nonsurvivors at the 5-year time point, and the differences between their average bin densities were used to identify spatial immune clusters with positive or negative prognostic impact (Fig. 4A and B). Spatial arrangements where tumor cells were adjacent to high CD8+ or PD-1+ cell densities and/or high densities of CD8+ or PD-1+ cells in close proximity to PD-L1+ cells, that is, T-cell “inflamed” tumors, were enriched in long-term survivors. In contrast, spatial neighborhoods with a high density of CD163+ cells lacking PD-L1 expression, especially those in close proximity to another macrophage, were enriched in nonsurvivors. We confirmed the prognostic impact of the spatial clusters by performing a univariate Cox linear regression analysis on the configuration for each neighborhood, for example, density of CD8+ cells ≤25 μm from any PD-L1+ cell (Fig. 4B; Supplementary Fig. S3). We next confirmed that the variables indicated by the spatial UMAP analysis were the same as those that would be identified using a conventional approach to univariate Cox regression analysis coupled with variable prioritization. Specifically, we generated more than 550 possible variables for the 6-plex mIF panel, representing different cell densities, coexpression, and supervised spatial metrics and used two different approaches to variable prioritization. Variables identified using the lowest P values from Cox regression modeling and those selected by LASSO algorithm both overlapped with those highlighted by the spatial UMAP (Supplementary Tables S3 and S4). Clinical variables such as patient age, tumor stage, largest tumor deposit size, and presence of extracapsular extension were included in these models but did not emerge as key variables.
Discussion
Flow cytometry is widely used by the scientific community and is known for its ability to enable rapid and robust multiparametric analysis of cell protein expression. Here, we present an efficient image cytometry approach for multispectral mIF image analysis that allows robust MFI-based quantification of immune populations and maintains spatial information at a single cell level. Previous studies have reported flow cytometry–like dot plots and associated gating strategies for the quantification of specific cellular subsets in slides stained with multiplex chromogenic IHC (15), 3-color standard IF (16), and confocal microscopy (17, 18). These earlier studies did not benchmark their method against established digital pathology image analysis strategies. Moreover, multispectral mIF technology is more amenable to image cytometry analysis than some of these other technologies due to scalability as well as the potential to achieve a high signal-to-noise ratio and a large dynamic range (1). Here, we describe a method comparable with current state-of-the-art digital pathology software. This approach also has the benefit of facilitating dynamic thresholding and the potential integration of compensation tools to correct for marker bleedthrough between channels.
Spatially annotated, multiplex datasets are relatively new to the field of immuno-oncology. So far, efforts to explore this type of data have used fairly restricted characterizations of spatial encoding in the microenvironment (arbitrary cell-to-cell distance cutoffs, so-called proximity analysis, nearest neighbor analysis; refs. 3, 4, 19–21). More comprehensive approaches are needed to unravel more subtle and complex correlations. The spatial UMAP dimensionality reduction approach allows for unsupervised clustering of multifaceted spatial arrangements within the tumor microenvironment, including those associated with clinical outcome. Given that the clusters shown here are generated based on lineage markers arrangements only, the fact that PD-1 and PD-L1 expression localizes to select spatial UMAP regions indicates spatially encoded expression. Furthermore, the expression levels of these markers, for example, PD-1low versus PD-1high, also appears to be spatially encoded, which is of interest as such levels are known to have functional relevance (14). Future studies with spatial UMAPs may explore larger tumor surface areas, for example, whole slides stained and imaged with multispectral mIF (9), higher resolution distance metrics (22), and comparisons between lesions from different metastatic and anatomic sites (23, 24).
In summary, our work shows that the use of image cytometry provides a robust, efficient, and reproducible platform to quantify large datasets of mIF-stained tumors. Furthermore, it also supports the use of spatial UMAPs for improved interpretation of mIF data, including the discovery of relevant cell-to-cell interactions and orchestrated marker expression. Although we focused on multispectral mIF output here, our pipeline could be used with any type of high-dimensionality multiplex imaging data, for example, imaging mass cytometry or multiplexed ion beam imaging (25).
Authors' Disclosures
S. Berry reports personal fees from Akoya Biosciences outside the submitted work, as well as a patent for P16788-01 PRO 63208829 pending. E.L. Engle reports nonfinancial support from Akoya Biosciences during the conduct of the study, as well as a patent for method of predicting response to immunotherapy pending. B. Green reports nonfinancial support from Akoya Biosciences during the conduct of the study, as well as a patent for mIF/IHC assay on tumor tissue for prognosis pending. R. Gottardo reports other support from Ozette Technologies outside the submitted work; has received consulting income from Takeda Vaccines, speaker fees from Illumina and Fluidigm, and research support from Janssen Pharmaceuticals; and declares stock ownership in 10X Genomics, Vir Biotechnology, Abcellera, and BioNTech. R.A. Anders reports grants and personal fees from Bristol Myers Squibb, and personal fees from AstraZeneca and Merck SD during the conduct of the study. E.J. Lipson reports grants from Bloomberg-Kimmel Institute for Cancer Immunotherapy, Barney Family Foundation, Moving for Melanoma of Delaware, and Laverna Hahn Charitable Trust during the conduct of the study, as well as personal fees from Bristol Myers Squibb, Novartis, EMD Serono, Array BioPharma, Macrogenics, Merck, Sanofi/Regeneron, Genentech, Odonate Therapeutics, and Eisai outside the submitted work. J.M. Taube reports nonfinancial support from Akoya Biosciences during the conduct of the study; other support from Akoya Biosciences, Merck, AstraZeneca, and Compugen and grants and other support from Bristol Myers Squibb outside the submitted work; and a patent for AstroPath imaging and mIF assay pending. No disclosures were reported by the other authors.
Authors' Contributions
N.A. Giraldo: Conceptualization, data curation, software, formal analysis, investigation, methodology, writing–original draft, writing–review and editing. S. Berry: Formal analysis, validation, investigation, methodology, writing–review and editing. E. Becht: Methodology, writing–review and editing. D. Ates: Resources, data curation, methodology. K.M. Schenk: Data curation. E.L. Engle: Data curation, formal analysis, supervision, validation, writing–review and editing. B. Green: Data curation, software, formal analysis, validation, methodology. P. Nguyen: Resources, data curation, formal analysis, validation, methodology. A. Soni: Data curation, validation, writing–review and editing. J.E. Stein: Data curation, formal analysis, methodology, writing–review and editing. F. Succaria: Resources, data curation, writing–review and editing. A. Ogurtsova: Resources, methodology, writing–review and editing. H. Xu: Validation, methodology, writing–review and editing. R. Gottardo: Software, methodology, writing–review and editing. R.A. Anders: Supervision, writing–review and editing. E.J. Lipson: Data curation, supervision. L. Danilova: Conceptualization, data curation, formal analysis, investigation, methodology, writing–review and editing. A.S. Baras: Conceptualization, data curation, software, investigation, methodology, writing–original draft, project administration, writing–review and editing. J.M. Taube: Conceptualization, resources, data curation, formal analysis, supervision, funding acquisition, investigation, methodology, writing–original draft, project administration, writing–review and editing.
Acknowledgments
The authors thank Kareem Adams and Danielle Signer for their administrative and technical assistance. This work was supported by NIH 3R01CA142779 (J.M. Taube) and NCI R50CA243627 (L. Danilova); The Mark Foundation for Cancer Research (J.M. Taube, E.L. Engle, S. Berry, L. Danilova); Emerson Collective (J.M. Taube); Bristol Myers Squibb (J.M. Taube); Sidney Kimmel Cancer Center Core Grant P30 CA006973 (J.M. Taube); and The Bloomberg∼Kimmel Institute for Cancer Immunotherapy.