Triple-negative breast cancer (TNBC) is an aggressive disease that disproportionately affects African American (AA) women. Limited targeted therapeutic options exist for patients with TNBC. Here, we employ spatial transcriptomics to interrogate tissue from a racially diverse TNBC cohort to comprehensively annotate the transcriptional states of spatially resolved cellular populations. A total of 38,706 spatial features from a cohort of 28 sections from 14 patients were analyzed. Intratumoral analysis of spatial features from individual sections revealed heterogeneous transcriptional substructures. However, integrated analysis of all samples resulted in nine transcriptionally distinct clusters that mapped across all individual sections. Furthermore, novel use of join count analysis demonstrated nonrandom directional spatial dependencies of the transcriptionally defined shared clusters, supporting a conserved spatio-transcriptional architecture in TNBC. These findings were substantiated in an independent validation cohort comprising 17,861 spatial features representing 15 samples from 8 patients. Stratification of samples by race revealed race-associated differences in hypoxic tumor content and regions of immune-rich infiltrate. Overall, this study combined spatial and functional molecular analyses to define the tumor architecture of TNBC, with potential implications in understanding TNBC disparities.
Spatial transcriptomics profiling of a diverse cohort of triple-negative breast cancers and innovative informatics approaches reveal a conserved cellular architecture across cancers and identify proportional differences in tumor cell composition by race.
Breast cancer—the most commonly diagnosed cancer in women—is a disease spectrum that can be classified into several histological and molecular subtypes (1). Triple-negative breast cancer (TNBC) is a form of breast cancer that does not express estrogen receptor, progesterone receptor, or amplified HER2, and comprises approximately 15% of invasive breast cancers (2). TNBC remains clinically challenging, as it is associated with more aggressive disease, earlier metastasis, higher rates of relapse, and poorer outcomes than other breast cancer subtypes (3). TNBC is known to be a genetically heterogenous disease, with many driver events giving rise to several intrinsic subtypes with varied clinical outcomes (4, 5). This, in addition to the lack of expression of actionable targets, has made development of effective therapies for TNBC difficult (6).
TNBC disproportionately affects African American (AA) women and women of sub-Saharan African ancestry, who have higher incidence, greater TNBC-related mortality, and poorer 5-year survival rates compared with Caucasian women (7–9). The association of TNBC risk with African ancestry suggests a genetic component to the disparity (10), and genomic studies have identified unique characteristics of AA, African, and Caucasian breast cancers (11, 12). However, the resulting effects on tumor biology are poorly understood. Thus, descriptive and clinical studies of TNBC must prioritize diverse representation in study cohorts, which has historically been inadequate (13).
Advanced molecular profiling provides an opportunity to further our understanding of disease biology and therapeutic vulnerabilities through comprehensive characterization of tumor specimens. Transcriptomic profiling technologies have evolved to allow the measurement of comprehensive gene expression from single cells, which can resolve the composition of cell types within the tumor microenvironment to reveal the extent of tumor heterogeneity (14). Further advancement now allows for spatially resolved whole-transcriptome measurements, where tissue architecture is maintained and provides critical context for higher-dimensional data interpretation (15, 16). Spatial transcriptomics (ST) technologies are particularly suitable for the study of solid tumors, which develop in a complex environment where communication between malignant cells, immune cells, fibroblasts, and vasculature can influence disease progression (17). For instance, cellular interactions in breast tumors can take the form of metabolic cross-talk (18), immune cross-talk (19), and subclonal communication (20–22). Preservation of spatial context provides a critical framework in which to dissect these relationships.
In this study, we applied ST to comprehensively describe the transcriptional substructure and heterogeneity defining the cellular architecture of TNBC. We performed manual and expression-informed molecular annotation of 28 TNBC tissue sections from 14 patients, of whom 7 self-identified as AA and 7 as Caucasian. We then utilized clustering analysis to summarize and assess transcriptionally defined intratumoral heterogeneity. We further integrated expression data from all samples to generate a reference, from which, we defined 9 shared clusters represented across the entire dataset. Molecular annotation tools were used to understand the biological context of cells represented by transcriptionally distinct clusters. Moreover, to analyze spatial relationships between clusters, we made novel use of join count analysis (JCA), which revealed consistent patterns of spatial exclusion and dependency in TNBC. We validated our identified reference clusters and their spatial relationships in an independent cohort of 15 additional samples from eight AA patients. Finally, functional annotation of shared clusters revealed racial differences in the expression of clinically significant immune- and hypoxia-related genes, which may shed light on racial disparities in TNBC clinical outcomes.
Materials and Methods
De-identified human breast tissues were procured from the Virginia Commonwealth University (VCU) Massey Cancer Center Tissue and Data Acquisition and Analysis Core under a protocol approved by the VCU Institutional Review Board (VCU IRB HM2471). All patients provided written informed consent. Specimens were obtained during standard-of-care procedures and banked when judged to be in excess of that required for patient diagnosis and treatment. Fresh samples procured during routine gross examination in surgical pathology were deposited in 2 mL cryovials, snap-frozen in liquid nitrogen, then stored at −80°C. Tissue samples were embedded in Scigen Tissue Plus O.C.T. Compound (Thermo Fisher Scientific) prior to cryosectioning.
All samples selected for study were primary TNBC tumors collected between the years of 2002 and 2019. Samples were divided into two cohorts: a reference cohort of 28 samples and a validation cohort of 15 samples. Reference cohort samples were obtained from 14 patients with TNBC, 7 of whom identified as Black or AA and 7 who identified as Caucasian or Non-Hispanic White. Every effort was made to stage-match and age-match the samples, with age <50 years and stage II tumors preferred. Eleven of fourteen tumors represented in the reference cohort did not receive neoadjuvant treatment; these patients may have declined therapy or were not candidates for neoadjuvant treatment at the time of collection. Validation cohort samples were obtained from 8 patients, all of whom were AA, and varied from stage II to stage III. Six of eight tumors represented in the validation cohort received neoadjuvant treatment.
Two frozen blocks representing nonadjacent tumor sections were available for all patients, with the exception of three cases with limited tissue available, for which adjacent sections were provided, and one case for which a single section was provided. Sections were randomly sampled from frozen blocks to reduce selection bias. Distances between paired nonadjacent sections varied on the basis of the size of the tumor and its handling at the time of banking; most sections sampled the original tumor centrally. Full specimen information is available in Supplemental File S1. In total, we applied ST to 43 tissue sections, representing 22 tumor specimens.
Visium spatial gene expression assay and sequencing
Fresh frozen, OCT-embedded tissues were cryosectioned at a thickness of 10 μm and mounted on Visium spatial gene expression slides (10x Genomics, #1000184), which contain four 6.5 mm × 6.5 mm capture areas comprising 5,000 barcoded spatial features each. The samples were processed as described in the manufacturer's protocols. Briefly, tissues were stained with hematoxylin and eosin (H&E) and microscopic images were captured with a Zeiss Axioscan2 microscope using a 10× objective. Tissues were then permeabilized, allowing RNA to diffuse and bind to the slide surface via polyA capture. Permeabilization was performed for 9 minutes, which was determined using the Spatial Tissue Optimization procedure (10x Genomics, #1000193). Following permeabilization, cDNA was synthesized on-slide from the immobilized RNA. cDNA was then collected and used to generate sequencing libraries containing unique sample indices, as well as P5 and P7 primers for Illumina-compatible sequencing. Paired-end sequencing (2 × 100) was performed on an Illumina NovaSeq 6000 instrument to produce a minimum of ∼250 million read pairs per sample (Supplementary Fig. S1).
Data preprocessing and normalization
Sequencing data were preprocessed using the Space Ranger pipelines (version 1.1.0; 10x Genomics). BCL data were demultiplexed and converted to FASTQ format using the spaceranger mkfastq pipeline. The spaceranger count pipeline was then used for read alignment to human reference genome GRCh38, unique molecular identifier (UMI) counting, and generation of feature-spot matrices corresponding to the microscopic tissue image. The pipeline also performs automatic tissue detection and fiducial alignment from the image. The raw UMI counts and spatial barcodes were imported to the R program Seurat (RRID:SCR_016341, version 4.0.4; ref. 23) with the Load10x_Spatial command. Gene expression count data were then normalized using the SCtransform method from the R package sctransform (RRID:SCR_022146, version 0.3.2; ref. 24). We chose this method due to its ability to correct for technical read depth variation while preserving biological variation, which is often present in ST data due to the varying cellularity across a tissue section (24).
Frozen tissues from each block selected for study were stained with H&E and evaluated for tumor adequacy. The images captured from Visium slides were reviewed and annotated by a pathologist, using Adobe Photoshop software (RRID:SCR_014199). For each slide, areas of carcinoma were delineated from the intervening and surrounding desmoplastic stromal tissue. The degree of tumoral and stromal inflammatory cell infiltration was also assessed and reported quantitatively. Where applicable, areas of necrosis, dense immune infiltrates, or background stromal fibrosis or desmoplasia were indicated.
ESTIMATE analysis was performed with the R package estimate (version 1.0.13; ref. 25). ESTIMATE utilizes gene signatures derived from immune and stromal cells to calculate two corresponding enrichment scores: immune score and stromal score. Tumor purity is then computed from these scores as described by Yoshihara and colleagues (25). ESTIMATE was applied to normalized expression data from each spatial feature. The analysis returned a stromal score, immune score, and tumor purity for each feature, allowing visualization on a spatial map or as a violin plot.
Dimensionality reduction and clustering analysis
Normalized counts were subjected to principal component analysis (PCA) through 30 principal components. A shared nearest neighbor graph was generated, and clustering was performed with the Louvain modularity optimization algorithm at a resolution of 0.4. This resolution produced a stable set of clusters without over- or underclustering the data (Supplementary Figs. S2A and S2B). Resolution determination was aided by the R package clustree (RRID:SCR_016293, version 0.5.0; ref. 26), and by evaluation of ESTIMATE scores, UMAP projections, and spatial profiles of clusters. Dimensionality reduction was performed by Uniform Manifold Approximation and Projection (UMAP). Seurat was used for all clustering and dimensionality reduction analyses.
TNBC subtype determination
Subtype determination at the cluster level was performed with the TNBCtype webtool, which classifies samples based on centroid correlation to six TNBC subtypes (27). Normalized gene expression was averaged across all features in each cluster and provided as input. All reported results had a P value <0.05.
Gene set enrichment analysis
Gene set enrichment analysis (GSEA) was applied at both the feature level and the cluster level using the Broad Institute's GenePattern software (RRID:SCR_003199; ref. 28). At the feature level, single sample GSEA (ssGSEA; ref. 29) was applied to expression data from each spatial feature, using gene sets from the Hallmarks collection (30) and infiltrating immune cell signatures defined by Yu and colleagues (31) to yield feature enrichment scores. To apply GSEA at the cluster level, differentially expressed cluster markers were first determined by Wilcoxon rank sum testing, using Seurat FindAllMarkers feature. Gene lists pertaining to each cluster were analyzed by preranked GSEA against the Hallmark gene sets.
Deconvolution of immune content in clusters was performed with CIBERSORTx (RRID:SCR_016955; ref. 32). Normalized gene expression was summed across all features in each cluster to create a pseudo-bulk profile. This was provided as a mixture file and analyzed against the LM22 immune cell signatures (33). The analysis was run in absolute mode with B-mode batch correction at 500 permutations.
Data integration of reference cohort
Integration of transcriptional data from all 28 reference cohort samples was performed with Seurat using the method described by Stuart and colleagues (34) with the goal of identifying shared cell populations. Briefly, the method identifies transcriptionally similar anchor cells in all samples, which are used to correct technical batch effect and integrate the datasets. After individual SCTransform normalization and PCA of samples, 3,000 integration features (genes) were determined for use in anchor identification. Because of the large size of our dataset, use of a canonical correlation analysis—which determines anchors between all possible sample pairings—for anchor determination proved to be computationally prohibitive. We instead used reciprocal PCA, with sample 092A as the reference. The anchors identified were then used for integration. PCA was performed on the integrated data and integrated clusters (IC) were defined at a clustering resolution of 0.4.
Label transfer to validation cohort
IC assignments were transferred to the validation cohort utilizing Seurat's reference mapping feature. Following normalization, each sample in the validation cohort was individually queried against the integrated reference to find transcriptionally aligned anchors, using canonical correlation analysis. Identified anchors were then used to project IC labels from the reference onto the query sample.
Join count statistics
JCA is a statistical method for quantifying spatial correlation of contiguous nominal data. For application to our ST data, spatial maps were converted to a raster format, in which barcodes were represented by pixels coded by cluster assignment, as follows.
Spatial coordinates for each barcode were retrieved and annotated with the corresponding IC assignment. Using the R package raster (version 3.5–2), a rectangular raster was created with an extent encompassing the area of the tissue, and a resolution that captured one barcode per pixel with no empty space between. The resolution varied between samples, likely due to slight variations in the printing of the capture features on the slides. Each IC was then subset and rasterized individually using the raster parameters previously determined. Rasterizing ICs individually allowed each to be given a unique identifier coded in the resulting pixels. After each rasterization, a check was performed to ensure the number of pixels matched the number of barcodes in the subset. All rasterized IC subsets were then merged to recreate the full tissue section as a raster.
The rasterized sample was then compatible with JCA. Using the R package spdep (RRID:SCR_019294, version 1.1–11; ref. 35), a neighbors list was created in the “queen” style, which identifies adjacent and diagonal neighbors for each feature. The resulting neighbors list was then subset to remove empty background space, as well as single features with no neighbors (e.g., features removed from the main tissue). The neighbors list was then supplemented with binary spatial weights, and multicategorical JCA was performed with the command joincount.multi, which tabulates the joins of all possible combinations of cluster pairings within the sample. The test returns the observed join counts, the expected count under conditions of spatial randomness, and the variance of observed to expected calculated under nonfree sampling. The z score is then calculated as the difference between observed and expected counts, divided by the square root of the variance. Since this analysis does not correct for multiple testing, we selected a stringent z score cutoff of ±3 (equivalent to a P value of 0.00135, or 99.99% confidence) to assess significance.
We developed an R Bioconductor package, stJoincount (https://github.com/Nina-Song/stJoincount), to facilitate JCA of ST data produced by the 10x Genomics Visium platform. The program finds optimal parameters for sample rasterization, and performs JCA followed by result visualization.
Relapse-free survival in our reference cohort was defined as days to first recurrence or days to death, if no recurrence first occurred. If no death event was recorded, patients were considered censored at the date of their last clinical encounter. Survival data can be found in Supplemental File S1. NDRG1 and IGKC expression were averaged across all features in each sample to produce a summary expression value. Cutoff values to differentiate low from high expression were determined with the webtool CutoffFinder (36) to optimize the significance of correlation with survival. Survival analysis was then performed with R package survival (RRID:SCR_021137, version 3.3–1; ref. 37). Significance was determined by Cox–Mantel log-rank test.
Survival analysis of publicly available breast cancer microarray-based expression datasets was performed using the web-based Kaplan–Meier Plotter tool (RRID:SCR_018753; ref. 38). Datasets queried are listed in Supplemental File S2. NDRG1 expression was examined with Probe ID 200632_s_at, and IGKC with Probe ID 211643_x_at. Analysis was performed with NDRG1 and IGKC individually, and with the ratio of NDRG1 expression (as the numerator) to IGKC expression (as the denominator). Optimal cutoff values to define high and low expression were determined on the basis of an iterative approach as described by Lanczky and colleagues (38). Biased arrays and redundant samples were removed from the analysis, resulting in a total cohort size of 4,929 patients. The cohort was analyzed as a whole, and was also subset into PAM50 molecular subtypes: basal (n = 953), luminal A (n = 1809), luminal B (n = 1353), and HER2+ (n = 695). Significance was determined by Cox–Mantel log-rank test.
Expression data and tissue images for all samples are available in the Gene Expression Omnibus (GEO) under record GSE210616. Datasets analyzed for survival analysis are all publicly available and are listed in Supplemental File S2.
Sample selection and ST
We selected 14 primary TNBC tumors for study from 14 patients, including 7 that self-identified as AA and 7 that identified as Caucasian. Most tumors in this reference cohort were histopathologically classified as stage II and 11 of 14 were chemo-naïve (Supplementary File S1). Of the samples that received neoadjuvant chemotherapy, 2 were AA and 1 was Caucasian. From each tumor, we obtained two nonadjacent tissue sections, except for one case in which we used serial sections due to limited material. Thus, this reference cohort was comprised of 28 TNBC tissue sections, to which we applied ST.
The ST workflow is described in Materials and Methods and summarized in Fig. 1. Following sequencing, all data proceeded through a pipeline for demultiplexing, genome alignment, and spatial alignment. Data were quality assessed and high-quality data were obtained from all samples (Supplementary Fig. S1). On average, each sample produced expression data from 1,382 individual spatial features, and 22,855 genes were detected on average per section. Expression data for each sample was normalized using SCTransform to best correct for technical artifacts while preserving biological variation (24). All subsequent analysis was conducted with the resulting normalized expression values.
Manual and gene signature–informed sample annotation
We first analyzed whether gene expression data could be used for discernment of carcinoma from surrounding tissue. For comparison, H&E-stained images of each sample were manually annotated by a board certified anatomic pathologist (Fig. 2A), which revealed that sections were dominated by regions of tumor and fibrosis. Less common were regions of necrosis, adipose tissue, and immune infiltrate (Supplementary Fig. S3A). We then applied ESTIMATE to normalized expression data to concordantly define the transcriptional substructure of each sample. ESTIMATE was developed and validated on bulk transcriptional data to infer tumor, stromal, and immune content using gene expression signatures (25). Application of ESTIMATE at the feature level closely mirrored manual histological annotation. In representative sample 094D (Fig. 2A), ESTIMATE's stromal score highlighted extensive regions of fibrosis (Fig. 2B; Supplementary Fig. S3B) and tumor purity score accurately identified regions of tumor (Fig. 2B and C).
In independent representative case 120D, manual annotation identified fibrotic and necrotic regions, in addition to a region of dense lymphoid infiltrate (Fig. 2D). Once again, ESTIMATE scores corresponded with the histopathological assessment (Fig. 2E and F; Supplementary Fig. S3C). To further characterize the identity of infiltrating immune cells, we applied a set of cell-type specific expression signatures derived from tumor-associated immune cells (31). The spatial pattern of T-cell enrichment matches a dense ring of lymphoid cells, whereas the remainder of the region strongly expresses a macrophage signature (Fig. 2G and H). We can thus apply ESTIMATE and cell-type-specific gene signatures to ST data to quantify tumor and immune content, and augment manual annotation.
Samples in the reference cohort varied in their tumor content, with samples originating from the same patient displaying similar tumor purity profiles (Fig. 2I) and ESTIMATE scores (Supplementary Figs. S3D and S3E). UMAP embeddings also reflect the general transcriptional similarity of paired samples (Supplementary Fig. S3F). Furthermore, the immune score and stromal score were closely correlated in all samples, with both enriched in histologically defined regions of fibrosis (Fig. 2J).
Clustering reveals intratumor heterogeneity
TNBC is genetically heterogeneous (1), which is illustrated when ST data are subject to unsupervised clustering within single specimens. Clustering of spatial features at low-resolution broadly allows discernment of tumor and stromal regions in agreement with histologic and informatics-based annotation (Supplementary Figs. S2A and S2B). Increasing the resolution produces more clusters, revealing transcriptionally distinct subregions of tumor and stroma (Supplementary Figs. S2A and S2B). We determined a resolution of 0.4 to capture maximum sample heterogeneity without overclustering the data (Supplementary Fig. S2A). At this resolution, samples contained an average of four tumor clusters and one nontumor cluster (Supplementary Fig. S2C).
Clustering analysis of sample 094D based on expression data produced six clusters (Fig. 3A and B). ESTIMATE analysis revealed three clusters (2, 5, and 6) with high tumor purity, whereas cluster 4 demonstrated low tumor purity and represents fibrosis (Fig. 3C and D). Determination of TNBC subtypes defined by Lehmann and colleagues (39) for each cluster revealed subtype mixing within the sample (Fig. 3E). Basal like-1 (BL1) and mesenchymal (M) subtypes were represented in tumor regions, whereas cluster 4 was classified as mesenchymal stem like (MSL), a subtype that represents tumor-associated stromal cells (40). Cluster 1, which separates regions of tumor from regions of fibrosis was classified as immunomodulatory (IM), which reflects the presence of tumor-infiltrating lymphocytes (40).
We determined expression markers for each cluster by differential expression, and spatial visualization of these genes confirms their discrete localization (Fig. 3F and G). To better understand the biological context of cells represented by each cluster, we applied GSEA to identify molecular pathways enriched among each cluster (Fig. 3H). In agreement with ESTIMATE analysis, cluster 4 was enriched in immune-related pathways, whereas tumor clusters were enriched in various oncogenic and metabolic pathways. For example, GSEA implies the activation of MYC and cell-cycle signaling driving cancer cells within cluster 5, whereas cluster 6 likely includes tumor cells that are hypoxic and glycolysis dependent. Furthermore, the results of cluster-level GSEA are consistent with feature-level enrichment analysis when plotted spatially (Fig. 3I).
Dataset integration and identification of shared cell populations
We next sought to perform clustering of all features across all samples to investigate shared TNBC spatio-transcriptional characteristics. To do this, we integrated expression data from all features to consolidate the natural deviation across tumors from multiple individuals with varying genetic backgrounds (Supplementary Fig. S4). Integration allows us to use diverse individual datasets to create a comprehensive reference from which common transcriptionally defined TNBC cell types can be annotated (34). The resulting integrated dataset comprised 38,706 spatial features, with nearly equal representation of AA and Caucasian features (19,401 and 19,305, respectively).
The integrated data were subject to clustering analysis to produce nine ICs (Fig. 4A). Cluster assignments could then be mapped back to features in individual samples to assess the substructure of clusters across cases and sections (Fig. 4B). ESTIMATE analysis indicated that IC3, IC6, and IC7 represent likely stromal-immune cell types, whereas the remaining clusters are defined by tumor cell types, with varying degrees of tumor purity (Fig. 4C). Importantly, all ICs were represented in all individual samples in varying proportions (Fig. 4D).
Assessment of marker genes revealed further characteristics of ICs (Fig. 4E). The most widely represented tumor clusters, IC1 and IC2, are enriched in ribosomal and mitochondrial genes, respectively. We interpret these as archetypal characteristics of highly metabolic tumors. The remaining tumor ICs have more distinctive expression patterns. For example, IC5 expressed high levels of NDRG1, a hypoxia-inducible gene (41) that has been associated with breast cancer progression and brain metastasis (42). IC9 distinctly expressed the antiviral gene IFIT1, which has been studied in breast cancer as a predictive marker for chemotherapy response (43). The nontumor clusters also expressed distinguishing markers: IC3 was enriched in immunoglobulin genes, and IC7 in collagens. Gene expression mapping on individual samples confirms the colocalization of marker genes with their respective ICs (Fig. 4F).
GSEA of IC expression data revealed enrichment of a hypoxic signature in IC5, whereas IC9 was distinctly enriched for genes involved in interferon α and γ response pathways (Fig. 4G). IC1 and IC4 were enriched in several cell-cycle-promoting pathways and likely represent actively dividing tumor cells. On the other hand, IC5 may reflect dormant cells with inactive E2F signaling and G2M checkpoint signaling, consistent with a hypoxic environment. In agreement with ESTIMATE analysis, IC3, IC6, and IC7 were enriched in immune signaling pathways, whereas tumor clusters IC1, IC2, and IC5 were deficient in these (Fig. 4G).
To further explore the immune effector cell composition of ICs, we applied CIBERSORTx  to IC expression data. Consistent with ESTIMATE and GSEA, IC3 had the highest overall CIBERSORTx score, suggesting that it contains the highest immune content (Fig. 4H). Notably, tumor cluster IC5 has a high proportion of tumor associated macrophages but is relatively deficient in CD4 and CD8 T cells (Fig. 4H), which is supported by enrichment analysis of lymphocyte signatures (Supplementary Fig. S5A).
Spatial distribution of ICs
Following IC mapping across individual samples, visual inspection suggested that nonrandom spatial patterns exist. To quantitively evaluate spatial correlation between ICs, we employed JCA (Fig. 5A), which is appropriate for determination of the structures of spatially contiguous nominal data of two or more categories (44). In the simplest form of data, in which only two categories are present, JCA determines the degree of dispersion or interspersion of the observations. When applied to multicategorical data, as we have, JCA determines the strength and directionality of the spatial correlation between all category pairings (45).
The workflow for JCA of ST data are summarized in Fig. 5A and described in Materials and Methods. To apply JCA to our samples, we first rasterize our spatial data while maintaining categorical information (Fig. 5B and C). In the resulting raster, each spatial barcode is represented by a pixel coded by IC assignment. Neighbors for each observation are then identified, and JCA is performed to quantify “joins,” paired categories of neighboring observations. Spatially proximate observations will have many similar neighbors and joins, which would be greater than expected in a random arrangement of features (Fig. 5D). By the same rationale, spatially dispersed observations will manifest as dissimilarity of neighbors and fewer joins than in a random arrangement (Fig. 5E). The observed join count for each pair of observations can then be compared with what would be expected under spatially random conditions (Fig. 5F). The statistical significance of the spatial correlation is represented by the z score, which reflects the degree and directionality of deviation from randomness (Fig. 5G). A positive z score indicates clustering, whereas a negative z score indicates dispersion. Each IC is typically characterized by strong positive autocorrelation, whereas IC pairings vary on a spectrum of spatial dependencies (Fig. 5G).
We applied JCA to all 28 samples, and Fig. 5H summarizes the resulting z scores. We observed consistent patterns in the spatial relationships of several cluster pairs. IC3 and IC7 were positively correlated in nearly all samples. We had previously characterized these as stromal-immune clusters distinguished by unique gene expression programs (Fig. 4E and G). Conversely, IC3 was consistently dispersed from IC1, IC2, and IC5, which represent tumor clusters. Thus, JCA results demonstrate that some TNBC tumor cells are nonrandomly distanced from immunologically rich regions within the tumor microenvironment. The results are reinforced when presented spatially in representative samples and are further reflected in the UMAP embeddings of the integrated dataset (Fig. 5I).
Annotation of an independent validation cohort
We sought to validate the ICs defined from our reference cohort in a validation cohort of 15 additional, independent TNBC samples from eight AA patients. Samples in the validation cohort varied from stage II to III and in tumor purity scores (Fig. 6A). Following data normalization, we annotated the validation cohort via reference mapping to the integrated reference dataset. Each validation sample was individually queried against the reference, and IC labels were transferred for each feature, analogous to reference mapping of single-cell RNA-seq data (34).
On an individual basis, transferred IC labels were compared with manual annotations and tumor purity scores for each sample (Fig. 6B–D). In general, features assigned to the stromal-immune clusters (IC3, IC6, and IC7) had low tumor purity, and matched regions annotated as fibrosis. When tumor purity scores were cumulatively examined in all 15 samples, the pattern matched that seen in the reference cohort (Figs. 4C and 6E). We further validated that IC marker genes identified in the reference dataset corresponded with the label-transferred identities in the validation set (Fig. 6F). In addition, JCA revealed that all strong spatial dependencies observed in the reference cohort were replicated in the validation samples (Fig. 6G and H). Thus, we observe a common spatio-transcriptional architecture in TNBC.
Racial differences revealed by TNBC ST
Because our reference dataset contained a near-equal number of features from AA and Caucasian samples, we examined whether IC representation varied by race. We found a higher proportion of features representing IC3, IC6, and IC8 in Caucasian samples, whereas features representing IC5, IC7, and IC9 were predominantly contributed by AA samples (Fig. 7A). Furthermore, Caucasian samples contained significantly more IC3 by area, whereas AA samples contained more IC5 by area (Fig. 7B).
Given the disparity in clinical outcomes for AA and Caucasian patients with TNBC, we sought to further investigate both IC3 and IC5. Annotation of IC3 previously revealed a strong immune identity (Fig. 4). Further CIBERSORTx analysis revealed that lymphocyte and macrophage content within IC3 was significantly greater in Caucasian samples, whereas AA samples contained lower immune content overall (Fig. 7C and D). Interestingly, M0 macrophage content was disproportionately high in AA samples, particularly in IC5 (Supplementary Fig. S5B). GSEA revealed IC5 to be distinctly hypoxic and coordinately glycolytic (Fig. 4G), characteristics that are associated with cell growth, metastasis, and resistance to radiotherapy in breast cancer (46–48). We verified that IC5 corresponded to features with high hypoxia scores in both AA and Caucasian samples (Fig. 7E). However, features from AA samples had greater cumulative hypoxia scores than features from Caucasian samples (Supplementary Fig. S5C), and the average hypoxia score per section was higher in AA samples (Supplementary Fig. S5D). Furthermore, this difference was more pronounced in tumor regions, defined as features with high tumor purity scores (Fig. 7F).
JCA revealed a negative spatial association between IC5 and IC3 (Fig. 5H), which agrees with hypoxia having several established roles in immune suppression (49). The relationship of hypoxia to immune content in our cohort can be summarized by the expression ratio of NDRG1 to IGKC, the top marker genes for IC5 and IC3, respectively (Fig. 4E). We verify that the NDRG1 to IGKC expression ratio was significantly higher in AA features in our cohort, particularly in IC5 (Fig. 8A). Furthermore, we determined that high NDRG1 expression was associated with poor survival in our cohort, whereas high IGKC expression was protective (Fig. 8B and C).
We verified these findings in large, publicly available breast cancer expression datasets. In basal-like breast cancer, the molecular subtype most similar to histologically defined TNBC (50), high IGKC expression was significantly protective (Fig. 8D), more so than in Luminal A, Luminal B, and HER2+ subtypes (Supplementary Fig. S6). This is consistent with previous findings that IGKC is prognostic in breast and other cancers (51). Conversely, high NDRG1 expression was associated with worse outcomes (Fig. 8E), and a high ratio of NDRG1 to IGKC expression was most strongly indicative of poor survival (Fig. 8F), especially in basal-like breast cancer (Supplementary Fig. S6).
In this study, we utilize ST technology to perform comprehensive spatial and transcriptional characterization of a cohort of TNBC tumors from African American (AA) and Caucasian patients.
We describe several levels of data annotation. The first is at the level of the spatial feature, which encompasses approximately 5 to 10 cells in a diameter of 55 μm. By treating each feature individually, we can apply ssGSEA (29) and ssGSEA-based analysis, such as ESTIMATE (25), to accurately define regions of tumor, stroma, and immune infiltrate. Spatially mapped ESTIMATE signatures were comparable with a pathologist's manual annotations in identification of tumor regions. In addition, signatures pertaining to cell identity markers or signaling pathways provide further insight that cannot be deduced histologically. To summarize gene expression profiles within a sample, we move to cluster-level annotation. Clustering analysis revealed several transcriptionally distinct tumor and stromal cell populations within each sample, representing multiple TNBC subtypes and oncogenic processes. This is consistent with several single cell RNA-seq (scRNA-seq) studies of TNBC that have revealed the high degree of subclonal heterogeneity in tumors (52–54).
Finally, to obtain the most comprehensive view of ST profiles in TNBC, we perform an integrative analysis to define and annotate nine shared ICs represented in all samples. Although previous studies have generated breast cancer cell atlases from scRNA-seq data (55), the same has not been done with ST data alone, as most ST technologies do not allow single-cell resolution. With this constraint in mind, we designed a study that could lay the groundwork for developing such ST tumor atlases. Our reference dataset comprising 38,706 spatial features is the largest integrated ST dataset to be described, to the best of our knowledge. Furthermore, we validate the applicability of 9 ICs in an additional 17,861 features, representing 15 independent samples. Therefore, despite the resolution limitations of ST, our study utilizes a moleculary-rich TNBC cohort, and provides the unique opportunity to interrogate the TNBC tumor and immune microenvironment in an unbiased manner through informatics-based molecular annotation coupled with spatial analysis.
When considering spatial statistics, it is essential to apply the appropriate analysis for the data type and structure. When ICs were mapped spatially on individual samples, our data took the form of nominal labels in a latticed arrangement. To describe spatial relationships within this scheme, we make novel use of JCA, which, to our knowledge, has not previously been applied to ST data. We demonstrate an approach to make ST data derived from the 10x Genomics Visium platform compatible with JCA through rasterization. Subsequent analysis revealed the direction and degree of spatial dependency between ICs in all samples. Interestingly, several tumor cluster pairs were consistently nonrandomly dispersed. This, in addition to the strong spatial autocorrelation of each IC, may reflect compartmentalized clonal evolution and cellular substructure (56). We also observed dispersion between IC3, a stromal-immune cluster, and several tumor clusters, indicating that immune content is largely sequestered in the fibrotic compartment in TNBC.
The spatial incompatibility between hypoxic cells (IC5) and the primary immune cluster (IC3) is particularly noteworthy. A growing body of evidence supports both active and passive roles for tumor hypoxia in immune evasion (49, 57, 58). Indeed, hypoxic regions in our samples exhibited greater lymphocyte deficiency than other tumor regions. Hypoxia itself is associated with metastasis and radiotherapy resistance (47, 59), and exerts a selective pressure for more aggressive tumor clones (60). Hypoxia is therefore mechanistically significant to disease progression, and its association with immunosuppression has further implications for the use of immunotherapy in TNBC treatment.
Our finding that IC5 was overrepresented in AA samples, whereas IC3 was overrepresented in Caucasian samples, supports the notion that the tumor microenvironment differs in AA and Caucasian breast cancer (61). It has been reported that AA breast cancers have greater densities of microvessels and tumor-associated macrophages (TAM; refs. 62, 63), which have recently been linked to aggressive disease and reduced survival in breast cancer (64). This is consistent with our annotation of IC5, which we found to have a higher proportion of TAMs to lymphocytes. Furthermore, TAMs in AA breast cancers favor the protumorigenic M2 phenotype and have greater proliferative capacity (63). Interestingly, these findings resulted from studies utilizing laser-capture microdissection and manual histologic annotation of tumor tissue. In contrast, a study utilizing bulk RNA-seq data found no significant difference in the general immunological profiles of AA and Caucasian TNBC (65). Because the relevant cell populations are small, the use of bulk RNA-seq data may obscure results. The proportion of IC3 in our study samples varied between 6% and 23%. Thus, detection of subtle differences in these cell populations may require additional studies with larger specimens.
Our results suggest that NDRG1 expression is representative of the hypoxic expression profile in AA tumors, whereas IGKC expression represents the immune-supportive environment in Caucasian tumors. By calculating the ratio of expression between the two, we approximate the balance between hypoxia and immune infiltrate in our study cohort. We illustrate that a high ratio of NDRG1 to IGKC is associated with race and with significantly reduced survival in our cohort. Our survival analysis was validated in publicly available datasets of basal-like breast cancer. Although basal-like is a molecular subtype of breast cancer that is not interchangeable with histologically defined TNBC (66, 67), there is large overlap between the two (3, 50), and both disproportionately affect AA women (7, 68). Thus, these results suggest a link between differences in the tumor microenvironment and disparities in TNBC clinical outcomes.
It is important to acknowledge several variables present in our study. First, clinical disparities in TNBC outcome are multifactorial, influenced by both biological factors and broader disparities (8). Although we control for patient age and tumor stage in our cohort, we do not consider factors such as socioeconomic or environmental variables. Although ST technology is robust, its relatively low throughput does not lend itself well to controlling for these multiple factors. In addition, a minority of samples in our reference cohort received neoadjuvant therapy, which can alter tumor biology. However, because we find that all ICs are represented in all samples, we may be capturing heterogeneity at a level independent of the effects of neoadjuvant therapy. Furthermore, most validation cohort samples had received neoadjuvant treatment; we were also able to map and validate ICs in that cohort. Nevertheless, these are variables to be controlled for in larger studies.
The promise of ST lies in its ability to provide unprecedented insight into the compositions and interactions of cells within the tumor. We have combined functional molecular annotation with spatial analysis to profile the tumor architecture of TNBC. Our findings suggest a conserved spatio-transcriptional architecture in TNBC, despite intratumoral and stromal heterogeneity within individual samples. Furthermore, our results indicate that hypoxia may considerably influence the tumor environment in AA TNBC. Pending validation in larger cohorts, this raises the possibility of evaluating the utility of hypoxia-targeting strategies, such as hypoxia-activated prodrugs (69) or hypoxia-targeted nanoparticles (70). We also found that hypoxic regions in AA samples were associated with disproportionately high M0 macrophage content, which may support the evaluation of macrophage reprogramming-based therapeutic strategies, some of which are undergoing clinical trials (71). Thus, our study provides novel insights into the relationship between tumor biology and racial disparities in TNBC, and could ultimately lead to improved outcomes for this clinically challenging disease.
R. Bassiouni reports grants from AACR/Genentech during the conduct of the study. J.D. Carpten reports grants from NCI during the conduct of the study; and also presented data related to a different study at a 10x Genomics sponsored symposium. No disclosures were reported by the other authors.
R. Bassiouni: Conceptualization, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. M.O. Idowu: Conceptualization, resources, data curation, investigation, writing–review and editing. L.D. Gibbs: Conceptualization, resources, data curation, investigation, writing–review and editing. V. Robila: Resources, data curation, investigation, writing–review and editing. P.J. Grizzard: Resources, data curation, formal analysis, writing–review and editing. M.G. Webb: Software, formal analysis. J. Song: Software, formal analysis. A. Noriega: Resources, formal analysis, supervision, funding acquisition, writing–review and editing. D.W. Craig: Conceptualization, resources, supervision, funding acquisition, project administration, writing–review and editing. J.D. Carpten: Conceptualization, resources, supervision, funding acquisition, project administration, writing–review and editing.
Funding for the research project was provided by NIH-NCI Cancer Center Support Grant P30 CA014089 (to D.W. Craig, J.D. Carpten), NIH-NCI CARE2 PACHE U54 CA233465 (to D.W. Craig and J.D. Carpten), Tower Cancer Research Foundation (to R. Bassiouni and J.D. Carpten), and AACR/Genentech Cancer Disparities Fellowship 20–40–18-BASS (to R. Bassiouni). Services in support of the research project were provided by the VCU Massey Cancer Center Tissue and Data Acquisition and Analysis Core, supported, in part, with funding from NIH-NCI Cancer Center Support Grant P30 CA016059. Sequencing for the project was performed by the Keck Genomics Platform and the Molecular Genomics Core at the University of Southern California, supported, in part, with funding from NIH-NCI Cancer Center Support Grant P30 CA014089. Microscopy for the project was partially performed by the Optical Imaging Facility at the Eli and Edythe Broad CIRM Center at the University of Southern California. The authors particularly thank Dr. Seth Ruffins for his assistance in image acquisition.
The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).