Triple-negative breast cancer (TNBC) is an aggressive disease that disproportionately affects African American (AA) women. Limited targeted therapeutic options exist for patients with TNBC. Here, we employ spatial transcriptomics to interrogate tissue from a racially diverse TNBC cohort to comprehensively annotate the transcriptional states of spatially resolved cellular populations. A total of 38,706 spatial features from a cohort of 28 sections from 14 patients were analyzed. Intratumoral analysis of spatial features from individual sections revealed heterogeneous transcriptional substructures. However, integrated analysis of all samples resulted in nine transcriptionally distinct clusters that mapped across all individual sections. Furthermore, novel use of join count analysis demonstrated nonrandom directional spatial dependencies of the transcriptionally defined shared clusters, supporting a conserved spatio-transcriptional architecture in TNBC. These findings were substantiated in an independent validation cohort comprising 17,861 spatial features representing 15 samples from 8 patients. Stratification of samples by race revealed race-associated differences in hypoxic tumor content and regions of immune-rich infiltrate. Overall, this study combined spatial and functional molecular analyses to define the tumor architecture of TNBC, with potential implications in understanding TNBC disparities.

Significance:

Spatial transcriptomics profiling of a diverse cohort of triple-negative breast cancers and innovative informatics approaches reveal a conserved cellular architecture across cancers and identify proportional differences in tumor cell composition by race.

Breast cancer—the most commonly diagnosed cancer in women—is a disease spectrum that can be classified into several histological and molecular subtypes (1). Triple-negative breast cancer (TNBC) is a form of breast cancer that does not express estrogen receptor, progesterone receptor, or amplified HER2, and comprises approximately 15% of invasive breast cancers (2). TNBC remains clinically challenging, as it is associated with more aggressive disease, earlier metastasis, higher rates of relapse, and poorer outcomes than other breast cancer subtypes (3). TNBC is known to be a genetically heterogenous disease, with many driver events giving rise to several intrinsic subtypes with varied clinical outcomes (4, 5). This, in addition to the lack of expression of actionable targets, has made development of effective therapies for TNBC difficult (6).

TNBC disproportionately affects African American (AA) women and women of sub-Saharan African ancestry, who have higher incidence, greater TNBC-related mortality, and poorer 5-year survival rates compared with Caucasian women (7–9). The association of TNBC risk with African ancestry suggests a genetic component to the disparity (10), and genomic studies have identified unique characteristics of AA, African, and Caucasian breast cancers (11, 12). However, the resulting effects on tumor biology are poorly understood. Thus, descriptive and clinical studies of TNBC must prioritize diverse representation in study cohorts, which has historically been inadequate (13).

Advanced molecular profiling provides an opportunity to further our understanding of disease biology and therapeutic vulnerabilities through comprehensive characterization of tumor specimens. Transcriptomic profiling technologies have evolved to allow the measurement of comprehensive gene expression from single cells, which can resolve the composition of cell types within the tumor microenvironment to reveal the extent of tumor heterogeneity (14). Further advancement now allows for spatially resolved whole-transcriptome measurements, where tissue architecture is maintained and provides critical context for higher-dimensional data interpretation (15, 16). Spatial transcriptomics (ST) technologies are particularly suitable for the study of solid tumors, which develop in a complex environment where communication between malignant cells, immune cells, fibroblasts, and vasculature can influence disease progression (17). For instance, cellular interactions in breast tumors can take the form of metabolic cross-talk (18), immune cross-talk (19), and subclonal communication (20–22). Preservation of spatial context provides a critical framework in which to dissect these relationships.

In this study, we applied ST to comprehensively describe the transcriptional substructure and heterogeneity defining the cellular architecture of TNBC. We performed manual and expression-informed molecular annotation of 28 TNBC tissue sections from 14 patients, of whom 7 self-identified as AA and 7 as Caucasian. We then utilized clustering analysis to summarize and assess transcriptionally defined intratumoral heterogeneity. We further integrated expression data from all samples to generate a reference, from which, we defined 9 shared clusters represented across the entire dataset. Molecular annotation tools were used to understand the biological context of cells represented by transcriptionally distinct clusters. Moreover, to analyze spatial relationships between clusters, we made novel use of join count analysis (JCA), which revealed consistent patterns of spatial exclusion and dependency in TNBC. We validated our identified reference clusters and their spatial relationships in an independent cohort of 15 additional samples from eight AA patients. Finally, functional annotation of shared clusters revealed racial differences in the expression of clinically significant immune- and hypoxia-related genes, which may shed light on racial disparities in TNBC clinical outcomes.

Sample selection

De-identified human breast tissues were procured from the Virginia Commonwealth University (VCU) Massey Cancer Center Tissue and Data Acquisition and Analysis Core under a protocol approved by the VCU Institutional Review Board (VCU IRB HM2471). All patients provided written informed consent. Specimens were obtained during standard-of-care procedures and banked when judged to be in excess of that required for patient diagnosis and treatment. Fresh samples procured during routine gross examination in surgical pathology were deposited in 2 mL cryovials, snap-frozen in liquid nitrogen, then stored at −80°C. Tissue samples were embedded in Scigen Tissue Plus O.C.T. Compound (Thermo Fisher Scientific) prior to cryosectioning.

All samples selected for study were primary TNBC tumors collected between the years of 2002 and 2019. Samples were divided into two cohorts: a reference cohort of 28 samples and a validation cohort of 15 samples. Reference cohort samples were obtained from 14 patients with TNBC, 7 of whom identified as Black or AA and 7 who identified as Caucasian or Non-Hispanic White. Every effort was made to stage-match and age-match the samples, with age <50 years and stage II tumors preferred. Eleven of fourteen tumors represented in the reference cohort did not receive neoadjuvant treatment; these patients may have declined therapy or were not candidates for neoadjuvant treatment at the time of collection. Validation cohort samples were obtained from 8 patients, all of whom were AA, and varied from stage II to stage III. Six of eight tumors represented in the validation cohort received neoadjuvant treatment.

Two frozen blocks representing nonadjacent tumor sections were available for all patients, with the exception of three cases with limited tissue available, for which adjacent sections were provided, and one case for which a single section was provided. Sections were randomly sampled from frozen blocks to reduce selection bias. Distances between paired nonadjacent sections varied on the basis of the size of the tumor and its handling at the time of banking; most sections sampled the original tumor centrally. Full specimen information is available in Supplemental File S1. In total, we applied ST to 43 tissue sections, representing 22 tumor specimens.

Visium spatial gene expression assay and sequencing

Fresh frozen, OCT-embedded tissues were cryosectioned at a thickness of 10 μm and mounted on Visium spatial gene expression slides (10x Genomics, #1000184), which contain four 6.5 mm × 6.5 mm capture areas comprising 5,000 barcoded spatial features each. The samples were processed as described in the manufacturer's protocols. Briefly, tissues were stained with hematoxylin and eosin (H&E) and microscopic images were captured with a Zeiss Axioscan2 microscope using a 10× objective. Tissues were then permeabilized, allowing RNA to diffuse and bind to the slide surface via polyA capture. Permeabilization was performed for 9 minutes, which was determined using the Spatial Tissue Optimization procedure (10x Genomics, #1000193). Following permeabilization, cDNA was synthesized on-slide from the immobilized RNA. cDNA was then collected and used to generate sequencing libraries containing unique sample indices, as well as P5 and P7 primers for Illumina-compatible sequencing. Paired-end sequencing (2 × 100) was performed on an Illumina NovaSeq 6000 instrument to produce a minimum of ∼250 million read pairs per sample (Supplementary Fig. S1).

Data preprocessing and normalization

Sequencing data were preprocessed using the Space Ranger pipelines (version 1.1.0; 10x Genomics). BCL data were demultiplexed and converted to FASTQ format using the spaceranger mkfastq pipeline. The spaceranger count pipeline was then used for read alignment to human reference genome GRCh38, unique molecular identifier (UMI) counting, and generation of feature-spot matrices corresponding to the microscopic tissue image. The pipeline also performs automatic tissue detection and fiducial alignment from the image. The raw UMI counts and spatial barcodes were imported to the R program Seurat (RRID:SCR_016341, version 4.0.4; ref. 23) with the Load10x_Spatial command. Gene expression count data were then normalized using the SCtransform method from the R package sctransform (RRID:SCR_022146, version 0.3.2; ref. 24). We chose this method due to its ability to correct for technical read depth variation while preserving biological variation, which is often present in ST data due to the varying cellularity across a tissue section (24).

Histopathologic annotation

Frozen tissues from each block selected for study were stained with H&E and evaluated for tumor adequacy. The images captured from Visium slides were reviewed and annotated by a pathologist, using Adobe Photoshop software (RRID:SCR_014199). For each slide, areas of carcinoma were delineated from the intervening and surrounding desmoplastic stromal tissue. The degree of tumoral and stromal inflammatory cell infiltration was also assessed and reported quantitatively. Where applicable, areas of necrosis, dense immune infiltrates, or background stromal fibrosis or desmoplasia were indicated.

ESTIMATE analysis

ESTIMATE analysis was performed with the R package estimate (version 1.0.13; ref. 25). ESTIMATE utilizes gene signatures derived from immune and stromal cells to calculate two corresponding enrichment scores: immune score and stromal score. Tumor purity is then computed from these scores as described by Yoshihara and colleagues (25). ESTIMATE was applied to normalized expression data from each spatial feature. The analysis returned a stromal score, immune score, and tumor purity for each feature, allowing visualization on a spatial map or as a violin plot.

Dimensionality reduction and clustering analysis

Normalized counts were subjected to principal component analysis (PCA) through 30 principal components. A shared nearest neighbor graph was generated, and clustering was performed with the Louvain modularity optimization algorithm at a resolution of 0.4. This resolution produced a stable set of clusters without over- or underclustering the data (Supplementary Figs. S2A and S2B). Resolution determination was aided by the R package clustree (RRID:SCR_016293, version 0.5.0; ref. 26), and by evaluation of ESTIMATE scores, UMAP projections, and spatial profiles of clusters. Dimensionality reduction was performed by Uniform Manifold Approximation and Projection (UMAP). Seurat was used for all clustering and dimensionality reduction analyses.

TNBC subtype determination

Subtype determination at the cluster level was performed with the TNBCtype webtool, which classifies samples based on centroid correlation to six TNBC subtypes (27). Normalized gene expression was averaged across all features in each cluster and provided as input. All reported results had a P value <0.05.

Gene set enrichment analysis

Gene set enrichment analysis (GSEA) was applied at both the feature level and the cluster level using the Broad Institute's GenePattern software (RRID:SCR_003199; ref. 28). At the feature level, single sample GSEA (ssGSEA; ref. 29) was applied to expression data from each spatial feature, using gene sets from the Hallmarks collection (30) and infiltrating immune cell signatures defined by Yu and colleagues (31) to yield feature enrichment scores. To apply GSEA at the cluster level, differentially expressed cluster markers were first determined by Wilcoxon rank sum testing, using Seurat FindAllMarkers feature. Gene lists pertaining to each cluster were analyzed by preranked GSEA against the Hallmark gene sets.

CIBERSORTx analysis

Deconvolution of immune content in clusters was performed with CIBERSORTx (RRID:SCR_016955; ref. 32). Normalized gene expression was summed across all features in each cluster to create a pseudo-bulk profile. This was provided as a mixture file and analyzed against the LM22 immune cell signatures (33). The analysis was run in absolute mode with B-mode batch correction at 500 permutations.

Data integration of reference cohort

Integration of transcriptional data from all 28 reference cohort samples was performed with Seurat using the method described by Stuart and colleagues (34) with the goal of identifying shared cell populations. Briefly, the method identifies transcriptionally similar anchor cells in all samples, which are used to correct technical batch effect and integrate the datasets. After individual SCTransform normalization and PCA of samples, 3,000 integration features (genes) were determined for use in anchor identification. Because of the large size of our dataset, use of a canonical correlation analysis—which determines anchors between all possible sample pairings—for anchor determination proved to be computationally prohibitive. We instead used reciprocal PCA, with sample 092A as the reference. The anchors identified were then used for integration. PCA was performed on the integrated data and integrated clusters (IC) were defined at a clustering resolution of 0.4.

Label transfer to validation cohort

IC assignments were transferred to the validation cohort utilizing Seurat's reference mapping feature. Following normalization, each sample in the validation cohort was individually queried against the integrated reference to find transcriptionally aligned anchors, using canonical correlation analysis. Identified anchors were then used to project IC labels from the reference onto the query sample.

Join count statistics

JCA is a statistical method for quantifying spatial correlation of contiguous nominal data. For application to our ST data, spatial maps were converted to a raster format, in which barcodes were represented by pixels coded by cluster assignment, as follows.

Spatial coordinates for each barcode were retrieved and annotated with the corresponding IC assignment. Using the R package raster (version 3.5–2), a rectangular raster was created with an extent encompassing the area of the tissue, and a resolution that captured one barcode per pixel with no empty space between. The resolution varied between samples, likely due to slight variations in the printing of the capture features on the slides. Each IC was then subset and rasterized individually using the raster parameters previously determined. Rasterizing ICs individually allowed each to be given a unique identifier coded in the resulting pixels. After each rasterization, a check was performed to ensure the number of pixels matched the number of barcodes in the subset. All rasterized IC subsets were then merged to recreate the full tissue section as a raster.

The rasterized sample was then compatible with JCA. Using the R package spdep (RRID:SCR_019294, version 1.1–11; ref. 35), a neighbors list was created in the “queen” style, which identifies adjacent and diagonal neighbors for each feature. The resulting neighbors list was then subset to remove empty background space, as well as single features with no neighbors (e.g., features removed from the main tissue). The neighbors list was then supplemented with binary spatial weights, and multicategorical JCA was performed with the command joincount.multi, which tabulates the joins of all possible combinations of cluster pairings within the sample. The test returns the observed join counts, the expected count under conditions of spatial randomness, and the variance of observed to expected calculated under nonfree sampling. The z score is then calculated as the difference between observed and expected counts, divided by the square root of the variance. Since this analysis does not correct for multiple testing, we selected a stringent z score cutoff of ±3 (equivalent to a P value of 0.00135, or 99.99% confidence) to assess significance.

We developed an R Bioconductor package, stJoincount (https://github.com/Nina-Song/stJoincount), to facilitate JCA of ST data produced by the 10x Genomics Visium platform. The program finds optimal parameters for sample rasterization, and performs JCA followed by result visualization.

Survival analysis

Relapse-free survival in our reference cohort was defined as days to first recurrence or days to death, if no recurrence first occurred. If no death event was recorded, patients were considered censored at the date of their last clinical encounter. Survival data can be found in Supplemental File S1. NDRG1 and IGKC expression were averaged across all features in each sample to produce a summary expression value. Cutoff values to differentiate low from high expression were determined with the webtool CutoffFinder (36) to optimize the significance of correlation with survival. Survival analysis was then performed with R package survival (RRID:SCR_021137, version 3.3–1; ref. 37). Significance was determined by Cox–Mantel log-rank test.

Survival analysis of publicly available breast cancer microarray-based expression datasets was performed using the web-based Kaplan–Meier Plotter tool (RRID:SCR_018753; ref. 38). Datasets queried are listed in Supplemental File S2. NDRG1 expression was examined with Probe ID 200632_s_at, and IGKC with Probe ID 211643_x_at. Analysis was performed with NDRG1 and IGKC individually, and with the ratio of NDRG1 expression (as the numerator) to IGKC expression (as the denominator). Optimal cutoff values to define high and low expression were determined on the basis of an iterative approach as described by Lanczky and colleagues (38). Biased arrays and redundant samples were removed from the analysis, resulting in a total cohort size of 4,929 patients. The cohort was analyzed as a whole, and was also subset into PAM50 molecular subtypes: basal (n = 953), luminal A (n = 1809), luminal B (n = 1353), and HER2+ (n = 695). Significance was determined by Cox–Mantel log-rank test.

Data availability

Expression data and tissue images for all samples are available in the Gene Expression Omnibus (GEO) under record GSE210616. Datasets analyzed for survival analysis are all publicly available and are listed in Supplemental File S2.

Sample selection and ST

We selected 14 primary TNBC tumors for study from 14 patients, including 7 that self-identified as AA and 7 that identified as Caucasian. Most tumors in this reference cohort were histopathologically classified as stage II and 11 of 14 were chemo-naïve (Supplementary File S1). Of the samples that received neoadjuvant chemotherapy, 2 were AA and 1 was Caucasian. From each tumor, we obtained two nonadjacent tissue sections, except for one case in which we used serial sections due to limited material. Thus, this reference cohort was comprised of 28 TNBC tissue sections, to which we applied ST.

The ST workflow is described in Materials and Methods and summarized in Fig. 1. Following sequencing, all data proceeded through a pipeline for demultiplexing, genome alignment, and spatial alignment. Data were quality assessed and high-quality data were obtained from all samples (Supplementary Fig. S1). On average, each sample produced expression data from 1,382 individual spatial features, and 22,855 genes were detected on average per section. Expression data for each sample was normalized using SCTransform to best correct for technical artifacts while preserving biological variation (24). All subsequent analysis was conducted with the resulting normalized expression values.

Figure 1.

Overview of ST workflow and study design. A, An overview of the 10x Genomics Visium ST workflow, beginning with affixation of tissue to Visium spatial gene expression slides and ending with sequencing data processing and analysis. B, An overview of the study design, including analysis strategies for a reference cohort of 28 samples, and a validation cohort of 15 samples. (Created with BioRender.com.)

Figure 1.

Overview of ST workflow and study design. A, An overview of the 10x Genomics Visium ST workflow, beginning with affixation of tissue to Visium spatial gene expression slides and ending with sequencing data processing and analysis. B, An overview of the study design, including analysis strategies for a reference cohort of 28 samples, and a validation cohort of 15 samples. (Created with BioRender.com.)

Close modal

Manual and gene signature–informed sample annotation

We first analyzed whether gene expression data could be used for discernment of carcinoma from surrounding tissue. For comparison, H&E-stained images of each sample were manually annotated by a board certified anatomic pathologist (Fig. 2A), which revealed that sections were dominated by regions of tumor and fibrosis. Less common were regions of necrosis, adipose tissue, and immune infiltrate (Supplementary Fig. S3A). We then applied ESTIMATE to normalized expression data to concordantly define the transcriptional substructure of each sample. ESTIMATE was developed and validated on bulk transcriptional data to infer tumor, stromal, and immune content using gene expression signatures (25). Application of ESTIMATE at the feature level closely mirrored manual histological annotation. In representative sample 094D (Fig. 2A), ESTIMATE's stromal score highlighted extensive regions of fibrosis (Fig. 2B; Supplementary Fig. S3B) and tumor purity score accurately identified regions of tumor (Fig. 2B and C).

Figure 2.

Transcriptional estimation of tumor purity approximates expert pathologic annotation. A, H&E-stained images of sample 094D. Pathologist's annotation highlights extensive regions of fibrosis. B, Expression data per spatial feature was analyzed using ESTIMATE to generate a tumor purity score, stromal score, and immune score. C, Violin plot of tumor purity scores of features in the regions identified as tumor and fibrosis by manual annotation of sample 094D. D, Pathologist's annotation of sample 120D, which contained a region of dense lymphoid infiltrate in addition to tumor, fibrotic, and necrotic regions. E, Per-feature ESTIMATE analysis of sample 120D. F, Comparison of pathologist-assigned identity to tumor purity score for sample 120D. G, Magnified region of immune infiltrate in sample 120D, with an arrow indicating dense outer edge. H, Per-feature enrichment scores for several tumor-associated immune cell signatures (31) as determined by ssGSEA. I, Violin plot of tumor purity profiles of all 28 samples. Samples are colored by patient ID; two samples were available for each patient. J, Scatter plot of immune and stromal scores per feature in all samples, with a correlation coefficient of r = 0.7. Data points are colored by tumor purity, which is inversely related to immune and stromal scores.

Figure 2.

Transcriptional estimation of tumor purity approximates expert pathologic annotation. A, H&E-stained images of sample 094D. Pathologist's annotation highlights extensive regions of fibrosis. B, Expression data per spatial feature was analyzed using ESTIMATE to generate a tumor purity score, stromal score, and immune score. C, Violin plot of tumor purity scores of features in the regions identified as tumor and fibrosis by manual annotation of sample 094D. D, Pathologist's annotation of sample 120D, which contained a region of dense lymphoid infiltrate in addition to tumor, fibrotic, and necrotic regions. E, Per-feature ESTIMATE analysis of sample 120D. F, Comparison of pathologist-assigned identity to tumor purity score for sample 120D. G, Magnified region of immune infiltrate in sample 120D, with an arrow indicating dense outer edge. H, Per-feature enrichment scores for several tumor-associated immune cell signatures (31) as determined by ssGSEA. I, Violin plot of tumor purity profiles of all 28 samples. Samples are colored by patient ID; two samples were available for each patient. J, Scatter plot of immune and stromal scores per feature in all samples, with a correlation coefficient of r = 0.7. Data points are colored by tumor purity, which is inversely related to immune and stromal scores.

Close modal

In independent representative case 120D, manual annotation identified fibrotic and necrotic regions, in addition to a region of dense lymphoid infiltrate (Fig. 2D). Once again, ESTIMATE scores corresponded with the histopathological assessment (Fig. 2E and F; Supplementary Fig. S3C). To further characterize the identity of infiltrating immune cells, we applied a set of cell-type specific expression signatures derived from tumor-associated immune cells (31). The spatial pattern of T-cell enrichment matches a dense ring of lymphoid cells, whereas the remainder of the region strongly expresses a macrophage signature (Fig. 2G and H). We can thus apply ESTIMATE and cell-type-specific gene signatures to ST data to quantify tumor and immune content, and augment manual annotation.

Samples in the reference cohort varied in their tumor content, with samples originating from the same patient displaying similar tumor purity profiles (Fig. 2I) and ESTIMATE scores (Supplementary Figs. S3D and S3E). UMAP embeddings also reflect the general transcriptional similarity of paired samples (Supplementary Fig. S3F). Furthermore, the immune score and stromal score were closely correlated in all samples, with both enriched in histologically defined regions of fibrosis (Fig. 2J).

Clustering reveals intratumor heterogeneity

TNBC is genetically heterogeneous (1), which is illustrated when ST data are subject to unsupervised clustering within single specimens. Clustering of spatial features at low-resolution broadly allows discernment of tumor and stromal regions in agreement with histologic and informatics-based annotation (Supplementary Figs. S2A and S2B). Increasing the resolution produces more clusters, revealing transcriptionally distinct subregions of tumor and stroma (Supplementary Figs. S2A and S2B). We determined a resolution of 0.4 to capture maximum sample heterogeneity without overclustering the data (Supplementary Fig. S2A). At this resolution, samples contained an average of four tumor clusters and one nontumor cluster (Supplementary Fig. S2C).

Clustering analysis of sample 094D based on expression data produced six clusters (Fig. 3A and B). ESTIMATE analysis revealed three clusters (2, 5, and 6) with high tumor purity, whereas cluster 4 demonstrated low tumor purity and represents fibrosis (Fig. 3C and D). Determination of TNBC subtypes defined by Lehmann and colleagues (39) for each cluster revealed subtype mixing within the sample (Fig. 3E). Basal like-1 (BL1) and mesenchymal (M) subtypes were represented in tumor regions, whereas cluster 4 was classified as mesenchymal stem like (MSL), a subtype that represents tumor-associated stromal cells (40). Cluster 1, which separates regions of tumor from regions of fibrosis was classified as immunomodulatory (IM), which reflects the presence of tumor-infiltrating lymphocytes (40).

Figure 3.

Clustering reveals transcriptionally distinct regions of tumor and stroma. A, Graph-based clustering of sample 094D resulted in six transcriptionally distinct clusters. B, UMAP projection of sample 094D, colored by cluster assignment. C, Tumor purity profiles of clusters identified in sample 094D. D, UMAP projection of sample 094D, colored by tumor purity score. E, TNBC subtypes assigned to each cluster. F, Heatmap of top differentially expressed genes per cluster for sample 094D. Colors represent the Pearson residuals of normalized and variance-stabilized expression data (24). Each column represents a single spatial feature, and features are grouped by their cluster assignment. G, Spatial mapping of representative marker genes for clusters in sample 094D. Color scale represents log-transformed normalized gene expression. H, GSEA for clusters of sample 094D against Hallmark collection gene sets. Only results with a FDR < 0.05 are displayed. NES, normalized enrichment score. I, Spatially mapped feature-level ssGSEA scores for selected Hallmark gene sets. Color scale represents enrichment scores.

Figure 3.

Clustering reveals transcriptionally distinct regions of tumor and stroma. A, Graph-based clustering of sample 094D resulted in six transcriptionally distinct clusters. B, UMAP projection of sample 094D, colored by cluster assignment. C, Tumor purity profiles of clusters identified in sample 094D. D, UMAP projection of sample 094D, colored by tumor purity score. E, TNBC subtypes assigned to each cluster. F, Heatmap of top differentially expressed genes per cluster for sample 094D. Colors represent the Pearson residuals of normalized and variance-stabilized expression data (24). Each column represents a single spatial feature, and features are grouped by their cluster assignment. G, Spatial mapping of representative marker genes for clusters in sample 094D. Color scale represents log-transformed normalized gene expression. H, GSEA for clusters of sample 094D against Hallmark collection gene sets. Only results with a FDR < 0.05 are displayed. NES, normalized enrichment score. I, Spatially mapped feature-level ssGSEA scores for selected Hallmark gene sets. Color scale represents enrichment scores.

Close modal

We determined expression markers for each cluster by differential expression, and spatial visualization of these genes confirms their discrete localization (Fig. 3F and G). To better understand the biological context of cells represented by each cluster, we applied GSEA to identify molecular pathways enriched among each cluster (Fig. 3H). In agreement with ESTIMATE analysis, cluster 4 was enriched in immune-related pathways, whereas tumor clusters were enriched in various oncogenic and metabolic pathways. For example, GSEA implies the activation of MYC and cell-cycle signaling driving cancer cells within cluster 5, whereas cluster 6 likely includes tumor cells that are hypoxic and glycolysis dependent. Furthermore, the results of cluster-level GSEA are consistent with feature-level enrichment analysis when plotted spatially (Fig. 3I).

Dataset integration and identification of shared cell populations

We next sought to perform clustering of all features across all samples to investigate shared TNBC spatio-transcriptional characteristics. To do this, we integrated expression data from all features to consolidate the natural deviation across tumors from multiple individuals with varying genetic backgrounds (Supplementary Fig. S4). Integration allows us to use diverse individual datasets to create a comprehensive reference from which common transcriptionally defined TNBC cell types can be annotated (34). The resulting integrated dataset comprised 38,706 spatial features, with nearly equal representation of AA and Caucasian features (19,401 and 19,305, respectively).

The integrated data were subject to clustering analysis to produce nine ICs (Fig. 4A). Cluster assignments could then be mapped back to features in individual samples to assess the substructure of clusters across cases and sections (Fig. 4B). ESTIMATE analysis indicated that IC3, IC6, and IC7 represent likely stromal-immune cell types, whereas the remaining clusters are defined by tumor cell types, with varying degrees of tumor purity (Fig. 4C). Importantly, all ICs were represented in all individual samples in varying proportions (Fig. 4D).

Figure 4.

Clustering of an integrated dataset reveals shared cell populations. A, Graph-based clustering analysis of the integrated reference dataset resulted in nine ICs, depicted here on the integrated UMAP embeddings. B, ICs mapped back to individual representative samples. C, Ridgeline plot of ESTIMATE scores for all ICs, calculated for all corresponding features in the reference dataset. The x-axis represents score and increases from left to right. D, Bar graph representing the distribution of the nine ICs across the individual samples of the dataset. E, Heatmap of top differentially expressed marker genes of the ICs across the integrated dataset. Color scale represents the Pearson residuals of normalized and variance-stabilized expression data (24). F, Expression of selected marker genes represented on the integrated UMAP embedding, as well as spatially in individual samples. Color scales represent Pearson residuals. G, GSEA of ICs using the Hallmark gene sets. Only results with a FDR < 0.05 are displayed. H, CIBERSORTx was used to deconvolute immune mixtures in each ICs. Absolute CIBERSORTx scores are represented on the y-axis, and immune identities of selected cell types are differentiated by color.

Figure 4.

Clustering of an integrated dataset reveals shared cell populations. A, Graph-based clustering analysis of the integrated reference dataset resulted in nine ICs, depicted here on the integrated UMAP embeddings. B, ICs mapped back to individual representative samples. C, Ridgeline plot of ESTIMATE scores for all ICs, calculated for all corresponding features in the reference dataset. The x-axis represents score and increases from left to right. D, Bar graph representing the distribution of the nine ICs across the individual samples of the dataset. E, Heatmap of top differentially expressed marker genes of the ICs across the integrated dataset. Color scale represents the Pearson residuals of normalized and variance-stabilized expression data (24). F, Expression of selected marker genes represented on the integrated UMAP embedding, as well as spatially in individual samples. Color scales represent Pearson residuals. G, GSEA of ICs using the Hallmark gene sets. Only results with a FDR < 0.05 are displayed. H, CIBERSORTx was used to deconvolute immune mixtures in each ICs. Absolute CIBERSORTx scores are represented on the y-axis, and immune identities of selected cell types are differentiated by color.

Close modal

Assessment of marker genes revealed further characteristics of ICs (Fig. 4E). The most widely represented tumor clusters, IC1 and IC2, are enriched in ribosomal and mitochondrial genes, respectively. We interpret these as archetypal characteristics of highly metabolic tumors. The remaining tumor ICs have more distinctive expression patterns. For example, IC5 expressed high levels of NDRG1, a hypoxia-inducible gene (41) that has been associated with breast cancer progression and brain metastasis (42). IC9 distinctly expressed the antiviral gene IFIT1, which has been studied in breast cancer as a predictive marker for chemotherapy response (43). The nontumor clusters also expressed distinguishing markers: IC3 was enriched in immunoglobulin genes, and IC7 in collagens. Gene expression mapping on individual samples confirms the colocalization of marker genes with their respective ICs (Fig. 4F).

GSEA of IC expression data revealed enrichment of a hypoxic signature in IC5, whereas IC9 was distinctly enriched for genes involved in interferon α and γ response pathways (Fig. 4G). IC1 and IC4 were enriched in several cell-cycle-promoting pathways and likely represent actively dividing tumor cells. On the other hand, IC5 may reflect dormant cells with inactive E2F signaling and G2M checkpoint signaling, consistent with a hypoxic environment. In agreement with ESTIMATE analysis, IC3, IC6, and IC7 were enriched in immune signaling pathways, whereas tumor clusters IC1, IC2, and IC5 were deficient in these (Fig. 4G).

To further explore the immune effector cell composition of ICs, we applied CIBERSORTx [29] to IC expression data. Consistent with ESTIMATE and GSEA, IC3 had the highest overall CIBERSORTx score, suggesting that it contains the highest immune content (Fig. 4H). Notably, tumor cluster IC5 has a high proportion of tumor associated macrophages but is relatively deficient in CD4 and CD8 T cells (Fig. 4H), which is supported by enrichment analysis of lymphocyte signatures (Supplementary Fig. S5A).

Spatial distribution of ICs

Following IC mapping across individual samples, visual inspection suggested that nonrandom spatial patterns exist. To quantitively evaluate spatial correlation between ICs, we employed JCA (Fig. 5A), which is appropriate for determination of the structures of spatially contiguous nominal data of two or more categories (44). In the simplest form of data, in which only two categories are present, JCA determines the degree of dispersion or interspersion of the observations. When applied to multicategorical data, as we have, JCA determines the strength and directionality of the spatial correlation between all category pairings (45).

Figure 5.

Spatial analytics of integrated clusters reveals a consistent molecular topography. A, An overview of JCA. Each sample is subset by IC, and corresponding coordinates are rasterized individually using R package raster. Subsetting allows each IC to be coded uniquely during rasterization. Individual rasters for all nine ICs are then merged to recreate the complete sample. R package spdep is then used to generate a neighbors list with spatial weights for each observation. Join counts for all cluster pairs are then tabulated and compared with a spatially random distribution. B, Mapping of ICs on sample 094B. C, Raster of sample 094B, in which each spatial feature is represented by a pixel coded by IC assignment. D, A region of sample 094B representative of clustering, in which any given pixel is likely to have a similar group of neighbors. E, A region representative of dispersion, in which most pixels have dissimilar neighbors. F, JCA results for sample 094B. For each IC pair, the observed join count is indicated, along with what would be expected under conditions of spatial randomness. G,Z values corresponding to the analysis in F, reflecting the deviation of the observed join count from theoretical randomness. A positive z value indicates spatial clustering, a negative value indicates spatial dispersion, and a value close to 0 indicates no spatial dependency, that is, a near random distribution. H, A summary of z values resulting from JCA of all 28 samples represented in a box plot. The box represents the interquartile range, with the median marked by a vertical line. Outliers are represented by points outside the range of the whiskers. The shaded region represents our selected z-value cutoff of ±3 (as described in Materials and Methods) to assess significance of results. All autocorrelations were strongly positive (z ≥ 10) and are not represented here. I, Spatial mapping and UMAP embeddings of selected IC pairs found to trend towards positive and negative spatial dependency.

Figure 5.

Spatial analytics of integrated clusters reveals a consistent molecular topography. A, An overview of JCA. Each sample is subset by IC, and corresponding coordinates are rasterized individually using R package raster. Subsetting allows each IC to be coded uniquely during rasterization. Individual rasters for all nine ICs are then merged to recreate the complete sample. R package spdep is then used to generate a neighbors list with spatial weights for each observation. Join counts for all cluster pairs are then tabulated and compared with a spatially random distribution. B, Mapping of ICs on sample 094B. C, Raster of sample 094B, in which each spatial feature is represented by a pixel coded by IC assignment. D, A region of sample 094B representative of clustering, in which any given pixel is likely to have a similar group of neighbors. E, A region representative of dispersion, in which most pixels have dissimilar neighbors. F, JCA results for sample 094B. For each IC pair, the observed join count is indicated, along with what would be expected under conditions of spatial randomness. G,Z values corresponding to the analysis in F, reflecting the deviation of the observed join count from theoretical randomness. A positive z value indicates spatial clustering, a negative value indicates spatial dispersion, and a value close to 0 indicates no spatial dependency, that is, a near random distribution. H, A summary of z values resulting from JCA of all 28 samples represented in a box plot. The box represents the interquartile range, with the median marked by a vertical line. Outliers are represented by points outside the range of the whiskers. The shaded region represents our selected z-value cutoff of ±3 (as described in Materials and Methods) to assess significance of results. All autocorrelations were strongly positive (z ≥ 10) and are not represented here. I, Spatial mapping and UMAP embeddings of selected IC pairs found to trend towards positive and negative spatial dependency.

Close modal

The workflow for JCA of ST data are summarized in Fig. 5A and described in Materials and Methods. To apply JCA to our samples, we first rasterize our spatial data while maintaining categorical information (Fig. 5B and C). In the resulting raster, each spatial barcode is represented by a pixel coded by IC assignment. Neighbors for each observation are then identified, and JCA is performed to quantify “joins,” paired categories of neighboring observations. Spatially proximate observations will have many similar neighbors and joins, which would be greater than expected in a random arrangement of features (Fig. 5D). By the same rationale, spatially dispersed observations will manifest as dissimilarity of neighbors and fewer joins than in a random arrangement (Fig. 5E). The observed join count for each pair of observations can then be compared with what would be expected under spatially random conditions (Fig. 5F). The statistical significance of the spatial correlation is represented by the z score, which reflects the degree and directionality of deviation from randomness (Fig. 5G). A positive z score indicates clustering, whereas a negative z score indicates dispersion. Each IC is typically characterized by strong positive autocorrelation, whereas IC pairings vary on a spectrum of spatial dependencies (Fig. 5G).

We applied JCA to all 28 samples, and Fig. 5H summarizes the resulting z scores. We observed consistent patterns in the spatial relationships of several cluster pairs. IC3 and IC7 were positively correlated in nearly all samples. We had previously characterized these as stromal-immune clusters distinguished by unique gene expression programs (Fig. 4E and G). Conversely, IC3 was consistently dispersed from IC1, IC2, and IC5, which represent tumor clusters. Thus, JCA results demonstrate that some TNBC tumor cells are nonrandomly distanced from immunologically rich regions within the tumor microenvironment. The results are reinforced when presented spatially in representative samples and are further reflected in the UMAP embeddings of the integrated dataset (Fig. 5I).

Annotation of an independent validation cohort

We sought to validate the ICs defined from our reference cohort in a validation cohort of 15 additional, independent TNBC samples from eight AA patients. Samples in the validation cohort varied from stage II to III and in tumor purity scores (Fig. 6A). Following data normalization, we annotated the validation cohort via reference mapping to the integrated reference dataset. Each validation sample was individually queried against the reference, and IC labels were transferred for each feature, analogous to reference mapping of single-cell RNA-seq data (34).

Figure 6.

Annotation of a validation cohort with IC labels. A, Tumor purity profiles of 15 samples belonging to the validation cohort. B, Representative sample 395B from the validation cohort, with pathologist's manual annotation. C, Per-feature tumor purity scores for sample 395B. D, IC assignments for sample 395B, which were transferred from the integrated dataset by reference mapping. E, Cumulative tumor purity scores for all samples in the validation cohort after each was individually queried for reference mapping. The x-axis represents transferred IC labels. F, The expression of selected marker genes in the labeled validation cohort. Expression is represented by Pearson residuals. G, JCA was performed on labeled samples of the validation cohort. The box plot represents a summary of the resulting z scores for a subset of cluster pairs that were found to be significant in the reference cohort. H, Spatial validation of cluster pairs with positive and negative spatial dependencies in representative samples of the validation cohort.

Figure 6.

Annotation of a validation cohort with IC labels. A, Tumor purity profiles of 15 samples belonging to the validation cohort. B, Representative sample 395B from the validation cohort, with pathologist's manual annotation. C, Per-feature tumor purity scores for sample 395B. D, IC assignments for sample 395B, which were transferred from the integrated dataset by reference mapping. E, Cumulative tumor purity scores for all samples in the validation cohort after each was individually queried for reference mapping. The x-axis represents transferred IC labels. F, The expression of selected marker genes in the labeled validation cohort. Expression is represented by Pearson residuals. G, JCA was performed on labeled samples of the validation cohort. The box plot represents a summary of the resulting z scores for a subset of cluster pairs that were found to be significant in the reference cohort. H, Spatial validation of cluster pairs with positive and negative spatial dependencies in representative samples of the validation cohort.

Close modal

On an individual basis, transferred IC labels were compared with manual annotations and tumor purity scores for each sample (Fig. 6BD). In general, features assigned to the stromal-immune clusters (IC3, IC6, and IC7) had low tumor purity, and matched regions annotated as fibrosis. When tumor purity scores were cumulatively examined in all 15 samples, the pattern matched that seen in the reference cohort (Figs. 4C and 6E). We further validated that IC marker genes identified in the reference dataset corresponded with the label-transferred identities in the validation set (Fig. 6F). In addition, JCA revealed that all strong spatial dependencies observed in the reference cohort were replicated in the validation samples (Fig. 6G and H). Thus, we observe a common spatio-transcriptional architecture in TNBC.

Racial differences revealed by TNBC ST

Because our reference dataset contained a near-equal number of features from AA and Caucasian samples, we examined whether IC representation varied by race. We found a higher proportion of features representing IC3, IC6, and IC8 in Caucasian samples, whereas features representing IC5, IC7, and IC9 were predominantly contributed by AA samples (Fig. 7A). Furthermore, Caucasian samples contained significantly more IC3 by area, whereas AA samples contained more IC5 by area (Fig. 7B).

Figure 7.

Annotation of ICs in the context of racial disparities. A, The percentage of features in each IC contributed by Caucasian (C) and African American (AA) samples in the cohort. B, The proportion of each IC by total tissue area in each sample was calculated as the number of spatial features belonging to the IC divided by the total number of features in the tissue. IC proportions were compared between AA and Caucsian samples using multiple t test corrected by FDR. C, Immune content of ICs in AA and Caucasian samples as determined by CIBERSORTx. Absolute CIBERSORTx scores are represented on the y-axis. FH, follicular helper; MR, memory resting. D, Boxplots summarizing the immune content determined by IC in individual samples. q values were determined by multiple t testing corrected by FDR. *, q < 0.05; ***, q < 0.001; ****, q < 0.0001. E, Spatial maps of features assigned to IC5 and hypoxia signature scores in representative samples. F, Violin plots of hypoxia enrichment score by race in features with high tumor purity (score > 0.75) and low tumor purity (score < 0.75). n indicates the number of spatial features in each group. P value was determined by Welch two-sample t test.

Figure 7.

Annotation of ICs in the context of racial disparities. A, The percentage of features in each IC contributed by Caucasian (C) and African American (AA) samples in the cohort. B, The proportion of each IC by total tissue area in each sample was calculated as the number of spatial features belonging to the IC divided by the total number of features in the tissue. IC proportions were compared between AA and Caucsian samples using multiple t test corrected by FDR. C, Immune content of ICs in AA and Caucasian samples as determined by CIBERSORTx. Absolute CIBERSORTx scores are represented on the y-axis. FH, follicular helper; MR, memory resting. D, Boxplots summarizing the immune content determined by IC in individual samples. q values were determined by multiple t testing corrected by FDR. *, q < 0.05; ***, q < 0.001; ****, q < 0.0001. E, Spatial maps of features assigned to IC5 and hypoxia signature scores in representative samples. F, Violin plots of hypoxia enrichment score by race in features with high tumor purity (score > 0.75) and low tumor purity (score < 0.75). n indicates the number of spatial features in each group. P value was determined by Welch two-sample t test.

Close modal

Given the disparity in clinical outcomes for AA and Caucasian patients with TNBC, we sought to further investigate both IC3 and IC5. Annotation of IC3 previously revealed a strong immune identity (Fig. 4). Further CIBERSORTx analysis revealed that lymphocyte and macrophage content within IC3 was significantly greater in Caucasian samples, whereas AA samples contained lower immune content overall (Fig. 7C and D). Interestingly, M0 macrophage content was disproportionately high in AA samples, particularly in IC5 (Supplementary Fig. S5B). GSEA revealed IC5 to be distinctly hypoxic and coordinately glycolytic (Fig. 4G), characteristics that are associated with cell growth, metastasis, and resistance to radiotherapy in breast cancer (46–48). We verified that IC5 corresponded to features with high hypoxia scores in both AA and Caucasian samples (Fig. 7E). However, features from AA samples had greater cumulative hypoxia scores than features from Caucasian samples (Supplementary Fig. S5C), and the average hypoxia score per section was higher in AA samples (Supplementary Fig. S5D). Furthermore, this difference was more pronounced in tumor regions, defined as features with high tumor purity scores (Fig. 7F).

JCA revealed a negative spatial association between IC5 and IC3 (Fig. 5H), which agrees with hypoxia having several established roles in immune suppression (49). The relationship of hypoxia to immune content in our cohort can be summarized by the expression ratio of NDRG1 to IGKC, the top marker genes for IC5 and IC3, respectively (Fig. 4E). We verify that the NDRG1 to IGKC expression ratio was significantly higher in AA features in our cohort, particularly in IC5 (Fig. 8A). Furthermore, we determined that high NDRG1 expression was associated with poor survival in our cohort, whereas high IGKC expression was protective (Fig. 8B and C).

Figure 8.

NDRG1 and IGKC expression correlate with survival in TNBC and basal breast cancer. A, Violin plot of the ratio of expression of NDRG1 to IGKC by race in each IC. Log-transformed expression ratio is represented on the y-axis. q value was determined by multiple t test corrected by FDR. **, q < 0.01; ****, q < 0.0001. B and C, Kaplan–Meier plots of relapse-free survival in our reference cohort, when samples were stratified by NDRG1 (B) or IGKC (C) expression as detailed in Materials and Methods. DF, Kaplan–Meier plots of relapse-free survival stratified by IGKC expression (D), NDRG1 expression (E), and NDRG1/IGKC ratio (F) in molecularly subtyped basal breast cancer. Analysis is based on publicly available microarray expression datasets as detailed in Materials and Methods. HR and P value are indicated.

Figure 8.

NDRG1 and IGKC expression correlate with survival in TNBC and basal breast cancer. A, Violin plot of the ratio of expression of NDRG1 to IGKC by race in each IC. Log-transformed expression ratio is represented on the y-axis. q value was determined by multiple t test corrected by FDR. **, q < 0.01; ****, q < 0.0001. B and C, Kaplan–Meier plots of relapse-free survival in our reference cohort, when samples were stratified by NDRG1 (B) or IGKC (C) expression as detailed in Materials and Methods. DF, Kaplan–Meier plots of relapse-free survival stratified by IGKC expression (D), NDRG1 expression (E), and NDRG1/IGKC ratio (F) in molecularly subtyped basal breast cancer. Analysis is based on publicly available microarray expression datasets as detailed in Materials and Methods. HR and P value are indicated.

Close modal

We verified these findings in large, publicly available breast cancer expression datasets. In basal-like breast cancer, the molecular subtype most similar to histologically defined TNBC (50), high IGKC expression was significantly protective (Fig. 8D), more so than in Luminal A, Luminal B, and HER2+ subtypes (Supplementary Fig. S6). This is consistent with previous findings that IGKC is prognostic in breast and other cancers (51). Conversely, high NDRG1 expression was associated with worse outcomes (Fig. 8E), and a high ratio of NDRG1 to IGKC expression was most strongly indicative of poor survival (Fig. 8F), especially in basal-like breast cancer (Supplementary Fig. S6).

In this study, we utilize ST technology to perform comprehensive spatial and transcriptional characterization of a cohort of TNBC tumors from African American (AA) and Caucasian patients.

We describe several levels of data annotation. The first is at the level of the spatial feature, which encompasses approximately 5 to 10 cells in a diameter of 55 μm. By treating each feature individually, we can apply ssGSEA (29) and ssGSEA-based analysis, such as ESTIMATE (25), to accurately define regions of tumor, stroma, and immune infiltrate. Spatially mapped ESTIMATE signatures were comparable with a pathologist's manual annotations in identification of tumor regions. In addition, signatures pertaining to cell identity markers or signaling pathways provide further insight that cannot be deduced histologically. To summarize gene expression profiles within a sample, we move to cluster-level annotation. Clustering analysis revealed several transcriptionally distinct tumor and stromal cell populations within each sample, representing multiple TNBC subtypes and oncogenic processes. This is consistent with several single cell RNA-seq (scRNA-seq) studies of TNBC that have revealed the high degree of subclonal heterogeneity in tumors (52–54).

Finally, to obtain the most comprehensive view of ST profiles in TNBC, we perform an integrative analysis to define and annotate nine shared ICs represented in all samples. Although previous studies have generated breast cancer cell atlases from scRNA-seq data (55), the same has not been done with ST data alone, as most ST technologies do not allow single-cell resolution. With this constraint in mind, we designed a study that could lay the groundwork for developing such ST tumor atlases. Our reference dataset comprising 38,706 spatial features is the largest integrated ST dataset to be described, to the best of our knowledge. Furthermore, we validate the applicability of 9 ICs in an additional 17,861 features, representing 15 independent samples. Therefore, despite the resolution limitations of ST, our study utilizes a moleculary-rich TNBC cohort, and provides the unique opportunity to interrogate the TNBC tumor and immune microenvironment in an unbiased manner through informatics-based molecular annotation coupled with spatial analysis.

When considering spatial statistics, it is essential to apply the appropriate analysis for the data type and structure. When ICs were mapped spatially on individual samples, our data took the form of nominal labels in a latticed arrangement. To describe spatial relationships within this scheme, we make novel use of JCA, which, to our knowledge, has not previously been applied to ST data. We demonstrate an approach to make ST data derived from the 10x Genomics Visium platform compatible with JCA through rasterization. Subsequent analysis revealed the direction and degree of spatial dependency between ICs in all samples. Interestingly, several tumor cluster pairs were consistently nonrandomly dispersed. This, in addition to the strong spatial autocorrelation of each IC, may reflect compartmentalized clonal evolution and cellular substructure (56). We also observed dispersion between IC3, a stromal-immune cluster, and several tumor clusters, indicating that immune content is largely sequestered in the fibrotic compartment in TNBC.

The spatial incompatibility between hypoxic cells (IC5) and the primary immune cluster (IC3) is particularly noteworthy. A growing body of evidence supports both active and passive roles for tumor hypoxia in immune evasion (49, 57, 58). Indeed, hypoxic regions in our samples exhibited greater lymphocyte deficiency than other tumor regions. Hypoxia itself is associated with metastasis and radiotherapy resistance (47, 59), and exerts a selective pressure for more aggressive tumor clones (60). Hypoxia is therefore mechanistically significant to disease progression, and its association with immunosuppression has further implications for the use of immunotherapy in TNBC treatment.

Our finding that IC5 was overrepresented in AA samples, whereas IC3 was overrepresented in Caucasian samples, supports the notion that the tumor microenvironment differs in AA and Caucasian breast cancer (61). It has been reported that AA breast cancers have greater densities of microvessels and tumor-associated macrophages (TAM; refs. 62, 63), which have recently been linked to aggressive disease and reduced survival in breast cancer (64). This is consistent with our annotation of IC5, which we found to have a higher proportion of TAMs to lymphocytes. Furthermore, TAMs in AA breast cancers favor the protumorigenic M2 phenotype and have greater proliferative capacity (63). Interestingly, these findings resulted from studies utilizing laser-capture microdissection and manual histologic annotation of tumor tissue. In contrast, a study utilizing bulk RNA-seq data found no significant difference in the general immunological profiles of AA and Caucasian TNBC (65). Because the relevant cell populations are small, the use of bulk RNA-seq data may obscure results. The proportion of IC3 in our study samples varied between 6% and 23%. Thus, detection of subtle differences in these cell populations may require additional studies with larger specimens.

Our results suggest that NDRG1 expression is representative of the hypoxic expression profile in AA tumors, whereas IGKC expression represents the immune-supportive environment in Caucasian tumors. By calculating the ratio of expression between the two, we approximate the balance between hypoxia and immune infiltrate in our study cohort. We illustrate that a high ratio of NDRG1 to IGKC is associated with race and with significantly reduced survival in our cohort. Our survival analysis was validated in publicly available datasets of basal-like breast cancer. Although basal-like is a molecular subtype of breast cancer that is not interchangeable with histologically defined TNBC (66, 67), there is large overlap between the two (3, 50), and both disproportionately affect AA women (7, 68). Thus, these results suggest a link between differences in the tumor microenvironment and disparities in TNBC clinical outcomes.

It is important to acknowledge several variables present in our study. First, clinical disparities in TNBC outcome are multifactorial, influenced by both biological factors and broader disparities (8). Although we control for patient age and tumor stage in our cohort, we do not consider factors such as socioeconomic or environmental variables. Although ST technology is robust, its relatively low throughput does not lend itself well to controlling for these multiple factors. In addition, a minority of samples in our reference cohort received neoadjuvant therapy, which can alter tumor biology. However, because we find that all ICs are represented in all samples, we may be capturing heterogeneity at a level independent of the effects of neoadjuvant therapy. Furthermore, most validation cohort samples had received neoadjuvant treatment; we were also able to map and validate ICs in that cohort. Nevertheless, these are variables to be controlled for in larger studies.

The promise of ST lies in its ability to provide unprecedented insight into the compositions and interactions of cells within the tumor. We have combined functional molecular annotation with spatial analysis to profile the tumor architecture of TNBC. Our findings suggest a conserved spatio-transcriptional architecture in TNBC, despite intratumoral and stromal heterogeneity within individual samples. Furthermore, our results indicate that hypoxia may considerably influence the tumor environment in AA TNBC. Pending validation in larger cohorts, this raises the possibility of evaluating the utility of hypoxia-targeting strategies, such as hypoxia-activated prodrugs (69) or hypoxia-targeted nanoparticles (70). We also found that hypoxic regions in AA samples were associated with disproportionately high M0 macrophage content, which may support the evaluation of macrophage reprogramming-based therapeutic strategies, some of which are undergoing clinical trials (71). Thus, our study provides novel insights into the relationship between tumor biology and racial disparities in TNBC, and could ultimately lead to improved outcomes for this clinically challenging disease.

R. Bassiouni reports grants from AACR/Genentech during the conduct of the study. J.D. Carpten reports grants from NCI during the conduct of the study; and also presented data related to a different study at a 10x Genomics sponsored symposium. No disclosures were reported by the other authors.

R. Bassiouni: Conceptualization, software, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. M.O. Idowu: Conceptualization, resources, data curation, investigation, writing–review and editing. L.D. Gibbs: Conceptualization, resources, data curation, investigation, writing–review and editing. V. Robila: Resources, data curation, investigation, writing–review and editing. P.J. Grizzard: Resources, data curation, formal analysis, writing–review and editing. M.G. Webb: Software, formal analysis. J. Song: Software, formal analysis. A. Noriega: Resources, formal analysis, supervision, funding acquisition, writing–review and editing. D.W. Craig: Conceptualization, resources, supervision, funding acquisition, project administration, writing–review and editing. J.D. Carpten: Conceptualization, resources, supervision, funding acquisition, project administration, writing–review and editing.

Funding for the research project was provided by NIH-NCI Cancer Center Support Grant P30 CA014089 (to D.W. Craig, J.D. Carpten), NIH-NCI CARE2 PACHE U54 CA233465 (to D.W. Craig and J.D. Carpten), Tower Cancer Research Foundation (to R. Bassiouni and J.D. Carpten), and AACR/Genentech Cancer Disparities Fellowship 20–40–18-BASS (to R. Bassiouni). Services in support of the research project were provided by the VCU Massey Cancer Center Tissue and Data Acquisition and Analysis Core, supported, in part, with funding from NIH-NCI Cancer Center Support Grant P30 CA016059. Sequencing for the project was performed by the Keck Genomics Platform and the Molecular Genomics Core at the University of Southern California, supported, in part, with funding from NIH-NCI Cancer Center Support Grant P30 CA014089. Microscopy for the project was partially performed by the Optical Imaging Facility at the Eli and Edythe Broad CIRM Center at the University of Southern California. The authors particularly thank Dr. Seth Ruffins for his assistance in image acquisition.

The publication costs of this article were defrayed in part by the payment of publication fees. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

1.
Cancer Genome Atlas Network.
Comprehensive molecular portraits of human breast tumours
.
Nature
2012
;
490
:
61
70
.
2.
Foulkes
WD
,
Smith
IE
,
Reis
JS
.
Triple-negative breast cancer
.
New Engl J Med
2010
;
363
:
1938
48
.
3.
Dent
R
,
Trudeau
M
,
Pritchard
KI
,
Hanna
WM
,
Kahn
HK
,
Sawka
CA
, et al
.
Triple-negative breast cancer: clinical features and patterns of recurrence
.
Clin Cancer Res
2007
;
13
:
4429
34
.
4.
Bareche
Y
,
Venet
D
,
Ignatiadis
M
,
Aftimos
P
,
Piccart
M
,
Rothe
F
, et al
.
Unravelling triple-negative breast cancer molecular heterogeneity using an integrative multiomic analysis
.
Ann Oncol
2018
;
29
:
895
902
.
5.
Shah
SP
,
Roth
A
,
Goya
R
,
Oloumi
A
,
Ha
G
,
Zhao
Y
, et al
.
The clonal and mutational evolution spectrum of primary triple-negative breast cancers
.
Nature
2012
;
486
:
395
9
.
6.
Bianchini
G
,
Balko
JM
,
Mayer
IA
,
Sanders
ME
,
Gianni
L
.
Triple-negative breast cancer: challenges and opportunities of a heterogeneous disease
.
Nat Rev Clin Oncol
2016
;
13
:
674
90
.
7.
Carey
LA
,
Perou
CM
,
Livasy
CA
,
Dressler
LG
,
Cowan
D
,
Conway
K
, et al
.
Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study
.
JAMA
2006
;
295
:
2492
502
.
8.
Dietze
EC
,
Sistrunk
C
,
Miranda-Carboni
G
,
O'Regan
R
,
Seewaldt
VL
.
Triple-negative breast cancer in African-American women: disparities versus biology
.
Nat Rev Cancer
2015
;
15
:
248
54
.
9.
Amirikia
KC
,
Mills
P
,
Bush
J
,
Newman
LA
.
Higher population-based incidence rates of triple-negative breast cancer among young African-American women: implications for breast cancer screening recommendations
.
Cancer
2011
;
117
:
2747
53
.
10.
Newman
LA
,
Kaljee
LM
.
Health disparities and triple-negative breast cancer in African American Women: a review
.
JAMA Surg
2017
;
152
:
485
93
.
11.
Keenan
T
,
Moy
B
,
Mroz
EA
,
Ross
K
,
Niemierko
A
,
Rocco
JW
, et al
.
Comparison of the genomic landscape between primary breast cancer in African American versus White Women and the association of racial differences with tumor recurrence
.
J Clin Oncol
2015
;
33
:
3621
7
.
12.
Pitt
JJ
,
Riester
M
,
Zheng
Y
,
Yoshimatsu
TF
,
Sanni
A
,
Oluwasola
O
, et al
.
Characterization of Nigerian breast cancer reveals prevalent homologous recombination deficiency and aggressive molecular features
.
Nat Commun
2018
;
9
:
4181
.
13.
Loree
JM
,
Anand
S
,
Dasari
A
,
Unger
JM
,
Gothwal
A
,
Ellis
LM
, et al
.
Disparity of race reporting and representation in clinical trials leading to cancer drug approvals from 2008 to 2018
.
JAMA Oncol
2019
;
5
:
e191870
.
14.
Gonzalez Castro
LN
,
Tirosh
I
,
Suva
ML
.
Decoding cancer biology one cell at a time
.
Cancer Discov
2021
;
11
:
960
70
.
15.
Stahl
PL
,
Salmen
F
,
Vickovic
S
,
Lundmark
A
,
Navarro
JF
,
Magnusson
J
, et al
.
Visualization and analysis of gene expression in tissue sections by spatial transcriptomics
.
Science
2016
;
353
:
78
82
.
16.
Bassiouni
R
,
Gibbs
LD
,
Craig
DW
,
Carpten
JD
,
McEachron
TA
.
Applicability of spatial transcriptional profiling to cancer research
.
Mol Cell
2021
;
81
:
1631
9
.
17.
Quail
DF
,
Joyce
JA
.
Microenvironmental regulation of tumor progression and metastasis
.
Nat Med
2013
;
19
:
1423
37
.
18.
Dias
AS
,
Almeida
CR
,
Helguero
LA
,
Duarte
IF
.
Metabolic crosstalk in the breast cancer microenvironment
.
Eur J Cancer
2019
;
121
:
154
71
.
19.
Salemme
V
,
Centonze
G
,
Cavallo
F
,
Defilippi
P
,
Conti
L
.
The crosstalk between tumor cells and the immune microenvironment in breast cancer: implications for immunotherapy
.
Front Oncol
2021
;
11
:
610303
.
20.
Martin-Pardillos
A
,
Valls Chiva
A
,
Bande Vargas
G
,
Hurtado Blanco
P
,
Pineiro Cid
R
,
Guijarro
PJ
, et al
.
The role of clonal communication and heterogeneity in breast cancer
.
BMC Cancer
2019
;
19
:
666
.
21.
Cleary
AS
,
Leonard
TL
,
Gestl
SA
,
Gunther
EJ
.
Tumour cell heterogeneity maintained by cooperating subclones in Wnt-driven mammary cancers
.
Nature
2014
;
508
:
113
7
.
22.
Naffar-Abu Amara
S
,
Kuiken
HJ
,
Selfors
LM
,
Butler
T
,
Leung
ML
,
Leung
CT
, et al
.
Transient commensal clonal interactions can drive tumor metastasis
.
Nat Commun
2020
;
11
:
5799
.
23.
Hao
Y
,
Hao
S
,
Andersen-Nissen
E
,
Mauck
WM
3rd
,
Zheng
S
,
Butler
A
, et al
.
Integrated analysis of multimodal single-cell data
.
Cell
2021
;
184
:
3573
87
.
24.
Hafemeister
C
,
Satija
R
.
Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression
.
Genome Biol
2019
;
20
:
296
.
25.
Yoshihara
K
,
Shahmoradgoli
M
,
Martinez
E
,
Vegesna
R
,
Kim
H
,
Torres-Garcia
W
, et al
.
Inferring tumour purity and stromal and immune cell admixture from expression data
.
Nat Commun
2013
;
4
:
2612
.
26.
Zappia
L
,
Oshlack
A
.
Clustering trees: a visualization for evaluating clusterings at multiple resolutions
.
Gigascience
2018
;
7
:
giy083
.
27.
Chen
X
,
Li
J
,
Gray
WH
,
Lehmann
BD
,
Bauer
JA
,
Shyr
Y
, et al
.
TNBCtype: a subtyping tool for triple-negative breast cancer
.
Cancer Inform
2012
;
11
:
147
56
.
28.
Subramanian
A
,
Tamayo
P
,
Mootha
VK
,
Mukherjee
S
,
Ebert
BL
,
Gillette
MA
, et al
.
Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles
.
Proc Natl Acad Sci U S A
2005
;
102
:
15545
50
.
29.
Barbie
DA
,
Tamayo
P
,
Boehm
JS
,
Kim
SY
,
Moody
SE
,
Dunn
IF
, et al
.
Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1
.
Nature
2009
;
462
:
108
U22
.
30.
Liberzon
A
,
Birger
C
,
Thorvaldsdottir
H
,
Ghandi
M
,
Mesirov
JP
,
Tamayo
P
.
The molecular signatures database hallmark gene set collection
.
Cell Syst
2015
;
1
:
417
25
.
31.
Yu
X
,
Chen
YA
,
Conejo-Garcia
JR
,
Chung
CH
,
Wang
X
.
Estimation of immune cell content in tumor using single-cell RNA-seq reference data
.
BMC Cancer
2019
;
19
:
715
.
32.
Newman
AM
,
Steen
CB
,
Liu
CL
,
Gentles
AJ
,
Chaudhuri
AA
,
Scherer
F
, et al
.
Determining cell type abundance and expression from bulk tissues with digital cytometry
.
Nat Biotechnol
2019
;
37
:
773
82
.
33.
Newman
AM
,
Liu
CL
,
Green
MR
,
Gentles
AJ
,
Feng
W
,
Xu
Y
, et al
.
Robust enumeration of cell subsets from tissue expression profiles
.
Nat Methods
2015
;
12
:
453
7
.
34.
Stuart
T
,
Butler
A
,
Hoffman
P
,
Hafemeister
C
,
Papalexi
E
,
Mauck
WM
3rd
, et al
.
Comprehensive integration of single-cell data
.
Cell
2019
;
177
:
1888
902
.
35.
Bivand
RRS
,
Pebesma
E
,
Gomez-Rubio
V
.
Applied spatial data analysis with R
,
Second edition
:
Springer
,
NY
;
2013
.
36.
Budczies
J
,
Klauschen
F
,
Sinn
BV
,
Gyorffy
B
,
Schmitt
WD
,
Darb-Esfahani
S
, et al
.
Cutoff Finder: a comprehensive and straightforward Web application enabling rapid biomarker cutoff optimization
.
PLoS One
2012
;
7
:
e51862
.
37.
Therneau
T
.
A Package for Survival Analysis in R
.
2022
.
Available from:
https://CRAN.R-project.org/package=survival.
38.
Lanczky
A
,
Gyorffy
B
.
Web-based survival analysis tool tailored for medical research (KMplot): development and implementation
.
J Med Internet Res
2021
;
23
:
e27633
.
39.
Lehmann
BD
,
Bauer
JA
,
Chen
X
,
Sanders
ME
,
Chakravarthy
AB
,
Shyr
Y
, et al
.
Identification of human triple-negative breast cancer subtypes and preclinical models for selection of targeted therapies
.
J Clin Invest
2011
;
121
:
2750
67
.
40.
Lehmann
BD
,
Jovanovic
B
,
Chen
X
,
Estrada
MV
,
Johnson
KN
,
Shyr
Y
, et al
.
Refinement of triple-negative breast cancer molecular subtypes: implications for neoadjuvant chemotherapy selection
.
PLoS One
2016
;
11
:
e0157368
.
41.
Cangul
H
.
Hypoxia upregulates the expression of the NDRG1 gene leading to its overexpression in various human cancers
.
BMC Genet
2004
;
5
:
27
.
42.
Villodre
ES
,
Hu
X
,
Eckhardt
BL
,
Larson
R
,
Huo
L
,
Yoon
EC
, et al
.
NDRG1 in aggressive breast cancer progression and brain metastasis
.
J Natl Cancer Inst
2022
;
114
:
579
91
.
43.
Weichselbaum
RR
,
Ishwaran
H
,
Yoon
T
,
Nuyten
DS
,
Baker
SW
,
Khodarev
N
, et al
.
An interferon-related gene signature for DNA damage resistance is a predictive marker for chemotherapy and radiation for breast cancer
.
Proc Natl Acad Sci U S A
2008
;
105
:
18490
5
.
44.
Cliff
AD
,
Ord
JK
.
Spatial Processes: Models & Applications. Pion
;
1981
.
45.
Upton
G
,
Fingleton
B
.
Spatial Data Analysis by Example, Volume 1
:
Wiley
;
1985
.
46.
Tang
K
,
Zhu
L
,
Chen
J
,
Wang
D
,
Zeng
L
,
Chen
C
, et al
.
Hypoxia promotes breast cancer cell growth by activating a glycogen metabolic program
.
Cancer Res
2021
;
81
:
4949
63
.
47.
Rankin
EB
,
Giaccia
AJ
.
Hypoxic control of metastasis
.
Science
2016
;
352
:
175
80
.
48.
Mast
JM
,
Kuppusamy
P
.
Hyperoxygenation as a therapeutic supplement for treatment of triple negative breast cancer
.
Front Oncol
2018
;
8
:
527
.
49.
Barsoum
IB
,
Koti
M
,
Siemens
DR
,
Graham
CH
.
Mechanisms of hypoxia-mediated immune escape in cancer
.
Cancer Res
2014
;
74
:
7185
90
.
50.
Badve
S
,
Dabbs
DJ
,
Schnitt
SJ
,
Baehner
FL
,
Decker
T
,
Eusebi
V
, et al
.
Basal-like and triple-negative breast cancers: a critical review with an emphasis on the implications for pathologists and oncologists
.
Mod Pathol
2011
;
24
:
157
67
.
51.
Schmidt
M
,
Hellwig
B
,
Hammad
S
,
Othman
A
,
Lohr
M
,
Chen
Z
, et al
.
A comprehensive analysis of human gene expression profiles identifies stromal immunoglobulin kappa C as a compatible prognostic marker in human solid tumors
.
Clin Cancer Res
2012
;
18
:
2695
703
.
52.
Karaayvaz
M
,
Cristea
S
,
Gillespie
SM
,
Patel
AP
,
Mylvaganam
R
,
Luo
CC
, et al
.
Unravelling subclonal heterogeneity and aggressive disease states in TNBC through single-cell RNA-seq
.
Nat Commun
2018
;
9
:
3588
.
53.
Pal
B
,
Chen
Y
,
Vaillant
F
,
Capaldo
BD
,
Joyce
R
,
Song
X
, et al
.
A single-cell RNA expression atlas of normal, preneoplastic and tumorigenic states in the human breast
.
EMBO J
2021
;
40
:
e107333
.
54.
Zhou
S
,
Huang
YE
,
Liu
H
,
Zhou
X
,
Yuan
M
,
Hou
F
, et al
.
Single-cell RNA-seq dissects the intratumoral heterogeneity of triple-negative breast cancer based on gene regulatory networks
.
Mol Ther Nucleic Acids
2021
;
23
:
682
90
.
55.
Wu
SZ
,
Al-Eryani
G
,
Roden
DL
,
Junankar
S
,
Harvey
K
,
Andersson
A
, et al
.
A single-cell and spatially resolved atlas of human breast cancers
.
Nat Genet
2021
;
53
:
1334
47
.
56.
Greaves
M
,
Maley
CC
.
Clonal evolution in cancer
.
Nature
2012
;
481
:
306
13
.
57.
Pietrobon
V
,
Marincola
FM
.
Hypoxia and the phenomenon of immune exclusion
.
J Transl Med
2021
;
19
:
9
.
58.
Vito
A
,
El-Sayes
N
,
Mossman
K
.
Hypoxia-driven immune escape in the tumor microenvironment
.
Cells-Basel
2020
;
9
:
992
.
59.
Sorensen
BS
,
Horsman
MR
.
Tumor hypoxia: impact on radiation therapy and molecular pathways
.
Front Oncol
2020
;
10
:
562
.
60.
Graeber
TG
,
Osmanian
C
,
Jacks
T
,
Housman
DE
,
Koch
CJ
,
Lowe
SW
, et al
.
Hypoxia-mediated selection of cells with diminished apoptotic potential in solid tumours
.
Nature
1996
;
379
:
88
91
.
61.
Kim
G
,
Pastoriza
JM
,
Condeelis
JS
,
Sparano
JA
,
Filippou
PS
,
Karagiannis
GS
, et al
.
The contribution of race to breast tumor microenvironment composition and disease progression
.
Front Oncol
2020
;
10
:
1022
.
62.
Martin
DN
,
Boersma
BJ
,
Yi
M
,
Reimers
M
,
Howe
TM
,
Yfantis
HG
, et al
.
Differences in the tumor microenvironment between African-American and European-American breast cancer patients
.
PLoS One
2009
;
4
:
e4531
.
63.
Koru-Sengul
T
,
Santander
AM
,
Miao
F
,
Sanchez
LG
,
Jorda
M
,
Gluck
S
, et al
.
Breast cancers from black women exhibit higher numbers of immunosuppressive macrophages with proliferative activity and of crown-like structures associated with lower survival compared to non-black Latinas and Caucasians
.
Breast Cancer Res Treat
2016
;
158
:
113
26
.
64.
Cassetta
L
,
Fragkogianni
S
,
Sims
AH
,
Swierczak
A
,
Forrester
LM
,
Zhang
H
, et al
.
Human tumor-associated macrophage and monocyte transcriptional landscapes reveal cancer-specific reprogramming, biomarkers, and therapeutic targets
.
Cancer Cell
2019
;
35
:
588
602
.
65.
O'Meara
T
,
Safonov
A
,
Casadevall
D
,
Qing
T
,
Silber
A
,
Killelea
B
, et al
.
Immune microenvironment of triple-negative breast cancer in African-American and Caucasian women
.
Breast Cancer Res Treat
2019
;
175
:
247
59
.
66.
Perou
CM
,
Sorlie
T
,
Eisen
MB
,
van de Rijn
M
,
Jeffrey
SS
,
Rees
CA
, et al
.
Molecular portraits of human breast tumours
.
Nature
2000
;
406
:
747
52
.
67.
Rakha
EA
,
Tan
DSP
,
Foulkes
WD
,
Ellis
IO
,
Tutt
A
,
Nielsen
TO
, et al
.
Are triple-negative tumours and basal-like breast cancer synonymous?
Breast Cancer Res
2007
;
9
:
404
.
68.
Bauer
KR
,
Brown
M
,
Cress
RD
,
Parise
CA
,
Caggiano
V
.
Descriptive analysis of estrogen receptor (ER)-negative, progesterone receptor (PR)-negative, and HER2-negative invasive breast cancer, the so-called triple-negative phenotype: a population-based study from the California cancer Registry
.
Cancer
2007
;
109
:
1721
8
.
69.
Li
Y
,
Zhao
L
,
Li
XF
.
Targeting hypoxia: hypoxia-activated prodrugs in cancer therapy
.
Front Oncol
2021
;
11
:
700407
.
70.
Li
Y
,
Jeon
J
,
Park
JH
.
Hypoxia-responsive nanoparticles for tumor-targeted drug delivery
.
Cancer Lett
2020
;
490
:
31
43
.
71.
Tan
Y
,
Wang
M
,
Zhang
Y
,
Ge
S
,
Zhong
F
,
Xia
G
, et al
.
Tumor-associated macrophages: a potential target for cancer therapy
.
Front Oncol
2021
;
11
:
693517
.