Abstract
Signaling between cancer and nonmalignant (stromal) cells in the tumor microenvironment (TME) is a key to tumor progression. Here, we deconvoluted bulk tumor transcriptomes to infer cross-talk between ligands and receptors on cancer and stromal cells in the TME of 20 solid tumor types. This approach recovered known transcriptional hallmarks of cancer and stromal cells and was concordant with single-cell, in situ hybridization and IHC data. Inferred autocrine cancer cell interactions varied between tissues but often converged on Ephrin, BMP, and FGFR-signaling pathways. Analysis of immune checkpoints nominated interactions with high levels of cancer-to-immune cross-talk across distinct tumor types. Strikingly, PD-L1 was found to be highly expressed in stromal rather than cancer cells. Overall, our study presents a new resource for hypothesis generation and exploration of cross-talk in the TME.
This study provides deconvoluted bulk tumor transcriptomes across multiple cancer types to infer cross-talk in the tumor microenvironment.
Introduction
The tumor microenvironment (TME) is a multifaceted cellular environment that both constrains the evolving tumor (1) and plays a pivotal role in tumor progression and therapeutic response (2). Existing experimental methods to characterize the TME, such as imaging and single-cell–based approaches, cannot be applied retrospectively to existing large-scale bulk tumor datasets. Such datasets represent a vast and mostly unexplored resource for studying cancer cell ligand–receptor (LR) repertoires and cross-talk in the TME.
General regression-based methods for mixed-tissue transcriptome deconvolution may be applied to deconvolve mixed gene expression profiles when cell type fractions can be robustly estimated from the sample (3, 4). However, existing approaches to deconvolve bulk tumor gene expression profiles are commonly focused on estimation of immune cell type fractions (5, 6) or optimized and applied in a particular set of tumor types (7–9). Previous studies have explored how the fraction of cancer cells (tumor purity) can be estimated for tumor samples (10–12), and further how it varies across tumors and may confound gene expression analysis (13). Here, we combined two existing approaches, consensus tumor purity estimation (13) and regression-based mixed-tissue transcriptome deconvolution (4), to estimate cancer and stromal (comprising any noncancer cell) compartment molecular profiles from bulk tumor RNA-seq data. From these profiles, we inferred expression levels of known ligands and receptors as well as interaction potentials within and between the two compartments. We validated the approach using orthogonal public cancer genomics, single-cell RNA-seq, RNA in situ hybridization (ISH), and IHC imaging data. We applied the approach to approximately 8,000 tumor samples across 20 solid tumor types from The Cancer Genome Atlas (TCGA) to predict and nominate cancer and stromal cell cross-talk conserved across tumor types as well as tissue-specific interactions. The data and results from this study are available as an interactive resource, https://cite.genome.sg, enabling data-driven exploration and hypothesis generation of cross-talk in the TME.
Materials and Methods
Tumor data sources
We analyzed 20 solid tumor types with TCGA acronyms BLCA, BRCA, CESC, CRC (COAD and READ combined), ESCA, GBM, HNSC, KIRC, KIRP, LGG, LIHC, LUAD, LUSC, OV, PAAD, PRAD, SKCM, STAD, THCA, and UCEC. The full names of cancer types and sample size are indicated in the Supplementary Table S1. We obtained somatic mutation (SNV) and copy-number variation (CNV) data for 20 tumor types from the Broad Institute Firehose website (See data availability section below). Uniformly processed TCGA RNA-seq (FPKM) data were obtained from the UCSC Xena server (See data availability).
Tumor purity estimation
Tumor purity can be estimated from both DNA and RNA data obtained from a bulk tumor sample. To obtain unbiased estimates of tumor purity across the TCGA cohort, we exploited that nearly all tumor samples had independent genomic (Exome-seq, SNP-array) and transcriptomic (RNA-seq) profiles. We first computed the tumor purity of all samples using three genome-based and one transcriptome-based method: AbsCN-Seq (Exome-seq; ref. 11), PurBayes (Exome-seq; ref. 14), ASCAT (SNP-array; ref. 15), and ESTIMATE (RNA-seq; ref. 12). We also evaluated other published tumor transcriptome deconvolution methods (7, 9, 12, 16), but ESTIMATE showed markedly higher concordance with the genome-based methods (Supplementary Fig. S1A and S1B). Next, some tumor purity values were missing because the algorithms failed to converge for certain input data. In addition, instances of very low (<10%) purity estimates were only observed for one method (PurBayes), and all genome-based methods showed a disproportionate fraction of samples with extreme high-purity estimates (>98%; Supplementary Fig. S1C). Comparative analysis suggested that samples with such extreme purity estimates could be misestimated (Supplementary Fig. S1D), and we therefore chose the conservative approach to flag tumor samples with extreme low (<0.1) or high (>0.98) tumor purity as missing data. To limit potential technical bias from missing data in the consensus purity estimation, missing data were imputed using an iterative principal component analysis of the incomplete algorithm versus sample tumor–purity matrix (using the missMDA R package; ref. 17). We used quantile normalization to standardize and average the tumor-purity distributions of different algorithms per tumor type. The final consensus tumor-purity estimate was obtained as the mean of these normalized purity values.
Database of LR interactions
We obtained approximately 1,400 LR pairs supported by evidence in the literature as compiled and curated by Ramilowski and colleagues (18). In addition, we curated and added a list of immune checkpoint LR pairs associated with cancer. The complete list of LR interactions together with literature references can be found in Supplementary Table S2.
Cancer–stroma gene expression deconvolution
We assume tumors to be comprised of cancer and stromal (any noncancer) cells. Measured bulk tumor mRNA abundance is then given by the sum of mRNA molecules derived from these two compartments. Tumor mRNA expression (|{e_{tumor,i}}$|) measured for a given gene in sample i can then be expressed as:
Here pi denotes the cancer cell proportion (tumor purity) in sample i, and |{\bar{e}_{cancer}}$| and |{\bar{e}_{stroma}}$| are average expression levels for the gene in the cancer and stromal compartment (not dependent on sample i), respectively. We make the simplifying assumption that these (non-negative) average compartment expression levels are constant across the set of tumors and estimate them using non-negative least squares regression. 95% confidence intervals and standard deviations for the cancer and stroma point estimates are estimated using bootstrapping.
Bulk tumor RNA-seq FPKM data were log-transformed, log2(X+1), before deconvolution. First, we observed that the relationship between tumor purity and raw bulk tumor gene expression was generally heteroscedastic: Across all cancer types, tumor purity had overall stronger linear correlation with log-transformed RNA-seq gene expression data, as can be observed in deconvolution of known stromal-specific genes (Supplementary Figs. S2–S5). Second, we directly compared results for untransformed and log-transformed data, and although the results were overall similar, we found that log transformation provided better separation between inferred cancer and stroma compartment gene expression for known stromal genes (Supplementary Fig. S6), which is consistent with previous empirical analysis (19).
We found that the equation above tended to misestimate stromal gene expression for genes with somatic copy-number alterations (CNA) affecting gene expression in a subset of the samples (for example ERBB2 in HER2-positive breast tumors). We therefore used a modified approach for such genes. We first identified genes with correlation between CNA and mRNA expression (comparing expression for samples with diploid and non-diploid CNA, the Mann–Whitney U test, P < 1e−6, to account for multiple testing) in a given set of tumors, and then estimated cancer and stromal compartment gene expression using a two-step approach. Stromal compartment mRNA expression was first inferred using the above approach using only samples with diploid copy number for the gene. We then used the inferred mean stroma compartment expression, the measured mean tumor expression, and the mean purity of the tumor samples to calculate the mean cancer compartment expression using the above equation.
Deconvolution of iTRAQ tumor protein abundance data
We obtained iTRAQ protein abundance data for BRCA and OV tumor types using CPTAC consortium data available at cBioPortal (20). Bulk abundance profiles were deconvolved into cancer and stroma compartment abundance profiles similar to the procedures described for RNA-seq data above. Briefly, we have matched protein abundance and consensus tumor purity data available for each tumor. We can therefore use the consensus tumor purity estimates and NNLS to infer the (mean) abundance of individual proteins in the cancer and stromal compartment, respectively.
LR relative cross-talk score
To estimate the relative flow of signaling between cancer and stromal cell compartments, we developed the relative cross-talk (RC) score. Similar to previous studies (21–23), we estimate LR complex activity using the product of ligand and receptor gene expression. The RC score then estimates the relative complex concentration given all four possible directions of signaling and a normal tissue state, for example, for cancer-to-cancer (C>C) signaling:
The normal term in the denominator is included to normalize for complex activity in normal tissue, and this term is calculated directly from the observed gene expression levels in normal tissue samples available for each tumor type in TCGA. Our LR product score also draws inspiration from the law of mass action. Here, we can estimate the concentration of an LR complex from the molar concentrations of the individual ligand, L and receptor, R, and their dissociation constant, KD, under equilibrium:
Because our analysis focuses on relative LR complex concentrations, we may ignore the dissociation constant as it is present in both numerator and denominator of the RC score.
Our analysis depends on multiple simplifying assumptions. We assume that compartment-level mRNA expression estimates are reasonable proxies for ligand and receptor molar concentration at the site of LR complex formation. Although some ligands may be secreted and able to diffuse to more distant target cells, others are anchored at cell surfaces and thereby only permit local interactions. For the latter local interactions, we assume that there is sufficient mixing of cancer and stromal cells inside tumors. In addition, we assume there is no external competition for individual ligands and receptors of a given LR pair, and that these effects are constant across samples or compartments. Finally, we note that LR expression and interaction scores could vary across both molecular subtypes (Fig. 3) and the physical location of tumor samples. Indeed, to demonstrate the last point, analysis of gastric cancer tumors showed significant differences in LR interactions between tumor samples obtained from the gastrointestinal junction and the antrum/pylorus of the stomach (Supplementary Fig. S7A and S7B).
Gene set enrichment analysis
To study genes differentially expressed between cancer and stromal compartments, we performed gene set enrichment analysis (GSEA; ref. 24) pre-ranked analysis of genes sorted by differential expression (log-fold change) in the two compartments. We analyzed all hallmark gene sets and used an FDR cutoff value of 0.25 to determine gene sets with differential enrichment.
Immune cell-type association analysis
We obtained immune cell-type proportions estimated by Cibersort (6) across all TCGA samples (25). To discover association between cell types and checkpoint ligands/receptors, we computed the Pearson correlation coefficient between median immune cell type proportions and inferred cancer/stromal expression levels across tumor types.
Analysis of melanoma scRNA-seq data
We obtained processed scRNA-seq data of melanoma tumors (Tirosh and colleagues, 2016) from the Broad Single Cell Portal (https://singlecell.broadinstitute.org/). We analyzed and compared deconvolved gene expression estimates of annotated malignant (cancer, N = 1,257 cells) and nonmalignant (stroma, N = 3,256 cells) cells. Differences in gene expression between cancer and stromal cells were evaluated using the Wilcoxon rank-sum test (two-sided).
IHC quantification analysis of Human Protein Atlas data
To quantify cancer and stromal cells expression of genes, we performed color deconvolution of IHC images obtained from the Human Protein Atlas (proteinatlas.org; ref. 26) using the ImageJ software package and standard protocols (27). Following manual selection and segmentation of cancer and stromal cells (without knowledge of antibody staining), color intensities were measured with ImageJ, and DAB (target), hematoxylin (cells), and complementary components were estimated. Average antibody intensities were then estimated for the cancer and stromal compartment of a given slide. Further technical details are provided in the Supplementary Methods.
RNA-ISH of breast cancer tissue
We obtained formalin-fixed paraffin-embedded (FFPE) breast cancer tissue slides for 2 luminal (ER+/PR+) and 2 triple-negative basal-like breast tumors. ISH was performed using specific RNAScope (https://acdbio.com) probes to evaluate IL6ST and SFRP1 RNA level and localization. The FFPE slides were deparaffinized, rehydrated, and pretreated using the RNAScope Sample Preparation kit according to the manufacturer's recommendations. The slides were incubated in target retrieval solution (#322000, Advanced Cell Diagnostics) for 15 minutes at 97°C followed by a protease solution (#322330, Advanced Cell Diagnostics) for 30 minutes at 40°C. RNAScope IL6ST (#447251) and SFRP1 (#429381-C2) target probe and RNAScope 2.5 HD Duplex Detection (Chromogenic) kit (#322430, Advanced Cell Diagnostics) were applied to the slide according to the manufacturer's instructions. RNA expression was quantified using two approaches. First, according to the manufacturer's instructions (TS 46–003), we segmented the Images using supervised machine learning. Briefly, we used Fiji/ImageJ to train a 4-class random forest classifier to distinguish regions with SFRP1 staining, IL6ST staining, nuclear regions, and background signal. We used this model to segment all 8 images. We selected 5 cancer cell regions in each image and recorded the mean probability for the SFRP1 and ILST6 staining classes in each region. These staining probabilities were summarized (mean and standard deviation) across all regions from the four images (2 patients) for the luminal and basal subtype, respectively. As a second and different approach, we quantified IL6ST and SFRP1 mRNA expression in cancer cells of basal and luminal breast cancer tissues using 3-channel color deconvolution. Channels were specified for probe 1 (SFRP1), probe 2 (IL6ST), and background. Multiple areas enriched for cancer cells were selected in each tissue slide and the optical density (OD) was calculated for each area. The average and standard deviation of the OD for each gene and tissue type were calculated.
Data availability
Interactive resource containing data and results from this study: https://cite.genome.sg/ui
TCGA SNV and CNV data: https://gdac.broadinstitute.org/. Data release version 2016/01/28.
TCGA RNA-seq data: https://toil.xenahubs.net/download/tcga_RSEM_gene_fpkm.gz
ASCAT purity estimates: https://cancer.sanger.ac.uk/cosmic/download
IHC data: https://www.proteinatlas.org.
Results
Overview of approach
Tumor purity can be estimated from both DNA and RNA data obtained from a bulk tumor sample. To obtain unbiased estimates of tumor purity across the TCGA cohort, we exploited that both genomic (Exome-seq, SNP-array) and transcriptomic (RNA-seq) data had been obtained for each tumor sample. Accordingly, we computed consensus purity estimates using three genome-based and a transcriptome-based method across approximately 8,000 samples and 20 solid tumor types (Fig. 1 and Materials and Methods). Most tumors had a purity in the range of 40%–70%, but showed large variation within and between tumor types (Fig. 2A). Second, the tumor transcriptome profiles were deconvolved into an average cancer and stromal cell profile. For each tumor type and gene, we regressed the measured bulk mRNA expression against the estimated tumor purity to predict the expression level in an imaginary tumor composed entirely of cancer or stromal cells (Materials and Methods). Third, to infer cancer and stromal cell ligand and receptor repertoires, as well as potential cross-talk between these compartments, we combined the inferred expression profiles with an existing curated database of LR interactions to predict candidate LR interactions across cancer types (Materials and Methods).
Overview of approach. The purity of each bulk tumor sample is first estimated using a consensus approach. mRNA expression levels in “average” cancer and stromal cells are inferred for a set of tumors (e.g., tumor type) using non-negative least-squares regression; figure shows data for CD4 in breast cancer. Candidate autocrine and paracrine signaling interactions are inferred using a database of curated receptor–ligand signaling interactions.
Overview of approach. The purity of each bulk tumor sample is first estimated using a consensus approach. mRNA expression levels in “average” cancer and stromal cells are inferred for a set of tumors (e.g., tumor type) using non-negative least-squares regression; figure shows data for CD4 in breast cancer. Candidate autocrine and paracrine signaling interactions are inferred using a database of curated receptor–ligand signaling interactions.
Validation of tumor transcriptome deconvolution. A, Inferred consensus tumor purity estimates for approximately 8,000 bulk tumor samples across 20 solid tumor types. B, Inferred expression of known compartment-specific genes in cancer and stromal compartments across cancer. C, Inferred cancer and stroma compartment expression levels for 280 previously reported stromal-specific genes. D, Genes specifically expressed in cancer and stromal cells were inferred for each tumor type. Correlation (Pearson r) between mRNA expression and somatic CNAs at each gene locus was evaluated (top). The fraction of genome altered by CNA was determined for each tumor type (bottom). E, Deconvolved cancer and stroma compartment expression levels in melanoma (SKCM) for cancer- and stroma-specific genes previously identified with melanoma scRNA-seq. F, Genes were ordered by inferred expression difference between cancer and stroma compartments in each tumor type, and GSEA was used to identify cancer- and stroma-enriched gene sets. G, Protein expression was inferred for cancer and stroma compartments in ovarian (OV) and breast (BRCA) cancer cohorts using iTRAQ protein quantification data and compared with deconvolved RNA-seq expression data. H, Comparison of IHC and deconvolved RNA-seq expression for a gene (S100A6) with highly variable cancer/stroma expression across cancer types; error bars reflect SDs of the point estimates.
Validation of tumor transcriptome deconvolution. A, Inferred consensus tumor purity estimates for approximately 8,000 bulk tumor samples across 20 solid tumor types. B, Inferred expression of known compartment-specific genes in cancer and stromal compartments across cancer. C, Inferred cancer and stroma compartment expression levels for 280 previously reported stromal-specific genes. D, Genes specifically expressed in cancer and stromal cells were inferred for each tumor type. Correlation (Pearson r) between mRNA expression and somatic CNAs at each gene locus was evaluated (top). The fraction of genome altered by CNA was determined for each tumor type (bottom). E, Deconvolved cancer and stroma compartment expression levels in melanoma (SKCM) for cancer- and stroma-specific genes previously identified with melanoma scRNA-seq. F, Genes were ordered by inferred expression difference between cancer and stroma compartments in each tumor type, and GSEA was used to identify cancer- and stroma-enriched gene sets. G, Protein expression was inferred for cancer and stroma compartments in ovarian (OV) and breast (BRCA) cancer cohorts using iTRAQ protein quantification data and compared with deconvolved RNA-seq expression data. H, Comparison of IHC and deconvolved RNA-seq expression for a gene (S100A6) with highly variable cancer/stroma expression across cancer types; error bars reflect SDs of the point estimates.
Validation of tumor transcriptome deconvolution
We performed multiple analyses to evaluate the validity and accuracy of the tumor transcriptome deconvolution approach. First, known stromal (FAP, CD4, CSF1R) and epithelial-cell lineage factors (EPCAM) showed expected gene expression differences between stromal and cancer compartments across cancer types (Fig. 2B, Supplementary Figs. S2–S5, and S8). Similarly, we found that previously derived stromal and immune-cell–specific gene sets (12) were inferred to have markedly higher expression in the stroma compartment of all tumor types (Fig. 2C). Next, because somatic CNAs are hallmarks of cancer cell genomes, we reasoned that stroma-specific genes should not be affected by tumor CNAs. Indeed, inferred cancer-specific genes (top-100, ranked by cancer vs. stroma expression change in each tumor type), but not stroma-specific, showed strong positive correlation between expression and bulk tumor CNA levels (Fig. 2D). Furthermore, variation in overall CNA-expression correlation levels between tumor types could be explained by the overall prevalence of CNAs in a given tumor type.
We tested that the concordance of the deconvolved expression estimates with melanoma single-cell RNA-seq (scRNA-seq) profiling data (28). Indeed, deconvolved stroma and cancer-compartment expression was significantly higher for stroma and cancer cell–specific genes identified by the scRNA-seq study (P = 5e − 55 and P = 7.2e − 4, respectively, Mann–Whitney two-tailed, Fig. 2E). Next, using GSEA, we found consistent association of compartment-wise inferred gene expression and known hallmarks of cancer (e.g., cell cycle and DNA repair) and stromal cells (e.g., angiogenesis and immune response) across cancer types (Fig. 2F; Supplementary Fig. S9). We then evaluated the extent that the deconvolved mRNA profiles represent an accurate proxy for protein levels in cancer and stromal cells. We used the same approach to deconvolve protein abundance data from two TCGA tumor types (OV and BRCA; ref. 29), and found good concordance between the mRNA and protein expression profiles (Fig. 2G). Finally, to evaluate the concordance of the deconvolution approach with IHC data, we first identified genes with high variability in cancer/stromal cell differential expression across tumor types (Supplementary Fig. S10A). We then used IHC data from The Human Protein Atlas (26) to confirm that expression patterns of the two most highly expressed and variable genes were concordant across tumor types (S100A6 in Fig. 2H and LDHB in Supplementary Fig. S10B). Overall, these diverse comparisons, using orthogonal types of data, support that the deconvolution approach can robustly deconvolve gene expression profiles of cancer and stromal cell compartments across the 20 tested tumor types.
Inference and validation of aberrant LR expression in breast cancer
To further validate whether we could infer compartment-specific expression of ligands and receptors, we studied the well-defined molecular subtypes of breast cancer. The basal subtype is defined, in part, by lack of HER2 and estrogen receptor overexpression. Expectedly, we observed approximately 60-fold and approximately 250-fold lower cancer compartment expression of HER2 (ERBB2) and estrogen (ESR1) receptors in basal as compared with HER2+ and luminal subtypes, respectively (Supplementary Fig. S11). We then performed a systematic screen for differentially expressed ligands and receptors in basal tumors. This screen highlighted the antagonistic Wnt-pathway ligand, SFRP1, which had very high expression in cancer cells of basal tumors while being nearly absent in cancer cells of other breast cancer subtypes (Fig. 3A and C). Among receptors, the screen highlighted marked downregulation of the IL6 co-receptor, IL6ST (GP130), in basal cancer cells (Fig. 3B and C). To validate these compartment-specific differential expression patterns, we performed RNA-ISH using specific RNAScope probes generated against either SFRP1 or IL6ST in FFPE sections of basal and luminal-type breast tumors (see Materials and Methods, Fig. 3D). Cancer compartment SFRP1 and IL6ST expression in each subtype was quantified using two different quantification approaches yielding similar results (Fig. 3E; Supplementary Fig. S12). Consistent with the transcriptome deconvolution screen, we observed markedly higher expression of SFRP1, and lower expression of IL6ST, in cancer cells of basal tumors as compared with luminal tumors.
Validation of aberrant ligand and receptor expression in basal breast cancer. A and B, Differential cancer compartment expression of ligands (A) and receptors (B) in basal versus other molecular subtypes of breast cancer. C, Inferred expression of selected differentially expressed receptors and ligands in cancer and stromal compartment across breast cancer subtypes; expression in normal tissue (non-deconvolved) was included for comparison. D, RNA-ISH of SFRP1 and IL6ST in two basal and two luminal FFPE breast tumor samples (two slides per tumor). E, Quantification of SFRP1 and IL6ST expression (mean detection probability across cancer cell regions) in the tissue sections.
Validation of aberrant ligand and receptor expression in basal breast cancer. A and B, Differential cancer compartment expression of ligands (A) and receptors (B) in basal versus other molecular subtypes of breast cancer. C, Inferred expression of selected differentially expressed receptors and ligands in cancer and stromal compartment across breast cancer subtypes; expression in normal tissue (non-deconvolved) was included for comparison. D, RNA-ISH of SFRP1 and IL6ST in two basal and two luminal FFPE breast tumor samples (two slides per tumor). E, Quantification of SFRP1 and IL6ST expression (mean detection probability across cancer cell regions) in the tissue sections.
Pan-cancer compartment-specific expression of ligands and receptors
We then explored the compartment specificity of 263 ligands and 242 receptors (603 LR pairs) with detectable bulk tumor gene expression (>1 RPKM in >25% of samples) in at least half of the 20 cancer types. Expectedly, ligands belonging to the complement system (e.g., C1QB and C3) as well as leukocyte-specific chemokines (e.g., CCL5 and CCL21) had high stroma-specific expression across tumor types (Fig. 4A). Among receptors, immune (e.g., CD2 and CD4) and macrophage (CSF1R)-specific factors were expectedly among the top stromal-specific receptors across tumor types (Fig. 4B). Cancer-specific ligand expression across tumor types was much less frequent and pronounced. Interestingly, cancer-specific ligands included endothelial cell targeting factors (PODXL2 and VEGFA), consistent with the hypothesis that cancer cells may shape tumor vascularization (Fig. 4A; ref. 30). Receptors with cancer cell–specific expression across tumor types were also less common, but included known cancer-associated members of the EGF-family (ERBB2 and ERBB3), Wnt-family (LRP4, LRP5 and LRP6), and FGF-family (FGFR3; Fig. 4B).
Pan-cancer inference of cross-talk. A and B, Differential expression (median) between cancer and stroma compartments of ligands (A) and receptors (B) across cancer types. C, The relative cross-talk (RC) score approximates the relative flow of signaling in the four possible directions between cancer and stromal cell compartments, including a bulk (non-deconvolved) matched normal tissue reference. In the concrete example, a given ligand is expressed with 1, 6, and 2 copies (receptor in 3, 1, and 1 copies) in cancer, stroma, normal cells, respectively. This yields a stroma-to-cancer (S>C) RC score of 60%. D, Median RC score across the 20 solid tumor types was estimated and plotted for each direction of signaling. E and F, RC scores for the 5 LR pairs with highest median pan-cancer cancer-to-cancer and stroma-to-cancer signaling scores. G, Single-cell gene expression data for four genes inferred in the pan-cancer (ACVR2B and LRP6) and SKCM-specific (BMPR1B and SEMA6A) analysis, comparing expression for malignant (cancer) and nonmalignant (stroma) cells in melanoma tumors.
Pan-cancer inference of cross-talk. A and B, Differential expression (median) between cancer and stroma compartments of ligands (A) and receptors (B) across cancer types. C, The relative cross-talk (RC) score approximates the relative flow of signaling in the four possible directions between cancer and stromal cell compartments, including a bulk (non-deconvolved) matched normal tissue reference. In the concrete example, a given ligand is expressed with 1, 6, and 2 copies (receptor in 3, 1, and 1 copies) in cancer, stroma, normal cells, respectively. This yields a stroma-to-cancer (S>C) RC score of 60%. D, Median RC score across the 20 solid tumor types was estimated and plotted for each direction of signaling. E and F, RC scores for the 5 LR pairs with highest median pan-cancer cancer-to-cancer and stroma-to-cancer signaling scores. G, Single-cell gene expression data for four genes inferred in the pan-cancer (ACVR2B and LRP6) and SKCM-specific (BMPR1B and SEMA6A) analysis, comparing expression for malignant (cancer) and nonmalignant (stroma) cells in melanoma tumors.
Inference of LR interactions across tumor types
Next, we developed the RC score to quantify and differentiate between types of autocrine and paracrine (here denoting signaling within or between compartments, respectively) LR cross-talk (Fig. 4C). Briefly, the RC score approximates LR complex concentrations and relative flow (direction) of signaling within and between cancer and stromal cell compartments, including a bulk (non-deconvolved) matched normal tissue reference (see Materials and Methods). A pan-cancer analysis of these RC scores revealed a striking difference between the cancer and stromal compartment. Although only 3 LR pairs had strong autocrine cancer-to-cancer signaling scores (median RC score >40%) across tumor types, 264 LR pairs showed strong autocrine stroma-to-stroma signaling RC scores across cancer types (Fig. 4D). Interestingly, the paracrine signaling interface between cancer and stromal cell compartments also showed a substantial number of recurrent interactions (>25 LR pairs with RC >40% for both cancer-to-stroma and stroma-to-cancer, Fig. 4D).
The inferred recurrent cancer-to-cancer autocrine LR pairs included receptors such as FGFR8, LRP6, and MST1R (Fig. 4E). Interestingly, MST1R (RON) is a prognostic marker and candidate therapeutic target in many solid cancer types (31). Signaling through ACVR2B was notably both among the top inferred cancer-to-cancer and stroma-to-cancer signaling interactions across tumor types (Fig. 4E and F), highlighting a potential common role for ACVR-coupled TGF-β/SMAD signaling in tumorigenesis.
We next investigated whether these compartment-specific recurrent LR expression patterns could be validated using orthogonal scRNA-seq data. We focused on the 3 pan-cancer LR pairs with highest inferred cancer-to-cancer signaling in SKCM (BMP7 >ACVR2B, BMP8B >ACVR2B, WNT3 >LRP6, Fig. 4E). Among these pairs, the two receptors (ACVR2B and LRP6) were inferred to be overexpressed (∼4-fold higher expression), whereas ligands were inferred to have unaltered expression, in SKCM cancer versus stromal compartments. Confirming these data, both receptors were also significantly overexpressed in malignant versus nonmalignant cells in a melanoma scRNA-seq dataset (P < 1e−100, Wilcoxon rank-sum, Fig. 4G; ref. 28).
Convergence of autocrine cancer cell cross-talk across tumor types
The pan-cancer analysis indicated that cancer-cell–directed LR interactions tended to differ across tumor types. To uncover tumor-type–specific LR interactions, we performed a screen for LR interactions with extreme autocrine cancer–cancer RC scores within individual tumor types. Indeed, most (14/20) cancer types had multiple LR interactions with high cancer autocrine scores (RC score >60%, Fig. 5A).
Convergence of autocrine cancer cell cross-talk across tumor types. A, Distributions of RC (relative cross-talk) scores for cancer–cancer autocrine signaling across tumor types; the top two LR pairs are highlighted for each tumor type. B, Cancer–cancer RC scores for the top BMP/ACVR (involving any BMP-ligand), Ephrin (involving any Ephrin receptor), and FGF-signaling (involving any FGF receptor) LR pair in each cancer type.
Convergence of autocrine cancer cell cross-talk across tumor types. A, Distributions of RC (relative cross-talk) scores for cancer–cancer autocrine signaling across tumor types; the top two LR pairs are highlighted for each tumor type. B, Cancer–cancer RC scores for the top BMP/ACVR (involving any BMP-ligand), Ephrin (involving any Ephrin receptor), and FGF-signaling (involving any FGF receptor) LR pair in each cancer type.
We first sought to validate the top inferred SKCM cancer autocrine pairs with public scRNA-seq data. Among the top-2 LR pairs (BMP7>BMPR1B, SEMA6A>PLXNA2, Fig. 5A), the receptor BMPR1B and the ligand SEMA6A were inferred to be significantly overexpressed in deconvoluted cancer versus stromal cells (4- and 7-fold, respectively). Confirming these results, both genes were also significantly overexpressed in cancer cells in melanoma scRNA-seq data (P <1e−100, Fig. 4G). Next, in brain tumors (LGG and GBM), delta-notch signaling was highly specific to the cancer compartment (DLL1-NOTCH1, RC >80%, Fig. 5A). Both DLL1 and NOTCH1 were inferred to have >4-fold higher expression in cancer compared with stromal cells and normal brain tissue in both tumor types, and DLL1 cancer cell expression was approximately 16-fold higher than stroma and normal tissue in LGG (Supplementary Fig. S13). These data are consistent with previous studies showing that Notch autocrine/juxtacrine signaling is critical for glioma tumorigenesis (32, 33).
Interestingly, the top autocrine cancer–cancer LR interactions in each tumor type tended to converge on the same signaling pathways. 9/20 solid tumor types had top autocrine interactions converging on BMP-signaling pathways (BMP-ligands and ACVR-receptors, Fig. 5A). Furthermore, 19/20 tumor types (all except KIRC) had at least one cancer–cancer-specific interaction (RC>40%) involving BMP-ligands and ACVR2B/BMPR1B receptors. Eight tumor types had top cancer–cancer interactions involving different FGF ligands and receptors, with 14/20 tumor types having at least one cancer–cancer-specific FGF-signaling interaction (RC>40%, Fig. 5A and B). A similar pattern was observed for Eph/ephrin signaling interactions, with 6 tumor types having top interactions converging on Eph/ephrin signaling and 13/20 tumor types having at least one cancer–cancer-specific interaction (RC>40%, Fig. 5A and B).
Pan-cancer expression signatures of immune checkpoints
Immune checkpoints are modulators of tumor immune cell infiltration and antitumor activity. We first explored cancer and stromal compartment expression patterns of two common anticancer therapeutic targets, the PD-L1>PD-1 and CD86>CTLA4 immune checkpoint interactions. Expectedly, expression of the two receptors (PD-1 and CTLA4) was high in stroma and almost absent in cancer cells across all tumor types (Fig. 6A). Interestingly, stromal expression of the receptors was highest in melanoma (SKCM; Supplementary Fig. S14A), the solid tumor type where immune checkpoint blockade (ICB) was first clinically approved and where therapeutic blockade currently shows the most profound response rates (34).
Pan-cancer expression of immune checkpoints. A, Inferred cancer and stromal compartment gene expression across tumor types for immune checkpoint ligands (PD-1 and CTLA-4) and cognate ligands (PD-L1 and CD86, respectively). B, Median pan-cancer cancer-to-stroma and stroma-to-cancer RC cross-talk scores for 27 known immune checkpoint LR pairs, top three as well as PD-L1>PD-1 and CD86>CTLA4 highlighted. C, RC (relative cross-talk) scores for top three LR pairs with highest median pan-cancer cancer-to-stroma and stroma-to-cancer signaling scores. D, Correlation of immune cell-type fractions (median across tumor type) with inferred cancer and stroma expression of checkpoint ligands and receptors across tumor types. E, Highly correlated pairs of immune cell-types and checkpoint ligand/receptor expression.
Pan-cancer expression of immune checkpoints. A, Inferred cancer and stromal compartment gene expression across tumor types for immune checkpoint ligands (PD-1 and CTLA-4) and cognate ligands (PD-L1 and CD86, respectively). B, Median pan-cancer cancer-to-stroma and stroma-to-cancer RC cross-talk scores for 27 known immune checkpoint LR pairs, top three as well as PD-L1>PD-1 and CD86>CTLA4 highlighted. C, RC (relative cross-talk) scores for top three LR pairs with highest median pan-cancer cancer-to-stroma and stroma-to-cancer signaling scores. D, Correlation of immune cell-type fractions (median across tumor type) with inferred cancer and stroma expression of checkpoint ligands and receptors across tumor types. E, Highly correlated pairs of immune cell-types and checkpoint ligand/receptor expression.
It is currently unclear to what extent cancer and stromal cells express the cognate inhibitory checkpoint ligands across different types of tumors (CD274/PD-L1 and CD86; ref. 35). Strikingly, our expression deconvolution showed marked overexpression of both ligands in the stroma compared with the cancer compartment across all tumor types (Fig. 6A). However, tumor types reported to have high response to ICB (34), such as cervical (CESC), lung squamous (LUSC), and kidney cancers (KIRC/KIRP), were among the tumor types with highest PD-L1 expression in cancer cells (Fig. 6A; Supplementary Fig. S14B and S14C). In contrast, colorectal cancer, which has one of the poorest reported response rates to ICB, had the lowest (near-zero) inferred cancer cell expression of PD-L1 as compared with other tumor types (Fig. 6A; Supplementary Fig. S14D).
Next, we expanded the analysis to a larger set of 27 immune checkpoint LR interactions linked to antitumor activity (see Materials and Methods). To discover and characterize checkpoints involving potential cross-talk between cancer and immune cells, we computed the median cancer-to-stroma and stroma-to-cancer cross-talk scores for each checkpoint LR pair across tumor types. This analysis again highlighted low cancer–stroma cross-talk scores for the PD-L1>PD1 (CD274>PDCD1) and CD86>CTLA4 checkpoints across cancer types (median cancer-to-stroma RC score <6%, Fig. 6B). However, our analysis revealed two LR pairs, NCR3LG1>NCR3 and ICOSLG>ICOS, with very high cancer-to-stroma RC scores (62.5% and 31.7%, respectively, Fig. 6B). NCR3LG1>NCR3, an NK cell–activating checkpoint, had high (>50%) cancer-to-stroma score across most cancer types, with very high scores (>95%) in 4 tumor types (BRCA, CESC, HNSC, and KIRC, Fig. 6C). ICOSLG>ICOS, a T-cell–activating checkpoint, had moderate (25%–50%) cancer-to-stroma scores across most tumor types, except in brain and melanoma tumors where most signaling was inferred to be confined to the stroma compartment (Fig. 6C).
Inferred cross-talk from immune to cancer cells expectedly highlighted the interferon-gamma (IFNG) pathway. LR interactions involving IFNG and the two cognate receptors (IFNGR1 and IFNGR2) had moderate (25%–50%) but consistent stroma-to-cancer scores across all tumor types (Fig. 6B and C). Next, BTLA>TNFRSF14 (HVEM), a T-cell inhibiting checkpoint, showed moderate stroma-to-cancer interaction scores across several tumor types. Pancreatic, prostate, and kidney tumors had high (∼50%) stroma-to-cancer interaction scores (Fig. 6C).
Association of immune cell types and checkpoint expression
Finally, we explored potential associations between tumor immune cell types and checkpoint ligand/receptor expression. We obtained immune cell-type frequencies estimated across all TCGA samples (25), and correlated these frequencies (median per tumor type) with inferred cancer and stroma expression levels of PD1/CTLA4 ligands and receptors across cancer types (Fig. 6D). This analysis revealed multiple strong associations for stromal expression of CTLA4 and PDCD1 receptor expression. Expectedly, stromal PDCD1 expression correlated strongly with levels of cytotoxic CD8 T cells across tumor types (Pearson r = 0.76, P = 9.0e−5). Interestingly, both stromal CTLA4 and PDCD1 expression correlated with M1/M2 macrophage polarization. This was especially pronounced for CTLA4, which showed strong positive correlation with M0/M1 levels (r = 0.67, P = 1e−3, Fig. 6E), and negative correlation with M2 levels (r = −0.7, P = 5e−4).
Strong associations between immune cell type levels and expression of CD86/CD274 ligands were less common. Although CD274 cancer expression showed poor correlation with all types of immune cells, stromal CD274 expression was inversely correlated with CD4+ memory cell levels (r = −0.64, P = 3e−3, Fig. 6E). Interestingly, we observed a strong correlation between CD86 cancer expression and neutrophil levels (r = −0.71, P = 4e−4, Fig. 6E). This signal was driven by high CD86 cancer expression and neutrophil levels in brain (GBM/LGG), lung (LUAD/LUSC), and kidney (KIRC) tumors, and the association was still significant after removing the GBM tumor type with very high CD86 expression and neutrophil levels (without GBM: r = −0.60, P = 0.007).
Discussion
Tumors are heterogenous mixtures of cancer cells and infiltrating noncancer (stromal) cells. Here, we exploit that tumor composition (purity) vary across tumor samples, enabling tumor transcriptome deconvolution and inference of cancer and stromal compartment ligand and receptor expression profiles. We show that this approach recovers known hallmarks of cancer and stromal cell transcriptomes and is concordant with orthogonal single-cell transcriptome, IHC, and RNA-ISH data. By analyzing approximately 8,000 tumor samples across 20 solid tumor types, we inferred putative LR cross-talk within and between the cancer and stromal compartments.
Many LR pairs had stroma-specific expression across all tumor types and tissues, highlighting the unique and conserved modes of cross-talk underlying stromal and immune cells in all solid tumors. In contrast, autocrine cancer-to-cancer cross-talk tended to be tumor-type–specific and likely determined by the cell-of-origin. However, independent of tumor type, the top cancer–cancer autocrine pairs often converged on BMP, Ephrin, and FGFR-family signaling pathways. The importance of these pathways in tumorigenesis is well established (1, 36, 37), underscoring the validity and utility of our approach. Moreover, the top cancer cell–specific interactions in individual tumor types highlight specific putative cancer cell signaling dependencies, such as BMP2>ACVR2B in gliomas and EFNA3>EPHB1 in endometrial carcinomas. Further functional studies are needed to validate these interactions and test whether they represent therapeutic vulnerabilities. Encouragingly, LR pairs with strong cancer–cancer autocrine activity in many tumor types included MST1>MST1R (RON). RON acts upstream of the RAS–ERK and PI3K–AKT signaling pathways and is a prognostic marker and candidate therapeutic target in many solid cancer types (31).
We further explored how this approach could nominate differences in LR cross-talk between molecular or clinical subtypes of a given tumor type. We recovered the expected subtype-specific expression patterns of HER2 and ESR1 in cancer cells across breast cancer subtypes. By inferring cross-talk specific to basal breast cancers, we inferred marked overexpression of the Wnt inhibitory ligand, SFRP1, in cancer cells of basal tumors. Downregulation of SFRP1 has been linked to epithelial-to-mesenchymal transition in breast cancer cells (38). Interestingly, our analysis also inferred strong upregulation of IL6 signaling through GP130 (IL6ST) in cancer cells of non-basal tumors. This observation is consistent with previous studies highlighting a functional role for RET-IL6 cross-talk in ER-positive breast tumors (39). Importantly, both SFRP1 and IL6ST compartment and subtype-specific expression patterns could be validated using RNA-ISH in breast cancer FFPE tissue sections.
We also explored pan-cancer expression patterns, and potential cancer-immune cross-talk, of immune checkpoints implicated in antitumor immunity. Deconvolution analysis uncovered moderate expression of PD-L1 and CD86 ligands, two common therapeutic targets, in cancer cells of many tumor types, consistent with the hypothesis that cancer cells can express these ligands to attenuate T-cell responses (40). However, all tumor types showed even higher expression of these two checkpoint ligands in the stromal compartment, suggesting that the bulk of immune checkpoint inhibitory signals may be mediated by noncancer cells in the TME. Interestingly, the tumor types with highest (e.g., lung squamous and kidney cancers) and lowest (e.g., colorectal cancer) inferred PD-L1 cancer cell expression have markedly different response rates to PD-L1/PD-1 checkpoint blockade in clinical trials (34). In contrast, melanoma tumors, with very high checkpoint inhibition response rates, did not follow this pattern and displayed near-zero PD-L1 cancer cell expression. Overall, these results suggest both potential similarities and differences in correlates of ICB response. Furthermore, we found that stromal CTLA4 and PDCD1 expression correlated with M1/M2 macrophage polarization, where tumor types containing low levels of M1, and high levels of M2 macrophages, also showed low stromal CTLA4/PDCD1 expression. This observation is consistent with the finding that M2 macrophages generally promote cancer progression and facilitate immune suppression (41).
A broader analysis of 27 known immune checkpoints across tumor types highlighted multiple checkpoint interactions with expression signatures indicative of cancer-immune cross-talk. The two checkpoints, NCR3LG1>NCR3 and ICOSLG>ICOS, showed very high cancer-to-stroma cross-talk scores across tumor types. These two checkpoints are known activators of NK and T-cell responses, respectively (42). However, soluble NCR3LG1 (B7-H6) in the TME has been reported to sequester NCR3 receptors and inhibit NK-mediated antitumor activity (42). The high cancer specificity of NCR3LG1 expression observed in several tumor types, such as breast and kidney cancers, therefore, highlight a putative therapeutic opportunity to inhibit proteolytic shedding of NCR3LG1 and enhance NCR3-mediated activation of NK cells. In the reverse direction, BLTA>TNFRSF14 showed high stroma-to-cancer LR interaction scores across several tumor types. Interactions between BTLA ligands on immune cell and TNFRSF14, also known as herpesvirus entry mediator (HVEM), trigger inhibition of T-cell immune responses (43). We identified multiple tumor types, such as pancreatic and kidney tumors, with strong cancer cell–specific expression of TNFRSF14, highlighting potential cancer cell–induced immune inhibition via BTLA in these malignancies.
Overall, our study provides insights into the complex interactions shaping the TME. We show how tumor transcriptome deconvolution can nominate putative, testable cross-talk interactions underlying different tumor types and subtypes. Downstream experiments are needed to validate and explore the function of such candidate interactions. The advent of single-cell or spatial transcriptomic technologies (44, 45) may provide an orthogonal approach to study cross-talk in the TME. However, our approach is especially useful in settings where bulk tumor biopsy data are either already abundant or the only feasible data source, which is the case in many clinical settings. We anticipate that our data and results could complement existing biomarker and target discovery approaches by providing computationally purified LR profiles of cancer cells as they exist inside human tumors.
Supplementary Data
All code used to generate the figures and statistics of the paper is included in a CodeOcean Data Capsule (https://codeocean.com/capsule/0965978). RC scores and other summary metrics for all analysed ligand-receptor pairs are summarized in supplementary table “RCScores”.
Authors' Disclosures
No disclosures were reported.
Authors' Contributions
U. Ghoshdastider: Conceptualization, formal analysis, validation, investigation, visualization, methodology, writing–original draft, writing–review and editing. N. Rohatgi: Conceptualization, formal analysis, investigation, visualization, methodology, writing–review and editing. M. Mojtabavi Naeini: Investigation. P. Baruah: Software, investigation, visualization, methodology. E. Revkov: Investigation. Y.A. Guo: Investigation, visualization, methodology. S. Rizzetto: Investigation, visualization, methodology. A.M.L. Wong: Investigation, RNAscope analysis. S. Solai: Investigation, methodology. T.T. Nguyen: Investigation, visualization, methodology. J.P.S. Yeong: Investigation, methodology, acquired and annotated FFPE samples. J. Iqbal: Investigation, methodology, acquired and annotated FFPE samples. P.H. Tan: Investigation, methodology, acquired and annotated FFPE samples. B. Chowbay: Investigation, methodology, acquired and annotated FFPE samples. R. Dasgupta: Methodology, RNAscope analysis. A.J. Skanderup: Conceptualization, data curation, formal analysis, supervision, funding acquisition, writing–original draft, writing–review and editing.
Acknowledgments
This research is supported by the Singapore Ministry of Health's National Medical Research Council under its OF-IRG program (OFIRG18may-0075) and Agency for Science, Technology and Research (A*STAR) under its CDAP program (grant no. 1727600057).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.