Tumor microenvironment (TME) cells constitute a vital element of tumor tissue. Increasing evidence has elucidated their clinicopathologic significance in predicting outcomes and therapeutic efficacy. Nonetheless, no studies have reported a systematic analysis of cellular interactions in the TME. In this study, we comprehensively estimated the TME infiltration patterns of 1,524 gastric cancer patients and systematically correlated the TME phenotypes with genomic characteristics and clinicopathologic features of gastric cancer using two proposed computational algorithms. Three TME phenotypes were defined, and the TMEscore was constructed using principal component analysis algorithms. The high TMEscore subtype was characterized by immune activation and response to virus and IFNγ. Activation of transforming growth factor β, epithelial–mesenchymal transition, and angiogenesis pathways were observed in the low TMEscore subtype, which are considered T-cell suppressive and may be responsible for significantly worse prognosis in gastric cancer [hazard ratio (HR), 0.42; 95% confidence interval (CI), 0.33–0.54; P < 0.001]. Multivariate analysis revealed that the TMEscore was an independent prognostic biomarker, and its value in predicting immunotherapeutic outcomes was also confirmed (IMvigor210 cohort: HR, 0.63; 95% CI, 0.46–0.89; P = 0.008; GSE78220 cohort: HR, 0.25; 95% CI, 0.07–0.89; P = 0.021). Depicting a comprehensive landscape of the TME characteristics of gastric cancer may, therefore, help to interpret the responses of gastric tumors to immunotherapies and provide new strategies for the treatment of cancers.
Genomic analysis has been the primary methodology used in international efforts to discover novel biological targets in gastric cancer (1, 2), although this method has not led to the successful discovery of distinct mechanisms. However, some studies have revealed the significance of tumor-related structures, as well as upregulated signaling pathways in both cancer cells and the tumor microenvironment (TME; refs. 3, 4), suggesting that intercellular relationships are more important than genomic factors at the single-cell level (5, 6). An increasing body of literature suggests a crucial role for the TME in cancer progression and therapeutic responses (7, 8). For example, differences in the compositions of resident cell types within the TME, including cytotoxic T cells, helper T cells, dendritic cells (DCs), tumor-associated macrophages, mesenchymal stem cells, and associated inflammatory pathways, have been reported in patients with cancer (5, 6, 9, 10). The TME context determined at diagnosis reflects the immune response (11) and chemotherapy benefit (8), and changes in the numbers of CD8+ T cells, CD4+ T cells, macrophages, and cancer-associated fibroblasts infiltrating in the TME correlate with clinical outcomes in various malignancies, including gastric cancer, melanoma, urothelial cancer, lung cancer, and breast cancer (10, 12–14).
Because gastric cancers are significantly associated with infectious agents, most notably Helicobacter pylori and Epstein–Barr virus (EBV), biomarkers that can predict responsiveness to immune-checkpoint blockade are being extensively investigated to further improve precision immunotherapy (15). The abundance of immune cells and other cells in the TME can be estimated using computational methods (16–18). Although several studies using these methodologies have explored the clinical utility of TME infiltrates (7, 19), and although several mechanisms associated with the role of TME in immunotherapy response and resistance have been experimentally identified for some tumor types (4, 13, 14, 20, 21), to date, the comprehensive landscape of cells infiltrating the TME has not yet been elucidated.
In the present study, two proposed computational algorithms (16, 17) were used to estimate the fractions of 22 immune cell types and cancer-associated fibroblasts based on clinically annotated gastric cancer gene-expression profiles (1, 22). We estimated the TME infiltration patterns of 1,524 tumors from patients with gastric cancer, and systematically correlated the TME phenotypes with genomic characteristics and clinical and pathologic features of gastric cancer. As a result, we established a methodology to quantify the TME infiltration pattern (TMEscore). The TMEscore was found to be a robust prognostic biomarker and predictive factor for response to immune-checkpoint inhibitors.
Materials and Methods
Gastric cancer data sets and preprocessing
We systematically searched for gastric cancer gene-expression data sets that were publicly available and reported full clinical annotations. Patients without survival information were removed from further evaluation. In total, we gathered seven treatment-naïve cohorts of samples from patients with gastric cancer for this study: ACRG/GSE62254, GSE57303, GSE84437, GSE15459, GSE26253, GSE29272, and TCGA-STAD. The raw data from the microarray data sets generated by Affymetrix and Illumina were downloaded from the Gene-Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/). The raw data for the data set from Affymetrix were processed using the RMA algorithm for background adjustment in the Affy software package (23). RMA was used to perform background adjustment, quantile normalization, and final summarization of oligonucleotides per transcript using the median polish algorithm. The raw data from Illumina were processed using the lumi software package.
The procedure used for data set selection in the GEO database was as follows. The following search parameters were used: (((survival OR prognosis OR prognostic OR outcome OR death OR relapse OR recurrence))) AND ((gastric cancer[MeSH Terms]) OR ((((((((((gastric cancer[Title]) OR gastric adenocarcinoma[Title]) OR gastric neoplasm[Title]) OR gastric tumor[Title]) OR gastric carcinoma[Title]) OR stomach cancer[Title]) OR stomach adenocarcinoma[Title]) OR stomach neoplasm[Title]) OR stomach tumor[Title]) OR stomach carcinoma[Title])). In the initial search, 656 items were recognized, and only the first 100 were independent chip series. Among the 100 series, 32 contained mRNA expression profiles of cancer tissues from patients with gastric cancer. Two additional series were subsequently identified from the subseries list of the corresponding super series, and one additional series was identified from the related literature. Among the 35 items, three were repeated; nine items included fewer than 40 patients; and four were derived from high-throughput sequencing data. Among the remaining, we only obtained survival data for the following six items: GSE62254/ACRG, GSE15459, GSE29272, GSE84437, GSE26253, and GSE57303. The patients related to these six data sets were included for further analysis.
Level 4 gene-expression data (FPKM normalized) of The Cancer Genome Atlas (TCGA) were downloaded from the UCSC Xena browser (GDC hub: https://gdc.xenahubs.net). For TCGA data set, RNA-sequencing data (FPKM values) were transformed into transcripts per kilobase million (TPM) values, which are more similar to those resulting from microarrays and more comparable between samples (24). Platforms, numbers of samples, baseline information, and clinical end points of each eligible GC data set are summarized in Supplementary Table S1. Data were analyzed with the R (version 3.4.0) and R Bioconductor packages.
Collection of clinical and genome-related data
The corresponding clinical data from these data sets were retrieved and manually organized when available. For some series, clinical data not attached to gene-expression profiles were obtained through one of the following three methods: (i) directly downloaded from the corresponding item page in the GEO data set website, (ii) from the supplementary materials in the relative literature, and (iii) using the GEOquery package in R. Corresponding authors were contacted for further information where necessary. Updated clinical data and sample information for TCGA-STAD samples were obtained from the Genomic Data Commons (https://portal.gdc.cancer.gov/) using the R package TCGAbiolinks (25). Overall survival information of all TCGA data sets was obtained from the supplementary data of published research (26). Somatic mutation data (SNPs and small INDELs, MuTect2 Variant Aggregation and Masking) for STAD patients were obtained (https://gdc.xenahubs.net/download/TCGA-STAD/Xena_Matrices/TCGA-STAD.mutect2_snv.tsv.gz). The numbers of predicted neoepitopes based on tumor-specific HLA typing, total mutations, and CYT signature score for each patient were obtained for 263 STAD samples from the supplementary table of Rooney and colleagues (27). Somatic copy-number alterations, immune signature scores, and cell-cycle signature scores were obtained for 269 STAD samples from the supplementary table of Davoli and colleagues (28), which can be accessed by the link: http://science.sciencemag.org/highwire/filestream/689461/field_highwire_adjunct_files/7/aaf8399-Davoli-SM-table-S7.xlsx.
Inference of infiltrating cells in the TME
To quantify the proportions of immune cells in the gastric cancer samples, we used the CIBERSORT algorithm (16) and the LM22 gene signature, which allows for sensitive and specific discrimination of 22 human immune cell phenotypes, including B cells, T cells, natural killer cells, macrophages, DCs, and myeloid subsets. CIBERSORT is a deconvolution algorithm that uses a set of reference gene-expression values (a signature with 547 genes) considered a minimal representation for each cell type and, based on those values, infers cell type proportions in data from bulk tumor samples with mixed cell types using support vector regression. Gene-expression profiles were prepared using standard annotation files, and data were uploaded to the CIBERSORT web portal (http://cibersort.stanford.edu/), with the algorithm run using the LM22 signature and 1,000 permutations. Proportions of stromal cells were estimated by applying the Microenvironment Cell Populations-counter method, which allows for robust quantification of the absolute abundance of eight immune and two stromal cell populations in heterogeneous tissues from transcriptomic data (17).
Consensus clustering for TME-infiltrating cells
Tumors with qualitatively different TME cell infiltration patterns were grouped using hierarchical agglomerative clustering (based on Euclidean distance and Ward's linkage). Unsupervised clustering methods (K-means; ref. 29) for data set analysis were used to identify TME patterns and classify patients for further analysis. A consensus clustering algorithm was applied to determine the number of clusters in the meta-data set and Asian Cancer Research Group (ACRG) cohort to assess the stability of the discovered clusters. This procedure was performed using the ConsensuClusterPlus R package (30) and was repeated 1,000 times to ensure the stability of classification.
Differentially expressed genes (DEG) associated with the TME phenotype
To identify genes associated with TME cell–infiltrating patterns, we grouped patients into TMEclusters based on immune-cell infiltration. DEGs among these groups were determined using the R package limma (31), which implements an empirical Bayesian approach to estimate gene-expression changes using moderated t tests. DEGs among TME subtypes were determined by significance criteria (adjusted P value < 0.05) as implemented in the R package limma. The adjusted P value for multiple testing was calculated using the Benjamini–Hochberg correction (32).
Dimension reduction and generation of TME gene signatures
The construction of TME metagenes was performed as follows. First, each DEG among TMEclusters was standardized across all samples in the ACRG cohort. An unsupervised clustering method (K-means; ref. 29) for analysis of DEGs was used to classify patients into several groups for further analysis. Then, the random forest classification algorithm was used to perform dimension reduction in order to reduce noise or redundant genes (33). Next, the clusterProfiler R package (34) was adopted to annotate gene patterns. A consensus clustering algorithm (30) was applied to define the cluster of genes. For gene-expression (normalized by RMA or TPM methods) analysis, the expression of each gene in a signature was first transformed into a z-score. Then, a principal component analysis (PCA) was performed, and principal component 1 was extracted to serve as the signature score. This approach has the advantage of focusing the score on the set with the largest block of well-correlated (or anticorrelated) genes in the set, while downweighting contributions from genes that do not track with other set members. After obtaining the prognostic value of each gene signature score, we applied a method similar to GGI (35) to define the TMEscore of each patient:
where i is the signature score of clusters whose Cox coefficient is positive, and j is the expression of genes whose Cox coefficient is negative.
Functional and pathway enrichment analysis
Gene annotation enrichment analysis using the clusterProfiler R package (34) was performed on TME signature genes. Gene Ontology (GO) terms were identified with a strict cutoff of P < 0.01 and false discovery rate (FDR) of less than 0.05. We also identified pathways that were up- and downregulated among TME gene clusters A and C for a certain TME phenotype by running a gene set enrichment analysis (GSEA; ref. 36) of the adjusted expression data for all transcripts. Gene sets were downloaded from the MSigDB database of the Broad Institute (36), and HALLMARK gene sets were selected to perform quantification of pathway activity. Enrichment P values were based on 10,000 permutations and subsequently adjusted for multiple testing using the Benjamini–Hochberg procedure to control the FDR (32). A developing R package enrichplot (https://github.com/GuangchuangYu/enrichplot), implements several visualization methods to help interpreting enrichment results and was adopted to visualize GSEA result of TME gene clusters. To explore the correlation between the TME signature and other relevant biological processes, we used gene sets curated by Mariathasan and colleagues (13), including (i) CD8 T-effector signature (11); (ii) antigen processing machinery (37); (iii) immune-checkpoint; (iv) epithelial–mesenchymal transition (EMT) markers previously reported (38); (v) pan-fibroblast TGFβ response signature (Pan-F-TBRS; ref. 13); (vi) DNA replication–dependent histones (13); (vii) select members of the DDR-relevant gene set (39); (viii) Angiogenesis signature previously reported (40); (ix) cell-cycle genes (KEGG); (x) WNT targets (41); (xi) cell-cycle regulators (42); (xii) mismatch repair (KEGG); (xiii) nucleotide excision repair (KEGG); (xiv) homologous recombination (KEGG).
Genomic and clinical data sets with immune-checkpoint blockade
Five genomic and transcriptomic data sets from patients with metastatic urothelial cancer (13) treated with an anti–PD-L1 agent (atezolizumab), patients with metastatic melanoma (43) treated with anti–PD-1 (pembrolizumab), patients with advanced melanoma treated with various types of immunotherapies from the TCGA-SKCM cohort (44), patients with advanced melanoma treated with MAGE-3 antigen–based immunotherapy (45), and a mouse model treated with anti–CTLA-4 (46) were downloaded and analyzed to determine the predictive value of the TMEscore.
For the urothelial cancer data set, a fully documented software and data package is freely available under the Creative Commons 3.0 license and can be downloaded from http://research-pub.gene.com/IMvigor210CoreBiologies. After quality control using the R package arrayQualityMetrics, count data were normalized using the trimmed mean of M-values and transformed with voom to log2-counts per million with associated precision weights (31). For the melanoma data set (GSE78220, N = 28), expression profiles (FPKM normalized) and phenotypes have been deposited into the GEO under the accession code GSE78220. The expression profiles (FPKM normalized) of GSE78220 were transformed into TPM, converting FPKM data to values more comparable between samples (24). For the TCGA-SKCM cohort, the expression profiles (FPKM normalized) downloaded from the UCSC Xena browser were transformed into TPM, which was used to calculate TMEscore. For the melanoma cohort (GSE35640, N = 55) treated with MAGE-3 antigen-based immunotherapy, the raw data were downloaded and processed using the RMA algorithm for background adjustment using the Affy package (23). For the mouse model treated with CTLA-4 blockade (accession number GSE63557, N = 20), the normalized data were obtained from GEO and annotated with the GPL19103 profile.
The normality of the variables was tested by the Shapiro—Wilk normality test (47). For comparisons of two groups, statistical significance for normally distributed variables was estimated by unpaired Student t tests, and nonnormally distributed variables were analyzed by Mann–Whitney U tests (also called the Wilcoxon rank-sum test). For comparisons of more than two groups, Kruskal–Wallis tests and one-way analysis of variance were used as nonparametric and parametric methods, respectively (48). Correlation coefficients were computed by Spearman and distance correlation analyses. Two-sided Fisher exact tests were used to analyze contingency tables. The cutoff values of each data set were evaluated based on the association between patient overall survival and TMEscore in each separate data set using the survminer package. The R package MaxStat (49), which iteratively tests all possible cut points to find the one achieving the maximum rank statistic, was used to dichotomize TMEscore, and patients were then grouped into low and high TMEscore subtype in each data set to reduce computational batch effect. R package forestplot was used for presentation of the results of subgroup analysis of TMEscore in gastric cancer data sets and TCGA pan cancer data sets. To identify significant genes in the differential gene analysis, we applied the Benjamini–Hochberg method to convert the P values to FDRs (32). The Kaplan–Meier method was used to generate survival curves for the subgroups in each data set, and the log-rank (Mantel–Cox) test was used to determine the statistical significance of differences. The hazard ratios for univariate analyses were calculated using a univariate Cox proportional hazards regression model. A multivariate Cox regression model was used to determine independent prognostic factors using the survminer package. The package pROC (50) was used to plot and visualize receiver operating characteristic (ROC) curves to calculate the area under the curve (AUC) and confidence intervals to evaluate the diagnostic accuracy of tumor mutational burden (TMB), TMEscore, and the combination of them. For comparison of AUCs, likelihood ratio test for two correlated ROC curves was used. R package ggtree (51) was used to visualize phylogenetic trees of the TME signature genes. All heat maps were generated by the function of pheatmap (https://github.com/raivokolde/pheatmap). OncoPrint used to depict mutation landscape of TCGA-STAD cohort was constructed by ComplexHeatmap R package (52). All statistical analyses were conducted using R (https://www.r-project.org/) or SPSS software (version 25.0), and the P values were two-sided. P values of less than 0.05 were considered statistically significant.
Landscape of gastric cancer TME
The construction scheme of TME cell-infiltrating patterns and TME signatures was systematically evaluated (Supplementary Fig. S1A). To select the optimal cluster number, we assessed clustering stability using the ConsensusClusterPlus package (Supplementary Fig. S1B; ref. 30), which supported the existence of three robust subtypes of gastric cancer in a meta-cohort (GSE57303, GSE34942, GSE84437, ACRG/GSE62254, GSE15459, GSE29272, and TCGA-STAD). Unsupervised hierarchical clustering of the 1,524 tumors with matched TME cell expression profiles from the above independent gastric cancer cohorts was performed, and the results are shown in Supplementary Fig. S1C and Supplementary Table S2. The TME cell network depicted a comprehensive landscape of tumor–immune cell interactions, cell lineages, and their effects on the overall survival of patients with gastric cancer (Fig. 1A; Supplementary Tables S3–S4). Three main TME cell infiltration subtypes revealed by the data showed significant differences in survival (log-rank test, P < 0.001; Fig. 1B).
To further characterize and understand the biological and clinical differences among these intrinsic phenotypes, we focused on the ACRG cohort (containing 299 patients with gastric cancer), not merely because it contained the most patients and provided the most comprehensive patient information (Supplementary Table S5), but also because the CIBERSORT algorithm was more suitable to deconvolve microarray data from the Affymetrix platform. Cluster analysis revealed three distinct patterns of TME cell infiltration as all gastric cancer data sets exhibited (Supplementary Fig. S2A–S2D): TMEcluster-A was characterized by increases in the infiltration of cancer-associated fibroblasts, M2 macrophages, resting DCs, and resting mast cells (MC; refs. 53–56) and exhibited variable decreases in other TME cell types; TMEcluster-B exhibited high infiltration of M0 macrophages, neutrophils, activated DCs, and activated MCs; and TMEcluster-C showed significant increases in the infiltration of CD8+ T cells, M1 macrophages, and activated memory CD4+ T cells (refs. 53, 54, 57; Fig. 1C). The significant differences in TME cell infiltration in the three main TME phenotypes were confirmed with Kruskal–Wallis tests (Supplementary Fig. S2E; results of pairwise comparison were summarized in Supplementary Table S6).
In terms of clinical characteristics, TMEcluster-A was associated with a higher “Immunoscore” (Kruskal–Wallis, P < 2.2 × 10−16; Supplementary Fig. S2E; Supplementary Table S6), which we established based on a lasso immune signature score model in a previous study (7) to predict survival outcomes in patients with gastric cancer. We also observed that samples in TMEcluster-A exhibited poorer tumor differentiation and were enriched in the EMT molecular subtype. The opposite patterns were observed in TMEcluster-C (Fig. 1C). Survival analysis based on the TME phenotype showed TMEcluster-A (83 patients) to be significantly associated with poorer prognosis and TMEcluster-C (119 patients) to be associated with better prognosis (log-rank test, P < 0.001). Of the 299 patients with gastric cancer, 97 belonged to TMEcluster-B, which was characterized by an intermediate prognosis (log-rank test, P < 0.001; Fig. 1D).
Construction of the TME signature and functional annotation
To identify the underlying biological characteristics of each TME phenotype, unsupervised analysis of 1,033 DEGs acquired by the limma package (58) was used to classify patients into genomic subtypes, which was significantly consistent with the clustering results of the TME phenotype groups (χ2 contingency tests, P < 2.2 × 10−16). The matching rate of the TME cell clusters and TME gene clusters was 80.5%, 61.5%, and 40% for TME gene cluster C, TME gene cluster A, and TME gene cluster B, respectively (Supplementary Fig. S2F–S2G; Supplementary Table S7). Next, we sought to use random forest algorithms to perform dimension reduction to extract the phenotype signatures. The unsupervised hierarchical cluster analysis was based on the expression of the 238 most representative DEGs (Supplementary Table S8) and separated the ACRG cohort population into three distant patient clusters, termed gene clusters A–C (Fig. 2A). We visualized changes in clusters using an alluvial diagram (Supplementary Fig. S3A). Analysis also revealed two significant expression gene sets (Supplementary Fig. S3B; Supplementary Table S7).
GO enrichment analysis of the signature genes was conducted using the R package clusterProfiler. Significantly enriched biological processes are summarized in Supplementary Table S9. Gene clusters A and C showed enrichment of distinct biological processes. Overexpression of genes involved in immune activation, which were enriched in gene cluster C, correlated with good prognosis in gastric cancer, and upregulated stroma-related genes, which were enriched in gene cluster A, were associated with poorer prognosis (log-rank test, P < 0.001; refs. 10, 20, 21; Fig. 2B–D; Supplementary Table S9). The clusterProfiler R package was used to discover the potential regulatory relationships among the TME signature mRNAs in gastric cancer, and these results suggested that the pathways involved in the EMT and immune activation exhibited a significant amount of overlap with other pathways (Supplementary Fig. S3C). Figure 2E indicates that the significant differences in TME cell infiltration and the “Immunoscore” in the three TME gene clusters were consistent with the outcomes of TME cell-infiltrating patterns (Supplementary Fig. S2E), as expected. Robust correlations between TME signature scores and TME cell-infiltrating patterns were also validated in the GSE15459 and TCGA-STAD data sets (Supplementary Fig. S4A–S4H).
Transcriptome traits and clinical characteristics of TME phenotypes
Next, we defined two aggregate scores using the PCA algorithm: TMEscore A from TME signature genes A and TMEscore B from TME signature genes B (Fig. 2A; Supplementary Table S9). We computed TMEscore A and TMEscore B for each sample in the study as the sum of the relevant individual scores. To this end, we obtained the prognostic signature score, which we termed the TMEscore. In order to analyze the cytokine and chemokine milieu characterizing each gene cluster (Fig. 2A), we analyzed the expression of selected cytokine and chemokine mRNAs in the 299 gastric cancer samples. We considered CXCL10, CXCL9, GZMA, GZMB, PRF1, CD8A, IFNG, TBX2, and TNF to be immune-activated–related transcripts; IDO1, CD274, HAVCR2, PDCD1, CTLA4, LAG3, and PDCD1LG2 to be immune-checkpoint–relevant transcripts; and VIM, ACTA2, COL4A1, TGFBR2, ZEB1, CLDN3, SMAD9, TWIST1, and TGRB1 to be transforming growth factor (TGF)β/EMT pathway–relevant transcripts. Gene cluster A was associated with high expression of TGFβ/EMT pathway–relevant mRNAs, whereas expression of Th1/cytotoxic T lymphocyte (CTL)–related mRNAs (ref. 59; particularly CXCL10, CXCL9, IFNG, and GZMB) was relatively low (Supplementary Fig. S5A–S5C, results of pairwise comparison are summarized in Supplementary Table S6). This suggested that this cluster may be classified as the immune-suppressive group. In contrast, gene cluster C, exhibiting the opposite mRNA expression profile, was classified as the immune-activated group. We also tested known signatures within the gastric cancer data set to better describe the functionality of the TME signature genes (Fig. 3A; Supplementary Fig. S5D; Supplementary Table S10). These analyses confirmed that TMEscore A was significantly associated with immune-relevant signatures, whereas TMEscore B was associated with stromal-relevant signatures (Fig. 3A).
Consistent with these findings, gene cluster A with the EMT subtype (ACRG molecular subtypes; ref. 22) was linked to a low TMEscore (Fig. 3B, Kruskal–Wallis, P < 2.2 × 10−16; Fig. 3C) and was associated with a poorer outcome (Fig. 3B and D). Using GSEA with all transcripts ranked by the log2 (fold change) between clusters A and C, we found gene sets that were considered T-cell suppressive and exclusive (60–62) in TME gene cluster A (Fig. 3E and F; Supplementary Table S11), including gene sets related to the EMT, TGFβ signaling, angiogenesis, and hypoxia.
After having identified the TMEscore as an intrinsic gene-expression signature closely linked to the stromal activation program and immune activation process, we sought to determine whether the TMEscore could accurately predict outcomes. The 299 patients in the ACRG cohort were, therefore, assigned to groups based on high or low TMEscores using the cutoff value (0.661, between third quantile to maximum) obtained with the survminer package. Five-year survival rates were 63% and 41% for the high and low TMEscore groups, respectively (HR, 0.32; 95% CI, 0.20–0.54; P < 0.001; Fig. 3D). When the TMEscore signature was evaluated as a continuous variable with the Cox regression model, the TMEscore model was determined to be an independent and robust prognostic factor (HR, 0.64; 95% CI, 0.50–0.82; P < 0.001; Supplementary Fig. S5E). The TMEscore was also investigated specifically in patients with stage II–III disease in the ACRG series to explore whether the application of adjuvant chemotherapy affected the ability of the TMEscore to predict survival outcomes. Patients were assigned to high and low TMEscore groups, and the survival advantage of the high TMEscore group was obvious, both in patients who received chemotherapy and in those who did not (Fig. 3G; Supplementary Table S6).
TME characteristics of the TCGA subtype and cancer somatic genome
TCGA has completed a comprehensive molecular characterization of gastric adenocarcinomas and has proposed subdividing tumors into four subtypes (1). Differences in the molecular subtypes were assessed in the TCGA-STAD series, and a higher TMEscore was significantly associated with EBV infection, microsatellite instability (MSI), and good prognosis in gastric cancer, whereas the genome stable (GS) subtype had a lower TMEscore (22) and was associated with poorer prognosis (HR, 0.49; 95% CI, 0.31–0.76; P = 0.002; Fig. 4A and B; Supplementary Table S12). The MSI-high subtype, with the best prognosis, had significantly higher TMEscores than the other two subtypes (Kruskal–Wallis, P = 9.2 × 10−12; Fig. 4C). Correlation analyses between the known signatures and the TMEscore were also validated in the TCGA-STAD cohort (Supplementary Fig. S5F), and the results were consistent with those of the ACRG cohort. The TMEscore model was again determined to be an independent and robust prognostic biomarker (HR, 0.74; 95% CI, 0.62–0.88; P < 0.001; Supplementary Fig. S5G).
A significant positive correlation between the TMEscore and mutation load was found (Fig. 4D; Spearman coefficient: R = 0.514, P < 2.2 × 10−16). Similar to the MSI subtype, patients with EBV infection had significantly higher TMEscores and CTL infiltration than those with the genomically stable and chromosomal instability (CIN) molecular subtypes (Kruskal–Wallis P < 2.2 × 10−16; Fig. 4B). Several studies have indicated that EBV+ gastric cancer does not exhibit higher TMB or MSI, but can respond to immune-checkpoint therapy (15, 63), suggesting that the TMEscore may be more useful for predicting clinical benefits in patients with gastric cancer treated with immunotherapy than TMB or MSI. We next investigated the distributions of somatic alterations and observed different patterns among gastric cancer clusters in terms of gene mutations. By analyzing the mutation annotation files of the TCGA-STAD cohort, we identified 33 variant mutated genes, which were associated with the TMEscore, using random forest algorithm with 1,000 iterations (ref. 33; Fig. 4E). Preclinical (64) and clinical (65) reports have described associations between individual altered genes and response or resistance to immune-checkpoint blockade. Relatively few of these genes were exclusively correlated with sensitivity or resistance in TCGA-STAD series, such as PIK3CA and PCDH10. These data may provide a new perspective to study the mechanism of TME formation, as well as explore individual mutations and their role in cancer immunity and immunotherapy.
The TMEscore predicts immunotherapeutic benefits
Upon stratification of the samples according to specific data sets (Fig. 5A), significant differences in overall survival were observed between the low and high TMEscore groups for all gastric cancer data sets except GSE57303 (HR, 0.41; 95% CI, 0.13–1.34), as detailed in Supplementary Table S12. Except for TNM stage I (HR, 0.58; 95% CI, 0.23–1.48), significant differences were observed in the TMEscore among all other stages. Concurrently, the prognostic value of the TME signature was also validated in three other independent data sets (GSE15459: HR, 0.48; 95% CI, 0.29–0.77; GSE57303: HR, 0.41; 95% CI, 0.13–1.34; GSE84437: HR, 0.24; 95% CI, 0.13–0.45; Supplementary Fig. S6A–S6C), as well as in a combined set of the five data sets (ACRG, TCGA-STAD, GSE15459, GSE57303, and GSE84437; Supplementary Fig. S6D; HR, 0.42; 95% CI, 0.33–0.54). The TMEscore was also predictive for relapse-free survival in the GSE26253 cohort (Supplementary Fig. S6E; HR, 0.63; 95% CI, 0.46–0.87). Finally, we evaluated the prognostic value of the TMEscore in 14 independent TCGA cancer cohorts including 7,241 tumors (Supplementary Table S13). Although the results of subgroup analysis were heterogeneous, the TMEscore was supported as a favorable prognostic biomarker in seven independent TCGA cohorts (Fig. 5B), which were acknowledged as hot tumors with diverse T-cell infiltration, including breast cancer, colon cancer, melanoma, lung squamous cell carcinoma, ovarian cancer, and cervical cancer.
Inhibition of immunologic checkpoints with monoclonal antibodies that block the T-cell inhibitory molecules PD-L1 and PD-1 has emerged as an anticancer treatment with unprecedented and synergistic survival benefits (66). We next explored the prognostic value of the TMEscore for immune-checkpoint therapy by assigning patients in the IMvigor210 and GSE78220 cohorts to high or low TMEscore groups. Patients with high TMEscores had significantly longer progression-free survival than those with lower TMEscores in both the IMvigor210 cohort (HR, 0.63; 95% CI, 0.46–0.89; Fig. 5C) and GSE78220 cohort (HR, 0.25; 95% CI, 0.07–0.89). The predictive value of the TMEscore to checkpoint immunotherapy was also confirmed in IMvigor210 (Fig. 5C–F; Supplementary Fig. S7A–S7E) and GSE78220 (Fig. 5G–J; Supplementary Fig. S7F–S7G). TMEscores were not associated with overall survival and response to treatment with immunotherapy in the TCGA-SKCM cohort (HR, 0.48; 95% CI, 0.17–1.41; Supplementary Fig. S7H–S7J). However, this could be due to the patients in the TCGA-SKCM cohort being from different medical centers with different study designs and receiving various types of immunotherapy, including cytokines, vaccines, and checkpoint blockers. If bias is excluded, these results suggest a potential limitation of the TMEscore at identifying responders to different immunotherapies. In good agreement with predicted outcomes of anti–PD-1 (GSE78220) and anti–PD-L1 (IMvigor210) treatment, we validated the predictive value of TMEscores in both the anti–MAGE-A3 (GSE35640; Supplementary Fig. S7K-L) and anti–CTLA-4 (GSE63557; Supplementary Fig. S7M-N) immunotherapy cohorts. Patients with higher TMEscores (TMEscores of patients treated with immunotherapy summarized in Supplementary Table S14) were more likely to benefit from immune-checkpoint therapy (IMvigor210 cohort: two-sided Fisher exact test, P < 0.001; Fig. 5D; Kruskal–Wallis test, P = 0.0041; Fig. 5E; GSE78220 cohort: two-sided Fisher exact test, P = 0.006; Fig. 5H; Wilcoxon test, P = 0.031; Fig. 5I). To investigate the biological characteristics of the TMEscore as it pertained to anti–PD-L1/PD-1 treatment, we observed that TMEscore A was positively correlated with a signature of cytotoxic CD8+ effector T cells (Spearman coefficient: R = 0.96, P < 2.2 × 10−16; Supplementary Fig. S7D), whereas TMEscore B was associated with the TGFβ response signal signature (F-TBRS; Spearman coefficient: R = 0.91, P < 2.2 × 10−16; Supplementary Fig. S7D), consistent with the results of gastric cancer data sets.
TMB (nonsynonymous variants), which is significantly associated with efficacy of immunotherapy, was also evaluated with ROC analysis (50) in the IMvigor210 cohort (61). However, we did not observe a predictive advantage of TMB when compared with the TMEscore (likelihood ratio test, P = 0.974; Fig. 5F). Combining TMB and TMEscore improved the predictive value compared with that of TMB or TMEscore alone using the pROC package (ref. 14; likelihood ratio test, combination versus TMB, P = 0.004; combination versus TMEscore, P = 0.019; Fig. 5F). The survival advantage of patients in the high TMEscore group, for both high and low TMB groups, was higher than that in the low TMEscore group (log-rank test, P = 0.003; Supplementary Fig. S7E). The ROC analyses of the GSE78220 and GSE35640 cohorts also demonstrated that the TMEscore was a predictive biomarker to immunotherapeutic benefits (GSE78220: AUC = 0.731; Fig. 5J; GSE35640: AUC = 0.689; Supplementary Fig. S7L). We next sought to validate the predictive value of TMEscores in a mouse model treated with CTLA-4 antibody (accession number GSE63557, N = 20). We obtained 82% conversion rate of TME signature genes A but only a 6% conversion rate of TME signature genes B (human gene symbols) from mouse probes; thus, only the predictive value of TMEscore A was estimated (Wilcoxon test, P < 0.001; Supplementary Fig. S7M; AUC = 1.000; Supplementary Fig. S7N). Taken together, our data strongly suggest that TME evaluation is associated with response to different immunotherapy approaches, including anti–PD-1/PD-L1/CTLA-4 immune-checkpoint inhibitors and MAGE-A3 antigen–based immunotherapy.
The TME signature, a tool designed to evaluate the comprehensive TME, is a biomarker for predicting survival in gastric cancer and for guiding more effective immunotherapy strategies. Our findings indicated that assessment of the immune and stromal statuses via the TME signature provided a predictor of survival in patients with gastric cancer and several other cancers, with data obtained from TCGA. Based on functional analysis of TME-relevant genes, our observations suggested that the TME signature genes in group B were enriched for genes involved in extracellular matrix remodeling (DCN, TIMP2, FOXF2, and MYH11), EMT (ACTA2, TGFB1L1, and SFRP1), and cell adhesion and angiogenesis (PDGFRA, GREM1, and TMEM100), which are considered T-cell suppressive (13, 43, 61, 67, 68). We also observed enrichment for genes involved in response to viruses (IFNG, TRIM22, CXCL10, CXCL9, and CD8A), response to IFNγ (HLA-DPB1, CCL4, CCL5, and IFNG), and T-cell activation (TRBC1, IDO1, CD2, NLRP3, and CD8A) among TME signature genes in group A.
Therapeutic antibodies that block the PD-1/PD-L1 pathway can induce robust and durable responses in patients with various cancers (11, 12, 43), including advanced gastric cancer (69). However, these responses occur only in a minority of patients, and several studies have found that PD-1 expression, PD-L1 expression, MSI status, and mutation load are not efficient biomarkers for predicting the benefits of immune-checkpoint blockade (15, 69, 70). The establishment of predictive biomarkers for checkpoint immunotherapy is, therefore, of importance in maximizing the therapeutic benefit (12, 15, 43). Emerging data support the idea that the TME plays a crucial role in checkpoint inhibitor immunotherapy (12–14, 71). Here, we have elucidated the comprehensive landscape of interactions between the clinical characteristics of gastric cancer and infiltrating TME cells. With the help of several computational algorithms, a methodology was established to quantify the TME infiltration pattern—the TME gene signature.
Integrated analysis revealed that the TMEscore is a prognostic biomarker for gastric cancer and was significantly elevated in patients with MSI and EBV molecular subtypes (1, 22, 63), which have been confirmed to be more sensitive to immune-checkpoint blockade (15, 72). In line with previous research, EBV+ tumors had low mutation burden, but exhibited immune infiltration (15, 63), suggesting that our methodology to evaluate the TME is a more predictive biomarker to further advance precision immunotherapy of gastric cancer. We also observed that the TMEscore showed a positive correlation with TMB and predicted neoantigen load in the TCGA gastric cancer cohort. Our data indicated that patients with EMT and GS subtypes exhibited the lowest TMEscores, consistent with studies (13, 61, 73) emphasizing that stromal activation is the core mechanism of resistance to checkpoint blockade. This resource may also help to facilitate the development of precision immunotherapy and the combined approach of both immunotherapy and inhibition of the EMT signaling pathway.
By applying ROC curve analysis (50), we also demonstrated the predictive value of the TMEscore for checkpoint blockade in four separate cohorts of patients with metastatic urothelial cancer (13) treated with the anti–PD-L1 agent (atezolizumab), metastatic melanoma treated with anti–PD-1 (pembrolizumab), advanced melanoma treated with a MAGE-A3 blocker (45), and a mouse model treated with anti–CTLA-4 immunotherapy (46). Consistent with a previous study about an immune signature score (74), we observed a significantly higher TMEscores in responders than in nonresponders undergoing checkpoint blockade therapy. However, the immune signature (IS score) of previous research was trained and obtained from the transcriptome profile directly and only enriched in immune-relevant pathways. We focused on the TME-infiltrating patterns and accessed the subtype-relevant gene signatures, including an IS (TMEscore A) and stromal activation signature (TMEscore B). These data offer mechanistic insights into the responses to immune-checkpoint blockade, suggesting that response to PD-L1 and PD-1 blockade is not only related to enhanced cytolytic activity, antigen processing, and IFNγ pathway components (13, 75), but is also associated with inhibition of fibroblast activation, angiogenesis, the EMT, and TGFβ pathway components (13, 61, 67, 68). This suggests that estimation of the immune TME combined with the stromal TME could potentially influence therapeutic resistance. Consistent with these findings, previous studies involving preclinical models of advanced cancer with activation of TGFβ- and EMT-relevant pathways, as well as fibroblast proliferation, demonstrated the inhibition of T cell–mediated tumor killing and a decrease in T-cell trafficking into tumors (13, 61). In line with our findings, some preclinical studies have indicated that antibody–ligand traps (anti-CTLA4–TGFβRII and anti-PDL1–TGFβRII) exhibit a superior therapeutic index compared with those of their parent immune-checkpoint inhibitors, which are currently in clinical use (73, 76).
The results of our study should be further validated in a prospective cohort of patients receiving immunotherapy using the NanoString nCounter gene-expression platform (NanoString Technologies) to more fully define cutoff values to be used. Second, given the major clinical importance of distinct tumor regions, it is appropriate to evaluate immune infiltration systematically in the core of the tumor and at the invasive margin. Because not all patients with high TMEscores have greater benefit of immunotherapy, more clinical factors should be incorporated to prediction models for improvement of accuracy. In the current study, this comprehensive evaluation of the cellular, molecular, and genetic factors associated with TME infiltration patterns has yielded several insights that shed light on how tumors respond to immunotherapies and may guide the development of novel drug combination strategies.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: D. Zeng, M. Shi, W. Liao
Development of methodology: D. Zeng
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): D. Zeng, R. Zhou, H. Sun
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): D. Zeng, M. Li, R. Zhou, Y. Liao
Writing, review, and/or revision of the manuscript: D. Zeng, M. Li, J. Zhang, H. Sun, J. Rao, W. Liao
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): J. Rao
Study supervision: J. Bin, W. Liao
This work was supported by the National Natural Science Foundation of China (No. 81772580 to W. Liao).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.