Abstract
Background: The tumor microenvironment is an important factor in cancer immunotherapy response. To further understand how a tumor affects the local immune system, we analyzed immune gene expression differences between matching normal and tumor tissue.
Methods: We analyzed public and new gene expression data from solid cancers and isolated immune cell populations. We also determined the correlation between CD8, FoxP3 IHC, and our gene signatures.
Results: We observed that regulatory T cells (Tregs) were one of the main drivers of immune gene expression differences between normal and tumor tissue. A tumor-specific CD8 signature was slightly lower in tumor tissue compared with normal of most (12 of 16) cancers, whereas a Treg signature was higher in tumor tissue of all cancers except liver. Clustering by Treg signature found two groups in colorectal cancer datasets. The high Treg cluster had more samples that were consensus molecular subtype 1/4, right-sided, and microsatellite-instable, compared with the low Treg cluster. Finally, we found that the correlation between signature and IHC was low in our small dataset, but samples in the high Treg cluster had significantly more CD8+ and FoxP3+ cells compared with the low Treg cluster.
Conclusions: Treg gene expression is highly indicative of the overall tumor immune environment.
Impact: In comparison with the consensus molecular subtype and microsatellite status, the Treg signature identifies more colorectal tumors with high immune activation that may benefit from cancer immunotherapy. Cancer Epidemiol Biomarkers Prev; 27(1); 103–12. ©2017 AACR.
Introduction
The composition of the immune cell population in tumors varies widely between patients and cancer types. The presence of CD8+ T cells has shown to predict at least partly the response of patients to chemotherapy and CTLA-4 and PD1/PD-L1 antibody treatment (1–3). Drugs that target immune cell populations are being actively developed and many combinations of these therapies tested (4). Detailed determination of the immune cells present in an individual patient's tumor is predictive of outcome (5) and can be used to personalize cancer immunotherapy (6). Recent work by Gentles and colleagues (5) established the use of computational approaches to determine the presence of immune cell populations from tumor sample gene expression. This, in contrast to analysis by IHC, where even though multiplex systems and computer-assisted quantification systems are available, only a limited number of markers can be analyzed per tissue section (7, 8). In addition, gene expression profiles can be reinterrogated even though the original tissue is no longer available. The drawbacks of bulk tumor immune cell gene expression analysis are 2-fold: (i) the relative location of the immune cells in the tissue is lost and (ii) identification and quantification of different types of immune cells is dependent on the robustness of the bioinformatics method used. Single-cell gene expression methods have been published and allow for identification of individual immune cells in tumors (9). Bioinformatics approaches aim to deconvolute and quantify immune cells based on bulk tumor gene expression. The recently published CIBERSORT approach uses a combination of reference genes and machine learning (10) to identify and quantify the different cell populations in a tumor sample. This method appears to solve one of the main issues of using reference genes to determine cell types, that is, multicollinearity, the fact that the same gene can correlate in two separate immune cell types.
Here, we use an immune cell subtype-agnostic method to investigate the differences of immune gene expression between solid cancers and their matching normal tissues. We used TCGA as the discovery set and then validated our results in an independent and new dataset. By doing this we reduced the artificial correlations introduced by combining datasets from different platforms. We derived tumor-specific gene signatures for CD8+ T cells and regulatory T cells. These were used to identify subpopulations of colorectal patients that were then correlated with clinical factors. We confirmed our results in a new colorectal dataset and used this dataset to compare signature gene expression to IHC data of FoxP3 and CD8.
Materials and Methods
mRNA expression datasets
RNA-seq data from TCGA was generated by the TCGA Research Network (http://cancergenome.nih.gov). RNASeq raw count files were downloaded from FireBrowse (firebrowse.org) with version date 2016_01_28. Expression matrices were concatenated across cancer types and normalized together using edgeR (11) and Voom (12) in R (https://cran.r-project.org/). Seventeen housekeeping genes from a Human Reference Gene Panel (product no. 05339545001, Roche) were used to construct an absolute gene expression heatmap (see Supplementary Fig. S1A). The heatmap was visually inspected to determine whether any of the cancer types had grossly different expression for all of the 17 genes, indicative of a cancer type–batch effect. No batch effect was observed. Somatic mutations, methylation, and clinical annotations for patients of the same version date were also downloaded from FireBrowse and used in the integrated analysis as is. Immune cell mRNA expression profiles from Linsley and colleagues (13) and Plitas and colleagues (14) were downloaded from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo) with accession numbers GSE60424 and GSE89225, respectively. The two datasets were normalized separately as described above.
Microarray datasets
Between 55 and 65 formalin-fixed and paraffin-embedded tumor samples for each of four cancer types were obtained from the Institute of Pathology/LMU Munich. The samples were collected in compliance with the guidelines of the local Ethics Committee (Ethics Committee of the University of Munich in concordance with the Declaration of Helsinki). RNA was extracted, amplified, and hybridized on Affymetrix HT-U133plus-2-PM microarrays. Quality control was performed according to the manufacturer's recommendation. CEL files were normalized using RMA from the Affy package and probes assigned to HUGO gene names according to Jetset (15). This dataset has been published to the Gene Expression Omnibus under number GSE103512.
Additional colorectal (GSE14333), breast (GSE2034), and ovarian (GSE74357) cancer gene expression profiles were downloaded from Gene Expression Omnibus and were normalized and probes assigned similar to the internal testing dataset.
Correlation analysis and correlation networks
Eleven genes were chosen that are considered representative for specific immune cell populations involved in tumor immunity. Two other published sets of genes were also used as comparison (16, 17). Each of the 11 seed genes was correlated with the complete transcriptomes of samples from one cancer type. This was repeated for each of the 28 TCGA solid cancer types in an R-environment using the psych package with Pearson correlation analysis. Only correlated genes that had an adjusted P < 0.05 and an R2 ≥ 0.8 were included for further analysis. An R2 ≥ 0.8 was arbitrarily chosen partly to obtain the optimal distinction between adjacent cancer types (see Fig. 1B; Supplementary Fig. S1C). If no gene expression was reported in the GSE60424 dataset, those genes were excluded as those genes were not considered to be expressed by any of the immune cell subsets (Supplementary Table S1). The GSE60424 dataset was used for this as dataset GSE89225 includes only subsets of T cells.
Correlated immune genes reveal relations between cancer types. A, Flowchart of steps to find the 129 correlated immune genes starting from the 11 chosen seed genes. TME, tumor microenvironment; TCGA, the Cancer Genome Atlas. B, Adjacency graph of cancer types correlated by immune gene expression. Immune-correlated genes for each solid tumor type from TCGA were determined and compared with other cancer types. The graph shows how many correlated genes are common between cancer types (indicated by thickness of the connecting edges). The color of the vertices indicates the expression of the 129 common immune genes (red indicates higher absolute mean expression). The abbreviations of cancer types are shown in the table. Skin and uveal melanoma have the most correlated genes in common, whereas thymoma has the highest mean immune gene expression.
Correlated immune genes reveal relations between cancer types. A, Flowchart of steps to find the 129 correlated immune genes starting from the 11 chosen seed genes. TME, tumor microenvironment; TCGA, the Cancer Genome Atlas. B, Adjacency graph of cancer types correlated by immune gene expression. Immune-correlated genes for each solid tumor type from TCGA were determined and compared with other cancer types. The graph shows how many correlated genes are common between cancer types (indicated by thickness of the connecting edges). The color of the vertices indicates the expression of the 129 common immune genes (red indicates higher absolute mean expression). The abbreviations of cancer types are shown in the table. Skin and uveal melanoma have the most correlated genes in common, whereas thymoma has the highest mean immune gene expression.
The adjacency graph was constructed by determining the number of genes that any two cancer types had in common in the correlation analysis as described in Fig. 1A. No normal tissue samples were used in this analysis. This matrix was normalized using the context likelihood of relatedness (18) method, as implemented in the minet package (19). Finally, the graph was constructed using the igraph package (http://igraph.org/r/), with undirected and weighted options.
Clustering
To determine the clustering of colorectal samples based on Treg signature genes, we used the partition around metroids (PAM) as implemented by the pamk function in fps R package, as described by Hennig and colleagues (20). To test for robustness of the clusters, we randomly subsampled half the samples and performed k-metroids clustering for a total of 200 times. From this subsampling, we defined the consensus cluster for each sample as the cluster that the sample was assigned to in majority of the cases. The identity of the cluster is defined as the level of the average Treg signature gene expression. Finally, the number of cases where the sample was assigned to a cluster other than its consensus cluster was determined and reported as a robustness indicator.
CMS and CRIS determination
The CMS group of colorectal cancer patients from TCGA and the internal dataset was determined according to Guinney and colleagues (21) using their R-package.
CRIS classification of colorectal TCGA samples was obtained from Isella and colleagues (22).
Differential expression analysis
To determine which genes were differentially expressed between two immune cell populations, we fitted the normalized and log2 converted expression data to a linear model using lmFit of the limma package. Then a contrast matrix of opposing immune cell populations was made and used to perform a contrast fit. This was done separately for both GSE60424 and GSE89225.
Statistical considerations
Throughout the manuscript results with a P < 0.05 were considered statistically significant. In cases where multiple testing correction (false discovery rate) was necessary, the Benjamini–Hochberg adjustment was used (23) and an adjusted P < 0.05 was considered statistically significant. The χ2 test of independence was used to determine whether the distributions were independent to each other.
Additional methods can be found in the Supplementary Methods.
Results
A data-driven approach to determine immune-correlated genes
We aimed to investigate differences in immune gene expression across solid cancer types in a data-driven manner. To avoid multicollinearity and to keep the initial chosen gene set to a minimum, we chose 11 hallmark genes that are essential either for the function of the specific immune cell or are used for the identification of such in cell sorting experiments. These seed genes and their gene expression in immune cell populations isolated from whole blood are shown in Supplementary Fig. S1B. For comparison, we repeated the following analysis with two other published gene sets used to identify immune cell populations.
To obtain more details on the genes that form part of the immune system in solid tumors, we correlated each of the 11 seed genes with the whole transcriptome of every solid cancer type in TCGA (Fig. 1A). From this list of correlated genes, we arbitrarily excluded any genes that had a correlation below an R2 of 0.8 (Supplementary Table S2 shows how the number of correlated genes changes with different R2 cutoffs). This resulted in 898 correlated immune genes across all cancer types. We determined which correlated genes were common (repeated) across cancer types. Supplementary Fig. S1C shows the number of common genes found versus the number of cancer types they are found in. Fourteen was situated in the linear section of the curve and was therefore considered a reasonable cutoff.
These 129 common genes formed our final list of immune-correlated genes. Similar common immune-correlated genes were found using two other sets of seed genes (Supplementary Table S3). The majority of these genes (83 genes, Supplementary Table S4) were also found in a similar type of analysis carried out by Angelova and colleagues (24), even though they used differential gene expression to select immune-related genes.
To ensure that the sample size differences of TCGA cancer types were not influencing the determination of common immune-correlated genes, we tested whether there was a correlation between sample number, the number of immune-correlated genes found and the mean expression of the common immune-correlated genes (Supplementary Fig. S1E). Neither analysis found a strong correlation, suggesting the larger datasets in TCGA were not overrepresented in the immune-correlated gene list.
Finally, we tested whether similar correlations could be found in datasets independent of TCGA. We tested separate microarray datasets of breast (GSE2034), ovarian (GSE74357), and colorectal cancer (GSE14333) and found in general a much lower number of correlated genes (Supplementary Table S5). We attributed this to the lower number of samples compared to TCGA and the gene expression platform used (microarray vs. RNAseq).
Immune gene correlation network between cancer types
We determined the number of common correlated genes between any two cancer types and constructed an adjacency graph (Fig. 1B). Cancer types that were expected to be closely related indeed shared a high number of correlated genes (represented by the width of edges, e.g. lung cancers and kidney cancers). Notably, skin melanoma and uveal melanoma were the two cancer types with the highest number of common genes (183 genes). Although the name suggests a close relation, uveal melanoma forms in the eye, an immune privileged organ, and is not an immune-gene active tumor. Because the same sets of genes correlate with our seed genes, the close relation might describe a similar underpinning of the two cancers. The average immune gene expression (the mean of 129 immune correlated genes represented by vertex color) showed that lung cancers and thymoma had the highest expression and also demonstrated the stark difference of immune gene expression between skin melanoma and uveal melanoma. Similar to Supplementary Table S2, we tested how the adjacency graph changed with the correlation threshold that we chose (R2 ≥ 0.8). Comparing Fig. 1B with Supplementary Fig. S1D shows that an R2 ≥ 0.9 leaves out several cancer types that do not have common immune-correlated genes with such a high correlation. An R2 ≥ 0.7 shows increased number and width of edges, as expected, and diminishes the useful information that can be observed from the graph.
Immune gene expression across cancer types
The expression of the 129 immune-correlated genes of each cancer type is visualized in the heatmap shown in Fig. 2A. As thymoma is a cancer of immunologic origin, it had the highest immune gene expression, and one can observe the distinct expression pattern compared to other solid cancers. The similarities in expression profiles of colon and rectal cancer are readily observable. In our microarray testing dataset of 60 patients each of breast, colorectal, non–small cell lung, and prostate cancer, we observed higher overall immune gene expression in NSCLC and breast cancer, and lower in colorectal and prostate cancer (Supplementary Fig. S2). These differences were also observed when just considering the expression of 11 seed genes (see Supplementary Fig. S2 lower heatmap). Individual samples visually showed a high degree of similarity within each cancer type, further suggesting that these immune genes are descriptive of the cancer type.
Immune gene expression shows pattern differences between cancer types but shows no correlation with number of mutations. A, Gene expression heatmap of the 129 common immune genes across all solid cancer types of TCGA. Red indicates higher mean absolute expression of the patients of one tumor type. Thymoma has the highest mean expression of the correlated genes but shows a distinct pattern when compared with other cancers. Kidney and lung have high mean immune gene expression, while adrenocortical carcinoma and uveal melanoma have the lowest mean expression. B, Scatterplot depicting the correlation between number of mutations and mean immune gene expression. The graph shows the number of mutations of each tumor sample versus the mean of immune gene expression. Only a low correlation (R2 = 0.015, straight line fit Pearson correlation) is observed, showing mutation number is not definitive in the expression of immune-correlated genes. Standard TCGA abbreviations for cancer types are shown in Fig. 1B, colors indicate different cancer types.
Immune gene expression shows pattern differences between cancer types but shows no correlation with number of mutations. A, Gene expression heatmap of the 129 common immune genes across all solid cancer types of TCGA. Red indicates higher mean absolute expression of the patients of one tumor type. Thymoma has the highest mean expression of the correlated genes but shows a distinct pattern when compared with other cancers. Kidney and lung have high mean immune gene expression, while adrenocortical carcinoma and uveal melanoma have the lowest mean expression. B, Scatterplot depicting the correlation between number of mutations and mean immune gene expression. The graph shows the number of mutations of each tumor sample versus the mean of immune gene expression. Only a low correlation (R2 = 0.015, straight line fit Pearson correlation) is observed, showing mutation number is not definitive in the expression of immune-correlated genes. Standard TCGA abbreviations for cancer types are shown in Fig. 1B, colors indicate different cancer types.
Because the number of mutations of a tumor is thought to influence the tumor immune environment (25–28), we correlated the number of mutations (the number of mutated loci as reported by TCGA) of a tumor sample with the mean expression of the immune-correlated genes (Fig. 2B). Only a low correlation (R2 = 0.015, straight line fit Pearson correlation) was found between immune gene expression and mutation number among all tumor types. On multivariate analysis using cancer type as a variable, we found that colon, adrenocortical, liver, and endometrial cancer had significant correlation, even though the Spearman coefficient was low (Supplementary Table S6). We found no significant correlation between individual seed gene expression and number of mutations (Supplementary Fig. S3). This suggests that the immune gene expression and the mutation number are independent of each other.
Normal tissue shows increased immune gene expression
Paired tumor and neighboring normal tissue is available for several TCGA cancer types. We speculated that there could be differences in the expression of immune genes. Interestingly, for seven of the 16 cancers that had sufficient matched samples to analyze (≥10 matched samples), expression of the 129 immune-correlated genes was generally higher in normal tissue than in matched tumor tissue (Fig. 3A), including lung and colorectal cancer.
Normal tissue has higher immune gene expression in some cancers. A, Gene expression heatmap comparing the immune gene expression of matched normal and tumor tissue samples across solid cancers. Top heatmap shows the mean absolute gene expression of normal samples while the bottom shows the tumor samples. Below the heatmap a scaled heatmap showing for each cancer type whether the normal or the tumor tissue has higher mean immune gene expression (red means higher; blue means lower). Tumor types are grouped by having a higher mean immune gene expression (High) in normal tissue, lower (Low) or similar (Sim). Five of 16 cancer types have higher immune gene expression in normal tissue compared with tumor tissue, 3 have lower, and 8 are similar. B, Boxplot showing the difference in expression of normal and matched tumor tissue of the 11 seed genes. The median of each gene was taken for the matched samples across solid cancers. Pink, normal tissue; blue, tumor. FOXP3 showed the highest increase from normal to tumor tissue, while NCAM1 showed the highest decrease.
Normal tissue has higher immune gene expression in some cancers. A, Gene expression heatmap comparing the immune gene expression of matched normal and tumor tissue samples across solid cancers. Top heatmap shows the mean absolute gene expression of normal samples while the bottom shows the tumor samples. Below the heatmap a scaled heatmap showing for each cancer type whether the normal or the tumor tissue has higher mean immune gene expression (red means higher; blue means lower). Tumor types are grouped by having a higher mean immune gene expression (High) in normal tissue, lower (Low) or similar (Sim). Five of 16 cancer types have higher immune gene expression in normal tissue compared with tumor tissue, 3 have lower, and 8 are similar. B, Boxplot showing the difference in expression of normal and matched tumor tissue of the 11 seed genes. The median of each gene was taken for the matched samples across solid cancers. Pink, normal tissue; blue, tumor. FOXP3 showed the highest increase from normal to tumor tissue, while NCAM1 showed the highest decrease.
Individual seed genes also showed differences in expression between normal and tumor tissue (Fig. 3B). NCAM1 had the largest difference in expression, being noticeably lower in tumor tissue. CD8A had slightly lower tumor expression, especially in a subset of cancer types (with KIRC and KIRP being contrastingly higher in tumor tissue) (Supplementary Fig. S4). FOXP3 and ITGAX expression were notably higher in tumors of most cancer types. Treg cells, as represented here by the FOXP3 seed gene, balance the activity of cytotoxic CD8+ T cells. Therefore, we focused on this balance comparing normal and tumor tissue.
Tumor-specific Treg and CD8 signatures show a balance shift between normal and tumor tissue
We devised tumor-specific CD8 and Treg signatures using a combination of TCGA data and data from isolated immune cells from blood and tumor samples (Supplementary Fig. 5A). Supplementary Fig. S5B shows the gene signature expression in the two isolated immune cell datasets, confirming signature specificity for the two T-cell populations. CD8 signature expression was also high in NK cells. However, as these have similar functions to CD8 T cells, they were not used as a contrast in the differential gene expression analysis.
The CD8 signature showed higher expression in the normal tissue of 12 of the 16 tumor types (Fig. 4A); similar to that previously observed using all correlated immune genes (Fig. 3A). The Treg signature was higher in the tumor tissue of most cancer types (except for liver cancer), suggesting an association between Tregs and malignancy. The majority of individual patients also had higher Treg signature expression in tumor tissue compared to normal tissue (Supplementary Fig. S6A).
Treg gene signature expression is higher in most cancers compared with normal tissue, and FOXP3 shows more hypomethylation in tumor tissue. A, Gene expression heatmap of the Treg and CD8 signatures across cancer types, comparing matched normal tissue to tumor tissue. Below the individual absolute gene expression heatmap is the summary heatmap showing whether normal or tumor tissue had higher expression of both signatures (red means higher expression). Tumor types are ordered in the same groups as Fig. 3A. The CD8 signature was higher in the normal tissue of more than half of the cancer types (12 of 16). However, the Treg signature was higher in all tumor tissue compared with normal tissue with the exception of liver cancer. B, Boxplots showing that more tumor samples have FOXP3 hypomethylation compared with normal tissue. Many samples of the tumor tissue (blue) fall below the third quarter of samples and have a lower median of FOXP3 methylation compared with normal tissue (pink). FOXP3 methylation data for all matched samples of solid cancers in TCGA were included. All seven methylation sites assigned to the FOXP3 gene were included in the analysis. C, Boxplots showing the correlation between the Treg gene signature and the methylation of the 205 site of FOXP3. The methylation of samples of cg10858077_X_49121205 were binned according to their SD and the expression of the Treg signature for each bin was graphed. No correlation between methylation and Treg signature expression was found.
Treg gene signature expression is higher in most cancers compared with normal tissue, and FOXP3 shows more hypomethylation in tumor tissue. A, Gene expression heatmap of the Treg and CD8 signatures across cancer types, comparing matched normal tissue to tumor tissue. Below the individual absolute gene expression heatmap is the summary heatmap showing whether normal or tumor tissue had higher expression of both signatures (red means higher expression). Tumor types are ordered in the same groups as Fig. 3A. The CD8 signature was higher in the normal tissue of more than half of the cancer types (12 of 16). However, the Treg signature was higher in all tumor tissue compared with normal tissue with the exception of liver cancer. B, Boxplots showing that more tumor samples have FOXP3 hypomethylation compared with normal tissue. Many samples of the tumor tissue (blue) fall below the third quarter of samples and have a lower median of FOXP3 methylation compared with normal tissue (pink). FOXP3 methylation data for all matched samples of solid cancers in TCGA were included. All seven methylation sites assigned to the FOXP3 gene were included in the analysis. C, Boxplots showing the correlation between the Treg gene signature and the methylation of the 205 site of FOXP3. The methylation of samples of cg10858077_X_49121205 were binned according to their SD and the expression of the Treg signature for each bin was graphed. No correlation between methylation and Treg signature expression was found.
Notably, the group of patients with higher Treg signature expression in normal tissue had overall lower expression of most checkpoint genes (Supplementary Fig. S6B) in their matching tumor samples, suggesting that the increased presence of Tregs in the tumor is accompanied by increased checkpoint gene expression.
Because Treg function is dependent on FOXP3 expression and its epigenetic regulation (29), we analyzed the seven sites of methylation assigned to the FOXP3 gene. We observed a small decrease in the mean methylation for each site when comparing normal to tumor tissue (Fig. 4B). The “tail” of hypomethylated samples was more pronounced in tumor tissue than in normal tissue, suggesting more tumor samples had hypomethylated FOXP3 and therefore higher FOXP3 expression (as previously shown in Fig. 3B). There was no linear correlation between methylation status and Treg signature expression (Fig. 4C) or FOXP3 gene expression. Similarly, FOXP3 methylation did not correlate with our CD8 signature or the CD8A gene. This could mean that FOXP3 expression is regulated through different or additional mechanisms or that the methylation data available was not sensitive enough to find the correlation.
Treg signature clustering of colorectal cancer patients reveals clinical factors differences
After clustering of the TCGA colorectal dataset using the Treg signature (see Materials and Methods), we distinguished two groups (Fig. 5A), which were then used to correlate with clinical factors (Fig. 5B; Supplementary Fig. S7; Supplementary Table S7). We clustered a separate microarray colorectal cancer dataset (GSE14333) using the same methodology and also found two groups based on Treg signature (Supplementary Table S8).
Colorectal cancer patients can be clustered into two groups based on Treg signature that show distinct clinical characteristics. A, Gene expression heatmap of the genes forming part of the Treg signature. Using the clustering method described in the methods, two clusters were found and named the high and low Treg cluster. Each gene is scaled individually from red (high gene expression) to blue. B, Bar charts showing proportions of samples within each CMS (left) and anatomical site (right) by Treg cluster. The majority of samples in the high Treg cluster are in CMS1 or CMS4, whereas the low Treg cluster has mainly CMS2 samples. Similarly, samples from the High Treg cluster are mainly right-sided tumors while samples from the low Treg cluster are mainly left-sided. NAs in CMS are samples that are not significantly assigned to a CMS. The P values shown indicate that distribution between Treg cluster and CMS or anatomic sites were independent as calculated by the χ2 test of independence.
Colorectal cancer patients can be clustered into two groups based on Treg signature that show distinct clinical characteristics. A, Gene expression heatmap of the genes forming part of the Treg signature. Using the clustering method described in the methods, two clusters were found and named the high and low Treg cluster. Each gene is scaled individually from red (high gene expression) to blue. B, Bar charts showing proportions of samples within each CMS (left) and anatomical site (right) by Treg cluster. The majority of samples in the high Treg cluster are in CMS1 or CMS4, whereas the low Treg cluster has mainly CMS2 samples. Similarly, samples from the High Treg cluster are mainly right-sided tumors while samples from the low Treg cluster are mainly left-sided. NAs in CMS are samples that are not significantly assigned to a CMS. The P values shown indicate that distribution between Treg cluster and CMS or anatomic sites were independent as calculated by the χ2 test of independence.
The group of patients with high Treg signature expression had a significant higher proportion of patients than expected of the consensus molecular subtype 4 (CMS4, 32%) and CMS1 (17%). The low-expression cluster had 39% CMS2 patients (P = 1.20E−22, chi-square test of independence). The high Treg cluster had proportionally more right-sided primary tumors (54%) whereas the low Treg cluster had more left sided (59%; P = 0.009). The high Treg cluster had more MSI-high patients (20%) than the low Treg cluster (4%; P = 6.29E−07), although the CIMP proportions were not significant different from those expected. Tumor stage was barely different between the two groups, with slightly more stage 4 patients in the Low Treg cluster. We also looked at the recently published CRIS classification that aims to exclude any influence from the stroma surrounding the tumor (22). Samples of the high Treg cluster were significantly more classified as CRIS-A and -B than expected, whereas the low Treg cluster had more CRIS-C, -D, and -E (Supplementary Fig. S7D, P = 9.35E−05). Patients from the high Treg cluster had a nonsignificant longer overall survival (P = 0.051; Supplementary Fig. S7E). The 129 correlated immune genes showed higher expression in the high Treg cluster compared to the low Treg cluster (Supplementary Fig. S7F).
When our separate colorectal cancer testing dataset was clustered into two groups by Treg signature (Supplementary Fig. S8A, Supplementary Table S7, and Supplementary Table S8), we observed similar results with the high Treg cluster consisting of more than expected CMS1/4 patients (P = 0.022) and of patients with right-sided tumors (not significant), whereas the low Treg cluster had mainly CMS2 patients and patients with left-sided tumors (Supplementary Fig. S8B).
Tumors from the high Treg cluster have significantly more CD8+ and FoxP3+ cells
We correlated our Treg and CD8 signatures with standard immunohistochemistry staining of FoxP3 and CD8 in the same tumor samples of our testing dataset (Fig. 6). Although overall correlation of the CD8+ density with the CD8 gene signature was low (R2 = 0.061 for the high Treg cluster and R2 = 0.088 for the low Treg cluster), the high Treg cluster had significantly more CD8+ cells (P = 0.0142, see boxplot in Fig. 6). A similar result was observed in the correlation of FoxP3+ density and the Treg signature: significantly more FoxP3+ cells were found in samples from the high Treg cluster (P = 0.0133).
Low correlation between IHC and gene signatures are observed but the high Treg cluster has significantly more CD8+ and FoxP3+ cells. Scatterplots showing the correlations between positive cell densities by IHC and mean gene signature expression in the testing dataset. Samples were divided by Treg cluster (rows). The high Treg cluster has significantly more CD8+ (left) and FoxP3+ (right) cells (see box plot inserts) in this dataset. x-axes are log2 of the number of positive objects per square millimeter and y-axes are mean gene expression values of the respective gene signatures.
Low correlation between IHC and gene signatures are observed but the high Treg cluster has significantly more CD8+ and FoxP3+ cells. Scatterplots showing the correlations between positive cell densities by IHC and mean gene signature expression in the testing dataset. Samples were divided by Treg cluster (rows). The high Treg cluster has significantly more CD8+ (left) and FoxP3+ (right) cells (see box plot inserts) in this dataset. x-axes are log2 of the number of positive objects per square millimeter and y-axes are mean gene expression values of the respective gene signatures.
Discussion
We used expression of 129 immune-correlated genes to give a data-driven view on how immune-activated a tumor is (compare our method to those published by others; refs. 5, 13, 24, 30, 31). The additional comparison of correlated genes found in each cancer type allows detection of similarities that gene expression analysis alone would miss.
Instead of focusing on the correlation between survival and immune gene expression, we wanted to determine what fundamentally changed in the local immune composition between normal and tumor tissue. Normal tissue is collected at the same time as tumor tissue and from macroscopically normal-looking tissue adjacent to the tumor, which is then verified by the pathologist to be normal. It is likely that the presence of the neighboring tumor has already altered the immune composition of the normal tissue. Notwithstanding, we found the higher immune gene expression in normal tissue to be noteworthy. When looking more closely at which seed genes were driving this difference, we found that NCAM1 had the largest difference while CD8A was higher in the normal tissue of only certain cancer types (notably colorectal cancer, Supplementary Fig. S4). This suggests that NK cells (through NCAM1) are partly responsible for higher mean expression of immune genes in normal tissue.
This was in contrast to the regulatory T-cell gene FOXP3, which was expressed more highly in tumor samples, prompting us to establish a tumor Treg-specific signature. Differential gene expression analysis between isolated immune populations from the peripheral blood will miss the changes these immune cells undergo once they are inside a tumor. Therefore, we used only the differentially expressed genes (between CD8+ T cells and CD4+ T cells; and between Tregs and conventional T cells) that were also correlated with two hallmark genes of each population (CD8A/CD8B and FOXP3/CTLA4) in solid tumor tissue. This combination allowed us to derive a Treg-specific signature of only six genes and a CD8-specific signature of ten genes.
Interestingly, our findings that CCR8 forms part of the Treg signature, and that Tregs are higher in tumor tissue than in normal tissue matches the observations of Plitas and colleagues (14) that CCR8 plays an important role in Tregs specifically in the tumor and that breast tumors have more Tregs than normal breast. Moreover, we found the latter to be true in all solid cancer types with matching normal tissue in TCGA with the exception of liver cancer. This exception could be explained by the high presence of Tregs under physiologic conditions that allow the liver to perform its functions without resulting in hepatitis (32).
The function of FOXP3 appears to be a determining factor in the role of a Treg, that is, if the Treg is immunosuppressive or inflammatory (33–35). We found that more tumor samples had hypomethylation of FOXP3 when compared with normal samples. Our goal was to find specific signatures for the different subtypes of Tregs based on the methylation of the FOXP3 gene. However, neither differential gene expression or correlation analysis found meaningful genes in the TCGA dataset. This could be due to other upstream enhancers being responsible (36) for FOXP3 function or because the sensitivity of the methylation of the TCGA dataset was not high enough to discern the genes involved.
The training dataset (TCGA) and an additional colorectal dataset (GSE14333) could be grouped into two clusters based on Treg signature expression and this clustering showed correlation with clinical factors with differences similar to those found in Becht and colleagues and Guinney and colleagues (21, 37). However, the limitations of the CMS, MSI-status, or anatomical site as a method for stratifying patients based on their tumor immune status became apparent. Although CMS1 has more patients with immune activation, not all of them do, and at the same time, CMS2 and 3 also have patients with high immune gene expression. This is in contrast to clustering patients by Treg signature, which shows almost exclusively patients with high overall immune gene expression in the high Treg cluster and few in the low Treg cluster (Supplementary Fig. S7F).
Notably, proportions of patients by stage were similar between the two Treg groups (Supplementary Fig. S7C), with slightly more stage 4 patients in the low Treg cluster. This is in contrast to what previously was reported by Bindea and colleagues (38), who showed the immune response changing along with tumor progression. The datasets we used cannot detect changes in Treg signature expression over the course of development of a tumor, and this should be studied in further detail to better understand the role of Tregs at the different stages of tumor development.
Even though we did not expect to find a difference in overall survival between the two Treg clusters, there was a nonsignificant decrease in OS for patients of the low Treg cluster (Supplementary Fig. S7E). FOXP3 gene expression by itself is not predictive of outcome in colorectal cancer (cbioportal.org). In TCGA, none of the colorectal cancer patients were treated with cancer immunotherapy and therefore the absence of a significant difference in outcome was expected. Future datasets that have outcome data of patients treated with cancer immunotherapy will show if Treg clustering allows for outcome prediction. We can then determine if Treg clustering is an addition or improvement over CMS classification in the identification of patients that benefit from cancer immunotherapy.
The correlation between IHC data for CD8 and FoxP3 and their respective gene signatures was low and previous published data showed similar poor results (39) even though PCR-based methods seemed to fare better (7). We did find significantly more CD8+ and FoxP3+ cells in the high Treg cluster. We did not capture the location of the IHC-positive cells inside the tumor, that is, whether the immune cells were at the tumor center or at the tumor margin. Because gene expression showed us that mean immune gene expression is higher outside and Treg gene expression is higher inside many colorectal tumors, it would be logical to determine the location of the immune cells by IHC. These efforts are ongoing.
This correlation was tested in one dataset of one cancer type. This initial analysis suggests that IHC and gene expression signatures convey complementary information on the status of the immune system inside a tumor. Targeting the Tregs [e.g., through CCR8 as suggested by Plitas and colleagues (14)] in tumors of the high Treg cluster might unleash the cytotoxic activity of already present CD8+ T cells, whereas tumors within the low Treg cluster might need to attract CD8+ T cells to the tumor, e.g. with bispecific antibodies.
In conclusion, analysis of genes involved in the tumor immune response showed stark differences between cancer types and between patients. The increased mean expression of these genes in normal tissue might indicate the suppression of the immune system in tumor development, especially since Treg-related gene expression is higher in the tumor. Treg signature clustering performed better than CMS and MSI-based subtyping methods in identifying colorectal cancer samples with high immune gene expression. To determine the tumor immune status of a patient, we feel that both gene expression and IHC provide complementary and vital information.
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Authors' Contributions
Conception and design: J. Brouwer-Visser, A. Bauer-Mehren, K. Lechner, J.T. Dudley
Development of methodology: J. Brouwer-Visser, W.-Y. Cheng, D. Maisel, K. Lechner
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): E. Andersson
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J. Brouwer-Visser, W.-Y. Cheng, D. Maisel, K. Lechner, J.T. Dudley
Writing, review, and/or revision of the manuscript: J. Brouwer-Visser, W.-Y. Cheng, A. Bauer-Mehren, D. Maisel, E. Andersson, J.T. Dudley
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): K. Lechner, E. Andersson
Study supervision: A. Bauer-Mehren, K. Lechner, F. Milletti
Acknowledgments
We thank Gabriele Dietmann and Astrid Heller for help with the preparation of the testing dataset.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.