Viruses affect approximately 20% of all human cancers and induce expression of immunogenic viral oncoproteins that make these tumors potent targets for immune checkpoint inhibitors. In this study, we apply computational tools to The Cancer Genome Atlas (TCGA) and other genomic datasets to define how virus infection shapes the tumor immune microenvironment and genetic architecture of 6 virus-associated tumor types. Across cancers, the cellular composition of the microenvironment varied by viral status, with virus-positive tumors often exhibiting increased infiltration of cytolytic cell types compared with their virus-negative counterparts. Analyses of the infiltrating T-cell receptor repertoire in these patients revealed that Epstein–Barr virus infection was associated with decreased receptor diversity in multiple cancers, suggesting an antigen-driven clonal T-cell response. Tissue-specific gene-expression signatures capturing virus-associated transcriptomic changes successfully predicted virus status in independent datasets and were associated with both immune- and proliferation-related features that were predictive of patient prognosis. Together, the analyses presented suggest viruses have distinct effects in different tumors, with implications for immunotherapy.

Significance: This study utilizes TCGA and other genomic datasets to further our understanding of how viruses affect the tumor immune response in different cancer types.

Graphical Abstract:β€ˆhttp://cancerres.aacrjournals.org/content/canres/78/22/6413/F1.large.jpg. Cancer Res; 78(22); 6413–23. Β©2018 AACR.

Immune checkpoint inhibitors have yielded promising results in treating cancer. By targeting the proteins that attenuate T-cell receptor (TCR) signaling following antigen recognition, these drugs can initiate a robust antitumor immune response that induces remission and prolongs survival in subgroups of patients with multiple malignancies (1–7). Despite the early successes of these treatments, response rates to these therapies remain low, creating a need for biomarkers that are capable of identifying responsive patients. Tumor mutation burden has been identified as one such biomarker, with tumors exhibiting high mutation loads more likely to express immunogenic neoantigens that are recognized as non-self by the adaptive immune system (8). This association has been supported across tumor types, with high-mutation tumors yielding higher response rates than those tumors with lower mutation rates (9, 10) and also within tumor types, with responsive patients exhibiting significantly higher mutation burdens (11–16). Given these associations, it is possible that other sources of antigen in the tumor microenvironment function similarly with regards to responses to immunotherapy.

Virus infections are a source of tumor antigen that affect approximately 20% of all human cancers. There are seven viruses that have been implicated as oncogenic, including hepatitis B and C virus (HBV and HCV), human papillomavirus (HPV; with types 16, 18, 33, and 45 the most prevalent; ref. 17), human herpesvirus 4, also known as Epstein–Barr virus (HHV4/EBV), human T-cell lymphotropic virus type I (HTLV-I), Merkel cell polyomavirus (MCV), and human herpesvirus 8, also known as Kaposi's sarcoma virus (HHV8; ref. 18). HPV, HTLV-I, EBV, MCV, and HHV8 directly contribute to oncogenesis by expressing oncogenic proteins encoded in their genome. Conversely, HBV and HCV are involved in indirect carcinogenesis, causing chronic inflammation of the infected organs (19). Like neoantigens, the proteins expressed by these viruses are recognized as foreign by the immune system, with infection associated with increased immune activity in some tumor types (20, 21). Immune checkpoint inhibitors thus provide a potentially effective treatment option for these cancers. However, the role viral proteins play in the magnitude and quality of these responses remains unclear. Two clinical trials testing the anti-PD1 agent pembrolizumab in head and neck cancer and Merkel cell carcinoma found higher response rates among virus-positive patients compared with their virus-negative counterparts (22, 23). Moreover, in Merkel cell carcinoma, MCV-positive patients exhibited significantly lower mutation burdens than virus-negative patients, indicating that viral antigen may be a sufficient immunogen to induce response to anti-PD1 (23). Similar results were shown in gastric cancer, where a study identified an EBV-positive patient responsive to anti–PD-L1 despite having a low tumor mutation burden (24). However, a study in cervical cancer revealed that the antitumor T-cell response in HPV-positive tumors receiving adoptive T-cell therapy was directed against a patient's cancer germline antigens or neoantigens rather than viral antigens (25).

Several studies have now characterized the occurrence of virus infections in thousands of tumor samples from The Cancer Genome Atlas (TCGA; refs. 17, 26). In addition, recent advances in genome-based immune profiling technologies have made it possible to systematically characterize the tumor microenvironment of large cohorts of patients. Several methods exist that use gene expression information to infer the infiltration level of different immune cell types in the tumor microenvironment (21, 27, 28). Other methods have been designed that use sequencing data contained in raw RNA-seq reads to profile the TCR and B-cell receptor repertoires from bulk tumor datasets (29–31). These methods together enable large-scale analyses that are more highly-powered to detect associations than traditional studies using flow cytometry and targeted receptor sequencing-based approaches. This is especially beneficial when studying virus infection in different tumor types, as these infections are exceedingly rare in some contexts.

In this study, we apply these tools to TCGA and other datasets to define how virus infection shapes the immune response in six tumor types: bladder urothelial carcinoma (BLCA), cervical squamous cell carcinoma (CESC), colorectal adenocarcinoma (COADREAD), head and neck squamous cell carcinoma (HNSC), liver hepatocellular carcinoma (LIHC) and stomach adenocarcinoma and esophageal carcinoma (STES). We begin by examining how the infiltration levels of CD8+ T cells, B cells, natural killer (NK) cells, and macrophages vary based on virus infection status. We then perform TCR sequencing on bulk tumor data to determine the extent to which virus infections are associated with clonal T-cell responses. To expand our study beyond datasets that include virus infection information, we develop a gene signature to predict virus infection status in each of the tumor types in our study. We then functionally characterize this signature and apply it in a survival-meta analysis to provide a multidataset consensus on how virus infection affects cancer-specific patient prognosis. Together, these analyses provide novel insights into the altered tumor-immune dynamics associated with virus infection that can be used to refine our understanding of virus-associated tumor immunogenicity.

Datasets

Virus abundance information, quantified as the number of virus supporting reads per hundred million reads processed (RPHM) for 2,343 virus-associated TCGA tumor was downloaded from a Supplementary File from a previous study (17). Of the primary samples included in this dataset, 2,341 had matching RNA-seq gene expression information, which was obtained from GDAC FireHose (RNAseqV2, RSEM). Immune infiltration score calculation was performed on log10-transformed TCGA RNAseqV2 data. For patients with multiple primary samples, the average RSEM value for each gene was taken before log-transformation. Raw RNA-seq paired-end reads (.fastq) for TCGA samples were downloaded from the Genomic Data Commons legacy archive (https://portal.gdc.cancer.gov/legacy-archive). RNA-seq reads were aligned to human reference genome hg19 using Bowtie2 (32) run with default parameters. Somatic mutation annotation format (.maf) files were obtained from GDAC Firehose. Non-silent mutation burden for each sample was defined as the number of variants that were not classified as β€œsilent” in their respective file. MANTIS microsatellite instability (MSI) scores for TCGA samples were downloaded from a previous publication (33). All TCGA sample size information is available as a supplement (Supplementary Table S1). Additional gene expression data and the associated virus infection and survival information were obtained from the gene expression omnibus (GEO) under accession numbers GSE40774, GSE6791, GSE55550, GSE39366, GSE65858, GSE49288, GSE62232, GSE44001, as well as from PRECOG (https://precog.stanford.edu/precog_data.php; Supplementary Table S2; ref. 34).

On the basis of findings from the original study (17), all TCGA samples exhibiting β‰₯100 RPHM for a given virus were classified as infected by that virus. In the event that a sample was positive for multiple viruses, we classified the sample as positive for the virus that exhibited the highest RPHM. For MSI, binary thresholds were determined from the distribution of MANTIS scores in each cancer type, with the default of 0.4 used for STES and 0.5 used for COADREAD.

Calculation of immune infiltration scores and T-cell receptor profiling

Immune infiltration scores for CD8+ T cells, B cells, NK cells, and macrophages were calculated as described previously using the same four previously validated signatures derived from the Immunological Genome Project (27). Enrichment scores for additional immune cells were calculated from a previously published set of gene signatures (35) using single sample gene set enrichment analysis (ssGSEA) from the R GSVA package (36). T-cell receptor profiling was run on Bowtie2-aligned TCGA RNA-seq reads (.bam) using TCR Repertoire Utilities in Solid Tissue (TRUST) version 3.0 (https://bitbucket.org/liulab/trust/; ref. 29). During the alignment step, TRUST requires for unmapped reads to be included, local alignment to be disabled, and for the number of mismatches tolerated from mapped reads to be no greater than 2, all default parameters of Bowtie2. TCR clonotype diversity was estimated using the clonotypes per thousand mapped TCR reads for each sample. This number was calculated from TRUST's output by dividing each sample's number of unique CDR3 reads (denoted as β€œcdr3dna” in TRUST's tabular output) by its total number of TCR reads (denoted as β€œest_lib_size”) divided by 1,000. ESTIMATE immune scores were calculated using the ESTIMATE R package (http://bioinformatics.mdanderson.org/estimate/index.html).

Derivation and application of the virus infection gene-expression signature

The virus infection gene-expression signature was designed to capture the transcriptome-wide differential gene-expression activity between virus-positive and virus-negative patients within a given TCGA tumor type. To define this signature, a logistic regression model was constructed for each gene in the transcriptome with a patient's virus infection status as the response variable and the expression level of that gene as the predictor variable. To ensure the signature most accurately captured the difference between virus-positive and virus-negative samples, potential confounding factors, including stage, age, grade, lymph node metastasis status, and MSI status were also included in the model as covariates. A covariate was only included in a specific cancer type's model when less than half of the samples of that cancer type had NA values for that covariate, and in the case of categorical variables, when fewer than 80% of samples were the same category for that variable (Supplementary Table S3). The model used for each gene can be formulized below, where Y is a patient's virus infection status (1 indicating positive, 0 indicating negative), X1 is the expression of the gene under consideration and X2 through Xn are the n-1 covariates:

From these logistic regression models, we derived each gene's Ξ²-coefficient, which indicated whether the gene was up- or downregulated in virus-positive samples compared with virus-negative samples, as well as the P value of the association between each gene's expression level and virus infection status. We then used these statistics to define the final virus infection gene-expression signature as a set of gene-specific weight profiles that indicated the magnitude and directionality (upregulated or downregulated) of the association between a gene's expression level and a patient's virus infection status. In the upregulated weight profile, the P values of all genes with a Ξ²-coefficient >0 were βˆ’log10-transformed and the remainder were set to 0, whereas in the downregulated weight profile, the P values of all genes with a Ξ²-coefficient <0 were βˆ’log10-transformed while the remaining P values were set to 0. The resulting numbers >10 were trimmed to 10 to avoid outliers and then all numbers were rescaled to between 0 and 1 to obtain the final weight signatures.

To calculate the virus infection scores in a series of patients, these signatures along with a patient gene expression dataset were input into the Binding Association with Sorted Expression (BASE) algorithm (37). Briefly, BASE functions by examining how the weights of each input signature are distributed through a patient's ranked gene-expression profile. Patients that exhibit high expression of genes with high upregulated weights and low expression of genes with high downregulated weights are assigned a higher score for that signature, whereas patients deviating from this pattern are assigned a lower score. Full details on how BASE uses these signatures to calculate the score are available in a previous study (38). The virus signatures formatted for BASE input and instructions for applying them are available as a supplement (Supplementary Table S3).

To evaluate the accuracy of the signature, we ranked each patient by their virus infection scores and then performed an iterative procedure where each patient's score in the ranked list was used as a threshold by which to classify patients as virus-positive or virus-negative. For each iteration, we calculated the sensitivity and specificity of the resulting classification and then used these numbers to calculate the AUC. Confidence intervals (CI) were determined using an AUC distribution calculated from 200 bootstrapped samples of the respective dataset.

Statistical analyses

All comparisons of the distributions between two groups were made using the two-tailed Wilcoxon-sum rank test. Logistic regression modeling to derive the virus signature was performed using the β€œglm()” function in R with family β€œbinomial.” Regularized logistic regression modeling was performed using the β€œcv.glmnet()” function from the R glmnet package with family β€œbinomial.” For two-class survival comparisons, samples were stratified into high and low groups based on whether they were above or below the median virus score. Statistical significance of the difference between these survival distributions was calculated using the log-rank test through the β€œsurvdiff()” function from the R survival package. Cox proportional hazards regression was used to model the association between virus score as a continuous variable and patient survival, and was performed using the β€œcoxph()” function from the R survival package. For meta-analyses, P values derived from Wilcoxon sum-rank tests or Cox proportional hazards models were converted to z-scores. The z-scores were then collapsed into meta-z-scores by applying an unweighted version of Stouffer's method (39).

Virus infection is associated with an altered tumor microenvironment

To obtain a global overview of how the tumor microenvironment differs in relation to virus infection, we applied our previously developed computational framework (27) to infer infiltration levels of CD8+ T cells, B cells, NK cells, and macrophages from the RNA-seq gene expression profiles of six virus-associated TCGA tumor types (Supplementary Table S1). We validated the resulting infiltration scores using a published set of lymphocyte fractions inferred from corresponding TCGA hematoxylin and eosin–stained tissue slides (Supplementary Table S4; ref. 40). We then stratified patients from each tumor type based on whether they were positive for reads from at least one virus, as determined by a prior study (Fig. 1A; ref. 17). In 5/6 tumor types, infection was associated with elevated levels of CD8+ T-cell infiltration, with CESC and HNSC exhibiting significant differences (P = 0.02 and 4eβˆ’5, respectively). We observed similar trends for B cells, with significant differences in BLCA, CESC, and HNSC (P = 0.01, 0.01, and 3eβˆ’8, respectively), as well as for NK cells, with significant elevations in the virus-positive samples of CESC, HNSC, and STES (P = 3eβˆ’3, 0.03, and 6eβˆ’4, respectively). Conversely, we observed significantly reduced abundance of macrophages when comparing HNSC samples (P = 3eβˆ’5). An independent set of immune gene-expression signatures (35) revealed similar results, while also suggesting that virus-positive COADREAD and STES samples harbored significantly higher levels of immunosuppressive T-regulatory and myeloid-derived suppressor cells (Supplementary Fig. S1).

Figure 1.

Differences in immune infiltration levels between virus-infected and noninfected samples. A, Boxplots depicting the distribution of immune cell infiltration scores across samples from six cancer types stratified by virus infection status. Dark colors indicate virus-infected samples and light colors indicate noninfected samples. Each box spans quartiles, with the lines representing the median infiltration score for each group. Whiskers represent absolute range, excluding outliers. All outliers were included in the plot. Significant associations are marked (*, P < 0.05). B, Heatmap marking significant differences in immune infiltration scores between samples infected with noted viruses and noninfected samples. All viruses infecting more than one patient in the denoted tumor type are shown. Red, significant increases in infected samples (P < 0.05); green, significant decreases (P < 0.05); gray, no significant difference (P > 0.05). All P values were calculated using the Wilcoxon sum-rank test.

Figure 1.

Differences in immune infiltration levels between virus-infected and noninfected samples. A, Boxplots depicting the distribution of immune cell infiltration scores across samples from six cancer types stratified by virus infection status. Dark colors indicate virus-infected samples and light colors indicate noninfected samples. Each box spans quartiles, with the lines representing the median infiltration score for each group. Whiskers represent absolute range, excluding outliers. All outliers were included in the plot. Significant associations are marked (*, P < 0.05). B, Heatmap marking significant differences in immune infiltration scores between samples infected with noted viruses and noninfected samples. All viruses infecting more than one patient in the denoted tumor type are shown. Red, significant increases in infected samples (P < 0.05); green, significant decreases (P < 0.05); gray, no significant difference (P > 0.05). All P values were calculated using the Wilcoxon sum-rank test.

Close modal

We next examined how the type of virus affected theses associations, comparing patients infected with a specific virus to virus-negative patients with the same cancer (Fig. 1B). Although this resulted in small cohorts of samples positive for some rare viruses, most tumor types had sufficient numbers of infections for at least one virus to detect significant differences in immune infiltrate. We found that in CESC, numerous viruses were associated with elevated infiltration levels, with the most common virus, HPV16, associated with significantly higher levels of CD8+ T cells, B cells, and NK cells. In HNSC, which also exhibits frequent HPV16 infection, we observed similar associations, including increased CD8+ T-cell, B-cell, and NK cell infiltration as well as decreased macrophage infiltration. In STES, where samples were most frequently infected by EBV, we observed significantly elevated CD8+ T-cell and NK cell levels and decreased levels of B-cell infiltration in EBV-positive samples. Interestingly, these associations were reversed in HBV-infected LIHC, with HBV-positive samples exhibiting significantly lower levels of CD8+ T-cell and NK cell infiltration. These samples also exhibited decreased expression of HLA-class I genes, suggesting the loss of antigen presentation machinery in these cancers (P = 0.06, 0.03, and 0.01 for HLA-A, HLA-B, and HLA-C, respectively; Supplementary Fig. S2). These divergent results indicate that viruses of different families alter the tumor microenvironment of the samples they infect in distinct ways.

To confirm that the virus-associated changes in immune infiltration were not due to altered neoantigen abundance, we compared how non-silent mutation burden differed by general virus status and by infection with specific viruses (Supplementary Fig. S3). In most contexts, we found that mutation load was not associated with virus infection. However, in HNSC we found virus infection to be associated with a significantly lower mutation load (P = 0.05), despite being associated with higher levels of immune infiltration. We observed similar trends in STES, where EBV infection was associated with a significantly lower mutation load (P = 6eβˆ’3) and in LIHC, where HBV-infection was associated with a higher mutation load (P = 0.03). These inverse associations between mutation burden and immune infiltration indicated that virus-based differences in immune infiltration were not due to neoantigen abundance, and also were suggestive of virus-associated immunoediting processes.

Virus-associated immune infiltration differences are confounded by MSI

MSI is a condition associated with a dramatically higher mutation burden that is a result of defects in the DNA mismatch repair pathway. This condition is especially prevalent in colorectal, gastric, and endometrial tumor types (41) and has been shown to associate with high levels of CD8+ T-cell infiltration (27). We hypothesized that differences in immune infiltration between virus-positive and virus-negative COADREAD and STES samples may have been confounded by MSI-based differences in immune infiltration. To assess this, we stratified patients from the COADREAD and STES cohorts into four groups depending on their MSI and virus infection status. We found that in both cancers the majority of virus-positive samples were in the microsatellite stable (MSS) subgroup, with this subgroup containing 31/33 virus-positive samples in COADREAD and 50/55 samples in STES. In COADREAD, MSI samples exhibited significantly higher levels of CD8+ T-cell infiltration than either the virus-negative or virus-positive MSS groups (P = 8eβˆ’15 and 7eβˆ’4, respectively; Fig. 2). Within the MSS samples, the virus-positive group exhibited moderate increases in CD8+ T-cell infiltration (P = 0.10; Fig. 2) and non-silent mutation load (P = 0.07) compared with their virus-negative counterparts. In STES these associations were weaker; MSI samples exhibited significantly higher levels of CD8+ T-cell infiltration than virus-negative/MSS samples but not virus positive/MSS samples (P = 3eβˆ’3 and 0.32, respectively; Fig. 2), whereas within the MSS group, virus-positive samples exhibited insignificantly higher levels of CD8+ T cells (P = 0.15; Fig. 2) and lower mutation loads compared with virus-negative samples (P = 0.24). These results provided preliminary evidence that MSI status was confounding virus-based immune associations in COADREAD and possibly STES. However, unlike in other cancers, increases in immune infiltration in virus-positive COADREAD may have been related to increased neoantigen abundance.

Figure 2.

CD8+ T-cell infiltration after adjusting for MSI and virus infection status. Boxplots depicting the distribution of CD8+ T-cell infiltration scores across COADREAD and STES samples stratified by MSI status and virus infection status. Dark colors indicate virus-infected samples and light colors indicate noninfected samples. Each box spans quartiles, with the lines representing the median CD8+ T-cell infiltration score for each group. Whiskers represent absolute range, excluding outliers. All outliers were included in the plot. P values were calculated using the Wilcoxon sum-rank test. Significant associations are marked (⁁, P ≀ 0.10; ***, P < 0.01).

Figure 2.

CD8+ T-cell infiltration after adjusting for MSI and virus infection status. Boxplots depicting the distribution of CD8+ T-cell infiltration scores across COADREAD and STES samples stratified by MSI status and virus infection status. Dark colors indicate virus-infected samples and light colors indicate noninfected samples. Each box spans quartiles, with the lines representing the median CD8+ T-cell infiltration score for each group. Whiskers represent absolute range, excluding outliers. All outliers were included in the plot. P values were calculated using the Wilcoxon sum-rank test. Significant associations are marked (⁁, P ≀ 0.10; ***, P < 0.01).

Close modal

Infection of EBV is associated with decreased T-cell receptor repertoire diversity

We next examined how infection of specific viruses associates with different aspects of the infiltrating TCR repertoire. We hypothesized that the presence of viral proteins in the tumor microenvironment may be associated with a high proportion of viral antigen-specific T-cell clones, and thus a less diverse TCR repertoire. To test this hypothesis, we employed TRUST (29) to call TCR-specific reads from bulk RNA-seq reads for each TCGA tumor type in our study. In all tumor types, TCR read abundance was well-correlated with our expression-based measures of CD8+ T-cell infiltration (average Spearman ρ = 0.72; range, 0.57–0.81). In addition, differences in TCR read abundance between infected and noninfected samples was largely consistent with our previous findings, indicating concordance between the computational methods chosen for our infiltration analyses (Supplementary Fig. S4). We defined the diversity of each patient's TCR repertoire as the number of unique clonotypes per thousand mapped TCR reads and compared how this metric differed between infected and noninfected samples across each tumor (Fig. 3A). In most cases, there were no significant differences in diversity between subgroups. However, in STES, samples infected with EBV exhibited significantly lower levels of TCR diversity, indicating an antigen-driven T-cell response (P = 1eβˆ’3). Furthermore, in the three other tumor types with at least one EBV infection, EBV-infected samples exhibited the lowest median levels of TCR diversity compared with the other patient subgroups. To provide more power to these analyses, we pooled the associations for each virus across cancer types and determined significance using a meta-Z-score approach (Fig. 3B). EBV remained the only virus associated with differing levels of TCR diversity, with a meta-Z-score of βˆ’3.81 that corresponded to a significant decrease (two-tailed meta–P-value = 1eβˆ’4). These results indicated that the presence of EBV proteins may elicit a clonal T-cell response in different tumor types.

Figure 3.

T-cell receptor repertoire diversity between samples infected with different viruses. A, Boxplots depicting the distribution of unique CDR3 calls (clonotypes) per 1,000 TCR reads in samples from six tumor types infected with different viruses. Each box spans quartiles, with the lines representing the median clonal diversity for each group. Whiskers represent absolute range, excluding outliers. All outliers are included in the plot. P values were calculated using the Wilcoxon sum-rank test. Significant associations are marked (*, P < 0.05). B, Meta-Z-score absolute values, indicating associations between infection of a given virus and altered TCR clonal diversity across 6 tumor types. Viruses were ranked by unweighted meta-Z-score. Green bars indicate an unweighted meta-Z-score <βˆ’1.96 (significantly lower TCR clonal diversity; two-tailed P < 0.05), whereas gray bars indicate an unweighted meta-Z-score whose absolute value is <1.96.

Figure 3.

T-cell receptor repertoire diversity between samples infected with different viruses. A, Boxplots depicting the distribution of unique CDR3 calls (clonotypes) per 1,000 TCR reads in samples from six tumor types infected with different viruses. Each box spans quartiles, with the lines representing the median clonal diversity for each group. Whiskers represent absolute range, excluding outliers. All outliers are included in the plot. P values were calculated using the Wilcoxon sum-rank test. Significant associations are marked (*, P < 0.05). B, Meta-Z-score absolute values, indicating associations between infection of a given virus and altered TCR clonal diversity across 6 tumor types. Viruses were ranked by unweighted meta-Z-score. Green bars indicate an unweighted meta-Z-score <βˆ’1.96 (significantly lower TCR clonal diversity; two-tailed P < 0.05), whereas gray bars indicate an unweighted meta-Z-score whose absolute value is <1.96.

Close modal

To corroborate these findings, we obtained TCR diversity metrics calculated by the MiTCR TCR analysis software from a previous study (31, 42) and compared how each of these metrics, which included Shannon entropy, species richness, and species evenness, varied between samples infected by a given virus and those with no infections (Supplementary Table S5). In agreement with our previous analyses, we found that Shannon entropy and species richness were increased in numerous contexts, such as in HPV-infected CESC and HNSC samples as well as EBV-infected STES samples, suggesting increased levels of T-cell infiltration. Furthermore EBV-infected STES samples exhibited a moderate reduction in TCR evenness that was indicative of a clonal T-cell response (P = 0.08). Together, the MiTCR and TRUST results reinforced each other, providing additional evidence of the altered T-cell dynamics associated with virus infection.

Tissue-specific virus infection gene signatures reproducibly predict infection status

Although some gene-expression datasets contain immunohistochemistry and sequencing information that can inform a patient's virus infection status, several do not, making it difficult to further study how virus infection associates with different clinical variables. To address this issue, we devised a method to create a gene-expression signature that could predict a virus infection status. To design this signature, we applied a previously developed approach that weighted each gene in the transcriptome based on how well the gene's expression level distinguished virus-infected patients from noninfected patients in a generalized linear model (38). Genes that more significantly differed between a virus-infected and noninfected sample were given exponentially more weight in the signature than genes that did not significantly distinguish between the two groups of samples. To minimize the effect of confounding on the creation of this signature, each gene's model was adjusted for age, stage, grade, lymph node metastasis status, and MSI status when possible.

We first applied this approach to the TCGA HNSC dataset, comparing samples positive for any virus to those negative for all viruses. We then applied the signature back to the TCGA HNSC dataset to calculate a virus infection score for each patient and used these scores to classify whether a sample was virus-positive or virus-negative. The resulting classifications yielded an area under the receiver operating characteristic curve (AUC) of 0.92 (95% CI, 0.87–0.96), indicating high classification accuracy (Fig. 4A). To confirm that this performance was not a result of overfitting in the TCGA dataset, we applied the TCGA-derived signature to a series of additional microarray datasets that also contained clinical measures of virus infection status (Supplementary Table S2). In these datasets, the signature maintained excellent accuracy, yielding AUCs ranging from 0.81 to 0.95 (Fig. 4A; Supplementary Fig. S5). This performance was comparable to that from a regularized logistic regression model, but at a higher consistency across datasets (Supplementary Table S6).

Figure 4.

Performance and characterization of the virus infection gene-expression signature. A, ROC curves illustrating the accuracy of using the virus gene-expression signature to classify infected samples from noninfected samples. Plots on the left depict the signature's performance in training data, whereas plots on the right depict the signature's performance in test datasets. Gray lines and numbers in parentheses in the left plots indicate the 95% bootstrap confidence intervals of the ROC curves and their respective AUCs. From top to bottom, test datasets were obtained from GEO under accession numbers GSE40774, GSE49288, and GSE62232. B, Heatmap of AUCs for signatures trained in one tissue type (columns) and applied to another (rows). To show contrast, all AUCs < 0.5 were trimmed to 0.5. C, Boxplots depicting the weight of the gene MKI67 (black diamond), CD274 (large black circle), and the weight distribution of genes comprising the ESTIMATE immune gene-expression signature (boxes). Dotted lines at 0.13 indicate threshold at which weights correspond to P < 0.05. Each box spans quartiles, with the lines representing the median signature weight in each group. Whiskers represent absolute range, excluding outliers. All outliers were included in the plot.

Figure 4.

Performance and characterization of the virus infection gene-expression signature. A, ROC curves illustrating the accuracy of using the virus gene-expression signature to classify infected samples from noninfected samples. Plots on the left depict the signature's performance in training data, whereas plots on the right depict the signature's performance in test datasets. Gray lines and numbers in parentheses in the left plots indicate the 95% bootstrap confidence intervals of the ROC curves and their respective AUCs. From top to bottom, test datasets were obtained from GEO under accession numbers GSE40774, GSE49288, and GSE62232. B, Heatmap of AUCs for signatures trained in one tissue type (columns) and applied to another (rows). To show contrast, all AUCs < 0.5 were trimmed to 0.5. C, Boxplots depicting the weight of the gene MKI67 (black diamond), CD274 (large black circle), and the weight distribution of genes comprising the ESTIMATE immune gene-expression signature (boxes). Dotted lines at 0.13 indicate threshold at which weights correspond to P < 0.05. Each box spans quartiles, with the lines representing the median signature weight in each group. Whiskers represent absolute range, excluding outliers. All outliers were included in the plot.

Close modal

To examine whether our signature-based procedure could be used to classify additional tumor types, we derived and tested signatures in two more cancers, CESC and LIHC, for which there were suitable test datasets for validation. Each tumor type's respective signature exhibited high accuracy in the dataset from which it was derived (AUC = 0.80 and 0.84 for CESC and LIHC, respectively, Fig. 4A). In CESC, this performance also translated to an independent dataset (AUC = 0.91). However, in LIHC, the signature performed demonstrably worse in the test dataset (AUC = 0.62), though virus-positive samples still had significantly higher signature scores than virus-negative samples (P = 0.04). These analyses together served as a proof-of-principle that virus status could be inferred from expression information in multiple tumor types.

Given the associations between virus infection and increased immune infiltration across different tumor types, we hypothesized that it might be possible to predict virus infection using signatures derived from a tumor type different from the cancer to which they were applied. Successful prediction of infection status in a tissue-agnostic manner would suggest a shared biology between tumor types following virus infection. To determine the extent to which this was true, we derived a virus infection signature from each of the six virus-associated TCGA tumor types (Supplementary Table S3) and then assessed each signature's performance in cross-tissue classifications using the AUC (Fig. 4B). We found unsurprisingly that each signature performed best in the tissue type it was derived from, with AUCs ranging from 0.76 (STES) to 0.94 (BLCA), and a mean of 0.84. However, we also identified four cross-tissue predictions that yielded strong AUCs, with the BLCA signature performing well in CESC (AUC = 0.78), the CESC signature performing well in BLCA and HNSC (AUC = 0.83 and 0.82, respectively), and the STES signature performing well in HNSC and CESC (AUC = 0.72 and 0.70, respectively).

Virus infection gene signatures are associated with proliferation and immune functions

Functionally, virus infection has been contextually associated with the induction of proliferation-associated expression programs and increased immune infiltration (19). Thus, to better characterize the signals detected by the virus signature, we examined how heavily each signature weighed the single-gene proliferation marker, marker of cell proliferation KI67 (MKI67) and the multi-gene ESTIMATE immune signature (Fig. 4C; ref. 43). We found that MKI67 was weighted most highly in BLCA, HNSC, and LIHC, with weights corresponding to significantly higher proliferation rates in virus-infected cancers (P = 2eβˆ’4, 3eβˆ’3, and 0.01, respectively). In addition, at least 40% of the ESTIMATE genes in the CESC, COADREAD, HNSC, and STES signatures were weighted at levels that corresponded to significantly higher immune infiltration levels in virus-infected samples (P < 0.05). In BLCA and LIHC, this number was 10% and 4%, respectively. Considering that the virus infection score is exponentially more influenced by significant genes than insignificant ones, these results suggested that the CESC, COADREAD, HNSC, and STES signatures are primarily influenced by immune genes, whereas the BLCA signature detected a combination of immune and proliferative signals and the LIHC signature was primarily proliferative.

We additionally examined how the immune checkpoint protein and biomarker of immune checkpoint blockade response, PD-L1 (CD274) was weighted in each of the virus signatures, as this could provide insights into the virus-positive cancers most likely to respond to immune checkpoint blockade. We found that CD274 was most highly weighted in the BLCA, CESC, COADREAD, and STES signatures (P = 0.01, 5eβˆ’4, 4eβˆ’6, and 6eβˆ’6, respectively; Fig. 4C), suggesting that virus-positive samples from these cancers may be more likely to respond to immunotherapy.

Expression-inferred virus status is differentially associated with survival across cancers

We questioned how the altered expression programs associated with virus infection affect patient survival. To answer this question, we applied each virus infection signature to the microarray datasets of the same cancer type in the Prediction of Clinical Outcomes from Genomics (PRECOG) meta-dataset (34). We then modeled the relationship between virus score and patient survival in each dataset using a Cox proportional hazards model and pooled the resulting z-scores to get a summary statistic capturing the association between virus infection and patient survival in each tumor type (Supplementary Fig. S6). We identified two tumor types, HNSC and BLCA, that exhibited significant meta-associations between tissue-specific virus score and patient survival (meta-P < 0.05). In HNSC, this score was reproducibly associated with prolonged patient prognosis. However, in BLCA, this trend was reversed with the BLCA-specific virus score reproducibly associated with shorter survival (Supplementary Table S7). To determine whether these associations were due to the immune- or proliferation-associated programs captured by each tissue's respective virus signature, we dichotomized the samples in one dataset from each tumor type into signature-high and signature-low groups and then examined how MKI67 expression and the ESTIMATE immune score differed between them. Notably, these datasets did not include gold standard virus infection information, demonstrating the utility of our signature in expanding genomic studies of virus infection. In HNSC, we found that signature-high patients exhibited significantly prolonged survival times (Fig. 5A; P = 3eβˆ’3) and significantly higher ESTIMATE immune scores (Fig. 5B; P = 4eβˆ’3) compared with the signature-low group. However, these groups did not differ in their MKI67 expression levels (Fig. 5B). To confirm the validity of these associations, we replicated them in an additional HNSC dataset that lacked survival information but had gold-standard virus infection information (Supplementary Fig. S7). Together, these results suggested that virus infection in HNSC can induce a prolonged patient survival phenotype by inducing a tumor immune response. In BLCA, signature-high patients exhibited significantly shorter survival times (Fig. 5C; P = 5eβˆ’5), as well as higher levels of MKI67 expression (P = 3eβˆ’4) and ESTIMATE immune signature scores (Fig. 5D; P = 1eβˆ’5). This finding indicated that virus infections in BLCA could induce both increased immune infiltration and increased cell proliferation. However, the association between virus infection and shorter survival suggested that the increased proliferation of the tumor cells overwhelms the ability of the immune system to keep tumor development in check.

Figure 5.

Association between the virus infection gene-expression signature and survival of patients with head and neck cancer and bladder cancer. A, Kaplan–Meier plot depicting the survival probability over 5 years for samples with high (red) and low (blue) virus infection signature scores in the Thurlow et al. head and neck cancer dataset (52). B, Boxplots depicting the difference in MKI67 expression (left) and ESTIMATE immune score (right) between signature low and signature high samples in the Thurlow et al. dataset (52). C, Kaplan–Meier plot depicting the survival probability over 5 years for samples with high (red) and low (blue) virus infection signature scores in the Kim et al. bladder cancer dataset (53). D, Boxplots depicting the difference in MKI67 expression (left) and ESTIMATE immune score (right) between signature low and signature high samples in the Kim et al. dataset (53). For all Kaplan–Meier plots, samples were stratified into high and low groups using the median virus infection score. P values were calculated using the log-rank test and indicate difference between the survival distributions of the full dataset. Vertical hash marks indicate censored data. In all boxplots, boxes span quartiles, with the lines representing the median expression or score for each group. Whiskers represent absolute range, excluding outliers. All outliers were included in the plot. P values were calculated using the Wilcoxon sum-rank test.

Figure 5.

Association between the virus infection gene-expression signature and survival of patients with head and neck cancer and bladder cancer. A, Kaplan–Meier plot depicting the survival probability over 5 years for samples with high (red) and low (blue) virus infection signature scores in the Thurlow et al. head and neck cancer dataset (52). B, Boxplots depicting the difference in MKI67 expression (left) and ESTIMATE immune score (right) between signature low and signature high samples in the Thurlow et al. dataset (52). C, Kaplan–Meier plot depicting the survival probability over 5 years for samples with high (red) and low (blue) virus infection signature scores in the Kim et al. bladder cancer dataset (53). D, Boxplots depicting the difference in MKI67 expression (left) and ESTIMATE immune score (right) between signature low and signature high samples in the Kim et al. dataset (53). For all Kaplan–Meier plots, samples were stratified into high and low groups using the median virus infection score. P values were calculated using the log-rank test and indicate difference between the survival distributions of the full dataset. Vertical hash marks indicate censored data. In all boxplots, boxes span quartiles, with the lines representing the median expression or score for each group. Whiskers represent absolute range, excluding outliers. All outliers were included in the plot. P values were calculated using the Wilcoxon sum-rank test.

Close modal

Virus-induced cancers are distinct from other tumor types in that they are the result of the actions of an infectious agent rather than a mutagenic process. Upon infection, viruses can induce the expression and release of pathogenic proteins into the tumor microenvironment, making the cancers they infect an interesting model for which to investigate the tumor immune response. In this study, we have profiled the tumor microenvironment and genetic programs associated with virus infection in six cancer types. Our analyses provide new insights into how viruses reshape the tumor microenvironment, identifying differences in immune infiltration and TCR diversity in infected compared with non-infected samples. In addition, our work furthers understanding of how viruses affect tumor proliferation and patient survival by expanding genomic studies of virus infections into additional datasets. Collectively, these results may aid in efforts to identify virus-associated cancer patient subgroups that are sensitive to immunotherapy.

We identified three cancer types, CESC, HNSC, and STES, where the microenvironment of virus-positive samples was consistently more infiltrated than that of their noninfected counterparts. When adjusting for MSI status, we found COADREAD samples showed similar trends. In many cases, virus-positive samples from these cancers exhibited significantly lower non-silent mutation loads, suggesting that the immune system may have been shaping their evolution through immunoediting. These tumor types are primarily infected by human papillomaviruses and human herpesviruses, which are both known to be involved in direct carcinogenesis. Several studies have characterized the immunogenic nature of these viral proteins, and our results indicate that the presence of these antigens in the tumor microenvironment can elicit a strong immune response (44–48). In contrast with these findings, these associations were not present in hepatitis-infected LIHC, despite the fact that the power of our analysis was comparable with that of other cancer types. Hepatitis viruses cause chronic infections of the liver, leading to persistent inflammation that can contribute to the deregulation of the immune response and T-cell exhaustion (49, 50). Our results supported these findings, with hepatitis B virus-infected patients exhibiting decreases in the infiltration of cytolytic cells, downregulation of antigen presentation machinery, and elevated mutation loads, all hallmarks of an immunosuppressed microenvironment. Together, our results illustrate how differences in viral prevalence across different tissue types may shape our understanding of the tumor microenvironment of the cancer at the population level. Virus infection should thus be carefully considered when studying the tumor–immune interactions of these cancers.

To better understand how viral proteins can alter the adaptive immune response in these cancers, we examined how the clonal dynamics of the infiltrating TCR repertoire varied across cancers infected with different viruses. Interestingly, EBV-infected samples exhibited the lowest TCR diversity levels in STES and other tumor types, suggesting evidence of a clonal T-cell response. Intriguingly, a recent study has identified a gastric cancer patient positive for EBV that exhibited clinical benefit in response to the anti–PD-L1 drug avelumab (24). Furthermore, this study found, in agreement with our findings, that EBV-positive patients exhibit increased immune infiltration. Given that high immune infiltration and low TCR diversity have both been associated with response to immune checkpoint blockade (14, 51), our analyses suggest that EBV-positive cancers may represent good targets for future immunotherapeutic approaches. Beyond EBV, we observed no significant differences in TCR diversity between samples infected with a given virus and noninfected samples. Although surprising, this result is consistent with a study in CESC, which found that the primary antitumor T-cell responses in patients receiving adoptive T-cell therapy are directed against tumor-intrinsic neoantigens or cancer germline antigens, rather than viral antigens (25). Going forward, analyses performing targeted TCR sequencing on patient samples may further our understanding of how virus infection alters the clonal dynamics of the T-cell response. This will be important, as our early results suggest that improving our understanding of this process can further refine our ability to identify immunotherapy-sensitive patients.

Studying the effects of virus infection on the tumor microenvironment is inherently challenging as virus infection only occurs in a small subset of most cancer types. To complement our study, and increase the power of future ones, we developed a series of tissue-specific virus infection signatures capable of predicting a patient's virus infection status from their gene expression profile. These signatures were highly accurate in a variety of contexts, with virus-infected patients exhibiting significantly higher scores than noninfected patients, and thus enabled the expansion of studies into independent gene expression datasets lacking virus infection information. Functional characterization of these signatures revealed that the CESC, COADREAD, HNSC, and STES were associated with immune-based signals, while BLCA and LIHC were more indicative of higher cell proliferation rates. These results suggested that viruses of different families are associated with distinct transcriptomic programs in the tumors they infect. Survival analyses using these signatures supported this finding, with virus-high HNSC patients exhibiting a prolonged prognosis and high levels of immune infiltration while virus-high BLCA patients exhibiting shorter survival and higher cell proliferation rates, despite also having a higher immune infiltration level. The similarities and differences in the signatures across cancer types are likely due to the viruses that are most prevalent in them, and our results thus indicate the importance of understanding how infections by different viruses can affect tumor development.

In summary, we present a multi-tissue analysis of the microenvironment and genetic changes associated with virus infection. Our results highlight the divergent changes associated with virus infection in different tumor types and suggest that viruses of different families should be treated as unique features with regards to the design of immunotherapeutic approaches. Going forward, it will be important to collect genomic data and virus infection information from additional patients, as in several cancer types infection is exceedingly rare and power to detect significant associations is limited. By integrating this information with findings from previous studies, we can begin to fully appreciate the role of viruses in driving tumor progression and harness this knowledge for therapeutic benefit.

No potential conflicts of interest were disclosed.

Conception and design: F.S. Varn, C. Cheng

Development of methodology: F.S. Varn, C. Cheng

Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): F.S. Varn, E. Schaafsma, Y. Wang, C. Cheng

Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): F.S. Varn, E. Schaafsma, Y. Wang, C. Cheng

Writing, review, and/or revision of the manuscript: F.S. Varn, E. Schaafsma, Y. Wang, C. Cheng

Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): F.S. Varn, C. Cheng

Study supervision: C. Cheng

We thank Randolph Noelle for his valuable suggestions in editing this article as well as John Hudson for his aid in the computing process. This work was supported by the National Institutes of Health under award number 1R01CA547195, the National Center for Advancing Translational Sciences of the National Institutes of Health under award number UL1TR001086, the Norris Cotton Cancer Center Core Grant Developmental Funds, and the Geisel School of Medicine at Dartmouth College start-up funding package (to C. Cheng).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Pardoll
β€ˆ
DM
.β€ˆ
The blockade of immune checkpoints in cancer immunotherapy
.
Nat Rev Cancer
β€ˆ
2012
;
12
:
252
–
64
.
2.
Alsaab
β€ˆ
HO
,
Sau
β€ˆ
S
,
Alzhrani
β€ˆ
R
,
Tatiparti
β€ˆ
K
,
Bhise
β€ˆ
K
,
Kashaw
β€ˆ
SK
, et al
β€ˆ
PD-1 and PD-L1 checkpoint signaling inhibition for cancer immunotherapy: mechanism, combinations, and clinical outcome
.
Front Pharmacol
β€ˆ
2017
;
8
:
561
.
3.
Robert
β€ˆ
C
,
Long
β€ˆ
GV
,
Brady
β€ˆ
B
,
Dutriaux
β€ˆ
C
,
Maio
β€ˆ
M
,
Mortier
β€ˆ
L
, et al
β€ˆ
Nivolumab in previously untreated melanoma without BRAF mutation
.
N Engl J Med
β€ˆ
2015
;
372
:
320
–
30
.
4.
Borghaei
β€ˆ
H
,
Paz-Ares
β€ˆ
L
,
Horn
β€ˆ
L
,
Spigel
β€ˆ
DR
,
Steins
β€ˆ
M
,
Ready
β€ˆ
NE
, et al
β€ˆ
Nivolumab versus docetaxel in advanced nonsquamous non–small cell lung cancer
.
N Engl J Med
β€ˆ
2015
;
373
:
1627
–
39
.
5.
Garon
β€ˆ
EB
,
Rizvi
β€ˆ
NA
,
Hui
β€ˆ
R
,
Leighl
β€ˆ
N
,
Balmanoukian
β€ˆ
AS
,
Eder
β€ˆ
JP
, et al
β€ˆ
Pembrolizumab for the treatment of non–small cell lung cancer
.
N Engl J Med
β€ˆ
2015
;
372
:
2018
–
28
.
6.
Motzer
β€ˆ
RJ
,
Escudier
β€ˆ
B
,
McDermott
β€ˆ
DF
,
George
β€ˆ
S
,
Hammers
β€ˆ
HJ
,
Srinivas
β€ˆ
S
, et al
β€ˆ
Nivolumab versus everolimus in advanced renal-cell carcinoma
.
N Engl J Med
β€ˆ
2015
;
373
:
1803
–
13
.
7.
Hodi
β€ˆ
FS
,
O'Day
β€ˆ
SJ
,
McDermott
β€ˆ
DF
,
Weber
β€ˆ
RW
,
Sosman
β€ˆ
JA
,
Haanen
β€ˆ
JB
, et al
β€ˆ
Improved survival with ipilimumab in patients with metastatic melanoma
.
N Engl J Med
β€ˆ
2010
;
363
:
711
–
23
.
8.
Schumacher
β€ˆ
TN
,
Schreiber
β€ˆ
RD
.β€ˆ
Neoantigens in cancer immunotherapy
.
Science
β€ˆ
2015
;
348
:
69
–
74
.
9.
Turajlic
β€ˆ
S
,
Litchfield
β€ˆ
K
,
Xu
β€ˆ
H
,
Rosenthal
β€ˆ
R
,
McGranahan
β€ˆ
N
,
Reading
β€ˆ
JL
, et al
β€ˆ
Insertion-and-deletion-derived tumour-specific neoantigens and the immunogenic phenotype: a pan-cancer analysis
.
Lancet Oncol
β€ˆ
2017
;
18
:
1009
–
21
.
10.
Goodman
β€ˆ
AM
,
Kato
β€ˆ
S
,
Bazhenova
β€ˆ
L
,
Patel
β€ˆ
SP
,
Frampton
β€ˆ
GM
,
Miller
β€ˆ
V
, et al
β€ˆ
Tumor mutational burden as an independent predictor of response to immunotherapy in diverse cancers
.
Mol Cancer Ther
β€ˆ
2017
;
16
:
2598
–
608
.
11.
Snyder
β€ˆ
A
,
Makarov
β€ˆ
V
,
Merghoub
β€ˆ
T
,
Yuan
β€ˆ
J
,
Zaretsky
β€ˆ
JM
,
Desrichard
β€ˆ
A
, et al
β€ˆ
Genetic basis for clinical response to CTLA-4 blockade in melanoma
.
N Engl J Med
β€ˆ
2014
;
371
:
2189
–
99
.
12.
Rizvi
β€ˆ
NA
,
Hellmann
β€ˆ
MD
,
Snyder
β€ˆ
A
,
Kvistborg
β€ˆ
P
,
Makarov
β€ˆ
V
,
Havel
β€ˆ
JJ
, et al
β€ˆ
Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer
.
Science
β€ˆ
2015
;
348
:
124
–
8
.
13.
Le
β€ˆ
DT
,
Uram
β€ˆ
JN
,
Wang
β€ˆ
H
,
Bartlett
β€ˆ
BR
,
Kemberling
β€ˆ
H
,
Eyring
β€ˆ
AD
, et al
β€ˆ
PD-1 blockade in tumors with mismatch-repair deficiency
.
N Engl J Med
β€ˆ
2015
;
372
:
2509
–
20
.
14.
Van Allen
β€ˆ
EM
,
Miao
β€ˆ
D
,
Schilling
β€ˆ
B
,
Shukla
β€ˆ
SA
,
Blank
β€ˆ
C
,
Zimmer
β€ˆ
L
, et al
β€ˆ
Genomic correlates of response to CTLA-4 blockade in metastatic melanoma
.
Science
β€ˆ
2015
;
350
:
207
–
11
.
15.
Rosenberg
β€ˆ
JE
,
Hoffman-Censits
β€ˆ
J
,
Powles
β€ˆ
T
,
van der Heijden
β€ˆ
MS
,
Balar
β€ˆ
AV
,
Necchi
β€ˆ
A
, et al
β€ˆ
Atezolizumab in patients with locally advanced and metastatic urothelial carcinoma who have progressed following treatment with platinum-based chemotherapy: a single-arm, multicentre, phase 2 trial
.
Lancet
β€ˆ
2016
;
387
:
1909
–
20
.
16.
Johnson
β€ˆ
DB
,
Frampton
β€ˆ
GM
,
Rioth
β€ˆ
MJ
,
Yusko
β€ˆ
E
,
Xu
β€ˆ
Y
,
Guo
β€ˆ
X
, et al
β€ˆ
Targeted next generation sequencing identifies markers of response to PD-1 blockade
.
Cancer Immunol Res
β€ˆ
2016
;
4
:
959
–
67
.
17.
Cao
β€ˆ
S
,
Wendl
β€ˆ
MC
,
Wyczalkowski
β€ˆ
MA
,
Wylie
β€ˆ
K
,
Ye
β€ˆ
K
,
Jayasinghe
β€ˆ
R
, et al
β€ˆ
Divergent viral presentation among human tumors and adjacent normal tissues
.
Sci Rep
β€ˆ
2016
;
6
:
28294
.
18.
Vandeven
β€ˆ
N
,
Nghiem
β€ˆ
P
.β€ˆ
Pathogen-driven cancers and emerging immune therapeutic strategies
.
Cancer Immunol Res
β€ˆ
2014
;
2
:
9
–
14
.
19.
Moore
β€ˆ
PS
,
Chang
β€ˆ
Y
.β€ˆ
Why do viruses cause cancer? Highlights of the first century of human tumour virology
.
Nat Rev Cancer
β€ˆ
2010
;
10
:
878
–
89
.
20.
Rooney
β€ˆ
MS
,
Shukla
β€ˆ
SA
,
Wu
β€ˆ
CJ
,
Getz
β€ˆ
G
,
Hacohen
β€ˆ
N
.β€ˆ
Molecular and genetic properties of tumors associated with local immune cytolytic activity
.
Cell
β€ˆ
2015
;
160
:
48
–
61
.
21.
Li
β€ˆ
B
,
Severson
β€ˆ
E
,
Pignon
β€ˆ
JC
,
Zhao
β€ˆ
H
,
Li
β€ˆ
T
,
Novak
β€ˆ
J
, et al
β€ˆ
Comprehensive analyses of tumor immunity: implications for cancer immunotherapy
.
Genome Biol
β€ˆ
2016
;
17
:
174
.
22.
Seiwert
β€ˆ
TY
,
Burtness
β€ˆ
B
,
Mehra
β€ˆ
R
,
Weiss
β€ˆ
J
,
Berger
β€ˆ
R
,
Eder
β€ˆ
JP
, et al
β€ˆ
Safety and clinical activity of pembrolizumab for treatment of recurrent or metastatic squamous cell carcinoma of the head and neck (KEYNOTE-012): an open-label, multicentre, phase 1b trial
.
Lancet Oncol
β€ˆ
2016
;
17
:
956
–
65
.
23.
Nghiem
β€ˆ
PT
,
Bhatia
β€ˆ
S
,
Lipson
β€ˆ
EJ
,
Kudchadkar
β€ˆ
RR
,
Miller
β€ˆ
NJ
,
Annamalai
β€ˆ
L
, et al
β€ˆ
PD-1 blockade with pembrolizumab in advanced merkel-cell carcinoma
.
N Engl J Med
β€ˆ
2016
;
374
:
2542
–
52
.
24.
Panda
β€ˆ
A
,
Mehnert
β€ˆ
JM
,
Hirshfield
β€ˆ
KM
,
Riedlinger
β€ˆ
G
,
Damare
β€ˆ
S
,
Saunders
β€ˆ
T
, et al
β€ˆ
Immune activation and benefit from avelumab in EBV-positive gastric cancer
.
J Natl Cancer Inst
β€ˆ
2018
;
110
:
316
–
20
.
25.
Stevanovic
β€ˆ
S
,
Pasetto
β€ˆ
A
,
Helman
β€ˆ
SR
,
Gartner
β€ˆ
JJ
,
Prickett
β€ˆ
TD
,
Howie
β€ˆ
B
, et al
β€ˆ
Landscape of immunogenic tumor antigens in successful immunotherapy of virally induced epithelial cancer
.
Science
β€ˆ
2017
;
356
:
200
–
5
.
26.
Tang
β€ˆ
KW
,
Alaei-Mahabadi
β€ˆ
B
,
Samuelsson
β€ˆ
T
,
Lindh
β€ˆ
M
,
Larsson
β€ˆ
E
.β€ˆ
The landscape of viral expression and host gene fusion and adaptation in human cancer
.
Nat Commun
β€ˆ
2013
;
4
:
2513
.
27.
Varn
β€ˆ
FS
,
Wang
β€ˆ
Y
,
Mullins
β€ˆ
DW
,
Fiering
β€ˆ
S
,
Cheng
β€ˆ
C
.β€ˆ
Systematic pan-cancer analysis reveals immune cell interactions in the tumor microenvironment
.
Cancer Res
β€ˆ
2017
;
77
:
1271
–
82
.
28.
Newman
β€ˆ
AM
,
Liu
β€ˆ
CL
,
Green
β€ˆ
MR
,
Gentles
β€ˆ
AJ
,
Feng
β€ˆ
W
,
Xu
β€ˆ
Y
, et al
β€ˆ
Robust enumeration of cell subsets from tissue expression profiles
.
Nat Methods
β€ˆ
2015
;
12
:
453
–
7
.
29.
Li
β€ˆ
B
,
Li
β€ˆ
T
,
Pignon
β€ˆ
JC
,
Wang
β€ˆ
B
,
Wang
β€ˆ
J
,
Shukla
β€ˆ
SA
, et al
β€ˆ
Landscape of tumor-infiltrating T cell repertoire of human cancers
.
Nat Genet
β€ˆ
2016
;
48
:
725
–
32
.
30.
Mose
β€ˆ
LE
,
Selitsky
β€ˆ
SR
,
Bixby
β€ˆ
LM
,
Marron
β€ˆ
DL
,
Iglesia
β€ˆ
MD
,
Serody
β€ˆ
JS
, et al
β€ˆ
Assembly-based inference of B-cell receptor repertoires from short read RNA sequencing data with V'DJer
.
Bioinformatics
β€ˆ
2016
;
32
:
3729
–
34
.
31.
Bolotin
β€ˆ
DA
,
Shugay
β€ˆ
M
,
Mamedov
β€ˆ
IZ
,
Putintseva
β€ˆ
EV
,
Turchaninova
β€ˆ
MA
,
Zvyagin
β€ˆ
IV
, et al
β€ˆ
MiTCR: software for T-cell receptor sequencing data analysis
.
Nat Methods
β€ˆ
2013
;
10
:
813
–
4
.
32.
Langmead
β€ˆ
B
,
Salzberg
β€ˆ
SL
.β€ˆ
Fast gapped-read alignment with Bowtie 2
.
Nat Methods
β€ˆ
2012
;
9
:
357
–
9
.
33.
Bonneville
β€ˆ
R
,
Krook
β€ˆ
MA
,
Kautto
β€ˆ
EA
,
Miya
β€ˆ
J
,
Wing
β€ˆ
MR
,
Chen
β€ˆ
H-Z
, et al
β€ˆ
Landscape of microsatellite instability across 39 cancer types
.
JCO Precis Oncol
β€ˆ
2017
:2017
;1–15
.
34.
Gentles
β€ˆ
AJ
,
Newman
β€ˆ
AM
,
Liu
β€ˆ
CL
,
Bratman
β€ˆ
SV
,
Feng
β€ˆ
W
,
Kim
β€ˆ
D
, et al
β€ˆ
The prognostic landscape of genes and infiltrating immune cells across human cancers
.
Nat Med
β€ˆ
2015
;
21
:
938
–
45
.
35.
Charoentong
β€ˆ
P
,
Finotello
β€ˆ
F
,
Angelova
β€ˆ
M
,
Mayer
β€ˆ
C
,
Efremova
β€ˆ
M
,
Rieder
β€ˆ
D
, et al
β€ˆ
Pan-cancer immunogenomic analyses reveal genotype-immunophenotype relationships and predictors of response to checkpoint blockade
.
Cell Rep
β€ˆ
2017
;
18
:
248
–
62
.
36.
Hanzelmann
β€ˆ
S
,
Castelo
β€ˆ
R
,
Guinney
β€ˆ
J
.β€ˆ
GSVA: gene set variation analysis for microarray and RNA-seq data
.
BMC Bioinformatics
β€ˆ
2013
;
14
:
7
.
37.
Cheng
β€ˆ
C
,
Yan
β€ˆ
X
,
Sun
β€ˆ
F
,
Li
β€ˆ
LM
.β€ˆ
Inferring activity changes of transcription factors by binding association with sorted expression profiles
.
BMC Bioinformatics
β€ˆ
2007
;
8
:
452
.
38.
Zhao
β€ˆ
Y
,
Varn
β€ˆ
FS
,
Cai
β€ˆ
G
,
Xiao
β€ˆ
F
,
Amos
β€ˆ
CI
,
Cheng
β€ˆ
C
.β€ˆ
A P53-deficiency gene signature predicts recurrence risk of patients with early-stage lung adenocarcinoma
.
Cancer Epidemiol Biomarkers Prev
β€ˆ
2018
;
27
:
86
–
95
.
39.
Stouffer
β€ˆ
SA
,
Suchman
β€ˆ
EA
,
Devinney
β€ˆ
LC
,
Williams
β€ˆ
RM
β€ˆJr
.
The American soldier: adjustment during army life
.
Princeton, NJ:
Princeton University Press
;β€ˆ
1949
.
40.
Saltz
β€ˆ
J
,
Gupta
β€ˆ
R
,
Hou
β€ˆ
L
,
Kurc
β€ˆ
T
,
Singh
β€ˆ
P
,
Nguyen
β€ˆ
V
, et al
β€ˆ
Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images
.
Cell Rep
β€ˆ
2018
;
23
:
181
–
93
.
41.
Hause
β€ˆ
RJ
,
Pritchard
β€ˆ
CC
,
Shendure
β€ˆ
J
,
Salipante
β€ˆ
SJ
.β€ˆ
Classification and characterization of microsatellite instability across 18 cancer types
.
Nat Med
β€ˆ
2016
;
22
:
1342
–
50
.
42.
Thorsson
β€ˆ
V
,
Gibbs
β€ˆ
DL
,
Brown
β€ˆ
SD
,
Wolf
β€ˆ
D
,
Bortone
β€ˆ
DS
,
Ou Yang
β€ˆ
TH
, et al
β€ˆ
The immune landscape of cancer
.
Immunity
β€ˆ
2018
;
48
:
812
–
30
.
43.
Yoshihara
β€ˆ
K
,
Shahmoradgoli
β€ˆ
M
,
Martinez
β€ˆ
E
,
Vegesna
β€ˆ
R
,
Kim
β€ˆ
H
,
Torres-Garcia
β€ˆ
W
, et al
β€ˆ
Inferring tumour purity and stromal and immune cell admixture from expression data
.
Nat Commun
β€ˆ
2013
;
4
:
2612
.
44.
Chen
β€ˆ
LP
,
Thomas
β€ˆ
EK
,
Hu
β€ˆ
SL
,
Hellstrom
β€ˆ
I
,
Hellstrom
β€ˆ
KE
.β€ˆ
Human papillomavirus type 16 nucleoprotein E7 is a tumor rejection antigen
.
Proc Natl Acad Sci U S A
β€ˆ
1991
;
88
:
110
–
4
.
45.
Ramos
β€ˆ
CA
,
Narala
β€ˆ
N
,
Vyas
β€ˆ
GM
,
Leen
β€ˆ
AM
,
Gerdemann
β€ˆ
U
,
Sturgis
β€ˆ
EM
, et al
β€ˆ
Human papillomavirus type 16 E6/E7-specific cytotoxic T lymphocytes for adoptive immunotherapy of HPV-associated malignancies
.
J Immunother
β€ˆ
2013
;
36
:
66
–
76
.
46.
Steele
β€ˆ
JC
,
Mann
β€ˆ
CH
,
Rookes
β€ˆ
S
,
Rollason
β€ˆ
T
,
Murphy
β€ˆ
D
,
Freeth
β€ˆ
MG
, et al
β€ˆ
T-cell responses to human papillomavirus type 16 among women with different grades of cervical neoplasia
.
Br J Cancer
β€ˆ
2005
;
93
:
248
–
59
.
47.
Merlo
β€ˆ
A
,
Turrini
β€ˆ
R
,
Dolcetti
β€ˆ
R
,
Martorelli
β€ˆ
D
,
Muraro
β€ˆ
E
,
Comoli
β€ˆ
P
, et al
β€ˆ
The interplay between Epstein-Barr virus and the immune system: a rationale for adoptive cell therapy of EBV-related disorders
.
Haematologica
β€ˆ
2010
;
95
:
1769
–
77
.
48.
La Rosa
β€ˆ
C
,
Diamond
β€ˆ
DJ
.β€ˆ
The immune response to human CMV
.
Future Virol
β€ˆ
2012
;
7
:
279
–
93
.
49.
Seeger
β€ˆ
C
,
Mason
β€ˆ
WS
.β€ˆ
Hepatitis B virus biology
.
Microbiol Mol Biol Rev
β€ˆ
2000
;
64
:
51
–
68
.
50.
Ye
β€ˆ
B
,
Liu
β€ˆ
X
,
Li
β€ˆ
X
,
Kong
β€ˆ
H
,
Tian
β€ˆ
L
,
Chen
β€ˆ
Y
.β€ˆ
T-cell exhaustion in chronic hepatitis B infection: current knowledge and clinical significance
.
Cell Death Dis
β€ˆ
2015
;
6
:
e1694
.
51.
Tumeh
β€ˆ
PC
,
Harview
β€ˆ
CL
,
Yearley
β€ˆ
JH
,
Shintaku
β€ˆ
IP
,
Taylor
β€ˆ
EJ
,
Robert
β€ˆ
L
, et al
β€ˆ
PD-1 blockade induces responses by inhibiting adaptive immune resistance
.
Nature
β€ˆ
2014
;
515
:
568
–
71
.
52.
Thurlow
β€ˆ
JK
,
Pena Murillo
β€ˆ
CL
,
Hunter
β€ˆ
KD
,
Buffa
β€ˆ
FM
,
Patiar
β€ˆ
S
,
Betts
β€ˆ
G
, et al
β€ˆ
Spectral clustering of microarray data elucidates the roles of microenvironment remodeling and immune responses in survival of head and neck squamous cell carcinoma
.
J Clin Oncol
β€ˆ
2010
;
28
:
2881
–
8
.
53.
Kim
β€ˆ
WJ
,
Kim
β€ˆ
EJ
,
Kim
β€ˆ
SK
,
Kim
β€ˆ
YJ
,
Ha
β€ˆ
YS
,
Jeong
β€ˆ
P
, et al
β€ˆ
Predictive value of progression-related gene classifier in primary non-muscle invasive bladder cancer
.
Mol Cancer
β€ˆ
2010
;
9
:
3
. doi: .

Supplementary data