Abstract
Immune checkpoint inhibitors are promising treatments for patients with a variety of malignancies. Toward understanding the determinants of response to immune checkpoint inhibitors, it was previously demonstrated that the presence of somatic mutations is associated with benefit from checkpoint inhibition. A hypothesis was posited that neoantigen homology to pathogens may in part explain the link between somatic mutations and response. To further examine this hypothesis, we reanalyzed cancer exome data obtained from our previously published study of 64 melanoma patients treated with CTLA-4 blockade and a new dataset of RNA-Seq data from 24 of these patients. We found that the ability to accurately predict patient benefit did not increase as the analysis narrowed from somatic mutation burden, to inclusion of only those mutations predicted to be MHC class I neoantigens, to only including those neoantigens that were expressed or that had homology to pathogens. The only association between somatic mutation burden and response was found when examining samples obtained prior to treatment. Neoantigen and expressed neoantigen burden were also associated with response, but neither was more predictive than somatic mutation burden. Neither the previously described tetrapeptide signature nor an updated method to evaluate neoepitope homology to pathogens was more predictive than mutation burden. Cancer Immunol Res; 5(1); 84–91. ©2016 AACR.
Introduction
Checkpoint blockade therapies are improving outcomes for patients with metastatic solid tumors (1–4). As only a subset of patients responds, there is a critical need to identify determinants of response. Expression of programmed death ligand one (PD-L1) is the lead companion diagnostic for PD-1/PD-L1 blockade therapies, but sensitivity and specificity are limited (5–7). An association between elevated tumor mutation burden and benefit from checkpoint blockade therapies has been demonstrated (8–11).
In our study of melanomas treated with checkpoint blockade agents targeting cytotoxic T-lymphocyte associated protein 4 (CTLA-4; ref. 8), we presented the hypothesis that responding tumors may share features with each other or with infectious agents and that such resemblance may predict response. In this report, we reanalyzed the data from that study using updated methods and integrated new RNA sequencing (RNA-Seq) data from a subset of 24 samples.
We found that in this small dataset, nonsynonymous mutation burden was associated with clinical benefit from therapy in samples collected before, but not after, treatment with CTLA-4 blockade. Predicted neoantigen burden and percentage of C→T transitions characteristic of ultraviolet damage were associated with, but did not outperform, mutation burden. We developed a publicly available tool, Topeology (https://github.com/hammerlab/topeology), to compare neoantigens to known pathogens. Neither the resemblance of tumor neoantigens to known antigens nor the previously published tetrapeptide signature outperformed mutation burden as a predictor of response.
Materials and Methods
Patient samples
All analyzed samples were collected in accordance with local Institutional Review Board policies, as described in ref. 8. Thirty-four patients had tumor samples collected prior to initiating CTLA-4 blockade, and 30 patients had samples collected after initiating CTLA-4 blockade. Clinical benefit was defined as progression-free survival lasting longer than 24 weeks after initiation of therapy (Online Data File 1). Nine lesions were discordant, meaning that the patient's overall clinical benefit did not match the progression status of the resected tumor. See Table 1 for a summary of this patient cohort.
Cohort summary
Group | Benefit | No benefit | Discordant |
---|---|---|---|
N | 27 | 28 | 9 |
Cutaneous, n/N | 20/27 | 19/28 | 5/9 |
OS | 3.7 (1.6–7.3) | 0.8 (0.2–2.7) | 4 (1.7–7.9) |
Age | 65 (33–81) | 58.5 (18–79) | 68 (40–90) |
Mutations | 611 (165–3,394) | 321 (6–1,816) | 549 (93–1,336) |
Neoantigens | 1,388 (209–6,502) | 714.5 (3–4,510) | 1,048 (197–2,584) |
NOTE: Features of tumors from patients with clinical benefit, no benefit, or in which a discordant lesion was resected.
Abbreviation: OS, overall survival.
Mutation calls
Single-nucleotide variants (SNV) were called with an ensemble of four variant callers: Mutect, Strelka, SomaticSniper, and Varscan as described previously (9). Insertions and deletions (indels) were called using Strelka with default settings.
HLA typing
HLA types were determined by ATHLATES for all samples using exome sequence data and confirmed with seq2HLA for samples that had RNA-Seq available (24 samples; Online Data File 2).
Neoepitope prediction
Somatic SNVs that occurred a single base away from other somatic SNVs were combined into larger variants containing both SNVs. For each somatic variant, we used Topiary (https://github.com/hammerlab/topiary) to generate the predicted 8- to 11-mer amino acid products resulting from somatic alterations (SNVs or indels), including predicted neoepitopes generated from combined SNVs (Online Data Files 3–4). Each variant was linked to its corresponding coding DNA sequence (CDS) from Ensembl based on its B37 coordinates. The CDS was retranslated with the mutated DNA residue to produce the mutated peptide product. NetMHCcons v1.1 generated a predicted binding affinity for all 8-mers to 11-mers containing the mutated amino acid, and all peptides with a predicted IC50 below 500 nmol/L were considered predicted neoepitopes. For variants longer than a single residue, we considered all 8-mers to 11-mers generated downstream of the variant. Neoepitopes from ref. 10 were generated from a separate pipeline, as published.
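The window enumeration and binding cutoff described above can be sketched as follows for a single-residue substitution. This is a minimal illustration, not Topiary's or NetMHCcons' code: `predict_ic50` is a hypothetical callable standing in for the binding predictor, and the downstream-window handling used for longer variants is not shown.

```python
from typing import Callable, Dict, List

IC50_CUTOFF_NM = 500.0  # predicted-binder cutoff stated in the text (nmol/L)


def mutant_windows(protein: str, mut_pos: int, lengths=range(8, 12)) -> List[str]:
    """Return every 8- to 11-mer of `protein` that contains the 0-based index `mut_pos`."""
    peptides = []
    for k in lengths:
        # A window [start, start + k) contains mut_pos when
        # mut_pos - k + 1 <= start <= mut_pos; it must also fit inside the protein.
        first = max(0, mut_pos - k + 1)
        last = min(mut_pos, len(protein) - k)
        for start in range(first, last + 1):
            peptides.append(protein[start:start + k])
    return peptides


def predicted_neoepitopes(protein: str, mut_pos: int,
                          predict_ic50: Callable[[str], float]) -> Dict[str, float]:
    """Keep mutant windows whose predicted IC50 falls below the cutoff.

    `predict_ic50` is a hypothetical stand-in for an MHC class I binding
    predictor; it is not the NetMHCcons interface.
    """
    binders = {}
    for pep in mutant_windows(protein, mut_pos):
        ic50 = predict_ic50(pep)
        if ic50 < IC50_CUTOFF_NM:
            binders[pep] = ic50
    return binders
```

For a residue far from either protein end, this yields 8 + 9 + 10 + 11 = 38 candidate windows minus any that run off the sequence, each of which would then be scored against every HLA allele of the patient.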
RNA-Seq
The 24 tumor RNA samples were a subset of the published 64-sample dataset and included those samples from the 64-sample set that had sufficient tissue for RNA isolation. Data from some of the samples have been presented previously (12). Sequencing libraries were prepared from total RNA with the Illumina TruSeq mRNA Library Kit (v2) and then sequenced on the HiSeq 2500 with 2 × 50 bp paired reads, yielding 47 to 60 million reads per sample (New York Genome Center, New York, NY). The RNA reads were aligned using the STAR aligner after which Cufflinks was used for gene quantification (FPKM; Online Data File 5). seq2HLA was also used to quantify HLA gene expression (RPKM). Allele-specific expression was measured by the fraction of RNA reads supporting the variants found in exome sequencing (Online Data File 6).
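Allele-specific expression, as described above, reduces to counting RNA reads at each exome-called variant site. The sketch below assumes that per-site reference and variant read counts have already been extracted from the STAR alignments; the record layout and the one-read threshold for calling a mutation "expressed" are illustrative assumptions, not values stated in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VariantRnaCounts:
    """RNA read counts at the genomic position of one exome-called somatic variant."""
    variant_id: str
    ref_reads: int  # RNA reads showing the reference base
    alt_reads: int  # RNA reads showing the somatic (variant) base


def variant_read_fraction(v: VariantRnaCounts) -> float:
    """Fraction of RNA reads at the site that support the variant (0.0 if no coverage)."""
    total = v.ref_reads + v.alt_reads
    return v.alt_reads / total if total else 0.0


def expressed_variants(variants: Iterable[VariantRnaCounts],
                       min_alt_reads: int = 1) -> List[str]:
    """IDs of variants with at least `min_alt_reads` supporting RNA reads (assumed threshold)."""
    return [v.variant_id for v in variants if v.alt_reads >= min_alt_reads]
```

Counting supporting reads rather than gene-level FPKM distinguishes a mutant allele that is actually transcribed from a gene that is expressed only from its wild-type copy.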
Gene set enrichment analysis (GSEA) was performed using v2.2.0 of the software provided by the Broad Institute (http://www.broadinstitute.org/gsea/index.jsp); the Hallmark gene set collection used for comparison was accessed on August 18, 2015, from the MSigDB website (http://www.broadinstitute.org/gsea/msigdb/index.jsp). The Hallmark gene set collection was extended by adding gene symbols corresponding to well-known peptides in three categories: (i) tumor specific, (ii) associated with differentiation, and (iii) overexpressed in cancer cells. To do this, gene symbols were imported from the Cancer Immunity Peptide Database as gene sets compatible with the GSEA software. Before running GSEA, the gene expression data (FPKM) were collapsed to official gene symbol identifiers, using the median expression value when multiple transcripts mapped to the same gene symbol. To filter the data further, noninformative genes with no variation (SD of 0) across all samples were removed. Three GSEA analyses were conducted, comparing (i) pretreatment benefiting versus pretreatment nonbenefiting, (ii) pretreatment benefiting versus posttreatment benefiting, and (iii) pretreatment nonbenefiting versus posttreatment nonbenefiting tumors. In all comparisons, the normalized gene expression values were used as the input matrix. The number of permutations was set to 1,000, with permutations restricted to gene set labels rather than sample phenotype labels because of our small sample size; all other options were left at their defaults (see http://www.hammerlab.org/melanoma-reanalysis/gsea-results/ for complete reports and instructions to replicate them).
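The two preprocessing steps applied before GSEA (collapsing transcripts to gene symbols by median, then dropping zero-variance genes) can be sketched with the Python standard library. The dictionary-of-lists matrix layout and the transcript-to-symbol mapping are assumptions for illustration; the actual analysis prepared input files for the Broad GSEA software.

```python
from collections import defaultdict
from statistics import median, pstdev
from typing import Dict, List

Matrix = Dict[str, List[float]]  # row name -> per-sample FPKM values


def collapse_to_symbols(transcripts: Matrix, symbol_of: Dict[str, str]) -> Matrix:
    """Collapse transcript rows to gene-symbol rows, taking the per-sample median."""
    rows_by_symbol = defaultdict(list)
    for transcript_id, values in transcripts.items():
        rows_by_symbol[symbol_of[transcript_id]].append(values)
    # zip(*rows) walks the samples column by column across all transcripts of a symbol
    return {symbol: [median(sample_values) for sample_values in zip(*rows)]
            for symbol, rows in rows_by_symbol.items()}


def drop_invariant_genes(matrix: Matrix) -> Matrix:
    """Remove genes whose expression has a standard deviation of 0 across samples."""
    return {gene: values for gene, values in matrix.items() if pstdev(values) > 0}
```

A gene with identical values in every sample cannot contribute to a ranking of samples by phenotype, which is why removing it changes nothing except the size of the input matrix.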
Neoantigen homology
We developed a tool, Topeology, to compare tumor neoepitopes with entries in the Immune Epitope Database (IEDB; Online Data File 7; ref. 13), accounting for position, amino acid gaps, and biochemical similarity between amino acids. Epitopes were compared, and the comparison scored, using the Smith–Waterman alignment algorithm (14) supplied with a substitution matrix consisting of PMBEC correlation values derived from the PMBEC covariance matrix (15). We compared amino acids from position 3 to the penultimate position of the peptide, assuming that the anchor residues would be necessary for MHC class I presentation and would therefore not be “visible” to a T cell. A gap penalty equal to the lowest PMBEC correlation value was supplied. Only peptides of the same length were compared. Smith–Waterman scores were normalized for length by dividing by the length of the peptide section being compared.
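The scoring scheme described above (same-length peptides, positions 3 through the penultimate residue, Smith–Waterman with a linear gap penalty, division by the compared length) can be sketched as below. This is not Topeology's source. The PMBEC values are not reproduced here, so `toy_substitution` (+1 for identity, −0.5 otherwise) is a hypothetical stand-in, and the gap penalty is set to that toy matrix's lowest value to mirror the rule in the text.

```python
from typing import Callable, Optional

Substitution = Callable[[str, str], float]


def smith_waterman(a: str, b: str, sub: Substitution, gap: float) -> float:
    """Best local alignment score; `gap` is a negative value added per gap position."""
    h = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            h[i][j] = max(0.0,
                          h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                          h[i - 1][j] + gap,                          # gap in b
                          h[i][j - 1] + gap)                          # gap in a
            best = max(best, h[i][j])
    return best


def toy_substitution(x: str, y: str) -> float:
    """Hypothetical stand-in for PMBEC correlations."""
    return 1.0 if x == y else -0.5


TOY_GAP = -0.5  # lowest value of the toy matrix, mirroring "lowest PMBEC value"


def similarity(neoepitope: str, iedb_epitope: str,
               sub: Substitution = toy_substitution,
               gap: float = TOY_GAP) -> Optional[float]:
    """Length-normalized score over positions 3..penultimate; None for unequal lengths."""
    if len(neoepitope) != len(iedb_epitope):
        return None
    core_a, core_b = neoepitope[2:-1], iedb_epitope[2:-1]  # drop positions 1-2 and the C-terminus
    return smith_waterman(core_a, core_b, sub, gap) / len(core_a)
```

With the toy matrix, the Fig. 3B pair CPDKSTSTL and LPFEKSTVM aligns on the shared KST motif and scores 3/6 = 0.5; the 0.58 reported in the figure comes from the real PMBEC matrix, which also rewards biochemically similar (non-identical) residues.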
Peptides were only included for comparison if the mutant peptide score was greater than or equal to the wild-type peptide score. In addition, an epitope from IEDB was only compared with a neoepitope if either (i) the patient's HLA allele(s) presenting that neoepitope were listed as HLA alleles for the IEDB epitope or (ii) the IEDB epitope was a predicted binder for one of the patient's alleles.
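The two inclusion rules above can be expressed as a single predicate over one neoepitope/IEDB epitope pair. The argument names are hypothetical, and whether an IEDB epitope is a predicted binder for a patient allele is assumed to have been computed upstream.

```python
from typing import Set


def keep_comparison(mutant_score: float, wild_type_score: float,
                    presenting_alleles: Set[str], iedb_listed_alleles: Set[str],
                    iedb_binds_patient_allele: bool) -> bool:
    """Apply the two inclusion rules for one neoepitope / IEDB epitope pair."""
    # Rule 1: the mutant peptide must be at least as similar as its wild-type counterpart.
    if mutant_score < wild_type_score:
        return False
    # Rule 2: HLA compatibility, either via alleles listed in IEDB or via a predicted binder.
    shares_listed_allele = bool(presenting_alleles & iedb_listed_alleles)
    return shares_listed_allele or iedb_binds_patient_allele
```

Rule 1 discards pairs in which the mutation makes the peptide less pathogen-like, since any resemblance would then already be present in the self (wild-type) sequence.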
To narrow down candidate epitopes to those with some evidence of biological relevance, peptides from IEDB were required to exhibit in vitro evidence of human T-cell activation. Because many peptides showed different T-cell responses depending on the assay used, at least 60% of the instances of that specific amino acid sequence found in IEDB were required to exhibit an activated T-cell response in order for that peptide to be considered “T-cell activating.” Peptides were required to be 8 to 11 amino acids in length and were filtered to remove allergens, zoonotic organisms not known to affect humans, and self-epitopes. Limited manual filtering of source organisms was also performed to ensure the exclusion of nonpathogenic antigens. Peptides from IEDB were evaluated both as a whole and as two groups: viral and nonviral pathogens.
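The 60% “T-cell activating” rule and the 8- to 11-residue length filter can be sketched as a pass over assay records. The record format, one (sequence, is_positive) pair per IEDB assay instance, is an assumption for illustration rather than the IEDB export schema; the allergen, zoonotic, self-epitope, and manual organism filters are not shown.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def t_cell_activating(assays: Iterable[Tuple[str, bool]],
                      threshold: float = 0.6) -> Set[str]:
    """Sequences of length 8-11 with at least `threshold` of their assays positive."""
    counts = defaultdict(lambda: [0, 0])  # sequence -> [positive assays, total assays]
    for sequence, positive in assays:
        counts[sequence][1] += 1
        if positive:
            counts[sequence][0] += 1
    return {seq for seq, (pos, total) in counts.items()
            if 8 <= len(seq) <= 11 and pos / total >= threshold}
```

Pooling all assay instances per exact sequence lets a peptide with mixed results across assays still qualify, as long as most of its reported responses were activating.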
Predictive model evaluation
To evaluate whether similarity to known pathogenic antigens can predict clinical benefit, we generated predictive models using logistic regression with ℓ1 regularization. Only pretreatment samples were considered, and a single pretreatment sample that was considered discordant was also excluded. One model attempted to predict clinical benefit for each sample based on the maximal similarity of the sample's neoepitopes to each IEDB epitope, with a feature for every IEDB epitope. Another model did the same, but aggregated IEDB epitopes based on their source pathogens, resulting in a feature for every IEDB source pathogen. The maximal similarity to an epitope (or pathogen) is referred to as the “score” for that epitope (or pathogen) in Results. To generate two additional models, IEDB epitope features were averaged together to create a single-feature model, and the same method was applied to IEDB pathogen features. See Online Data File 8 for a table of all models, including feature counts per model.
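The three kinds of features described above (a maximal score per IEDB epitope, a maximal score per source pathogen, and a single averaged feature) can be sketched for one sample as follows. Reading the pathogen score as the maximum over that pathogen's epitope scores follows the “maximal similarity” definition in the text, but the input shapes are assumptions, and how unscored epitopes were filled in the real feature matrix is not stated here.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple


def epitope_scores(pair_scores: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """(neoepitope, IEDB epitope, similarity) triples for one sample -> max per IEDB epitope."""
    best: Dict[str, float] = {}
    for _neoepitope, epitope, score in pair_scores:
        best[epitope] = max(score, best.get(epitope, float("-inf")))
    return best


def pathogen_scores(epi_scores: Dict[str, float],
                    epitope_to_pathogen: Dict[str, str]) -> Dict[str, float]:
    """Maximal similarity to any epitope from each source pathogen."""
    best: Dict[str, float] = {}
    for epitope, score in epi_scores.items():
        pathogen = epitope_to_pathogen[epitope]
        best[pathogen] = max(score, best.get(pathogen, float("-inf")))
    return best


def averaged_feature(scores: Dict[str, float]) -> float:
    """Single-feature variant: the mean of the per-epitope (or per-pathogen) scores."""
    return mean(scores.values())
```

Because every additional neoepitope offers another chance to raise a maximum, these features tend to grow with mutation burden, which is the confounding examined in Results.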
We tested these predictive models using 1,000 rounds of bootstrapping to generate stable measures of performance, using 75% of the samples for training in each round. For each round, we performed 100 inner rounds of bootstrapping to optimize the regularization strength and scaling hyperparameters, using 75% of the outer training samples for hyperparameter fitting and 25% for hyperparameter validation. Each inner and outer round of bootstrapping calculated the area under the receiver operating characteristic (ROC) curve (AUC). A baseline AUC, calculated using the same bootstrapping procedure, used mutation burden in place of a logistic regression probability. Each of the 1,000 AUC values generated by the outer bootstrap sampling was compared with the corresponding baseline mutation burden AUC for the same sampling, resulting in a distribution of pairwise differences. Confidence intervals, as described in ref. 16, were taken from these AUC distributions and pairwise AUC difference distributions.
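The paired comparison against the mutation burden baseline can be sketched as below. To stay self-contained, the sketch uses fixed per-sample model scores instead of refitting an ℓ1-regularized logistic regression on each training split, omits the inner hyperparameter loop, and assumes the held-out 25% is used for evaluation. The percentile interval is a simple stand-in; it is not claimed to be the method of ref. 16.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative (ties 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_auc_differences(model_scores: Sequence[float], burden: Sequence[float],
                           labels: Sequence[int], rounds: int = 1000,
                           train_frac: float = 0.75, seed: int = 0) -> List[float]:
    """Model AUC minus mutation-burden AUC on the same held-out samples, per round."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    diffs = []
    for _ in range(rounds):
        train = set(rng.sample(indices, int(train_frac * len(indices))))
        # A real run would refit the model on `train` here; this sketch keeps fixed scores.
        test = [i for i in indices if i not in train]
        y = [labels[i] for i in test]
        if len(set(y)) < 2:
            continue  # AUC is undefined when the held-out set has a single class
        model_auc = auc([model_scores[i] for i in test], y)
        baseline_auc = auc([burden[i] for i in test], y)
        diffs.append(model_auc - baseline_auc)
    return diffs


def percentile_interval(values: Sequence[float], level: float = 0.95) -> Tuple[float, float]:
    """Simple percentile interval over a bootstrap distribution (assumed method)."""
    s = sorted(values)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi
```

Pairing each model AUC with the baseline AUC from the same split cancels much of the split-to-split noise, so an interval on the differences that excludes zero is a stricter test than comparing two overlapping marginal intervals.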
We also created the same similarity scores and predictive models for the neoepitope predictions generated in ref. 10. Because the pipelines were different, as well as the definitions of clinical benefit, these results are not fully comparable and are found in Online Data File 9.
Tetrapeptide signature evaluation
We evaluated the tetrapeptide signature approach from Snyder and colleagues (8) using the above bootstrapping procedure. Because Snyder and colleagues used validation data to influence signature creation, we did not validate the identical tetrapeptide signature generated in ref. 8. Instead, to perform validation, we generated additional tetrapeptide signatures from discovery data alone, excluding any validation set filtering. We used the same candidate tetrapeptide generation approach used in ref. 8 rather than our updated approach described above: positionality was not considered, HLA alleles were not considered, and IEDB peptides were not filtered by length. We repeated the analysis twice, once using the variant calls and cohort (n = 64) from ref. 8 and once using updated variant calls and only pretreatment, nondiscordant samples (n = 33).
The bootstrapping procedure considered 1,000 randomly sampled training sets. The signature rules from Snyder and colleagues, which are as follows, were applied to each sampled discovery set to generate separate signatures: A tetrapeptide was added to the signature if it was present in either (i) neoepitopes from at least 3 different benefit patients or (ii) neoepitopes from 2 benefit patients and a T-cell–activating epitope from IEDB. Tetrapeptides that occurred in nonbenefit patient neoepitopes were excluded from the signature. For each sampling round, these signature rules only considered the round's discovery set so that each of the 1,000 signatures generated could be tested against their associated validation sets.
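The signature rules above, applied to a single discovery set, can be sketched as follows. Positions, HLA alleles, and IEDB peptide length are ignored, as stated for the original approach. The dictionary inputs (patient ID to neoepitope list) are an assumed layout, and the bootstrap loop around this function is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def tetrapeptides(peptide: str) -> Set[str]:
    """All contiguous 4-residue substrings, ignoring position."""
    return {peptide[i:i + 4] for i in range(len(peptide) - 3)}


def build_signature(benefit: Dict[str, List[str]], nonbenefit: Dict[str, List[str]],
                    iedb_activating: Iterable[str]) -> Set[str]:
    """Apply the signature rules to one discovery set (patient ID -> neoepitopes)."""
    patients_with = defaultdict(set)  # tetrapeptide -> benefit patients carrying it
    for patient, peptides in benefit.items():
        for pep in peptides:
            for tet in tetrapeptides(pep):
                patients_with[tet].add(patient)
    iedb_tets = set().union(*(tetrapeptides(p) for p in iedb_activating))
    nonbenefit_tets = set().union(
        *(tetrapeptides(p) for peps in nonbenefit.values() for p in peps))
    signature = set()
    for tet, patients in patients_with.items():
        n = len(patients)
        # Rule (i): >= 3 benefit patients; rule (ii): 2 benefit patients plus an IEDB hit.
        qualifies = n >= 3 or (n >= 2 and tet in iedb_tets)
        if qualifies and tet not in nonbenefit_tets:  # exclusion by nonbenefit patients
            signature.add(tet)
    return signature


def carries_signature(neoepitopes: Iterable[str], signature: Set[str]) -> bool:
    """Binary per-patient call: does any neoepitope contain a signature tetrapeptide?"""
    return any(tetrapeptides(p) & signature for p in neoepitopes)
```

Because the rules look only at the discovery set, the resulting signature can be scored on the matching validation set without any leakage of held-out labels.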
As described by Snyder and colleagues, performance was measured using a single binary value: whether or not a patient's neoepitopes contained any of the signature tetrapeptides. In this binary case, the ROC curve contains a single threshold. The AUC, defined by the area under the line segments connecting (0, 0), this single threshold point, and (1, 1), is equal to balanced accuracy, which is the metric we used in this case. We also evaluated the AUC using per-patient counts of signature tetrapeptides (Online Data File 10).
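The equivalence between the single-threshold AUC and balanced accuracy follows from summing the triangle under the segment from (0, 0) to (FPR, TPR) and the trapezoid under the segment from (FPR, TPR) to (1, 1):

```latex
\mathrm{AUC}
  = \tfrac{1}{2}\,\mathrm{FPR}\cdot\mathrm{TPR}
  + \tfrac{1}{2}\,(1-\mathrm{FPR})(\mathrm{TPR}+1)
  = \tfrac{1}{2}\,(\mathrm{TPR} + 1 - \mathrm{FPR})
  = \tfrac{1}{2}\,(\text{sensitivity} + \text{specificity})
```

Because specificity equals 1 − FPR, the last expression is the balanced accuracy; an uninformative binary predictor (TPR = FPR) therefore scores exactly 0.50.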
Results
Pre-CTLA-4 blockade mutation and expressed mutation burden associated with outcome using updated bioinformatic analysis
We reanalyzed the mutation burden of the advanced melanoma tumors from patients treated with CTLA-4 blockade who had been included in our previous analysis (8), using a modified system of four callers (as described in ref. 9). Analyzing the data using this system increased the median nonsynonymous mutation burden of the group 1.9-fold from 248 to 471 (original range 1–1,878 to new range 6–3,394; Supplementary Fig. S1A and S1B).
In samples collected prior to treatment (n = 34), mutation burden was higher in patients with clinical benefit (Fig. 1A, median: 654 in benefiters vs. 196.5 in nonbenefiters, Mann–Whitney, P = 0.0006), and elevated mutation burden was associated with overall survival (Supplementary Fig. S1C, log-rank test, P = 0.01). Of pretreatment samples, one patient who otherwise experienced disease control had a progressing lesion resected, representing a discordant lesion (Materials and Methods). In patients whose tumor samples were collected after initiating CTLA-4 blockade (n = 30), there was not a significant difference in the mutation burden between benefit and nonbenefit groups (Fig. 1B, median: 592 in benefiters vs. 396 in nonbenefiters, Mann–Whitney, P = 0.19), and elevated mutation burden was not significantly associated with overall survival (Supplementary Fig. S1D, log-rank test, P = 0.29). Eight discordant lesions were present among posttreatment samples; when excluding patients with discordant lesions, there was still not a significant difference in the mutation burden between benefit and nonbenefit groups (median: 592 in benefiters vs. 392 in nonbenefiters, Mann–Whitney, P = 0.20) or a significant association with overall survival (log-rank test, P = 0.39). In summary, mutation burden was associated with clinical benefit only in samples collected prior to treatment.
Mutation burden and ultraviolet signature. A, Median and range of mutation burden and allele-specific expression of mutations in samples collected prior to treatment [for benefit vs. no benefit, all (n = 34, Mann–Whitney, P = 0.0006), expressed (n = 9, P = 0.024)]. A and B, The first pair of bars represents mutation burden for all sequenced tumors; the second pair represents mutation burden in the subset of tumors for which RNA was available; the third represents the expressed mutations. A–C, Blue bars, benefiting tumors; red bars, nonbenefiting tumors. B, Median and range of mutation burden and expressed mutations in samples collected after treatment [for benefit vs. no benefit, all (n = 30, Mann–Whitney, P = 0.19), expressed (n = 15, P = 0.46)]. C, Correlation between signature of DNA damage from UV exposure and clinical response (*, Mann–Whitney, P = 0.003). D, Correlation between UV signature and mutation count (Spearman ρ = 0.77, P = 4e−14).
In some previously published studies, insertion or deletion mutations (indels) have not been considered in calculating mutation and neoantigen burdens. Genetic alterations of greater than one amino acid can theoretically generate peptides that are substantially different from the wild type as a result of a shifted reading frame [termed neo-open reading frames, or neo-ORFs (17)]. The number of predicted neoantigens resulting from indels was also associated with outcome (median 9 and 6 in benefiters and nonbenefiters, respectively, Mann–Whitney, P = 0.018), but comprised a very small minority of neoantigens (median of 0.8% of all predicted neoantigens, range 0%–33.7%). The possibility remains that these transcripts or the resulting translated proteins are subsequently degraded (18).
The previously published analysis did not find a correlation between the signature of ultraviolet DNA damage and clinical benefit. We reexamined this question using updated methods. A tumor was determined to have a UV signature when >60% of mutations were C>T transitions at dipyrimidine sites (19, 20). As expected, 36 of 44 (82%) tumors of cutaneous origin harbored the UV signature. Five of six tumors with acral or uveal histology did not have a UV signature (one did: ID 6819). In contrast to the originally published data, we found that the rate of UV signature mutations correlated with clinical benefit (Fig. 1C, Mann–Whitney, P = 0.003) and overall mutation burden (Fig. 1D; Spearman ρ = 0.77, P = 4e−14).
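The UV-signature call described above can be sketched as a per-SNV check followed by the >60% threshold. The sketch assumes each SNV is given as (reference base, alternate base, 5′ base, 3′ base) on the reported strand; a G>A change next to a purine is the reverse-complement view of a C>T change next to a pyrimidine, so both orientations are counted. Restricting the denominator to SNVs is an assumption.

```python
from typing import Iterable, Tuple

PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")

Snv = Tuple[str, str, str, str]  # (ref, alt, 5' flanking base, 3' flanking base)


def is_uv_like(ref: str, alt: str, five_prime: str, three_prime: str) -> bool:
    """C>T at a dipyrimidine site, counted on either strand."""
    if ref == "C" and alt == "T":
        return five_prime in PYRIMIDINES or three_prime in PYRIMIDINES
    if ref == "G" and alt == "A":
        # reverse complement: the neighboring purine pairs with a pyrimidine on the other strand
        return five_prime in PURINES or three_prime in PURINES
    return False


def has_uv_signature(snvs: Iterable[Snv], threshold: float = 0.60) -> bool:
    """True when more than `threshold` of the SNVs are UV-like."""
    snvs = list(snvs)
    if not snvs:
        return False
    uv_count = sum(is_uv_like(*snv) for snv in snvs)
    return uv_count / len(snvs) > threshold
```

The strict inequality matches the “>60%” wording: a tumor with exactly 60% UV-like SNVs is not called.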
Two studies have investigated expressed neoantigens in human samples from patients treated with immunotherapy by examining the expression level of genes that harbored mutations (10, 21). However, because many mutations do not result in expressed proteins, we examined allele-specific expression (see Materials and Methods) of mutations. The median rate at which genes containing mutations were expressed (FPKM > 0) in all samples was 37% (range, 20%–50%). Of all tumors with available RNA-Seq data (n = 24), one posttreatment lesion was discordant. For tumors sampled prior to treatment with available RNA-Seq data (n = 9), patients with long-term benefit had a higher number of mutations expressed in the RNA samples (Fig. 1A, Mann–Whitney, P = 0.002). For tumors sampled after treatment (n = 15), the difference in expressed mutation burden between benefiters and nonbenefiters was not significant (Fig. 1B, Mann–Whitney, P = 0.46). Predicting clinical benefit using expressed mutation burden among pretreatment samples with RNA-Seq data (n = 9) resulted in an AUC of 0.89 [95% confidence interval (CI), 0.57–1.00] compared with a baseline mutation burden AUC of 0.94 (95% CI, 0.67–1.00) for those same samples.
Predicted neoantigens are associated with mutation burden and outcome
In this updated analysis, in contrast to the neoantigen prediction approach employed in ref. 8, NetMHCcons was applied to 8 to 11 amino acid stretches of predicted mutant peptides resulting from both SNVs and indels. All predicted binders with an IC50 less than or equal to 500 nmol/L were included, allowing for multiple potential neoepitopes from a single SNV or indel. Neoepitopes were predicted on the basis of exome data, and allele-specific expression was measured in tumors with RNA-Seq data. Tumors had a median of 943 (range, 3–6,502) predicted neoantigens. Patients who derived clinical benefit had tumors with a higher median predicted neoantigen burden (median, 1,388; range, 209–6,502) than those who did not (median, 819; range, 3–4,510; Mann–Whitney, P = 0.01). This held true when considering only pretreatment tumor samples (n = 34): the median predicted neoantigen burden was 1,579 (range, 209–6,502) in benefiters and 582.5 (range, 3–2,485) in nonbenefiters (Mann–Whitney, P = 0.002). There was no significant difference in predicted neoantigen burden among posttreatment tumor samples (n = 30), with a median predicted neoantigen burden of 940 (range, 539–1,487) in benefiters and 982 (range, 81–4,510) in nonbenefiters (Mann–Whitney, P = 0.49). Using neoantigen burden to predict clinical benefit resulted in an AUC of 0.67 (95% CI, 0.54–0.79) compared with a baseline mutation burden AUC of 0.72 (95% CI, 0.58–0.84). Considering only pretreatment samples resulted in an AUC of 0.80 (95% CI, 0.64–0.93) compared with a baseline mutation burden AUC of 0.83 (95% CI, 0.69–0.95) for those samples. There was no difference in the ratio of predicted neoantigens to SNVs in benefiting versus nonbenefiting tumors (median of 2 vs. 2.17, respectively). For pretreatment tumors with available RNA-Seq data (n = 9), the median number of expressed, allele-specific neoantigens was 382.5 (range, 88–725) in benefiters and 48 (range, 1–401) in nonbenefiters (Fig. 2A, Mann–Whitney, P = 0.003), whereas there was no significant difference in tumors collected after treatment (Fig. 2B, Mann–Whitney, P = 0.46). Among pretreatment tumors (n = 9), the AUC for expressed neoantigens was 0.79 (95% CI, 0.35–1.00) compared with a baseline AUC for mutation burden of 0.94 (95% CI, 0.67–1.00). There was no difference in the percentage of expressed predicted neoantigens between benefiting (median, 37.4%; range, 29.9%–46.3%) and nonbenefiting tumors (median, 35%; range, 14.9%–44.4%).
Neoantigen burden. A, Median and range of neoantigen burden and allele-specific expression of neoantigens in samples collected prior to treatment [for benefit versus no benefit, all (n = 34, Mann–Whitney, P = 0.002), expressed (n = 9, P = 0.003)]. B, Median and range of neoantigen burden and expressed neoantigens in samples collected after the initiation of treatment [for benefit versus no benefit, all (n = 30, Mann–Whitney, P = 0.49), expressed (n = 15, P = 0.46)]. A and B, Blue bars, benefiting tumors; red bars, nonbenefiting tumors. The first pair of bars represents predicted neoantigens for all sequenced tumors; the second pair represents predicted neoantigens in the subset of tumors for which RNA was available; the third represents the expressed neoantigens.
The lack of correlation between benefit and mutation or neoantigen burden in posttreatment samples may suggest immunoediting of specific neoantigens such that the overall neoantigen burden is nearly maintained, but a small number of particularly important neoantigens have been selectively removed.
Predicted neoepitope homology does not outperform mutation burden as a predictor of response
T-cell receptors (TCR) are known to exhibit considerable degeneracy (22), and evidence from infectious diseases indicates that T cells can cross-react with antigens to which the host has not previously been exposed, on the basis of homology to previously encountered antigens (23, 24). In cancer, fatalities have been reported that resulted from cross-reactivity of engineered TCRs specific for tumor antigens (25). The current RNA-Seq data have been analyzed previously to suggest that an antiviral IFN-related expression signature is associated with benefit from therapy (12). However, it is unknown whether T-cell cross-reactivity plays a role in checkpoint blockade efficacy.
The hypothesis that tumors that respond to checkpoint blockade might harbor recurrent motifs associated with response, either common to responders or homologous to known T-cell epitopes, remains a question of particular debate. In the initial description of the melanoma sequencing data (8), we used an algorithm to compare 4-amino-acid stretches contained within nonamer neoantigens (a “tetrapeptide signature”), irrespective of position or HLA type. Here, we directly evaluated that algorithm and separately performed a new comparison.
First, we replicated the tetrapeptide signature approach we used before, inclusive of the same patients (n = 64), variant calls, neoepitope predictions, and IEDB filtering criteria. Unlike previously (8), this time we did not allow held-out data to influence the tetrapeptide signature (see Materials and Methods). When we categorized tumor samples as featuring or not featuring the tetrapeptide signature, this achieved an AUC of 0.61 (95% CI, 0.53–0.71) compared with an associated mutation burden baseline AUC of 0.76 (95% CI, 0.63–0.89). Similarly, using counts of signature tetrapeptides rather than a binary representation of signature presence did not outperform mutation burden (Online Data File 10). In addition to directly replicating the approach and data we used previously, we performed a similar analysis using only pretreatment, nondiscordant samples as well as updated variant calls. The presence of signature tetrapeptides achieved an AUC of 0.50 (95% CI, 0.50–0.50), compared with an updated mutation burden baseline of 0.85 (95% CI, 0.69–0.98). Using counts of signature tetrapeptides did not outperform mutation burden here either (Online Data File 10).
In summary, neither the originally derived tetrapeptide signature nor the tetrapeptide signature generated using the revised mutation calling system outperformed mutation burden as a predictor of clinical benefit, and the signature generated using the revised system performed no better than chance (AUC, 0.50).
Next, we created an automated tool, Topeology, for comparing tumor neoepitopes with pathogens from the IEDB using sequence alignment of nonanchor residues (Fig. 3B). This alignment accounts for position, amino acid gaps, and biochemical similarity between amino acids (see Materials and Methods). We conducted comparisons of single amino acid substitution–based neoepitopes using this tool, considering only nondiscordant, pretreatment samples (n = 33; Fig. 3A and B; Supplementary Fig. S2).
Comparison of neoepitopes. A, Flow chart illustrating the process for tumor-to-pathogen neoepitope comparison by Topeology. B, An example comparison of the CPDKSTSTL neoepitope (tumor ID 0095) and its wild-type peptide to the LPFEKSTVM influenza A epitope from IEDB. In this case, the neoepitope results in a higher score (0.58) than the wild-type peptide (0.42). Only bold amino acids are considered for alignment. Amino acids labeled black do not impact the final Smith–Waterman alignment score for this comparison. Amino acids labeled green are equivalent in both sequences, whereas amino acids labeled orange are not. C, Averaged tumor-to-pathogen similarity scores were highly correlated with the log of mutation burden (Pearson r = 0.97, P = 6.3 × 10⁻²¹).
We explored the extent to which mutation burden may confound these similarity scores. When comparing tumor samples with IEDB pathogens (Supplementary Fig. S2), mutation burden was highly correlated with the mean IEDB epitope similarity score for each sample (Fig. 3C, Pearson r = 0.97, P = 6.3 × 10⁻²¹). None of the predictive models built from these similarity scores (see Materials and Methods) significantly outperformed the mutation burden baseline AUC of 0.85 (95% CI, 0.69–0.98; Online Data File 8). We repeated this process using the neoantigens predicted in a published study of 100 tumors from melanoma patients treated with ipilimumab, using that study's neoantigen predictions and definition of clinical benefit (10). With the caveat that the two neoantigen prediction pipelines differed, we found a similar result (Online Data File 9).
Using any of the methods above, the comparison of tumor neoepitopes to pathogens was highly associated with mutation burden but did not outperform mutation burden as a predictor of response.
GSEA of bulk tumor RNA expression: enrichment for IFN signaling and metabolic activity in benefiting tumors
We sought to determine whether clinical benefit is also associated with an inflamed tumor milieu favorable for antitumor immune activation in the setting of CTLA-4 blockade, as described previously (26–29).
When we applied the CIBERSORT method, which was originally developed to deconvolute lymphocyte subsets from microarray data (30), the estimate of tumor immune infiltrate varied with the anatomic site of the specimen: resected tumor-containing lymph nodes had higher Pearson correlation coefficients for immune infiltrate (range, 0.26–0.73) than tumors resected from other primary or metastatic sites (range, −0.04 to 0.32; per-subtype scores in Supplementary Fig. S3). We next applied GSEA using the 50 Hallmark gene sets provided by the MSigDB database (31, 32). When we compared all benefiting tumors with all nonbenefiting tumors, the five most significantly enriched gene sets in benefiting tumors were IFNγ response, IFNα response, allograft rejection, inflammatory response, and complement (Supplementary Fig. S4; FDR q-value < 0.005 for each). These data are consistent with previous studies (26). No gene set was significantly enriched when all pretreatment lesions were compared with all posttreatment lesions.
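The core quantity behind these GSEA results is the enrichment score (ES): walking down genes ranked by their association with benefit, a running sum increases at members of the gene set (weighted by the ranking metric) and decreases at nonmembers, and ES is the signed maximum deviation of that sum from zero. A minimal sketch of the weighted ES (weight exponent p = 1, the GSEA default) follows. It assumes a precomputed ranking metric, omits the phenotype-permutation step that GSEA uses to derive normalized enrichment scores and FDR q-values, and uses toy gene names rather than the Hallmark sets.

```python
from typing import Sequence, Set, Tuple


def enrichment_score(ranked: Sequence[Tuple[str, float]], gene_set: Set[str], p: float = 1.0) -> float:
    """Weighted GSEA-style enrichment score.

    ranked: (gene, metric) pairs sorted by metric, descending.
    gene_set: genes belonging to the set being tested.
    p: weight exponent (p = 0 gives the unweighted, Kolmogorov-Smirnov-like statistic).
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    # Normalizer so that the increments at gene-set members sum to 1
    n_r = sum(abs(metric) ** p for (_, metric), hit in zip(ranked, hits) if hit)
    if n_r == 0:
        raise ValueError("all gene-set members have a zero ranking metric")
    running, es = 0.0, 0.0
    for (_, metric), hit in zip(ranked, hits):
        if hit:
            running += abs(metric) ** p / n_r
        else:
            running -= 1.0 / n_miss
        # ES keeps the sign of the largest excursion from zero
        if abs(running) > abs(es):
            es = running
    return es


if __name__ == "__main__":
    # Toy ranking (positive metric = higher in benefiting tumors); not study data
    ranked = [("GENE1", 2.5), ("GENE2", 1.8), ("GENE3", 0.4), ("GENE4", -0.7), ("GENE5", -1.9)]
    print(enrichment_score(ranked, {"GENE1", "GENE2"}))
```

A positive ES indicates that the set is concentrated among genes upregulated in benefiting tumors; a negative ES indicates concentration at the bottom of the ranking.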
When we restricted the analysis to pretreatment tumors (n = 9), benefiting tumors (n = 4) showed enrichment, relative to nonbenefiting tumors (n = 5), for signals of active metabolism, including MTORC1 signaling, glycolysis, and fatty acid metabolism (FDR q-value < 0.05; see Online Data File 11 for a complete list). The UV response and inflammatory response gene sets were also significantly enriched (FDR q-value < 0.05 for each).
Discussion
To date, studies conducted in three tumor types [melanoma (8, 10), lung cancer (9), and MSI-high cancers (11)] have shown an association between mutation burden and response to checkpoint blockade immunotherapy. Here, we present a reanalysis of previously published data (8) that incorporates new expression data from a subset of patients in that study. Predicted class I candidate neoantigens did not outperform mutation burden as a predictor of response, even when RNA expression was considered. This finding underscores the importance of other determinants of response, including class II neoepitopes (for which predictive tools remain suboptimal), signaling and cell populations in the tumor microenvironment, and systemic factors. Furthermore, the ratio of predicted neoantigens to mutations did not differ significantly between benefiting and nonbenefiting tumors, and posttreatment tumors did not exhibit a significant difference in mutation or neoantigen burden between benefiting and nonbenefiting tumors. These findings suggest that immunoediting cannot be detected when neoantigens are evaluated bioinformatically in aggregate, but may be occurring at the level of individual neoantigens of particular importance.
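A comparison of the neoantigen-to-mutation ratio between benefiting and nonbenefiting tumors can be sketched as a two-sided permutation test on the difference in group means. This test is an assumption chosen for illustration and may differ from the test used in this study; the per-tumor counts in the usage example are hypothetical.

```python
import random
from statistics import fmean
from typing import Sequence


def neoantigen_ratio(n_neoantigens: int, n_mutations: int) -> float:
    """Predicted neoantigens per somatic mutation for one tumor."""
    return n_neoantigens / n_mutations


def permutation_test(group_a: Sequence[float], group_b: Sequence[float],
                     n_perm: int = 10000, seed: int = 0) -> float:
    """Two-sided permutation P value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    # Add-one correction keeps the estimated P value above zero
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical (neoantigens, mutations) counts per tumor, not study data
    benefit = [neoantigen_ratio(n, m) for n, m in [(120, 400), (300, 900), (45, 150)]]
    no_benefit = [neoantigen_ratio(n, m) for n, m in [(30, 110), (80, 260), (15, 60)]]
    print(permutation_test(benefit, no_benefit, n_perm=5000))
```

Working with the per-tumor ratio, rather than raw neoantigen counts, removes the direct dependence on mutation burden, which is the point of the comparison: a depletion of neoantigens per mutation would be one aggregate signature of immunoediting.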
In an effort to better evaluate the hypothesis that neoantigens may resemble known pathogens, we created Topeology, a publicly available, biologically relevant peptide comparison tool that facilitates assessment of potential T-cell cross-reactivity in any setting (cancer or otherwise). We evaluated neoantigens using both the previously published tetrapeptide signature and Topeology. When either method was applied to two published datasets of melanoma patients treated with CTLA-4 blockade (8, 10), we found that resemblance of neoepitopes to pathogens was associated with, but did not outperform, mutation burden as a predictor of response to therapy. Therefore, although TCR cross-reactivity may be relevant at the level of individual T cells, neither the tetrapeptide signature nor neoepitope homology shows promise as a biomarker, as both measures are highly associated with, and do not outperform, mutation burden. Topeology may be used to evaluate candidate peptides on an individual basis, for example, to explore the hypothesis that tumors generate “danger signals” recognizable to T cells (33).
These data confirm what immunologists have long known (34–36): A myriad of additional factors, ranging from the IFNγ signaling seen in GSEA analysis (26), to systemic factors (37, 38), must be integrated with mutation burden to improve our understanding of tumor response and resistance to checkpoint blockade therapy.
Disclosure of Potential Conflicts of Interest
M.D. Hellmann reports receiving commercial research grants from BMS and Genentech and is a consultant/advisory board member for AstraZeneca, BMS, Genentech, Janssen, Merck, and Novartis. E. Van Allen reports receiving a commercial research grant from Bristol-Myers Squibb. J.D. Wolchok reports receiving a commercial research grant from and is a consultant/advisory board member for Bristol-Myers Squibb. A. Snyder is a consultant/advisory board member for Neon Therapeutics. J. Hammerbacher is the chief scientist at Cloudera, is a board member at CIOX, and reports receiving a commercial research grant from Neon Therapeutics. No potential conflicts of interest were disclosed by the other authors.
Authors' Contributions
Conception and design: M.D. Hellmann, A. Snyder
Development of methodology: T. Nathanson, A. Ahuja, A. Rubinsteyn, M.D. Hellmann, E. Van Allen, A. Snyder
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): E. Van Allen, T. Merghoub, J.D. Wolchok, A. Snyder
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): T. Nathanson, A. Ahuja, A. Rubinsteyn, B.A. Aksoy, M.D. Hellmann, E. Van Allen
Writing, review, and/or revision of the manuscript: T. Nathanson, A. Ahuja, A. Rubinsteyn, B.A. Aksoy, M.D. Hellmann, D. Miao, E. Van Allen, T. Merghoub, J.D. Wolchok, A. Snyder
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): T. Nathanson, A. Ahuja, D. Miao
Study supervision: T. Merghoub, A. Snyder, J. Hammerbacher
Acknowledgments
We are grateful to Vladimir Makarov and Timothy A. Chan for their roles in the initial study, to Dr. Makarov for developing the four-caller system, and to Christina Leslie and Jacqueline Buros for their helpful reviews of this manuscript.
Grant Support
This work was supported by NCI K08 CA201544-01 (to A. Snyder).
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.