Therapeutic anticancer vaccination has been adopted as an immunotherapy in several solid tumors. However, selecting promising candidates from the vast number of possible epitopes poses a challenge to clinicians and bioinformaticians alike, and very few epitopes have been tested in experimental or clinical settings to validate their efficacy. Here, we present a comprehensive database of predicted nonmutated peptide epitopes derived from genes that are overexpressed in a group of 32 melanoma biopsies compared with healthy tissues and that were filtered against expression in a curated list of survival-critical tissues. We hypothesize that these "self-tolerant" epitopes have two desirable properties: they do not depend on mutations, which makes them immediately applicable to a large patient population, and they potentially cause fewer autoimmune reactions. To support epitope selection, we provide an aggregated score of expected therapeutic efficiency that can be used to shortlist candidates. The database can facilitate epitope selection and trial design and is freely accessible at https://www.curatopes.com.
A database is presented that predicts and scores antitumor T-cell epitopes, with a focus on tolerability and avoidance of severe autoimmunity, offering a supplementary epitope set for further investigation in immunotherapy.
In the past few years, anticancer immunotherapy has emerged as an effective therapeutic approach for several highly metastatic solid tumors, in particular melanoma (1). Most recently approved immunotherapies rely on checkpoint inhibitors, targeted antibodies that disrupt signal transduction through certain suppressive lymphocyte receptors and thereby unblock the immune response against tumor cells. However, a substantial fraction of patients do not benefit from checkpoint inhibitor therapy, and several trials are ongoing to evaluate combinations of checkpoint inhibitors with other immunologic treatment modalities. One promising candidate among these is therapeutic anticancer vaccination, which is also applicable as a monotherapy and can be performed, for example, by immunizing the patient with either tumor peptides combined with an adjuvant or autologous mature dendritic cells (DC) loaded with tumor peptides. The tumor peptides constitute T-cell epitopes derived from proteins that are either mutated or overexpressed in the tumor. Cytotoxic T lymphocytes (CTL) recognize peptides of 9 to 12 amino acids that are bound to human leukocyte antigen (HLA) class I molecules. Tumor peptides presented by mature DCs can promote the activation and proliferation of specific CTLs, which can then recognize the same HLA-bound peptides on the tumor cell surface and mount an immune response against the tumor.
In contrast to preventive vaccination against infectious diseases, therapeutic anticancer vaccines have to be personalized because the antigenic profile of each tumor is highly patient-specific. Somatic mutations in the tumor and aberrations in its transcriptomic profile, together with the diversity of human HLA alleles, mean that efficacious tumor-specific T-cell epitopes differ from patient to patient. Owing to recent advances in next-generation sequencing (NGS) technologies, it is now possible to combine transcriptome and exome measurements from a patient's tumor sample and predict mutated tumor epitopes (neoepitopes) with a bioinformatics pipeline. Neoepitopes tend to be truly patient-specific and hence have to be produced individually. The production of customized peptides in the quality required for therapeutic vaccination is expensive and time-consuming. The time required, in particular, undermines the chances of administering this therapy in large cohorts of severely ill and rapidly progressing patients. There is, however, an alternative: nonmutated proteins that are known to be regularly overexpressed in specific cancers. Such proteins have been used in cancer immunotherapy for several decades and have only recently dropped out of focus, even though they represent an enormous number of untriaged epitopes. While several online resources aggregate clinically and scientifically characterized nonmutated peptides and gene candidates whose expression was found to be restricted to certain tumors (2, 3), no computational approach combining NGS, bioinformatics, and individual HLA selection has been put forward so far.
With the goal of supplementing the existing arsenal of melanoma-specific antigens for therapeutic vaccination, we explored a fully computational approach in this study. Starting from all human protein-coding genes and removing unfavorable candidates algorithmically, step by step, on the basis of expression data, we generated a database of computationally predicted, nonmutated tumor epitopes from proteins that are overexpressed in metastatic cutaneous melanoma biopsies but barely expressed in healthy tissues (https://www.curatopes.com). By complementing existing resources, this database may support epitope selection for therapy and cost-efficient epitope production.
Material and Methods
Concept of the database
Our objective was to identify a set of protein-coding genes that were expressed exclusively, or at demonstrably higher levels, in metastatic cutaneous melanoma compared with healthy tissues (Fig. 1A; Supplementary Fig. S1). First, we identified protein-coding genes through the corresponding annotation in at least one of Ensembl (https://www.ensembl.org), the Human Protein Atlas (https://www.proteinatlas.org), or Consensus CDS (https://www.ncbi.nlm.nih.gov/CCDS/) and removed genes whose protein was detected at any level above “Not detected” in any of the 58 tissues inspected in the Human Protein Atlas IHC dataset. This yielded the eligible-protein set, an assembly of protein-coding genes whose protein products have not been observed in healthy tissues.
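The two steps of the eligible-protein filter, taking the union of the three annotation sources and then keeping only genes never detected by IHC, can be sketched as follows. This is a minimal illustration, not the deposited pipeline. It assumes the IHC data have already been parsed into a hypothetical mapping gene -> tissue -> level string using the Human Protein Atlas vocabulary ("Not detected", "Low", "Medium", "High"). The function names are illustrative.

```python
def protein_coding_genes(ensembl, hpa, ccds):
    """Union of gene IDs annotated as protein-coding in at least one source."""
    return set(ensembl) | set(hpa) | set(ccds)


def eligible_protein_set(protein_coding, ihc_levels):
    """Keep genes whose protein is 'Not detected' in every inspected tissue.

    ihc_levels: hypothetical dict gene -> dict tissue -> IHC level string.
    A gene without any IHC record is kept, because nothing above
    'Not detected' was reported for it (assumption of this sketch).
    """
    eligible = set()
    for gene in protein_coding:
        levels = ihc_levels.get(gene, {})
        if all(level == "Not detected" for level in levels.values()):
            eligible.add(gene)
    return eligible
```

Keeping genes that lack IHC records follows the literal wording "removed genes if they were expressed at any level". A stricter variant would drop them instead.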
In a parallel, complementary approach, we identified the genes whose transcript abundance conforms to a high-in-tumor, low-in-tissue phenotype. For this, we obtained published transcriptome data of metastatic cutaneous melanoma (27 melanoma biopsies from Gene Expression Omnibus accession GSE78220, excluding GSM2069836, and five from GSE96619; for NGS data analysis, see Supplementary Fig. S2) and complemented them with transcriptome data for healthy tissues from the GTEx dataset V7 (4). The GTEx dataset encompasses 11,688 individual sequencing runs from 51 tissues (between 5 and 564 available specimens per tissue) and lists aggregate transcript abundance for each gene in transcripts per million (TPM). To classify a gene, we checked for each tissue whether 90% of the tumor samples expressed it at a higher level than 90% of that tissue's samples. Genes that met this condition in all 51 tissues were included in the favorable-transcript set.
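One way to read the 90%/90% rule is that the gene's 10th-percentile TPM across tumors must exceed its 90th-percentile TPM within each healthy tissue. The sketch below implements that reading with linearly interpolated percentiles from Python's standard library. Both the percentile method and the function names are assumptions for illustration, not the authors' code.

```python
from statistics import quantiles


def decile(values, which):
    """Return the which-th decile (1 -> 10th, 9 -> 90th percentile), linearly interpolated."""
    cuts = quantiles(values, n=10, method="inclusive")  # 9 cut points
    return cuts[which - 1]


def is_favorable_transcript(tumor_tpm, tissue_tpm_by_tissue):
    """Assumed reading of the rule: for every healthy tissue, the tumor 10th
    percentile must exceed that tissue's 90th percentile.

    tumor_tpm: per-sample TPM values of one gene across tumor biopsies.
    tissue_tpm_by_tissue: dict tissue -> per-sample TPM values of the same gene.
    """
    tumor_p10 = decile(tumor_tpm, 1)
    return all(tumor_p10 > decile(tpm, 9) for tpm in tissue_tpm_by_tissue.values())
```

Requiring the condition in every tissue mirrors "met this condition in all 51 tissues": a single tissue with high expression, such as the second one in a test case, disqualifies the gene.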
To identify genes with a low risk of causing severe autoimmune reactions against survival-critical tissues, we curated a list of such tissues from those available in GTEx (Supplementary Table S1). We then defined a gene as tolerable in a tissue when 90% of the tissue samples showed an expression below 10 TPM. Genes that were tolerable in all critical tissues were gathered into the expected-tolerability set.
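The tolerability criterion is a simple counting rule, sketched below. The sketch assumes a hypothetical input layout (gene -> tissue -> list of per-sample TPM values). It also assumes that a gene with no data for a critical tissue is treated as not tolerable, which the text does not specify.

```python
def is_tolerable(tissue_tpm, threshold_tpm=10.0):
    """True if at least 90% of the samples are below threshold_tpm.

    Integer arithmetic (10 * below >= 9 * n) avoids floating-point edge cases.
    """
    below = sum(1 for value in tissue_tpm if value < threshold_tpm)
    return 10 * below >= 9 * len(tissue_tpm)


def expected_tolerability_set(expression, critical_tissues):
    """Genes tolerable in every survival-critical tissue.

    expression: hypothetical dict gene -> dict tissue -> list of per-sample TPM.
    Missing data for a critical tissue excludes the gene (assumption).
    """
    return {
        gene
        for gene, by_tissue in expression.items()
        if all(t in by_tissue and is_tolerable(by_tissue[t]) for t in critical_tissues)
    }
```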
On the basis of these sets, we defined the superior-tolerance genes as those that appear in all three sets, and the enhanced-tolerance genes as those that appear in the eligible-protein and favorable-transcript sets but not in the expected-tolerability set. At this stage, each of the two sets comprised 20 genes (Supplementary Table S2; Supplementary Fig. S3).
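Expressed as set operations, the two definitions above reduce to one intersection and one difference. The function name below is illustrative.

```python
def classify_genes(eligible, favorable, tolerable):
    """Split genes into superior- and enhanced-tolerance sets.

    superior: in all three sets.
    enhanced: in eligible-protein and favorable-transcript, but not in expected-tolerability.
    """
    superior = eligible & favorable & tolerable
    enhanced = (eligible & favorable) - tolerable
    return superior, enhanced
```

By construction the two result sets are disjoint, so every gene that passes both expression filters is assigned to exactly one of them.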
For the 165 transcripts annotated as derived from these genes, we used netMHCpan 4.0 (5) to predict peptides of 9–12 amino acids and their biochemical features. For immunogenicity prediction, we used the Epitope Discovery tool suite of the Immune Epitope Database and Analysis Resource (IEDB; ref. 6). Epitopes were discarded if the predicted binding affinity between peptide and HLA molecule exceeded 500 nmol/L.
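The candidate space that a predictor such as netMHCpan scores consists of all contiguous 9- to 12-residue windows of each protein sequence. The 500 nmol/L cutoff is then a threshold on the predicted affinity. The sketch below shows both steps. It assumes that predictions have already been parsed into (peptide, allele, affinity_nM) tuples; parsing real netMHCpan output is not shown, and the function names are hypothetical.

```python
def candidate_peptides(protein_seq, lengths=range(9, 13)):
    """All contiguous subsequences of 9-12 residues from one protein sequence."""
    peptides = set()
    for k in lengths:
        for start in range(len(protein_seq) - k + 1):
            peptides.add(protein_seq[start:start + k])
    return peptides


def passes_affinity(predictions, cutoff_nm=500.0):
    """Keep peptide-HLA predictions whose affinity does not exceed cutoff_nm.

    predictions: iterable of (peptide, hla_allele, affinity_nM) tuples
    (assumed, already-parsed predictor output). Lower nM means stronger binding.
    """
    return [(pep, hla, aff) for pep, hla, aff in predictions if aff <= cutoff_nm]
```

A protein of length L yields at most (L - 8) + (L - 9) + (L - 10) + (L - 11) windows. A 12-residue sequence therefore yields 4 + 3 + 2 + 1 = 10 candidates.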
In a post hoc filtering step, we excluded peptide sequences that could in principle also arise from the regular protein pool of healthy cells. We obtained all human protein sequences from Ensembl (release 94) and discarded predicted peptides whose sequence appeared in any protein that was not part of our result sets. Owing to high sequence similarity with discarded genes from the same family, all peptides from five genes in the superior-tolerance set were removed, reducing the effective number of genes in this set to 15 (Supplementary Table S2).
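At its core, the post hoc filter is an exact substring test against every protein from genes outside the result sets. The minimal sketch below assumes a hypothetical proteome mapping (protein ID -> (gene ID, sequence)) and uses a naive scan. For the full Ensembl proteome, an index of all 9- to 12-mers would be faster. The sequences in the test are toy strings, not real proteins.

```python
def post_hoc_filter(peptides, proteome, result_genes):
    """Drop peptides that occur verbatim in any off-target protein.

    peptides: iterable of peptide sequences.
    proteome: hypothetical dict protein_id -> (gene_id, amino-acid sequence).
    result_genes: gene IDs belonging to the result sets.
    """
    off_target = [seq for gene, seq in proteome.values() if gene not in result_genes]
    return {pep for pep in peptides if not any(pep in seq for seq in off_target)}
```

A shared family sequence, like the MAGE example discussed later in this article, is removed because it also occurs in a protein from a gene that is not in the result sets.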
After the binding-affinity and post hoc filters, 6,397 (11.7%) of the initial 54,526 predicted unique peptides remained. Epitopes were then scored with a metric that combines the normalized values of binding affinity, immunogenicity, transcript expression in melanoma, and the expression index between melanoma and healthy tissue (see Supplementary Material for details). Because this score is intended to evaluate an epitope's general applicability for antimelanoma immunotherapy across the population, we named it generalized-epitope predicted immune-efficacy (gPIE). We envision this value as a way for clinicians and other users to quickly assess an aggregate of the most commonly used metrics of an epitope's expected efficacy and tolerability, expediting the nontrivial process of epitope selection.
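The exact gPIE formula is given in the Supplementary Material and is not reproduced here. The sketch below only illustrates the general idea of an aggregate of normalized features. It assumes min-max normalization, a log transform and inversion of affinity (so that stronger binders score higher), and equal weights. All of these choices, and the function names, are hypothetical and may differ from the published score.

```python
import math


def min_max(values, invert=False):
    """Scale values to [0, 1]; optionally invert so that smaller raw values score higher."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


def gpie_like_scores(affinity_nm, immunogenicity, tumor_tpm, expression_index):
    """Hypothetical equal-weight aggregate of four normalized epitope features."""
    features = [
        min_max([math.log(a) for a in affinity_nm], invert=True),  # lower nM = better
        min_max(immunogenicity),
        min_max(tumor_tpm),
        min_max(expression_index),
    ]
    n = len(affinity_nm)
    return [sum(f[i] for f in features) / len(features) for i in range(n)]
```

Because every feature is scaled to [0, 1] before averaging, no single metric with a large raw range, such as TPM, dominates the aggregate.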
The database of overexpressed epitopes in metastatic cutaneous melanoma is available at https://www.curatopes.com. It was developed in R using RStudio and is deployed with the Shiny framework.
The implementation for the pipeline filtering steps has been deposited to Zenodo (https://zenodo.org/) under the DOI 10.5281/zenodo.3354032.
The online database (Fig. 1B) consists of four tables with different peptide subsets: a selection of peptides from the 15 superior-tolerance genes, a selection of peptides from the 20 enhanced-tolerance genes, the complete list of all predicted epitopes that passed our filters (combining the 35 genes above), and an evaluation of known, characterized melanoma epitopes (3) with the Curatopes pipeline. The complete list holds 6,397 individual peptides and 36 HLA alleles, which combine into 29,410 epitopes (peptide-HLA pairs). In the first two tables, we list for each gene-HLA allele combination the three highest-scoring peptides to provide a gene-oriented overview. This yielded 566 peptides in 933 epitopes in the superior-tolerance set and 760 peptides in 1,384 epitopes in the enhanced-tolerance set. Each epitope is annotated with features such as expression and predicted immunogenicity.
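The "top three per gene-HLA allele" selection used for the first two tables is a group-then-rank operation. It also explains why these tables contain fewer peptides than epitopes: the same peptide can be selected for several alleles. In this sketch, the record layout (dicts with gene, hla, peptide, and score keys) and the function name are assumptions.

```python
from collections import defaultdict


def top_per_gene_allele(epitopes, k=3):
    """Return the k highest-scoring epitopes for every (gene, HLA allele) pair.

    epitopes: iterable of dicts with keys 'gene', 'hla', 'peptide', 'score'
    (hypothetical layout). Groups are emitted in sorted (gene, hla) order.
    """
    groups = defaultdict(list)
    for ep in epitopes:
        groups[(ep["gene"], ep["hla"])].append(ep)
    selected = []
    for key in sorted(groups):
        ranked = sorted(groups[key], key=lambda ep: ep["score"], reverse=True)
        selected.extend(ranked[:k])
    return selected
```

Groups with fewer than k predicted binders contribute all of their epitopes. The resulting totals therefore need not equal 3 × genes × alleles.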
In databases such as CTpedia (2), TANTIGEN (7), and CAPED (3), as well as in recent studies (8), peptides and genes that elicited a spontaneous antitumor immune response in clinical or experimental settings have been collected and annotated. These resources provide an indispensable overview of antitumor vaccination options with documented immune system engagement, and many past and ongoing clinical trials can be traced to the listed candidates. Our work offers a complementary resource that focuses on avoiding life-threatening off-target toxicity, thus providing clinicians with an alternative set of peptides that has a valuable additional property.
Because the Curatopes pipeline is blind to any preexisting knowledge about melanoma markers, one might expect at least some of the known marker genes to appear in the result set. However, a comparison of their expression profiles in healthy tissues and melanoma reveals that many of them are not completely specific for melanoma and thus do not meet the very strict Curatopes criteria. It should also be noted that even measurable immune responses against these antigens did not necessarily result in tumor regression (9–11). Even when infused in large numbers by adoptive T-cell transfer, T cells specific for gp100 (PMEL) or MART1 (MLANA) induced clinical responses in only a few patients, sometimes at the cost of autoimmunity (12, 13).
We found that epitopes from the well-established melanoma marker MAGE-A3 (14) were ranked at the top of the superior-tolerance set. In a recent clinical trial (15), a T-cell receptor against a MAGE-A3 peptide (sequence KVAELVHFL) was generated in an HLA-A2-transgenic mouse, thereby bypassing all tolerance mechanisms. Over the course of the trial, two patients died of neurotoxicity. It was later confirmed that the antigenic peptide recognized by the engineered T cells appears in the sequences of several MAGE family members, including some that are expressed in the brain. This episode illustrates the need to discard peptides that can be generated from off-target proteins in survival-critical tissues. In our database, this specific peptide was removed by the post hoc filter because of its sequence identity with MAGE-A9 (MAGEA9).
We envision two ways in which the database can be used (Fig. 1A). First, when designing clinical trials for a patient cohort, one could select five to ten candidate epitopes per HLA class I allele of interest and manufacture them in good manufacturing practice quality, either as synthetic peptides or as one or more mRNA constructs. Patients carrying the corresponding HLA allele could then receive these epitopes without further characterization of their tumors. Second, in a more individualized approach, one would first assemble a larger library of 20 to 100 highly ranked peptides, produce them in large good manufacturing practice–grade quantities, and use RNA-seq profiling of a patient's tumor biopsy to choose a personalized set of 5 to 10 highly expressed peptides for vaccination. In both strategies, the epitopes from our tolerability-based selection can be combined with established epitopes from known tumor-associated antigens, thus providing additional vaccination targets, especially for melanomas with low expression of the established tumor antigens. If our approach benefits patients with cutaneous melanoma, other fields of oncology might profit from similar efforts.
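The second, individualized strategy selects from a premanufactured library using the patient's HLA type and tumor RNA-seq data. The sketch below illustrates it under several assumptions. The library is a hypothetical list of (peptide, gene, HLA allele) tuples, and expression is taken at the gene level in TPM. The minimum-expression cutoff (min_tpm) and the cap of 10 epitopes are illustrative parameters, not values from the study.

```python
def personalize(library, patient_alleles, patient_tpm, n_max=10, min_tpm=1.0):
    """Choose up to n_max library epitopes for one patient.

    library: list of (peptide, gene, hla_allele) tuples (hypothetical layout).
    patient_alleles: set of the patient's HLA class I alleles.
    patient_tpm: dict gene -> TPM in the patient's tumor biopsy.
    min_tpm: hypothetical minimum expression for a source gene to qualify.
    """
    eligible = [
        (pep, gene, hla)
        for pep, gene, hla in library
        if hla in patient_alleles and patient_tpm.get(gene, 0.0) >= min_tpm
    ]
    # Rank by expression of the source gene in this patient's tumor.
    eligible.sort(key=lambda entry: patient_tpm[entry[1]], reverse=True)
    return eligible[:n_max]
```

Filtering by HLA allele first ensures that only epitopes the patient can present are considered. Ranking by tumor expression then favors antigens that are most likely to be abundant on that patient's tumor cells.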
Disclosure of Potential Conflicts of Interest
No potential conflicts of interest were disclosed.
Conception and design: T. Jaitly, J. Dörrie, G. Schuler, J. Vera
Development of methodology: C. Lischer, M. Eberhardt, T. Jaitly
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C. Lischer, T. Jaitly, G. Schuler
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C. Lischer, M. Eberhardt, T. Jaitly, J. Dörrie, G. Schuler
Writing, review, and/or revision of the manuscript: C. Lischer, M. Eberhardt, T. Jaitly, N. Schaft, J. Dörrie, G. Schuler, J. Vera
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): C. Lischer, M. Eberhardt, T. Jaitly, C. Schinzel, G. Schuler
Study supervision: J. Vera
We thank Christina Blume, Guido Santos, and Xin Lai for suggestions on front-end appearance.
This work was supported by the German Federal Ministry of Education and Research (BMBF) as part of the project eBio:MelEVIR (031L0073A, to J. Vera). J. Vera is funded by the Staedtler Stiftung.