A key goal in immuno-oncology is the identification of tumor antigens recognized by T cells. Significant progress has been made in predicting MHC class I presentation of tumor-specific antigens (peptides) recognized by CD8+ T cells. However, predicting antigens that are presented by MHC class II and recognized by CD4+ T cells has proven more challenging. Studies have shown binding affinity to be less predictive of MHC class II presentation than of class I presentation. Class II antigen-directed therapeutics require accurate antigen identification from patient samples, which remains elusive.

Methods: We focused initially on antigen presentation by HLA-DR and generated a dataset of human tumor transcriptomes and HLA-DR immunopeptidomes from resections of B cell lymphomas (N=39). Transcriptomes were obtained by next-generation sequencing (NGS) of exome-captured cDNA, and immunopeptidomes by immunoprecipitation with the HLA-DR-specific antibody L243 followed by tandem mass spectrometry (MS/MS). Each sample was typed for HLA-DRB1, -DRB3, -DRB4, and -DRB5 using standard methods. Additionally, we obtained published class II mass spectrometry data for two B cell lines, each of which expressed a single common HLA class II allele (HLA-DRB1*15:01 and HLA-DRB5*01:01, respectively). RNA sequencing data were not available for either cell line; therefore, we substituted RNA sequencing data from a different B cell line, B721.221. We combined these data to train a deep learning model of HLA class II peptide presentation. Our model addressed two key challenges: (1) learning HLA-allele-specific models from tumor and normal data in which each sample expressed up to 4 unique HLA-DR alleles, and (2) reflecting information about all aspects of HLA presentation, including gene expression, antigen processing, and stable binding of peptides to HLA. We evaluated the performance of the model on two independent test datasets. First, we tested the model on HLA peptides from a held-out sample from the lymphoma training dataset. Second, we tested the model on a separate public cell line expressing both HLA-DRB1*15:01 and HLA-DRB5*01:01.
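
The abstract does not state how per-allele predictions were combined for samples expressing several HLA-DR alleles. As an illustration only, one common way to handle such multi-allelic data is a noisy-OR, which treats a peptide as presented if at least one allele presents it. The function name and the input probabilities below are hypothetical and are not taken from the study.

```python
from math import prod


def sample_presentation_probability(per_allele_probs):
    """Combine per-allele presentation probabilities with a noisy-OR.

    ASSUMPTION: this aggregation rule is illustrative; the abstract does
    not describe how the model handles multiple alleles per sample.
    """
    if not per_allele_probs:
        raise ValueError("at least one allele probability is required")
    if any(p < 0.0 or p > 1.0 for p in per_allele_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    # Probability that no allele presents the peptide, then its complement.
    return 1.0 - prod(1.0 - p for p in per_allele_probs)


# Hypothetical per-allele scores for one peptide in a sample with
# four HLA-DR alleles (made-up numbers, for illustration only).
example_probs = [0.5, 0.5, 0.0, 0.0]
combined = sample_presentation_probability(example_probs)
```

Under this rule, a peptide scored 0.5 for each of two alleles and 0 for the others receives a sample-level probability of 0.75, so the sample-level label can be explained by any one of the expressed alleles.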

Results: An average of 567 training and 203 testing peptides at q<0.01 and >50M unique transcriptome reads were obtained from the lymphoma samples. From the cell lines, 433 peptides were used for training and 223 for testing. The model demonstrated a significant improvement in prediction accuracy, achieving a gain of more than 10 percentage points in area under the ROC curve (ROC AUC) over a standard binding affinity-based predictor. On the held-out lymphoma test data, it achieved a ROC AUC of 0.95 vs 0.80 for the binding affinity model. On the cell-line test data, it achieved a ROC AUC of 0.90 vs 0.79 for the binding affinity model.
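
For readers unfamiliar with the metric, ROC AUC equals the probability that a randomly chosen presented peptide is scored higher than a randomly chosen non-presented peptide. The sketch below computes it directly from that definition. The labels and scores are toy values invented for illustration, not data from this study, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = presented peptide, 0 = decoy.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_model_scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4]
toy_affinity_scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.6]

auc_model = roc_auc(toy_labels, toy_model_scores)
auc_affinity = roc_auc(toy_labels, toy_affinity_scores)
gain_in_points = 100.0 * (auc_model - auc_affinity)
```

The pairwise loop is quadratic in the number of peptides, which is acceptable for a few hundred test peptides; a rank-based computation would be preferable for larger sets.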

Conclusion: We used a large dataset of transcriptomes and HLA peptidomes to train a deep learning model for HLA class II antigen presentation. The new model significantly outperforms standard methods and advances in silico HLA class II antigen selection for personalized cancer immunotherapy.

Citation Format: Tommy Boucher, Matthew Davis, Christine Palmer, Tyler Murphy, Andrew Clark, Fujiko Duke, Aaron Yang, Lauren Young, Karin Jooss, Mojca Skoberne, Josh Francis, Roman Yelensky, James Sun, Jennifer Busby. MHC class II antigen identification for cancer immunotherapy by deep learning on tumor HLA peptides [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 4445.