Deep sequencing of T-cell receptors enables the comprehensive profiling of lymphocyte populations and the characterization of the repertoire of T-cell responses against tumors, which could be applied to diagnose cancers. Ostmeyer and colleagues introduce a novel approach to characterize TCR patterns correlating with antigen recognition. By projecting the large TCR sequence space into a handful of biophysicochemical descriptors for key residues and seeking TCRs with similar antigen-binding capabilities even in the absence of identical amino acids, this approach presents several advantages over current methods.
See related article by Ostmeyer et al., p. 1671
Immune surveillance mechanisms can lead to immunologically mediated tumor rejection in mice. Human tumors are often infiltrated by T cells, and this infiltration correlates with better survival following not only historical chemotherapy approaches but also checkpoint immunotherapy (1). Such tumor-infiltrating lymphocytes (TIL) have been assumed to be tumor specific, and the best demonstration comes from adoptive T-cell therapy experiments, where TILs or cells recognizing specific tumor neoantigens have elicited tumor regression in patients with melanoma or other solid tumors (2). Although the full extent of tumor antigens recognized by TILs remains unknown to date, it is expected that TILs are directed against both private and shared tumor antigens. Most tumor neoantigens are private. However, some recurring mutated neoantigens, common lineage-specific tumor antigens, frequently occurring cancer testis antigens, and emerging noncanonical neoantigens, such as those deriving from tumor-specific alternative splicing or endogenous retroviral elements, may be shared across patients with the same tumor histotype or even across different tumor types (3). The notion of tumor immune surveillance entails that insurgent tumors are eliminated by immune cells, whereas tumor evolutionary mechanisms lead to immune editing, often with tumor growth continuing despite ongoing immune recognition. Under such circumstances, occult tumors should already harbor a T-cell infiltrate enriched in tumor-specific T cells. Their sensitive and accurate measurement could lead to tools aiding early tumor diagnosis.
Deep T-cell receptor (TCR) sequencing offers important opportunities for detailed quantification of TILs and characterization of the repertoire of T-cell responses against tumors. At the same time, recent ground-breaking advances in bioinformatics have allowed the clustering of TCRs with shared specificities (4, 5), while biophysicochemical computational methods have recently crossed the barrier of assigning specificity for a common (even unknown) antigen to TCR sequences identified by bulk TCR sequencing of biological samples. The approach presented by Ostmeyer and colleagues (6) lays the foundation for a novel class of methods for analyzing immune repertoires to find disease-associated TCRs. The idea is to feed machine learning techniques with biophysicochemical descriptors of the TCR interface, rather than with TCR sequences. The method determines a short list of preferred values for these descriptors at key positions in antigen-binding TCRs, which permits the identification of disease-associated TCRs and, ultimately, the distinction of repertoires found in tumor-related tissues from those found in healthy ones. The use of such physical descriptors can be particularly efficient for condensing the extremely large sequence diversity of immune receptors into a limited number of quantitative characteristics at key positions. First, this description circumvents the principal drawback of purely sequence-based analysis, which requires a very large number of disease-associated TCR sequences to drive the prediction. Second, it allows the detection of potential antigen-binding TCRs even when they have never been encountered before.
Ostmeyer and colleagues used TCR deep sequencing data from tumor and healthy tissues isolated from patients with colorectal or breast cancer to investigate their possible application in disease diagnosis. The authors decided to use only the complementarity-determining region 3 (CDR3) of the TCR to encode its surface, because it is the most variable, and therefore the most discriminating, portion of the receptor, and the primary determinant of antigen-binding specificity. On the basis of the analysis of 55 available three-dimensional structures of TCR/p–MHC complexes, they further limited the description of the TCR interface to a set of contiguous strips of four amino acids (called 4-mers) of its β chain, specifically from the CDR3β, assuming that at least one of these 4-mers contacts the peptide antigen. The biophysicochemical characteristics of each residue, at each position of the 4-mers, were described using the five Atchley factors. The latter reflect not only the codon diversity and secondary structure, but also the molecular size, polarity, and electrostatic charge of the residues, and are thus anticipated to correlate with receptor-antigen–binding features. The authors developed a logistic regression model to score the 4-mers, with the 4-mer relative abundance and the Atchley factors of the residue at each position in the 4-mer as input. Following the assumption that T cells derived from tumor tissue will display at least one TCR able to bind a cancer-related epitope, the scoring function was trained so that at least one 4-mer per tumor repertoire must have a high score, while all 4-mers from healthy tissue repertoires should have low scores.
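The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the descriptor table holds random placeholder numbers rather than the published Atchley factors, the input format (a dictionary mapping each CDR3β sequence to a clone count), all function names, and the sliding window over the full CDR3 are assumptions made for illustration. Taking the maximum 4-mer score as the repertoire score is one plausible reading of the training rule: a tumor repertoire needs only one high-scoring 4-mer, whereas a low maximum in a healthy repertoire implies that all of its 4-mers score low.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_FACTORS = 5  # five descriptors per residue, as with the Atchley factors

# PLACEHOLDER descriptors: random numbers standing in for the published
# Atchley factor table, which is not reproduced here.
_rng = random.Random(0)
DESCRIPTORS = {aa: [_rng.uniform(-1.0, 1.0) for _ in range(N_FACTORS)]
               for aa in AMINO_ACIDS}


def kmers(cdr3, k=4):
    """Return every contiguous strip of k residues in a CDR3 sequence."""
    return [cdr3[i:i + k] for i in range(len(cdr3) - k + 1)]


def kmer_abundance(repertoire, k=4):
    """Relative abundance of each 4-mer in a repertoire.

    repertoire: dict mapping CDR3 sequence -> clone count (assumed format).
    Assumes at least one sequence of length >= k.
    """
    counts = Counter()
    for cdr3, clone_count in repertoire.items():
        for km in kmers(cdr3, k):
            counts[km] += clone_count
    total = sum(counts.values())
    return {km: c / total for km, c in counts.items()}


def encode(kmer, abundance):
    """4 positions x 5 descriptors, plus the relative abundance: 21 features."""
    features = [value for aa in kmer for value in DESCRIPTORS[aa]]
    features.append(abundance)
    return features


def score_kmer(features, weights, bias):
    """Logistic score in (0, 1) for a single 4-mer."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def score_repertoire(repertoire, weights, bias):
    """Repertoire score = highest score among all of its 4-mers."""
    abundance = kmer_abundance(repertoire)
    return max(score_kmer(encode(km, a), weights, bias)
               for km, a in abundance.items())


def max_pooled_log_loss(labelled_repertoires, weights, bias):
    """Mean log loss on the max-pooled score (label True = tumor)."""
    loss = 0.0
    for repertoire, is_tumor in labelled_repertoires:
        p = score_repertoire(repertoire, weights, bias)
        loss -= math.log(p) if is_tumor else math.log(1.0 - p)
    return loss / len(labelled_repertoires)


# Toy usage with made-up sequences and untrained weights.
tumor = {"CASSLGQAYEQYF": 3, "CASSPGTGGYEQYF": 1}
healthy = {"CASSLAGGTDTQYF": 2}
weights, bias = [0.1] * 21, 0.0
print(score_repertoire(tumor, weights, bias))
print(max_pooled_log_loss([(tumor, True), (healthy, False)], weights, bias))
```

In practice the weights and bias would be fitted by minimizing such a loss over the training repertoires; the optimizer is omitted here.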
The trained model was able to correctly predict the tissue of origin (tumor or healthy) of more than 90% of external validation samples. Interestingly, the biophysicochemical description of the TCRβ CDR3 allowed the identification of tumor-associated TCRs whose sequences varied enough to prevent their identification through shared amino acid detection. Indeed, despite their high sequence variability, these motifs shared similar biophysicochemical properties at key positions that enabled their recognition through common Atchley patterns. Another advantage of this strategy lies in the capacity to define the biophysicochemical requirements for the TCR to bind the cancer-related peptide antigen, by analyzing the weights of the Atchley factors in the logistic regression. Of note, the TCR biophysicochemical requirements related to breast and colon cancer were found to be different, as could be expected from the fact that the corresponding TCRs should recognize different peptide epitopes.
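As a rough illustration of how the weights of such a model could be inspected, the Python sketch below ranks the weights of a 4-mer logistic regression by magnitude and reports the position and descriptor each one belongs to. The weight layout (position-major, five descriptors per position, one trailing abundance weight), the function name, and the example values are assumptions for illustration, not details taken from the study; descriptors are numbered 1 to 5 rather than named.

```python
def rank_weights(weights, k=4, n_factors=5):
    """Return (position, factor, weight) tuples, largest |weight| first.

    Assumes a position-major layout: the first n_factors weights belong to
    position 1 of the 4-mer, the next n_factors to position 2, and so on.
    Any trailing weight (for example, for relative abundance) is ignored.
    """
    rows = []
    for pos in range(k):
        for fac in range(n_factors):
            rows.append((pos + 1, fac + 1, weights[pos * n_factors + fac]))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


# Made-up weights: 20 descriptor weights plus one abundance weight.
example = [0.0] * 21
example[0] = 1.0    # position 1, factor 1
example[7] = -2.5   # position 2, factor 3
print(rank_weights(example)[0])  # (2, 3, -2.5)
```

A large positive or negative weight indicates that high or low values of that descriptor at that position push the 4-mer score up, which is the sense in which the weights describe binding requirements.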
Being essentially physics-based, the strategy of Ostmeyer and colleagues could be universally applicable to all antigen:(B/T)-cell receptors. Indeed, using CDR3 of B-cell receptors, the approach has already been successfully tested for the diagnosis of multiple sclerosis (7). This supports the idea that this class of methods could be widely applied to diseases where an adaptive immune response is driven by a limited set of antigens shared across patients.
In addition to the novel and noteworthy bioinformatics developments, this study raises many interesting biological questions. The developed approach identified tumor-associated motifs, allowing the investigators to suggest that if the corresponding TCRs bind the same antigen, the latter should be tumor-associated. In the absence of experimental validation, alternative interpretations can be offered. As stated by the authors, the T cells could be responding to tissue damage antigens in the tumor or could themselves be regulatory T cells contributing to immunosuppression. In addition, they could be effector memory T cells directed against recent acute or persistent chronic infections (8), recruited nonspecifically by tumor inflammatory chemokines, and consequently absent from normal tissues. In the latter case, although the recognized antigen would not be directly tumor-associated, the detected motifs would still constitute relevant biomarkers.
Surprisingly, the developed method generalizes across the studied patients, despite their likely different HLA backgrounds. Indeed, the actual TCR-binding motifs of peptides displayed on antigen-presenting cells differ depending on the HLA allele; that is, different MHC molecules will predominantly bind different peptides (9) and hence present different binding motifs to TCRs. In addition, different HLA alleles will lead to different MHC surfaces and MHC/TCR contacts. The explanation given by the authors for this apparent insensitivity of the model to HLA alleles is that MHC/TCR contacts are made primarily via CDR1 and CDR2, which are excluded from the TCR surface description. We also observe that, by design, the approach looks only for cancer-related TCR motifs common across patients. If we assume that common cancer-related TCR motifs bind to common cancer-related antigens, the methodology used here could implicitly restrict the search to TCRs able to bind peptides that can themselves be presented by more than one type of MHC, and that can therefore potentially bind the different MHCs present in the patients of this study. If so, the approach would conversely miss all TCR motifs that do recognize cancer-related antigens but are highly HLA allele specific. This would not be a problem for the diagnostic capabilities of the model, as long as at least one cancer-related peptide antigen can be presented by a large number of MHCs in the studied population. However, only a relatively small number of patients were studied here; consequently, it cannot be ruled out that most of them share at least one common, or closely related, HLA allele by chance.
Although this study does not quantify the fraction of TCRs that are tumor specific, the authors found that the TCRs bearing high-score 4-mers are not necessarily highly expanded T-cell clones, notably in the case of colorectal cancer. Interestingly, this is in line with the study of Schumacher and colleagues, who found that the tumor reactivity of TCRs is restricted to a limited number of intratumoral T cells (10). Furthermore, several previous studies that reported TCR sequences shared across patients found that these TCRs could be assigned to public clones. Ostmeyer and colleagues underscore that the TCRs with high-scoring 4-mers detected in their study have different amino acid sequences across patients and should therefore not correspond to public TCRs. This is an important point and could be checked in a follow-up study by determining whether these TCRs can be found in other patients not included in this study.
Finally, early detection of cancer can rely on blood assays, where material is plentiful for analysis, or on cytology samples, which are also routinely collected for surveillance. Whether tumor-specific TCR signatures can be identified in blood that would be suitable for generic blood testing or whether traditional cytology methods can yield sufficient material for TCR sequencing to capture tumor-specific alterations remains an open question and a matter for future investigation.
Disclosure of Potential Conflicts of Interest
G. Coukos reports receiving a commercial research grant from BMS, Celgene, Boehringer Ingelheim, Roche, Iovance, Kite, has received speakers bureau honoraria from Roche, has ownership interest (including stocks and patents) in the University of Pennsylvania, and is a consultant/advisory board member for Roche, Genentech, AstraZeneca, BMS, NextCure, Sanofi-Aventis, and Geneos Tx. No potential conflicts of interest were disclosed by the other author.
The authors are supported by the Ludwig Institute for Cancer Research (to G. Coukos), the University of Lausanne (to G. Coukos and V. Zoete), and the Swiss Institute of Bioinformatics (to V. Zoete).