T-cell receptor (TCR) repertoire profiling has emerged as a powerful tool for biological discovery and biomarker development in cancer immunology and immunotherapy. A key statistic derived from repertoire profiling data is diversity, which summarizes the frequency distribution of TCRs within a mixed population. Despite the growing use of TCR diversity metrics in clinical trial correlative studies in oncology, their accuracy has not been validated using published ground-truth datasets. Here, we reported the performance characteristics of methods for TCR repertoire profiling from RNA-sequencing data, showed undersampling as a prominent source of bias in diversity estimates, and derived a model via statistical learning that attenuates bias to produce corrected diversity estimates. This modeled diversity improved discrimination in The Cancer Genome Atlas data and associated with survival and treatment response in patients with melanoma treated with anti–PD-1 therapy, where the commonly used diversity normalizations did not. These findings have the potential to increase our understanding of the tumor immune microenvironment and improve the accuracy of predictions of patient responses to immunotherapy.

T-cell receptors (TCR) are heterodimers of alpha and beta (TCRαβ; ref. 1) or gamma and delta (TCRγδ; ref. 2) cell surface proteins that confer exquisite sensitivity and specificity of T cells for their cognate peptides presented by MHC molecules on antigen-presenting cells (Fig. 1; ref. 3). T-cell recognition of peptide/MHC complexes through binding interactions with TCR drives the T-cell activation, proliferation, differentiation, cytokine production, and cytolytic activity necessary for protection from pathogens (4). Between 2 × 109 and 6 × 109 unique TCRs may be present in the circulating T-cell population in healthy individuals (5). This vast number of unique clonotypes is produced stochastically by V(D)J recombination in thymic T-cell precursors, followed by positive and negative selection in the thymus to select T cells bearing TCRs that bind MHC but do not bind tightly to self-peptides (4). Thus, T-cell development produces a diverse repertoire of TCRs able to recognize pathogens while minimizing the risk of autoimmunity.

Figure 1.

Generation and inference of the TCR sequence. Schema illustrates the “individuals” in our system, the CDR3 regions (circled in red) of TCRs, whose diversity is generated through recombined DNA and subsequent deletions and insertions (A). This recombined DNA is then transcribed into mRNA (B) where it is then translated into a TCR (C). Here, the CDR3 region is largely responsible for the binding specificity of a receptor to an antigen (brown sphere). Messenger RNA may also be isolated and amplified in RNA-seq experiments (B). Following amplification, these sequences can be aligned and assembled into TCR clones and counted. The relative counts of different unique CDR3 allow diversity metrics to be made and further analyzed.

Figure 1.

Generation and inference of the TCR sequence. Schema illustrates the “individuals” in our system, the CDR3 regions (circled in red) of TCRs, whose diversity is generated through recombined DNA and subsequent deletions and insertions (A). This recombined DNA is then transcribed into mRNA (B) where it is then translated into a TCR (C). Here, the CDR3 region is largely responsible for the binding specificity of a receptor to an antigen (brown sphere). Messenger RNA may also be isolated and amplified in RNA-seq experiments (B). Following amplification, these sequences can be aligned and assembled into TCR clones and counted. The relative counts of different unique CDR3 allow diversity metrics to be made and further analyzed.

Close modal

Measuring TCR diversity is important in understanding the fundamental immunology of aging, infectious disease, autoimmune disease, hereditary immunodeficiencies, and cancer. Having originated in the ecology literature, metrics to quantify diversity are functions of species richness (the number of unique individuals in a population) and/or species dominance (the relative abundance of each species; ref. 6). In the context of quantifying TCR repertoires via sequencing methods, “abundance” refers to the total number of sequencing reads, whereas the “species” is a unique TCR, also referred to as a “clonotype.” Quantifying the total number of unique clonotypes, i.e., richness, of an organism would require an impossibly large number of reads (7). Even capturing all TCR in the tumor is a subsampling of the total number of clonotypes that exist in that organism. Diversity, however, informs the degree to which clonotypes are equally represented. Given a sufficient abundance of reads, one could calculate the true diversity of a sample and perhaps glean insight into the diversity of the organism as a whole. Diversity is an important summary statistic for the TCR repertoire because it reflects (i) the potential breadth of antigen specificities, (ii) the degree of antigen-driven clonal expansion present in the underlying T-cell population, and (iii) the degree of access/localization that the T-cell repertoire has to the tissue itself. This clonal expansion is a key consequence of T-cell activation, pathogenesis in autoimmune disease, and protective immunity after vaccination or pathogen exposure. Thus, TCR repertoire diversity may influence and mark specific states of health and disease.

In oncology, multiple groups have reported associations between TCR diversity and clinical outcomes in patients with cancer. High TCR repertoire diversity associates with improved survival in multiple tumor types, response to cetuximab in head and neck cancer, and response to CTLA-4 inhibition in melanoma, hepatocellular carcinoma, and with GVAX (an allogeneic whole cell tumor vaccine) in pancreatic cancer (8–15). In contrast, low TCR repertoire diversity (high clonality) associates with clinical response to PD-1 axis inhibition in melanoma and urothelial carcinoma (16–20). These results highlight the complexity of TCR repertoire biology, along with the promise and challenge of integrating repertoire diversity in biomarker development.

Quantifying TCR species is not as straightforward as counting clearly defined organisms. The sequence region of maximal diversity, complementary determining region 3 (CDR3), responsible for conferring most of the specificity for epitope recognition (21), may include junctional deletion of germline nucleotides and nontemplated nucleotide insertions (Fig. 1; ref. 4). In practice, DNA or RNA is extracted from a sample of interest followed by targeted capture and/or amplification of TCR sequences to prepare next-generation sequencing libraries. Assembling and aligning the sequences thus require dedicated computational tools such as TRUST (22) or MiXCR (23) that must be evaluated. Regardless of the method used, TCR repertoire profiling is heavily biased by sampling depth of the T-cell population. Despite the wide array of methods for generating TCR repertoire profiling data and applications of TCR repertoire diversity in biology and medicine, performance characteristics of algorithms to infer TCR repertoires and their resulting diversity estimates have not been reported on ground-truth datasets. Therefore, optimizing TCR repertoire diversity measurements requires answering three questions: (i) which tools accurately identify and quantify TCR sequences, (ii) which diversity metric is the least sensitive to subsampling, and finally (iii) if no diversity metric is unaffected by subsampling, can a model be developed that provides accurate diversity measurements even with extreme subsampling?

Here, we reported an evaluation of published methods for inferring TCR repertoires from RNA sequencing (RNA-seq) data, a description of bias characteristics in diversity estimation caused by undersampling, and an algorithm derived from statistical learning to provide accurate and precise estimates of TCR repertoire diversity. Applying our method to correct TCR repertoire diversity estimates in data from The Cancer Genome Atlas (TCGA), we found more pronounced correlations between diversity and expected immunogenomics features. Applied to RNA-seq data from patients with melanoma treated with immune checkpoint inhibitors, we found otherwise unobserved survival and response associations with TCR repertoire diversity. The methods and analysis code that generated them were freely available with this publication. We expect the method reported here will be valuable in refining TCR repertoire diversity estimates from tumor RNA-seq data, for use in repertoire profiling studies, and TCR repertoire–based biomarker development going forward.

TCR repertoire generation

One thousand TCRαβ repertoires were generated using Synthetic TCR Informatics Generator (STIG; ref. 24). STIG was used to generate 100 million individuals (i.e., virtual cells) from 5,000 clonotypes across a range of diversities. Reads from this population were generated to create paired-end, 48 bp reads to match the TCGA protocol. Base quality scores were provided to STIG to match the same quality scores and error rates as TCGA data. Median insert length and SD were also selected from a range of those commonly found in a subsampling of TCGA samples (153–209 bp and 43–77 bp, respectively). The distribution of TCR clonotypes was set using STIG's χ2 distribution with 2 degrees of freedom and a cutoff range from 6 to 8969. The range of modeled diversity falls within the range of 99% of TCGA samples. Matching the distribution of diversity and abundances was performed empirically and required several rounds of repertoire generation, alignment/assembly of repertoire reads, model generation, and diversity calculation. Thymic tissue (THYM) was excluded in the determination of tissue diversity as the diversity of THYM is much higher than others. Read abundances were also matched to the distribution of those found in TCGA data (see “TCGA analysis” methods section). Approximately 10% of STIG reads aligned to the CDR3 region with the remaining unaligned reads representing other regions of the TCR.

TCGA analysis

TCGA primary tumors were used. See figures for tissue types and counts. Samples were dropped according to the “DO_NOT_USE” column of GDC's TCGA “sample annotations” (see merged_sample_quality_annotations.tsv, https://gdc.cancer.gov/about-data/publications/panimmune).

Furthermore, upon recommendation of TCGA personnel: TCGA-06-0156-01A-02R-1849-01 was used for TCGA-06-0156-01, TCGA-06-0211-01B-01R-1849-01 was used for TCGA-06-0211-01, TCGA-21-1076-01A-01R-0692-07 was used for TCGA-21-1076-01, TCGA-23-1023-01A-02R-1564-13 was used for TCGA-23-1023. Finally, for the remaining duplicated samples, the last sample (ordered alphabetically) was used. The TCGA data types used were batch-corrected RNA-seq expression data (EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv, https://gdc.cancer.gov/about-data/publications/pancanatlas) and TCR diversity metrics (see “Diversity of subsampled populations” methods section) were calculated from MiXCR outputs (see “MiXCR and TRUST comparisons” methods section).

Diversity of subsampled populations

This initial subsampling was performed on the ground-truth counts generated by STIG prior to read generation. Chao1 (25) and Shannon entropy (26) were calculated using the R Package vegan. Evenness was calculated as Shannon entropy/Log (Richness; ref. 27). Evenness produces a comparable result to the “productive clonality” as reported by Adaptive Biotechnologies. Their “clonality” is 1 - evenness, although they use log base 2 in both their entropy and normalization (personal communication). d25, d50, and d75 (dXX) indices were calculated as described by VDJTools (28) GitHub repository: “The estimate equals to 1 - n/N, where n is the minimum number of clonotypes accounting for at least XX% of the total reads and code N is the total number of clonotypes in the sample.”

MiXCR and TRUST comparisons

MiXCR, TRUST, and TRUST 3.0 Dockerfiles and example commands can be found here: https://github.com/Benjamin-Vincent-Lab. As TRUST does not align sequences, we used Bowtie2 (29) to align reads prior to using TRUST for these analyses. Morisita–Horn was calculated using the vegdist function in the vegan R package and shown as a similarity index (1 - Morisita–Horn dissimilarity index). Mismatch Index was calculated as the median number of mappings to a mismatch repertoire over the number of mappings to the correct repertoire. With this index, a number greater than one indicates the tool matched a CDR3 in the wrong repertoire more commonly than the correct repertoire. Basic local alignment (BLAST) searches were performed using BLAST+ (30) v2.2.30 with the STIG ground-truth repertoire sequences being used as the database for each sample.

Model generation

Elastic net model generation (31) was performed using Monte Carlo cross-validation. One thousand simulated TCR repertoires were split into discovery (n = 667 repertoires) and validation (n = 333 repertoires) sets. The discovery set was repeatedly split into 3 folds with 2 folds for training and 1 fold for testing. Elastic net regression models were fit with repertoire features (abundance, Chao1, d25 index, evenness, inverse Simpson, richness, and Shannon entropy) used as predictor variables and ground-truth Shannon entropy as the response variable. Cross-validation was performed on the discovery set 100 times across 10 α values, with the selected model having the lowest root mean squared error. Samples were balanced by the number of events for both the discovery-validation and test-training set splits. For the calculation of modeled entropy, analyzed samples were required to have an abundance of 8 or higher for both α and β TCR chains.

Data availability

TCGA data (32) and Riaz and colleagues' SKCM immune checkpoint data (33) can be accessed through their original publications. The TCR datasets for RNA and amplicon sequencing can be generated using STIG: https://github.com/Benjamin-Vincent-Lab/stig, see “TCR repertoire generation” and “Code availability” method sections for details.

Code availability

Our tool for TCR repertoire simulation, STIG (24), was written in the python programming language, and our method for correcting diversity estimates was written in the R statistical programming language. Both are freely available, along with code necessary to reproduce our analysis, at https://github.com/Benjamin-Vincent-Lab. Code used to calculate modeled entropy may be obtained using the following R package: devtools::install_github(“Benjamin-Vincent-Lab/binfotron”, ref = “0.2–14”). Modeled entropy may be calculated using the functions: predict_true_entropy_from_counts or predict_true_entropy_from_diversity. Help documentation is provided in the package. Ground-truth TCR repertoires may be generated using the STIG software provided here: https://github.com/Benjamin-Vincent-Lab/stig.

Statistical analysis

Analyses were performed and visualized using R (34) v3.5.2 through RStudio (35) with the R packages ggplot2 v3.1.1 (36), ggsurvplot v0.4.3, ggrepel v0.8.2, ggseqlogo v0.1 (37), glmnet v2.0-18 (38), parallel, stats v3.5.2, survival v2.43-3 (39), survminer v0.4.3, tidyr v0.8.3, vegan v2.5-3, and viridis v0.5.1. FDR corrections were performed via the Benjamini–Hochberg method (40) using the R-package stats::p.adjust. P value thresholds were 0.05, 0.01, and 0.001 (*, **, and ***, respectively). “Significance” was calculated by -log10 (P value).

Fidelity of diversity metrics

TCR receptor repertoire diversity is typically calculated by assembling and counting the sequences that encode their CDR3 receptors (Fig. 1). It is becoming more and more common to use bulk RNA-seq data to assess this diversity (Fig. 1B), as RNA-seq data are more available than amplicon-based data for many applications. However, RNA-seq datasets often have far lower CDR3 counts, and thus more subsampling, than TCR amplicon datasets. We first sought to understand the extent to which this subsampling affected mathematical indices of diversity. Starting with a hypothetical population of 50 individuals, we observed the variance in diversity estimation using the metrics of Shannon entropy and evenness (1 – “productive clonality”), a commonly used metric for normalizing Shannon entropy. The variance increased as the subsampling became more severe for both methods (Fig. 2A). To evaluate if subsampling only affected diversity metrics at lower population sizes, we simulated 1,000 repertoires at varying degrees of subsampling from 2 to 100 million individuals and assessed the variation in diversity estimation using common diversity metrics. The ratios of ground-truth diversity index values (i.e., the diversity measurement taken at 100 million individuals) to subsampled values for each index are plotted as a function of abundance (Fig. 2B). The error in diversity was not limited to Shannon entropy and evenness, nor was it confined to those samples that were severely undersampled. As abundance increased, calculated diversity indices did eventually converge to accurate estimates of true diversity. However, this happened at abundances well above those of the relatively low coverage of TCR loci in RNA-seq experiments, including those found in TCGA data (32). Even evenness, a metric commonly used to correct for subsampling, failed to provide a consistent metric for diversity. Conversely, for counts over about 27, evenness was further from the ground-truth value than using an unnormalized metric like Shannon entropy. Although it has been reported that evenness tends to overestimate diversity at lower abundances (41), we were surprised to see that this overestimation was much more pronounced in samples with lower initial diversity (Fig. 2C). Not only would this systematic overestimation of evenness obfuscate correlations with diversity, but in datasets where samples with higher true diversity also have higher abundance, this normalization could flip the direction of a correlation.

Figure 2.

Effect of subsampling on TCR diversity metrics. A, Each dot represents a subsampled individual. The color represents the individual's species or clonal identity. Each subsampling of individuals was performed on the same starting population and would ideally have the same reported diversity. B, One thousand simulated populations were subsampled and normalized to the diversity of the full population (100 million individuals). Shaded regions around the lines indicate confidence intervals (top plot). Original counts came from STIG RNA-seq ground-truth samples (see “TCR repertoire generation” in the Materials and Methods section). The bottom plot shows the density of abundances for TCGA TCR data. C, Subsampled evenness over true evenness on subsampled counts from simulated samples whose Shannon entropy approximates percentiles of TCGA TRB samples.

Figure 2.

Effect of subsampling on TCR diversity metrics. A, Each dot represents a subsampled individual. The color represents the individual's species or clonal identity. Each subsampling of individuals was performed on the same starting population and would ideally have the same reported diversity. B, One thousand simulated populations were subsampled and normalized to the diversity of the full population (100 million individuals). Shaded regions around the lines indicate confidence intervals (top plot). Original counts came from STIG RNA-seq ground-truth samples (see “TCR repertoire generation” in the Materials and Methods section). The bottom plot shows the density of abundances for TCGA TCR data. C, Subsampled evenness over true evenness on subsampled counts from simulated samples whose Shannon entropy approximates percentiles of TCGA TRB samples.

Close modal

We next investigated the rigor of commonly used diversity metrics. Prior to comparing these measurements, we first needed to assess TCR repertoire inference methods for correct inference of individual TCR sequences. MiXCR (23) and TRUST (22) are the two primary tools used for assembling TCR clonotype sequences from paired-end reads. Because the only published comparison of these tools was done by the MiXCR group themselves (42), an independent confirmation was desired. TCR repertoires and reads were generated using Synthetic TCR Informatics Generator (STIG; ref. 24), which allowed us to know the true sequences and distributions of the TCR repertoires for each sample. Our findings indicated MiXCR outperformed TRUST methods with respect to sensitivity, specificity, and precision for both chain identity and CDR3 sequence (Supplementary Fig. S1). These findings were consistent regardless if either exact match or 95% sequence matching by BLAST was used (Q95; Supplementary Fig. S2). In addition, we assessed the performance of MiXCR amplicon and RNA-seq methods on STIG-simulated repertoires of amplicon reads (Supplementary Fig. S3). Amplicon-based sequencing produces higher quality and abundance reads by targeting the area of sequencing to one specific region of the genome, such as the V(D)J segments of TCR (43). TRUST methods could not be tested on these amplicon repertoires as the tool either failed or took over a week to run. Both MiXCR methods produced nearly identical results in all respects. As expected, due the higher quality input data, the performance of MiXCR on amplicon-simulated reads in determining V/J region and CDR3 sequence far exceeded that of the RNA-seq simulated dataset. We next determined the capacity of these TCR methods to support accurate estimation of true diversity (i.e., the known diversity of the STIG samples at the full 100 million individuals). Regardless of the TCR chain or diversity index used, no measurements supported accurate estimation of true index values from RNA-seq–simulated reads (Supplementary Fig. S4A–S4D). An ideal diversity index should be stable across a range of abundance and entropy values. Surprisingly, even at the higher abundances of amplicon-simulated reads, evenness still failed to track well with either the ground-truth evenness or Shannon entropy (Supplementary Fig. S4E and S4F). Only Shannon entropy was consistent across a wide range of entropy values and abundances for these amplicon-simulated datasets (Supplementary Fig. S4G and S4H, respectively).

Modeling entropy for low-abundance samples

Given the stability of Shannon entropy as a diversity index across a wide range of parameters, we sought to derive a model to predict the true TCR repertoire entropy when abundances are low (i.e., in conditions of undersampling such as tumor RNA-seq experiments). To accomplish this, we utilized elastic net regression (31) to model the ground-truth Shannon entropy. This statistical learning was chosen as it (i) includes parameters for penalizing and removing features that would needlessly add model complexity and variance, and (ii) can perform well in cases where predictor variables are co-correlated. Although direct calculation of Shannon entropy showed modest correlation with ground-truth values, albeit with high variance (Fig. 3A), elastic net models showed tighter correlations with less variance, whether models were trained and tested on the same chain (Fig. 3B), trained on one type of chain and tested on the other (Fig. 3C), or trained and tested in a chain agnostic fashion (Fig. 3D). All of these models favored evenness and the d25 index in their feature selection and beta coefficient weight (Supplementary Fig. S5). When abundance was sufficiently high, entropy was accurately estimated by direct calculation, and model-based correction was not needed (Fig. 2B; Supplementary Fig. S4E). To determine the exact conditions under which direct calculation is insufficient and model-based correction improves entropy estimation, we analyzed subsampled repertoires for accuracy of direct calculation of entropy based on the measured value of that entropy calculation (Fig. 3E). Here, we determined the abundance needed at a measured Shannon entropy to be within 5% of the ground-truth Shannon entropy. If a sample's measured Shannon entropy and abundance put the sample over this threshold (blue line, Fig. 3E), the measured Shannon entropy was taken prima facie, and no correction was made for our model entropy calculation (Supplementary Fig. S6).

Figure 3.

Numerous diversity metrics were applied to the MiXCR output of 1,000 STIG-generated RNA-seq samples and split into discovery and validation test sets. An elastic net model was applied to the discovery set to generate a model to predict true Shannon entropy (i.e., calculated on the sample population of 100 million CDR3 from which the reads were generated) from the discovery data and then applied to the validation data shown here. A–D, Dotted gray line is unity line. Blue line indicates a smoothed generalized linear model fit of the data with confidence intervals. A, The measured uncorrected Shannon entropy values against the true Shannon entropy for each chain. B, Elastic net models were generated on each chain's discovery set and applied to validation set of the same chains to model Shannon entropy. C, Models generated in B were cross applied to the other chain. D, A new model was generated on subsampled counts of the ground-truth population data to generate a single chain–agnostic model that was then applied to model entropy measurements of both chains. See Supplementary Fig. S6 for more details. E, To determine the conditions under which no correction was needed, Shannon entropy was calculated on subsampled repertoire counts of the ground-truth data. Colors indicate the average accuracy of the Shannon entropy value at a given abundance [abs(measured-actual)/actual]. The gray border indicates lci:0.975, uci:1.025; black lci:0.9975, uci:1.00025. Blue indicates the line over which the measured Shannon entropy can be taken prima facie and does not need correction.

Figure 3.

Numerous diversity metrics were applied to the MiXCR output of 1,000 STIG-generated RNA-seq samples and split into discovery and validation test sets. An elastic net model was applied to the discovery set to generate a model to predict true Shannon entropy (i.e., calculated on the sample population of 100 million CDR3 from which the reads were generated) from the discovery data and then applied to the validation data shown here. A–D, Dotted gray line is unity line. Blue line indicates a smoothed generalized linear model fit of the data with confidence intervals. A, The measured uncorrected Shannon entropy values against the true Shannon entropy for each chain. B, Elastic net models were generated on each chain's discovery set and applied to validation set of the same chains to model Shannon entropy. C, Models generated in B were cross applied to the other chain. D, A new model was generated on subsampled counts of the ground-truth population data to generate a single chain–agnostic model that was then applied to model entropy measurements of both chains. See Supplementary Fig. S6 for more details. E, To determine the conditions under which no correction was needed, Shannon entropy was calculated on subsampled repertoire counts of the ground-truth data. Colors indicate the average accuracy of the Shannon entropy value at a given abundance [abs(measured-actual)/actual]. The gray border indicates lci:0.975, uci:1.025; black lci:0.9975, uci:1.00025. Blue indicates the line over which the measured Shannon entropy can be taken prima facie and does not need correction.

Close modal

Consistency of diversity metrics

Validating the model-based corrected diversity measurements on biological samples is a difficult task given that there is no ground-truth reference in this context. There are, however, multiple features believed to correlate with diversity that can be tested for association with diversity estimates. TCR diversity is generally accepted to inversely correlate with age, as the pool of TCRs available for selection is gradually lost over time (44). Similarly, diversity may inversely correlate with tumor mutations, as the number of mutations is associated with the number of neoantigens that can be targeted by T cells, driving successfully binding T cells to clonally expand and attack their targets (4). Therefore, we hypothesized that a more accurate diversity metric would have a stronger inverse correlation with age, SNV neoantigens, and InDel neoantigens. Analyzing data from TCGA project (primary tumors with TCR abundance over 8 reads and neoantigen counts available), the corrected diversity for TCRα and TCRβ chains showed coefficients closer to -1 with age and neoantigens than other diversity metrics with low variance and higher significance than any other metric (Fig. 4A; Supplementary Data S1). Conversely, we would expect diversity to be higher in the thymus than any other tissue type because thymus tumor purity is less than 100% and the thymus is the source of T-cell generation (4). Again, the corrected diversity metric outperformed the other diversity metrics in this regard with low variance, a coefficient close to 1 and higher significance than any other metric (Fig. 4A; Supplementary Data S1).

Figure 4.

Multiple diversity metrics were calculated on non–ICP-treated TCGA tumor samples for each TCR chain to compare with the modeled entropy metric. A, Metrics were used to fit a generalized linear model (stats::glm) with age, SNV, and InDel neoantigens. The gray vertical line at -1 indicates the expected coefficient for these models, as diversity is widely held to be inversely correlated with these features. Conversely, for the prediction of which samples were taken from the thymus, a gray line is drawn at 1, as these should be directly correlated with T-cell diversity. Dotted pink line indicates P value of 0.05. FDR corrections done by panel. Point spread indicates 95th confidence interval of coefficients. B and C, CoxPH regressions were performed on TCGA data. Each TCGA tissue was divided into three quantiles, and the width of the HRs Confidence Intervals [log10(Upper C.I.) – log10(Lower C.I.)] was measured. A fourth group was also formed by sampling a third of the samples from each of the quantiles. Each bar shows the distribution of the confidence interval widths for all TCGA tissues. Boxplots indicate the upper and lower quartiles with median centerline. Whiskers show 1.5× interquartile range. Points show outliers. Confidence interval widths were not consistent across abundance ranges for Shannon entropy (Kruskal–Wallis P = 0.0049, stats::kruskal.test), but did not significantly differ for evenness and modeled entropy (P = 0.5018 and 0.7669, respectively). The axis for C is 1/20th that of B to show the distribution of measurements across groups. See Supplementary Fig. S7A for same-axis comparison.

Figure 4.

Multiple diversity metrics were calculated on non–ICP-treated TCGA tumor samples for each TCR chain to compare with the modeled entropy metric. A, Metrics were used to fit a generalized linear model (stats::glm) with age, SNV, and InDel neoantigens. The gray vertical line at -1 indicates the expected coefficient for these models, as diversity is widely held to be inversely correlated with these features. Conversely, for the prediction of which samples were taken from the thymus, a gray line is drawn at 1, as these should be directly correlated with T-cell diversity. Dotted pink line indicates P value of 0.05. FDR corrections done by panel. Point spread indicates 95th confidence interval of coefficients. B and C, CoxPH regressions were performed on TCGA data. Each TCGA tissue was divided into three quantiles, and the width of the HRs Confidence Intervals [log10(Upper C.I.) – log10(Lower C.I.)] was measured. A fourth group was also formed by sampling a third of the samples from each of the quantiles. Each bar shows the distribution of the confidence interval widths for all TCGA tissues. Boxplots indicate the upper and lower quartiles with median centerline. Whiskers show 1.5× interquartile range. Points show outliers. Confidence interval widths were not consistent across abundance ranges for Shannon entropy (Kruskal–Wallis P = 0.0049, stats::kruskal.test), but did not significantly differ for evenness and modeled entropy (P = 0.5018 and 0.7669, respectively). The axis for C is 1/20th that of B to show the distribution of measurements across groups. See Supplementary Fig. S7A for same-axis comparison.

Close modal

A robust estimate for diversity should also be more internally consistent, providing similar predictions across a range of TCR abundances. To test the internal consistency of diversity metrics, we randomly divided all TCGA tissues into three quantiles based on sample abundance and tested for association between overall survival and TCRβ diversity metrics using Cox proportional hazards (CoxPH) regression (45). A fourth group was made by sampling equally across the three abundance ranges. Precision of the predictions was estimated as the difference between the log10 of the upper and log10 of the lower confidence intervals for the HRs of each tissue (Fig. 4B and C). An internally consistent metric should provide a narrow width that is consistent across abundance ranges. Although confidence interval widths were not significantly different across abundance ranges for evenness and modeled entropy, Shannon entropy predictions were significantly different between abundance groups (Kruskal–Wallis, P = 0.0049). Of note, the variance of the survival predictions based on evenness was so large that the confidence interval widths needed to be placed on an axis 20x larger than that of Shannon entropy and modeled entropy (Fig. 4B; see Supplementary Fig. S7A for same axis comparison). Similarly, the variability in evenness resulted in large coefficients but low significance in correlating with other biological correlates of diversity (Fig. 4A). Together, these results demonstrated that the model-based diversity estimates showed improved correlations with true biological diversity and better internal consistency.

Predicting patient outcomes using TCR-modeled entropy

With model-based corrected diversity providing better correlations with biological diversity than other metrics, we then asked: Does this improved metric of diversity improve prediction of survival of patients with cancer? We note that even a perfect measurement of diversity may not predict survival in cancer. That said, we hypothesized that a decrease in diversity should prognosticate better survival as it would indicate the immune system is mounting an effective response against the tumor (16–18). To our surprise, in many TCGA tissue types and in all TCGA samples as a whole, higher corrected diversity associated with increased survival. In addition, we tested the association of pretreatment tumor-associated TCR repertoire diversity with survival in patients with melanoma treated with nivolumab, a monoclonal antibody that inhibits programmed cell death receptor 1 (PD-1). We restricted our analysis to the Riaz and colleagues' dataset (33) as other publicly available RNA-seq datasets associated with immune checkpoint inhibition trials were generated using formalin-fixed paraffin-embedded (FFPE)–derived RNA. Repertoire inference from RNA-seq data is untrustworthy when degraded RNA derived from FFPE tissue is used as template for sequencing library preparation (46). Increased TCR repertoire diversity was associated with improved survival in patients with melanoma treated with nivolumab (Fig. 5A; Supplementary Data S2). The magnitude of the effect was at least that of survival association with diversity in TCGA melanoma samples. Only the significance of TCRα for all TCGA samples was revealed if evenness was used as the diversity normalization (Supplementary Fig. S7B). A similar result was seen looking at the Kaplan–Meier plots for TCGA-treated samples (Fig. 5B; Supplementary Data S2) and the Riaz and colleagues' immune checkpoint–treated samples (Fig. 5C). This effect was largest when both alpha and beta chains had high modeled entropy. When comparing TCR diversity metrics according to patient response, responders (mResist score of partial or complete response) showed significantly higher diversity when looking at modeled entropy (Fig. 5D; TCRα P = 0.0083, TCRβ P = 0.0168, two-sided Mann–Whitney test). Evenness showed no significance (TCRα P = 0.3702, TCRβ P = 0.6273). Nonprogressors (mResist score of partial response, complete response, or stable disease) showed a similar although less significant difference as responders, with nonprogressors showing higher TCRα-modeled entropy than progressors (Fig. 5E; P = 0.0176). The difference when comparing evenness was not significant (P = 0.0985). As mentioned previously, there was a statistically significant association of TCRα-modeled entropy with overall survival in the Riaz and colleagues' immune checkpoint–treated samples. This difference was not observed in TCGA SKCM dataset, in spite of it having a larger sample size (283 TCGA SKCM samples vs. 33 for Riaz and colleagues; Fig. 5A). To explore this further, we bootstrapped TCGA and Riaz SKCM samples and quantified the percentage of samplings that indicated high model predictions of TCRα entropy predict survival (Fig. 5F). In addition, we calculated the median HRs for these samplings (Fig. 5G). Both of these results indicated that fewer immune checkpoint inhibitor–treated samples were needed to obtain the same predictive power as untreated samples.

Figure 5.

A, CoxPH (survival::coxph) was performed on overall survival with modeled Shannon entropy as the predictor variable. HRs with 95th confidence intervals are shown for all TCGA tissue subtypes where treatment is unknown. Immune checkpoint (ICP) therapy–treated results of SKCM patients from TCGA and Riaz and colleagues are plotted in the bottom plots. Counts next to tissue types indicate counts of events to nonevents. Pink indicates FDR significance for panel (high 18.1, P = 7.6 × 1019; low 1.9, P = 0.0141; see Supplementary Data S2 for all values). B and C, Kaplan–Meier plots are shown (survival::survfit and survminer::ggsurvplot) with regular log-rank test P values reported splitting non–ICP-treated TCGA samples (B) and ICP-treated Riaz and colleagues' samples (C) by high and low modeled entropy around the median value for TRA and/or TRB chains. Dashes indicate censorings. D and E, Boxplots show modeled entropy and evenness diversity metrics according to response classifications. Box indicates the top and bottom quartiles with median centerline. Whiskers show 1.5× interquartile range. Points show outliers, with violins indicating sample distributions. P values calculated by Mann–Whitney test (stats::wilcox.test): modeled entropy responder TRA P = 0.0083, TRB P = 0.0168 (D), modeled entropy progressor TRA P = 0.0175 (E). ns, not significant. F, Power estimated by bootstrapping samples and recording the percentage of subsamplings that indicate TRA-modeled entropy predicts survival with P ≤ 0.05. ICP therapy SKCM samples shown alongside non–ICP-treated TGCA and TCGA SKCM data. G, The median HRs for the bootstrapped samplings are shown with the inner 95th percentile shaded.

Figure 5.

A, CoxPH (survival::coxph) was performed on overall survival with modeled Shannon entropy as the predictor variable. HRs with 95th confidence intervals are shown for all TCGA tissue subtypes where treatment is unknown. Immune checkpoint (ICP) therapy–treated results of SKCM patients from TCGA and Riaz and colleagues are plotted in the bottom plots. Counts next to tissue types indicate counts of events to nonevents. Pink indicates FDR significance for panel (high 18.1, P = 7.6 × 1019; low 1.9, P = 0.0141; see Supplementary Data S2 for all values). B and C, Kaplan–Meier plots are shown (survival::survfit and survminer::ggsurvplot) with regular log-rank test P values reported splitting non–ICP-treated TCGA samples (B) and ICP-treated Riaz and colleagues' samples (C) by high and low modeled entropy around the median value for TRA and/or TRB chains. Dashes indicate censorings. D and E, Boxplots show modeled entropy and evenness diversity metrics according to response classifications. Box indicates the top and bottom quartiles with median centerline. Whiskers show 1.5× interquartile range. Points show outliers, with violins indicating sample distributions. P values calculated by Mann–Whitney test (stats::wilcox.test): modeled entropy responder TRA P = 0.0083, TRB P = 0.0168 (D), modeled entropy progressor TRA P = 0.0175 (E). ns, not significant. F, Power estimated by bootstrapping samples and recording the percentage of subsamplings that indicate TRA-modeled entropy predicts survival with P ≤ 0.05. ICP therapy SKCM samples shown alongside non–ICP-treated TGCA and TCGA SKCM data. G, The median HRs for the bootstrapped samplings are shown with the inner 95th percentile shaded.

Close modal

Taken together, these results indicate our modeled diversity metric improved discrimination of responses in TCGA dataset and in patients with melanoma treated with PD-1 inhibition.

TCR repertoire diversity reflects the degree of clonal expansion in a T-cell population and thus is a valuable biological readout as well as a potential biomarker of health or disease. Here, we showed that accurate measurement of TCR repertoire diversity required correction for undersampling that may occur as a result of biological limitations, such as testing only a small portion of a blood or tissue-resident T-cell population of interest, or technical limitations such as low sequencing read coverage for a given sample. Optimal estimation of true population diversity also required use of the most accurate algorithm for TCR repertoire inference. Based on our results, we recommend that MiXCR be used due to improved sensitivity, precision, and sequence length independence regardless of sequence identity threshold used to call individual TCR clonotypes.

Despite the good performance of MiXCR in reconstructing TCR repertoires from RNA-seq reads, existing methods of calculating diversity from these repertoire counts were inaccurate under typical RNA-seq conditions. We developed a method for correcting diversity estimates that can be applied to any dataset using software supplied with this article. Of note, models built from TCRα data performed well when applied to TCRβ datasets and vice versa. Even “chain agnostic” models trained on subsampled population counts performed well on TCR data, suggesting that our method may be generally extensible to entropy estimation in the context of undersampling beyond TCR repertoire data as long as the expected diversity of the sampled population is within the range provided to the model. In datasets where sample entropy estimates fall outside the range of entropies upon which the model was trained (0.98 to 8.11), we recommend training a new entropy correction model on the desired true Shannon entropy range.

Although our method of modeling the true Shannon entropy from RNA-seq data should effectively correct entropy measurements with different abundances, it should only be used to correct entropy within a batch of samples. Sequencing methods that are better or worse at detecting low abundance sequences in low-quality reads could produce very different values. For example, amplicon data could be very sensitive to detecting low abundance sequences and would therefore produce much higher diversity measurements. Given the differences in sensitivity across sample preparations and sequencing methods, it would be very difficult to make a generalizable tool that corrects across sequencing methods. For these high-quality high-abundance sequencing methods, we recommend using Shannon entropy as this metric approaches the ground-truth Shannon entropy when the number of reads is in the tens of thousands range or higher. The use of evenness and “productive clonality” (1-evenness) is strongly discouraged.

We applied our method of improving the accuracy of entropy estimation to TCR repertoires derived from RNA-seq datasets in TCGA and a study of pretreatment samples from patients with melanoma undergoing treatment with the PD-1 inhibitor nivolumab. In the Riaz and colleagues' melanoma study, we found an association between diversity and survival where the magnitude of effect size was at least as great as the survival associations in TCGA (i.e., independent of immunotherapy treatment as well as in the small number of patients with melanoma in TCGA that were treated with immunotherapy after sample collection). Given our analysis of power by bootstrapping, we believe the overlapping effect sizes were a function of far greater numbers in the nontreated TCGA dataset; however, we cannot rule out that TCR repertoire diversity was prognostic rather than predictive in the case of nivolumab treatment. Future work on additional datasets will be required to answer this question.

Our use of modeled entropy indicated higher TCR diversity was associated with better survival. This observation was in contrast to our expectation. We previously hypothesized that better outcomes would correlate with reduced diversity, indicating a clonally restricted T-cell population primed to respond to tumor antigens. There is evidence that lower pretreatment diversity is associated with response to PD-1 axis inhibition (17–20); however, other studies have found associations between higher TCR diversity and more favorable clinical outcomes (8–15). It is possible that the association of low diversity and better outcomes is unique to mechanisms of action that reverse T-cell exhaustion, related to necessity of a clonally expanded T-cell population, reactive against tumor antigens but exhausted, to be present in the tumor prior to treatment. In other therapeutic contexts (i.e., those not driven by the mechanism of reversing T-cell exhaustion), the necessity for a diverse TCR repertoire to recognize tumor antigens may be more important. In addition, there is emerging evidence that immune checkpoint inhibition efficacy is driven by clonal expansion that happens after initiation of therapy (15, 47, 48). Thus, we expect that more accurate measurement of TCR repertoire diversity, as well as measurements of repertoire diversity while on therapy, will provide better biomarkers of response. We also expect that the magnitude and direction of associations between TCR repertoire diversity and clinical outcomes will vary by therapeutic mechanism of action.

If, as our results suggest, clonal expansion occurs after administering immune checkpoint blockade, then monitoring the changes in diversity upon receiving ICP therapy could be an early indicator of how well a patient is responding to immunotherapy. Our method of accurately estimating diversity even with low counts may make such diagnostics readily affordable and aid in numerous fields that must rely upon assessing diversity from subsampled populations.

D.S. Bortone reports personal fees from GeneCentric Therapeutics Inc. (for assay creation and biomarker development) outside the submitted work. B.G. Vincent reports other from GeneCentric Therapeutics (equity) outside the submitted work. No disclosures were reported by the other authors.

D.S. Bortone: Conceptualization, data curation, software, formal analysis, validation, investigation, visualization, methodology, writing–review and editing. M.G. Woodcock: Software, writing–review and editing. J.S. Parker: Conceptualization, supervision, writing–review and editing. B.G. Vincent: Conceptualization, resources, supervision, funding acquisition, methodology, writing–original draft, project administration, writing–review and editing.

The authors thank members of the Vincent Lab in the UNC Lineberger Comprehensive Cancer Center for helpful discussion of the article.

Funding was provided by the UNC University Cancer Research Fund (to B.G. Vincent and J.S. Parker), the UNC-Duke Immunotherapy Training Program (T32-CA211056 to M.G. Woodcock), the Susan G. Komen Foundation (Career Catalyst Research Award to B.G. Vincent), and the V Foundation for Cancer Research (to B.G. Vincent).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1.
Kappler
J
,
Kubo
R
,
Haskins
K
,
Hannum
C
,
Marrack
P
,
Pigeon
M
, et al
The major histocompatibility complex-restricted antigen receptor on T cells in mouse and man: identification of constant and variable peptides
.
Cell
1983
;
35
:
295
302
.
2.
Raulet
DH
. 
The structure, function, and molecular genetics of the gamma/delta T cell receptor
.
Annu Rev Immunol
1989
;
7
:
175
207
.
3.
Monks
CR
,
Freiberg
BA
,
Kupfer
H
,
Sciaky
N
,
Kupfer
A
. 
Three-dimensional segregation of supramolecular activation clusters in T cells
.
Nature
1998
;
395
:
82
6
.
4.
Janeway
C
,
Travers
P
,
Walport
M
,
Capra
J
.
Immunobiology: the immune system in health and disease
.
New York
:
Elsevier Science Ltd/Garland Publishing
; 
1999
.
5.
Lythe
G
,
Callard
RE
,
Hoare
RL
,
Molina-Paris
C
. 
How many TCR clonotypes does a body maintain?
J Theor Biol
2016
;
389
:
214
24
.
6.
Magurran
AE
.
Measuring biological diversity
.
Malden (MA)
:
Blackwell Publishing
; 
2004
.
7.
Laydon
DJ
,
Bangham
CR
,
Asquith
B
. 
Estimating T-cell repertoire diversity: limitations of classical estimators and a new approach
.
Philos Trans R Soc Lond B Biol Sci
2015
;
370
:
20140291
.
8.
Thorsson
V
,
Gibbs
DL
,
Brown
SD
,
Wolf
D
,
Bortone
DS
,
Ou Yang
TH
, et al
The immune landscape of cancer
.
Immunity
2019
;
51
:
411
2
.
9.
Manuel
M
,
Tredan
O
,
Bachelot
T
,
Clapisson
G
,
Courtier
A
,
Parmentier
G
, et al
Lymphopenia combined with low TCR diversity (divpenia) predicts poor overall survival in metastatic breast cancer patients
.
Oncoimmunology
2012
;
1
:
432
40
.
10.
Jia
Q
,
Zhou
J
,
Chen
G
,
Shi
Y
,
Yu
H
,
Guan
P
, et al
Diversity index of mucosal resident T lymphocyte repertoire predicts clinical prognosis in gastric cancer
.
Oncoimmunology
2015
;
4
:
e1001230
.
11.
Cui
JH
,
Lin
KR
,
Yuan
SH
,
Jin
YB
,
Chen
XP
,
Su
XK
, et al
TCR repertoire as a novel indicator for immune monitoring and prognosis assessment of patients with cervical cancer
.
Front Immunol
2018
;
9
:
2729
.
12.
Kansy
BA
,
Shayan
G
,
Jie
HB
,
Gibson
SP
,
Lei
YL
,
Brandau
S
, et al
T cell receptor richness in peripheral blood increases after cetuximab therapy and correlates with therapeutic response
.
Oncoimmunology
2018
;
7
:
e1494112
.
13.
Postow
MA
,
Manuel
M
,
Wong
P
,
Yuan
J
,
Dong
Z
,
Liu
C
, et al
Peripheral T cell receptor diversity is associated with clinical outcomes following ipilimumab treatment in metastatic melanoma
.
J Immunother Cancer
2015
;
3
:
23
.
14.
Agdashian
D
,
ElGindi
M
,
Xie
C
,
Sandhu
M
,
Pratt
D
,
Kleiner
DE
, et al
The effect of anti-CTLA4 treatment on peripheral and intra-tumoral T cells in patients with hepatocellular carcinoma
.
Cancer Immunol Immunother
2019
;
68
:
599
608
.
15.
Hopkins
AC
,
Yarchoan
M
,
Durham
JN
,
Yusko
EC
,
Rytlewski
JA
,
Robins
HS
, et al
T cell receptor repertoire features associated with survival in immunotherapy-treated pancreatic ductal adenocarcinoma
.
JCI Insight
2018
;
3
:
e122092
.
16.
Keenan
TE
,
Burke
KP
,
Van Allen
EM
. 
Genomic correlates of response to immune checkpoint blockade
.
Nat Med
2019
;
25
:
389
402
.
17.
Roh
W
,
Chen
PL
,
Reuben
A
,
Spencer
CN
,
Prieto
PA
,
Miller
JP
, et al
Integrated molecular analysis of tumor biopsies on sequential CTLA-4 and PD-1 blockade reveals markers of response and resistance
.
Sci Transl Med
2017
;
9
:
eaah3560
.
18.
Tumeh
PC
,
Harview
CL
,
Yearley
JH
,
Shintaku
IP
,
Taylor
EJ
,
Robert
L
, et al
PD-1 blockade induces responses by inhibiting adaptive immune resistance
.
Nature
2014
;
515
:
568
71
.
19.
Hogan
SA
,
Courtier
A
,
Cheng
PF
,
Jaberg-Bentele
NF
,
Goldinger
SM
,
Manuel
M
, et al
Peripheral blood TCR repertoire profiling may facilitate patient stratification for immunotherapy against melanoma
.
Cancer Immunol Res
2019
;
7
:
77
85
.
20.
Snyder
A
,
Nathanson
T
,
Funt
SA
,
Ahuja
A
,
Buros Novik
J
,
Hellmann
MD
, et al
Contribution of systemic and somatic factors to clinical response and resistance to PD-L1 blockade in urothelial cancer: an exploratory multi-omic analysis
.
PLoS Med
2017
;
14
:
e1002309
.
21.
Xu
JL
,
Davis
MM
. 
Diversity in the CDR3 region of V(H) is sufficient for most antibody specificities
.
Immunity
2000
;
13
:
37
45
.
22.
Li
B
,
Li
T
,
Wang
B
,
Dou
R
,
Zhang
J
,
Liu
JS
, et al
Ultrasensitive detection of TCR hypervariable-region sequences in solid-tissue RNA-seq data
.
Nat Genet
2017
;
49
:
482
3
.
23.
Bolotin
DA
,
Poslavsky
S
,
Mitrophanov
I
,
Shugay
M
,
Mamedov
IZ
,
Putintseva
EV
, et al
MiXCR: software for comprehensive adaptive immunity profiling
.
Nat Methods
2015
;
12
:
380
1
.
24.
Woodcock
MG
,
Bortone
DS
,
Vincent
BG
. 
STIG: generation and simulated sequencing of synthetic T cell receptor repertoires
.
BioRxiv
2020.02.28.969469 [Preprint]. 2020. Available from
: .
25.
Chiu
CH
,
Wang
YT
,
Walther
BA
,
Chao
A
. 
An improved nonparametric lower bound of species richness via a modified good-turing frequency formula
.
Biometrics
2014
;
70
:
671
82
.
26.
Shannon
C
. 
A mathematical theory of communication
.
Bell Syst Tech J
1948
;
27
:
379
423
.
27.
Pielou
EC
. 
Species-diversity and pattern-diversity in the study of ecological succession
.
J Theor Biol
1966
;
10
:
370
83
.
28.
Shugay
M
,
Bagaev
DV
,
Turchaninova
MA
,
Bolotin
DA
,
Britanova
OV
,
Putintseva
EV
, et al
VDJtools: unifying post-analysis of T cell receptor repertoires
.
PLoS Comput Biol
2015
;
11
:
e1004503
.
29.
Langmead
B
,
Salzberg
SL
. 
Fast gapped-read alignment with Bowtie 2
.
Nat Methods
2012
;
9
:
357
9
.
30.
Camacho
C
,
Coulouris
G
,
Avagyan
V
,
Ma
N
,
Papadopoulos
J
,
Bealer
K
, et al
BLAST+: architecture and applications
.
BMC Bioinformatics
2009
;
10
:
421
.
31.
Hastie
T
,
Zou
H
. 
Regularization and variable selection via the elastic net
.
J R Statist Soc B
2005
;
67
:
301
20
.
32.
Cancer Genome Atlas Research Network
;
Weinstein
JN
,
Collisson
EA
,
Mills
GB
,
Shaw
KR
,
Ozenberger
BA
, et al
The Cancer Genome Atlas pan-cancer analysis project
.
Nat Genet
2013
;
45
:
1113
20
.
33.
Riaz
N
,
Havel
JJ
,
Makarov
V
,
Desrichard
A
,
Urba
WJ
,
Sims
JS
, et al
Tumor and microenvironment evolution during immunotherapy with nivolumab
.
Cell
2017
;
171
:
934
49
.
34.
R Core Team
.
R: a language and environment for statistical computing
.
Vienna (Austria)
:
R Foundation for Statistical Computing
; 
2014
.
35.
RStudio Team
.
RStudio: integrated development for R
.
Boston (MA)
:
RStudio, Inc.;
2015
.
36.
Wickham
H
.
ggplot2–elegant graphics for data analysis
.
New York
:
Springer-Verlag
; 
2016
.
37.
Wagih
O
. 
ggseqlogo: a versatile R package for drawing sequence logos
.
Bioinformatics
2017
;
33
:
3645
7
.
38.
Friedman
J
,
Hastie
T
,
Tibshirani
R
. 
Regularization paths for generalized linear models via coordinate descent
.
J Stat Softw
2010
;
33
:
1
22
.
39.
Therneau
TM
,
Grambsch
PM
.
Modeling survival data: extending the Cox model
.
New York
:
Springer
; 
2000
.
40.
Benjamini
Y
,
Hochberg
Y
. 
Controlling the false discovery rate: a practical and powerful approach to multiple testing
.
J R Statist Soc B
1995
;
57
:
289
300
.
41.
Peet
RK
. 
The measurement of species diversity
.
Annu Rev Ecol Syst
1974
;
5
:
285
307
.
42.
Bolotin
DA
,
Poslavsky
S
,
Davydov
AN
,
Frenkel
FE
,
Fanchi
L
,
Zolotareva
OI
, et al
Antigen receptor repertoire profiling from RNA-seq data
.
Nat Biotechnol
2017
;
35
:
908
11
.
43.
Rosati
E
,
Dowds
CM
,
Liaskou
E
,
Henriksen
EKK
,
Karlsen
TH
,
Franke
A
. 
Overview of methodologies for T-cell receptor repertoire analysis
.
BMC Biotechnol
2017
;
17
:
61
.
44.
Britanova
OV
,
Putintseva
EV
,
Shugay
M
,
Merzlyak
EM
,
Turchaninova
MA
,
Staroverov
DB
, et al
Age-related decrease in TCR repertoire diversity measured with deep and normalized sequence profiling
.
J Immunol
2014
;
192
:
2689
98
.
45.
Andersen
P
,
Gill
R
. 
Cox's regression model for counting processes, a large sample study
.
Ann Stat
1982
;
10
:
1100
20
.
46.
von Ahlfen
S
,
Missel
A
,
Bendrat
K
,
Schlumpberger
M
. 
Determinants of RNA quality from FFPE samples
.
PLoS One
2007
;
2
:
e1261
.
47.
Spassova
I
,
Ugurel
S
,
Terheyden
P
,
Sucker
A
,
Hassel
JC
,
Ritter
C
, et al
Predominance of central memory T cells with high T-cell receptor repertoire diversity is associated with response to PD-1/PD-L1 inhibition in Merkel cell carcinoma
.
Clin Cancer Res
2020
;
26
:
2257
67
.
48.
Yost
KE
,
Satpathy
AT
,
Wells
DK
,
Qi
Y
,
Wang
C
,
Kageyama
R
, et al
Clonal replacement of tumor-specific T cells following PD-1 blockade
.
Nat Med
2019
;
25
:
1251
9
.