Abstract
While immune checkpoint inhibitors (ICI) have revolutionized the treatment of cancer by producing durable antitumor responses, only 10%–30% of treated patients respond and the ability to predict clinical benefit remains elusive. Several studies, small in size and using variable analytic methods, suggest the gut microbiome may be a novel, modifiable biomarker for tumor response rates, but the specific bacteria or bacterial communities putatively impacting ICI responses have been inconsistent across the studied populations.
We have reanalyzed the available raw 16S rRNA amplicon and metagenomic sequencing data across five recently published ICI studies (n = 303 unique patients) using a uniform computational approach.
Herein, we identify novel bacterial signals associated with clinical responders (R) or nonresponders (NR) and develop an integrated microbiome prediction index. Unexpectedly, the NR-associated integrated index shows the strongest and most consistent signal using a random effects model and in a sensitivity and specificity analysis (P < 0.01). We subsequently tested the integrated index using validation cohorts across three distinct and diverse cancers (n = 105).
Our analysis highlights the development of biomarkers for nonresponse, rather than response, in predicting ICI outcomes and suggests a new approach to identify patients who would benefit from microbiome-based interventions to improve response rates.
Reanalysis of existing microbial data using a consistent computational approach enables identification of novel bacterial signals and reveals bacterial species to identify patients with an unfavorable response to checkpoint inhibitors. This analysis suggests a new approach to identify patients who might benefit from microbiome-based interventions.
Introduction
Immune checkpoint inhibitors (ICI) have revolutionized treatment across multiple cancer types and have been designated for tumor-agnostic use in patients with microsatellite instability cancers (1). The most widely used ICIs are mAbs that target the programmed cell death protein (PD-1), its ligand (PD-L1), or the CTL antigen 4 protein (CTLA-4), which block the activation of inhibitory checkpoint pathways on immune cells and thus unleash the host immune response (2). Checkpoint blockade is highly effective for a subset of patients, yet only approximately 10%–30% of all tumors respond to treatment (3). Increasing response rates as well as toxicity are observed when ICIs are used in combination (4), although data on combination ICI therapy remain limited.
The microbiome is increasingly recognized for its role in multiple aspects of cancer development and treatment, and additionally, as a putative independent biomarker for tumor response rates to therapy. Several studies have examined the stool composition of patients on checkpoint blockade with melanoma, non–small cell lung cancer (NSCLC), and renal cell cancer (RCC; refs. 5–10). These studies have implicated increased alpha diversity (8), along with specific members of each of the major bacterial phyla colonizing humans, namely, Firmicutes, Actinobacteria, Proteobacteria, Bacteroidetes, and Verrucomicrobia, in clinical responses. In contrast, the gut microbiota of nonresponding patients seemed to be less diverse, with the genera Bacteroides (phylum Bacteroidetes), Ruminococci and Roseburia (both Firmicutes) being prominent. While individually significant, a feature of these studies is that each has largely identified different sets of microbes and, overall, there is a lack of consensus regarding the microbial signals associated with clinical response and/or whether putative microbiota signals are tumor specific or impacted by other clinical metadata. Moreover, use of differing pipelines has limited the interpretability of the data presented to date for prediction of clinical antitumor responses, and whether microbiota signals predict nonresponse to ICIs has received less attention. Given an expanding knowledge of the microbiome's impact (and specific microbes) on normal immune system development and in drug action, the hypothesis that therapeutic tumor responses associate with specific microbiota warrants further investigation (11, 12).
Because conclusions of published individual studies may have been impacted by small sample sizes, variable clinical annotations of response and different bioinformatics methods, we chose to reanalyze the available raw 16S ribosomal RNA (rRNA) gene and metagenomic whole genome sequencing (WGS) data across five published ICI studies using a consistent computational approach for taxonomic assignment and rigorous statistical testing (13–15). Studies were selected on the basis of availability of raw sequencing data using NCBI accession numbers at the start of analysis, and validation cohorts were selected based on data published after analysis of our training set was complete (see Materials and Methods). For 16S rRNA amplicon analysis, we selected Resphera Insight as previously published data demonstrate that this pipeline outperforms QIIME RDP and QIIME UCLUST in species level identification using the Human Microbiome Project reference isolate collection (60%–70% species identified by Resphera Insight compared with 10%–25% by QIIME RDP or QIIME UCLUST; ref. 13). Further Resphera Insight has been validated as highly accurate for detection of Bacterioides and Fusobacterium spp. (13) and is also used by the FDA to enhance species identification contaminating the food chain (16, 17). We hypothesized that reanalysis of raw sequencing data from a larger sample size representing multiple cohorts using a uniform computational approach would enable derivation of a more comprehensive biomarker. Specifically, we posited that we would identify microbes that predict immunotherapeutic outcomes across cancer types, geographic locations, and multiple checkpoint inhibitors. Furthermore, our ability to correct for multiple hypothesis testing would allow detection of the most robust signals and potentially augment generalizability of microbial signals across study populations.
Materials and Methods
All code utilized in this article is available in the following GitHub repository: https://github.com/resphera-jrwhite/shaikh-sears-clin-cancer-research-2021.
Data retrieval
Raw sequencing data were retrieved with the following accession numbers: Chaput and colleagues PRJNA379764; Frankel and colleagues PRJNA397906; Gopalakrishnan and colleagues PRJEB22894, PRJEB22874 PRJEB22895, PRJEB22893; Matson and colleagues PRJNA399742; and Routy and colleagues PRJEB22863. For longitudinal samples, only the pretreatment or initial on-treatment sample was used for analysis unless specifically indicated for longitudinal analysis. All sample IDs and available metadata are provided in Supplementary Table S1. The above studies were selected on the basis of availability of sequencing data in public databases at the time this study was initiated (June 2018). For validation, cohorts were selected on the basis of availability of raw sequencing data in public databases as well as study population clinical metadata. Three studies, Peters and colleagues PRJNA541981, Zheng and colleagues PRJNA505228, and Hakozaki and colleagues PRJNA606061, fit this criteria at the time of this submission (August 2020) and therefore used as validation cohorts (18–20).
Analysis of 16S rRNA amplicon sequence data
16S rRNA amplicon sequence datasets generated through paired-end sequencing on the Illumina platform were preprocessed as follows: raw paired-end reads were trimmed for quality using Trimmomatic (v.0.32; minimum length 200 bp; min Phred quality score 20 over a 25 bp sliding window; ref. 21) and merged into consensus fragments by FLASH (v.1.2.7; minimum overlap 20 bp; 5% maximum mismatch density; ref. 22). Single-end 16S rRNA sequences from the Roche/454 platform were similarly trimmed using Trimmomatic. Passing sequences from all sequencing technologies were subsequently filtered for quality and length using QIIME (v.1.8.0; minimum final length 200 bp; ref. 23). Sequences matching the PhiX control genome were identified using BLASTN (v. 2.2.22) and removed, followed by chimera evaluation with UCLUST (v.1.2.22q; ref. 24), and screening for human-associated contaminant using Bowtie2 (v.2.2.4; ref. 25). Reads assigned to chloroplast or mitochondrial contaminants by the RDP classifier (v.2.2; ref. 26) with a minimum confidence of 50% were also removed. High-quality 16S rRNA amplicon sequences were assigned to a high-resolution taxonomic lineage using Resphera Insight (v.2.2), for which the performance has been previously benchmarked using high-quality draft genome assemblies of well-defined species from the Human Microbiome Reference Genome Database (http://hmpdacc.org/HMRGD/; ref. 13). Briefly, this method utilizes a manually curated 16S rRNA database of 11,000 unique species, and a hybrid global-local alignment procedure to assign short next-generation sequencing sequences from any region of the 16S rRNA gene to a high-resolution taxonomic lineage. The method considers each sequence independently and seeks to achieve species level resolution when possible, but if the underlying model suggests a lack of confidence in final species membership, the approach minimizes false positives by generating “ambiguous assignments,” that is, a list of species reflecting the uncertainty. For example, if a 16S rRNA sequence is ambiguous between Bifidobacterium breve and Bifidobacterium longum, the algorithm will provide the assignment: “Bifidobacterium_breve:Bifidobacterium_longum.” Moreover, sequences that are too divergent to be confidently classified to any species are subsequently clustered de novo into novel operational taxonomic units (OTU) using a 97% identity threshold, and assigned a higher-level taxonomy. In this study, ambiguous assignments and novel OTU assignments were used for calculation of percentage abundance estimates per sample, but were not considered as candidates in the meta-analysis or reanalysis index discovery process.
Whole-genome shotgun metagenomic analysis
Whole-genome shotgun metagenomic datasets were submitted for taxonomic assignment using MetaPhlAn2 (v2.6.0) with paired-end or single-end sequencing data submitted for analysis, depending on the data source (27).
Correlation of relative abundance between 16S rRNA amplicon and WGS datasets
A correlation analysis was performed between 16S rRNA amplicon sequence– and metagenomic sequence-derived taxonomic relative abundance estimates for 12 example species from the following datasets: Gopalakrishnan and colleagues (n = 24), Matson and colleagues (n = 39), and Frankel and colleagues (n = 44; Supplementary Fig. S1). In the Gopalakrishnan and Matson data, paired WGS and 16S rRNA amplicon libraries were developed and sequenced separately by the authors. For the Frankel data, 16S rRNA V4 fragments were bioinformatically identified within the WGS datasets (using a Bowtie2 local search against a V4 database of representative sequences derived from a gut microbiome cohort of 200 healthy individuals; ref. 28), and subsequently extracted and analyzed as described above. Taxonomic assignments from separate 16S rRNA amplicon and WGS marker databases were compared with each other rather than integrated into one dataset. Of the 2,132 unique canonically named Bacterial/Archaeal species (i.e., "genus_species") in the MetaPhlAn2 database, 1,952 (91.56%) are present in the Resphera Insight database.
Generation of integrated microbiome prediction index and literature indexes
Selection for the integrated microbiome prediction index development required species identified in 16S rRNA and/or WGS datasets that were significantly enriched or depleted in responders (R) versus nonresponders (NR) groups as determined by the random effects meta-analysis model (P ≤ 0.10). To calculate an integrated index value for a given sample, each contributing species i was assigned a weighted coefficient (wi) of −1 or 1, reflecting association with NR or R status, respectively. The integrated index value of a given sample was then calculated using the following weighted summation formula:
where N is the total number of species contributing to the index, and pi is the percent abundance of species i in the sample. The only exception to this was Bacteroides uniformis, which was empirically assigned an adjusted coefficient of −0.1 (rather than −1) to adjust for high abundance across samples (up to 25% relative abundance). This empiric adjustment was made to avoid masking other bacterial signatures which were either more dynamic, low abundance, and/or showed a stronger association with response status. Classification accuracy of R versus NR status for each index was evaluated using ROC curve analysis with calculation of the AUC statistic per dataset. The relative importance of each contributing species to an index was evaluated using leave-one-out cross-validation analysis, in which a species is removed, and AUC performance is recalculated from the index defined by the remaining species. Statistical analysis was performed using the Mann–Whitney test.
Data processing for functional elements
For available WGS data from the Matson, Gopalakrishnan, Frankel, and Routy studies, a functional evaluation was performed using the DIAMOND-translated protein search (v. 0.9.24) against multiple databases for antibiotic resistance (ARDB, CARD, NDARO) and virulence factors (VFDB, Victors, and PATRIC_VF; refs. 29–35). A reportable hit required a match of ≥80% amino acid identity and ≥90% alignment coverage of the query sequence. A differential abundance analysis was performed between R and NR groups for each study. Fisher exact test was used for differential taxa presence between R versus NR, and Mann–Whitney for differential abundance, normalizing for high-quality sequencing depth per sample. For 16S rRNA amplicon data, we performed a PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) analysis (v.1.0.0) based on taxonomic composition. Routy and Frankel WGS datasets were mined for 16S rRNA V4 fragments and subsequently analyzed by PICRUSt as well (36).
Statistical analysis
Statistical significance to confirm previously reported signals was calculated using a negative binomial test in DESeq (v.1.14) to detect low-frequency signals in a subgroup analysis with correction for FDR (37). Prior to downstream statistical comparisons, 16S rRNA taxonomic counts within each study were subsampled without replacement to an even level of coverage (see Supplementary Table S1) and transformed to percent abundance values. PICRUSt counts associated with functional categories were also normalized to an even total per sample within each 16S rRNA study. Species level percentage abundance estimates from Metaphlan2 were used for WGS datasets. Alpha diversity measures were calculated using QIIME and were generated from species level percentage abundance estimates normalized to 10,000 observations per sample across all 16S rRNA and WGS cohorts.
Meta-analyses were performed using the metacont function in the R package Meta (v.4.4; ref. 38) with the Hedges' g standardized mean difference statistic (39), and using fixed and random effects models. The fixed effects model assumes a single shared effect size across all analyzed studies, while the random effects model assumes the effect size may vary from study to study (39); in this analysis, we emphasized the random effects model results. Between-study heterogeneity analysis included estimates of I2 (% of variation estimated as true study-to-study heterogeneity), τ2 (the random effects between study variance estimate), and Cochran Q test P value (39, 40).
Results
Initial analysis of reported bacterial signatures
Tables 1 and 2 display the data from the five cohorts as reported in the original publications. Table 1 shows a summary of each cohort, including the total number of patients defined as R or NR, cancer type, treatment, and criteria used to determine response status, including whether radiographic assessment using the RECIST 1.1 was used to determine classification of patients as R or NR (41). Table 2 displays the microbiome features originally reported as significantly associated with ICI response including the statistical test used to determine association, and whether correction for multiple hypothesis testing was applied. Overall, the majority of these studies used nonparametric testing but did not apply multiple hypothesis testing correction in the initial analysis, with the exception of the Gopalakrishnan cohort (8).
Cohort . | Cancer type . | ICI therapy . | Response criteria . | R vs. NR definition . | Sequencing technology . | Patient numbers (R, NR) . |
---|---|---|---|---|---|---|
Chaput (2017) | Metastatic melanoma | anti-CTLA-4 (ipilimumab) | Immune-related response criteria (irRC) | R = Long-term clinical benefit; | 16S rRNA V3V4 (paired-end MiSeq/Roche454) | n = 26 patientsa (R n = 9; NR n = 17) |
NR = Poor benefit | ||||||
Frankel (2017) | Metastatic melanoma | anti-PD-1 (nivolumab, pembrolizumab); anti-CTLA-4 (ipilimumab) | RECIST 1.1 | R = Response or SD; | WGS paired-end Illumina HiSeq 2000 (2 × 100 bp) | n = 39 patientsa (R n = 24; NR n = 15) |
NR = Progression | ||||||
Gopalakrishnan (2017) | Metastatic melanoma | anti-PD-1 | RECIST 1.1 | R = CR, PR, or SD > 6 mo.; | 16S rRNA V4 Paired-End MiSeq | n = 43 patientsa (R n = 30; NR n = 13) |
NR = PD or SD lasting < 6 mo. | WGS Illumina HiSeq 2000 (2 × 100 bp) | n = 25 patientsb (R n = 14; NR n = 11) | ||||
Routy (2017) | NSCLC/RCC | anti-PD-1 (nivolumab) | RECIST 1.1 | R = PR or SD; | WGS single-end Ion Torrent (150 bp) | NSCLC discovery: n = 60 patientsa (R n = 31; NR n = 29) |
NR = PD or death | RCC discovery: n = 40 patientsa (R n = 25; NR n = 15) | |||||
NSCLC validation: n = 27 patientsa (R n = 10; NR n = 17) | ||||||
RCC validation: n = 26 patientsa (R n = 20; NR n = 6) | ||||||
Total: 153 patientsa (R n = 86; NR n = 67) | ||||||
Matson (2018) | Metastatic melanoma | anti-PD-1 (most pts); | RECIST 1.1 | R = Response; | 16S rRNA V4 Paired-End MiSeq | n = 42 patientsa (R n = 16; NR n = 26) |
anti-CTLA-4 | NR = Progression | WGS Illumina NextSeq 500 (2 × 150bp) | n = 39 patientsb (R n = 15; NR n = 24) |
Cohort . | Cancer type . | ICI therapy . | Response criteria . | R vs. NR definition . | Sequencing technology . | Patient numbers (R, NR) . |
---|---|---|---|---|---|---|
Chaput (2017) | Metastatic melanoma | anti-CTLA-4 (ipilimumab) | Immune-related response criteria (irRC) | R = Long-term clinical benefit; | 16S rRNA V3V4 (paired-end MiSeq/Roche454) | n = 26 patientsa (R n = 9; NR n = 17) |
NR = Poor benefit | ||||||
Frankel (2017) | Metastatic melanoma | anti-PD-1 (nivolumab, pembrolizumab); anti-CTLA-4 (ipilimumab) | RECIST 1.1 | R = Response or SD; | WGS paired-end Illumina HiSeq 2000 (2 × 100 bp) | n = 39 patientsa (R n = 24; NR n = 15) |
NR = Progression | ||||||
Gopalakrishnan (2017) | Metastatic melanoma | anti-PD-1 | RECIST 1.1 | R = CR, PR, or SD > 6 mo.; | 16S rRNA V4 Paired-End MiSeq | n = 43 patientsa (R n = 30; NR n = 13) |
NR = PD or SD lasting < 6 mo. | WGS Illumina HiSeq 2000 (2 × 100 bp) | n = 25 patientsb (R n = 14; NR n = 11) | ||||
Routy (2017) | NSCLC/RCC | anti-PD-1 (nivolumab) | RECIST 1.1 | R = PR or SD; | WGS single-end Ion Torrent (150 bp) | NSCLC discovery: n = 60 patientsa (R n = 31; NR n = 29) |
NR = PD or death | RCC discovery: n = 40 patientsa (R n = 25; NR n = 15) | |||||
NSCLC validation: n = 27 patientsa (R n = 10; NR n = 17) | ||||||
RCC validation: n = 26 patientsa (R n = 20; NR n = 6) | ||||||
Total: 153 patientsa (R n = 86; NR n = 67) | ||||||
Matson (2018) | Metastatic melanoma | anti-PD-1 (most pts); | RECIST 1.1 | R = Response; | 16S rRNA V4 Paired-End MiSeq | n = 42 patientsa (R n = 16; NR n = 26) |
anti-CTLA-4 | NR = Progression | WGS Illumina NextSeq 500 (2 × 150bp) | n = 39 patientsb (R n = 15; NR n = 24) |
Note: Each study is indicated by first author and year of publication. For each study, cancer type, type of ICI therapy, response criteria utilized, definition for R and NR, sequencing technology used, and total number of patients with indicated subsets for R and NR patients are shown.
Abbreviations: CR, complete response; ICI, immune checkpoint inhibitor; mo., months; NR, nonresponder; NSCLC, non–small cell lung cancer; PD, progression of disease; PR, partial response; R, responder; RCC, renal cell carcinoma; RECIST, response evaluation criteria in solid tumors; SD, stable disease; WGS, whole-genome sequencing.
aIndicates unique patients, excluding multiple longitudinal samples from the same patient.
bIndicates the subset of patients in whom parallel 16S rRNA gene and metagenomics sequencing was performed.
Study . | Cancer type . | Microbiome feature (response to ICIs) . | Statistical test (R vs. NR) . | Multiple hypothesis testing correction . | Reported significance . |
---|---|---|---|---|---|
Chaput (2017) | Metastatic melanoma | Faecalibacterium (genus) | Mann–Whitney/Wilcoxon rank sum | No correction | P = 0.009 |
Gemmiger (genus) | P < 0.05 | ||||
Frankel (2017) | Metastatic melanoma | Bacteroides caccae | Kruskal–Wallis (LEfSe) | No correction | P = 0.032 |
Streptococcus parasanguinis | P = 0.048 | ||||
Faecalibacterium prausnitzii | P = 0.032 | ||||
Holdemania filiformis | P = 0.043 | ||||
Bacteroides thetaiotamicron | P = 0.046 | ||||
Gopalakrishnan (2017) | Metastatic melanoma | Inverse Simpson alpha diversity | Mann–Whitney/Wilcoxon rank sum | No correction | P < 0.01 |
Faecalibacterium prausnitzii | FDR | adj. P = 0.046 | |||
Matson (2018) | Metastatic melanoma | Bifidobacterium longum | Nonparametric t test (QIIME) | No correction | P = 0.058 |
Collinsella aerofaciens | P = 0.004 | ||||
Enterococcus faecium | P = 0.001 | ||||
Routy (2017) | NSCLC/renal cell carcinoma | Akkermansia muciniphila | Mann–Whitney/Wilcoxon rank sum | No correction | P = 0.004 |
Study . | Cancer type . | Microbiome feature (response to ICIs) . | Statistical test (R vs. NR) . | Multiple hypothesis testing correction . | Reported significance . |
---|---|---|---|---|---|
Chaput (2017) | Metastatic melanoma | Faecalibacterium (genus) | Mann–Whitney/Wilcoxon rank sum | No correction | P = 0.009 |
Gemmiger (genus) | P < 0.05 | ||||
Frankel (2017) | Metastatic melanoma | Bacteroides caccae | Kruskal–Wallis (LEfSe) | No correction | P = 0.032 |
Streptococcus parasanguinis | P = 0.048 | ||||
Faecalibacterium prausnitzii | P = 0.032 | ||||
Holdemania filiformis | P = 0.043 | ||||
Bacteroides thetaiotamicron | P = 0.046 | ||||
Gopalakrishnan (2017) | Metastatic melanoma | Inverse Simpson alpha diversity | Mann–Whitney/Wilcoxon rank sum | No correction | P < 0.01 |
Faecalibacterium prausnitzii | FDR | adj. P = 0.046 | |||
Matson (2018) | Metastatic melanoma | Bifidobacterium longum | Nonparametric t test (QIIME) | No correction | P = 0.058 |
Collinsella aerofaciens | P = 0.004 | ||||
Enterococcus faecium | P = 0.001 | ||||
Routy (2017) | NSCLC/renal cell carcinoma | Akkermansia muciniphila | Mann–Whitney/Wilcoxon rank sum | No correction | P = 0.004 |
Note: Each study is indicated by the first author and year of publication. Major microbial signatures identified are listed along with statistical test, multiple hypothesis testing adjustment procedure (if applicable), and reported significance level.
We next reanalyzed the raw data from each cohort for taxonomic membership using a standardized bioinformatics methodology for 16S rRNA amplicon and an established pipeline for metagenomic WGS data (see Materials and Methods). To determine whether our taxonomic assignments were consistent across 16S rRNA amplicon and metagenomic WGS datasets, we performed a correlation analysis of taxonomic relative abundance estimates for a subset of samples with both 16S rRNA amplicon and metagenomic sequence data (5, 8), as well as an in silico selection of 16S rRNA fragments directly from WGS data (7). These results indicate that data abstracted from WGS using our standardized methodology for 16S rRNA amplicon data produces comparable relative abundance profiles, including for high-interest species such as Faecalibacterium prausnitzii and Akkermansia muciniphila (Supplementary Fig. S1). Notably, in these datasets, taxa relative abundances that do not correlate across 16S rRNA amplicon and WGS data either represent signals occurring at <0.05% relative abundance or represent two taxa (Ruminococcus bromii and Parabacteroides distasonis) derived from only two of the included studies. In this latter case, WGS data did not detect the species but 16S rRNA amplicon data detected them when >0.1% relative abundance.
When we examined the enrichment of specific species previously reported in each individual study, our results showed variation (Fig. 1). While the majority of species enriched in R corresponded to the previously reported results, for all species level assignments, only a subset of microbial signals retained statistical significance after FDR correction (Supplementary Table S2). Specifically, microbial signals for Faecalibacterium prausnitzii, Bifidobacterium longum, Collinsella aerofaciens, and Enterococcus faecium remained significant using our analysis methodology and FDR correction. In contrast, six additional species reported as significantly associated with ICI response trended toward but were not significantly enriched in R when the data were FDR corrected.
We also reanalyzed each dataset using multiple metrics for alpha diversity (normalized for coverage across all samples), including total observed species and Chao1 to evaluate richness and Shannon and Reciprocal Simpson index for evenness (Supplementary Fig. S2). Although no alpha diversity measure was found to be associated with R using a meta-analysis approach (Supplementary Fig. S2A), we confirmed that higher alpha diversity (richness and evenness) correlated with tumor response to checkpoint inhibition in the Gopalakrishan cohort (Supplementary Fig. S2B; ref. 8), In the Routy cohort, we also observed significantly higher alpha diversity associated with R among observed species and Chao1 estimators, but not Shannon or Reciprocal Simpson measures, potentially indicating an association with richness rather than evenness in R. Individually, upon reanalysis, the Chaput, Frankel, and Matson cohorts did not show any association between alpha diversity and tumor response to checkpoint inhibition.
Consistent computational analysis across cohorts identifies novel bacterial signatures
We next sought to discover more comprehensive microbiome members associated with R or NR using a random effects meta-analysis model for each candidate species (P ≤ 0.1 for inclusion). Using this approach, we identified multiple previously unreported OTUs enriched in R, including Roseburia hominis, Oxalobacter formigenes, and Prevotella buccalis (Fig. 2). We also identified many species enriched in NR: Veillonella parvula, Bacteroides spp. (fragilis, coprocola, thetaiotaomicron, uniformis), Clostridium spp. (hathewayi, hylemonae, methylpentosum), Parasutterella excrementihominis, Scardovia wiggsiae, Oribacterium sinus, and Megasphaera micronuciformi. In addition, we manually selected two bacteria for inclusion: F. prausnitzii since it was reported as associated with ICI response in three of the included cohorts (7–9) and A. muciniphila given high prevalence in multiple cohorts, especially nonmelanoma datasets (Supplementary Table S3; ref. 6). To assess whether signal variation between studies impacted these results, we conducted a heterogeneity analysis using the I2 statistic. All bacterial signatures, whether R or NR (Fig. 2), with the exception of F. prausnitzii and B. fragilis, displayed low heterogeneity (I2 = 0%) across studies. Although F. prausnitzii and B. fragilis showed moderate heterogeneity across studies (I2 = 54% and 26%, respectively), these differences did not reach statistical significance.
Development of indicator indices
These newly identified individual members (Fig. 2) enabled us to reevaluate and develop novel indicator indices for R and NR signals, termed integrated microbiome prediction indices, or “integrated index” for short, along with a composite sum of both signals derived as (R index) − (NR index). The random effects meta-analysis models were significant individually for R- or NR-associated indices and also as a composite, termed “integrated index (sum)”. Unexpectedly, but similar to the individual taxon analyses (Fig. 2), the strongest association emerged with the NR integrated index (Fig. 3, top). We also evaluated two additional indices using bacterial genus or species as previously reported in each cohort (Table 2; Supplementary Fig. S3). Our literature index (Fig. 3, middle) utilized all reported bacterial species in the included cohorts associated with ICI R or NR regardless of tumor type (5–9) whereas our melanoma literature index (Fig. 3, bottom) used only those microbial signals reported in the cohorts of patients with melanoma and excluded Routy and colleagues (5, 7–9) that evaluated individuals with epithelial tumors. When our integrated index, utilizing our newly identified R and NR taxa, was compared with either the literature or melanoma literature indices, the NR and composite signals outperformed the corresponding signals from either the literature or melanoma literature indices. In fact, the NR signal was not significant in either literature-based index. Conversely, the R signal was consistently, but modestly, enriched in the integrated and literature indices, but not significant in the ICI R melanoma literature index.
Furthermore, a sensitivity and specificity analysis suggested that the NR integrated index was strongest in predicting tumor outcome to ICI therapy. The NR integrated index displayed an average AUC of 70.3% (± 1.66) compared with 47.8% (± 5.01, P = 0.0022) and 52.1% (±4.51, P = 0.0087) using the literature and melanoma literature NR indices, respectively (Fig. 4, middle). The composite integrated index also showed modest improvement compared with the literature-based indices (Fig. 4, right column). For each species, we performed a leave-one-out cross-validation in all nine R-associated, NR-associated, and composite sum indices individually to assess the contribution of individual members in each index (Supplementary Table S4). The contribution of each member to the respective index was evaluated by AUC for each cohort as well as for impact on the mean AUC and both fixed and random effects models with corresponding statistical and heterogeneity analysis. First, we evaluated the effect of F. prausnitzii and A. muciniphila removal from the integrated indices individually because these species were manually selected. Removal of F. prausnitzii and A. muciniphila from the R-associated integrated index reduced the overall performance as the mean AUC changed from 61.01% (± 4.06) to 53.36 (± 2.56, P = 0.1320) and 57.82 (± 5.50, P = 0.5887), respectively. However, removal of either species from the composite integrated index did not impact the overall performance of the index, likely, in part, due to the strong NR-associated signal in the integrated index. Second, we evaluated the integrated indices using both the fixed and random effects models. The largest reduction in performance occurred in the random effects model with removal of B. coprocola from the NR-associated and composite integrated index. However, removal of B. coprocola did not appear to affect the mean AUC sensitivity and specificity across all cohorts. Finally, the leave-one-out cross-validation allowed us to evaluate the contribution of low prevalence bacteria in each index. Specifically for the integrated indices, we did not find significant changes when low prevalence bacteria, such as Prevotella buccalis and Oribacterium sinus, were removed from their respective indices. Finally, we performed an analysis wherein we removed patients with stable disease (SD) to see how limiting the analysis to clear R versus NR affected the sensitivity and specificity of each index. Only two cohorts, Frankel and Routy, had metadata for individual patients with an SD designation. The other three cohorts utilized SD criteria to determine R versus NR (Table 1) but did not include SD information for individual patients in either the original publication or publicly available metadata. Overall, excluding SD patients modestly improved the AUC of all integrated, literature, and melanoma literature indices for the Frankel cohort (Supplementary Fig. S4A). In contrast, the AUCs for the Routy NSCLC cohort improved by 7.59%–10.91% for integrated indices, but the Routy RCC cohort results did not change. The overall mean AUC for the integrated indices improved when SD patients were removed from the Frankel and Routy cohorts (Supplementary Fig. S4B).
Analysis of longitudinal fecal samples
Two cohorts, Chaput (n = 23; ref. 9) and Routy (n = 44; ref. 6), reported on paired samples prior to and during treatment (up to 2 months after initiation of ICI treatment). Samples collected at the time of immunotherapy-related adverse events were excluded from this analysis. Thus, we next examined the effect of treatment with checkpoint inhibitors on patient microbiome composition over time using our reanalyzed data. Overall, no differences were identified in beta diversity since, as reported previously (42), longitudinal fecal samples were more similar within patients as compared with between patients, despite some drift with time in the Chaput study (Supplementary Fig. S5A and S5C). We also examined specific taxa in longitudinal samples using differential abundance analysis and found no significant changes over time in either study (Supplementary Fig. S5B and S5D). These results suggest stability of the gut microbiome during treatment with checkpoint inhibitors as described in the original reports.
Impact of antibiotic use on alpha diversity
We next examined the effect of antibiotic exposure on alpha diversity using three cohorts (6, 7, 9). The timeframe used to evaluate antibiotic exposure was variable between studies. Namely, both the Chaput and Frankel cohorts recorded information for any antibiotic exposure prior to and during ICI treatment while the Routy cohort used a range of 2 months prior and/or 1 month following the first dose of ICI treatment. The data showed that alpha diversity was, unexpectedly, not significantly impacted by antibiotic exposure (Supplementary Table S5). Instead, among patients without antibiotic exposure, a trend toward decreased alpha diversity in NR was observed, specifically Chao1 and observed species, suggesting decreased richness rather than evenness as associated with NR.
Functional characterization of the microbiome
We examined the functional gene content for each dataset, namely, R and NR to ICI therapy. We did not observe consistent differences in detection of antibiotic resistance elements or relative abundance rates between R and NR groups within each study (Supplementary Fig. S6A–S6C). We also did not observe consistent differences in the presence or absence of specific organism virulence factors between R and NR groups within each study (Supplementary Fig. S6D–S6F). However, we did find certain bacterial characteristics to be significantly enriched in R or NR using pathway prediction tools (Supplementary Fig. S7). Flagella assembly, bacterial motility proteins, and pantothenate/coenzyme A biosynthesis were enriched in R, along with phagosome and mTOR pathway modifiers. Nucleotide sugar metabolism, nitrogen metabolism, cyanoamino acid metabolism, and glycosphingolipid biosynthesis were all enriched in NR. Of note, butanoate metabolism was not predicted to be enriched in either group. Butanoate is the main biosynthetic pathway for butyrate, a short-chain fatty acid considered essential for immune and gut homeostasis. Butyrate and butyrate-producing bacteria such as F. prausnitzii and R. hominis, have been linked to favorable antitumor responses in multiple studies (43).
Evaluation of the integrated microbiome prediction index in independent validation cohorts
To validate our findings, we tested the integrated microbiome prediction index in three independent cohorts across multiple tumor types. All cohorts with available NCBI accession numbers and associated clinical data at the completion of our initial analysis were used as validation cohorts: Peters and colleagues melanoma (n = 27), Hakozaki and colleagues NSCLC (n = 70), and Zheng and colleagues hepatocellular carcinoma (HCC, n = 8; refs. 18–20). While other studies had been published, the data were not publicly deposited, and therefore, we were unable to include these in our validation studies (44–47). We then examined the sensitivity and specificity of the integrated index in the validation cohorts, in all patients (Fig. 5A) and those without antibiotic exposure (Fig. 5B). R, NR, and sum integrated index values are shown for each validation cohort (Fig 5C and D). Overall, the sum integrated index performed well for the melanoma and HCC cohorts with AUCs > 80 (Fig 5A and B). However, the AUC for the NSCLC cohort did not perform as well but did improve in the NR-associated integrated index when patients with antibiotic exposure were removed (compare middle panels, Fig 5A and B; Fig 5D; P = 0.0032). These data validate the integrated index and further suggest that the integrated index may perform better in specific tumor types and will need to be refined as additional data from further studies and tumor types become available.
Discussion
In this study, using a uniform bioinformatics approach to analyze multiple published datasets, we sought to find bacterial signatures associated with tumor response to checkpoint blockade across five cohorts. We were able to use FDR correction to confirm a subset of reported species associated with R previously highlighted in individual studies. We were able to identify new bacterial signatures associated with clinical tumor response or lack of response that were not reported in the original publications, likely because our analysis allows for comparison of individual species across cohorts and a larger sample size. Of note, we identified several new bacterial signatures enriched in R that have not been previously identified, including Roseburia hominis, Oxalobacter formigenes, and Prevotella buccalis. However, microbial signals associated with NR were overall stronger and more consistent than signals associated with R, a distinct finding from prior reports. Thus, use of our standardized computational pipeline allowed for identification of new, previously unreported NR-associated signals. Combined use of our new microbial R and NR signals resulted in development of NR and composite integrated microbiome prediction indexes that outperformed literature-based indexes when assessed using both a random effects model and a sensitivity/specificity analysis. Furthermore, we were able to validate the integrated microbiome prediction index in independent cohorts of two tumor types, melanoma and HCC, albeit with small cohorts using publicly available data, whereas 16S rRNA sequences derived from patients with NSCLC performed less well overall. However, modest improvement in AUC occurred when individuals with recent antibiotic exposure were removed from the analysis. This result is not unexpected given the dominant microbiome remodeling induced by antibiotics and reinforces recent observations suggesting that recent antibiotic exposure hinders ICI therapeutic responses (48, 49). This integrated index, and in particular, the NR-associated integrated index, should be tested as a biomarker in larger independent clinical cohorts with a specific focus on its performance in identifying patients who respond unfavorably to checkpoint inhibitors. While resistance to ICI therapy can result from both tumor-intrinsic and tumor-extrinsic factors, the strength of the NR-associated microbiome signals indicates there may also be potential for immunomodulation hindering response based on the presence of specific microbes or microbial communities. Our findings are the first step, and further investigation into these mechanisms using translational studies in human cohorts as well as animal models is necessary to define microbe–immune interactions that both promote and hinder patient responses to ICI therapy. If further validated, the integrated NR index may serve to identify candidates for clinical trials of microbiome-based therapeutics.
While identifying novel microbial species affecting outcome to ICI, our results also show several concordances to the original publications. F. prausnitzii and A. muciniphila had been previously identified and we confirmed the enrichment of these species in R though neither species met the initial significance cutoff of P ≤0.10 in the random effects model. In contrast, B. thetaiotaomicron was reported as enriched in R in the Frankel and Matson cohorts (5, 7) whereas, using a consistent reanalysis approach for all cohorts, this species was enriched in NR. While these discrepancies could be due to variation(s) among patient cohorts due to regional or other clinical differences, alternatively, this result may argue for functional redundancy among the members of the Bacteroides genus, and in fact, we found three other members of the Bacteroides genus enriched in NR. Of note, B. thetaiotaomicron is commonly considered a symbiote (50). Yet recent data identified cross-reactivity between a B. thetaiotaomicron β–galactosidase and myosin peptide associated with development of cardiomyopathy in genetically susceptible individuals (51). This result is consistent with the hypothesis that molecular mimicry may be one mechanism by which the microbiome triggers R or NR.
The other newly identified organism in this NR group, V. parvula, is an obligate anaerobe and one of the most abundant organisms in the oral and intestinal microbiome of humans. In addition. V. parvula has been previously linked to disease states and specifically enriched in the lung microbiome of patients with a primary lung malignancy (52–54). Given the consistent enrichment of microbial signals in NR across study cohorts, the NR integrated index may be a better predictor of the outcome of ICI therapy and serve to identify patients for whom alteration of the intestinal microbiome might offer clinical benefit. Moreover, while in general, lower alpha diversity has been associated with disease states and higher alpha diversity with healthy states, we found that alpha diversity was not consistently predictive of response or nonresponse to ICIs, though we confirmed the association of higher alpha diversity in R as reported by Gopalakrishnan and colleagues (8). Geographical variation, cancer treatments, other medications, and technical limitations, as described below, may explain some of the inconsistency among studies.
Our dataset has several limitations. First, the criteria used to determine R versus NR was variable or imprecisely defined among the cohorts and most apparent when considering the status of patients with SD and the duration of disease stabilization used to determine classification as R or NR. Given that patients with SD for an extended time display meaningful improvement, it may be necessary to further categorize SD into different time periods to better assess those with short versus long clinical benefit, but who ultimately progress. In fact, when we were able to exclude SD patients (only the Frankel and Routy cohorts), our overall integrated indices improved, with a notable improvement in the Routy NSCLC cohort. It seems likely that more nuanced analyses as well as larger validation cohorts will help to identify microbial signals predictive of R or NR. Second, our sample analysis was limited by only having data available from pretreatment samples and a lack of information to stratify patients based on class of antibiotic exposure, exposure timeline, and other concurrent medication usage. Several retrospective studies have shown the detrimental effect of antibiotic use on overall survival in patients undergoing treatment with checkpoint inhibitors (48, 49). However, we were able to utilize antibiotic exposure data in the validation cohorts, and additional studies are required to define and validate the impact of antibiotic use on microbial diversity and clinical outcomes. Third, many of these studies utilized ICIs as monotherapy in a heavily pretreated population whereas current standard first-line therapy often includes a combination of ICIs or ICI with chemotherapy for multiple tumor types (55–60). Our analysis is thereby limited by lack of data on prior and concurrent therapies at the time of sample collection. These rapid shifts in clinical care speak to the need for additional, contemporaneous analyses of the microbiome and ICI therapeutic outcomes. Fourth, regional differences in baseline microbiota due to diet or other clinical features could affect the comparison of signals across cohorts. For example, the genus Bacteroides in human stools is often more abundant in patients in the United States compared with global levels while the genus Prevotella is more abundant outside the United States (61). While a more universal microbial signal is ideal for clinical application, the baseline variation could mask some signals. However, we note that the consistent signals identified in this study across multiple geographies suggest the signals we identified are robust. Finally, technical limitations can introduce bias, as most cohorts used different primer sets for 16S rRNA amplicon sequencing (V3V4 vs. V4), variable protocols for DNA extraction and library preparation, and multiple sequencing platforms.
In summary, use of a consistent analysis methodology improved on existing pipelines by standardizing critical preprocessing and downstream analysis tools, enabling comprehensive evaluations of taxonomic and functional signals across sequencing datasets that represent patients with melanoma, lung cancer, and RCC. Given the multitude of bioinformatic tools and inherent variability in microbiome data, we recognize the need for consistent, statistically rigorous methods to facilitate the identification of robust microbial signals across study populations. Our analyses enabled identification of novel bacterial signatures associated with clinical R and NR to ICI therapy that were not detected in smaller cohorts, and in particular, an unexpected strong microbiome association with ICI NR. In this combined dataset, the microbiome signals correlating with nonresponse were dominant, suggesting these microbes provide a potentially powerful biomarker for the selection of patients who might not ordinarily respond to checkpoint blockade but might benefit from microbiome-based interventions that reduce or eliminate specific species. The results from the analyses herein require validation and correlation with clinical outcomes in larger, ongoing prospective cohorts of individuals undergoing ICI therapy. We anticipate that integration of our results into prospective analyses will assist in facilitating identification of key microbial factors consistently influencing tumor outcomes.
Authors’ Disclosures
F.Y. Shaikh reports grants from Brystol Myers Squibb during the conduct of the study. J.R. White reports personal fees from Resphera Biosciences LLC outside the submitted work. J.J. Gills reports grants from Bloomberg Kimmel Foundation for Cancer Immunotherapy during the conduct of the study. Y. Okuma reports grants from AbbVie G.K.; grants and personal fees from Chugai Pharmaceutical Co., Ltd.; and personal fees from AstraZeneca, Ono Pharmaceutical Co., Ltd., Bristol Myers Squibb, and Eli Lilly outside the submitted work. J.S. Weber reports personal fees from BMS, Merck, Regeneron, Celldex, Genentech, AstraZeneca, CytoMx, Biond, and Pfizer during the conduct of the study. In addition, J.S. Weber reports is named on a patent for a PD-1 biomarker by Biodesix issued and named on a patent for a CTLA-4 biomarker by Moffitt Cancer Center issued, and equity in Biond, CytoMx, Immunimax and Neximmune. E.J. Lipson reports grants and personal fees from Bristol-Myers Squibb, Merck, and Sanofi/Regeneron, as well as personal fees from Novartis, EMD Serono, Array BioPharma, Macrogenics, Genentech, and Odonate Therapeutics outside the submitted work. J. Naidoo reports grants from Lung Cancer Foundation of America and International Association for the Study of Lung Cancer during the conduct of the study. In addition, J. Naidoo reports grants and personal fees from AstraZeneca, Merck, and Bristol Myers Squibb, as well as personal fees from Daiichi Sankyo, Pfizer, and Roche/Genentech outside the submitted work. D.M. Pardoll reports a patent for BMS issued to BMS. C.L. Sears reports grants from Bloomberg Philanthropies and Bristol Myers Squibb during the conduct of the study, as well as grants from Janssen outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
F.Y. Shaikh: Conceptualization, data curation, visualization, writing–original draft, writing–review and editing. J.R. White: Data curation, formal analysis, validation, visualization, methodology. J.J. Gills: Conceptualization, writing–review and editing. T. Hakozaki: Resources, validation. C. Richard: Resources, validation. B. Routy: Resources, validation. Y. Okuma: Resources, validation. M. Usyk: Resources, validation. A. Pandey: Resources, validation. J.S. Weber: Resources, validation. J. Ahn: Resources, validation. E.J. Lipson: Conceptualization, writing–review and editing. J. Naidoo: Conceptualization, writing–review and editing. D.M. Pardoll: Conceptualization, resources, supervision, funding acquisition, writing–review and editing. C.L. Sears: Conceptualization, resources, supervision, funding acquisition, writing–original draft, writing–review and editing.
Acknowledgments
This work was funded by the Bloomberg∼Kimmel Institute for Cancer Immunotherapy (BKI) and a research grant from Bristol Myers Squibb. F.Y. Shaikh was supported by NIH T32CA009071. J. Naidoo was supported by International Association for the Study of Lung Cancer (IASLC), Lung Cancer Foundation of America, NCATS KL2TR001077, and Johns Hopkins Institute for Clinical and Translational Research (ICTR). E.J. Lipson was supported by the BKI, the Barney Family Foundation, Moving for Melanoma of Delaware, The Laverna Hahn Charitable Trust, and NCI P30 CA006973.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.