Real-world Validation of TMB and Microsatellite Instability as Predictive Biomarkers of Immune Checkpoint Inhibitor Effectiveness in Advanced Gastroesophageal Cancer

Patients with advanced gastroesophageal cancer (mEG) and tumor mutational burden ≥10 mut/Mb (TMB ≥ 10) have more favorable outcomes on immune checkpoint inhibitor (ICPI) monotherapy compared with chemotherapy in subgroup analyses of randomized controlled trials. We sought to evaluate the robustness of these associations in real-world settings, where patients and practices are more diverse. A total of 362 second-line (2L) and 692 first-line (1L) patients received ICPI (n = 99 and 33, respectively) or chemotherapy (n = 263 and 659, respectively) across approximately 280 U.S. academic or community-based cancer clinics from March 2014 to July 2021. Deidentified data were captured in a real-world clinico-genomic database. All patients underwent Foundation Medicine testing. Time to next treatment (TTNT) and overall survival (OS) for ICPI versus chemotherapy were adjusted for treatment assignment imbalances using propensity scores. 2L: TMB ≥ 10 was associated with more favorable TTNT [median 24 vs. 4.1 months; HR: 0.19; 95% confidence interval (CI): 0.09–0.44; P = 0.0001] and OS (median 43.1 vs. 6.2 months; HR: 0.24; 95% CI: 0.11–0.54; P = 0.0005), whereas TMB < 10 was not (P > 0.05). 1L: TMB ≥ 10 was associated with more favorable TTNT (not reached vs. median 4.1 months; HR: 0.13; 95% CI: 0.03–0.48; P = 0.0024) and OS (not reached vs. median 17.1 months; HR: 0.30; 95% CI: 0.08–1.14; P = 0.078), whereas TMB < 10 was associated with less favorable TTNT (median 2.8 vs. 6.5 months; HR: 2.36; 95% CI: 1.25–4.45; P = 0.008) and OS (median 4.5 vs. 13.1 months; HR: 1.82; 95% CI: 0.87–3.81; P = 0.11). TMB ≥ 10 robustly identifies patients with mEG who have more favorable outcomes on 2L ICPI monotherapy versus chemotherapy. 1L data are more limited, but effects are consistent with those in 2L.
Significance: Using real-world data, we sought to evaluate the robustness of these clinical associations using the same assay platform and biomarker cutoff used in both clinical trials and pan-tumor CDx approvals for later treatment lines. TMB ≥ 10 robustly identified patients with mEG with more favorable outcomes on ICPI monotherapy versus chemotherapy, suggesting that this subset of patients could be targeted for further trial development.


Introduction
The treatment landscape of gastroesophageal adenocarcinoma has rapidly evolved in the last several years, with immune checkpoint inhibitors (ICPI) becoming an integral component of standard therapies. The recent randomized […] for more rigorous assessments of other candidate biomarkers, such as tumor mutational burden (TMB) and microsatellite instability-high (MSI-H; ref. 4). Currently, the National Comprehensive Cancer Network does not endorse single-agent ICPI without chemotherapy as an option in the first-line setting. ICPI monotherapy is listed as an option in the second line and beyond (5) for patients whose tumors are mismatch repair-deficient or MSI-H (6) or have TMB ≥ 10 mutations/megabase (mut/Mb; ref. 7).
Patients who enroll in clinical trials are often healthier and of higher socioeconomic status than the overall patient population (12), which underscores the importance of evaluating drug and biomarker effectiveness in broader real-world patient populations and treatment practices (13). For these reasons, we sought to compare the outcomes of real-world patients on ICPI monotherapy versus chemotherapy, stratified by biomarkers. We focused primarily on TMB, given its established predictive clinical validity (14) and cross-platform harmonization efforts (15), using the same platform (Foundation Medicine) and cutoff of 10 mut/Mb as the existing FDA CDx for pembrolizumab (14).

Real-world Analysis Design
Observational data analyses require special considerations when assessing the validity of biomarkers or the effectiveness of drugs. To this end, we employed two complementary techniques, propensity analyses and crossover analyses, with interpretation resting on the consistency of observations across different cohorts and methods of evaluation, similar to prior real-world analyses in metastatic prostate cancer (16).
In randomized controlled trials (RCT), treatment assignments are randomized among otherwise homogeneous patient groups, so that between-group differences can be attributed to the causal effects of treatment. In real-world settings, treatment assignment is at the discretion of the treating physician, who uses clinical factors and best judgment to guide treatment choice. Causal inference approaches (using techniques such as propensity scores) explicitly adjust for imbalances in the clinical and pathologic factors that influence these decisions by variably weighting or excluding patients from analyses (17). Causal inference enables direct comparison of drug effectiveness in biomarker-defined groups of patients in specific clinical scenarios, but it depends on careful adjustment for treatment assignment biases and cannot adjust for unquantifiable factors influencing treatment decisions or for unknown confounders (17). Crossover analyses, which compare drug effectiveness within the same patient (e.g., when Drug B immediately follows Drug A), overcome these caveats, but at the cost of losing some clinical context, likely biasing results against Drug B, and precluding OS analyses. Because of this bias, crossover analyses can validly identify biomarker-defined populations with enhanced effectiveness of Drug B relative to Drug A, but not the reverse. The Statistical Analysis and Interpretations section (see below) describes how causal inference and crossover analyses were used in tandem.
We first compared ICPI versus chemotherapy effectiveness across patients in the second-line setting, stratified by TMB, and then evaluated whether, within the same patients, those with TMB ≥ 10 mut/Mb had enhanced relative effectiveness of second-line ICPI when used after first-line chemotherapy. Finally, we compared the outcomes of patients who received first-line ICPI with those of patients who received standard first-line chemotherapy (see Fig. 1 for cohort overviews).

Patient Selection
The study comprised patients with a confirmed diagnosis of gastroesophageal adenocarcinoma (gastric, esophageal, or gastroesophageal junction) included in the U.S.-wide Flatiron Health-Foundation Medicine (FMI) deidentified clinico-genomic database between January 2011 and June 2021. All patients underwent genomic testing using Foundation Medicine comprehensive genomic profiling (CGP) assays. Deidentified clinical data originated from approximately 280 U.S. cancer clinics (∼800 sites of care). Retrospective longitudinal clinical data were derived from electronic health records (EHR), comprising patient-level structured and unstructured data, curated via technology-enabled abstraction of clinical notes and radiology and/or pathology reports, which were linked to genomic data derived from Foundation Medicine testing by deidentified, deterministic matching (18). Clinical data included demographics, clinical and laboratory features, timing of treatment exposure, and survival.
Patient records were included in this study if they received a first- or second-line single-agent anti-PD-1 or anti-PD-L1 agent or standard chemotherapy (platinum regimens in the first line, non-platinum regimens in the second line) in the metastatic setting and had TMB assessed on a tissue specimen. Patients who received ICPI and chemotherapy concurrently in combination were not included because of low numbers. Patients must additionally have tested negative for ERBB2 amplification via CGP and must not have received an anti-HER2 agent. Analyses were conducted in three cohorts (Fig. 1):
2L Comparative Effectiveness Cohort: Patients who received a single-agent ICPI or non-platinum chemotherapy in the second line.
Sequential Cohort: Patients who received platinum chemotherapy in the first line and a single-agent ICPI in the second line.
1L Comparative Effectiveness Cohort: Patients who received either a single-agent ICPI or a platinum-containing chemotherapy regimen in the first line.
Institutional Review Board approval of the study protocol was obtained prior to study conduct and included a waiver of informed consent.

Comprehensive Genomic Profiling
Hybrid capture-based next-generation sequencing (NGS) assays were performed on patient tumor specimens in a Clinical Laboratory Improvement Amendments (CLIA)-certified, College of American Pathologists (CAP)-accredited laboratory (Foundation Medicine, Cambridge, MA). Samples were evaluated for alterations as described previously (19). TMB was determined on up to 1.1 Mb of sequenced DNA (20). MSI status was determined on 95–114 loci, as described previously (21).

Outcomes
Time to next treatment (TTNT), like progression-free survival (PFS), is a time-to-event proxy for clinical drug effectiveness (22). TTNT was calculated from the treatment start date until the start of the next treatment line (for any reason) or death. Patients who had not yet started a next treatment line or died were right-censored at the date of their last clinic visit, laboratory result, or medication order. OS was calculated from the start of treatment to death from any cause; patients with no record of mortality were right-censored at the date of their last clinic visit or structured activity. Because patients cannot enter the database until a CGP report is delivered, OS risk intervals were left-truncated at the date of the CGP report to account for immortal time bias (23,24). Flatiron Health database mortality information is a composite derived from three sources: documents within the EHR, the Social Security Death Index, and a commercial death dataset mining data from obituaries and funeral homes. This mortality information has been externally validated against the National Death Index with >90% accuracy (25).
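The TTNT and OS definitions above, including left truncation of OS at the CGP report date, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the `PatientLine` record and its field names are hypothetical, and months are approximated as 30.4375 days.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4375  # assumption: average Gregorian month length


@dataclass
class PatientLine:
    """Hypothetical record for one patient's treatment line (field names are illustrative)."""
    treatment_start: date
    next_treatment_start: Optional[date]  # start of the next line, if any
    death: Optional[date]                 # date of death, if recorded
    last_activity: date                   # last clinic visit, lab result, or medication order
    cgp_report: date                      # date the CGP report was delivered


def months_between(start: date, end: date) -> float:
    """Elapsed time in months, approximated from the day count."""
    return (end - start).days / DAYS_PER_MONTH


def ttnt(p: PatientLine) -> Tuple[float, bool]:
    """TTNT as (time, event): the event is the next line or death, whichever comes first."""
    endpoints = [d for d in (p.next_treatment_start, p.death) if d is not None]
    if endpoints:
        return months_between(p.treatment_start, min(endpoints)), True
    # No next line and no death: right-censor at the last recorded activity.
    return months_between(p.treatment_start, p.last_activity), False


def os_interval(p: PatientLine) -> Tuple[float, float, bool]:
    """OS as a left-truncated (entry, exit, event) interval measured from treatment start.

    The patient only contributes to risk sets after the CGP report date, so entry is the
    report date (or 0 if the report predates the treatment start).
    """
    entry = max(0.0, months_between(p.treatment_start, p.cgp_report))
    if p.death is not None:
        return entry, months_between(p.treatment_start, p.death), True
    return entry, months_between(p.treatment_start, p.last_activity), False
```

The (entry, exit, event) triple corresponds to the counting-process form accepted by survival software, for example `Surv(time, time2, event)` in the R survival package, so delayed-entry patients are excluded from risk sets before their report date.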

Statistical Analysis and Interpretations
A prospectively declared statistical analysis plan was developed and executed. Consistent with ISPOR guidelines (26), the inclusion criteria, exclusion criteria, potential biases, primary outcome measures, exploratory outcome measures, handling of missing data, and all methods described below were prospectively specified prior to analysis execution unless otherwise noted. The prespecified analysis compared the effectiveness of ICPI versus chemotherapy in patients stratified by TMB < 10 mut/Mb and TMB ≥ 10 mut/Mb. Stratification by MSI status and PD-L1 status was included as a secondary analysis. Stratification by biomarkers other than TMB, MSI, and PD-L1 was not prespecified.
Differences in time-to-event outcomes were assessed with the log-rank test and Cox proportional hazards (PH) models. χ² tests and Wilcoxon rank-sum tests were used to assess differences between groups in categorical and continuous variables, respectively. Multiple comparison adjustments were not performed. P values are reported to quantify the strength of association between each biomarker and outcome, not for null hypothesis significance testing, and interpretations are adopted broadly, considering the consistency of multiple outcome measures (TTNT, OS) across defined cohorts (interpatient vs. intrapatient), with no outcome measure or cohort standing on its own. The default interpretation is that a biomarker correlating with OS but not TTNT within a cohort is likely a confounding artifact, whereas a biomarker correlating with TTNT but not OS is not remarkable. In addition, while effect size estimates may vary by cohort, the default assumption is that a biomarker effect should not be specific to any one of the cohorts evaluated. Missing values were handled by simple imputation, with expected values determined using random forests via the R package "missForest." In subsequent analyses, imputed values were treated identically to measured values.

AACRJournals.org
Cancer Res Commun; 2(9) September 2022
Propensity analyses used inverse probability of treatment weights targeting the average treatment effect in the ICPI-treated population, implemented with the R package "MatchIt." Weights were capped at 10 equivalents to limit the influence of any single observation. Among patients receiving chemotherapy, those with characteristics most similar to the ICPI population were weighted more, and those less similar to ICPI patients were weighted less. These weights were included in all Kaplan-Meier visualizations and Cox PH models, unless otherwise noted.
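The weighting scheme above (average treatment effect in the treated, ATT) can be illustrated with a short Python sketch. It assumes propensity scores have already been estimated, for example by logistic regression on the baseline covariates; the function name `att_weights` is hypothetical, this is not how R's MatchIt is invoked, and reading "capped at 10 equivalents" as an upper bound of 10 on any single weight is our assumption.

```python
from typing import List, Sequence


def att_weights(propensity: Sequence[float], treated: Sequence[bool],
                cap: float = 10.0) -> List[float]:
    """Inverse probability of treatment weights targeting the ATT, truncated at `cap`.

    propensity[i] is the estimated probability that patient i received ICPI given covariates.
    """
    weights: List[float] = []
    for e, is_icpi in zip(propensity, treated):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # ICPI patients stand for themselves; chemotherapy patients are up-weighted by the
        # odds e / (1 - e), so those who most resemble ICPI patients count the most.
        w = 1.0 if is_icpi else e / (1.0 - e)
        weights.append(min(w, cap))  # the cap limits any single patient's influence
    return weights
```

With this scheme, a chemotherapy patient with a propensity score of 0.8 receives weight 4, one with 0.2 receives 0.25, and one with 0.95 (odds of 19) is truncated to 10.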
To focus on the 10 mut/Mb threshold, propensity weights were created separately for the TMB ≥ 10 (TMB-high) and TMB < 10 (TMB-low) groups to achieve the best possible within-group balance. Predictive biomarker associations (28) were assessed with inverse propensity-weighted multivariable Cox proportional hazards regression models containing, at minimum, the following variables: drug class (ICPI or taxane), TMB (high vs. low), and the interaction term between drug class and biomarker. Models evaluating intrapatient treatment interactions in the Sequential Cohort were additionally clustered on the individual patient, using robust variances calculated by generalized estimating equations with a working independence structure. HRs were then generated from adjusted Cox models stratified by group (i.e., TMB high vs. low). R version 3.6.3 was used for all statistical analyses.
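For concreteness, the minimal interaction model described above can be written as follows, where $w_i$ is the propensity weight, $\delta_i$ the event indicator, $R(t_i)$ the risk set at time $t_i$, $\mathrm{ICPI}_i$ indicates ICPI (vs. chemotherapy), and $\mathrm{TMBH}_i$ indicates TMB ≥ 10. The notation is ours, and any additional covariates in the fitted models are omitted.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta_1\,\mathrm{ICPI}_i + \beta_2\,\mathrm{TMBH}_i
         + \beta_3\,\mathrm{ICPI}_i\,\mathrm{TMBH}_i\right),
\qquad x_i = \left(\mathrm{ICPI}_i,\ \mathrm{TMBH}_i,\ \mathrm{ICPI}_i\,\mathrm{TMBH}_i\right)^{\top}

\ell_w(\beta) = \sum_{i:\,\delta_i = 1} w_i \left[ x_i^{\top}\beta
                - \log \sum_{j \in R(t_i)} w_j\, e^{x_j^{\top}\beta} \right]

\mathrm{HR}_{\text{ICPI vs. chemo} \mid \text{TMB} < 10} = e^{\beta_1},
\qquad
\mathrm{HR}_{\text{ICPI vs. chemo} \mid \text{TMB} \ge 10} = e^{\beta_1 + \beta_3},
\qquad
\frac{\mathrm{HR}_{\text{TMB} \ge 10}}{\mathrm{HR}_{\text{TMB} < 10}} = e^{\beta_3}
```

Under this parameterization, the interaction-term P value tests $\beta_3 = 0$, and $e^{\beta_3} < 1$ indicates that ICPI is relatively more favorable in TMB-high tumors. The group-specific HRs reported in the text come from models stratified by TMB group; $e^{\beta_1}$ and $e^{\beta_1+\beta_3}$ are the analogous quantities within a single interaction model.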
Non-proportional hazards between treatment groups can limit the interpretability of the HR as an effect measure. For this reason, prespecified HR estimates were augmented with analyses of 3-year restricted mean survival time (RMST; refs. 29, 30), calculated as the area under the stratum-specific Kaplan-Meier curves at 1-month intervals up to 36 months. To assess the contribution of TMB in microsatellite stable (MSS) tumors, we performed an additional exploratory analysis combining patients from the 1L and 2L Comparative Effectiveness Cohorts, limited to those testing MSS. For this analysis, an adjusted HR was generated per subgroup with adjustment for prognostic factors including line of therapy (1L vs. 2L), age, ECOG (0–2 vs. 3+), abnormal labs (bilirubin above the upper limit of normal [ULN], albumin below the lower limit of normal, or creatinine above ULN), stage at diagnosis (stage IV vs. not), and body mass index (BMI).
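The RMST calculation above can be sketched in Python as a weighted Kaplan-Meier estimator (accepting propensity weights and optional left-truncation entry times) followed by a sum of the survival curve over a 1-month grid up to 36 months. The function names are hypothetical, and the left-endpoint rule on the grid is our assumption about how "at 1-month intervals" was discretized; exact integration of the step function differs slightly when events fall between grid points.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def weighted_km(times: Sequence[float], events: Sequence[bool],
                weights: Optional[Sequence[float]] = None,
                entries: Optional[Sequence[float]] = None) -> Callable[[float], float]:
    """Weighted (optionally left-truncated) Kaplan-Meier estimator; returns S(t)."""
    n = len(times)
    w = list(weights) if weights is not None else [1.0] * n
    entry = list(entries) if entries is not None else [0.0] * n
    steps: List[Tuple[float, float]] = []  # (event time, survival just after that time)
    surv = 1.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        # Weighted risk set: subjects who entered before t and are still followed at t.
        at_risk = sum(wi for ti, si, wi in zip(times, entry, w) if si < t <= ti)
        # Weighted number of events occurring exactly at t.
        died = sum(wi for ti, ei, wi in zip(times, events, w) if ei and ti == t)
        if at_risk > 0:
            surv *= 1.0 - died / at_risk
        steps.append((t, surv))

    def survival(t: float) -> float:
        # Right-continuous step function: the last step at or before t applies.
        s = 1.0
        for event_time, value in steps:
            if event_time > t:
                break
            s = value
        return s

    return survival


def rmst(survival: Callable[[float], float], horizon: float = 36.0,
         step: float = 1.0) -> float:
    """Restricted mean survival time: area under S(t) on [0, horizon] via a left-endpoint grid sum."""
    n_steps = int(round(horizon / step))
    return sum(survival(k * step) * step for k in range(n_steps))
```

The between-arm contrast is then the difference in RMST (in months) between the weighted ICPI and chemotherapy curves within each TMB stratum, which remains interpretable when hazards are not proportional.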

Data Availability
The data supporting the findings of this study originated from Flatiron Health, Inc. and Foundation Medicine, Inc. These deidentified data may be made available upon request, and are subject to a license agreement with Flatiron Health and Foundation Medicine; interested researchers should contact <cgdb-fmi@flatiron.com> and <dataaccess@flatiron.com> to determine licensing terms.

Characteristics of Analysis Cohorts
2L Comparative Effectiveness Cohort: After selection, 263 patients received 2L non-platinum chemotherapy and 99 patients received 2L ICPI (Fig. 1; Supplementary Table S1). No differences with P < 0.05 were observed between treatment groups for age, sex, stage at diagnosis, smoking status, prior surgery, ECOG, albumin, or bilirubin. However, patients receiving 2L ICPI had higher TMB (P < 0.001), higher PD-L1 CPS scores (P < 0.001), and more frequent MSI-H (P < 0.001). The median TMB of the cohort was 3.8 mut/Mb, with an interquartile range of 1.7–6.3 mut/Mb (Supplementary Table S1), consistent with prior reports in this population (31).

Real-world Patients Receiving Second-line ICPI Monotherapy Versus Chemotherapy Have More Favorable Outcomes When TMB ≥ 10 but not TMB < 10
In the TMB < 10 subgroup, all features had a standardized mean difference (SMD) < 10% after weighting, except for a bias toward higher TMB among patients receiving ICPI (Supplementary Fig. S1A). In the TMB ≥ 10 subgroup, imbalances were greatly reduced, but residual imbalances of SMD ≥ 10% remained: patients receiving ICPI were less likely to have had surgery and more likely to be stage IV at diagnosis (Supplementary Fig. S1B–D). […] (Fig. 2; Supplementary Figs. S2 and S3). Sensitivity analyses unadjusted for imbalances show similar results (Supplementary Fig. S4). RMST analyses additionally show similar results (Supplementary Table S7). Analyses unadjusted for propensity weights have similar results (see Supplementary Fig. S2). Interaction terms in the interaction models (see Materials and Methods) for TTNT and OS had P < 0.0001 and P = 0.0028, respectively (for full models, see Supplementary Fig. S4).
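Covariate balance after weighting is summarized by standardized mean differences, with |SMD| < 0.10 treated as balanced. A minimal Python sketch of one common weighted-SMD convention follows; the pooled-SD denominator (averaging the treated and comparator variances) is our assumption, since the text does not state which convention was used, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def _weighted_mean_var(x: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Weighted mean and weighted (population-style) variance."""
    total = sum(w)
    mean = sum(wi * xi for xi, wi in zip(x, w)) / total
    var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)) / total
    return mean, var


def weighted_smd(x: Sequence[float], treated: Sequence[bool],
                 weights: Optional[Sequence[float]] = None) -> float:
    """SMD (treated minus comparator) with pooled SD sqrt((var_t + var_c) / 2).

    Works for continuous covariates and for 0/1 indicators (e.g., prior surgery, stage IV).
    """
    w = list(weights) if weights is not None else [1.0] * len(x)
    xt = [xi for xi, t in zip(x, treated) if t]
    wt = [wi for wi, t in zip(w, treated) if t]
    xc = [xi for xi, t in zip(x, treated) if not t]
    wc = [wi for wi, t in zip(w, treated) if not t]
    mt, vt = _weighted_mean_var(xt, wt)
    mc, vc = _weighted_mean_var(xc, wc)
    pooled_sd = math.sqrt((vt + vc) / 2.0)
    return (mt - mc) / pooled_sd if pooled_sd > 0 else 0.0


def is_balanced(smd: float, threshold: float = 0.10) -> bool:
    """Balance criterion: |SMD| below 10%."""
    return abs(smd) < threshold
```

Passing the propensity weights as `weights` gives the post-weighting SMD; omitting them gives the unweighted (pre-weighting) value.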

TMB and MSI-H are Stronger Predictive Biomarkers for ICPI Monotherapy Versus Chemotherapy Benefit Than PD-L1
PD-L1 CPS ≥ 5 was not particularly enriched in patients with high TMB (Supplementary Table S8), suggesting a degree of biomarker independence in these cohorts. PD-L1 scores were not available for 47% of the second-line comparative effectiveness cohort (Supplementary Table S1) and for 40% of the sequential cohort (Supplementary Table S2). However, the absolute number of patients with PD-L1 CPS ≥ 10 was comparable to, and their relative prevalence much higher than, that of patients with TMB ≥ 10 or MSI-H. In both the 2L ICPI versus chemotherapy cohort and the sequential analysis, patients identified by TMB ≥ 10 and/or MSI-H had similar prevalence and similar enrichment for outcomes favoring ICPI versus chemotherapy (Fig. 4A–C; Supplementary Fig. S7). We observed a modest enrichment favoring ICPI versus chemotherapy in patients identified by PD-L1 CPS ≥ 10 for TTNT in the sequential cohort, but not for TTNT or OS in the second-line comparative effectiveness cohort.
The additive predictive associations of TMB ≥ 10 and PD-L1 CPS ≥ 10 were evaluated with multivariable models containing multiple interaction terms (Supplementary Fig. S8). Including PD-L1 CPS ≥ 10 did not reduce the predictive associations of TMB ≥ 10, and PD-L1 CPS ≥ 10 showed only weak or inconsistent additional predictive association.

Adjusting for Known Treatment Assignment Imbalances, Patients Receiving First-line ICPI Versus Chemotherapy Have More Favorable Outcomes When TMB ≥ 10 but not TMB < 10
In the TMB < 10 subgroup, residual imbalances remained: patients receiving ICPI were more likely to have had surgery, and marginally more likely to have PD-L1 CPS ≥ 5, ECOG 3 or higher, older age, or abnormal labs. In the TMB ≥ 10 subgroup, patients receiving ICPI were marginally more likely to have been stage IV at diagnosis (Supplementary Fig. S1). However, because only 7 first-line patients with TMB ≥ 10 received ICPI, results must be interpreted cautiously. […] (Supplementary Fig. S3). RMST analyses additionally show similar results (Supplementary Table S7; Supplementary Figs. S9 and S10).

Real-world Data Parallel RCT Data in 1L and 2L Gastroesophageal Adenocarcinomas Despite Differences in Patient Populations
Using ECOG performance status as a proxy for overall patient frailty across cohorts, the broad cohort characteristics of the phase III KEYNOTE-061 trial comparing second-line pembrolizumab with paclitaxel (9) and the phase III KEYNOTE-062 trial comparing first-line pembrolizumab with platinum chemotherapy (3) are compared with those of the second- and first-line comparative effectiveness cohorts (Fig. 6A). The OS analyses for TMB subgroups in KEYNOTE-061 (10) and KEYNOTE-062 were reported as post hoc analyses (11). These subgroup analyses are shown alongside the interpatient OS assessments of the second- and first-line comparative effectiveness cohorts (Fig. 6B). Taken together, these findings help place our observations in the context of prospective randomized data. […] (aHR: 0.35; 95% CI: 0.13–0.96; P = 0.042) and comparable OS (aHR: 0.78; 95% CI: 0.33–1.84; P = 0.563; Fig. 7).

Discussion
The treatment of gastroesophageal cancers is rapidly evolving, with several recent phase III trials examining the role of ICPI added to frontline chemotherapy. Within gastric and gastroesophageal junction (GEJ) adenocarcinomas, the totality of the frontline data suggests that patients with a PD-L1 CPS of less than 5 derive no benefit from the addition of anti-PD-1 antibodies to standard chemotherapy. In fact, the European Medicines Agency has restricted the frontline nivolumab approval to patients with a PD-L1 CPS of 5 or greater. With 40%–60% of patients having a CPS below 5, a large proportion of ICPI-naïve patients will continue to enter the second line. While the phase III KEYNOTE-061 second-line trial comparing pembrolizumab with paclitaxel was negative overall, subgroup analyses suggest that patients with high PD-L1 expression and high TMB may benefit from ICPI over chemotherapy. Thus, opportunities remain for ICPI development in 2L and beyond, particularly using a biomarker-enriched approach. […] (Fig. 3).
Additional exploratory analyses considering only patients testing MSS, combining the 1L and 2L Comparative Effectiveness cohorts (Fig. 7) […] frontline patients who could be considered for ICPI-based approaches or future trials. We were able to evaluate a small cohort of first-line patients who received single-agent ICPI in comparison with patients receiving standard chemotherapy (Fig. 5). While this cohort is underpowered to draw precise conclusions, the directionality and magnitude of the point estimates for comparative drug effectiveness in TMB-stratified patients are consistent with those in the second-line and sequential cohorts (Figs. 2–4) and with the subgroup analysis of a second-line phase III RCT (ref. 14; Fig. 6). Our data hint that a frontline trial of ICPI versus chemotherapy in patients selected for TMB ≥ 10 might be positive, supporting a chemotherapy-sparing first-line regimen; however, the limited sample size (n = 7 ICPI, 59 chemotherapy) must be considered.
We took multiple steps to address the limitations of our dataset. Treatment assignment was at the discretion of the clinician; although biases were carefully considered, known imbalances were adjusted for, and results were interpreted in concert with intrapatient analyses and compared with existing RCT subgroup analyses, unknown confounders may remain. Although TTNT is a validated real-world data (RWD) measure, we recognize that TTNT may overestimate PFS, as there may be a clinical lag between imaging-based progression and actual initiation of the next treatment. We did not have direct measures of total disease burden at the time of treatment. While correlated factors such as ECOG were adjusted for, ECOG is an imperfect proxy for patient frailty and disease extent, which are not fully captured in our dataset. We recognize that TMB calculation can vary considerably by panel size, gene content, and bioinformatic filtering (15); the use of the only FDA-approved TMB companion diagnostic and its threshold is therefore a strength of our approach.

Conclusions
TMB ≥ 10 robustly identifies patients with metastatic gastroesophageal cancer who have more favorable outcomes on second-line single-agent ICPI compared with chemotherapy in patient populations and treatment settings more diverse than those of registrational clinical trials. While first-line data are limited, the effects seen are not inconsistent with second-line observations. Given the high unmet need in gastroesophageal cancers, consideration for biomarker-enriched prospective