Characterization of Non-Monotonic Relationships between Tumor Mutational Burden and Clinical Outcomes

Abstract Potential clinical biomarkers are often assessed with Cox regressions or their ability to differentiate two groups of patients based on a single cutoff. However, both of these approaches assume a monotonic relationship between the potential biomarker and survival. Tumor mutational burden (TMB) is currently being studied as a predictive biomarker for immunotherapy, and a single cutoff is often used to divide patients. In this study, we introduce a two-cutoff approach that allows splitting of patients when a non-monotonic relationship is present and explore the use of neural networks to model more complex relationships of TMB to outcome data. Using real-world data, we find that while in most cases the true relationship between TMB and survival appears monotonic, that is not always the case and researchers should be made aware of this possibility. Significance: When a non-monotonic relationship to survival is present it is not possible to divide patients by a single value of a predictor. Neural networks allow for complex transformations and can be used to correctly split patients when a non-monotonic relationship is present.


INTRODUCTION
When searching for features predictive of survival across different cancer types researchers often use Cox regression [1][2][3] .The Cox model provides a high level of interpretability in the form of the regression coefficients, but these coefficients simply describe the linear relationship of the feature to the predicted log partial hazard 4 .Often there isn't a clear reason to assume a linear (and more importantly monotonic) relationship; furthermore, neural nets have previously been proposed as a more flexible approach in this context 5 , with some recent applications appearing in the literature [6][7][8][9] .
Tumor mutational burden (TMB) is a commonly investigated biomarker in the context of immunotherapy [10][11][12][13][14][15][16] , and its prognostic value has also been investigated in the context of heterogeneous treatments 17,18 .When investigating TMB as a biomarker researchers often bin patients into a "TMB low" group and a "TMB high" group, which implicitly assumes a monotonic relationship of TMB with survival, independent of the number of bins used.In this case the relationship is assumed to be a step function with patients below a certain threshold having a certain risk and patients above the threshold having another.The monotonic assumption is that change in risk only increases (or only decreases) with the value of the predictor variable in question.
Similar to how we previously leveraged the flexibility of neural networks to more optimally model the calibration of next-generation sequencing (NGS) gene panel-derived estimates of exomic TMB 19 , we wondered whether such flexible modeling approaches could be applied to characterizing the relationship of TMB with clinical outcomes data.While a single cutoff approach may work well for monotonic relationships, it would be expected to be suboptimal in the case of a truly non-montonic relationship.In such a scenario the lowest and highest TMB values would have similar risk, and the moderate values would be associated with a different risk.In this study we explored different approaches to attempt to better characterize these more complex relationships and investigated whether such relationships exist in the context of TMB and clinical outcomes data, both in the prognostic sense and predictive sense.

RESULTS
Given the long right-tailed distribution of TMB (Supplemental Figure 1A), when modeling the relationship of this data to the log partial hazard some form of a transformation will generally be required, since these extreme values multiplied by the estimated model coefficient will generate unrealistic partial hazards.We can demonstrate this by fitting a Cox model to untransformed TMB values for uterine corpus endometrial carcinoma (UCEC) and the corresponding survival data in the The Cancer Genome Atlas (TCGA) dataset 20 .Generating predicted survival curves for several different TMB values (Supplemental Figure 1B), we see that what the field would consider large differences in TMB result in minimal differences in the curves, and at the highest TMB values patients are predicted to survive longer than humanly possible.

Approaches with Simulated Data
One common approach for transforming TMB is to define a "TMB low" group and a "TMB high" group, which can be described as transformation via a step function with a given cutoff.This cutoff can be defined by taking the median TMB value, or some other percentile of the data; alternatively, a cutoff can be found that optimizes some metric difference between the two groups.We demonstrate what the optimal cutoff approach looks like for simulated survival data with a monotonic relationship (Figure 1A, B).In this simulated data the TMB values are real (the values of the UCEC TCGA samples), but the survival data is generated according to a simple linear mapping of logged TMB values to log risks (representing a monotonic relationship).As expected, in every simulation a single cutoff approach correctly identified a "TMB high" group with a significant hazard ratio.
While a single cutoff approach works well for this data, we wondered what would happen with a simple but nonmonotomic relationship.Further, we wanted to understand what approach could be used to identify this type of relationship.Instead of searching for a single cutoff we decided to search for two cutoffs, and merge the lowest and highest groups into a single group.Two cutoffs have been proposed before 21 , but in those contexts three groups were generated where the "TMB mid" group was expected to have a risk in between that of "TMB low" and "TMB high", which retains the monotonic nature of the relationship.In contrast, in our two-cutoff approach we assign the same risk to the "TMB low" and "TMB high" groups with the "TMB mid" displaying a different survival risk, which represents a non-monotonic relationship that some describe as a "Goldilocks effect".In the case of a monotonic relationship, we would expect grouping the highest and lowest values together would result in poor correlation, and when we applied the new two-cutoff approach to the monotonic data the statistical significance was often lower than the single-cutoff approach (Figure 1C), with the "TMB mid" group often having the higher hazard ratio.However, in the case of a non-monotonic relationship, grouping the lowest and highest values is exactly what should be done and when these two different approaches were applied to data simulating a non-monotonic relationship of TMB with survival we found the two-cutoff approach correctly associated a middle group with a larger hazard ratio, and typically did so with greater statistical significance than the single-cutoff approach (Figure 1D, E, F).  .Cutoff analysis with simulated data.15 simulated survival datasets were generated for either a monotonic relationship with TMB (A) or a non-monotonic relationship (D).B and E show the hazard ratios and associated log-likelihood ratio tests and associated cutoffs of searching for an optimal cutoff with either the monotonic relationship or non-monotonic relationship, respectively, while C and F show the search for the optimal 2 cutoffs of our proposed approach with a monotonic or non-monotonic relationship, respectively.In A, B the TMB distribution is shown as a rug plot.
These results give a potential framework for identifying non-monotonic relationships of an input variable to survival-if the single-cutoff approach has a better test statistic then the relationship is possibly monotonic, and if the two-cutoff approach has a better test statistic the relationship is possibly non-monotonic.However, while we generally saw this expected pattern in the simulated data it was not always the case, and this heuristic doesn't reveal the true underlying relationship, such as a linear relationship versus a step function or others.Instead of comparing different transformation strategies (one cutoff versus two), we can simply allow a fully connected neural network (FCN) to learn the relationship between predictor variable and outcome measures directly from the available data.
We looked at two of the simulated survival datasets for both the linear and non-monotonic relationships, and compared the results of a fully connected network (FCN) to a Cox regression implemented with lifelines.To compare the model fits we looked at the C-index and log-likelihood (a larger value is better in both cases) for data in the test folds of 10 stratified K-folds, and also compared these metrics to what would be obtained with the ground truth relationship (Table 1).With regards to the linear data both the Cox model and the FCN had metrics nearly indistinguishable from the ground truth, but for the non-monotonic data the FCN had noticeably better metrics than the Cox model.Notably, in the first dataset for the non-monotonic data the two-cutoff approach had previously obtained a lower test statistic than the single-cutoff approach and would have incorrectly inferred the relationship while the relationship is clear with an FCN.
We also visualized the model fits of the FCNs, Cox regressions, and the true underlying relationships (Figure 2).Despite having a large number of parameters our FCN produced a very similar fit to a Cox regression for the linear data, with both closely following the true relationship (Figure 2A, B).Visually, the extra parameters of the FCN allowed it to correctly follow the shape of the non-monotonic data (Figure 2C, D), while the Cox model was a poor fit to the data, which is consistent with the model metrics.

Applying Neural Networks to the TCGA
The simulated data gave us confidence that if the relationship strongly deviates from a monotonic relationship, then an FCN will be able to better model this and this would be reflected in the C-index and model log-likelihood metrics.Our next question was whether this tool would allow us to find evidence for such "Goldilocks effects" in real-world datasets.
Using the mutation call and survival data for solid tumors in the TCGA dataset, we compared the model metrics of a Cox model and our FCN (Table 2).
In most cancer types there was limited difference between a Cox model and an FCN, suggesting the relationship likely is monotonic (or in some cases simply no relationship).However, in SKCM the FCN displayed a noticeably better log likelihood and C-index.SKCM is a cancer type where immunotherapy is recommended and where TMB hase been suggested as predictive biomarker.Looking at some of the model fits we see tumor types where the FCN predicted a perfectly linear relationship and followed the Cox predictions, cases where the FCN has slight deviations from the Cox predictions, and then SKCM where the FCN predicts a strong concave up relationship (Figure 3).

Applying Neural Networks to Immunotherapy Datasets
The TCGA dataset contains patients receiving the standard of care at the time, and includes patients at different stages of disease.In this heterogeneous context, exploring the relationship of TMB with survival is more closely associated with some form of prognostication rather than predicting response to a given therapeutic modality.In contrast, when investigating a cohort in which a specific treatment has been administered we generally are considering a biomarker that is predictive of response.It is in this context that TMB has often been explored as a biomarker, and many approaches effectively attempt to stratify the cohort into "TMB low" and "TMB high" and compare clinical outcome measures  (response, disease-free survival, etc).Given our findings of a non-monotonic relationship of survival for SKCM in the TCGA dataset we wondered if we would identify non-monotonic relationships in other datasets, either in a prognostic or predictive context.

5/10
The AACR Project GENIE Biopharma Collaborative (BPC) recently released clinical data for lung and colon cancer patients.Using the corresponding panel mutational data for these patients we applied our model to a lung cohort treated without immunotherapy (BPC NSCLC NonIO), a lung cohort treated with immunotherapy (BPC NSCLC IO), and a colon cohort treated without immunotherapy (BPC colorectal NonIO).Similar to the TCGA data our neural network did not detect non-monotonic relationships in these cancer types (Table 3, Supplemental Figure 2).Cox Memorial Sloan Kettering (MSK) has released several large datasets of patients assayed with their gene panel and the corresponding clinical information.Without some form of common identifier across these datasets it's difficult to know how much patient overlap exists between the data releases, but we applied our FCN to a 2019 dataset which only included patients treated with immunotherapy 15 , and a 2021 dataset that had both IO naive and IO treated patients 18 .While we had observed TMB to have a non-monotonic relationship with survival in the TCGA dataset for SKCM, in the IO treated MSK melanoma cohort we did not observe a difference between a Cox regression and our FCN (Table 3).This is consistent with a monotonic relationship of TMB and clinical outcome in the context of melanoma treated with IO, and supports a TMB-high vs TMB-low stratification in this context.The FCN model showed better performance over Cox modeling in the non-IO treated colorectal cancer cohort (MSK Colorectal NonIO), in which a concave down model was apparent (Supplemental Figure 3).Some improvement in performance was seen in the MSK non-small cell lung cancer cohorts treated with IO, and the observed relationship was also concave down (Supplemental Figure 3).

Model Application
The benefit of directly modeling the relationship with survival is that any deviations from a linear fit will be accounted for in the model predictions.This allows a researcher wanting to identify a cutoff for splitting patients to simply use a cutoff based on model risks.Then, working backwards from the model risks the corresponding input variable cutoff(s) can be found.If the relationship of the input variable to survival is non-monotonic then a single model risk will result in two input variable values, while a monotonic relationship will result in a single input variable cutoff.
We can demonstrate what this might look like by splitting each cancer in the TCGA dataset by either the median TMB value or median FCN model risk value.When splitting by median TMB the higher TMB group may or may not have a higher hazard, while when splitting by median model risk the higher risk group should have a higher hazard.Ignoring this difference in sign, in most cases there is almost no difference in the test statistic since the FCN predicted a monotonic relationship for most cancers (Figure 4A, B).However, for SKCM we see that a single cutoff of TMB is inappropriate.In fact, with a median cutoff the relationship of TMB to survival is barely significant while it is highly significant with a FCN model output-based risk cutoff (Figure 4C, D).

Discussion
Identifying consistent cutoffs for biomarkers has always been challenging 22 since only relationships that resemble a step function result in stable optimal cutoff values 23 , a result recapitulated in our Figure 1 and Supplemental Figure 4.It is possible to identify a step function with a neural net (Supplemental Table 1, Supplemental Figure 4), but here we used neural nets to investigate yet another consideration when identifying potential biomarkers and cutoffs: the assumption of a monotonic relationship may not hold.If this scenario occurs it has several important implications.First, when no relationship is found with a Cox regression a very strong relationship could still be present.Second, the predictions for values at the extremes, i.e. patients who would be predicted to have the best or worst survival according to a Cox model, will contain the largest errors (see the fits for SKCM in Figure 3).Third, if a non-monotonic relationship is present then a single cutoff is inappropriate and two-cutoffs of the input variable will be needed to split patients into the appropriate two groups.
The possibility of a non-monotonic relationship is not just theoretical.While approaches to correlate TMB to survival have generally assumed a monotonic relationship, using a more flexible approach we have identified multiple datasets where the relationship between TMB and survival appears more complex.This study has a few important limitations, the first of which is that we did not undertake a more complex multivariate modeling of clinical outcome using features such as age, stage, grade, etc.This was done on purpose as the intent of this work is to highlight how the relationship of TMB to clinical outcome can be modeled to account for potential non-monotonic relationships and not to comprehensively approach clinical outcome modeling across the relevant predictors.The second important limitation is related in that we did not design these experiments such that we would be "validating" a specific TMB ↔ clinical outcome correlation across tumor type and clinical setting.Again, this was intended given the modeling focus of the work.We believe what we have developed and presented herein has important implications for studies that investigate TMB as a predictive and/or prognostic biomarker.Further, we show that fairly simple neural networks like we have presented here can help avoid the pitfalls of ordinary Cox-PH based regression with respect to the potential of non-monotonic relationships.

Figure 2 .
Figure 2. Simulated data model fits.Model fits from a Cox model and a neural net were averaged over 10 K-folds for two representative simulated survival datasets for a linear and non-monotonic relationship.A, B monotonic data.C, D non-monotonic data.TMB distributions shown as rug plots.

Figure 3 .
Figure 3. TCGA model fits.Nine selected cancer types are shown.Cox and neural net model fits were averaged over 10 K-folds.TMB distributions shown as rug plots.

Figure 4 .
Figure 4. Model effect on a binary label.Patients were split by either median TMB (A, C), or median risk from a neural net (B, D).A, C show the hazard ratios and log-likelihood tests of each cancer type for these splits while B, D show the Kaplan curves for SKCM with these splits along with the logrank p-value.

Table 2 .
TCGA data metrics.Log-likelihood scores and C-indexes for the test folds of different cancers in the TCGA for a Cox model and a neural net.

Table 3 .
BPC and MSK data metrics.Log-likelihood scores and C-indexes for the test folds of different cancers in the BPC and MSK data for a Cox model and a neural net.