## Abstract

Scientists working in translational oncology regularly conduct multigroup studies of mice with serially measured tumors. Longitudinal data collected can feature mid-study dropouts and complex nonlinear temporal response patterns. Parametric statistical models such as ones assuming exponential growth are useful for summarizing tumor volume over ranges for which the growth model holds, with the advantage that the model's parameter estimates can be used to summarize between-group differences in tumor volume growth with statistical measures of uncertainty. However, these same assumed growth models are too rigid to recapitulate patterns observed in many experiments, which in turn diminishes the effectiveness of their parameter estimates as summary statistics. To address this problem, we generalized such models by adopting a nonparametric approach in which group-level response trends for logarithmically scaled tumor volume are estimated as regression splines in a generalized additive mixed model. We also describe a novel summary statistic for group level splines over user-defined, experimentally relevant time ranges. This statistic reduces to the log-linear growth rate for data well described by exponential growth and also has a sampling distribution across groups that is well approximated by a multivariate Gaussian, thus facilitating downstream analysis. Real-data examples show that this nonparametric approach not only enhances fidelity in describing nonlinear growth scenarios but also improves statistical power to detect interregimen differences when compared with the simple exponential model so that it generalizes the linear mixed effects paradigm for analysis of log-linear growth to nonlinear scenarios in a useful way.

This work generalizes the statistical linear mixed modeling paradigm for summarizing longitudinally measured preclinical tumor volume studies to encompass studies with nonlinear and nonmonotonic group response patterns in a statistically rigorous manner.

## Introduction

Preclinical mouse tumor models serve as a vital bridge between *in vitro* systems and clinical studies because they enable researchers to investigate in-depth hypotheses in a controlled environment while introducing some of the complexity of a living system (1). Such models take many forms ranging in the fidelity to which the system mirrors human pathology (2). A classic and still widely used design for translational studies makes use of subcutaneous or orthotopic xenografts of immortalized cell lines from human tumors in immunodeficient mice (3). Biological artifacts stemming from cell line immortalization and passage in culture can be avoided by directly implanting pieces of human tumors into mice, also known as patient-derived xenografts (PDX; ref. 4). Studies in cancer immunology require immunocompetency to drive response (5). For such studies, syngeneic grafts of mouse-derived tumor cell lines such as MC38, CT26, or EMT6 are implanted into genetically matched mice (6). Still more complex biology can be investigated through genetically engineered mouse models, which develop tumors stochastically over time, often in the tissues normally associated with the human disease (7). A common feature of these models is that they often yield longitudinal tumor response data (8).

Assessing the impact of novel therapeutics frequently rests on between-group comparisons of longitudinal tumor trends, highlighting the importance of statistical study design and analysis (9). A standardized statistical framework for analysis can underlie objective comparisons across regimens within a study and of interregimen differences across studies. Improvements to the statistical framework can simultaneously enhance decision making (10), reduce the numbers of animals required (11), and support larger translational efforts by enabling summary results from each study to be compared through meta-analysis (12). This article grew out of the need to develop such a framework to analyze experiments from a mouse tumor model–centralized core facility where large numbers of studies are conducted over time. Researchers in this environment need a consistent way to model raw data, evaluate treatment regimens (e.g., drugs with distinct mechanisms or a single drug with different dosing schemes) within a study, and monitor how those regimens' inferred effects compare with historical results. Practically achieving these goals requires a statistical model of tumor growth that conceptually connects the raw longitudinal data to envisioned temporal tumor volume curves and a way to assign summary measures of efficacy to groups based on these curves.

Shortcomings of analyzing single time points from longitudinal data separately have long been understood (13) and have led to the development of statistical models of tumor volume that incorporate information across time points to summarize the efficacy of each study regimen. Many such theoretical models of tumor growth exist in the literature. Modeling tumor growth at least over short periods by simple exponential curves has a long history (14) and provides a basis for first-order statistical assessment of treatment effects via, e.g., linear models (15) or linear mixed models (LMM) applied to log-transformed tumor volumes. Exponential growth also serves as a component in more sophisticated schemes using change-point and biexponential methods to assess treatment response (16). A more refined and frequently applied parametric model observes that the growth rate necessarily slows as the tumor itself grows in size. This has inspired a parallel line of work applying Gompertzian growth kinetics to draw insights on tumor biology (17, 18). Other recent proposals include refs. 19, 20, and 21, with a wide-ranging review in ref. 22. Application of such models should also be cognizant of data *dropouts* (23), which describe animals with longitudinal observations stopped prematurely during the study. This can occur for a range of reasons, e.g., an animal is euthanized if the observed tumor grows to be too large or the animal cannot tolerate a treatment; animals can also be taken off study to assess response in one or more biomarkers.

The statistical model in this setting serves two purposes. First, it summarizes and abstracts from the longitudinal data a faithful description of how the tumor volumes change over time for each regimen (essentially a growth curve for each group). These descriptions can be as simple as log-linear growth but are often more complicated with, e.g., periods of tumor regression followed by regrowth, and ideally a model will capture this complexity. Second, the model allows for its by-group descriptions to be distilled into univariate group-level scores for how well or poorly each regimen performs in changing tumor volume growth over time. Comparing these univariate scores allows investigators to quantify the relative efficacy of two or more regimens in a tumor model. For log-linear growth, e.g., such a score can be the log tumor volume growth rate estimated as the slope of a line.

In our experience, a parametric model for tumor growth is chosen for analysis because fitting that model makes it easy to fulfill the second criterion of univariately scoring each group while satisfying the first criterion of describing accurately the tumor growth patterns at least reasonably well. This is a defensible statistical strategy for analysis of any one study in isolation. However, when large numbers of studies each with many different regimens are conducted and analyzed sequentially, it becomes impractical to settle on a common parametric model that plausibly describes all their myriad growth patterns. When a model then notably misfits the data from a treatment regimen, it is reasonable to question whether or how to interpret that model's parametric summary score as a proxy for efficacy.

With this in mind, we refocus on the first purpose of our statistical model and adopt a more flexible, spline-based generalized additive model (GAM; ref. 24) that follows growth trends well even for complicated protocols that induce nonlinear—and sometimes nonmonotonic—log tumor volume changes over time. Introduced to integrate splines into classical generalized linear models (GLM), GAMs build on a rich history of research in statistics (25, 26) and provide a statistical framework from which to infer a smooth nonlinear temporal trend for each treatment regimen while retaining many of the advantages of classical GLMs. Although GAMs, like GLMs, can be configured to analyze response data from any exponential family distribution (e.g., Binomial, Poisson, etc.), in this work we restrict attention to responses that are normally distributed about a curve. Each mouse's tumor volume time series is assumed to be a noisy copy of its group's smooth temporal trend, but potentially with a mouse-specific random affine perturbation (meaning that the trend can randomly “tilt” or vertically “shift” for each mouse). These perturbations extend the GAM to become a generalized additive mixed model (GAMM; ref. 27). GAMMs have been previously applied to longitudinal experiments in translational oncology (28) though without the univariate summary score introduced here. Likelihood-based statistical methods such as LMMs and GAMMs also address the concern posed by dropouts by providing unbiased parameter estimates when applied to longitudinal data with the sorts of dropout patterns typically found in tumor growth studies, where the decision to remove an animal from the study is made based on observed data and a prespecified protocol (29).

A gap in workflows using GAMMs for such analyses is in distilling each group's estimated spline into a univariate score in a consistent, statistically convenient way. In this article, we propose for such a summary statistic a functional that maps each group's spline fit over a specified time interval to a univariate statistical score through numerical integration. This score reduces to the slope in the case of log-linear tumor growth, which we have found to have intuitive appeal to researchers. Like the slope, its joint summary statistics across a multigroup study can be well-approximated by a multivariate Gaussian distribution with one component per group. This facilitates both intergroup comparisons and interstudy meta-analyses. Because the spline statistic reduces seamlessly to the LMM slope in log-linear scenarios while also quantifying nonlinear ones, studies with a mix of such groups can be analyzed and their treatment regimens scored with a common metric, broadening the applicability of mixed models for tumor growth analysis while effectively retaining the familiar slope summary for linear groups.

In subsequent sections, we describe mouse model tumor experiments employed as examples and the statistical models and summary statistics with which we analyze them. We apply our methods to these experiments in a series of three case studies. Finally, we discuss the results and implications of these analyses as well as advantages and limitations to our proposed approach.

## Materials and Methods

### Mouse oncology models

Analysis methods are demonstrated on data from three tumor growth experiments. The first tracks tumor volumes of mouse-derived allografts from a trichoblastoma cell line, whereas the second and third analyze different drug combinations and dosing schedules in a PDX of glioblastoma multiforme (GBM). In all three studies, each mouse has one subcutaneous tumor that is measured serially by calipers. On each day for which a measurement is taken, the diameter length |L $| and perpendicular width |W $| of the tumor are recorded, with volume |V$| then approximated using the modified ellipsoidal method as |V = {\frac{1}{2}}( {{L} \times W^2} )$| (30). Tumors are allowed to grow to a volume in an initial range before animals are randomized to treatment groups so as to create closely matched baseline average tumor sizes across regimens. All models were certified to be pathogen-free for *in vivo* use by testing at IDEXX Laboratories, Inc. All *in vivo* studies were conducted in compliance with Genentech's Institutional Animal Care and Use Committee in South San Francisco, CA.

### Murine trichoblastoma allografts

The procedure described in ref. 31 was followed for establishment of a subcutaneous allograft model of murine trichoblastoma with a loss-of-function mutation in the tumor-suppressor gene *Ptch1*, with human homolog *PTCH1*. Trichoblastoma is a histologic mimic of human basal cell carcinoma (BCC; ref. 32) that serves as a tractable mouse model. The large majority of sporadic BCC clinical cases stem from loss-of-function mutations in *PTCH1* (33), which functions as a natural inhibitor of the *smoothened* (*SMO*) gene. When *PTCH1* is lost, *SMO* becomes dysregulated, leading to abnormal cell growth. A small-molecule inhibitor of SMO such as vismodegib (34) can block SMO activation. Forty-six female CD1-nude mice (Charles River) with subcutaneous allografts between 195 and 310 mm^{3} at baseline were randomized into ten dosing groups, with tumors measured every 3 to 4 days for 21 days. All mice were 10 to 12 weeks of age at study start, with a median body mass of 27.6 g. Each animal was administered an oral dose of vismodegib in MCT vehicle twice daily, at 0, 0.3, 1, 3, 6, 10, 25, 50, 75, or 100 mg/kg.

### PDX of GBM

Primary human GBM SF7796FL xenografts (35) obtained from the University of California San Francisco were established subcutaneously from trocar-implanted tumor fragments in female NCr nude mice (Taconic). This model was used in two different experiments presented as separate case studies.

#### Palbociclib with and without temozolomide in a PDX of GBM

The reference group in this case study consisted of 7 mice treated orally with 200 mg/kg of palbociclib in lactate buffer (50 mmol/L, pH 4.0) once daily for the first 21 days of the study. The active treatment group consisted of 7 mice that received an identical 21-day regimen of palbociclib, but were also administered 22.5 mg/kg of temozolomide in MCT (0.5% methylcellulose, 0.2% Tween-80) orally once daily for the first 5 days of the study. All treatment thus ended for mice in both groups after 21 days. Observations on tumor volume continued for 3 additional weeks beyond this cessation of treatment. All mice were 20 weeks of age at study start, with a median body mass of 29.1 g.

#### Dose escalation of temozolomide on a background of GDC-0084 in a PDX of GBM

In this case study, 20 mice with subcutaneous xenografts between 120 and 245 mm^{3} at baseline were randomized into four dosing groups, with tumors measured every 3 to 4 days for 63 days. All mice were 8 weeks of age at study start, with a median body mass of 24.3 g. Mice in the four groups were administered 0, 10, 40, or 80 mg/kg, respectively, of temozolomide in MCT (0.5% methylcellulose, 0.2% Tween-80) orally once daily for the first 5 days of the study. Each animal was also administered an oral dose of GDC-0084 in MCT vehicle once daily at 1 mg/kg for the first 21 days of the study. GDC-0084 is a dual inhibitor of PI3K (36) and mTOR (37). It is under investigation as a therapy for glioblastoma (38), and in August 2019 was granted the proposed international nonproprietary name “paxalisib” by the World Health Organization. All treatment ended for mice in all four groups after 21 days. Observations on tumor volume continued for 6 additional weeks beyond this cessation of treatment, for a total of 9 weeks of longitudinal data.

### Statistical models

#### Statistical notation

Assume |I \ge 2$| treatment groups each with |{n_i}$| mice (|i \in 1,..,I$|). Typical values for |{n_i}$| are in the range of 4 to 12 mice per group. Each mouse has one tumor, the volume of which is measured at baseline and then at a number of time points typically over several days. Denote by |{t_{ijl}}$| the |l$|'th time point for the |j$|'th mouse from the |i$|'th group, and let |{V_{ijl}}$| be the observed volume of a tumor at |{t_{ijl}}$|. We apply a variance-stabilizing invertible transformation to the raw tumor volume data and denote by |{y_{ijl}}$| the transformed volume at |{t_{ijl}}$|. In this work, we log-transform response, setting |{y_{ijl}} = {\rm{log}}( {{V_{ijl}} + 1} )$|. Other choices (e.g., square roots) can be useful as well; the development below will work with any invertible transformation, though changing the transformation will change the biological interpretation of group-level summary statistics, and we have found logarithms to be suitable in the large majority of cases.

#### Linear and linear mixed effects models

If the log-transformed tumor volumes are linear when graphed against time, a linear model with separate effects by animal and/or treatment group can be both powerful and easy to implement. When each animal's longitudinal profile is cast as a random perturbation of a common group trend, group-level summaries are obtained by LMMs (39). Below we will fit a distinct fixed-effects line for each group with log-scaled tumor volume regressed on time. Each mouse's profile is assumed to have random respective perturbations of its intercept and slope so that the transformed tumor burden |{y_{ijl}}$| is modeled as

where |{\alpha _i}$| and |{m_i}$| are the intercept and slope, respectively, for the |i$|'th group, whereas |\{ {( {{a_{i,j}},{b_{i,j}}} ),j = 1,..,{n_i}} \}$| are the random effects (intercept and slope, respectively) for the tumor profile of the |j$|'th animal in the |i$|'th group. The random effect terms are assumed to stem from underlying Gaussian distributions, though prediction accuracy of mixed models has been found to be robust to moderate violations of this assumption (40).

#### Generalized additive mixed models to estimate nonlinear growth patterns

Group-level tumor volume profiles are often markedly nonlinear over time, even after logarithmic transformation. To model these nonlinearities, we extend the LMM approach with a GAMM that posits a single flexible growth trend induced in common for each subject in a treatment group, but includes random effects (random intercepts and slopes, in this application) for the profile of each mouse. The log-scaled tumor volume |{y_{ijl}}$| is modeled as

where again |\{ {( {{a_{i,j}},{b_{i,j}}} ),j = 1,..,{n_i}} \}$| are the random effects for the |j$|'th animal's tumor profile, but with the fixed component shared across the |i$|'th group now entirely described by a smooth function of time estimated by a spline with parameter vector denoted by |{\beta _i}$|. Thin-plate regression splines (41) were used with default parameter settings from the R packages *mgcv* (ref. 24, including, e.g., ten basis vectors, by default) and *gamm4* (42). The resulting GAM spline is denoted below by |{g_i}( t )$|; the spline evaluated at parameter estimates |{\hat{\beta }_i}$| is denoted by |{\hat{g}_i}( t )$|.

### Summaries of longitudinal profiles

#### Slopes from a linear mixed model

The LMM growth curve shared by all animals in the |i$|'th group is |{\rm{E}}[ {{y_{ijl}}} ] = {\alpha _i} + {m_i}{t_{ijl}}$|. As efficacy is typically assessed in terms of change from a group's baseline, and because animals are most often randomized so that the baseline average tumor burdens are approximately equal across groups, we take the vector |( {{{\hat{m}}_1}, \ldots {{\hat{m}}_I}} )$| of estimated slopes as summary statistics for regimen efficacy. The linear model can in principle be fit to data that are evidently nonlinear and the slope extracted as a summary, albeit imperfect, of group-level efficacy. A simulated example of such data is shown in Fig. 1A with the linear mixed effects group-level fitted regression line overlaid.

#### Summarizing a GAM fit with the eGaIT statistic

GAMMs are preferable to LMMs for accurately estimating tumor volume growth curves because their greater flexibility can capture nonlinear trends that cannot be accommodated by the simpler linear model. The simplicity of the LMM is advantageous, however, in distilling its by-group fitted lines into regimen summary scores because temporal changes from baseline are captured in the slope estimates and their standard errors. In contrast, a by-group fitted GAM spline from a GAMM lacks an obvious univariate score akin to slope. Across studies, spline bases are determined by data-driven criteria (43) so that the cardinality and meaning of |{\hat{\beta }_i}$| can vary. To account for the menagerie of GAM spline models needed to follow patterns across a wide range of tumor volume studies, we propose a regimen summary score that marginalizes across time, intuitively quantifying aspects of a curve *shape*.

For any time |t$| in |[ {a,b} ]$|, the growth curve centered to its starting value, |{g_i}( t ) - {g_i}( a )$|, is the log fold change gain (or loss) in tumor volume for group |i$| up through time |t$|. Adding up the incremental log fold change values across |[ {a,b} ]$| yields the definite integral of |{g_i}( t ) - {g_i}( a )$|. We normalize this definite integral by |{\frac{1}{2}}{(b - a)^2}$|. The resulting statistic is the *endpoint Gain Integrated in Time* (“eGaIT”), denoted by |{\gamma _i}$|:

where the estimate |{\hat{\gamma }_i}$| is obtained by plugging |{\hat{\beta }_i}$| in for |{\beta _i}$| in |{g_i}( t )$|. Because eGaIT is computed by standard methods in numerical integration, it is ultimately a linear combination of the |{\hat{\beta }_i}$| parameter estimates, with standard errors obtained accordingly from the |{\hat{\beta }_i}$| values' estimated covariance matrix inferred in the GAMM fitting. It is shown in the Appendix that with the chosen normalizing constant of |{\frac{1}{2}}{(b - a)^2}$|, eGaIT converges seamlessly to the LMM slope as the spline |{g_i}( t )$| becomes linear.

The eGaIT statistic is illustrated on the simulated data set in Fig. 1B.

### Comparing slope and eGaIT

Although the slope |{\hat{m}_i}$| assumes underlying log-linear growth, the eGaIT statistic |{\hat{\gamma }_i}$| assumes only a smooth response over the study period. Though the eGaIT estimates |{\hat{\gamma }_i}$| are scaled so as to align well with their group-matched LMM slope estimates |{\hat{m}_i}$| in the log-linear case, they are AUC statistics (44): an alternative interpretation of the eGaIT statistic is as the constant log-linear growth rate that *would* have been needed to yield the AUC that we actually observed (see Appendix). This interpretation gives some intuition as to when summarizing longitudinal curves by eGaIT is likely to enhance power to discriminate among treatment effects. Curves that over a range of interest are either increasing and concave or decreasing and convex will be more efficiently summarized by the eGaIT statistic, because eGaIT tends to be higher than the slope for increasing but concave growth curves and lower than the slope for decreasing, convex growth curves. Both phenomena are commonly observed for longitudinal tumor data because tumors in the vehicle group frequently show subexponential growth as they become larger (45), whereas tumor size for effective therapies is bounded below so that trajectories for rapidly shrinking tumors tend to flatten out as the study progresses. More generally, eGaIT's formulation typically boosts power for investigating regimens with sustained effects observed from early in the study as opposed to regimens with effects that are transient or manifest late in the study. In this former scenario, eGaIT will be superior to the LMM slope in its power to detect differences between regimens. Examples are presented in the case studies.

Estimates of the slope |{\hat{m}_i}$| and eGaIT |{\hat{\gamma }_i}$| with approximate 95% confidence intervals are shown in Fig. 1C.

## Results

We present case studies from three experiments to illustrate how the statistical framework and metrics proposed can be applied in practice to answer questions from translational oncology studies.

### A dose-escalation study of vismodegib

A ten-group dose-escalation study evaluated vismodegib in 46 CD-1 nude mice with trichoblastoma allografts as described in Materials and Methods. We fit both LMM and GAMM models, calculating slope and eGaIT statistics in order to see where the summary measures match well and where they differ.

Figure 2A shows the raw data along with fitted lines and splines from the LMM and GAMM, respectively. We observe linearity at doses that are 10 mg/kg or lower, with slight concavity in the splines at the lowest three doses, and convexity at the four highest doses (25–100 mg/kg). Group-level statistical estimates for the LMM slope and eGaIT with 95% confidence intervals are shown in Fig. 2B. Though the two metrics exhibit broad similarity, the eGaIT estimates for the four highest dose groups are all notably lower than the slopes for the same groups, which is a consequence of convexity in the estimated growth curves.

In the six lower-dose groups with essentially log-linear growth, the spline-based statistics and confidence intervals shown in Fig. 2B are practically identical to the slope estimates and corresponding confidence intervals from the linear mixed model. This demonstrates empirically that when the data are well summarized by a line, the eGaIT spline summary proposed in this work will effectively reduce to the slope. However, when the data start to show curvature (as in the four highest dose groups), the eGaIT statistic can adapt to and capture the convex response patterns missed by linear estimators, boosting power to detect differences.

### Temozolomide added to palbociclib in a PDX of GBM

Data from a PDX model of GBM compared treatment over 21 days of palbociclib with and without an initial 5 concurrent days of temozolomide. After day 21 of the study, neither group received any further treatment, but the tumor volumes were recorded for 3 additional weeks, yielding a total of 6 weeks of measurements. In both groups, the tumor growth curves during the 3 weeks of treatment were relatively flat. Over the 3 weeks of posttreatment observation, however, tumors in the group that had received only palbociclib exhibited roughly log-linear growth (Fig. 3A, left), whereas the group that in addition received temozolomide on days 0 to 5 showed a marked decline in the tumor volumes (Fig. 3A, right).

Because palbociclib is a CDK4/6 inhibitor (46) that has been reported to block efficacy of several concurrently dosed chemotherapeutics (47), it was of interest to compare tumor volume growth between the groups over the posttreatment period (days 21 to 42) in order to characterize the effect of temozolomide dosed on days 0 to 5. One way to do so is to fit a LMM to the data from days 21 to 42 only and report the slopes of the fitted lines on this range. These fitted lines are sketched in red in Fig. 3A with slope estimates shown in Fig. 3B (in red) and Table 1. Another way starts with fitting a GAMM to the entire study interval. Such fitted regression splines in Fig. 3A (in black) show mostly static on-treatment tumor volumes for each study group followed by gradually diverging posttreatment growth patterns. Tumor volume growth over the posttreatment period is then estimated from these splines by calculating eGaIT only over days 21 to 42; estimates are shown in Fig. 3B (in black) and Table 1.

Metric . | Palb (SE) . | Palb + temo (SE) . | Difference (SE) . | t-stat . |
---|---|---|---|---|

LMM slope | 4.97 (0.78) | −3.68 (0.78) | −8.65 (1.10) | −7.85 |

eGaIT | 3.75 (0.45) | −1.96 (0.44) | −5.71 (0.63) | −9.03 |

Metric . | Palb (SE) . | Palb + temo (SE) . | Difference (SE) . | t-stat . |
---|---|---|---|---|

LMM slope | 4.97 (0.78) | −3.68 (0.78) | −8.65 (1.10) | −7.85 |

eGaIT | 3.75 (0.45) | −1.96 (0.44) | −5.71 (0.63) | −9.03 |

Note: In each case, the SE of the estimate is shown in parentheses. Effect and SE values are all scaled by 100 for legibility, whereas the t-statistic is the ratio of the difference between treatment effects to its SE.

Table 1 and Fig. 3B show that the individual regimen estimates are concordant between the LMM slope and eGaIT, but that the eGaIT estimates are shrunken toward zero and have tighter precision (i.e., narrower confidence intervals). This shrinkage makes eGaIT estimates more realistic summaries of the observed data as it reflects the gradual transitions observable after day 21 from static growth (i.e., a log-fold change of zero) to increasing or decreasing log tumor volume, respectively, for the two regimens.

The effect of initial treatment with temozolomide on tumor growth in the posttreatment period can be summarized either as the difference of the growth rates (slopes) between the two groups in the LMM approach |{\hat{m}_{( {{\rm{palb}} + {\rm{temo}}} )}} - {\hat{m}_{( {{\rm{palb}}} )}}$| or as the difference of the eGaIT estimates in the GAMM approach: |{\hat{\gamma }_{( {{\rm{palb}} + {\rm{temo}}} )}} - {\hat{\gamma }_{( {{\rm{palb}}} )}}$|. Although eGaIT yields a smaller difference than the LMM slope, its resulting |t$|-statistic is greater owing to its tighter precision (Table 1).

The improved precision stems from the fact that although eGaIT is estimated over only the posttreatment period, the underlying spline is estimated using data across the full study. This is in contrast to the LMM approach, which fits its model and determines precision of slope estimates from data in the posttreatment period only.

### Temozolomide added to GDC-0084 in a PDX of GBM

A four-group study in a PDX model of GBM compared treatment over 21 days of a common dose of GDC-0084 at 1 mg/kg with an initial 5 concurrent days of four doses of temozolomide, as described in the methods. After day 21 of the study, none of the four groups received any further treatment, but the tumor volumes were recorded for 6 additional weeks, yielding a total of 9 weeks of longitudinal measurements.

Figure 4A shows that although the group receiving only GDC-0084 exhibited log-linear growth, the three groups that also received doses of temozolomide (at 10, 40, and 80 mg/kg, respectively) responded with dose-dependent tumor regressions followed by eventual regrowth. It was of interest to quantify each dosing group's degree of regression and regrowth over the 9 weeks of treatment and observation.

The LMM slope is ill-suited to summarizing such marked nonlinearity because the times at the start and end of the measurement range have disproportionate leverage (48) in a linear regression, with the effect that the severe dips and subsequent regrowth measured after treatment with temozolomide are largely irrelevant to the LMM slope. Of the slope estimates with confidence intervals plotted in red in Fig. 4B, only that from the 80 mg/kg group has a negative point estimate. None of the LMM slopes is statistically significantly negative, even though all three nonvehicle groups exhibit tumor shrinkage for most of the 9-week measurement interval.

In contrast, the mid-study dip and regrowth are captured by the GAMM spline and then incorporated into eGaIT, so that these summary scores (plotted in black in Fig. 4B) reflect the evident reductions in volume and dose-dependent delays in regrowth. The eGaIT point estimates over 9 weeks are negative for all three groups dosed with temozolomide, with estimates for the middle and upper doses of temozolomide showing statistically significant reductions.

## Discussion

Preclinical tumor studies play a central role in moving novel therapeutics toward the clinic because they provide experimental ground on which biologists and clinicians can test complex hypotheses in controlled settings. Standardized but adaptable analysis of such experiments has been lacking, motivating us to develop a statistical workflow with the two principal goals of estimating the longitudinal tumor volume growth function for each treatment regimen in a flexible, spline-based manner and then summarizing those growth functions across an experimental range of interest so as to yield statistically grounded summary scores amenable to analysis with standard methods and software. Such a biologically motivated summary score that quantifies the qualitative effects evident in nonlinear growth curves while transitioning smoothly to the slope for log-linear growth widens the applicability of longitudinal mixed models by allowing for studies with groups following both linear and nonlinear growth patterns to be analyzed within a common rubric. This workflow benefits researchers by providing them with growth curve estimates that can be superimposed graphically on the raw data to aid in interpreting the study results, and also by giving them standardized and widely applicable statistical summaries with assessments of precision to use in reporting effects and differences across regimens within a study and also across studies (as in, e.g., meta-analyses), and to inform decision making. Although we have presented our method in the context of tumor volume, it can be adapted for analysis of other longitudinal endpoints. Examples from mouse tumor experiments are tolerability proxies such as body weight (49) and imaging-based metrics distinct from tumor volume such as microvascular changes following irradiation (50).

Although slope estimates from LMMs make excellent summaries in cases with clear log-linear growth such as the lower doses of the vismodegib case study, the log-linear approximation is often suboptimal to inadequate when growth of transformed tumor volume is clearly nonlinear as in the two case studies presented with temozolomide combined with different kinase inhibitors. Although some such cases can in principle be handled within the LMM framework by slicing the data into time windows and fitting separate linear models within each (as in, *e.g*., our case study with palbociclib ± temozolomide), that approach not only is more cumbersome but also suffers from several deficiencies. Firstly, mice that drop out in one period are excluded from influencing growth estimates in any subsequent ranges analyzed. Secondly, linear slope estimates from abbreviated time ranges will also suffer from reduced precision, requiring either larger study sizes to begin with or the loss of statistical power. Finally, inferred growth trends estimated separately from adjacent ranges will not in general join smoothly (or even continuously) at the chosen cut points, whereas the by-eye selection of cut points to identify linear response ranges can itself introduce overly optimistic or biased slope estimates.

In contrast, the application of GAMM-based regression splines as described here allows the growth curves to be estimated by a set of continuous functions based on all the data available. Hypotheses about how regimens compare over ranges can then be summarized and reported with model-based measures of uncertainty. Our proposed eGaIT statistic provides biologically motivated summaries of fitted splines that coincide with the intuitively appealing and widely used slope in the log-linear case but generalize usefully in nonlinear scenarios such as the case study of concurrent dosing with temozolomide and GDC-0084. Although the resulting modeling scheme is not optimal for all designs, it has proven to be at least adequate for the vast majority of studies and provides a major simplification by allowing us to automate robust and flexible modeling and summarization of growth functions that can be applied by oncology researchers for standard analyses while consulting a statistician only as needed.

There are several limitations to our approach with varying levels of impact. One is the assumption that each animal's tumor growth curve within a group follows a common pattern (with random effects and noise). This assumption can fail in studies with extensive within-regimen heterogeneity of response, observed in particular with immune-based therapies. In this setting, a subset of the tumors may have partial or even complete responses while others have growth indistinguishable from a control group. In such a case, estimating the *average* response (linear or nonlinear) is ineffective in summarizing volume changes because few if any of the tumor growth profiles will be well described by that average. One alternative is to enumerate responders within each group and employ methods from categorical data analysis (51). Our software includes functions to facilitate such analyses, with examples in the package documentation. Another limitation of the methods presented here is the need for a study to have enough data to fit a spline. Though we have found the underlying R packages to be impressively resilient, studies with few time points (e.g., three distinct days) can result in nonidentifiability or numerical instability that precludes fitting splines, though trends in such a study are still nonlinear. In such cases, piecewise linear mixed models (24) have often sufficed as an alternative for curve fitting; these are also implemented in our software but are outside the scope of the current discussion. Finally, common spline smoothing terms and variance components for random effects are estimated and applied across the groups. This is most often a reasonable assumption, and the choice was a deliberate trade-off, as keeping the fitted splines within a common statistical model greatly eases incorporating measures of uncertainty in the numerical integrations underlying our summary statistics.

We note that fitting a spline with a GAMM and then summarizing that spline by eGaIT are distinct and modular steps, so that alternative spline summary statistics can be defined and encoded to capture different qualities of a growth trajectory in cases when eGaIT may not suffice. One such alternative spline summary is the *endpoint Difference Over Time* (eDOT), which is the difference in spline values at the study interval endpoints divided by the interval length. This more directly generalizes the concept of linear slope to a nonlinear response, though it relies only on the spline's value at the start and end of the study, ignoring its interim path. Our R package available at https://github.com/wfforrest/maeve enables streamlined fitting of GAMMs to multigroup studies and by-group summarization of the fitted splines to either of eGaIT or eDOT, with further details on both available in package documentation. Included with the R package documentation are example workflows on real data and a characterization study that includes simulations in which confidence intervals for eGaIT and eDOT exhibit nearly nominal coverage for group sizes similar to those of the case studies shown here.

## Appendix: The Normalizing Constant for eGaIT

The choice of |{\frac{1}{2}}{(b - a)^2}$| as a constant of normalization for |\gamma $| is motivated by examining a Taylor expansion of |g$| about |t\ = \ a$|. Dropping the subscript |i$| and letting |{g^{( k )}}( x )$| denote the |k$|'th derivative of |g$|, we have

so that the baseline-normalized AUC over |t \in [ {a,b} ]$| is

The first definite integral on the right evaluates to |{\frac{1}{2}}{(b - a)^2}$|. By choosing |{\frac{1}{2}}{(b - a)^2}$| as the normalizing constant, the statistic |\gamma $| converges smoothly to the slope as the spline straightens sufficiently toward a line, i.e., as |{g^{( k )}}( a ) \downarrow 0,k \ge 2$|:

A more geometric intuition for |{\frac{1}{2}}{(b - a)^2}$| follows from the observation that insofar as a group's growth function |g( t )$| is roughly log-linear with growth rate |m$|, then its estimated trajectory from study time |t = a$| through |t = b$| traces out a right triangle with base of length |b - a$| and height equal to |m( {b - a} )$|. Because the area of this triangle (a.k.a., the definite integral) is |{\frac{1}{2}}m{(b - a)^2},$| our choice of normalization ensures that |\gamma $| mimics the slope when growth is fairly log-linear, in which case either estimate can be interpreted as an average log fold change per unit of time.

## Disclosure of Potential Conflicts of Interest

M. Jakubczak reports funding by Genentech (previous employment) during the conduct of the study and is currently an employee of Ardigen Inc. (outside the submitted work). No potential conflicts of interest were disclosed by the other authors.

## Authors' Contributions

**W.F. Forrest:** Conceptualization, Data Curation, Software, Formal Analysis, Validation, Investigation, Visualization, Methodology, Writing-Original Draft, Writing-Review and Editing. **B. Alicke:** Conceptualization, Resources, Data Curation, Validation, Methodology, Writing-Original Draft, Writing-Review and Editing. **O. Mayba:** Conceptualization, Visualization, Methodology, Writing-Original Draft, Writing-Review and Editing. **M. Osinska:** Resources, Supervision, Funding Acquisition, Validation, Project Administration. **M. Jakubczak:** Conceptualization, Data Curation, Software, Validation, Methodology. **P. Piatkowski:** Conceptualization, Data Curation, Software, Validation, Methodology. **L. Choniawko:** Conceptualization, Data Curation, Software, Validation, Methodology. **A. Starr:** Resources, Supervision, Funding Acquisition, Validation, Project Administration. **S.E. Gould:** Conceptualization, Resources, Data Curation, Supervision, Funding Acquisition, Validation, Project Administration.

## Acknowledgments

The authors would like to thank Aaron Lun, Dorothee Nickles, Michael Lawrence, Yasin Senbabaoglou, Jason Hackney, and the anonymous external reviewers for their thoughtful and constructive suggestions in the course of developing this work. We also thank Kazia Therapeutics for permission to include data from the case study with GDC-0084.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.