## Abstract

We propose that the quantitative cancer biology community make a concerted effort to apply lessons from weather forecasting to develop an analogous methodology for predicting and evaluating tumor growth and treatment response. Currently, the time course of tumor response is not predicted; instead, response is only assessed *post hoc* by physical examination or imaging methods. This fundamental practice within clinical oncology limits optimization of a treatment regimen for an individual patient, as well as the ability to determine in real time whether the choice was in fact appropriate. This is especially frustrating at a time when a panoply of molecularly targeted therapies is available, and precision genetic or proteomic analyses of tumors are an established reality. By learning from the methods of weather and climate modeling, we submit that the forecasting power of biophysical and biomathematical modeling can be harnessed to hasten the arrival of a field of predictive oncology. With a successful methodology toward tumor forecasting, it should be possible to integrate large tumor-specific datasets of varied types and effectively defeat cancer one patient at a time. *Cancer Res; 75(6); 918–23. ©2015 AACR*.

## Introduction

The past decade has witnessed a dramatic increase in our knowledge of cancer on multiple scales, leading to a host of potential drug targets and subsequent clinical trials. Yet the outcome for many cancers has not improved (1). A fundamental reason for this sobering reality is that we do not have a validated theoretical framework to understand how tumors within the individual patient respond to treatment; that is, there is no accepted mathematical description that enables us to generate testable, patient-specific hypotheses. More specifically, we do not have a theory that, given patient-specific data, can reliably and reproducibly predict the spatiotemporal changes of that patient's tumor in response to an intervention. Currently, providing optimal therapies for a specific tumor phenotype, particularly with combinations of therapies, is extraordinarily difficult, as the number of potentially important adjustable parameters, such as the order and dosages of therapy, is too large to span in clinical trials and patient heterogeneity in response is large. Clinical trials too frequently lead to inconclusive and confusing results such that approximately half are never even published in the peer-reviewed literature (2). As our knowledge of cancer grows, there is a desperate need to make real connections between those designing clinical trials and those studying mathematical models of tumor growth and treatment response so that the field of theoretical oncology can provide systematic, testable predictions of the response of individual patients to individual therapeutic regimens. We envision a diagnostic/prognostic toolkit containing experimentally validated mathematical tumor models coupled with a battery of patient-specific measurements to initialize and constrain a patient-specific model. Oncologists could then select the most promising approach by systematically and exhaustively exploring, *in silico*, all relevant combinations of therapies. Unfortunately, there is not yet a theoretical approach that can harness the spectrum of cancer knowledge toward usable predictions.

Cancer is a multiscale problem that extends from the micro (DNA/RNA), to the meso (protein expression, cell behaviors), to the macro (organ function) scales, and critical to understanding cancer will be to model impacts at each level. Multiscale models (and the associated experiments), an active area of cancer research, will need time to develop. However, it may be possible to establish theories of clinical tumor progression that capture the behavior of an individual patient's tumor well before the influence of underlying events on tumor-scale behaviors is elucidated. That is, one does not require an exhaustive understanding, at all scales, of a phenomenon to form a useful theory. The history of science teems with examples of extraordinarily useful theories, whose development was driven by a need to quantitatively describe complex systems or complex behaviors, when the underlying principles were not known. One instructive example is weather modeling, which is guided in large part by the fluid equations with hydrostatic balance, also known as the primitive equations (3). We posit that the methods and paradigms of weather forecasting over the last century provide the most pertinent perspective on how to proceed when attempting to build a quantitative theory of cancer. Although analogies between weather and tumor forecasting are somewhat common in both the scientific and popular press (e.g., refs. 4, 5), detailed steps for adapting the methods of meteorology to oncology are often overlooked. We establish some relevant parallels in the next section.

## Guidance from the Meteorologist

A “weather forecast” is a scientific estimate of weather conditions at some time in the future. There are essentially two steps in developing a weather forecast: (i) estimate the current physical state of the atmosphere (the “diagnostic phase”), and (ii) use techniques to predict the future state of the atmosphere (“prognostic phase”). Once the diagnostic phase is complete, there are two main ways to develop a prognosis: one is based on historical data (both near and long term), while the other uses mathematical models. We provide brief descriptions of both before discussing possible application to cancer.

The simplest approach based on historical data is “persistence forecasting,” which assumes that the tendency of the weather is to remain unchanged in the short term. “Climatological forecasting” uses average weather statistics accumulated over decades to offer a short-term forecast: If a prediction of temperature and precipitation is desired for a particular day, one averages the temperature and precipitation measured on that day for as far back as one's dataset allows, and from these average values offers a forecast. A third method from historical data is the “analog method,” based on the assumption that weather repeats itself, at least in a general way. The idea is to match current conditions with similar conditions previously observed and construct a forecast based on the observations most frequently seen to follow those conditions in the database. The fourth approach is termed “trend forecasting,” which is based on determining the velocity of certain features (e.g., storm fronts) and extrapolating the future position of those features.
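The four historical-data approaches described above can be made concrete with a small sketch. This is an illustration only, assuming a one-dimensional series of daily temperatures; the function names, the toy numbers, and the nearest-match rule used for the analog method (rather than a frequency count over many matches) are all assumptions introduced here, not taken from operational practice.

```python
import numpy as np

def persistence_forecast(series):
    """Persistence: the next value equals the most recent observation."""
    return float(series[-1])

def climatological_forecast(same_day_history):
    """Climatology: average of values recorded on the same calendar day in past years."""
    return float(np.mean(same_day_history))

def trend_forecast(series, steps_ahead=1):
    """Trend: extrapolate the most recent rate of change linearly."""
    velocity = series[-1] - series[-2]
    return float(series[-1] + velocity * steps_ahead)

def analog_forecast(series, archive, window=3):
    """Analog: find the archived window closest to the latest one, return what followed it."""
    current = np.asarray(series[-window:], dtype=float)
    archive = np.asarray(archive, dtype=float)
    best_idx, best_dist = None, np.inf
    for i in range(len(archive) - window):
        dist = np.linalg.norm(archive[i:i + window] - current)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return float(archive[best_idx + window])

# Toy data (hypothetical temperatures in degrees C)
recent = [10.0, 12.0, 14.0]
same_day_history = [11.0, 13.0, 15.0]
archive = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0, 20.0, 5.0]

print(persistence_forecast(recent))               # 14.0
print(climatological_forecast(same_day_history))  # 13.0
print(trend_forecast(recent))                     # 16.0
print(analog_forecast(recent, archive))           # 20.0
```

None of these functions contains any model of the underlying dynamics, which is why such forecasts degrade quickly beyond short horizons.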

These historical data approaches are useful as they can provide reasonably accurate forecasts for short time periods into the future (generally, less than 24 hours). However, they cannot account for changes in intensity and direction and/or predict the formation or dissipation of major events over extended time (days or weeks). We argue that current clinical methods of predicting treatment response in patients with cancer are akin to these “historical trend” methods and, consequently, suffer from the same severe limitations in predictive ability. The clear strategic and economic benefits of knowing what the weather will be tomorrow or next week encouraged the development of numerical weather prediction (NWP) methods in the last century. The cost of health care and the aging population coupled with recent scientific and technological advancements provide both the incentive and the capacity to develop numerical tumor prediction (NTP) methods now.

NWP methods stem from a set of physical principles that govern the behavior of the atmosphere and that can be expressed by a system of equations. If these equations can be solved, one can provide a description of the future state of the atmosphere (i.e., a forecast) based on its current state. The diagnostic phase of a numerical weather model relies on a vast, global network of surface (land and sea) and air measurements to obtain reliable measures of temperature, cloud cover, humidity, precipitation, and wind velocity. It is difficult to overstress the importance of these measurements. For weather modeling, measurements in one particular region are insufficient to characterize the whole atmosphere because the system is both (i) heterogeneous and (ii) connected. Indeed, the forecaster must have a complete—and maximally accurate—picture of current conditions before the weather at future times can be predicted.

NWP is an “initial value problem”; that is, to integrate the equations of atmospheric motion and project its state at future time points, one must specify the values of the independent variables at an initial time. Mathematically, the atmosphere is described in terms of a state vector, $X(r, t_0) = \{X_1(r, t_0), X_2(r, t_0), \ldots, X_n(r, t_0)\}$, of the $n$ model variables at grid points $r$ and initial time $t_0$ (i.e., the diagnostic phase). For meteorology, the $X_i$ vary depending on the form of the equations but include some form of conservation of momentum (horizontal velocity and hydrostatic balance), energy (temperature), air density, and specific humidity. Once obtained, $X(r, t_0)$, along with additional diagnostic quantities to close the system, are then substituted into the model equations and evaluated at future time points to generate the forecast (i.e., the prognostic phase).

Two key techniques commonly used in NWP that are concerned with providing the best possible forecast are appropriate for tumor modeling. The first technique is “dynamic model weighting,” by which multiple predictions (from multiple models) are weighted to provide a forecast that combines the most realistic aspects of various models given the particulars of the system under investigation. As it is rarely known, *a priori*, which model is most appropriate for describing a particular weather system, dynamic model weighting provides a method by which such uncertainty can be minimized by statistically determining how to combine the best predictions from multiple models (6). The second concerns “data assimilation,” which aims to improve upon a short-term model forecast by incorporating all available observed data from a diverse suite of *in situ* and remotely gathered sources that are not regularly available at each temporal or spatial grid point, and using advanced numerical techniques to create an optimized, evenly gridded representation of the state of the atmosphere in time and space with minimized error (7, 8). One application of this technique to determine the relevance for tumor growth showed promise with a moderate degree of model error and measurement noise (9). Related to data assimilation is the process of “reanalysis,” which takes the most advanced data assimilation process to create a long-term data product for comparison with weather and climate models. This provides a consistent data source for analysis and model validation. One challenge for reanalysis products, which is the proper inclusion of changing data quality and coverage over the Earth's past and present (10), would be minimal for tumor prediction because the methods and data sources could be applied evenly over the lifetime of the tumor prediction window.
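To illustrate the idea behind dynamic model weighting, the sketch below combines forecasts from several models using weights inversely proportional to each model's past mean squared error. This is one simple weighting scheme chosen for illustration and is not claimed to be the method of ref. 6; the function names and numbers are hypothetical.

```python
import numpy as np

def inverse_error_weights(past_forecasts, past_observations):
    """Weight each model by the inverse of its mean squared error on past data.

    past_forecasts has shape (n_models, n_times); the returned weights sum to 1.
    """
    past_forecasts = np.asarray(past_forecasts, dtype=float)
    past_observations = np.asarray(past_observations, dtype=float)
    mse = np.mean((past_forecasts - past_observations) ** 2, axis=1)
    inverse = 1.0 / (mse + 1e-12)  # small constant guards against division by zero
    return inverse / inverse.sum()

def weighted_forecast(new_forecasts, weights):
    """Combine the models' new forecasts into a single weighted prediction."""
    return float(np.dot(weights, new_forecasts))

# Two hypothetical models scored against three past observations
observations = [1.0, 2.0, 3.0]
model_a = [1.1, 2.1, 3.1]   # small errors
model_b = [2.0, 3.0, 4.0]   # large errors
weights = inverse_error_weights([model_a, model_b], observations)

# The combined forecast lies much closer to the better-performing model
combined = weighted_forecast([4.0, 5.0], weights)
print(weights, combined)
```

Recomputing the weights as new observations arrive is what makes the weighting dynamic: a model that begins to drift from the data loses influence on the combined forecast.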

## Numerical Tumor Prediction

An effective field of predictive oncology would develop a comprehensive and systematic method for predicting the future status of an individual tumor given an optimal representation of its initial conditions and an appropriate biophysical model. It is now well known that tumors are a mosaic of different cells with tremendous heterogeneity across scales and between patients even in the same disease subtype (11, 12). The currently used “climatological” and “analog” approaches to tumor forecasting, which are based on the average response of prior patients, have limited applicability for forecasting an individual tumor's development. The persistence and trend forecasting approaches also have limited applicability in oncology because tumors are temporally and spatially dynamic; thus, forecasting that a tumor will maintain its current status (in the case of persistence forecasting) or continue to evolve as it has in the past (in the case of trend forecasting) will be accurate only for a short period of time. We note that persistence forecasting is used for patients for whom “watchful waiting” is appropriate. Of course, persistence forecasting does not provide insights into if and when relapse occurs.

What would clinical oncology look like if a predictive NWP equivalent were available? On the basis of NTP algorithms, a patient would be given a personalized quantitative output charting the course of the disease. This would include levels of confidence in the prediction itself, as well as a time interval for future interventions, controls, and updates. Although the goal is ambitious (as was the conceit in the early 20th century that we could accurately predict the weather), the success of NWP provides a roadmap and optimism that NTP can be a reality in the near future.

In analogy to NWP, the first step toward developing NTP is to characterize the state vector of the tumor; that is, we need to form a complete diagnosis of the state of the tumor. To achieve this, we must determine what (independent) tumor variables are the most critical for quantitatively summarizing tumor status. This is no small challenge, but a candidate list has been developed by Hanahan and Weinberg (13). We then form a spatiotemporal equation for each tumor variable and link these to each other so that model variables can influence each other; in the parlance of mathematics, we would build a coupled system of partial differential equations describing the evolution of the tumor mass (rather than also including energy and momentum as is done in NWP). Figure 1 illustrates this approach.

We note that there is no shortage of mathematical models of tumor growth in the literature (e.g., refs. 14–17), but they are not typically of the kind that can readily be populated by data that are measured clinically at multiple time points with any reasonable spatial resolution. Models must be recast to deal with data that can be measured on an individual basis. Furthermore, dynamic model weighting should be used to reduce bias that may exist in particular models by combining the predictions of multiple, independent models. Data assimilation can be used to provide updates to the tumor forecasts by including new data points as they become available. It should also be noted that much biologic and clinical progress can be made without a comprehensive initial state vector; indeed, the first NWP model used only seven independent variables (pressure, temperature, density, humidity, and three velocity components; ref. 18). Regardless of what state variables are chosen for a particular model, the data simply must be acquired from the system (i.e., the patient) for which the prediction is to be made. Initializing predictive models using data culled from population statistics or the literature will not guide the way any more than the historical trend forecasts do for weather prediction.
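As a minimal sketch of how data assimilation could update a tumor forecast when a new measurement arrives, the example below uses a scalar Kalman-style update on tumor volume. The exponential growth model, the function names, and all numbers are assumptions made for illustration; this is not the scheme used in ref. 9, and operational assimilation systems are far more elaborate.

```python
import math

def forecast_step(volume, variance, growth_rate, dt, model_variance):
    """Propagate the volume estimate and its uncertainty forward by dt.

    Assumes exponential growth between measurements (an illustrative choice).
    """
    factor = math.exp(growth_rate * dt)
    return volume * factor, variance * factor ** 2 + model_variance

def assimilate(forecast, forecast_var, observation, obs_var):
    """Blend a model forecast with a new measurement, weighting by uncertainty."""
    gain = forecast_var / (forecast_var + obs_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var

# Hypothetical numbers: the model forecasts 10.0 cm^3 and imaging measures 12.0 cm^3,
# with equal uncertainty (variance 4.0) assigned to each
analysis, analysis_var = assimilate(10.0, 4.0, 12.0, 4.0)
print(analysis, analysis_var)  # 11.0 2.0
```

With equal uncertainties the updated estimate falls midway between forecast and measurement, and its variance is halved. The updated state then serves as the initial condition for the next call to `forecast_step`.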

## Clinical Oncology Settings Likely to Benefit

Selecting an optimal treatment regimen for a tumor phenotype and predicting how an individual patient will respond to a treatment regimen are the two areas most likely to benefit quickly from the approach described in Fig. 1. Currently, in the best cases, treatment regimens are selected based on molecular phenotype and/or genotype (e.g., breast cancer) whereas, in the worst cases, untargeted chemotherapy is used (e.g., pancreatic cancer). If we were able to initialize a reasonable model of tumor growth dynamics with a characterization of an individual tumor's initial state vector, then it would be possible to systematically run a series of *in silico* simulations to determine how this particular tumor will respond to an array of treatment regimens. That is, we could run a myriad of patient-specific virtual clinical trials to determine the optimal regimen and timing for that particular patient. This is an especially attractive feature in the combination therapy setting where one drug is designed to target tumor-associated vasculature, whereas another is designed to target the tumor cells themselves (Fig. 2); indeed, such trials are common and frequently have unclear results (e.g., refs. 19–21). Another promising avenue for this modeling approach is in situations where one drug has the potential to sensitize the tumor to a second therapy. Such is the case in, for example, triple-negative breast cancers that are sensitive to PI3K inhibitors, which, in turn, may increase their susceptibility to DNA-damaging agents (22). An important feature of this theoretical approach is that it generates predictions that are experimentally testable in preclinical animal models of cancer. An early and successful example of this has already been achieved (23) using very limited patient-specific data, and this speaks to the power of the paradigm. Once a therapeutic approach is selected, we are then faced with the difficulty of using early treatment changes to predict long-term response.
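The idea of running patient-specific virtual trials can be sketched as a sweep over candidate regimens. The example below is a toy under strong assumptions: logistic tumor growth, a drug that kills cells at a rate proportional to dose on dosing days only, and a cap on cumulative dose as a crude stand-in for toxicity. The model, the parameter values, and every function name are hypothetical and are not drawn from the cited studies.

```python
import itertools

DAYS = 60  # length of each virtual trial in days (illustrative)

def simulate(n0, growth_rate, capacity, kill_per_dose, dose, interval_days,
             days=DAYS, dt=0.1):
    """Forward-Euler integration of logistic growth with a drug-kill term.

    The drug acts only on dosing days (every interval_days), removing cells
    at a rate proportional to the dose. Returns the final cell number.
    """
    n = n0
    steps_per_day = int(round(1.0 / dt))
    for day in range(days):
        kill = kill_per_dose * dose if day % interval_days == 0 else 0.0
        for _ in range(steps_per_day):
            n += dt * (growth_rate * n * (1.0 - n / capacity) - kill * n)
            n = max(n, 0.0)
    return n

def total_dose(dose, interval_days, days=DAYS):
    """Cumulative dose delivered over the trial."""
    return dose * len(range(0, days, interval_days))

def best_regimen(patient, doses, intervals, max_total_dose):
    """Try every dose/interval pair under the dose cap; return the lowest final burden."""
    best = None
    for dose, interval in itertools.product(doses, intervals):
        if total_dose(dose, interval) > max_total_dose:
            continue  # regimen exceeds the assumed toxicity limit
        final = simulate(dose=dose, interval_days=interval, **patient)
        if best is None or final < best[0]:
            best = (final, dose, interval)
    return best

# Hypothetical patient-specific parameters (the "initial state" of this toy model)
patient = dict(n0=1e8, growth_rate=0.1, capacity=1e10, kill_per_dose=0.5)
final_burden, dose, interval = best_regimen(
    patient, doses=[0.5, 1.0, 2.0], intervals=[1, 2, 7], max_total_dose=30.0)
print(final_burden, dose, interval)
```

An exhaustive sweep is feasible here only because the regimen space is tiny. A realistic search over drug order, dose, and timing would need a validated model and a more efficient optimization strategy.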

Currently, responses are not predicted; they are assessed *post hoc* by physical examination, or structural ultrasound, MRI, or computed tomography. Many patients are forced to undergo invasive biopsies during their therapy, and others are found to have received ineffective therapy only after months of treatment. The ability to identify, early in the course of therapy, patients who are not responding to a given regimen is highly significant. In addition to limiting patients' exposure to the toxicities associated with unsuccessful therapies, it would allow patients the opportunity to switch to a potentially more efficacious treatment. As there are typically many therapeutic options available, and many more being developed, switching treatment early in the course of therapy is a very real option, but only if a reliable method to determine early response were available. Unfortunately, existing methods of determining early response are inadequate. A particularly exciting area to apply this approach is in the neoadjuvant setting, where achieving a pathologic complete response at the time of surgery is predictive (for many cancers) of improved survival (24–27). Thus, if it were possible to predict before therapy, test early in therapy, and then change therapy (if needed), long-term survival could be significantly improved.

## Modeling Successes in Clinical Oncology

As was recently and elegantly described by Larry Norton (28), mathematical modeling actually guided the development of medical oncology in the 1960s when Skipper and colleagues developed the concept of the “log-kill hypothesis” (29). This hypothesis states that when a tumor grows by a constant fraction per unit time (i.e., the growth is exponential), then treatment with a cytotoxic therapy results in tumor shrinkage by a constant fraction over time. This paradigm led to many ideas (eventually accepted as dogma) on how to properly test and use anticancer drugs (28). Of course, it was subsequently established that tumors do not exhibit unrestrained exponential growth *in vivo*, and tumor growth is better characterized by logistic or Gompertzian growth (30). This development, in turn, led to important repercussions for dosing schedules for patients with cancer. Thus, historically, mathematical modeling has played a key role in drug development and clinical trial design. More recently, there has been promising evidence of the utility of patient-specific modeling using clinically available data.
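For reference, the growth laws named above can be written compactly in their standard textbook forms. The notation is introduced here for illustration and is not taken from the cited works: $N(t)$ is the number of tumor cells, $N_0$ its initial value, $k$ and $r$ are rate constants, $K$ is the carrying capacity, and $f$ is the fraction of cells killed per treatment cycle.

$$
\begin{aligned}
\text{Exponential growth:}\quad & \frac{dN}{dt} = kN \;\Rightarrow\; N(t) = N_0 e^{kt} \\
\text{Log-kill after } m \text{ cycles (regrowth ignored):}\quad & N_m = (1 - f)^m N_0 \\
\text{Logistic growth:}\quad & \frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right) \\
\text{Gompertzian growth:}\quad & \frac{dN}{dt} = rN \ln\!\left(\frac{K}{N}\right)
\end{aligned}
$$

As a worked example of log-kill, a therapy with $f = 0.99$ removes two logs of cells per cycle, so three cycles would reduce $N_0 = 10^9$ cells to $(0.01)^3 \times 10^9 = 10^3$ cells if no regrowth occurred between cycles. In the logistic and Gompertzian forms the per-cell growth rate falls as $N$ approaches $K$, which is the feature the exponential model lacks.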

Chmielecki and colleagues used mathematical modeling to develop a theoretically optimum dosing schedule for non–small cell lung cancers with mutations in the EGFR gene (23). Briefly, the group incorporated knowledge of the different growth kinetics of drug-sensitive and -resistant *EGFR*-mutant cells into an evolutionary model constrained by clinically available data. Their model led to the prediction that high-dose pulses of erlotinib combined with a continuous low dose of the drug can significantly delay the onset of resistance. This hypothesis is now being tested in a prospective clinical trial (31).

Bozic and colleagues also used evolutionary dynamics to predict the effects of combination therapy for patients receiving the targeted therapy vemurafenib (32). One exciting result generated by their model is that combination therapy with two drugs given simultaneously is more effective than if the therapies are given sequentially, thereby providing important guidance for the design of subsequent clinical trials.

There have also been efforts to use clinically available data to predict response after the treatment regimen has been selected. Neal and colleagues developed and applied a patient-specific model of glioblastoma multiforme (GBM) growth to predict, at the first postradiation treatment time point, progression-free and overall survival (33). Their model is based on a realization of the reaction-diffusion model that describes GBM growth in terms of cellular proliferation and random diffusion. Those two parameters are then determined on a patient-specific basis and used to project tumor development forward in time. An extension of the “personalized” reaction-diffusion equation was put forward by Weis and colleagues, who coupled the random diffusion term to tissue mechanical properties (34). Applying this model to patients with breast cancer receiving neoadjuvant chemotherapy achieved an area under the ROC curve of 0.81 for predicting, after the first cycle of treatment, which patients would go on to achieve a pathologic complete response (35).
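A reaction-diffusion model of this general kind can be sketched in one spatial dimension. The example below assumes the common form $\partial N/\partial t = D\,\partial^2 N/\partial x^2 + \rho N(1 - N/K)$, solved with an explicit finite-difference scheme and zero-flux boundaries. The logistic proliferation term, the parameter values, and the function name are assumptions for illustration; the cited models (33, 34) are three-dimensional and calibrated to patient imaging.

```python
import numpy as np

def simulate_reaction_diffusion(n0, diffusion, proliferation, capacity, dx, dt, steps):
    """Explicit finite-difference solution of a 1D reaction-diffusion equation.

    n0 is the initial cell density on a uniform grid with spacing dx.
    """
    if diffusion * dt / dx ** 2 > 0.5:
        raise ValueError("time step too large for the explicit scheme to be stable")
    n = np.asarray(n0, dtype=float).copy()
    for _ in range(steps):
        padded = np.pad(n, 1, mode="edge")  # repeat edge values: zero-flux boundaries
        laplacian = (padded[2:] - 2.0 * n + padded[:-2]) / dx ** 2
        growth = proliferation * n * (1.0 - n / capacity)
        n = n + dt * (diffusion * laplacian + growth)
    return n

# Hypothetical parameters: 101 grid points 1 mm apart, D in mm^2/day, rho in 1/day
initial = np.zeros(101)
initial[50] = 0.5  # small lesion seeded at the centre of the domain
final = simulate_reaction_diffusion(
    initial, diffusion=0.1, proliferation=0.05, capacity=1.0,
    dx=1.0, dt=0.5, steps=200)  # 200 steps of 0.5 day = 100 days
print(final.max(), final.sum())
```

In a patient-specific setting, `diffusion` and `proliferation` would be the two quantities estimated from serial imaging, after which the same forward integration projects the tumor to later time points.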

It is important to stress that each of these (illustrative) studies was based on clinically available data and their results indicate that clinically relevant “forecasts” can be made to both optimize the therapeutic regimen, and then predict which patients will respond to that regimen.

## Potential Limitations

In general, the factors that dramatically affect the accuracy of initial value problems are: (i) inadequate mathematical representation of the processes being modeled, (ii) errors in the initial conditions (i.e., errors in the diagnostic phase), and (iii) inadequate model resolution (i.e., physical spacing between the measurements). Unfortunately, all three of these limitations are well represented in oncology. In addition to lacking clinically relevant mathematical descriptions of tumor growth and treatment response, we have an incomplete description of the state vector described in Fig. 1. Our assessment of initial conditions is frequently based on sampling a small portion of tissue *via*, for example, a biopsy, and this is almost certainly not adequate to characterize the entire tumor (12). One would desire the state vector used to initialize the NTP model to be populated at every grid point associated with the scale at which the tumor forecast is to be made. Furthermore, just as in NWP, if too small an area is chosen to make a prediction, the forecast will be quite poor as information from outside the forecast area spreads into the forecast area. Similarly, in cancer, it is critical to not only characterize the tumor itself, but also the surrounding healthy-appearing tissue.

We have previously stressed the importance of using quantitative noninvasive imaging methods to help address this issue (36). Of course, the spatial resolution available from (even advanced) imaging methods is limited to several hundred microns, and this coarseness will obscure more subtle features whose importance will almost certainly grow with time. However, it could be that such subtleties are not required to answer many clinically relevant questions; that is, perhaps a mesoscale model making predictions on bulk tumor properties is appropriate for predicting (say) progression-free survival in a particular patient. Indeed, such “mesoscale” predictions that lack local area specifics and tend to be less precise (e.g., long-range forecasts are not specific and make claims such as warmer/cooler or wetter/drier than the historical mean) are still of critical use in NWP for the transportation, agricultural, and energy industries. Thus, it may be that similar predictive power in oncology is “good enough” to indicate whether or not an individual patient will beat the average progression-free or overall survival.

## Conclusion

In this article, we have proposed that the cancer community can learn much from the phenomenological models that enable NWP. Following such an approach potentially enables optimal pairing of treatment regimen with tumor phenotype, and prediction of how an individual patient will respond to a particular regimen. All of this is predicated on initializing models with patient-specific data obtained early after diagnosis to predict future patient status. Just as the problem of calculating future events based on present events has seen enormous advances in the physical sciences over the last century, it must be tackled in earnest for oncology. Indeed, this is the whole goal of science: the calculation of future events based on present events. Oncology should be no different.

## Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

This article has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this article, or allow others to do so, for U.S. Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

## Authors' Contributions

**Conception and design:** T.E. Yankeelov, V. Quaranta

**Development of methodology:** T.E. Yankeelov, K.J. Evans

**Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.):** T.E. Yankeelov

**Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis):** E.C. Rericha

**Writing, review, and/or revision of the manuscript:** T.E. Yankeelov, K.J. Evans, E.C. Rericha

**Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases):** T.E. Yankeelov

**Study supervision:** T.E. Yankeelov

## Acknowledgments

The authors thank the National Cancer Institute for support through grants 1U01CA142565, 1U01CA174706, R01 CA138599, and P30 CA68485.

## Grant Support

This work was supported in part by the U.S. Department of Energy, Office of Science under the Scientific Discovery through Advanced Computing (SciDAC) project on Multiscale Methods for Accurate, Efficient, and Scale-Aware Models of the Earth System.