## Abstract

Breast cancer progresses in a multistep process from primary tumor growth and stroma invasion to metastasis. Nutrient-limiting environments promote chemotaxis with aggressive morphologies characteristic of invasion. It is unknown how coexisting cells differ in their response to nutrient limitations and how this impacts invasion of the metapopulation as a whole. In this study, we integrate mathematical modeling with microenvironmental perturbation data to investigate invasion in nutrient-limiting environments inhabited by one or two cancer cell subpopulations. Subpopulations were defined by their energy efficiency and chemotactic ability. Invasion distance traveled by a homogeneous population was estimated. For heterogeneous populations, results suggest that an imbalance between nutrient efficacy and chemotactic superiority accelerates invasion. Such imbalance will spatially segregate the two populations and only one type will dominate at the invasion front. Only if these two phenotypes are balanced, the two subpopulations compete for the same space, which decelerates invasion. We investigate ploidy as a candidate biomarker of this phenotypic heterogeneity and discuss its potential to inform the dose of mTOR inhibitors (mTOR-I) that can inhibit chemotaxis just enough to facilitate such competition.

This study identifies the double-edged sword of high ploidy as a prerequisite to personalize combination therapies with cytotoxic drugs and inhibitors of signal transduction pathways such as mTOR-Is.

## Introduction

Invasion and infiltration are hallmarks of advanced cancers, including breast cancer, and accumulating evidence suggests that invasive subclones arise early during tumor evolution (1).

Infiltrating and invasive phenotypes are often observed among high-ploidy cells. Converging evidence from different cancer types, including colorectal, breast, lung, and brain cancers, suggests a strong enrichment of high-ploidy cells among metastatic lesions as compared with the primary tumor (2, 3). Even in normal development, trophoblast giant cells, the first cell type to terminally differentiate during embryogenesis, are responsible for invading the placenta and these cells often have hundreds of copies of the genome (4). Coexistence of cancer cells at opposite extremes of the ploidy spectrum occurs frequently in cancer and is often caused by whole-genome doubling (WGD). Similar to infiltration, the timing of WGD is early in tumor progression across several cancer types (5, 6), including breast cancer. Tetraploid cells resulting from WGD often lose regions of the genome, giving rise to poly-aneuploid cancer cells (PACC). Multiple studies have described a minority population of PACCs with an unusual resilience to stress (7–9). A very recent investigation of evolutionary selection pressures for WGD suggests that it mitigates the accumulation of deleterious somatic alterations (10). However, it is not clear what cost cells with a duplicated genome pay for this robustness.

To address this question, we developed a mathematical model of high- and low-ploidy clones under various energetic contingencies. We calibrated the model to recapitulate doubling times and spatial growth patterns measured for the HCC1954 ductal breast carcinoma cell line via MEMA profiling (11). This includes exposure of HCC1954 cells to hepatocyte growth factor (HGF) in combination with 48 extracellular matrices (ECM), followed by multi-color imaging (12). Sequencing (13) and karyotyping studies (14, 15) of cancer cell lines have shown that technically, sequences obtained from a cancer cell line encode a metagenome (16), because they represent the aggregate genomes of all clones that coexist within the cell line. To capture this heterogeneity, we modeled how high- and low-ploidy clones coevolve and how that affects the invasiveness of the metapopulation. Our results show that long-term coexistence of low- and high-ploidy clones occurs when sensitivity of the latter to energy scarcity is well-correlated to their chemotactic ability to populate new terrain. Higher energy uniformity throughout population expansion steers selection in favor of the low-ploidy clone, by minimizing the fitness gain the high-ploidy clone gets from its chemotactic superiority. Better understanding of how these two phenotypes coevolve is necessary to develop therapeutic strategies that suppress slowly proliferating, invasive cells before cytotoxic therapy favors them.

## Materials and Methods

We first introduce the conceptual framework that led to the model, formulate the model equations, and then we derive analytic and numerical solutions. Finally, we describe drug sensitivity and RNA sequencing (RNA-seq) data analysis of cell lines with different ploidies.

### Overall model design

The use of partial differential equations (PDE) over a stochastic-based approach, such as agent-based modeling, permits us to make predictions based on analytic results derived from the subsequent PDEs and an increase in computational efficiency.

We modeled growth dynamics in polyploid populations of various subpopulation compositions. Appealing to a continuity description and assuming a continuum approximation of the cellular and energy concentration is valid, we derived a system of coupled PDEs. Each compartment in the PDE describes the spatio-temporal dynamics of the quantity of interest (e.g., energy or cellular dynamics). At the core of our model lies the assumption that chemotactic response to an energy gradient is a function of the cell's energetic needs. This trade-off implies that heterogeneous populations will segregate spatially, with higher energy-demanding cells leading the front of tumor growth and invasion. In contrast, for an energy-rich environment, we expect the cells to grow in a similar way as they will have no need to search for places of higher energy density.

We modeled competition for energy in a heterogeneous population, consisting of goer and grower subpopulations, to predict their behavior during plentiful and energy sparse conditions. We assumed both goer and grower have the same random cell motility coefficient Γ, the same chemotactic coefficient χ, and maximal growth rate λ. Sensitivity to the available energy was modeled via a Michaelis–Menten type equation with coefficients |$\Phi _i}$| that determine the amount of energy that population, *i*, needs for a half maximal growth rate. Chemotactic motion is asymmetrically sensitive to the amount of energy available. This is accounted for by |$Ξ _i}$|. The goers (*U*) are more motile and require more energy compared with the growers (*V*). This manifests itself mathematically via the parameter relations |$\Phi _U} \gt{\Phi _V}$| and |$Ξ _U} \lt {Ξ _V}$|.

Quantitative estimates of how a cell's growth rate and motility depend on energy availability have been described previously (17–19). Energetic resources come in various forms, and the identities of the limiting resources that ultimately drive a cell's fate decision vary in space and time. We used MEMA profiling to investigate what likely is only a narrow range of that variability, 48 HGF-exposed ECMs (12). HGF stimulates both growth and migration of epithelial and endothelial cells in a dose-dependent manner, whereby maximal growth-stimulating effects have been reported at concentrations twice as high as concentrations that maximize migration (20). In-line with these reports, our model demonstrates a shift from proliferation to migration as resources get depleted.

Mathematical models of a dichotomy between proliferation and migration are numerous (21–23), but whether the two phenotypes are indeed mutually exclusive remains controversial (24). Our efforts to use mathematical modeling to inform what cost high-ploidy cells (goers) pay for their robustness build upon these prior works. We extend it by accounting for differences in the rate at which cells consume energy (δ) and differences between the rate at which energy diffuses in media (coefficient Γ* _{E}*). For mid-range energy diffusion coefficients, our model describes directed cell motility in response to a gradient of a soluble attractant, that is, chemotaxis. In contrast, small values of Γ

*approximate cell motility toward insoluble attractants, that is, haptotaxis. As such, the chosen value for Γ*

_{E}*determines where along the continuum between haptotaxis- and chemotaxis-directed cell movement resides. A special case applies when Γ*

_{E}_{E}is very large relative to cell movement, in which case neither chemotaxis nor haptotaxis occurs. All these energetic contingencies determine whether phenotypic differences between goers and growers manifest as such, and explain why nonproliferative arrested cells can have the same motility as cycling cells (24).

### Quick guide to equations

Our model assumes that the energy diffusion coefficient, Γ* _{E}*, depends on the type of media or surface upon which the cells grow. We also suppose that energy is consumed in proportion to the number of cells present. For cell motility, we assume it is driven both by random cell motion and chemotaxis. This leads to our general coupled system:

The system (Eqs. A–C) is defined on a dish of radius *R*, subject to no-flux boundary conditions. We assume the goers (*U*) and growers (*V*) are initially concentrated at the center with radius |$\rho _0} \le R$| and initial concentrations |$U_0}\ {\rm{and}}\ {V_0}$|. Furthermore, we assume that the energy density is uniformly distributed on the plate with initial value *E*_{0}. All parameters, except otherwise stated, are independent of any of the state variables (|$\tilde{E},U,{\rm{and}}\ V$|). The energy is consumed at rate δ. Both cells can divide at the maximal rate λ, but are restricted by the energy density |$\tilde{E}$|. The cells can locally grow to a local maximal density given by *K*. This parameter is often cell line dependent and is related to contact inhibition and a cell's ability to grow on top of other cells.

We now convert our system to dimensionless form that was used for all subsequent simulations and analysis via appropriate rescaling (Supplementary Materials and Methods),

Hereby, rescaling simplified the system from 13 (10 parameters and three initial conditions) to nine (seven parameters and two initial conditions), with dimensionless variables: |$\gamma _E} = \cr {\Gamma _E}/\chi ,\;a = \delta K/\lambda ,\;b = \Gamma /\chi$|, |$\phi = \Phi /{E_0}$|, and |$\xi = Ξ /{E_0}$|.

### Analytic estimates of infiltration

The desire for cells to move is inherently tied to the availability of nutrients and space. To this end, we define |$Ψ ( \tau ): = [ {\rho ( \tau ) - {\rho _0}} ]/{\rho _0}$|, where |$\rho ( 0 ) = {\rho _0}$|, the initial radius of cell seeding density. Ψ can be thought of as a nondimensional measure of infiltration attained after time τ. This dimensionless measure has the added benefit of being scale independent. An inherent difficulty with random cell motility and calculating infiltration is that the system always reaches the boundary of the dish in finite time. Instead, we defined the maximum degree of infiltration to be given by the time needed for the total energy to be below a threshold, |$\varepsilon \ll 1$|.

In general, the maximum degree of infiltration is difficult to predict analytically, so we only considered the single subpopulation case when obtaining our analytic estimates. We also made use of the simplification that most energy type molecules (e.g., glucose) have a diffusion coefficient that is very large, relative to cell movement. This allowed us to write a reduced model which has energy homogeneous in space (see Supplementary Eqs. S3a–S3b in Supplementary Materials and Methods). We estimated infiltration analytically through two different approaches. First, we extracted an ordinary differential equation (ODE) system that couples nutrient consumption to the wave front location. Second, we derived estimates of the infiltration achieved by using the cell concentration after it has become uniform.

#### No chemotaxis infiltration estimate

We can derive an estimate for infiltration in the absence of chemotaxis by appealing to Supplementary Eqs. S3a–S3b. These logistic growth reaction diffusion models often exhibit complex dynamics. One such example is that of a traveling wave solution, where one state, typically the stable state, travels (infiltrates) through the domain. The canonical example of this phenomenon is the Fisher–KPP. In contrast to the Fisher–KPP and other traveling wave problems typically studied, our model has a decaying growth rate and so the magnitude of the nonlinearity that caused the traveling wave tends to zero. Therefore, in the classic sense, our system does not admit a traveling wave. We here extend the theory by assuming a separation of time scales between consumption of energy (e.g., decay of energy-dependent growth rate) and the speed of the traveling wave.

To begin, we made the assumptions that the wave speed is a slow function of *r* and τ. The solution obtained will verify that these assumptions are valid for our system. Our ansatz takes the form |$u( {r,\tau } ) = U( {r - \eta \tau } ) = U( z )$|. Note that in spatial equilibrium, *u* = 1 is stable and *u* = 0 is an unstable steady state. If the unstable state is what governs the wave speed, then the wave is said to be “pulled,” otherwise it is “pushed” (25, 26). Following Witelski and colleagues (27), the resulting analysis yielded a coupled system of ODEs that govern the speed of the front (Supplementary Materials and Methods):

where we have the dimension *n* of the system (e.g., |$n = 2,\ 3$|) entering in both the consumption of energy and influencing the second term of the wave front location ρ. We saw that our assumptions on the behavior of the traveling wave were verified, because ρ is assumed much larger than 0 and |$0 \ll \rho \ll R$|. This shows that, *E* is a slowly varying function of time and |$\eta = d\rho /d\tau$| is a slowly varying function of time and its current distance from the center. In other words, a traveling wave will only form when the initial seeding radius is relatively large (|$\rho _0} \sim 1$| was sufficient in most simulations).

#### Estimating the degree of infiltration from equilibration

The previous section yields a system that can be integrated to track the evolution of the cell front over time. However, we may be more interested in how far it will ultimately travel before energy exhaustion and not the speed at which it gets there. An interesting alternative to tracking the wave over time is to only assume it travels as a wave, but only record the density after the system has reached uniformity. This is possible if the death rate (which has been neglected) is much smaller than the time it would take the cells to spread uniformly. If this is the case, we can bound the degree of infiltration from only knowing the uniform value |$\bar{u}$| at the end of the experiment (see Supplementary Materials and Methods for details):

where Λ is the transition width, that is, the length scale on which cell concentration goes from *u* = 1 to *u* = 0, and |$\bar{u}$| is the concentration at the end of the experiment (at equilibration). Solving Eq. I gives us the estimated infiltration as:

Because we assumed that the transition width is unknown, we can bound ψ [or ρ(*T*)] by considering the lower and upper bounds |$\Lambda = 0,\ 1$|, respectively.

### Numerical estimates of infiltration

When directed cell movement is not negligible, analytic approximations are more difficult to obtain, and numerical simulations are preferred to estimate the degree of infiltration. For simulations where we measured infiltration, we took the 1-norm |$\parallel\!\! E{\parallel _1} = \mathop \sum_i | {{E_i}} | \,\lt\,\varepsilon = 10^{- 4}$| as the threshold defining minimum energy requirements. We calibrated the model to recapitulate doubling times and spatial growth patterns measured for the HCC1954 ductal breast carcinoma cell line via MEMA profiling. The dataset included exposure of HCC1954 cells to HGF in combination with 48 ECMs in a DMSO environment (i.e., no drug was added to the media). Between 13 and 30 replicates of each ECM were printed on a rectangular MEMA array as circular spots (Supplementary Fig. S1A), adding up to a total of 692 spots (12).

#### MEMA data analysis

An average of 31 cells (90% confidence interval, 23–41) were seeded on each 350-μm spot and grown for 3 days. The median number of cells imaged at day 3 was 121, which falls within the range expected from the estimated seeding density and a doubling time reported for this cell line (72–128 cells). Confluence at seeding was calculated from the ratio between the cumulative area of cells and the area of the spot (see Supplementary Fig. S1B and S1C).

Quantification of segmented, multi-color imaging data obtained for each spot 3 days after seeding was downloaded from Synapse (synID: syn9612057; plateID: LI8C00243; and well: B03). Cells tended to grow in a symmetric, toroidal shape (Supplementary Fig. S1D), albeit considerable variability was observed across the ECMs. We binned cells detected within a given spot according to their distance to the center of the spot and calculated the confluence of each bin (Supplementary Materials and Methods). This was then compared with the confluence obtained from the simulations as described below (Supplementary Data S1–S3).

#### Simulation environment

Simulations were run to recapitulate logistics and initial growth conditions on the MEMA array (Supplementary Fig. S1A), the spatial domain was circular with radius |$R = 1\!,\!750$| μm and the temporal domain was 3 days. Cells were seeded uniformly at the center of this domain along a radius |$\rho _0} = 175$| μm at 36% confluence. To recapitulate the configuration of the MEMA profiling experiment, cells leaving the ρ_{0} domain can no longer adhere to the ECM and die. This was implemented by introducing the carrying capacity, *K*(*x*), rapidly approaching zero when *x* > ρ_{0}. This setup can lead to energy attracting cells to the periphery of a MEMA spot and beyond. We ran 161,000 simulations (Supplementary Data S1) at variable energy consumption rates, chemotactic/haptotactic coefficients, energetic sensitivities, and diffusion rates of the growth-limiting resource (i.e., ECM bound HGF; Table 1; Supplementary Table S1; Supplementary Data S2). Ninety-two percent of these simulations were completed successfully. For each simulation/ECM pair, we compared spatial distributions of *in silico* and *in vitro* confluence (Supplementary Data S3) using the Wasserstein metric (28).

Experimentally measured parameters . | Symbol . | Homogeneous cells on any ECM . | Heterogeneous cells . | Unit . |
---|---|---|---|---|

Maximal growth rate | λ | 0.56 | 0.56 | Per day |

Goer's seeding density | u _{0} | 36.0 | 3.8 | % confluence |

Grower's seeding density | v _{0} | 1.3 | % confluence | |

Simulation time | t | 3 | 127 | Days |

Radius of domain | |$\hat{\bf R}$| | 1,750 | 1,750 | μm |

Seeding radius | ρ _{0} | 175 | 175 | μm |

Growth beyond ρ_{0} | No | Yes |

Experimentally measured parameters . | Symbol . | Homogeneous cells on any ECM . | Heterogeneous cells . | Unit . |
---|---|---|---|---|

Maximal growth rate | λ | 0.56 | 0.56 | Per day |

Goer's seeding density | u _{0} | 36.0 | 3.8 | % confluence |

Grower's seeding density | v _{0} | 1.3 | % confluence | |

Simulation time | t | 3 | 127 | Days |

Radius of domain | |$\hat{\bf R}$| | 1,750 | 1,750 | μm |

Seeding radius | ρ _{0} | 175 | 175 | μm |

Growth beyond ρ_{0} | No | Yes |

Inferred parameters . | Symbol . | Homogeneous cells on GAP43 . | Heterogeneous cells . | Unit . |
---|---|---|---|---|

Energy diffusion coefficient on ECM | Γ _{E} | 9,500 | 8,982 | μm^{2}/day |

Energy consumption rate | δ | 57 | 23 | Per cells/day |

Energy needed for half-maximal growth | Φ _{u} | (0.01, 3) | 0.310 | |

Φ _{v} | (0.05, 0.16) | |||

Chemotactic coefficient | χ | 5,500 | 1,919 | μm^{2}/day |

Energy deficit inducing chemotaxis | ξ _{u} | (2, 300) | (1, 8) | |

ξ _{v} | 140 |

Inferred parameters . | Symbol . | Homogeneous cells on GAP43 . | Heterogeneous cells . | Unit . |
---|---|---|---|---|

Energy diffusion coefficient on ECM | Γ _{E} | 9,500 | 8,982 | μm^{2}/day |

Energy consumption rate | δ | 57 | 23 | Per cells/day |

Energy needed for half-maximal growth | Φ _{u} | (0.01, 3) | 0.310 | |

Φ _{v} | (0.05, 0.16) | |||

Chemotactic coefficient | χ | 5,500 | 1,919 | μm^{2}/day |

Energy deficit inducing chemotaxis | ξ _{u} | (2, 300) | (1, 8) | |

ξ _{v} | 140 |

Note: Parentheses indicate ranges with equally good fits. Corresponding nondimensional parameters are shown in Supplementary Table S1. Net chemotaxis of cells is given by the interaction of the chemotactic coefficient (χ) and energy deficit inducing chemotaxis (ξ* _{u}*), that is, effective cell chemotaxis is approximately χ/ξ

*. The larger ξ*

_{u}*, the later the cells sense an energy deficit, that is, the larger the energy gradient must be in order for them to accelerate their movement in response to it.*

_{u}The model was implemented in C++ (standard C++11). The armadillo package (ARMA version: 9.860.1; ref. 29) was used for simulation of the PDEs. Simulations were run on a Intel Core i7 MacBook Pro, 2.6 GHz, 32 GB RAM. The source code is available at the GitHub repository for the Integrated Mathematical Oncology department: GoOrGrow (https://github.com/MathOnco/GoOrGrow_PloidyEnergy). An R Markdown script was included, enabling replication of comparisons between experimental data and model simulations (Supplementary Data S1–S3).

### Ploidy as biomarker of phenotypic divergence

#### Drug sensitivity analysis

We used growth rate inhibition (GR) metrics as proxies of differences in drug sensitivities between cell lines. Unlike traditional drug sensitivity metrics, like the IC_{50}, GR curves account for unequal division rates, arising from biological variation or variable culture conditions, a major confounding factor of drug response (32). Previously calculated GR curves and metrics were available for 41 of 44 breast cancer cell lines. A total of 46 drugs had been profiled on at least 80% of these cell lines and their area over the curve (*GR _{AOC}*) drug–response metric (33) was downloaded from GRbrowser (http://www.grcalculator.org/grtutorial/Home.html). For each drug, we calculated the z-score of

*GR*across cell lines to compare drugs administered at different dose ranges. Of these 46 drugs, 39 could be broadly classified into two categories as either cytotoxic (25 drugs) or inhibitors of signaling pathways (14 drugs; Supplementary Table S2). We then evaluated a cell line's ploidy as a predictor of its

_{AOC}*GR*value using a linear regression model. Because molecular subtype of breast cancer cell lines is known to influence drug sensitivity, we performed a multivariate analysis, including the molecular subtype as well as an interaction term between ploidy and drug category into the model.

_{AOC}#### RNA-seq analysis

The molecular subtype classification of all cell lines was available from prior studies (34–37). Of these 44 cases, four were suspension cell lines and were excluded from further analysis. Of the remaining 40 cell lines, 20 originated from primary breast cancer tumors and were the focus of our analysis. Gene expression data were downloaded from CCLE. We used gene set variation analysis to model variation in pathway activity across cell lines (38). Pathways for which less than 10 gene members were expressed in a given cell lines were not quantified. The gene membership of 1,417 pathways was downloaded from the REACTOME database (ref. 39; v63) for this purpose.

## Results

### High-ploidy breast cancer cell lines have increased metabolic activity and cell motility

To better understand the phenotypic profile of high-ploidy cells, we compared the ploidy of 41 breast cancer cell lines with their response to 46 drugs. For a drug–response metric, we used the integrated effect of the drug across a range of concentrations estimated from the *GR _{AOC}* (32, 33). We observed that cytotoxic drugs and drugs inhibiting signal transduction pathways were at opposite ends of the spectrum (Fig. 1A). Namely, ploidy was negatively correlated with the

*GR*for several cytotoxic drugs and positively correlated with the

_{AOC}*GR*of various mTOR inhibitors, suggesting high-ploidy breast cancer cell lines tend to be resistant to DNA-damaging agents, while sensitive to drugs targeting nutrient sensing and motility.

_{AOC}We built a multivariate regression model of drug sensitivities to test the hypothesis that the relationship between ploidy and *GR _{AOC}* was different for cytotoxic drugs than for inhibitors of cell signaling pathways. Molecular subtype alone (Fig. 1B), could explain 0.4% of the variability in

*GR*z-scores across cell lines (adjusted

_{AOC}*R*

^{2}= 0.0044;

*P*= 0.026). Including ploidy into the model did not improve its predictive accuracy (adjusted

*R*

^{2}= 0.0037;

*P*= 0.058). However, an interaction term between ploidy and drug category (cytotoxic: 27 drugs vs. signaling: 16 drugs) increased accuracy to explain 2.6% of variability in drug sensitivity across cell lines (adjusted

*R*

^{2}= 0.026;

*P*< 1e-5; Fig. 1C). The same improvement from an interaction term between ploidy and drug category was observed in an independent dataset of half-maximal inhibitory concentration (IC

_{50}) values of 34 cytotoxic drugs and 51 signaling inhibitors obtained from the Genomics of Drug Sensitivity in Cancer database (Supplementary Fig. S2; ref. 40).

We then focused on a subset of the aforementioned 41 cell lines, namely those that had been established from primary breast cancer tumors as adherent cells (20 cell lines; Fig. 1D), and we quantified their pathway activity (see Materials and Methods). A total of 27 pathways were correlated to ploidy at a significant *P* value (|$| {{\rm{Pearson}}\,r} | \ge 0.44$|; *P* ≤ 0.05; Supplementary Table S3). The strongest correlations were observed for metabolic pathways such as hyaluronan metabolism and metabolism of vitamins (Fig. 1E and F). Hyaluronic acid is a main component of the ECM and its synthesis has been shown to associate with cell migration (41, 42).

These results support a model that connects high ploidy with both the chemotactic ability and metabolic energy deficit of a cell.

### Infiltration of homogeneous populations

Model design was guided by the goal to describe growth dynamics along two axes: from random to directed cell motility and from homogeneous to heterogeneous cell compositions (Materials and Methods, Eqs. D–F). While the last section will step into the second axis, the following two subsections distinguish scenarios along the first axis: (i) “homogeneous nutrient environments” are environments in which random cell motility dominates throughout population growth and (ii) “heterogeneous nutrient environments” imply the formation of gradients, which cause cells to move in a directed fashion, as is the case during cellular growth on an ECM.

#### Homogeneous nutrient environments

When the diffusion of the nutrient occurs at a much faster time scale than the actions taken by cells, we can assume that at the time scale of cells, the nutrient is essentially uniform in space. This simplification allows us to neglect chemotactic/haptotactic motion and consider only random cell motility as the driving force that spreads the cell density throughout the dish.

We obtained analytic estimates for the degree of infiltration in a homogeneous environment that lay the groundwork for new predictions. To arrive at Eq. G–Eq. H, we employed the approximations that diffusion of energy molecules (e.g., glucose) is fast relative to cell movement and that the cells' movement through the dish can be approximated by a traveling wave (43). These assumptions were verified by comparing the front estimates with results from the full numerical model (Eqs. D–F; Fig. 2A and B).

An interesting alternative to tracking the wave over time is to assume it travels as a wave, but only record the density after the system has reached uniformity. If the death rate is much smaller than the time needed for the cells to spread uniformly, we can bound the degree of infiltration that occurred by only knowing the uniform density of cells (equation; ref. 4; Fig. 2C).

These analytic solutions point to scaling relationships for the speed of the moving front. For highly efficient energy-using cell lines (|$\phi \ll 1$|; Fig. 2D), the front will evolve at a speed nearly independent of energy. In contrast, for large |$\phi \gg 1$|, the speed of the front falls off as |$1/\sqrt \phi$|. These predictions of the behavior of infiltration on parameters can be investigated experimentally.

#### Heterogeneous nutrient environments

Assumptions made in the prior section apply to standard cell cultures of adhesive cells in a typical cell culture dish, where energetic resources diffuse so fast that gradients do not form. These assumptions break down during cellular growth on an ECM. Binding to the ECM can cause soluble factors (like HGF) to act and signal as solid phase ligands (44, 45). Proteolytic degradation of these ECMs then creates haptotactic gradients. Figure 1 includes HCC1954, a near-tetraploid breast cancer cell line, whose growth on various ECMs has been measured via MEMA profiling. We analyzed HCC1954 and looked to determine whether our mathematical model can explain its spatial growth patterns. MEMA profiling resulted in considerable variability of growth patterns across different ECMs (Fig. 3A and B).

We projected two-dimensional (2D) spatial distributions measured on the MEMA array onto one dimension (1D; Fig. 3C), rendering them comparable with those obtained from our simulations (Fig. 3D and E). For each simulation/ECM pair, we calculated the distance between *in silico* and *in vitro* spatial cell distributions using the Wasserstein metric (Fig. 3C and F) and ranked simulations by their minimum distance across ECMs. The top 2.3% simulation parameters were then stratified by the ECM whose spatial pattern they best resembled and compared with uniform prior parameter distributions (Fig. 3D and E). Seventy-five percent of the ECMs accounted for only 1.55% of the top simulations. The tendency of cells on these ECMs to grow in a toroidal shape was strong, suggesting it may be the consequence of nonuniform printing of ECMs onto the array. We concluded that, our model cannot explain the growth patterns on those ECMs well and focused our attention on the remaining 12 ECMs, represented by 3,429 simulations. We refer to the parameters of these simulations as inferred parameter space. Principal component analysis, uniform manifold approximation and projection (UMAP; ref. 46), and density clustering (47) of the inferred parameter space revealed three clusters (Fig. 3G), with different ECMs segregating mainly into different clusters (Fig. 3H).

The two largest clusters differed mostly in their chemotactic/haptotactic and energy diffusion coefficients; while the small cluster stood out by a high sensitivity to low energy and fast chemotactic/haptotactic response (Fig. 3I). Overall, all five model parameters showed significant differences between the three clusters, suggesting they all contribute to distinction between ECM growth patterns (Fig. 3I). This was further affirmed when looking at the % variance explained per principal component per parameter (Fig. 3J). To formalize parameter sensitivity analysis independent of ECMs, we also calculated the Sobol index (48) of each parameter. The Sobol index quantifies how much of the variability in spatial cell concentration is explained by each parameter, while accounting for all its interaction effects. Each parameter contributed to significant variance (Sobol index > 0.02; ref. 48) in at least one of three spatial statistics (Fig. 3K): skewness, confluence, and gradient near the edge of the ECM.

We observed substantial differences in chemotactic/haptotactic coefficients and energy consumption rates between ECMs (Supplementary Fig. S3). To query the biological significance of this variability, we quantified the expression of the 12 ECMs in the HCC1954 cell line (Materials and Methods). Two of the five inferred ECM-specific model parameters were correlated with RNA-seq–derived expression of the corresponding ECM: energy consumption rate (Pearson *r* = −0.657; *P* = 0.028) and sensitivity to low energy (Pearson *r* = 0.562; *P* = 0.071), although latter fell short above significance (Supplementary Fig. S4).

In summary, the posterior distributions of model parameters represent a substantial departure from the uniform priors and could explain a significant proportion of growth conditions on the HGF-exposed MEMA array. This approach identified regions of interest in the parameter search space, allowing us to focus further simulations on biologically relevant chemotactic/haptotactic coefficients and energy diffusion rates.

### Infiltration of heterogeneous, chemotactic populations

Growth of cells in a given ECM environment was measured across 13–30 replicates on the MEMA platform. While our model, when calibrated to the corresponding ECM environment, could explain the observed growth pattern in the majority of these replicates, a substantial fraction could not be explained by fixed choices of sensitivity to low energy and directed cell motility (Supplementary Fig. S3). One possibility that may explain this is HCC1954 is a heterogeneous cell line, with clones of variable phenotypes coevolving. Representation of these clones among the 31 cells that were on an average sampled for each replicate may vary (Supplementary Fig. S5). This hypothesis is supported by a bimodal distribution of DNA content observed among replicating HCC1954 cells on individual ECM spots (Fig. 4A and B; Supplementary Fig. S6A and S6B). If the HCC1954 population was homogeneous, we would expect a unimodal distribution of DAPI intensity among S-phase cells of this cell line. The observation of a bimodal distribution among S-phase cells suggests that HCC1954 is likely a polyploid cell line, that is, clones of variable ploidies coexist in this cell line.

To better understand the growth dynamics in a polyploid population, we used the two subpopulation version of our model, whereby variable chemotactic abilities and energetic sensitivities of goer and grower subpopulations compete with one another (Eqs. D–F). We used fixed values for energy diffusion and consumption rates as informed by model calibration (Fig. 3B) and varied sensitivity to low energy and chemotactic ability of both goer and grower, subject to Eqs. D–F (Table 1; Supplementary Table S1). We initially used the same spatial and temporal domains as during model calibration, but concluded that the implied duration of the experiment (3 days) was too short for dynamics between the two populations to manifest. Each MEMA spot has a low capacity, whereby confluence is reached at no more than a few hundreds of cells. Such a small number of cells will not exhibit wave-like behavior and, therefore, will not suffice for spatial structure to emerge. We, therefore, extended temporal and spatial domains of our simulations, seeding cells at a lower confluence and letting them grow onto the entire energy domain until they consumed all available energy (average of 127 days; Table 1).

We observed a nonmonotonic relation between the goer's chemotactic ability and the speed with which the metapopulation invades the dish, with intermediate values being the least beneficial to its growth and spread (Fig. 4C). Temporal analysis of the simulations (Supplementary Data S4–S6) revealed that if the goer's chemotactic motility is too high, it will leave the center of the dish too soon, leaving room for the grower to expand locally (Fig. 4D). In contrast, if the goer's motility is too low, it will miss the window of opportunity to ensure its dominance further away from the center of the dish while energy is still abundant. As a consequence, it will be outgrown by the grower at the edge of the dish once energy becomes sparse (Fig. 4E). Only when the goer has an intermediate motility, does the grower persistently coexist with it, both at the center and edge of the dish (Fig. 4F).

## Discussion

Models of infiltration are typically formulated under two critical assumptions. First, energy production and consumption are nonuniform, leading to the formation of an energy gradient (49–51); or second, energy consumption is very slow compared with production, leading to an essentially infinite energetic resource (52). Here, we formulated a generalized model of infiltration when energy is finite and investigated its behavior along a spectrum of scenarios, from permanent energy uniformity to scenarios where this uniformity is gradually lost. The model derivation does not assume a particular dimension (e.g., 2D *in vitro* experiments vs. *in vivo* or 3D spheroid experiments). Many parameters that were valid in 2D will also extend naturally to 3D. For example, we would not expect a difference in consumption rates or half maximal growth rates of the cells. However, energy diffusion (Γ* _{E}*) or random cell motion (Γ) will be higher because of the increased degree of freedom (53). When energy is uniformly distributed at all times and the time scale for cell death is substantially longer than that of cell motility and birth, our results suggest that the degree of infiltration can be approximated using the cells' density at equilibration of movement and growth (Fig. 2C).

With an energy gradient that becomes steeper over time, our analytic approximations no longer hold, as directed cell movement becomes nonnegligible. For this scenario, we leveraged MEMA profiling to inform regions of interest in the parameter search space. These regions of interest are relevant for cellular growth on a variety of HGF-exposed ECM proteins. We observed correlations between inferred model parameters and RNA-seq–derived signatures, even though the latter were not used during parameter inference. A potential explanation for the negative correlation between ECM-specific energy consumption and expression is that our model does not account for the possibility that cells can replace the ECM they degrade. The slower the rate of this replacement is, the higher the consumption rate appears to be. On the other hand, the more dependent cells are on an ECM for growth, the faster they must replace it, potentially explaining a positive correlation between ECM-specific expression and sensitivity to low energy.

When calibrating our model to a given ECM environment, growth patterns of a substantial fraction of replicates of that ECM could not be explained by fixed choices of sensitivity to low energy (Supplementary Fig. S3). A potential explanation for this is variable cell compositions across experimental replicates. An alternative explanation is that this variability stems from artifacts that arise during nonuniform printing of ECMs onto the array—the so-called ring effect. However, a bimodal distribution was also observed in the DNA content of replicating cells, which is not affected by potential printing artifacts. The second peak of this bimodal distribution was wider, consistent with the fact that high-ploidy cells with more DNA need longer to replicate.

The cell line, HCC1954, is described as a hypertetraploid cell line with an average DNA content of 4.2 (30). However, this average value may be misleading, as suggested by stark variability in nuclei sizes (Fig. 4A). Despite a wealth of genomic information generated for this cell line (30), to the best of our knowledge, no prior reports indicate whether or not the cell line is polyploid. We and others have found that high ploidy is an aneuploidy-tolerating state that accompanies intratumor heterogeneity *in vivo* and *in vitro* (5, 54, 55). Our results suggest that HCC1954 is likely polyploid.

One event that could have led to this polyploid state is WGD. In contrast to cell lines, WGD events in primary tumors are mostly clonal, not subclonal (5, 6, 10), clones carrying a doubled genome often sweep over the population, such that by the time the tumor is detected, the diploid ancestor no longer exists. A related scenario is advanced, therapy-exposed tumors shown to revert to genomic stability, potentially bringing a WGD population back to a genomic state that more closely resembles its diploid ancestral state (56). The model presented here can investigate how dynamics between the two subpopulations unfold in both of these scenarios—early, shortly after the WGD or late, after therapy exposure. This would characterize what circumstances prevent the WGD carrying clone from becoming dominant or from retaining its dominance and could help explain WGD incidence in primary and recurrent tumors.

If spatial and temporal domains were to be extended beyond the configuration of MEMA spots, our simulations predict that spatial segregation of two coexisting subpopulations according to their ploidy is a likely scenario and depends on the energy consumption rate. Our model can easily be extended to more than two subpopulations, for example, to include a subpopulation of normal cells. For each additional cell type, a new compartment can be added to the model, with growth- and motion-related parameters that are specific to the corresponding cell type. However, the model currently does not incorporate mutations, that is, the process of generating new clonal lines. A next step will be to extend our model to include mutation events, specifically chromosome mis-segregations that contribute extensively to diversify ploidy of a population (57, 58). The additional DNA content of high-ploidy cells, although energetically costly, brings a masking effect against the deleterious consequences of chromosome losses (10). This duality may explain the higher sensitivity of high-ploidy cells to glycolysis inhibitors and their lower sensitivity to cytotoxic drugs reported previously in glioblastoma (59).

In-line with prior reports, we found that increased resistance of breast cancer cell lines to cytotoxic drugs was associated with high ploidy. In contrast, high-ploidy breast cancer cell lines were sensitive to inhibitors of signal transduction pathways, including EGFR and especially mTOR signaling. A commonality among those pathways is their contribution to a cell's chemotactic response (60–62), suggesting opportunities to tune chemotaxis. mTOR inhibitors (mTOR-I), such as rapamycin, significantly decrease migration of breast cancer cells in a dose-dependent manner (63, 64). Rapamycin inhibits cell motility by a mechanism similar to that by which it inhibits cell proliferation (65), suggesting that the mTOR pathway lies at the intersection of a cell's decision between proliferation and migration. If high ploidy is indeed a characteristic specific to goer-like cells, then mTOR-Is are likely affecting this cell type (Fig. 1A, E, and F), and could be used to inhibit its chemotactic response, thereby moving the population up the *x*-axis of Fig. 4C. Delaying chemotactic response of highly chemotactic cells could slow down invasion by maximizing competition within a polyploid population. If, on the other hand, chemotactic response of high-ploidy cells is already at an intermediate level, our simulation suggests that, further reduction may accelerate invasion of low-ploidy cells. For such scenarios, therapeutic strategies that include an mTOR-I may not be successful. Experiments will be needed to verify these *in silico* results *in vitro*. Knowing how coexisting clones with differential drug sensitivities segregate spatially can offer opportunities to administer these drug combinations more effectively.

## Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

## Disclaimer

The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the H. Lee Moffitt Cancer Center and Research Institute

## Authors' Contributions

**G.J. Kimmel:** Conceptualization, software, formal analysis, validation, visualization, methodology, writing-original draft, writing-review and editing. **M. Dane:** Resources, validation, visualization, writing-review and editing. **L.M. Heiser:** Resources, data curation, validation, methodology. **P.M. Altrock:** Supervision, funding acquisition, visualization, writing-original draft, writing-review and editing. **N. Andor:** Conceptualization, data curation, supervision, funding acquisition, visualization, methodology, writing-original draft, writing-review and editing.

## Acknowledgments

We thank the researchers of the Department of Integrated Mathematical Oncology at Moffitt Cancer Center for fruitful discussions and early feedback on parts of the article. This work was supported by the NCI grant R00CA215256 awarded to N. Andor. P.M. Altrock and N. Andor acknowledge support through the NCI, part of the NIH, under grant number P30-CA076292. L.M. Heiser acknowledges support through NIH research grants 1U54CA209988 and U54-HG008100, Jayne Koskinas Ted Giovanis Foundation for Health and Policy, and Breast Cancer Research Foundation.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.