## Abstract

High rates of local recurrence in tobacco-related head and neck squamous cell carcinoma (HNSCC) are commonly attributed to unresected fields of precancerous tissue. Because they are not easily detectable at the time of surgery without additional biopsies, there is a need for noninvasive methods to predict the extent and dynamics of these fields. Here, we developed a spatial stochastic model of tobacco-related HNSCC at the tissue level and calibrated the model using a Bayesian framework and population-level incidence data from the Surveillance, Epidemiology, and End Results (SEER) registry. Probabilistic model analyses were performed to predict the field geometry at time of diagnosis, and model predictions of age-specific recurrence risks were tested against outcome data from SEER. The calibrated models predicted a strong dependence of the local field size on age at diagnosis, with a doubling of the expected field diameter between ages at diagnosis of 50 and 90 years, respectively. Similarly, the probability of harboring multiple, clonally unrelated fields at the time of diagnosis was found to increase substantially with patient age. On the basis of these findings, we hypothesized a higher recurrence risk in older than in younger patients when treated by surgery alone; we successfully tested this hypothesis using age-stratified outcome data. Further clinical studies are needed to validate the model predictions in a patient-specific setting. This work highlights the importance of spatial structure in models of epithelial carcinogenesis and suggests that patient age at diagnosis may be a critical predictor of the size and multiplicity of precancerous lesions. *Cancer Res; 76(24); 7078–88. ©2016 AACR*.

Patient age at diagnosis was found to be a critical predictor of the size and multiplicity of precancerous lesions. This finding challenges the current one-size-fits-all approach to surgical excision margins.

## Introduction

Head and neck squamous cell carcinoma (HNSCC) arises in the epithelial lining of the oral cavity, pharynx, and larynx. The annual incidence rate of HNSCC is estimated to be around 600,000 new cases worldwide (1), and in the United States alone, the death toll is approximately 11,500 cases per year (2). While a subgroup of HNSCC, including the oropharynx, is caused by infection with high-risk types of the human papillomavirus (HPV; ref. 3), the majority of HNSCC are HPV-negative and primarily associated with tobacco use and alcohol consumption (1). Despite a growing number of therapeutic strategies, survival in HPV-negative HNSCC has not improved significantly over the past decades, with a low median survival of approximately 20 months (4).

Poor prognosis in tobacco-related head and neck cancers is commonly attributed to the development of local recurrences and metastases after removal of the primary tumor (1). Field cancerization, or the presence of premalignant fields surrounding the primary tumor, has been shown to drive the high rate of local recurrence (5–11). In fact, molecular studies have shown that the majority of HPV-negative HNSCC develop within local fields of premalignant cells that are clonally related to the resected primary cancer (7, 9). These fields can be much larger than the actual carcinoma and are generally difficult to detect without genomic analyses due to their visually normal appearance (12). If such an invisible premalignant field extends beyond the surgical margins, the portion of the field left behind after resection of the primary tumor increases the risk of subsequent recurrence and contributes to poor prognosis (11). The occurrence of precancerous fields was first reported by Slaughter and colleagues (5) and has since been documented in most epithelial cancers (10, 13–15).

To date, it remains difficult to account for the phenomenon of field cancerization in clinical practice. The main reason for this translational barrier is a poor understanding of the dynamics and geometry of these invisible fields. Indeed, it is impossible to account for the risk factor of an unresected field of precancerous tissue in absence of any information on the extent of the suspected field. To overcome this barrier, we synthesized data and knowledge sources from the tissue, clinical and population scales to develop a quantitative model of HNSCC carcinogenesis that accounts for spatial features of the precancer field. On the basis of this model, we then sought to identify aspects of standard clinical practice that could be improved by means of patient-specific modeling tools.

We developed and calibrated a spatial stochastic model of tobacco-related head and neck squamous cell carcinoma at the tissue level. The major model assumptions are:

(Epi)genetic events lead to mutated cells with increased fitness advantage, and mutant clones can spread through the basal layer of the affected epithelium prior to onset of invasive cancer.

Because of the time scales of carcinogenesis, only long-lived progenitor cells in the basal layer of the epithelium are relevant; this monolayer of progenitor cells is modeled as a two-dimensional lattice, where each node is occupied by a cell.

Wild-type and mutant progenitor cells undergo evolutionary competition in the basal layer; only nearest-neighbor interactions are considered.

Tissue architecture and homeostasis are maintained (constant population size) until the invasive cancer stage.

Total population size (|$N$|); cellular transition rates from normal to precancer (|${u_1}$|) and from precancer to carcinoma *in situ* (|${u_2}$|), respectively; relative proliferative advantage of precancer (|${s_1}$|) and carcinoma *in situ* cells (|${s_2}$|); mean sojourn time from preclinical lesion to clinical diagnosis with cancer (|$1/\psi$|).

We were interested in quantifying the geometry of the field of precancerous cells surrounding the tumor at time of diagnosis (|$\sigma _3}$|). The key equations are:

- (i) The survival function (probability that cancer has not been diagnosed by time (|$t$|) of the model,
where

*γ*is the incomplete gamma function, |$\lambda \equiv N{u_1}{\bar{s}_1}$| and |$\theta \equiv \ {u_2}{\bar{s}_2}\pi c_2^2( {{s_1}})/3$||$.$| - (ii) Conditioned on diagnosis occurring at time |$t$|, the probability density function of the radius of the precancer field that surrounds the primary tumor, for all |$r \in [ {0,{c_2}t} ]$|, and zero otherwise, where |${c_2}$| is the radial expansion speed of the precancer field and |$\zeta \equiv 3\theta /c_2^3$|.

The microscopic tissue-level models were calibrated on the basis of population-level incidence data. Using a computational Bayesian framework, we determined the posterior distributions for the identifiable parameters |$\lambda$|, |$\theta ,\$|and |$\psi .$| On the basis of the posterior distributions, we derived model-based predictions of field quantities at time of diagnosis.

## Materials and Methods

### Biological mechanism of carcinogenesis

To model the carcinogenesis of tobacco-related HNSCC, we first developed a spatial stochastic model of homeostasis in stratified squamous epithelia of the head and neck. The homeostatic epithelia of this region undergo periodic bottom–up renewal (16), whereby long-lived progenitor cells in the basal layer of the epithelium give rise to transit amplifying cells (TAC) of limited proliferative potential (17, 18). As they divide, the TAC move toward the superficial layers of the stratified epithelium, where they eventually exit the cell cycle and get sloughed off at the tissue surface. Because differentiating TAC are lost from the tissue within a few weeks (16), they are unlikely to contribute to the emergence of a neoplastic clone of cells, and it suffices to focus on the population of progenitor cells in the bottom layer of the epithelium.

The transformation from normal to cancerous cells is largely attributed to the successive accumulation of (epi)genetic aberrations, see Fig. 1A. Once a normal progenitor cells has acquired a growth advantage, the resulting clone of mutant cells starts spreading across the affected epithelium by replacing adjacent cells of lower proliferative potential (19). There are a multitude of genetic alterations commonly found in HPV-negative HNSCC (1), but both the total number and temporal ordering of events necessary for cancer initiation are patient- and tumor-specific (20). In view of this genotypic heterogeneity, we focused on the phenotypic progression instead. Indeed, the majority of tobacco-related HNSCC progress through a series of precancerous stages called epithelial dysplasia, see Fig. 1B (21, 22). These stages are histopathologically classified into three categories: mild, moderate, and severe dysplasia [carcinoma *in situ* (CIS); ref. 23]. On the basis of this observation, we modeled neoplastic progression at the cellular level in four stages (Fig. 1A), developing from normal cells (type 0) into mildly dysplastic cells (type 0*), into moderately dysplastic cells (type 1), and eventually into severely dysplastic cells (type 2).

Although the number and ordering of mutations responsible for HNSCC carcinogenesis are not unique, a better understanding of the genetic underpinnings and fitness landscapes of the phenotypic transitions enhances the mechanistic foundation of the model, see Fig. 1A. It is generally accepted that loss of function of the tumor suppressor gene *TP53*, either through mutation or loss of heterozygosity, is an early event during tumorigenesis of HNSCC. Indeed, histopathologic studies have found that expanding cancer fields are preceded by small *TP53*-mutated patches (<200 cells in diameter; ref. 8). Two studies (21, 22) found no significant differences in proliferation index for low-grade dysplasia compared with normal tissue, even though mutant *TP53* was significantly elevated in low-grade dysplasia (22). This suggests that a likely first step of tumorigenesis is the formation of an evolutionarily neutral patch (mild dysplasia) of *TP53*^{+/−} cells. Once a cell within this patch acquires a further genetic aberration, either through methylation or deletion of *CDKN2A* (24) or through gain or amplification of *CCND1* (25), the resulting clone has a selective advantage over its neighboring cells and causes the emergence of an expanding precancer field (26). Once the field starts expanding, further hits to the EGFR or TGFβ pathways can lead to moderate dysplasia, CIS, and eventually invasive HNSCC.

### Microscopic model of carcinogenesis

To capture the spatial dynamics of the above mechanisms of carcinogenesis, we developed a stochastic Moran model on a regular two-dimensional lattice (27, 28). Initially, all cells on the lattice are normal progenitor cells (type 0) and proliferate at rate *f*_{0}. Cell division is stochastic, and when a progenitor cells divides, one daughter cell replaces the mother cell, and the other daughter cell replaces one of the nearest neighbor cells on the lattice, chosen uniformly at random. Each normal cell can, at rate *u*_{1,a}, acquire a mutation to become a mildly dysplastic cell (type 0*). At the tissue level, this leads to a patch of dysplasia as illustrated in the first panel of Fig. 1C. The proliferation rate of type 0* cells is still *f*_{0}, but acquisition of a second hit, at cellular rate *u*_{1,b}, can transform type 0* cells into precancerous cells (type 1). Type 1 cells in turn have a proliferative advantage *f*_{1} = *f*_{0} (1 + *s*_{1}) over type 0 and type 0* cells and can clonally expand as precancerous fields of moderate dysplasia by replacing neighboring cells of lower proliferative potential, see second panel in Fig. 1C. Mathematically, we can simplify the model by deriving the effective mutation rate *u*_{1} from type 0 to type 1 cells: the probability |${\upsilon _{01}}$| that a type 0* cell gives rise to growing field of type 1 cells before its progeny dies out is given by (27)

where |${\bar{s}_1} = {\frac{{{s_1}}}{{1 \+\ {s_1}}}$|. At rate |${u_2}$|, type 1 cells in turn can mutate into malignant (type 2) cells that initiate growth of cancer, see third panel in Fig. 1C. Cancer cells have a fitness advantage of |${s_2}$| over type 1 cells and hence divide at rate *f*_{2} = *f*_{1} (1 + *s*_{2}). Finally, the time between onset of CIS and diagnosis was modeled as an exponentially distributed random variable with rate |$\psi$|, see Fig. 1B. The reason for this simplified transition model from first cancer cell to diagnosis is two-fold. First, upon penetration of the basement membrane and infiltration of the stroma, the epithelial architecture is irreversibly disrupted and the microscopic dynamics with lateral displacement no longer apply; second, depending on the location of the lesion and the patient's behavior, time to clinical diagnosis can be highly variable.

### Mesoscopic model approximation

We previously derived the following mesoscopic approximation to the spatial model that enables analytical calculations of waiting times and field geometries (28, 29). In the mesoscopic model, the arrival of expanding type 1 clones is a stochastic Poisson process with rate |$N{u_1}{\bar{s}_1}$|. The factor |${\bar{s}_1}$| accounts for the fact that the progeny of a new type 1 cell will either fluctuate to extinction with probability |$1 - {\bar{s}_1}$| or expand indefinitely with probability |${\bar{s}_1}$|. Next, we characterized the growth of an expanding type 1 clone on the basis of a shape theorem (30). According to the theorem, expanding type 1 clones asymptotically grow as a convex symmetric shape with constant radial growth rate |${c_2}$|. Simulations of the process show that this shape can be approximated as a disk or circle, see (29). The rate |${c_2}$| in turn depends on the selective advantage |${s_1}$|. For small |${s_1}$| it scales as |${c_2}( {{s_1}}) \sim \sqrt {4\pi {s_1}/{\rm{log}}( {1/{s_1}})}$| where |$f( s ) \sim g( s )$| means that |$f( s )/g( s ) \to \$||$1$| as |$s \to 0$| (28); for larger values of |${s_1}$|, we used numerical simulations to estimate the value of |${c_2}$|, see Fig. 3 in ref. 29. In particular, for |${s_1} \, \gt \ 0.5$|, we found an approximately linear dependence |${c_2}( {{s_1}} ) \approx 0.6{s_1} + 0.22$| (29). Finally, we note that multiple precancer fields of type 1 cells can coexist as illustrated in Fig. 1C.

### Incidence data

To parameterize the tissue-level model and ensure its compatibility with population-level data, we calibrated the mesoscopic approximation of the microscopic model on the basis of age-specific incidence rates from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (18 Registries, 2000–2012) in a Bayesian framework. As we focused on HPV-negative cancers, we restricted our search to males and females diagnosed with malignant tumors in the following categories of “Site Recode ICD-O-3/WHO 2008”: lip, tongue, floor of mouth, gum and other mouth, hypopharynx, and larynx (31). We further excluded cases with primary site labeled as “C01.9-Base of tongue” because cancers at this site are commonly HPV-positive (32). We derived the number of susceptible individuals and the number of cancer cases diagnosed in each of the following age groups: 15–19 years, 20–24 years,…, 80–84 years, and 85+ years. We note that the age-specific incidence curve for HPV-negative HNSCC has a peaked profile, see Fig. 2A. However, the model developed here predicts monotonically increasing incidence rates and thus cannot recapitulate peaked incidence patterns. Because changing baseline risks are more likely to cause the peaked profile than changes in biological parameters (see Supplementary Information S1), we truncated the incidence data used for parameter inference at the age of 74 years instead of introducing additional age-dependent model parameters. The vast majority of HPV-negative HNSCC are attributed to tobacco consumption (33, 34); thus, we reduced the pool of susceptible individuals to the approximately 20% of current smokers within the considered age groups (35). Furthermore, we assumed the exposure to start at the age of 15 years (36, 37).

### Parameter inference

To compare the model predictions with SEER data, we first derived the age-specific incidence function under the evolutionary model at the tissue-level. To this end, we defined the random variable |${\sigma _3}$| as the time from mean age at smoking initiation to diagnosis with invasive cancer and calculated the *survival function*

where

|$\gamma ( {a,x} ) = \mathop \int _0^x {t^{a - 1}}{e^{ - t}}dt$| is the lower incomplete |${\rm{\Gamma }}$|-function, and |$\psi$| is the transition rate from onset of CIS to diagnosis. The parameter groups (eq. C) have the following interpretation: |$\lambda$| is the initiation rate of expanding type 1 clones, see also previous paragraph; |$\theta$| can be rewritten as |$( {\gamma t_2^{ - 3}} )/3$|, where |${t_2} = {( {c_2^2{u_2}{s_2}} )^{ - 1/3}}$| is approximately the time it takes for an expanding type 1 clone to give rise to an expanding type 2 clone (28). On the basis of |$S( t ),$| we then computed the *hazard function*|$h( t ) = - S^′( t )/S( t )$|, which provides a direct link to the SEER-derived age-specific incidence rates. The derivation of the likelihood function used for inference is outlined in Supplementary Information S1.

We then used a computational Markov Chain Monte Carlo (MCMC) Metropolis Hastings algorithm to derive posterior distributions for the parameters |$\lambda$|, |$\theta$|, and |$\psi$|. We used improper priors over |$[ {0,\infty } ]$| for all 3 parameters and ran a Markov chain of length 500,000. Computations were performed in MATLAB (Version 8.5.0, The MathWorks Inc. 2015). To determine plausible initial conditions and step lengths for the MCMC algorithm, we independently estimated the 3-parameter groups based on literature estimates of the underlying biologic parameters, see Supplementary Information S2. After discarding the first 50,000 burn-in steps, we sampled every 100th step to characterize the posterior distributions. Convergence to steady-state was visually ascertained. Credible intervals (CI; 95%) for the posterior distributions were computed by discarding the 2.5% largest and smallest values for each parameter. The marginals of the posterior distributions for |$\lambda$|, |$\theta ,$| and |$\psi$| are summarized in Fig. 2B. Median estimates and 95% CIs of the posterior distributions, together with the literature-based order of magnitude estimates are summarized in Table 1. Goodness of fit was deemed satisfactory on the basis of visual inspection, see Fig. 2A.

Parameter . | Units . | Prior . | Median (95% CI) . | Literature estimates . |
---|---|---|---|---|

λ | [y^{−1}] | |$[ {0,\infty } ]$| | 4.3 × 10^{−3} (4.0 × 10^{−3}–4.8 × 10^{−3}) | ∼10^{−3} |

θ | [y^{−3}] | |${\rm{\ }}[ {0,\infty } ]$| | 5.9 × 10^{−6} (4.4 × 10^{−6}–7.9 × 10^{−6}) | ∼10^{−4} |

|$\psi$| | [y^{−1}] | |$[ {0,\infty } ]$| | 9.8 × 10^{−2} (6.7 × 10^{−2}–13.5 × 10^{−2}) | ∼10^{−1} |

Parameter . | Units . | Prior . | Median (95% CI) . | Literature estimates . |
---|---|---|---|---|

λ | [y^{−1}] | |$[ {0,\infty } ]$| | 4.3 × 10^{−3} (4.0 × 10^{−3}–4.8 × 10^{−3}) | ∼10^{−3} |

θ | [y^{−3}] | |${\rm{\ }}[ {0,\infty } ]$| | 5.9 × 10^{−6} (4.4 × 10^{−6}–7.9 × 10^{−6}) | ∼10^{−4} |

|$\psi$| | [y^{−1}] | |$[ {0,\infty } ]$| | 9.8 × 10^{−2} (6.7 × 10^{−2}–13.5 × 10^{−2}) | ∼10^{−1} |

NOTE: For each parameter, the median and 95% CIs for the MCMC Metropolis-Hastings–derived posterior distributions and the literature-based estimates are shown.

To compare model predictions with clinical recurrence patterns, SEER (13 Registries, 1991–2011) was queried for a retrospective analysis of the recurrence risk in patients diagnosed and treated for HNSCC lesions of size ≤ 3 cm. Data extraction was performed using an MP-SIR session with the software SEER*Stat (http://seer.cancer.gov/seerstat). Again, we focused on the predominantly HPV-negative sites as specified under “Model calibration.” Only patients aged 18 to 85 years at diagnosis with a known tumor size of ≤30 mm were included. Patients with unknown surgery status, unknown radiation status, and those identified at autopsy or on death certificate only were excluded. The following categorical variables were extracted from the dataset: age at diagnosis (younger: 18–49 years, older: 50–85 years) and treatment (surgery only, radiation only, surgery and radiation). For each patient, the time from diagnosis to recurrence (defined as second malignant event in any of the included head and neck sites), death, or censoring (defined as the minimum of end-of-study, loss to follow-up, and 10 years) was calculated. For each of the three treatment groups with known treatment status, a univariate Cox proportional hazards regression for recurrence was performed on the basis of the categorical age at diagnosis variable, stratified into younger (<50 years) and older (≥50 years) patients. Kaplan–Meier curves were used to visualize the results. Hazard ratios with 95% CIs were computed, and all *P* values were calculated as two-sided with significance declared for *P* values below 0.05. Computations were performed in R (http://www.R-project.org).

## Results

### Local field size at diagnosis

To characterize the field geometry at time of diagnosis with invasive cancer, we first derived the size distribution of the local field. More precisely, denoting the radius of the local field at time |$t$| by |${R_l}( t )$|, we conditioned on |$\{ {{\sigma _3} = t} \}$| to compute the probability density function of the local field radius as follows:

where |$\zeta \equiv 3\theta /c_2^3$|. We refer to Supplementary Information S3 for details of the calculation. The corresponding density functions (eq. D) for ages at diagnosis of 40, 60, and 80 years are shown in Fig. 3A. Unfortunately, these results have an explicit dependence on the radial growth rate of the precancer field, |${c_2}( {{s_1}} )$|, which cannot be directly estimated due to identifiability constraints (we used literature-based estimates instead). To overcome this issue, we focused on the relative size of a precancerous field compared with the field size at the age of 50 years. More precisely, we introduced the relative mean field radius (RMFR), defined as the mean field radius at a given age divided by the mean field radius at age 50 years. Indeed, the RMFR is completely specified by the inferred parameters |$\lambda$|, |$\theta$| and |$\psi$| and does not explicitly depend on |${c_2}$| (see Supplementary Information S3). On the basis of the posterior distributions from the Bayesian inference, we computed the RMFR as a function of patient age at diagnosis in Fig. 3B. The model predicted an approximately linear increase in RMFR between the ages of 20 and 90 years, and the median (95% CI) RFMR at age 90 was found to be 1.90 (1.88–1.93).

### Multiple fields at diagnosis

In addition to the local field size at diagnosis, we estimated the probability of harboring distant fields in addition to the local field. The exact distribution of the total number |$M( t )$| of clonally independent fields, including the local and all distant fields, is found in Supplementary Information S3. In Fig. 3C, the probability distribution for the number of multiple fields is shown for different ages at diagnosis (40, 60, and 80 years), based on the median posterior values of |$\lambda$|, |$\theta ,$| and |$\psi$|. Using this distribution, we then computed the probability of harboring at least 2 clonally unrelated fields in the head and neck region at time of diagnosis,

Accounting for the posterior distributions of the parameter groups, the probability of harboring at least one distant field is shown as a function of age at diagnosis in Fig. 3D. The model predicts that the probability (95% CI) of harboring at least one distant field increases by one order of magnitude from 2.1% (2.0%–2.4%) at 20 years of age to 20.0% (18.6%–22.7%) at 80 years of age.

### Age-specific recurrence patterns

To test the validity of our model, we made model-based predictions and tested them against outcome data from SEER. The first prediction was based on the fact that current clinical practice recommends an age-independent excision margin width of ∼1 cm (38, 39). Considering the predicted increase of local field size with age at diagnosis (Fig. 3A), an age-independent margin width implies that, for the same tumor size, the area of precancerous tissue left behind after resection of a primary tumor is bigger in older patients (with larger precancer fields) than in younger patients (with smaller precancer fields), see Fig. 4. This increase in recurrence risk in older patients is further increased by the elevated probability of harboring distant fields (Fig. 3B) that may not be affected by local excision of the primary tumor. To test the resulting hypothesis of an age-dependent recurrence risk, we performed a retrospective time-to-event analysis for patients with small ≤ 3 cm) tumors who received surgery but no radiation (*n* = 9,665). In alignment with our model-based prediction, we found a 58% increase in recurrence risk in older patients (≥50 years, *n* = 7,774) than in younger patients (<50 years, *n* = 1,891; HR, 1.58; 95% CI, 1.27–2.10; *P* < 0.0001). The corresponding Kaplan–Meier curve for recurrence-free survival is shown in Fig. 5A.

Because adjuvant radiation therapy after surgery targets the margins of the resected tissue portion, the combination of surgery and radiation is more likely to eradicate the entire local field and hence diminish the recurrence risk from a residual field portion. Therefore, we hypothesized a reduction in the age-related difference in recurrence risk among patients who received radiation in addition to surgery. This prediction was corroborated by the analysis of outcome data. Repeating the above time-to-event analysis for patients who received both surgery and radiation, we found a smaller, nonsignificant difference (HR, 1.29; 95% CI, 0.91–1.84) between older (≥50 years, *n* = 4,537) and younger patients (<50 years, *n* = 993), see Fig. 5B.

Finally, we note that for the patient group with radiation but without surgery (Fig. 5C), no significant difference in recurrence risk between the two age groups was observed. Interestingly, in comparison to younger patients (<50 years, *n* = 266), older patients (≥50 years, *n* = 2,622) did have a slightly smaller recurrence risk (HR, 0.78; CI, 0.40–1.52). However, we were not able to draw any definite conclusions because of a small sample size and patient attrition in the younger age group.

## Discussion

Leftover fields of precancerous tissue that are not removed during resection of HPV-negative, tobacco-related HNSCC lead to an increased risk of local recurrence. Because of their visually normal appearance, epithelial precancer fields are generally not detectable at the time of surgery without harmful biopsies of the surrounding tissue portions. Thus, there is a need for noninvasive methods to predict the extent of the field and to make patient-tailored decisions with respect to optimal treatment modality, size of excision margins, and frequency of postoperative surveillance. In this work, we developed and calibrated a mechanistic model of spatial tumorigenesis in tobacco-related HNSCC. Following the lines of previous multiscale modeling work (40–43), we linked biologic mechanism at the tissue scale to data at the population scale and used a Bayesian framework to ensure compatibility of the model dynamics across scales. Using the model, we made predictions about the geometry of precancerous fields at the time of diagnosis with invasive cancer and tested them against outcome data.

The models predicted a strong dependence of the local field size on age at diagnosis. More precisely, we found the expected field size of the local precancerous field surrounding the primary tumor to double between the ages of 50 and 90 years. The model further predicted a substantial increase in the probability of harboring independent synchronous fields at the time of diagnosis, increasing to 20% at age 80. Together, these findings indicate that the current one-size-fits-all approach to the width of surgical excision margins may need to be critically re-evaluated.

On the basis of the model predictions, we hypothesized that larger local fields and a higher risk of harboring multiple fields in older patients would translate into a higher recurrence risk in older than in younger patients when treated by surgery only. We tested this hypothesis by analyzing recurrence patterns in younger (<50 years) and older (≥50 years) patients who received surgery without radiation. As predicted, we found a higher risk of local recurrence in older patients than in younger patients. In contrast, we did not find a statistically significant age effect in patients who received adjuvant radiation therapy. This is consistent with the observation that radiation is less focalized than surgical excision and hence is more likely to remove the surrounding field portions, even in older patients with larger fields. In addition, radiation may have an age-specific effect on local recurrence that is independent of the field size. For example, studies on the effects of radiotherapy in patients with ductal carcinoma *in situ* reported a higher likelihood of recurrence in patients who received adjuvant radiation therapy compared with those who did not (44, 45). It is important to note that the observed recurrence patterns may be due to a combination of several biologic and clinical factors, not just the age-related size of the precancer field.

Our study has several limitations. First, inherent limitations of the SEER database such as ascertainment biases and incomplete recurrence records may impact the validity of our results (46). Second, HPV status is not recorded in SEER, and despite careful selection of sites that are predominantly HPV-negative, a portion of recorded cases may have been misclassified. Third, although the incidence data used for model calibration were restricted to a relatively short period (2000–2012), it is likely subject to secular trends that are at least partially due to changing smoking patterns in the population (4, 47). In addition, it has been shown that smoking cessation leads to a slow decrease in head and neck cancer risk (48), which may result in differential field sizes between former and current tobacco users. In future work, the use of more granular smoking prevalence and cancer incidence data, adjusted for secular trends, is expected to address these issues and improve the model predictions. Fourth, a limitation shared with most multistage modeling analyses is the assumption of identical parameters for all individuals. This issue is partially mitigated by the Bayesian approach, which provides posterior distribution of parameters rather than point estimates. However, incorporating patient-level heterogeneity into the modeling framework constitutes a critical next step toward the long-term goal of developing personalized approaches to head and neck cancer care. Finally, we did not explicitly account for the role of the immune system as a first line of defense against neoplastic progression. Although immune effects could be incorporated into the model, current knowledge about the exact mechanisms of immune response to neoplastic transformation seems insufficient to develop meaningful models.

Historically, different types of multistage models have been used to infer the nature of cancer-causing mechanisms on the basis of incidence and mortality data. The first such models (49) were based on the assumption that cancer arises as the product of an organ-specific number of rare mutations. These models assumed a well-mixed population of cells and neglected cellular dynamics and spatial tissue structure. Later, these models were extended to account for clonal expansions of precancerous cells (40, 41, 50) and used to analyze the number and size of premalignant clones in nonspatial populations, both for exponential mean growth (51) and for more general growth dynamics (52, 53). In parallel to multistage models, population dynamic models such as the Wright–Fisher and Moran processes have also been used to model cancer initiation under well-mixed assumptions (54, 55). In these models, the expansion of premalignant clones is constrained by competition with healthy cells or premalignant cells at other stages. Our current work constitutes an extension of both multistage and population dynamic models. In particular, we accounted for the spatial structure of the epithelial lining of head and neck sites, and we developed a mechanistic model on the basis of the current understanding of tissue homeostasis and molecular biology of HPV-negative HNSCC. Using the spatially explicit model, we achieved a very good fit to the SEER-based incidence data, Fig. 2A. For the sake of comparison, we also fitted the classical multistage model (49) to the incidence data, see Supplementary Information S4. While the inferred number of 3 stages corresponds well to the current understanding of HNSCC etiology, the model fit was found to be poor. More recently, A study (47) showed that nonspatial 2-stage clonal expansion models yield good fits to the data when accounting for period and cohort effects in biologic parameters. Nevertheless, our findings suggest that for solid cancers subject to field cancerization, spatial effects may play an important role in shaping population-level incidence patterns.

While our predictions for age-dependent recurrence risks and field sizes were successfully corroborated by SEER-based outcome data, pathologic studies are needed to provide a definite confirmation of our models. Our predictions of an increase in field size in older patients will hopefully spur a critical re-evaluation of the one-size-fits-all approach to surgical excision margin width, as well as the effectiveness of surgery without radiation therapy in older patients with extended exposure and potentially very large fields. In addition, appropriate follow-up intervals for the monitoring of local recurrences may be optimized according to patient-specific predicted field geometries, and a quantitative understanding of the field dynamics combined with a set of reliable biomarkers for premalignant fields may enable physicians to perform targeted risk-assessments in high-risk groups, such as heavy smokers. Finally, the proposed modeling framework can be applied to other 2-dimensional epithelial sites affected by field cancerization, such as bladder, esophagus and skin, and it may provide valuable insights into observed differences in field extent and outcome between HPV-positive and HPV-negative head and neck cancers.

## Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

## Disclaimer

The views expressed in this article are those of the author and do not necessarily represent the views of the Department of Veterans Affairs or the United States government.

## Authors' Contributions

**Conception and design:** M.D. Ryser, W.T. Lee, K.Z. Leder, J. Foo

**Development of methodology:** M.D. Ryser, W.T. Lee, K.Z. Leder, J. Foo

**Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.):** W.T. Lee

**Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis):** M.D. Ryser, N.E. Ready, J. Foo

**Writing, review, and/or revision of the manuscript:** M.D. Ryser, W.T. Lee, N.E. Ready, K.Z. Leder, J. Foo

**Study supervision:** W.T. Lee, J. Foo

## Acknowledgments

The authors thank Prof. F. Michor (Dana-Farber Cancer Institute) and Prof. R. Durrett (Duke University) for valuable feedback on model design and analysis.

## Grant Support

The study was supported by the following grants: NIH R01-GM096190 and SNSF P300P-154583 (M.D. Ryser); NSF CMMI-1362236 (K.Z. Leder); and NSF DMS-1224362 and NSF DMS-1349724 (J. Foo).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.