## Abstract

Cancer screening and early detection efforts have been partially successful in reducing incidence and mortality, but many improvements are needed. Although current medical practice is informed by epidemiologic studies and experts, the decisions for guidelines are ultimately *ad hoc*. We propose here that quantitative optimization of protocols can potentially increase screening success and reduce overdiagnosis. Mathematical modeling of the stochastic process of cancer evolution can be used to derive and optimize the timing of clinical screens so that the probability is maximal that a patient is screened within a certain “window of opportunity” for intervention, when early cancer development may be observable. As an alternative to a strictly empirical approach or to microsimulation of a multitude of possible scenarios, biologically based mechanistic modeling can be used to predict when best to screen and to begin adaptive surveillance. We introduce a methodology for optimizing screening, assessing potential risks, and quantifying associated costs to healthcare using multiscale models. As a case study in Barrett's esophagus, we applied these methods to a model of esophageal adenocarcinoma that was previously calibrated to U.S. cancer registry data. Optimal screening ages for patients with symptomatic gastroesophageal reflux disease were older (58 for men and 64 for women) than what is currently recommended (age > 50 years). These ages are in a cost-effective range to start screening and were independently validated by data used in current guidelines. Collectively, our framework captures critical aspects of cancer evolution within patients with Barrett's esophagus for a more personalized screening design.

This study demonstrates how mathematical modeling of cancer evolution can be used to optimize screening regimes, with the added potential to improve surveillance regimes.

## Introduction

The main rationale for cancer screening is that earlier detection of a disease during a patient's lifetime offers the opportunity to change its prognosis. Compared with incidental cancers, screening can improve the prognosis for patients with early occult cancers detected before symptoms develop and, perhaps even more importantly, through removal of preinvasive lesions such as adenomas in the colon, cervical intraepithelial neoplasia in the cervix, and ductal carcinoma *in situ* in the breast. In 2018, it was estimated that nearly half of all cancer incidence in the United States was attributable to detection by screening (1), but it is difficult to assess how many of these patients were destined to die from other causes before their cancers would have led to a symptomatic diagnosis, in which case screening offered them no benefit. Current programs suffer from overdiagnosis of benign lesions; yet underdiagnosis of dangerous lesions persists, often with dismal and costly consequences, because lesions are either missed by the screen itself or high-risk individuals do not take up screening at the appropriate age (2). Consequently, improving screening success is an important health policy goal, and one primed for quantitative assessment. Simply put, there is a mathematical balancing act to solve from both public health and cost-effectiveness perspectives: the goal is to maximize successful prevention of future lethal cancers (often by removing their precursors detected on a screen) while minimizing the likelihood of overdiagnosis.

From a biological perspective, malignant cells develop in the body originally from normal cells existing at birth through the evolutionary process of carcinogenesis (3). This process of cancer evolution is stochastic, meaning that random mutations can occur in cells throughout life, which may eventually lead to a malignant phenotype selected in a tissue, but not necessarily so. Mathematical models of carcinogenesis, tumor progression, and metastasis are powerful tools to describe this process and, when calibrated to observed data, can infer important parameters for evolution such as time of metastatic seeding and selection strength of certain mutations that may confer resistance to treatment (4, 5). Because such variables are not immediately measurable from data, mathematical models are now being explicitly incorporated into translational aims such as interval planning for adaptive therapy trials in patients with cancer (6, 7). We propose that these quantitative models can be further utilized at an earlier stage for cancer prevention and control, namely, to infer sojourn (waiting) times for specific stages of cancer evolution that we wish to target, and then to use these targets as objective functions to optimize the efficacy of cancer screening strategies and surveillance for early detection (8).

For example, in colorectal cancer, there are certain stages of carcinogenesis that are common both in type (*APC* gene inactivation for initiation of adenomas) and in timing (adenomas become detectable around age 45 in males and females; ref. 9). Thus, although no two patients follow exactly the same path of carcinogenesis, there are most probable time scales for this stochastic process to take place (as seen in age-specific incidence curves), while less likely events still occur (e.g., colorectal cancer in very young patients). These stochastic features were captured in the first mathematical descriptions of cancer, which have been studied and applied extensively in combination with biostatistical methods for over 60 years (10, 11). By analyzing the structure of multistage models, we can also formulate these probability distributions as certain “windows of opportunity” that are crucial to capture in cancer screening planning. In this study, we derive conclusions from the theory of stochastic processes applied to carcinogenesis to formally optimize the timing of initial cancer screens in a population, and to optimize subsequent surveillance adaptively, conditional on a previous exam result.

We present analytical results for a generalized framework and then apply it in a proof-of-concept study for screening of Barrett's esophagus, the metaplastic precursor to esophageal adenocarcinoma, a deadly cancer with substantial unmet clinical need. Briefly, because Barrett's esophagus itself is essentially asymptomatic, the majority of patients with Barrett's esophagus remain undiagnosed, and thus most esophageal adenocarcinoma cases are diagnosed at an advanced stage. This is unfortunate because (i) mortality associated with esophageal adenocarcinoma is very high (a majority of patients die within a year), and (ii) prior diagnosis of Barrett's esophagus is positively associated with improved esophageal adenocarcinoma survival (12). Therefore, U.S. and UK gastroenterologists have focused on identifying patients with Barrett's esophagus on screening endoscopy and then recommending that these patients undergo lifelong endoscopic surveillance to detect any dysplasia or early cancer that may be removed, thus preventing future lethal esophageal adenocarcinoma. However, the clinical reality is that only 0.2%–0.5% of people with non-dysplastic Barrett's esophagus develop esophageal adenocarcinoma each year (13). Thus, the majority of diagnosed patients with Barrett's esophagus will gain little benefit per follow-up exam, and oversurveillance of patients with no evidence of premalignant high-grade dysplasia (HGD) poses a costly problem because of lifelong surveillance recommendations (as frequently as every 3 years under current guidelines). Nonetheless, computational modeling of available population data suggests that, although the annual risk of progression from Barrett's esophagus to esophageal adenocarcinoma is low, the vast majority of incident esophageal adenocarcinomas are expected to arise in patients with Barrett's esophagus, so identifying these patients could save lives through early detection during surveillance (14).

To capture this population effectively, recommendations for screening high-risk individuals and enrolling patients who screen positive for Barrett's esophagus into surveillance regimens have been proposed to use health care resources beneficially. Overall, these tactics have been limited in their choice of risk strata, variable across organizations, and based on conclusions from observational data alone (15). For example, one current U.S. guideline from the American College of Gastroenterology strongly recommends Barrett's esophagus screening for men with chronic and/or frequent (more than weekly) gastroesophageal reflux disease (GERD) symptoms and two or more risk factors for Barrett's esophagus or esophageal adenocarcinoma. These risk factors include age greater than 50 years, Caucasian race, presence of central obesity, current or past history of smoking, and a confirmed family history of Barrett's esophagus or esophageal adenocarcinoma (in a first-degree relative; ref. 16). Alternatively, in the United Kingdom, the British Society of Gastroenterology recommends endoscopic Barrett's esophagus screening for patients of either sex with chronic GERD symptoms and multiple risk factors (at least three among: age 50 years or older, white race, male sex, obesity; ref. 17). After Barrett's esophagus diagnosis, all of the guidelines recommend uniform surveillance intervals, proposed by experts, that are determined by the presence or absence of detectable dysplasia, with no other risk factors (including patient age) formally considered (18). Beyond cost-effectiveness analyses of a small number of microsimulated screening ages (19), the optimal starting age for Barrett's esophagus screening is unknown.

Overall, some cancer screening recommendations have been justified by microsimulation models (9), but most screening guidelines consider current epidemiologic data alone and are ultimately proposed and decided by experts. The clinical consequences of these decisions are recorded as population outcomes, and further research then determines how effective these recommendations proved to be. When appropriate randomized controlled trial (RCT) data are lacking, as is commonly the case in cancer screening (20), we propose that the initial choice of screening design could first be determined as an optimal control problem with a machine learning and/or model-based approach. To address this problem, we derive probability equations to be used for optimal screening/surveillance timing and risk estimation, and present results for Barrett's esophagus screening ages using a calibrated mathematical model that reproduces U.S. cancer incidence data. Our predictions are in a cost-effective range and were independently validated with available screening data used in the rationale for current Barrett's esophagus guidelines and with RCT data from a novel, sensitive screening technology.

## Materials and Methods

In the derivation of our optimal timing framework, we refer to cancer screening as initial testing for the presence of a specified premalignant/malignant change (Barrett's esophagus in this case study). A second screen after a negative result is defined as rescreening, and subsequent tests after a positive screen are defined as surveillance. If the time between preclinical detection of Barrett's esophagus and clinical presentation of esophageal adenocarcinoma exceeds the patient's remaining lifetime, the patient is considered an overdiagnosed case. For cancers with curable precursors, screening programs seek the optimal screening age at which the probability is maximal that (i) individuals in a population harbor a screen-detectable precursor, and (ii) this timing occurs before incidental cancers would have developed (Fig. 1A).
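The window-of-opportunity target can be written compactly. The following is a sketch in our own notation (the symbols are not taken from the Supplementary Methods): let $T_{\mathrm{BE}}$, $T_{\mathrm{EAC}}$, and $T_{\mathrm{death}}$ be the ages at Barrett's esophagus onset, clinical esophageal adenocarcinoma, and death from other causes, and assume that esophageal adenocarcinoma arises only from Barrett's esophagus (so $T_{\mathrm{BE}} \le T_{\mathrm{EAC}}$) and that other-cause death is independent of both:

$$
\begin{aligned}
\Pr\left(T_{\mathrm{BE}} \le t < T_{\mathrm{EAC}},\; t < T_{\mathrm{death}}\right)
&= \left[\Pr(T_{\mathrm{BE}} \le t) - \Pr(T_{\mathrm{BE}} \le t,\ T_{\mathrm{EAC}} \le t)\right]\Pr(T_{\mathrm{death}} > t) \\
&= \left[F_{\mathrm{BE}}(t) - F_{\mathrm{EAC}}(t)\right] S_{\mathrm{death}}(t),
\end{aligned}
$$

where the second line uses the fact that $T_{\mathrm{EAC}} \le t$ implies $T_{\mathrm{BE}} \le t$, and $S_{\mathrm{death}}(t) = \Pr(T_{\mathrm{death}} > t)$.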

Biological processes give rise to considerable interpatient heterogeneity during the progression from normal tissue to incident cancer, and thus optimal timing for surveillance follow-up is expected to be variable within the population based on screening outcome. To account for this heterogeneity in “snap-shot” temporal data, mathematical models that employ a stochastic approach can incorporate and quantify premalignant and malignant clonal expansions during somatic evolution explicitly (Fig. 1B; refs. 21–24). This offers the advantage that the onset of precursor clones can be defined as random variables and clonal population sizes can be forecasted into future ages (Fig. 1C).

### Previous methods for optimal timing assessment

Population data can provide quantitative benchmarks for certain preclinical states, such as precursor prevalence, but these data alone cannot determine an exact optimal age to initialize screening (and RCTs with arms for every starting age in an at-risk population with long-term follow-up are not feasible). To address this, multistate Markov models have most commonly been used to microsimulate large cohorts of individuals *in silico* from birth through clinical states of carcinogenesis (e.g., boxes in Fig. 1D) to quantify expected outcomes (i.e., natural history). Model rates are inferred by fitting to cancer incidence and mortality data (25) or by fitting to screening trial data explicitly (26). Then, with a calibrated model, screening processes are built into microsimulations to compare strategies based on beneficial outcomes from interventions and to quantify associated costs, such as the number of expected surveillance exams. For our specific health policy question of determining the optimal screening age, simulation models have been used in numerous examples: U.S. Preventive Services Task Force recommendations for colorectal cancer screening in the United States by testing three starting ages (45, 50, or 55; ref. 27), American Cancer Society recommendations for colorectal cancer screening by testing three starting ages (40, 45, or 50; ref. 28), and recently for one-time colorectal cancer screening using single ages from 50 to 65 (29); for lung cancer screening in the United States by testing four scenarios of initial age (45, 50, 55, or 60; ref. 30); and for Barrett's esophagus screening to reduce esophageal adenocarcinoma by testing three starting ages (50, 60, or 70; ref. 19).

While simulation models have assessed particular screening ages, they can only identify cost-effective strategies among the finite number of strategies tested, and results are often sensitive to numerous inputs and assumptions. Alternatively, mathematical models can search for a global optimum given an objective (utility) function (see refs. 31–33 for methodologies using discrete clinical states). We extend this type of approach to biomathematical models that include the temporal dynamics of cancer evolution detailed above and simultaneously reproduce cancer incidence patterns. Thus far, these types of stochastic models have been used in simulation studies on screening (18, 21, 34) but not in an optimization algorithm that does not require simulation. Here we formulate a novel analytical, evolution-based approach for optimizing the timing of early cancer precursor detection.

### Case study in Barrett's esophagus

For specific optimal timing examples presented in Results, we apply our methods to Barrett's esophagus. For this, we utilize the multistage clonal expansion for esophageal adenocarcinoma (MSCE-EAC) model that has been employed within the NCI Cancer Intervention and Surveillance Modeling Network (CISNET) for cost-effectiveness studies of esophageal adenocarcinoma prevention (see Fig. 1; refs. 18, 25, 35). The random variables described below can be formulated (and expanded) for multistage models in esophagus, colon, lung, breast, and others (see Fig. 1C; Supplementary Methods 3.1–3.2 for derivations). We present the methods for three clinical scenarios: (i) initial screening, (ii) rescreening based on negative screen, and (iii) surveillance based on positive screen. In each scenario, we define the “window of opportunity” for the desired target and the objective (utility) function used to compute optimal times (ages). For initial screening, we present examples for associated costs. For surveillance, we define the age-dependent risk of future cancer.

### Optimizing initial screening

The probability for the screening “window of opportunity” is defined as the probability that a specific age is both between the onset of precursor Barrett's esophagus and cancer, and that the age is also younger than time of patient death (Fig. 1C, beige). Here we incorporate realistic cut-off times due to life expectancy in our framework by using life table estimates for U.S. all-cause mortality survival probabilities (36). We then define a *screening objective function* of age, which is the difference between the probabilities the patient has already developed Barrett's esophagus and that the patient has already developed esophageal adenocarcinoma (i.e., the patient is in the window of opportunity), weighting the relative differences by a factor, *w*, and multiplied by the probability the patient is alive at that age. This variable weighting allows policy makers to decide how much to value avoiding underdiagnosis versus avoiding costs associated with overdiagnosis. We then solve for the optimal screening age that maximizes the screening objective function (see Supplementary Methods 3.4 for explicit solution). For examples of two other potential “windows of opportunity” to use as targets for screening, see Supplementary Methods 3.5 and 3.6.
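To make the optimization step concrete, the sketch below maximizes an objective of the form [P(BE by age t) − *w* · P(EAC by age t)] · P(alive at age t) over whole-year ages. This placement of *w* is our reading of the description above (with *w* = 1 it reduces to the window-of-opportunity probability), not the explicit solution in Supplementary Methods 3.4. All distributions and parameters here (exponential Barrett's esophagus onset from age 15, an exponential Barrett's-to-cancer sojourn, Gompertz mortality, and the names `be_cdf`, `eac_cdf`, `alive`) are hypothetical stand-ins for the calibrated MSCE-EAC model and U.S. life tables.

```python
import math

# Hypothetical toy parameters, NOT the calibrated MSCE-EAC model:
ONSET_START = 15.0            # age from which BE can develop (assumed)
NU = 0.004                    # constant BE onset rate per year (assumed)
LAM = 0.003                   # BE-to-EAC progression rate per year (assumed)
GOMP_A, GOMP_B = 1e-4, 0.085  # Gompertz all-cause mortality (assumed)


def be_cdf(t):
    """P(T_BE <= t): BE has developed by age t."""
    s = max(t - ONSET_START, 0.0)
    return 1.0 - math.exp(-NU * s)


def eac_cdf(t):
    """P(T_EAC <= t): BE onset plus an exponential sojourn, i.e. a
    hypoexponential waiting time (requires NU != LAM)."""
    s = max(t - ONSET_START, 0.0)
    return 1.0 - (LAM * math.exp(-NU * s) - NU * math.exp(-LAM * s)) / (LAM - NU)


def alive(t):
    """Gompertz probability of surviving other causes to age t."""
    return math.exp(-(GOMP_A / GOMP_B) * (math.exp(GOMP_B * t) - 1.0))


def screening_objective(t, w=1.0):
    """[P(BE by t) - w * P(EAC by t)] * P(alive at t)."""
    return (be_cdf(t) - w * eac_cdf(t)) * alive(t)


def optimal_screening_age(w=1.0, ages=range(30, 91)):
    """Whole-year age that maximizes the screening objective."""
    return max(ages, key=lambda a: screening_objective(a, w))


for w in (1.0, 2.0, 5.0):
    print(w, optimal_screening_age(w))
```

In this toy, raising *w* penalizes cancers that arise before the screen more heavily, so the maximizing age can only move younger, matching the qualitative behavior reported in Results.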

### Calculating costs of different screening ages

Once an initial screening age is determined, we present two examples of associated cost functions. First, we define a cost function for screening at each age and surveillance until some fixed final age at which time surveillance is discontinued even if the patient remains alive. This first cost quantifies the number of patient-years expected from needless surveillance of the overdiagnosed population (the “overscreened” fraction of patients diagnosed with precursor Barrett's esophagus who will not get cancer before the fixed final age of discontinuing surveillance). Following from this, we define a second cost for the number of futile exams among the overdiagnosed Barrett's esophagus cases (see Supplementary Methods 3.4 for derivation).
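A minimal sketch of the two cost quantities, under simplifying assumptions that are ours rather than the derivation in Supplementary Methods 3.4: the overdiagnosed fraction at the screening age (`od_fraction`) is supplied as an input, survival is treated as independent of it, each whole year from the screening age up to the stop age is weighted by conditional survival, and futile exams accrue at a fixed rate per patient-year (0.4 per year, the value cited in Results from ref. 18). The function names and the Gompertz curve are hypothetical.

```python
import math


def oversurveillance_patient_years(screen_age, od_fraction, alive, stop_age=80):
    """Expected futile surveillance patient-years per person screened.

    Simplifying assumptions (hypothetical, for illustration): od_fraction is the
    probability that a person screened at screen_age is diagnosed with BE but
    would not develop EAC before stop_age, and it is independent of survival.
    Each whole year from screen_age up to (not including) stop_age is weighted
    by the probability of being alive then, given alive at screen_age.
    """
    years = sum(alive(age) / alive(screen_age) for age in range(screen_age, stop_age))
    return od_fraction * years


def futile_exams(patient_years, exams_per_year=0.4):
    """Futile endoscopies, assuming a fixed exam rate per patient-year."""
    return exams_per_year * patient_years


def gompertz_alive(age, a=1e-4, b=0.085):
    """Hypothetical Gompertz survival curve."""
    return math.exp(-(a / b) * (math.exp(b * age) - 1.0))


py = oversurveillance_patient_years(58, od_fraction=0.05, alive=gompertz_alive)
print(py, futile_exams(py))
```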

### Optimizing rescreening

After a negative (normal) screen, the patient may be asked to return for repeat screen at a later time. We define a new probability for the rescreening “window of opportunity” similar to the one for the first screening above, conditional on the outcome of the first screening. We likewise define a rescreening objective function with similar weighting of overdiagnosis versus underdiagnosis, *w*, and solve for the optimal rescreening age (see Supplementary Methods 3.8 for explicit solution).
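As an illustration of conditioning on a negative first screen, the sketch below assumes a hypothetical two-step toy model (not MSCE-EAC): Barrett's esophagus onset is exponential, so a truly negative screen at age `t0` (perfect sensitivity assumed) restarts the onset clock, and cancer follows after an exponential sojourn. The objective is the conditional analogue of the initial-screening one, weighted by *w* and by survival from `t0`; all names and parameters are assumptions for illustration.

```python
import math

# Hypothetical toy parameters (not the MSCE-EAC model):
NU = 0.004                    # BE onset rate per year; memoryless, so a truly
                              # negative screen at t0 restarts the onset clock
LAM = 0.003                   # BE-to-EAC progression rate per year (assumed)
GOMP_A, GOMP_B = 1e-4, 0.085  # Gompertz all-cause mortality (assumed)


def alive(t):
    """Gompertz probability of surviving other causes to age t."""
    return math.exp(-(GOMP_A / GOMP_B) * (math.exp(GOMP_B * t) - 1.0))


def rescreen_objective(t1, t0, w=1.0):
    """Objective for a rescreen at age t1 given a truly negative screen at t0."""
    tau = t1 - t0
    p_be = 1.0 - math.exp(-NU * tau)  # P(BE onset in (t0, t1] | no BE at t0)
    # P(EAC by t1 | no BE at t0): hypoexponential onset-plus-sojourn CDF
    p_eac = 1.0 - (LAM * math.exp(-NU * tau) - NU * math.exp(-LAM * tau)) / (LAM - NU)
    return (p_be - w * p_eac) * alive(t1) / alive(t0)


def optimal_rescreen_age(t0, w=1.0, horizon=40):
    """Whole-year rescreen age within `horizon` years of t0 maximizing the objective."""
    candidates = range(t0 + 1, t0 + horizon + 1)
    return max(candidates, key=lambda t1: rescreen_objective(t1, t0, w))


print(optimal_rescreen_age(50))
```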

### Optimizing surveillance

After a positive screen (precursor detected), most clinical guidelines recommend fixed follow-up intervals without incorporating risk factors of patient progression (e.g., gender, uncontrolled GERD). Alternatively in adaptive surveillance, we make mechanistic predictions conditional on patient demographic/clinical features and detected stage of progression at each specific screening age to provide more refined surveillance. The idea is to iteratively condition on the screening/surveillance result at a patient's age and derive an interval for the next surveillance given some outcome of interest, to then obtain the optimal age for the next surveillance. If the screening/surveillance result states that no later stages of neoplasia were detected, the probability for the next-surveillance “window of opportunity” is defined as the probability that the surveillance age is both between the ages of having detectable premalignancy and having malignancy of any size, and younger than time of patient death (Fig. 1C, pink). Similar to above, we derive the optimal next-surveillance age as the age that maximizes the corresponding surveillance objective function with similar weighting of overdiagnosis versus underdiagnosis, *w* (see Supplementary Methods 3.9 for full solution when precursor onset is known).
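The surveillance step can be sketched the same way. Here we assume a hypothetical two-step Markov toy (not the MSCE-EAC model or the solution in Supplementary Methods 3.9): after an exam at age `t0` that finds Barrett's esophagus without neoplasia, detectable premalignancy arises at rate `RATE_PRE` and malignancy follows at rate `RATE_MAL`. The next-surveillance objective weights "premalignancy present" against *w* times "malignancy present", multiplied by survival from `t0`; with *w* = 1 it equals the probability of being in the surveillance window. The sketch returns the optimal interval in whole years.

```python
import math

# Hypothetical toy parameters (not the MSCE-EAC model):
RATE_PRE = 0.02               # non-dysplastic BE -> detectable premalignancy, per year
RATE_MAL = 0.2                # premalignancy -> malignancy, per year
GOMP_A, GOMP_B = 1e-4, 0.085  # Gompertz all-cause mortality (assumed)


def alive(t):
    """Gompertz probability of surviving other causes to age t."""
    return math.exp(-(GOMP_A / GOMP_B) * (math.exp(GOMP_B * t) - 1.0))


def surveillance_objective(tau, t0, w=1.0):
    """Objective for the next exam tau years after a non-dysplastic exam at age t0."""
    p_pre = 1.0 - math.exp(-RATE_PRE * tau)  # premalignancy has arisen by t0 + tau
    # malignancy by t0 + tau: hypoexponential CDF of the two sequential steps
    p_mal = 1.0 - (RATE_MAL * math.exp(-RATE_PRE * tau)
                   - RATE_PRE * math.exp(-RATE_MAL * tau)) / (RATE_MAL - RATE_PRE)
    return (p_pre - w * p_mal) * alive(t0 + tau) / alive(t0)


def optimal_interval(t0, w=1.0, max_years=30):
    """Whole-year interval to the next surveillance exam."""
    return max(range(1, max_years + 1), key=lambda tau: surveillance_objective(tau, t0, w))


print(optimal_interval(60))
```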

### Calculating risk for surveillance

In general, a physician will often not know how long a patient has harbored undetected cancer precursors in the body, only that onset occurred sometime before screening. But if we are able to measure this precursor onset age (e.g., through measuring a molecular age of a precursor), then his/her associated cancer risk for next-surveillance age could instead be specifically defined in terms of the precursor onset age (e.g., Barrett's esophagus onset; equivalently, Barrett's esophagus dwell time or Barrett's esophagus age) rather than the patient's years since birth (patient age). See Supplementary Methods 3.10 for explicit solution.
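To illustrate why precursor dwell time, rather than patient age, can carry the risk information, consider a hypothetical toy in which cancer arises from Barrett's esophagus through two sequential exponential steps (rates `RATE_PRE` and `RATE_MAL`, both assumed). Unlike a single exponential step, which is memoryless, this waiting time has an increasing hazard, so the risk of cancer before the next exam grows with dwell time. This is a sketch of the idea only, not the explicit solution in Supplementary Methods 3.10.

```python
import math

# Hypothetical two-step toy (not the MSCE-EAC model): from BE onset, a
# premalignant step at rate RATE_PRE, then malignancy at rate RATE_MAL.
RATE_PRE, RATE_MAL = 0.02, 0.2


def no_cancer_after_dwell(d):
    """P(no malignancy within d years of BE onset): hypoexponential survival."""
    return (RATE_MAL * math.exp(-RATE_PRE * d)
            - RATE_PRE * math.exp(-RATE_MAL * d)) / (RATE_MAL - RATE_PRE)


def risk_by_next_exam(dwell, interval):
    """P(cancer within `interval` years | cancer-free after `dwell` years of BE)."""
    return 1.0 - no_cancer_after_dwell(dwell + interval) / no_cancer_after_dwell(dwell)


for dwell in (0, 10, 20):
    print(dwell, round(risk_by_next_exam(dwell, 5), 4))
```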

### Data

For results on Barrett's esophagus screening, we apply the methods above for the MSCE-EAC model specifically. The model inputs only include (i) GERD prevalence (modeled from data for age- and sex-specific estimates), and (ii) esophageal adenocarcinoma age- and sex-specific incidence curves provided by Surveillance, Epidemiology, and End Results (SEER) registry (35). The Barrett's esophagus prevalence and neoplastic progression rates are model outputs, that is, they are not fit to observed Barrett's esophagus prevalence nor neoplastic progression rates from empiric studies.

### Previous calibration to SEER cancer incidence data

The cell-level description of evolution in this model is linked to the population scale by means of the model hazard function, defined as the instantaneous rate of detecting cancers among individuals who have not been previously diagnosed with cancer (35). Briefly, the hazard function can be derived from the backward equations for the stochastic multistage branching process described above and solved numerically via a system of coupled ordinary differential equations (Supplementary Methods 3.3). Thus, one may infer rates of cellular processes shown in Fig. 1D from cancer incidence data. From previous rigorous model selection, we found the best model was birth cohort specific (see refs. 25 and 35 for previous results using esophageal adenocarcinoma incidence data from the SEER9 registry 1975–2010, with predicted trends to 2030). Parameters used as input data for the case of Barrett's esophagus screening are provided in Supplementary Table S1. We separate all results considering sex as a biological variable.

### Validation using population prevalence data

The results below include predictions for optimal screening ages that are not fit to screening data explicitly but are outputs of the model fit to cancer incidence. While population data are conditional on patient survival, our aim was to calculate an optimal window (i.e., joint probabilities) for which no perfectly comparable data exist. Instead, to test the accuracy of our model, we recently validated the simultaneous predictions of Barrett's esophagus prevalence (our main target population for screening) using the MSCE-EAC model with various independent data from population-based studies (14). In that study, we also found that our simultaneous model estimates for cancer progression corresponded with published progression rates from non-dysplastic Barrett's esophagus to esophageal adenocarcinoma.

Here we include two main validation datasets that are relevant for our optimization problem. First, a main study currently used for screening rationale included age- and sex-specific data from the Clinical Outcomes Research Initiative (CORI; ref. 37). These data included endoscopic reports for more than 150,000 patients, most of whom were born around 1950 and had no prior esophageal adenocarcinoma, stratified by GERD symptoms as indicated in the endoscopy report. To directly compare the CORI prevalence with our model, which depends on the Barrett's esophagus onset rate *ν*(*t*) (Fig. 1), we derived the analogous conditional probability to represent the data, computed as the fraction of first-time diagnosed Barrett's esophagus cases by age within the total CORI cohort of patients who underwent first endoscopy between 2000 and 2006 (see Supplementary Methods 3.7). Second, we compared our results with Barrett's esophagus prevalence data from the prospective RCT BEST3, in which patients were screened with the sensitive Cytosponge-TFF3 technology. Eligible patients in BEST3 were 50 years of age or older, had been taking acid suppressants for GERD symptoms for over 6 months, and had not undergone an endoscopy procedure within the previous 5 years (38). To compare our birth cohort–specific model predictions with similarly aged cohorts included in most available studies (including BEST3), we present main results for the U.S. population born in 1950 (as in previous cost-effectiveness analyses of Barrett's esophagus screening; refs. 18, 19, 25, 35) and show that results are similar for modern birth cohorts within a 50-year range.

### Code and data availability

All analytic distributions to recreate our results (data in Supplementary Table S1) and apply to other data are available in the text and Supplementary Data. Code to solve equations was developed in R (version 3.6.1) and is publicly available at: github.com/yosoykit/OptimalTimingScreening. CORI data can be accessed through application with ethical approval to NIDDK: https://niddkrepository.org/studies/cori/.

## Results

We applied our optimal timing framework to Barrett's esophagus for the scenarios of initial screening, rescreening, and surveillance, to obtain theoretical predictions *a priori* without relying on microsimulations.

### Optimal ages to initialize screening for Barrett's esophagus

Our first aim was to optimize the choice of recommended age to start screening in U.S. populations that would capture the most patients with Barrett's esophagus before they have progressed to esophageal adenocarcinoma (optimal window in Fig. 1C, beige). All current guidelines that specify an age to begin screening assume age 50 as the threshold across all risk groups but this choice was not based on any optimization process (15). In our optimal control framework (see Materials and Methods), we achieve this by maximizing the *screening objective function*, thus obtaining optimal screening ages.

In line with risk factors currently considered in guidelines, we computed the screening objective functions stratified by age, sex, and GERD status (i.e., general population or symptomatic GERD population). Figure 2 depicts the contour plots of these objective functions along with predicted optimal ages for each weight parameter *w* (orange lines). For simplicity, we sought ages in whole years, as guidelines are currently formulated. For *w* = 1 (equal weighting of positive screen and safeguarding from cancer before screening), the optimal screening ages were 64 years for all males (Fig. 2A, orange triangle), 58 years for males with GERD symptoms (Fig. 2B, orange triangle), 69 years for all females (Fig. 2C, orange circle), and 64 years for females with GERD symptoms (Fig. 2D, orange circle). As greater weight *w* is given to avoiding underdiagnosis relative to avoiding overdiagnosis, the optimal screening ages become younger (orange lines).

Table 1 provides predicted optimal screening ages along with associated metrics of efficacy. Optimal screening ages depend on an *a priori* choice of the parameter *w*, which can be chosen such that the risk of esophageal adenocarcinoma before the age of screening is below a tolerable level. With *w* = 1, which corresponds to the proper probability of being in the optimal screening window (see Materials and Methods), the associated risk of esophageal adenocarcinoma before the optimal ages was low (Table 1, final column), so we present further results for *w* = 1 as well.
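One way to operationalize "choose *w* so that the pre-screening cancer risk is tolerable" is to scan candidate weights in increasing order and keep the first whose optimal age keeps the cumulative cancer risk at or below a tolerance. The sketch below does this on a hypothetical toy model (exponential Barrett's esophagus onset, exponential sojourn to cancer, Gompertz mortality; all parameters and names assumed) and uses the same weighted objective as our reading of the screening objective function; it is an illustration, not the procedure used to produce Table 1.

```python
import math

# Hypothetical toy model (not MSCE-EAC): exponential BE onset from age START,
# exponential BE-to-EAC sojourn, and Gompertz all-cause mortality.
NU, LAM, START = 0.004, 0.003, 15.0
GOMP_A, GOMP_B = 1e-4, 0.085


def be_cdf(t):
    return 1.0 - math.exp(-NU * max(t - START, 0.0))


def eac_cdf(t):
    s = max(t - START, 0.0)
    return 1.0 - (LAM * math.exp(-NU * s) - NU * math.exp(-LAM * s)) / (LAM - NU)


def alive(t):
    return math.exp(-(GOMP_A / GOMP_B) * (math.exp(GOMP_B * t) - 1.0))


def optimal_age(w, ages=range(30, 91)):
    """Whole-year age maximizing [P(BE) - w * P(EAC)] * P(alive)."""
    return max(ages, key=lambda t: (be_cdf(t) - w * eac_cdf(t)) * alive(t))


def smallest_tolerable_w(tolerance, weights):
    """First weight (in increasing order) whose optimal age keeps the risk
    of EAC before screening at or below `tolerance`; None if none does."""
    for w in sorted(weights):
        age = optimal_age(w)
        if eac_cdf(age) <= tolerance:
            return w, age, eac_cdf(age)
    return None


print(smallest_tolerable_w(0.01, [k / 2 for k in range(2, 41)]))
```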

| Risk group | Sex | Optimal age (95% CI) | SD | OD | PPV | Cancer risk |
|---|---|---|---|---|---|---|
| General | Male | 64 (63–66) | 99.5% | 2.4% | 16.8% | 0.17% |
| GERD | Male | 58 (56–60) | 99.6% | 7.1% | 28.6% | 0.37% |
| General | Female | 69 (68–71) | 100% | 0.6% | 9.6% | 0.04% |
| GERD | Female | 64 (59–67) | 100% | 1.9% | 17.9% | 0.11% |


Note: Model predictions for optimal screening ages and associated metrics for *w* = 1 (see Supplementary Methods 3.7 for derivations): successful diagnoses (SD) predictive of future esophageal adenocarcinoma cases up to fixed age 80, overdiagnoses (OD) of non-esophageal adenocarcinoma cases, positive predictive value (PPV), and the risk of cancer occurring before the screening age recommendations (see Supplementary Methods 3.4 for risk equation) for U.S. persons, all races, stratified by sex and GERD symptom status.

Abbreviations: GERD, gastroesophageal reflux disease; OD, overdiagnoses; PPV, positive predictive value; SD, successful diagnoses.

### Validation of model-predicted optimal ages

We computed Barrett's esophagus prevalence for the risk strata in Fig. 2 as predicted by our model (which was fit only to SEER incidence data) to compare with independent published data (see Materials and Methods; Fig. 3A and B). To account for a likely heterogeneous relative risk (RR) of developing Barrett's esophagus in GERD populations based on symptom onset age, Barrett's esophagus length, and other factors (14), we considered a range of fixed values (*RR* = 2–6; Fig. 3A, shaded areas) and found age-specific trends broadly consistent with overall Barrett's esophagus prevalence results in CORI (37). Furthermore, we confirmed a main result from Rubenstein and colleagues that women with GERD (Fig. 3B, green lines) were at no increased risk compared with men without GERD (Fig. 3A, blue lines); thus, if screening were considered, it should be recommended at the same age. In the pragmatic RCT BEST3 for Barrett's esophagus screening, of the 1,750 patients who underwent the screening procedure (8 patients with prior Barrett's esophagus were excluded), 127 in total were diagnosed with Barrett's esophagus by 12 months after enrollment (weighted overall average; 7.3% prevalence with no concurrent esophago-gastric cancer; ref. 38). Of the 1,654 patients with a successful procedure, 52% were women and the median age was 69 years. For comparison, our model predicted a Barrett's esophagus prevalence of 6.3% at age 69 for a GERD population with 52% women, assuming a fixed 5-fold RR of Barrett's esophagus in the GERD population [RR = 5 (39)]. Thus, the Barrett's esophagus prevalence found in BEST3 was slightly higher than our model's estimate; one reason for this, as the authors note, could be that the 1,750 participants who agreed to undergo the Cytosponge-TFF3 procedure might have had more problematic symptoms than the average patient with GERD (38).
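The BEST3 comparison reduces to simple arithmetic: the observed prevalence is 127/1,750, and a model prevalence for a cohort with 52% women is a sex-weighted mixture of sex-specific predictions. The sex-specific model values are not given in the text, so `mixed_prevalence` below is a hypothetical helper that takes them as inputs.

```python
def mixed_prevalence(p_female, p_male, frac_female=0.52):
    """Prevalence in a cohort with the given fraction of women (hypothetical helper)."""
    return frac_female * p_female + (1.0 - frac_female) * p_male


# Observed BEST3 prevalence from the counts quoted in the text.
observed = 127 / 1750
print(f"BEST3 observed prevalence: {observed:.1%}")  # BEST3 observed prevalence: 7.3%
```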

For previous guidelines, the age to screen has essentially been chosen *ad hoc* from prevalence data that are *conditional* on being cancer-free and alive and grouped into decade-wide age groups (37). Our model estimates for these conditional prevalences were consistent with these studies (14) but did not address optimality. We therefore computed the corresponding *joint* probability of having Barrett's esophagus while being cancer-free and alive (captured in the relevant objective functions to optimize; Fig. 2) at any age in the general population (Fig. 3C) and in the subpopulation with GERD symptoms (Fig. 3D). Importantly, we found that the predicted optimal screening ages for men and for women with GERD were each unchanged across a broad range of relative risks (*RR* = 2–6) of developing Barrett's esophagus (Fig. 3D). Thus, although we allow for heterogeneous RR of Barrett's esophagus in patients with GERD, resulting in a plausible range for Barrett's esophagus prevalence (Fig. 3B, shaded regions), the optimal age for screening these high-risk individuals was the same across that range.
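The distinction between the conditional prevalence that endoscopy studies report and the joint probability that is optimized can be seen in a toy example. The sketch below uses a hypothetical model (exponential Barrett's esophagus onset, exponential sojourn to cancer, Gompertz mortality; all parameters assumed) and assumes that cancer arises only from Barrett's esophagus and that other-cause death is independent, so survival cancels from the conditional quantity. In this toy, the conditional prevalence keeps rising with age, whereas the joint probability peaks at an interior age, which is the kind of optimum that prevalence curves alone do not reveal.

```python
import math

# Hypothetical toy model (not MSCE-EAC).
NU, LAM, START = 0.004, 0.003, 15.0
GOMP_A, GOMP_B = 1e-4, 0.085


def be_cdf(t):
    return 1.0 - math.exp(-NU * max(t - START, 0.0))


def eac_cdf(t):
    s = max(t - START, 0.0)
    return 1.0 - (LAM * math.exp(-NU * s) - NU * math.exp(-LAM * s)) / (LAM - NU)


def alive(t):
    return math.exp(-(GOMP_A / GOMP_B) * (math.exp(GOMP_B * t) - 1.0))


def conditional_prevalence(t):
    """P(BE by t | no EAC by t, alive at t): what cross-sectional data measure.
    Survival cancels because other-cause death is assumed independent."""
    return (be_cdf(t) - eac_cdf(t)) / (1.0 - eac_cdf(t))


def joint_window(t):
    """P(BE by t, no EAC by t, alive at t): the quantity being optimized."""
    return (be_cdf(t) - eac_cdf(t)) * alive(t)


ages = range(40, 91)
print("peak of joint probability:", max(ages, key=joint_window))
print("conditional prevalence at 60 and 80:",
      round(conditional_prevalence(60), 3), round(conditional_prevalence(80), 3))
```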

### Optimal screening ages can decrease oversurveillance and costs versus *ad hoc* guidelines

For men with symptomatic GERD (the highest-risk group tested), we next compared the costs associated with an initial screen at age 50 (current practice) versus our predicted optimal age of 58. For both screening ages, men with GERD had less than 0.4% risk of being screened too late (i.e., very few men who develop esophageal adenocarcinoma will do so before either screening age). Thus, the current cost burden of esophageal adenocarcinoma control is likely attributable more to years of futile surveillance than to costs associated with missed cases. We estimated the number of patient-years of oversurveillance (see Materials and Methods) of overdiagnosed Barrett's esophagus cases among *x* individuals surveilled up to a fixed age of 80. For screening at age 50 we computed 1.68*x* patient-years, and for screening at the optimal age of 58, 1.52*x* patient-years. This implies that if the roughly *x* = 400,000 at-risk U.S. men with GERD from the 1950 birth cohort were screened at these ages, screening at the optimal age of 58 would spend an estimated 64,000 fewer patient-years on futile surveillance for this birth cohort alone than screening at age 50. This saving comes from eliminating 8 years of examinations in a large number of middle-aged patients who will not progress to esophageal adenocarcinoma (40).
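Because the patient-year estimate is a per-person coefficient multiplied by cohort size, the saving scales linearly with the cohort. A minimal sketch, taking the coefficients and the cohort size directly from the text (the function name `oversurveillance` is illustrative):

```python
COHORT = 400_000  # approximate at-risk U.S. men with GERD, 1950 birth cohort (from the text)
# Patient-years of oversurveillance per screened man, up to age 80 (from the text)
PATIENT_YEARS_PER_MAN = {50: 1.68, 58: 1.52}


def oversurveillance(screen_age, cohort=COHORT):
    """Total patient-years of futile surveillance for a cohort screened at screen_age."""
    return PATIENT_YEARS_PER_MAN[screen_age] * cohort


saved = oversurveillance(50) - oversurveillance(58)   # about 64,000 patient-years
relative_reduction = saved / oversurveillance(50)     # about 0.095
print(round(saved), f"{relative_reduction:.1%}")
```

The relative reduction of about 9.5% is the figure that rounds to the 10% reduction in patient-years of needless surveillance reported in the Discussion.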

We also estimated the resulting number of futile endoscopies (esophagogastroduodenoscopies, EGD) in these overdiagnosed patients with Barrett's esophagus among *x* screened individuals (see Materials and Methods). In a previous cost-effectiveness analysis of current Barrett's esophagus surveillance, we estimated 0.4 endoscopies per patient with Barrett's esophagus per year after diagnosis (see ref. 18; Supplementary Table E2). Therefore, up to a fixed age of 80, we computed 0.67*x* EGDs for screening at age 50, compared with 0.61*x* EGDs for screening at the optimal age of 58. Thus, if *x* = 400,000 men with GERD from the 1950 birth cohort were screened, screening at the optimal age of 58 would lead to an estimated 24,000 fewer futile endoscopies than current practice of screening at 50. Assuming an EGD costs $745 USD (19), this implies savings of $17.9 million for this birth cohort alone when the optimal starting age is recommended instead of current practice. This is a lower bound on savings, because it does not include the costs of treating dysplastic lesions found during lifelong surveillance of each additional overdiagnosed case. The cost savings also accumulate as the recommendation is applied to birth cohorts beyond that of 1950.
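The endoscopy and dollar figures follow from the same linear scaling. The sketch below takes every coefficient directly from the text and also checks that the per-person EGD coefficients match 0.4 EGDs per patient-year applied to the patient-year coefficients, to two decimals (the function name `futile_egds` is illustrative):

```python
COHORT = 400_000                  # at-risk U.S. men with GERD, 1950 cohort (from the text)
EGD_COST_USD = 745                # cost per EGD (ref. 19)
EGD_PER_BE_PATIENT_YEAR = 0.4     # surveillance endoscopies per BE patient-year (ref. 18)
EGD_PER_MAN = {50: 0.67, 58: 0.61}            # futile EGDs per screened man, to age 80
PATIENT_YEARS_PER_MAN = {50: 1.68, 58: 1.52}  # from the oversurveillance estimate


def futile_egds(screen_age, cohort=COHORT):
    """Total futile EGDs in a cohort screened at screen_age."""
    return EGD_PER_MAN[screen_age] * cohort


egd_saved = futile_egds(50) - futile_egds(58)      # about 24,000 EGDs
dollars_saved = egd_saved * EGD_COST_USD           # about $17.9 million
relative_reduction = egd_saved / futile_egds(50)   # about 9%

# Consistency check: 0.4 EGDs per patient-year reproduces the 2-decimal coefficients.
implied = {age: round(EGD_PER_BE_PATIENT_YEAR * py, 2)
           for age, py in PATIENT_YEARS_PER_MAN.items()}
print(round(egd_saved), f"${dollars_saved / 1e6:.1f}M", f"{relative_reduction:.0%}", implied)
```

If the EGD coefficients are exactly 0.4 times the patient-year coefficients, the unrounded difference would be 0.4 × 64,000 = 25,600 EGDs; the 24,000 figure reflects the two-decimal values, so the dollar estimate is somewhat sensitive to that rounding.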

### Fewer than 3% of patients with a normal initial screen will be found with Barrett's esophagus on rescreen

When a patient screened for suspected Barrett's esophagus has a normal result, an important decision for clinicians is whether to suggest that the patient return for a rescreen at a later age and, if so, at what age. Current guidelines do not advise rescreening unless a patient is healing from esophagitis, in which case a repeat examination ensures that underlying Barrett's esophagus was not missed initially (16). This recommendation was based on a CORI study that found 2.3% of patients with an initial negative screen had Barrett's esophagus on repeat examination, implying that Barrett's esophagus is rarely missed and develops early in patients on surveillance (41). With our methods, we can quantify (i) the best age at which to offer a second screen, and (ii) how likely it is that Barrett's esophagus would be found at that age.

We optimized the age at which to rescreen such an individual, that is, the age at which he or she is most likely to have developed Barrett's esophagus but not yet clinical cancer. For a range of initial screening ages (45, 50, 58, 64, and 69), we computed the rescreening objective functions to maximize (Fig. 4). The black diamonds depict the optimal rescreening ages for the U.S. population, all races, born in 1950. For these initial screening ages in ascending order, the corresponding optimal follow-up screening ages were 73, 74, 76, 77, and 78 for all males (Fig. 4A); 72, 73, 75, 76, and 77 for GERD males (Fig. 4B); 77, 78, 79, 81, and 82 for all females (Fig. 4C); and 76, 77, 79, 80, and 81 for GERD females (Fig. 4D). Thus, even though initial screening ages may be over 20 years apart, the optimal follow-up screening ages within each group spanned only a 5-year window.
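The rescreening objective can be sketched in the same spirit as a toy model. Here a negative first screen at age `a0` is taken to mean the patient is free of Barrett's esophagus (and hence, in this toy, of cancer) at `a0`; the sketch then maximizes the probability that Barrett's esophagus arises after `a0` but cancer has not appeared by the rescreen age, among patients still alive. All rates are hypothetical, the Barrett's esophagus onset hazard is assumed to rise linearly with age, and other-cause death is assumed independent. Because the onset hazard depends on age, the yield depends on the initial age `a0` and not only on the interval since the first screen; with a constant (memoryless) onset rate, the yield before accounting for mortality would depend only on the interval.

```python
import math

# Hypothetical toy parameters (per year); NOT the calibrated model.
C = 4e-5                       # BE onset hazard rises linearly with age: h(s) = C * s
MU = 0.02                      # constant BE-to-cancer progression rate
GOMP_A, GOMP_B = 5e-5, 0.085   # hypothetical Gompertz other-cause mortality


def surv_death(t):
    """Probability of surviving other causes of death to age t."""
    return math.exp(-(GOMP_A / GOMP_B) * (math.exp(GOMP_B * t) - 1.0))


def rescreen_success(t, a0, n=200):
    """P(BE arises in (a0, t], no cancer by t, alive at t | BE-free and alive at a0)."""
    if t <= a0:
        return 0.0
    h = (t - a0) / n
    total = 0.0
    for i in range(n):                      # midpoint rule over BE onset age s
        s = a0 + (i + 0.5) * h
        onset_density = C * s * math.exp(-C * (s * s - a0 * a0) / 2.0)
        total += onset_density * math.exp(-MU * (t - s)) * h
    return total * surv_death(t) / surv_death(a0)


def optimal_rescreen_age(a0, step=0.25, max_age=100.0):
    """Grid search for the rescreen age that maximizes rescreen_success."""
    candidates = [a0 + k * step for k in range(1, int((max_age - a0) / step) + 1)]
    return max(candidates, key=lambda t: rescreen_success(t, a0))


optima = {a0: optimal_rescreen_age(a0) for a0 in (45, 50, 58, 64, 69)}
for a0, t_star in optima.items():
    print(a0, t_star, round(rescreen_success(t_star, a0), 4))
```

A grid search is used for transparency; the outputs are illustrative only and are not intended to reproduce Fig. 4.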

Our estimated probabilities of Barrett's esophagus yield on rescreen were very close to those found for rescreening both sexes in CORI (41), further validating our predictions of screening outcomes and yields based on continuous age. Notably, we found that the initial age of screening (rather than time since initial screening) affects the future probability of Barrett's esophagus yield; this information is not currently considered when clinicians decide whether certain patients with GERD should return to be rescreened. For example, if initial screening of GERD males took place at age 58, we found a 1% chance of a positive rescreen, compared with 2%–3% when screening first at age 50. Moreover, even this 1% would mostly be overdiagnoses, because *de novo* Barrett's esophagus is much less likely to have time to progress to esophageal adenocarcinoma within the patient's remaining lifespan, which further reduces the value of a rescreen after an initial negative result.

### Accurate cancer risk prediction depends on premalignant molecular age

When a patient has a positive Barrett's esophagus screen but no neoplasia (HGD/malignancy) at the first screening age, current guidelines suggest that the patient return for surveillance every 3–5 years (16, 17), because such patients still carry a risk of esophageal adenocarcinoma that may increase with time even while they remain without a diagnosis of dysplasia (42). Furthermore, non-dysplastic patients with Barrett's esophagus with “missed” HGD/esophageal adenocarcinoma were significantly older than those who later progressed (43). This suggests that age is a significant risk factor for missed esophageal adenocarcinoma, plausibly because older patients have unknowingly harbored Barrett's esophagus for longer than younger patients, and it underscores the need to survey such patients. This raises the question of whether patient-specific Barrett's esophagus “dwell time” influences future neoplastic risk.

If the Barrett's esophagus onset age could be measured for a patient diagnosed some years after onset [e.g., via methylation-based molecular clocks (44)], the model could predict his/her associated esophageal adenocarcinoma risk up to a given next surveillance age (see Supplementary Methods 3.10). We used the model to predict future esophageal adenocarcinoma risk up to age 85 after prior screening at ages 45, 50, 58, and 64 for males (Fig. 5A), and 45, 50, 64, and 69 for females (Fig. 5B). In general, the model predicted greater esophageal adenocarcinoma risk for patients who developed Barrett's esophagus earlier in life than for those who developed it later. For the screen performed at age 50 (dashed lines) in both sexes, the patient who developed Barrett's esophagus at age 20 (purple line) has a predicted esophageal adenocarcinoma risk 2.4 times that of the patient who developed Barrett's esophagus at age 40 (yellow line). In current clinical practice, however, these two patients would likely be managed identically because they were diagnosed at the same age. Our findings reiterate that knowledge of Barrett's esophagus onset age may translate into large differences in esophageal adenocarcinoma risk (44).
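Why onset age matters can be seen even in a toy dwell-time model. The sketch below assumes, purely for illustration, that cancer requires two further exponential steps after Barrett's esophagus onset (so the time from onset to cancer is Erlang-2 with a hypothetical rate `LAM`), ignores other-cause death, and conditions on no cancer at the screening age. A patient with earlier onset is more likely to have already completed the first hidden step, so the conditional future risk is higher; in a single-step (memoryless) model, onset age would drop out entirely. This is not our calibrated multistage model, and its numbers are not meant to reproduce Fig. 5 or the 2.4-fold ratio.

```python
import math

LAM = 0.01  # hypothetical per-step rate (per year) for each of two post-BE steps


def erlang2_cdf(d, lam=LAM):
    """P(cancer within d years of BE onset) when two Exp(lam) steps are needed."""
    return 1.0 - math.exp(-lam * d) * (1.0 + lam * d)


def future_risk(onset_age, screen_age, horizon_age, lam=LAM):
    """P(cancer by horizon_age | BE onset at onset_age, no cancer at screen_age)."""
    f_screen = erlang2_cdf(screen_age - onset_age, lam)
    f_horizon = erlang2_cdf(horizon_age - onset_age, lam)
    return (f_horizon - f_screen) / (1.0 - f_screen)


def future_risk_memoryless(onset_age, screen_age, horizon_age, lam=LAM):
    """Single Exp(lam) step: onset_age drops out because of memorylessness."""
    return 1.0 - math.exp(-lam * (horizon_age - screen_age))


early = future_risk(onset_age=20, screen_age=50, horizon_age=85)
late = future_risk(onset_age=40, screen_age=50, horizon_age=85)
print(f"toy risk ratio, onset 20 vs 40: {early / late:.2f}")
```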

### Sensitivity of predictions

Our multistage model was calibrated to population incidence data (35), and the effects of perturbing its parameters (Supplementary Table S1) have been quantified previously (see details in refs. 25, 45, 46). Here, we similarly found that our optimal timing predictions are robust to parameter perturbations (Table 1), as assessed in a bootstrap analysis of 1,000 resampling iterations of the Markov chain Monte Carlo posterior distributions for each calibrated parameter (Supplementary Fig. S1A).
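A bootstrap over posterior draws can be sketched as follows: resample parameter sets with replacement from posterior samples, recompute the optimal screening age for each, and summarize the spread. In this sketch the "posterior" is a synthetic stand-in generated from hypothetical lognormal perturbations of the rates of a two-step toy window model with Gompertz mortality; in practice the draws would come from the Markov chain Monte Carlo output of the calibrated model.

```python
import math
import random
import statistics

GOMP_A, GOMP_B = 5e-5, 0.085  # hypothetical Gompertz mortality, held fixed


def objective(t, nu, mu):
    """S(t) * P(BE present, no cancer at t) in a toy two-step model (nu != mu)."""
    window = nu * (math.exp(-nu * t) - math.exp(-mu * t)) / (mu - nu)
    survival = math.exp(-(GOMP_A / GOMP_B) * (math.exp(GOMP_B * t) - 1.0))
    return survival * window


def optimal_age(nu, mu, step=0.25, max_age=100.0):
    """Grid search for the toy optimal screening age."""
    ages = [k * step for k in range(1, int(max_age / step) + 1)]
    return max(ages, key=lambda t: objective(t, nu, mu))


rng = random.Random(2021)
# Synthetic stand-in for MCMC posterior samples of (nu, mu); NOT real model output.
posterior = [(0.01 * math.exp(rng.gauss(0.0, 0.1)), 0.02 * math.exp(rng.gauss(0.0, 0.1)))
             for _ in range(5000)]

# 1,000 bootstrap iterations: resample a parameter set, recompute the optimum.
optima = [optimal_age(*rng.choice(posterior)) for _ in range(1000)]
cuts = statistics.quantiles(optima, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
print(f"2.5%: {cuts[0]}, median: {statistics.median(optima)}, 97.5%: {cuts[-1]}")
```

A narrow interval between the 2.5% and 97.5% cut points is what "robust to perturbations" means operationally in this kind of analysis.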

We also found that screening optimization predictions are robust across modern birth cohorts spanning 50 years (Supplementary Fig. S1B). Considering five additional modern birth cohorts beyond the 1950 base case, we found that (i) optimal screening ages remain unchanged for males (GERD and general populations) born between 1950 and 2000, and (ii) results for females were similar to the base case: optimal screening ages varied between these cohorts by only 3 years in females with GERD and 2 years in the general female population (Supplementary Fig. S1B). This is consistent with the CORI finding that age-specific Barrett's esophagus prevalence at index endoscopy showed only a small secular trend in white male GERD screenees, and no trend in women, over the calendar years 2000–2006 (37).

## Discussion

Mathematical models of cancer evolution are being widely developed to understand the timing of events and the dynamics of carcinogenesis. In this study, we applied such models to optimize screening and surveillance. Examples that could be applied within our framework include Markov models for the natural history of disease (30, 47, 48), biologically based models that incorporate dynamic processes (21, 24, 25, 34, 49, 50), and biological event timing models that infer ordered genetic events (51). Our mathematical formalism provides one means of targeting distinct latency periods, or “windows of opportunity,” with the weight *w* for the adverse event specified to suit the purpose of the health policy goal.

Unlike previous studies on screening that incorporated microsimulation of clonal evolution (21, 34), this study provides an analytical framework that does not require simulation to determine the specific age to screen (and whether to screen) for cancer precursors. Microsimulation models have previously been used to inform policy decisions on screening, but the optimal design framework developed here offers computational and theoretical advantages that can complement and enhance simulation methods. In particular, microsimulations of many hypothetical scenarios can be time-intensive, sensitive to parameter choices, and ultimately able to test only a finite number of options (52). In contrast, we showed that successful screening probabilities can be solved analytically in certain cases and that these equations (i) are straightforward to optimize in a single computation, (ii) reveal the model assumptions with the greatest implications for the sensitivity of the results, and (iii) predict future trends, yielding results for younger birth cohorts in a data-driven way.
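The computational contrast can be seen in a toy two-step model with hypothetical constant rates and no other-cause death: the window probability has a closed form whose maximizer takes one line, whereas a microsimulation must sample many individual histories for each candidate age and still carries Monte Carlo error. This is an illustration under those assumptions, not a comparison with any published microsimulation model.

```python
import math
import random

NU, MU = 0.01, 0.02   # hypothetical constant rates (per year), no other-cause death


def window_closed_form(t):
    """P(T1 <= t < T2) for T1 ~ Exp(NU) and T2 = T1 + Exp(MU)."""
    return NU * (math.exp(-NU * t) - math.exp(-MU * t)) / (MU - NU)


def window_monte_carlo(t, n=200_000, seed=1):
    """Microsimulation estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        t1 = rng.expovariate(NU)          # simulated BE onset age
        t2 = t1 + rng.expovariate(MU)     # simulated cancer age
        hits += t1 <= t < t2
    return hits / n


exact = window_closed_form(60.0)
approx = window_monte_carlo(60.0)
t_star = math.log(MU / NU) / (MU - NU)  # analytic maximizer, one computation
print(f"closed form {exact:.4f}, simulation {approx:.4f}, optimum {t_star:.1f}")
```

Setting the derivative of the closed form to zero gives NU·exp(−NU·t) = MU·exp(−MU·t), hence t* = ln(MU/NU)/(MU − NU); the simulation would need to be rerun over a grid of ages to locate the same point.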

In the case of Barrett's esophagus, screening has been recommended to begin in patients with multiple risk factors, including increasing age (specifically, a threshold of 50 years), but the clinical evidence for these recommendations has been of low to moderate quality (15). An appropriate RCT to test proposed screening ages is not practical, so cost-effectiveness studies have mainly relied on *in silico* evaluations of specific screening scenarios (18). Whereas these models have tested a small number of screening cohorts, we considered the full range of screening ages and found that the optimal ages to begin screening patients with symptomatic GERD were 58 for men and 64 for women. Importantly, the corresponding Barrett's esophagus prevalence estimates predicted by the model align with population studies (14), and here we specifically validated these estimates against both the data used in guidelines to set the starting age (37) and recent data from the BEST3 RCT (38). The optimal age predictions remain within the cost-effective ranges found for men with GERD in previous modeling analyses that compared screening ages 50, 60, and 70 (19, 53), particularly when minimally invasive, cheaper screening modalities are used in both men and women (19). We found that an earlier screening age of 50 was more costly to healthcare for men with GERD because of overdiagnoses, whereas screening at the later age of 58 would be expected to reduce total future Barrett's esophagus surveillance endoscopies by 9% (and total patient-years of needless surveillance by 10%). Although incidental esophageal adenocarcinoma cases between ages 50 and 58 are rare compared with those at ages 59 and older, the costs of underdiagnosing Barrett's esophagus that would progress to esophageal adenocarcinoma in this 8-year range could also be considered; they would, however, be counterbalanced by overtreatment of small indolent lesions detected on surveillance.

For the general population, we found that the optimal ages to recommend screening were 64 for men and 69 for women (with equal weighting given to Barrett's esophagus onset and esophageal adenocarcinoma risk in the optimization). These ages of optimal Barrett's esophagus yield are supported by prospective studies showing that Barrett's esophagus is more likely to be found during screening at ages closer to 60 than 50 (54); in particular, two large studies found the same mean age at Barrett's esophagus diagnosis, 61 years, in men (55) and in both sexes together (56). Finally, women found to have Barrett's esophagus were more likely than women without it to be older, aged 61 to 70 at diagnosis (57, 58). Taken together, these studies support our finding that it may be biologically justified to begin screening at these later ages. These proposed later ages to begin screening in the GERD and general populations could be specifically tested in future cost-effectiveness analyses.

Our model describes a number of key transitions in cell biology, each defined by a specific rate parameter. Direct measurements of these parameters are usually lacking, and indeed there remains uncertainty about which rate-limiting transitions in cancer evolution are essential. Nevertheless, we estimated a tight range for optimal screening ages in our sensitivity analysis with varying cell-level rate inputs, and identical optimal ages for patients with GERD when assuming a wide range of published estimates for the RR of developing Barrett's esophagus in this high-risk group. With respect to effective surveillance, we found that the molecular age of precursors (beyond patient chronological age at the time of screening) was a critical variable for accurate cancer risk prediction given a screening outcome (59). Because the framework is predicated on a biological description of the natural history of cancer, its inferences are predictive and will need to be validated prospectively, even though the underlying mechanistic model framework was validated using available screening data. In particular, additional data on age-specific Barrett's esophagus prevalence in GERD populations are needed to strengthen model validation. This approach complements the development of adaptive treatment strategies and dynamic treatment regimens that rely on large amounts of patient-specific health records and machine learning (60). Although these new developments are likely to yield important insights given the wealth of clinical records for common diseases, they are difficult to extend to rare or understudied diseases and are therefore prone to biases. In contrast, mechanistic models borrow strength from incidence patterns and are readily interpretable and applicable for balancing risks against benefits when there is an intention to treat or intervene.

While it is difficult to justify delaying screening from a public health perspective, the problem of overdiagnosis is only becoming larger as fervor for early detection assays continues to grow. Our study's predictions are reasonable considering the scarcity of epidemiologic data that support earlier screening. We show that using a mechanism-based model of cancer could help relieve the costs of unnecessary screening and surveillance, though further validation of associated risks and sensitivities is needed. Overall, such approaches are currently underutilized in public health policies related to cancer screening. Here we demonstrate the potential to inform guidelines through deeper mathematical examination of data for cancers where screening can be lifesaving. As “learning cancer screening programs” are being proposed to test many potential strategies in populations via randomized screening arms (20), we suggest that incorporating a model-based approach from the outset could aid in an optimal design plan (such as the choice of screening age in certain risk strata) to significantly improve health outcomes at acceptable costs.

## Authors' Disclosures

K. Curtius reports grants from UKRI and NIH/NCI during the conduct of the study. No disclosures were reported by the other authors.

## Authors' Contributions

**K. Curtius:** Conceptualization, data curation, software, formal analysis, funding acquisition, investigation, visualization, methodology, writing-original draft, writing-review and editing. **A. Dewanji:** Conceptualization, formal analysis, investigation, methodology, writing-original draft, writing-review and editing. **W.D. Hazelton:** Software, investigation, methodology, writing-original draft, writing-review and editing. **J.H. Rubenstein:** Resources, data curation, investigation, visualization, writing-review and editing. **G.E. Luebeck:** Software, supervision, funding acquisition, investigation, methodology, writing-original draft, project administration, writing-review and editing.

## Acknowledgments

This research was supported by the NCI (http://www.cancer.gov) under grants U01 CA152926, U01 CA199336 (CISNET; to K. Curtius, G.E. Luebeck, W.D. Hazelton, and J.H. Rubenstein), U01 CA182940 (BG-U01; to G.E. Luebeck and W.D. Hazelton), U54 CA163059 (BETRNET; to J.H. Rubenstein), and a UKRI Rutherford Fund Fellowship (to K. Curtius). The authors thank Trevor Graham for helpful discussions during the writing of this article. The authors also gratefully acknowledge the National Endoscopic Database of the Clinical Outcomes Research Initiative (CORI) for raw data on Barrett's esophagus prevalence.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.