## Abstract

Models can help guide colorectal cancer screening policy. Although models are carefully calibrated and validated, there is less scrutiny of assumptions about test performance.

We examined the validity of the CRC-SPIN model and colonoscopy sensitivity assumptions. Standard sensitivity assumptions, consistent with published decision analyses, set sensitivity equal to 0.75 for diminutive adenomas (<6 mm), 0.85 for small adenomas (6–10 mm), 0.95 for large adenomas (≥10 mm), and 0.95 for preclinical cancer. We also used a grid search to select adenoma sensitivities that resulted in more accurate predictions. Targets were drawn from the Wheat Bran Fiber study. We examined how well the model predicted outcomes measured over a three-year follow-up period, including the number of adenomas detected, the size of the largest adenoma detected, and incident colorectal cancer.

Using standard sensitivity assumptions, the model predicted adenoma prevalence that was too low (42.5% versus 48.0% observed, with 95% confidence interval 45.3%–50.7%) and detection of too few large adenomas (5.1% versus 14.6% observed, with 95% confidence interval 11.8%–17.4%). Predictions were close to targets when we set sensitivities to 0.20 for diminutive adenomas, 0.60 for small adenomas, 0.80 for 10- to 20-mm adenomas, and 0.98 for adenomas 20 mm and larger.

Colonoscopy may be less accurate than currently assumed, especially for diminutive adenomas. Alternatively, the CRC-SPIN model may not accurately simulate onset and progression of adenomas in higher-risk populations.

Misspecification of either colonoscopy sensitivity or disease progression in high-risk populations may affect the predicted effectiveness of colorectal cancer screening. When possible, decision analyses used to inform policy should address these uncertainties.

*See related commentary by Etzioni and Lange, p. 702*

This article is featured in Highlights of This Issue, p. 693

## Introduction

Colorectal cancer is the third leading cause of cancer death in the United States (1). There is great potential for screening to reduce the burden of colorectal cancer through earlier detection and prevention via the removal of precursor lesions at colonoscopy. Both colorectal cancer mortality (2) and incidence (3) have declined in the United States since the introduction of colorectal cancer screening over 40 years ago (4) and the release of screening guidelines over 15 years ago (5).

The Colorectal Cancer Simulated Population model for Incidence and Natural history (CRC-SPIN) was one of three models recently used to estimate the comparative effectiveness of different colorectal cancer screening regimens (6, 7). CRC-SPIN is a decision-analytic model that is used to inform policy by predicting outcomes under different scenarios, for example, varying the ages to begin and end screening, screening intervals, and screening modalities. The model needs to be calibrated before it can be used to inform decisions, meaning that any unknown parameters are selected so that the model accurately predicts targets. Although calibration is important, we can have more confidence in a model that has also been externally validated and found to accurately predict outcomes that have not been used for model calibration (8–10).

Because models are complex, great effort goes into demonstrating their face validity and carrying out calibration and validation to demonstrate internal and external validity. However, test performance, which is a key part of screening effectiveness, is often assumed to be fixed and known. Yet there is considerable uncertainty about colonoscopy sensitivity to detect diminutive adenomas (those smaller than 6 mm; ref. 11). Meta-analyses that describe colonoscopy sensitivity do not report sensitivity for adenomas smaller than 6 mm, and the reported ranges for larger adenomas are wide: from 0.63 to 0.96 for small adenomas (between 6 and 10 mm), and from 0.70 to 0.99 for adenomas 10 mm and larger (12). Tandem exams, which involve same-day independent examination by two gastroenterologists, are the gold standard for estimating colonoscopy sensitivity; they find sensitivity ranging from 0.65 to 0.73 for diminutive adenomas, 0.82 to 0.92 for small adenomas, and 0.927 to 0.997 for large adenomas (≥10 mm; ref. 13). However, these studies may overestimate sensitivity because lesions missed by both examiners are ignored. For example, to estimate colonoscopy miss rates, Rex and colleagues specified that “[a]ny polyp identified during the second [exam] was scored as a missed polyp by the first examination” (14). The potential for overestimation is greatest for diminutive adenomas, because they have a greater chance of being missed by both gastroenterologists. In addition, these estimates may not reflect performance in community settings because participating gastroenterologists knew they were being evaluated. Although we cannot directly observe colonoscopy sensitivity in community settings, the adenoma detection rate (ADR) is a measure of colonoscopy quality and a proxy for colonoscopy sensitivity. It is well known that ADR varies across endoscopists, and there are interventions designed to improve ADR (15, 16).
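To make the overestimation argument concrete, the sketch below adopts a simplification not made in the text: two examiners miss lesions independently with the same true per-lesion sensitivity $s$. A tandem design can only count lesions seen by at least one examiner, so the sensitivity it reports for the first exam is $s / (1 - (1 - s)^2) = 1 / (2 - s)$, which exceeds $s$ for every $0 < s < 1$. Positively correlated misses (for example, lesions hidden behind folds) would shrink the denominator further and widen the gap.

```python
def apparent_tandem_sensitivity(true_sens: float) -> float:
    """Per-lesion sensitivity a tandem study would report for the first exam.

    Illustrative assumption: both exams miss lesions independently with the
    same true sensitivity. Lesions missed by both exams never enter the
    denominator, which is P(seen by at least one exam) = 1 - (1 - s) ** 2.
    """
    if not 0.0 < true_sens <= 1.0:
        raise ValueError("true_sens must be in (0, 1]")
    seen_by_either = 1.0 - (1.0 - true_sens) ** 2
    return true_sens / seen_by_either  # algebraically equal to 1 / (2 - s)


if __name__ == "__main__":
    for s in (0.20, 0.50, 0.75, 0.95):
        print(f"true {s:.2f} -> apparent {apparent_tandem_sensitivity(s):.3f}")
```

Under these assumptions, a true sensitivity of 0.20 would appear as roughly 0.56 in a tandem study, so even very low true sensitivity can look moderate.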

We demonstrate the external validation of CRC-SPIN version 2.0 and evaluate assumptions about colonoscopy sensitivity that are integral to model-based estimates of screening effectiveness.

## Materials and Methods

We used the CRC-SPIN model to predict outcomes from the Wheat Bran Fiber (WBF) trial.

### Microsimulation model

The CRC-SPIN model was developed in 2009 (17), updated in 2019 (18), and recalibrated to improve prediction of screen-detected colorectal cancer. The model is based on the adenoma–carcinoma sequence (19, 20) and has four components (outlined in Table 1): (i) adenoma risk and accumulation; (ii) adenoma growth; (iii) adenoma transition to asymptomatic (preclinical) cancer; and (iv) transition from preclinical to clinically detected cancer.

| Component | Parameters | Specification |
|---|---|---|
| Adenoma risk: nonhomogeneous Poisson process with risk $\psi_i(a)$ | Seven parameters: $A, \sigma_\alpha, \alpha_1, \alpha_{20}, \alpha_{50}, \alpha_{60}, \alpha_{70}$ | $\ln(\psi_i(a)) = \alpha_{0i} + \alpha_1\,\mathrm{sex}_i + \delta(a \ge 20)\min(a - 20, 30)\,\alpha_{20} + \delta(a \ge 50)\min(a - 50, 10)\,\alpha_{50} + \delta(a \ge 60)\min(a - 60, 10)\,\alpha_{60} + \delta(a \ge 70)(a - 70)\,\alpha_{70}$, where $\delta(x) = 1$ when $x$ is true and 0 otherwise, and $\alpha_{0i} \sim N(A, \sigma_\alpha)$ captures random between-individual variation in risk |
| Adenoma growth: Richards growth model | One parameter: $p$ | $d_{ij}(t) = d_\infty\left[1 + \left(\left(\frac{d_0}{d_\infty}\right)^{1/p} - 1\right)\exp(-\lambda_{ij} t)\right]^p$ is the size (diameter) of the $j$th adenoma in the $i$th simulated person at time $t$, where $\lambda_{ij}$ is the growth rate for that adenoma, based on $t_{10}$; $d_0 = 1$ mm and $d_\infty = 50$ mm |
| Time to reach 10 mm: Fréchet distribution | Four parameters: $\beta_{1c}$ and $\beta_{2c}$ (adenomas in the colon), $\beta_{1r}$ and $\beta_{2r}$ (adenomas in the rectum) | $t_{10}$ is the time from adenoma initiation at 1 mm to reach 10 mm, with $P(t_{10} \le t) = F(t) = \exp\left[-\left(\frac{t}{\beta_2}\right)^{-\beta_1}\right]$ |
| Size at transition to preclinical cancer: log-normal distribution | Seven parameters: $\gamma_0, \gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5$, and $\sigma_\gamma$ | $s_{ij}$ is the size (diameter) of the $j$th adenoma in the $i$th simulated person at the time of transition to preclinical cancer, with $\log(s_{ij}) \sim N(\mu_{ij}, \sigma_\gamma)$ and expected log-size $\mu_{ij} = \gamma_0 + \gamma_1\,\mathrm{sex}_i + \gamma_2\,\mathrm{rectum}_{ij} + \gamma_3\,\mathrm{sex}_i\,\mathrm{rectum}_{ij} + \gamma_4\,\mathrm{age}_{ij} + \gamma_5\,\mathrm{age}_{ij}^2$, where $\mathrm{age}_{ij}$ is the age at adenoma initiation in decades, centered at 50 years, and $\mathrm{rectum}_{ij}$ indicates rectal location |
| Sojourn time: Weibull distribution with proportional hazards | Three parameters: $\lambda_1, \lambda_2, \lambda_3$ | $T_{ij}$ is the time from preclinical transition to symptomatic presentation for the $j$th adenoma in the $i$th simulated person; it follows a Weibull distribution with scale parameter $\lambda_1\left[\exp(\lambda_3\,\mathrm{rectum}_{ij})\right]^{-1/\lambda_2}$ and shape parameter $\lambda_2$, where $\mathrm{rectum}_{ij}$ indicates rectal location |


#### Adenoma risk

For each individual, adenomas are initiated using a nonhomogeneous Poisson process that allows simulation of multiple adenomas over each individual's lifespan. Adenoma risk varies systematically by gender and age and randomly across individuals. Between-individual variation in adenoma risk induces clustering of adenomas within individuals, with some individuals more likely to develop adenomas than others. Given individual risk, adenomas are simulated independently within each individual, and each is independently assigned a location using a multinomial distribution based on findings from autopsy and colonoscopy studies (17). The distribution across six possible sites of the large intestine (from proximal to distal) is: 8% cecum; 23% ascending colon; 24% transverse colon; 12% descending colon; 24% sigmoid colon; 9% rectum.
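A minimal sketch of this component follows, assuming hypothetical placeholder parameter values (not the calibrated CRC-SPIN 2.0 posterior) and a 0/1 sex indicator whose coding is our assumption. It evaluates the log-risk from Table 1, draws the individual random effect $\alpha_{0i}$, and approximates the nonhomogeneous Poisson process by holding the rate constant within each one-year age interval; each adenoma's site is drawn from the six-site distribution given above.

```python
import math
import random

# Hypothetical placeholder values, NOT the calibrated CRC-SPIN 2.0 parameters.
PARAMS = {
    "A": -6.0, "sigma_alpha": 1.0, "alpha_1": -0.3,
    "alpha_20": 0.03, "alpha_50": 0.02, "alpha_60": 0.01, "alpha_70": -0.01,
}
# Site distribution from the text, ordered proximal to distal.
SITES = ["cecum", "ascending", "transverse", "descending", "sigmoid", "rectum"]
SITE_PROBS = [0.08, 0.23, 0.24, 0.12, 0.24, 0.09]


def log_risk(age, sex, alpha_0i, p=PARAMS):
    """ln(psi_i(a)) with the piecewise-linear age terms of Table 1.

    sex is a 0/1 indicator; which sex is coded 1 is an assumption here.
    """
    lr = alpha_0i + p["alpha_1"] * sex
    if age >= 20:
        lr += min(age - 20, 30) * p["alpha_20"]
    if age >= 50:
        lr += min(age - 50, 10) * p["alpha_50"]
    if age >= 60:
        lr += min(age - 60, 10) * p["alpha_60"]
    if age >= 70:
        lr += (age - 70) * p["alpha_70"]
    return lr


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small yearly rates here."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_adenomas(sex, death_age, rng, p=PARAMS):
    """Return (onset_age, site) pairs for one person.

    The rate is held constant within each one-year age interval; given the
    count in an interval, onset times are uniform within that interval.
    """
    alpha_0i = rng.gauss(p["A"], p["sigma_alpha"])  # between-person variation
    adenomas = []
    for year in range(int(death_age)):
        rate = math.exp(log_risk(year + 0.5, sex, alpha_0i, p))
        for _ in range(poisson_draw(rate, rng)):
            onset = year + rng.random()
            site = rng.choices(SITES, weights=SITE_PROBS)[0]
            adenomas.append((onset, site))
    return adenomas


if __name__ == "__main__":
    print(simulate_adenomas(sex=1, death_age=80, rng=random.Random(2024)))
```

Because all adenomas in a person share the same $\alpha_{0i}$, people with a high draw accumulate several adenomas, which is the clustering described above.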

#### Adenoma growth

Adenomas are initiated at 1 mm in diameter and may grow up to 50 mm. Adenoma size at any point after initiation is calculated using a Richards growth curve model (21). Rather than directly simulating the growth rate for each adenoma, the model simulates the time it takes for an adenoma to reach 10 mm in diameter, $t_{10}$, using an inverse Weibull (Fréchet) distribution. Given the growth function, the growth rate is a simple function of $t_{10}$. The model simulates $t_{10}$ independently for each adenoma and allows $t_{10}$ to vary systematically by location to capture differences in the proportions of colorectal cancers and adenomas located in the rectum: although only 9% of adenomas are located in the rectum, one third to one half of colorectal cancers are located there (22). The time to reach 10 mm is independent of other characteristics, including age, sex, and individual-level adenoma risk.
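The link between $t_{10}$ and the growth rate can be made explicit by setting $d_{ij}(t_{10}) = 10$ mm in the Richards curve of Table 1 and solving, which gives $\lambda_{ij} = -\ln\left[\frac{(10/d_\infty)^{1/p} - 1}{(d_0/d_\infty)^{1/p} - 1}\right] / t_{10}$. The sketch below draws $t_{10}$ by inverting the Fréchet CDF and then applies that formula. The values of $\beta_1$, $\beta_2$, and $p$ in the usage lines are hypothetical, and the caller is assumed to pass the colon or rectum pair as appropriate; any $p \ne 0$ works.

```python
import math
import random

D0, D_INF = 1.0, 50.0  # initiation and maximum diameter in mm (Table 1)


def sample_t10(beta1, beta2, rng):
    """Years to reach 10 mm, drawn from a Frechet (inverse Weibull) distribution
    F(t) = exp[-(t / beta2) ** (-beta1)] by inverting the CDF."""
    u = rng.random()
    while u == 0.0:  # log(0) is undefined
        u = rng.random()
    return beta2 * (-math.log(u)) ** (-1.0 / beta1)


def growth_rate(t10, p):
    """Solve d(t10) = 10 mm for lambda in the Richards curve."""
    num = (10.0 / D_INF) ** (1.0 / p) - 1.0
    den = (D0 / D_INF) ** (1.0 / p) - 1.0
    return -math.log(num / den) / t10


def size_at(t, lam, p):
    """Richards growth curve from Table 1: diameter (mm) t years after initiation."""
    inner = 1.0 + ((D0 / D_INF) ** (1.0 / p) - 1.0) * math.exp(-lam * t)
    return D_INF * inner ** p


if __name__ == "__main__":
    rng = random.Random(7)
    # beta1, beta2 and p are hypothetical, not calibrated values.
    t10 = sample_t10(beta1=2.5, beta2=20.0, rng=rng)
    lam = growth_rate(t10, p=0.5)
    print(round(t10, 1), [round(size_at(t, lam, 0.5), 1) for t in (0, t10, 2 * t10)])
```

By construction, every simulated adenoma starts at 1 mm, passes 10 mm exactly at its sampled $t_{10}$, and approaches 50 mm asymptotically.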

#### Transition from adenoma to invasive cancer

The time from adenoma initiation to preclinical disease is driven by the model for adenoma growth and the size at an adenoma's transition to preclinical cancer. The cumulative probability that an adenoma transitions to preclinical cancer is simulated using a log-normal distribution for the size at transition, in mm; the mean of the log size at transition is a function of location (colon or rectum), sex, and age at initiation. Adenomas do not transition to preclinical cancer if their simulated size at transition is greater than the maximum adenoma size or if the individual dies before the adenoma reaches transition size.
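The sketch below follows the description above under a few stated assumptions: $\sigma_\gamma$ is read as a standard deviation, sex and rectum are 0/1 indicators, and a sampled transition size below the 1 mm initiation size is treated as transition at initiation (the text does not say how that case is handled). It computes $\mu_{ij}$ from Table 1, the probability that an adenoma has transitioned by the time it reaches a given size, and the time to transition obtained by inverting the Richards curve. Comparing the returned time with the simulated age at death is left to the caller.

```python
import math
import random

D0, D_INF = 1.0, 50.0  # mm


def mean_log_size(gamma, sex, rectum, age_at_init):
    """mu_ij from Table 1; age enters in decades, centred at 50 years."""
    g0, g1, g2, g3, g4, g5 = gamma
    age = (age_at_init - 50.0) / 10.0
    return (g0 + g1 * sex + g2 * rectum + g3 * sex * rectum
            + g4 * age + g5 * age ** 2)


def prob_transitioned_by(d, mu, sigma):
    """P(S <= d) when log(S) ~ N(mu, sigma), sigma read as a standard deviation."""
    z = (math.log(d) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def years_to_transition(mu, sigma, lam, p, rng):
    """Years from initiation to preclinical cancer, or None if the sampled size
    at transition is at least the 50 mm maximum (no transition)."""
    s = math.exp(rng.gauss(mu, sigma))
    if s >= D_INF:
        return None
    s = max(s, D0)  # assumption: sizes below 1 mm transition at initiation
    # Invert the Richards curve d(t) = s for t (same curve as Table 1).
    ratio = ((s / D_INF) ** (1.0 / p) - 1.0) / ((D0 / D_INF) ** (1.0 / p) - 1.0)
    return -math.log(ratio) / lam
```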

#### Time from preclinical to clinically detected cancer (sojourn time)

Clinically detected cancer is defined as cancer diagnosed in the absence of screening. Sojourn time is defined as the time from cancer initiation to clinical detection. Sojourn time is simulated using a Weibull distribution, with a proportional hazards model used to simulate sojourn time for adenomas in the rectum relative to the colon.
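Under proportional hazards, multiplying the Weibull hazard by $\exp(\lambda_3\,\mathrm{rectum}_{ij})$ is equivalent to rescaling the Weibull scale by $[\exp(\lambda_3\,\mathrm{rectum}_{ij})]^{-1/\lambda_2}$, which is the scale given in Table 1. The sketch below shows that rescaling and samples a sojourn time by inverting the Weibull survival function; $\lambda_1$, $\lambda_2$, and $\lambda_3$ would be calibrated values, and any numbers used with it are hypothetical.

```python
import math
import random


def sojourn_scale(lam1, lam2, lam3, rectum):
    """Weibull scale after the proportional-hazards multiplier exp(lam3 * rectum):
    lam1 * [exp(lam3 * rectum)] ** (-1 / lam2), with lam2 the shape."""
    return lam1 * math.exp(lam3 * rectum) ** (-1.0 / lam2)


def cumulative_hazard(t, lam1, lam2, lam3, rectum):
    """Weibull cumulative hazard (t / scale) ** shape."""
    return (t / sojourn_scale(lam1, lam2, lam3, rectum)) ** lam2


def sample_sojourn(lam1, lam2, lam3, rectum, rng):
    """Years from preclinical onset to clinical detection, by inverting the
    Weibull survival function S(t) = exp[-(t / scale) ** lam2]."""
    scale = sojourn_scale(lam1, lam2, lam3, rectum)
    u = 1.0 - rng.random()  # u in (0, 1], so log(u) is defined
    return scale * (-math.log(u)) ** (1.0 / lam2)
```

With a positive $\lambda_3$, the rectal cumulative hazard is $\exp(\lambda_3)$ times the colon's at every $t$, so rectal sojourn times are shorter on average.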

#### Stage and survival

Stage at clinical (symptomatic) detection is simulated based on the stage distribution from the NCI's Surveillance, Epidemiology, and End Results (SEER) program for 1975–1979 (23), the most recent period with little or no colorectal cancer screening. The model stochastically assigns size at detection, conditional on stage, using a smoothed distribution estimated from the same 1975–1979 SEER data. Finally, the model stochastically assigns time of colorectal cancer death using relative survival probabilities estimated from SEER survival data for cases diagnosed from 1975 to 2003; these survival estimates are stratified by location (colon or rectum) and American Joint Committee on Cancer stage, with age at diagnosis and gender included as covariates (24).

The model stochastically assigns time of non-colorectal cancer death using survival probabilities based on product-limit estimates for age and birth-year cohorts from the National Center for Health Statistics Databases (25).

CRC-SPIN 2.0 contains 22 calibrated parameters (7 adenoma risk parameters, 5 adenoma growth parameters, 7 adenoma transition parameters, and 3 sojourn time parameters). The model was calibrated using incremental mixture approximate Bayesian computation, a likelihood-free approach that results in simulated draws from the joint posterior distribution of model parameters given calibration targets (18). Calibration targets included SEER colon and rectal cancer incidence rates in 1975–1979 (23), the most recent period prior to dissemination of colorectal cancer screening tests, as well as studies describing adenoma prevalence (26), adenoma size at screen detection (27), preclinical cancer by lesion size (28, 29), and preclinical colorectal cancer prevalence by sex (30). CRC-SPIN 2.0 was internally validated against follow-up findings from the UK Flexible Sigmoidoscopy Screening Trial (30–32) and was found to adequately predict the 10- and 17-year reductions in colorectal cancer incidence and mortality following a single flexible sigmoidoscopy (33).

### Colonoscopy sensitivity

We carried out analyses using two sets of colonoscopy sensitivity assumptions. One set of analyses made “standard” colonoscopy sensitivity assumptions that were consistent with previous decision analyses. Standard assumptions set sensitivity equal to 0.75 for diminutive adenomas (<6 mm), 0.85 for small adenomas (6–10 mm), 0.95 for large adenomas (≥10 mm), and 0.95 for preclinical cancer (7, 34, 35).
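The size classes above translate into a simple per-lesion lookup. In the sketch below, the `STANDARD` values come from the text; the optional `"extra_large"` entry (20 mm and larger) is there so the same function can express the calibrated scenarios described next, and treating each lesion's detection as an independent Bernoulli draw is an assumption of this illustration. The text does not state a cancer sensitivity for the calibrated scenarios, so any value supplied for that key would be hypothetical.

```python
import random

# Standard assumptions from the text.
STANDARD = {"diminutive": 0.75, "small": 0.85, "large": 0.95, "cancer": 0.95}


def sensitivity(size_mm, is_cancer, sens):
    """Per-lesion sensitivity for one colonoscopy.

    Size classes follow the text: <6 mm diminutive, 6 to <10 mm small,
    >=10 mm large; an optional "extra_large" entry applies at >=20 mm.
    """
    if is_cancer:
        return sens["cancer"]
    if size_mm < 6:
        return sens["diminutive"]
    if size_mm < 10:
        return sens["small"]
    if size_mm >= 20 and "extra_large" in sens:
        return sens["extra_large"]
    return sens["large"]


def detected_lesions(lesions, sens, rng):
    """Independent Bernoulli detection of each (size_mm, is_cancer) lesion.

    Independence across lesions and exams is an assumption of this sketch.
    """
    return [lesion for lesion in lesions
            if rng.random() < sensitivity(lesion[0], lesion[1], sens)]
```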

The second set of analyses was based on calibrated colonoscopy sensitivities, selected so that model predictions were consistent with validation targets. We used a grid search to evaluate possible sensitivities. The grid search explored the following sensitivities in 0.05 increments: sensitivity for diminutive adenomas from 0.20 to 0.75; small adenomas from 0.55 to 0.85; and large adenomas (10 to 20 mm) from 0.80 to 0.95. For extra-large adenomas (≥20 mm), we examined sensitivity ranging from 0.90 to 0.98 in 0.02 increments. We also assumed that sensitivity was a nondecreasing function of size. This resulted in 1,199 combinations of colonoscopy sensitivity. We selected the best set of colonoscopy sensitivities from these scenarios, defined as the set that resulted in accurate prediction of the most targets, meaning that predictions were within a target's 95% confidence interval. To break ties, we identified the scenario whose predicted targets were nearest to observed targets, based on the sum of scaled differences, scaling by the range of the 95% confidence interval.
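The grid and selection rule can be sketched as follows. This is a minimal Python illustration, not the EMEWS/R implementation used in the study. Sensitivities are held as integer hundredths so the 0.05 and 0.02 steps are exact, the nondecreasing constraint is applied as diminutive ≤ small ≤ large ≤ extra-large, and the tie-break sums absolute errors divided by each CI's width, which is our reading of the text and may not reproduce the scaled distances reported in Table 4. The example targets and predictions are copied from Table 3.

```python
from itertools import product

# Grid in hundredths so the 0.05 and 0.02 steps are exact integers.
DIMINUTIVE = range(20, 80, 5)    # 0.20 to 0.75
SMALL = range(55, 90, 5)         # 0.55 to 0.85
LARGE = range(80, 100, 5)        # 0.80 to 0.95 (10 to 20 mm)
EXTRA_LARGE = range(90, 100, 2)  # 0.90 to 0.98 (>=20 mm)

# Observed value, lower and upper 95% CI limits (percent), from Table 3.
TABLE3_TARGETS = {
    "0": (52.0, 49.3, 54.7), "1": (22.4, 20.1, 24.7), "2": (9.2, 7.6, 10.8),
    "3+": (16.3, 14.3, 18.3), "<5 mm": (41.3, 37.4, 45.2),
    "5-10 mm": (44.1, 40.2, 48.0), ">10 mm": (14.6, 11.8, 17.4),
    "crc": (0.7, 0.2, 1.2),
}
TABLE3_PREDICTIONS = {
    "standard": dict(zip(TABLE3_TARGETS, [57.5, 15.9, 9.9, 16.7, 71.6, 23.3, 5.1, 0.3])),
    "calibrated": dict(zip(TABLE3_TARGETS, [51.3, 18.2, 10.8, 19.7, 42.0, 44.1, 14.6, 0.9])),
}


def sensitivity_grid():
    """All grid points whose sensitivity is nondecreasing in adenoma size."""
    return [tuple(v / 100 for v in combo)
            for combo in product(DIMINUTIVE, SMALL, LARGE, EXTRA_LARGE)
            if combo[0] <= combo[1] <= combo[2] <= combo[3]]


def score(predicted, targets):
    """(targets inside their 95% CI, summed scaled distance) for one scenario."""
    hits, distance = 0, 0.0
    for name, (obs, lower, upper) in targets.items():
        pred = predicted[name]
        if lower <= pred <= upper:
            hits += 1
        distance += abs(pred - obs) / (upper - lower)
    return hits, distance


def best_scenario(results, targets):
    """Most targets hit wins; ties go to the smallest scaled distance."""
    def key(scenario):
        hits, distance = score(results[scenario], targets)
        return (-hits, distance)
    return min(results, key=key)


if __name__ == "__main__":
    print(len(sensitivity_grid()))
    print({k: score(v, TABLE3_TARGETS)[0] for k, v in TABLE3_PREDICTIONS.items()})
    print(best_scenario(TABLE3_PREDICTIONS, TABLE3_TARGETS))
```

Under this reading of the ranges, the constrained grid has 1,198 points, one fewer than the 1,199 reported, so the published design may include a scenario these ranges do not capture. Applied to the Table 3 columns, the scoring counts 3 targets hit under standard sensitivity and 6 under calibrated sensitivity, consistent with the Results.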

### Validation targets: the WBF trial

The WBF trial was a double-blind placebo-controlled trial that examined the effect of three years of wheat bran supplementation on adenoma recurrence (36–39). The study found no evidence of an intervention effect. The trial enrolled men and women ages 40 to 80 years residing in the Phoenix metropolitan area from September 1990 through July 1995, within three months after they were found to have one or more adenomas 3 mm or larger at a baseline colonoscopy. Of the 4,705 individuals initially eligible for the trial, 44% declined participation, 21% were determined to be ineligible, 2% dropped out before the run-in period (which assessed ability to comply with the intervention), and 2% were unable to comply by consuming at least 75% of the supplement for 6 weeks. The study randomized 1,429 individuals within three months of their baseline colonoscopy, and of these, 1,303 (91%) completed at least one follow-up colonoscopy assessment. Study participants had a mean age of 66.2 years (standard deviation, SD = 8.8), were mostly male (66.8%), and 34.7% reported having an adenoma detected prior to the baseline exam (37, 39). Study participants were scheduled to undergo colonoscopy at one and three years after their baseline exam; 10% of participants received only the one-year exam, 21% received only the three-year exam, and 69% received both exams. For participants who received both exams, outcomes were based on combined results across exams. Study colonoscopies were conducted in the community, with outcomes based on review of colonoscopy reports and histologic slides and tissue blocks.

### Method for simulating validation targets

We simulated a sample that matched the baseline WBF study sample in terms of the marginal distribution of participants' age, sex, personal history of detected adenomas, the percentage who had each follow-up exam, and colonoscopy findings (shown in Table 2). To create this sample, we first simulated a population of 5 million that was 50% male, with age at the baseline exam drawn from a normal distribution with mean 70 and SD = 15, truncated to range from 40 to 80 years old to match WBF study eligibility requirements.
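The text does not say how the truncated normal was sampled; the sketch below assumes simple rejection sampling, which is exact for a truncated normal and efficient here because most draws fall in range. Sex is drawn independently with probability 0.5.

```python
import random


def simulate_population(n, rng, mean=70.0, sd=15.0, lo=40.0, hi=80.0, p_male=0.5):
    """Ages at baseline from normal(mean, sd) truncated to [lo, hi] by
    rejection sampling, with sex drawn independently of age."""
    people = []
    while len(people) < n:
        age = rng.gauss(mean, sd)
        if lo <= age <= hi:  # reject draws outside WBF eligibility ages
            people.append({"age": age, "male": rng.random() < p_male})
    return people


if __name__ == "__main__":
    pop = simulate_population(10_000, random.Random(3))
    print(sum(p["male"] for p in pop) / len(pop))
```

With mean 70 and SD 15, the interval from 40 to 80 covers roughly 72% of the untruncated distribution, so about 28% of draws are rejected.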

|  | Observed |
|---|---|
| Percent male | 66.87% |
| Age quartiles | |
| 40.3–61.1 years old | 25% |
| 61.2–68.1 years old | 25% |
| 68.2–73.1 years old | 25% |
| 73.2–81.1 years old | 25% |
| Percent with a personal history of adenomas^{a} | 34.66% |
| Findings at baseline colonoscopy | |
| Number of adenomas detected | |
| 1 | 57.26% |
| 2 | 22.30% |
| 3+ | 20.44% |
| Size of largest adenoma | |
| <5 mm | 30.68% |
| 5–10 mm | 41.87% |
| >10 mm | 27.45% |

^{a}Missing values were set to “no.”

Next, we simulated prior colonoscopy. Because we only had information about personal history of at least one adenoma, but not the number or timing of past exams or characteristics of individuals with a history of adenomas, we randomly selected 25% of the population, and in this subset simulated a single prior colonoscopy. Primary (base-case) analyses assumed that the prior exam occurred five years before the baseline exam. We dropped individuals with a simulated “clean” colonoscopy, with no adenomas detected, as these individuals would not return for colonoscopy by the time of the baseline exam. Sensitivity analyses considered three alternative assumptions about the timing of the prior colonoscopy: (i) one prior exam three years before the baseline exam, excluding individuals with a clean colonoscopy from the population; (ii) one prior exam 10 years before the baseline exam, retaining individuals with a clean colonoscopy; and (iii) no prior exams. When simulating one prior colonoscopy 10 years ago, we assumed 50% of the sample had a prior exam. When simulating no prior exams, we did not match to personal history of adenomas. Simulation of no prior exams was used to examine the impact of ignoring this feature of the study sample on validation.
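The four prior-exam assumptions can be written as a small configuration plus one filtering step. In this sketch, `exam_finds_adenoma` is a hypothetical callback standing in for the model's screening simulation, and the 25% exam fraction for the three-year scenario is our assumption because the text states the fraction only for the base case and the 10-year scenario.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PriorExamScenario:
    years_before_baseline: Optional[float]  # None: no prior exam is simulated
    fraction_examined: float                # share of the population given a prior exam
    drop_clean: bool                        # drop people whose prior exam found no adenoma


SCENARIOS = {
    "base_case": PriorExamScenario(5, 0.25, True),
    "three_years": PriorExamScenario(3, 0.25, True),  # 25% assumed; not stated
    "ten_years": PriorExamScenario(10, 0.50, False),
    "no_prior": PriorExamScenario(None, 0.0, False),
}


def apply_prior_exam(people, scenario, exam_finds_adenoma, rng):
    """Return the people kept after the simulated prior colonoscopy.

    exam_finds_adenoma(person, exam_age) is a hypothetical callback that should
    return True when at least one adenoma is detected (and, in a full model,
    remove what it detects).
    """
    if scenario.years_before_baseline is None:
        return list(people)
    kept = []
    for person in people:
        if rng.random() < scenario.fraction_examined:
            exam_age = person["age"] - scenario.years_before_baseline
            found = exam_finds_adenoma(person, exam_age)
            if scenario.drop_clean and not found:
                continue  # a clean exam implies no return visit by baseline
            person = {**person, "prior_adenoma": found}
        kept.append(person)
    return kept
```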

After simulating prior colonoscopy, we simulated the baseline colonoscopy and retained only study-eligible individuals: those who had at least one adenoma 3 mm or larger detected at the baseline exam and had not been diagnosed with colorectal cancer. Among eligible individuals, we simulated up to two follow-up colonoscopies, using the observed percentage of participants who received each exam. We assumed that the one-year follow-up exam occurred 15 months after baseline, and that the three-year follow-up exam occurred 39 months after baseline. We assumed that 95% of exams were complete to the cecum, 1% were complete only to the ascending colon, 1% reached the entire transverse colon, 1% reached the entire descending colon, 1% reached the entire sigmoid colon, and 1% only fully examined the rectum.
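The eligibility rule, follow-up schedule, and exam-extent assumptions can be sketched as below. Reading each extent category as "the most proximal segment fully examined," so that an adenoma is within reach when its site is at or distal to that segment, is our interpretation; drawing extent independently for each exam is likewise an assumption of this illustration.

```python
import random

# Segments ordered proximal to distal, as in the site distribution above.
SEGMENTS = ["cecum", "ascending", "transverse", "descending", "sigmoid", "rectum"]
# Most proximal segment fully examined, with the stated probabilities.
EXTENT_WEIGHTS = [0.95, 0.01, 0.01, 0.01, 0.01, 0.01]
# Follow-up patterns (months after baseline) and WBF shares: 10%, 21%, 69%.
PATTERNS = [(15,), (39,), (15, 39)]
PATTERN_WEIGHTS = [0.10, 0.21, 0.69]


def eligible(detected_sizes_mm, has_crc):
    """At least one detected adenoma of 3 mm or more, and no cancer diagnosis."""
    return not has_crc and any(s >= 3 for s in detected_sizes_mm)


def sample_extent(rng):
    return rng.choices(SEGMENTS, weights=EXTENT_WEIGHTS)[0]


def site_examined(site, extent):
    """True if the site lies at or distal to the most proximal segment reached."""
    return SEGMENTS.index(site) >= SEGMENTS.index(extent)


def follow_up_exams(rng):
    """(months after baseline, extent) for each follow-up exam of one person."""
    months = rng.choices(PATTERNS, weights=PATTERN_WEIGHTS)[0]
    return [(m, sample_extent(rng)) for m in months]
```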

All analyses were carried out using the R programming language, version 4.1.0 (40). Selection of colonoscopy sensitivity was performed using EMEWS software to implement the grid experimental design on a high-performance computing cluster at Argonne National Laboratory (41). For each point in the experimental design, we calculated a set of model outcomes reported by the WBF trial under the assumption of a single prior colonoscopy 5 years before baseline (base-case assumptions), including the number of adenomas detected (0, 1, 2, or 3+), the size of the largest adenoma, and colorectal cancer prevalence during follow-up colonoscopy. The full experimental design of 1,199 colonoscopy sensitivity assumptions required 120 computing hours to complete.
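Turning simulated findings into the WBF target categories can be sketched as below. Two details are assumptions of this illustration: findings from a person's follow-up exams are pooled by concatenating detected adenomas, and an adenoma of exactly 10 mm is counted in the 5–10 mm category. The size shares use people with at least one detected adenoma as the denominator, which is consistent with the three size percentages in Table 3 summing to 100%.

```python
from collections import Counter

COUNT_CATS = ("0", "1", "2", "3+")
SIZE_CATS = ("<5 mm", "5-10 mm", ">10 mm")


def count_category(n):
    return "3+" if n >= 3 else str(n)


def size_category(largest_mm):
    # Placing exactly 10 mm in "5-10 mm" is an assumption.
    if largest_mm < 5:
        return "<5 mm"
    if largest_mm <= 10:
        return "5-10 mm"
    return ">10 mm"


def summarize(people):
    """people: one list of detected adenoma sizes (mm) per person, pooled over
    that person's follow-up exams. Returns percentages in Table 3's layout."""
    n = len(people)
    counts = Counter(count_category(len(sizes)) for sizes in people)
    by_count = {c: 100.0 * counts[c] / n for c in COUNT_CATS}
    with_adenoma = [sizes for sizes in people if sizes]
    largest = Counter(size_category(max(sizes)) for sizes in with_adenoma)
    m = len(with_adenoma)
    by_size = {c: (100.0 * largest[c] / m if m else 0.0) for c in SIZE_CATS}
    return by_count, by_size
```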

### Data availability statement

Data generated in this study are available upon request from the corresponding author.

## Results

When we assumed standard colonoscopy sensitivities, the model predicted only three of the eight validation targets within 95% confidence intervals, accurately predicting the percentage of people with two or more adenomas detected at follow-up and colorectal cancer incidence during follow-up. The model predicted that too many people would have no adenomas detected at follow-up, too few people with a single adenoma detected, too many with a largest adenoma detected at follow-up that was diminutive, and too few with a largest adenoma detected that was small or large (Table 3).

|  | Target: observed with 95% CI^{a} | Standard colonoscopy sensitivity | Calibrated colonoscopy sensitivity | Percentage of simulated sensitivity scenarios that accurately predict targets^{b} |
|---|---|---|---|---|
| Sample size^{c} | 1,303 | 3,750 | 4,014 | |
| Number of adenomas detected^{c} | | | | |
| 0 | 52.0% (49.3%–54.7%) | 57.5% | 51.3% | 84.8% |
| 1 | 22.4% (20.1%–24.7%) | 15.9% | 18.2% | 0.2% |
| 2 | 9.2% (7.6%–10.8%) | 9.9% | 10.8% | 73.5% |
| 3+ | 16.3% (14.3%–18.3%) | 16.7% | 19.7% | 22.6% |
| Size of largest adenoma^{c} | | | | |
| <5 mm | 41.3% (37.4%–45.2%) | 71.6% | 42.0% | 9.4% |
| 5–10 mm | 44.1% (40.2%–48.0%) | 23.3% | 44.1% | 27.4% |
| >10 mm | 14.6% (11.8%–17.4%) | 5.1% | 14.6% | 1.1% |
| Colorectal cancer detected during follow-up (N = 9)^{d} | 0.7% (0.2%–1.2%) | 0.3% | 0.9% | 99.2% |


The grid search revealed that much lower colonoscopy sensitivities would be required for closer prediction of WBF trial targets, especially for diminutive adenomas. Figure 1 shows results across the range of colonoscopy sensitivity assumptions in light gray for 350 scenarios randomly selected from the 1,199 considered. The model was able to match at most 6 of the 8 validation targets and was only able to do so for 12 sensitivity scenarios (shown in Table 4 and in purple on Fig. 1) with very low sensitivity. Across these 12 scenarios, sensitivity for diminutive adenomas was 0.20 in 11 scenarios and 0.25 in the remaining one; sensitivities for small adenomas ranged from 0.55 to 0.75, sensitivities for large adenomas ranged from 0.80 to 0.90, and sensitivities for extra-large adenomas ranged from 0.90 to 0.98. The best of these 12 scenarios, which predicted closest to targets (shown as the black line on Fig. 1), set sensitivity equal to 0.20 for diminutive adenomas, 0.60 for small adenomas, 0.80 for large adenomas, and 0.98 for extra-large adenomas. However, as shown in Fig. 1, differences in predictive accuracy across these 12 scenarios were small. The one scenario among these 12 with sensitivity of 0.25 for diminutive adenomas resulted in the highest predicted percentage of individuals whose largest adenoma at follow-up was diminutive, and the smallest percentage whose largest adenoma was small. A scenario that set sensitivity equal to 0.20 for diminutive adenomas, 0.75 for small adenomas, 0.90 for large adenomas, and 0.92 for extra-large adenomas is closer to standard assumptions and was also consistent with six of the eight WBF targets.

|  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Colonoscopy sensitivity | | | | | | | | | | | | |
| Diminutive (<6 mm) | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 | 0.25 | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 |
| Small (6–10 mm) | 0.60 | 0.55 | 0.55 | 0.60 | 0.55 | 0.55 | 0.55 | 0.55 | 0.60 | 0.75 | 0.70 | 0.75 |
| Large (10–20 mm) | 0.80 | 0.80 | 0.80 | 0.80 | 0.85 | 0.85 | 0.80 | 0.85 | 0.90 | 0.80 | 0.85 | 0.90 |
| Extra-large (>20 mm) | 0.98 | 0.90 | 0.92 | 0.90 | 0.94 | 0.98 | 0.98 | 0.90 | 0.94 | 0.92 | 0.96 | 0.92 |
| Number of adenomas detected | | | | | | | | | | | | |
| 0 | 50.8% | 51.0% | 51.0% | 50.8% | 52.9% | 50.9% | 49.9% | 50.2% | 52.8% | 53.1% | 53.4% | 53.8% |
| 1 | 19.8% | 19.0% | 19.0% | 19.1% | 18.2% | 18.7% | 19.1% | 19.4% | 18.2% | 18.8% | 18.0% | 18.3% |
| 2 | 10.0% | 10.5% | 10.3% | 10.5% | 10.4% | 10.4% | 10.5% | 10.1% | 10.7% | 10.3% | 10.7% | 9.9% |
| 3+ | 19.5% | 19.6% | 19.8% | 19.6% | 18.5% | 20.0% | 20.4% | 20.2% | 18.3% | 17.9% | 17.9% | 18.0% |
| Size of largest adenoma detected | | | | | | | | | | | | |
| <5 mm | 40.8% | 41.5% | 42.3% | 42.6% | 42.9% | 40.6% | 44.7% | 40.7% | 42.7% | 43.6% | 42.2% | 43.1% |
| 5–10 mm | 47.2% | 46.2% | 45.3% | 45.5% | 45.2% | 46.9% | 42.4% | 47.3% | 46.7% | 46.9% | 47.8% | 47.7% |
| >10 mm | 12.0% | 12.4% | 12.4% | 11.9% | 11.9% | 12.5% | 12.9% | 12.0% | 10.6% | 9.5% | 9.9% | 9.1% |
| Colorectal cancer detected | 0.9% | 0.9% | 1.1% | 1.0% | 0.9% | 1.0% | 1.0% | 1.1% | 0.8% | 0.8% | 0.9% | 0.9% |
| Scaled distance from targets | 0.018 | 0.019 | 0.019 | 0.021 | 0.022 | 0.022 | 0.023 | 0.023 | 0.031 | 0.040 | 0.041 | 0.049 |

Note: Values in bold correspond to predictions that were within 95% confidence intervals.
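As a quick consistency check on the ranges and the best-fitting scenario quoted in the text, the Table 4 columns can be transcribed and summarized in a few lines. This is an illustrative sketch only: the variable and function names are ours, and "prevalence" here is simply taken as 100% minus the share of simulated participants with no adenoma detected.

```python
# Table 4 columns, transcribed left to right (scenarios 1-12).
SENS = {
    "diminutive":  [0.20, 0.20, 0.20, 0.20, 0.20, 0.20, 0.25, 0.20, 0.20, 0.20, 0.20, 0.20],
    "small":       [0.60, 0.55, 0.55, 0.60, 0.55, 0.55, 0.55, 0.55, 0.60, 0.75, 0.70, 0.75],
    "large":       [0.80, 0.80, 0.80, 0.80, 0.85, 0.85, 0.80, 0.85, 0.90, 0.80, 0.85, 0.90],
    "extra_large": [0.98, 0.90, 0.92, 0.90, 0.94, 0.98, 0.98, 0.90, 0.94, 0.92, 0.96, 0.92],
}
PCT_NO_ADENOMA = [50.8, 51.0, 51.0, 50.8, 52.9, 50.9, 49.9, 50.2, 52.8, 53.1, 53.4, 53.8]
SCALED_DISTANCE = [0.018, 0.019, 0.019, 0.021, 0.022, 0.022,
                   0.023, 0.023, 0.031, 0.040, 0.041, 0.049]


def best_scenario() -> int:
    """Index of the scenario with the smallest scaled distance from targets."""
    return min(range(len(SCALED_DISTANCE)), key=lambda i: SCALED_DISTANCE[i])


def sensitivity_ranges() -> dict:
    """(min, max) sensitivity across the 12 scenarios, by adenoma size class."""
    return {size: (min(values), max(values)) for size, values in SENS.items()}


def predicted_prevalence() -> list:
    """Predicted adenoma prevalence at follow-up: 100% minus '0 adenomas detected'."""
    return [round(100.0 - p, 1) for p in PCT_NO_ADENOMA]


if __name__ == "__main__":
    i = best_scenario()
    print("best scenario:", {size: values[i] for size, values in SENS.items()})
    print("ranges:", sensitivity_ranges())
    print("prevalence:", predicted_prevalence())
```

The transcription reproduces the text: diminutive sensitivity is 0.20 in 11 of 12 scenarios, and the lowest scaled distance belongs to the scenario with sensitivities 0.20, 0.60, 0.80, and 0.98.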

Table 3 shows the percentage of simulated scenarios that resulted in accurate prediction of each target. Some targets were relatively insensitive to assumptions about colonoscopy sensitivity: 99.6% of the scenarios examined resulted in accurate prediction of cancer incidence, 93.1% in accurate prediction of adenoma prevalence, and 72.4% in accurate prediction of the percentage of people with two adenomas detected at follow-up. Other targets were rarely predicted accurately. Only two scenarios resulted in accurate prediction of the percentage of people with one adenoma detected at follow-up (one with sensitivity for diminutive adenomas of 0.20 and one with 0.25); all other model predictions were too low. Similarly, only 13 scenarios accurately predicted the percentage of people whose largest adenoma detected was ≥10 mm (sensitivity for diminutive adenomas was 0.20 for nine and 0.25 for four); all other model predictions were too low. None of the scenarios examined accurately predicted both the percentage of people with three or more adenomas detected and the percentage whose largest adenoma detected was ≥10 mm (results shown in Table 4).
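The tallies above reduce to checking whether each prediction falls inside its target's 95% confidence interval. The sketch below illustrates that bookkeeping; it is not the authors' code, and the function names are hypothetical. It uses only numbers reported in this article: the two target intervals given in the abstract, the standard-sensitivity predictions from the abstract, and Table 4 scenarios 1 and 10. It assumes that the ">10 mm" row of Table 4 corresponds to the "largest adenoma ≥10 mm" target.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def within(value: float, interval: Interval) -> bool:
    """True if a prediction falls inside a target's 95% confidence interval."""
    low, high = interval
    return low <= value <= high


def targets_matched(prediction: Dict[str, float], intervals: Dict[str, Interval]) -> int:
    """Number of targets that one scenario predicts accurately."""
    return sum(within(prediction[name], ci) for name, ci in intervals.items())


def target_hit_rates(predictions: List[Dict[str, float]],
                     intervals: Dict[str, Interval]) -> Dict[str, float]:
    """Percentage of scenarios that predict each target within its interval."""
    n = len(predictions)
    return {name: 100.0 * sum(within(p[name], ci) for p in predictions) / n
            for name, ci in intervals.items()}


# Two targets whose 95% CIs are reported in the abstract (percent of participants).
INTERVALS = {"prevalence": (45.3, 50.7), "largest_ge_10mm": (11.8, 17.4)}

PREDICTIONS = [
    {"prevalence": 42.5, "largest_ge_10mm": 5.1},   # standard sensitivities (abstract)
    {"prevalence": 49.2, "largest_ge_10mm": 12.0},  # Table 4, scenario 1
    {"prevalence": 46.9, "largest_ge_10mm": 9.5},   # Table 4, scenario 10
]

print([targets_matched(p, INTERVALS) for p in PREDICTIONS])
print(target_hit_rates(PREDICTIONS, INTERVALS))
```

Applied to all 1,199 scenarios and all eight targets, the same two functions would yield per-target percentages like those in Table 3 and per-scenario counts like the "6 of 8" reported above.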

Sensitivity analyses of the timing of past exams were run under both the standard sensitivities and the best-fitting calibrated colonoscopy sensitivities. Findings were largely insensitive to assumptions about the timing of past exams. When we assumed a single colonoscopy 10 years ago, there were slightly fewer people whose maximum adenoma size was 5 mm or smaller and slightly more with a maximum size of ≥10 mm; there were also slightly fewer people with a single adenoma detected and slightly more with three or more adenomas detected. When we assumed a single colonoscopy three years ago, there were slightly fewer people whose maximum adenoma size was ≥10 mm. When we assumed no prior exams and ignored prior adenoma history when creating the simulated sample, the predicted adenoma prevalence at follow-up was too low, likely reflecting how past screening modifies risk in the simulated sample.

## Discussion

Accurate prediction of targets from the WBF trial required that we assume colonoscopy sensitivity lower than the values often used, especially for diminutive adenomas. Calibration clearly favored low sensitivity for diminutive adenomas, and the upper limits of calibrated sensitivities for larger adenomas were also below the values often assumed in decision analyses. Tandem colonoscopy studies cannot count lesions missed by both exams, and the share of lesions missed by both grows as true sensitivity falls; consequently, as the true sensitivity of lesion detection decreases, the chance of overestimating sensitivity from tandem exams increases. As a result, differences between standard and calibrated sensitivities were largest for diminutive adenomas. Missing diminutive adenomas can affect predictions of the long-term effectiveness of colonoscopy but not shorter-term predictions of cancers detected, because most cancers arise from large (≥10 mm) adenomas. These results are consistent with a prior validation of the CRC-SPIN model, which predicted a somewhat stronger protective effect of one-time sigmoidoscopy on colorectal cancer incidence and mortality 17 years after the intervention than was observed (33). The relatively low sensitivities required for validation to the WBF trial results may reflect the optical imaging technology available in the 1990s, when the trial was conducted; high-definition endoscopes available today can better visualize lesions.
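The tandem-exam argument can be made concrete with a simple calculation. Under the simplifying, hypothetical assumption that both exams share the same true per-lesion sensitivity p and miss lesions independently, a tandem study only ever sees the 1 − (1 − p)² fraction of lesions found by at least one exam, so its estimate of first-exam sensitivity is p / (1 − (1 − p)²) = 1 / (2 − p). The sketch below computes this closed form and checks it by simulation; the function names are illustrative.

```python
import random


def apparent_sensitivity(p: float) -> float:
    """Expected tandem-study estimate of per-lesion sensitivity.

    Assumes both exams have the same true sensitivity p and miss lesions
    independently. Lesions missed by both exams are never observed, so the
    estimate is p divided by the share of lesions seen by either exam.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    seen_by_either = 1.0 - (1.0 - p) ** 2
    return p / seen_by_either  # simplifies to 1 / (2 - p)


def simulated_apparent_sensitivity(p: float, n_lesions: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of the closed form under the same assumptions."""
    rng = random.Random(seed)
    found_first = found_either = 0
    for _ in range(n_lesions):
        first = rng.random() < p   # detected on the first exam
        second = rng.random() < p  # detected on the second exam
        found_first += first
        found_either += first or second
    return found_first / found_either


for p in (0.20, 0.75, 0.95):
    print(f"true {p:.2f} -> apparent {apparent_sensitivity(p):.3f}")
# true 0.20 -> apparent 0.556
# true 0.75 -> apparent 0.800
# true 0.95 -> apparent 0.952
```

Under these assumptions a true sensitivity of 0.20 would look like roughly 0.56 in a tandem study, whereas 0.95 would be estimated almost exactly. If misses are positively correlated between exams (for example, because some lesions are hard to see on any exam), more lesions escape both exams and the upward bias would be larger still.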

Even when we selected the sensitivities that resulted in the closest predictions to targets, we could not accurately predict the distribution of adenoma counts: the model predicted too few people with one adenoma detected and too many with three or more adenomas detected. The model simulates a simplified screening process, whereas clinical practice is more complex. The model does not simulate incomplete resection, which could result in recurrence (42). The model also assumes that all gastroenterologists have the same sensitivity, although there is evidence of variation in detection rates (43). When the WBF study was conducted, colonoscopy quality was not widely studied, emphasized, or monitored, which may have led to wider variation in adenoma detection rates among the community endoscopists who examined study participants. Variability in sensitivity across exams would result in greater variability in the number of adenomas detected and more follow-up exams with a single adenoma detected, but little change in the number with three or more adenomas detected. The model also does not simulate some behaviors that occur in practice but are unlikely to have affected the observed findings, given the timing of the WBF study: a “one and done” strategy, whereby gastroenterologists are less likely to identify more than one adenoma (44), and a resect-and-discard strategy for some diminutive polyps (45).

If we accept the CRC-SPIN natural history model, examining our assumptions about sensitivity is relatively straightforward. Another explanation for our inability to accurately predict all outcomes, however, is that the natural history model inadequately describes the colorectal cancer disease process. The model may not simulate enough fast-growing adenomas, especially in populations at higher risk of developing adenomas, because it assumes that adenoma risk and adenoma growth are uncorrelated. Future analyses will extend the model to allow positive correlation between adenoma risk and growth; randomized prevention trials provide the best available data to calibrate such an extended model. The analyses presented in this paper provide new information about the variation in colonoscopy sensitivity for adenoma detection that could be incorporated into model calibration, sensitivity analysis, and robust decision-making. Alternatively, the model may simulate too many diminutive adenomas, so that it can fit the WBF trial data only when colonoscopy is assumed to have low sensitivity for these lesions. The simulated adenoma size distribution was based on a study of 243 individuals with an average age of 57, which found that 60% of adenomas detected by either CT colonography (CTC) or optical colonoscopy were less than 5 mm, 29% were 6 to 9 mm, and 9% were 10 mm or larger (27). Our calibrated model predicted this distribution closely, simulating 63% of detected adenomas <5 mm, 25% 6 to 9 mm, and 12% 10 mm or larger (18). Another possible explanation is that sessile serrated lesions with neoplastic changes were included among the large adenomas identified by colonoscopy; the model does not include the serrated pathway and is under revision to add it. Extending the model to examine these limitations is much more difficult than examining colonoscopy sensitivity, but ultimately must be part of model evaluation.

Using results from a single study allowed us to generate simulated study samples that closely matched WBF study characteristics at baseline. Adenoma recurrence estimates from the WBF trial broadly align with estimates from similar adenoma prevention trials (46–49), although recurrence ranged from 25.8% to 39.6% across these studies versus 47.9% in the WBF trial. These somewhat lower recurrence rates may be attributable to enrollment of participants at lower adenoma risk: the other trials enrolled younger participants (average age 57.9–61.5 years versus 66.2 years; refs. 46–49), two included fewer men (approximately 57% male versus 67% male; refs. 48, 49), and two used less stringent inclusion criteria, requiring detection of an adenoma of any size rather than one 3 mm or larger. However, the overall similarity of adenoma recurrence across these trials suggests that the results we present are generalizable.

Had the model predicted all validation targets accurately using standard sensitivity assumptions, we would have gained confidence in CRC-SPIN predictions. Its inability to do so raises questions about either the assumed colonoscopy sensitivity or the simulated onset and progression of adenomas. We focused on colonoscopy sensitivity as a key area of uncertainty because it is often overlooked, but uncertainties about other model assumptions also merit exploration. Uncertainties that cannot be fully resolved given the current state of scientific knowledge have the potential to affect conclusions about screening effectiveness and about the relative effectiveness of different screening modalities. Ideally, decision analyses carried out to inform policy, such as screening guidelines, would address the robustness of recommendations to the identified uncertainties (50). Robust decision-making approaches have been applied to a wide range of deeply uncertain decision problems, including terrorism risk insurance (51), water resources management (52), and COVID-19 reopening plans (53), and could be applied to health policy. Although thorough robustness analyses are computationally intensive, they offer a systematic method for characterizing and demonstrating the robustness of model-based recommendations to potentially consequential assumptions.
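One common robustness criterion is minimax regret: for each plausible state of the world (here, a colonoscopy sensitivity assumption), a strategy's regret is how far it falls short of the best strategy in that state, and the robust choice minimizes the worst-case regret. The sketch below illustrates the idea only. The strategy names and benefit values are entirely hypothetical toy numbers, not model output, and minimax regret is just one of several criteria used in robust decision-making.

```python
from typing import Dict

# Hypothetical benefit (e.g., life-years gained per 1,000 screened) for each
# strategy under each sensitivity scenario. Toy numbers for illustration only.
OUTCOMES: Dict[str, Dict[str, float]] = {
    "strategy_A": {"standard_sensitivity": 100, "low_sensitivity": 60},
    "strategy_B": {"standard_sensitivity": 90, "low_sensitivity": 80},
    "strategy_C": {"standard_sensitivity": 70, "low_sensitivity": 75},
}


def max_regret(outcomes: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Worst-case shortfall of each strategy relative to the best one in each scenario."""
    scenarios = list(next(iter(outcomes.values())).keys())
    best = {s: max(o[s] for o in outcomes.values()) for s in scenarios}
    return {name: max(best[s] - o[s] for s in scenarios) for name, o in outcomes.items()}


def minimax_regret_choice(outcomes: Dict[str, Dict[str, float]]) -> str:
    """Strategy whose worst-case regret is smallest."""
    regrets = max_regret(outcomes)
    return min(regrets, key=regrets.get)


print(max_regret(OUTCOMES))
print(minimax_regret_choice(OUTCOMES))
```

In this toy example strategy_A is best only if standard sensitivities hold, whereas strategy_B never falls more than 10 units short of the best available choice, so a robustness analysis would favor it.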

These findings also highlight the need for more studies of colonoscopy sensitivity for adenomas by size, especially in the current era of high-quality colonoscopy and improved imaging technology. Such analyses need to use statistical methods that account for the lack of a true gold standard, because adenomas missed by both tandem colonoscopies go uncounted.
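One classical way to account for lesions missed by both exams is capture–recapture estimation. The Lincoln–Petersen estimator treats the two tandem exams as two independent "captures": if exam 1 finds n₁ lesions, exam 2 finds n₂, and m are found by both, the total number of lesions is estimated as n₁n₂/m, and exam 1's sensitivity as m/n₂. The sketch below is illustrative; the counts are hypothetical, and the estimator assumes independent detection and equal detectability of lesions, assumptions that tandem colonoscopy data may violate. Latent class and other models relax some of these assumptions.

```python
from dataclasses import dataclass


@dataclass
class TandemEstimate:
    total_lesions: float            # estimated lesions present, including those missed by both
    sensitivity_exam1: float        # capture-recapture estimate for exam 1
    sensitivity_exam2: float        # capture-recapture estimate for exam 2
    naive_sensitivity_exam1: float  # exam 1 detections / lesions seen by either exam


def lincoln_petersen(n1: int, n2: int, both: int) -> TandemEstimate:
    """Lincoln-Petersen estimates from tandem-exam counts (assumes independent detection)."""
    if both <= 0:
        raise ValueError("need at least one lesion detected by both exams")
    if both > min(n1, n2):
        raise ValueError("'both' cannot exceed either exam's count")
    union = n1 + n2 - both
    return TandemEstimate(
        total_lesions=n1 * n2 / both,
        sensitivity_exam1=both / n2,  # share of exam-2 finds that exam 1 also found
        sensitivity_exam2=both / n1,  # share of exam-1 finds that exam 2 also found
        naive_sensitivity_exam1=n1 / union,
    )


print(lincoln_petersen(60, 50, 30))
```

With these hypothetical counts, the naive tandem estimate of exam 1's sensitivity is 60/80 = 0.75, whereas the capture–recapture estimate is 0.60 because it allows for an estimated 20 lesions missed by both exams.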

## Authors' Disclosures

C.M. Rutter reports grants from the NCI of the NIH during the conduct of the study. P. Nascimento de Lima reports grants from the NCI of the NIH during the conduct of the study. J. Ozik reports grants from the NCI of the NIH during the conduct of the study. No disclosures were reported by the other authors.

## Authors' Contributions

**C.M. Rutter:** Conceptualization, resources, software, supervision, funding acquisition, methodology, writing–original draft, project administration, writing–review and editing. **P. Nascimento de Lima:** Software, formal analysis, validation, visualization, methodology, writing–review and editing. **J.K. Lee:** Writing–original draft, writing–review and editing. **J. Ozik:** Resources, software, methodology, writing–original draft.

## Acknowledgments

The research reported in this publication was supported by the NCI of the NIH under Award Number U01CA199335 (C.M. Rutter) and U01CA253913 (C.M. Rutter, P. Nascimento de Lima, J. Ozik). J.K. Lee was supported by the NCI under Award Number K07CA212057. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. This work was completed with resources provided by the Laboratory Computing Resource Center at Argonne National Laboratory (Bebop cluster).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked *advertisement* in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.