Abstract
Cancer modeling has become an accepted method for generating evidence about the comparative effectiveness and cost-effectiveness of candidate cancer control policies across the continuum of care. Models of early detection policies require inputs concerning disease natural history and screening test performance, which are often subject to considerable uncertainty. Model validation against an external data source can increase confidence in the reliability of assumed or calibrated inputs. When a model fails to validate, this presents an opportunity to revise these inputs, thereby learning new information about disease natural history or diagnostic performance that could both enhance the model results and inform real-world practice. We discuss the conditions necessary for validly drawing conclusions about specific inputs, such as diagnostic performance, from model validation studies. Doing so requires being able to faithfully replicate the validation study in terms of its design and implementation, and being alert to the problem of non-identifiability, which leaves open explanations for the failure to validate other than the one ultimately identified.
See related article by Rutter et al., p. 775
Cancer modeling is many things. It is a field of research, a tool for information synthesis, and a method for evidence generation. A cancer model may be deployed at any point in the continuum of care, from prevention to salvage therapy. In early detection, models are routinely used to study the potential harms and benefits of candidate screening strategies that cannot all be evaluated empirically.
Early detection models build on two key components: disease natural history, which describes the mostly latent events in disease onset and progression; and screening test performance, which measures (among other things) the test's sensitivity to detect latent disease. The interaction between test sensitivity and natural history determines how far detection is advanced, which in turn has the potential to lengthen disease-specific life expectancy.
Model inputs that are not immediately estimable from observed data must be learned by calibrating model-projected incidence patterns against those observed. External validation against an independent dataset can increase confidence that the learned model is reliable. But what happens when model validation is unsuccessful?
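As a concrete, deliberately simplified illustration of what calibration means here, the hypothetical sketch below fits a single unobservable input (an annual rate of latent disease onset) by searching for the value whose model-projected incidence best matches a set of observed incidence targets. The toy model, the detection fraction, and all numbers are invented for illustration; they are not drawn from any actual cancer model.

```python
import numpy as np

# Hypothetical toy model (invented for illustration): latent disease arises at an
# unknown annual rate per 1,000 people, and a fixed 40% of prevalent latent cases
# surface clinically each year. Only clinical incidence is observed, so the onset
# rate cannot be estimated directly and must instead be calibrated.
observed_incidence = np.array([2.1, 2.0, 2.3, 2.2, 1.9])  # made-up targets per 1,000

def projected_incidence(onset_rate, years=5, detection_fraction=0.4):
    latent = 0.0
    projections = []
    for _ in range(years):
        latent += onset_rate                    # new latent cases this year
        detected = detection_fraction * latent  # latent cases surfacing clinically
        latent -= detected
        projections.append(detected)
    return np.array(projections)

# Grid-search calibration: keep the onset rate whose projected incidence sits
# closest to the observed targets (sum of squared deviations).
grid = np.linspace(0.5, 5.0, 451)
scores = [np.sum((projected_incidence(rate) - observed_incidence) ** 2) for rate in grid]
best_rate = grid[int(np.argmin(scores))]
print(f"calibrated onset rate: {best_rate:.2f} per 1,000 per year")
```

External validation then asks whether the calibrated model, with its learned onset rate, reproduces incidence in a dataset that played no part in the calibration.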
In this issue of Cancer Epidemiology, Biomarkers & Prevention, Rutter and colleagues use a failure of their colorectal cancer model to validate externally as a learning opportunity (1). Their model, CRC-SPIN (2), captures not only the process of cancer onset and progression, but also the precancer pathway from tiny adenomas to larger ones that risk becoming cancerous. The model allows individuals to develop multiple adenomas. This turns out to be vital for the validation, which involves simulating adenomas following prior adenoma detection. The validation targets come from the Wheat Bran Fiber Study, in which participants with resected adenomas larger than 3 mm were randomized to a high-fiber diet in an effort to prevent new adenoma growth (3, 4).
Rutter and colleagues find that, when they input published values for colonoscopy sensitivity to detect smaller and larger adenomas, their model underpredicts the frequency and size of the adenomas detected in the validation study. Only when they reset colonoscopy sensitivity to be considerably lower, particularly for the smallest (diminutive) adenomas, are they able to more closely match their validation targets. Their conclusion? That the published sensitivities must be inflated, leading the model to identify – and remove – too many adenomas at the start of the simulated validation study. The model's best values for the revised sensitivities are dramatically lower – 0.2 rather than 0.75 for the smallest adenomas and 0.55 rather than 0.85 for small adenomas.
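To see why inflated sensitivity inputs translate into underprediction, a back-of-the-envelope calculation helps. The sketch below uses made-up prevalent adenoma counts (they are not CRC-SPIN outputs) to compare how much of the adenoma burden the simulated study-entry colonoscopy leaves in place under the published versus the revised sensitivities; lesions left in place are precisely the ones that can persist, grow, and surface at the follow-up exam.

```python
# Hypothetical back-of-the-envelope: suppose the average trial entrant harbors 1.5
# diminutive and 0.5 small adenomas at the qualifying colonoscopy (made-up counts).
# The entry exam removes what it detects; whatever it misses remains in place.
def fraction_remaining(sens_diminutive, sens_small, n_diminutive=1.5, n_small=0.5):
    missed = n_diminutive * (1 - sens_diminutive) + n_small * (1 - sens_small)
    return missed / (n_diminutive + n_small)

print(fraction_remaining(0.75, 0.85))  # published sensitivities: ~22% of lesions remain
print(fraction_remaining(0.20, 0.55))  # revised sensitivities: ~71% remain
```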
Using the validation process to produce new insights about key model components has some precedent. Notably, the creators of the Wisconsin Breast Cancer Epidemiology Simulation Model found that a natural history model predicated on disease always being progressive did not adequately account for the sharply increased incidence of in situ and small invasive cancers after the start of population-level screening (5). They hypothesized that a certain fraction of tumors had to be of limited malignant potential and were either indolent or ultimately regressive. Subsequent studies have confirmed that such cancers contribute substantially to overdiagnosis under mammography screening (6, 7). Rutter and colleagues use the validation process to zero in on the diagnostic properties of their screening modality rather than disease natural history, but the idea is the same.
A few conditions must be met before a specific cause can be singled out as the driver of an unsuccessful model validation. The model projections must reflect the validation study population and protocol, not just as designed, but also as implemented. Rutter and colleagues take considerable care to faithfully reflect the Wheat Bran Fiber Study, simulating representative personal histories of detected adenomas and patterns of adherence to scheduled colonoscopies during the follow-up interval. But even if the validation study can be replicated, there remains the question of identifiability – whether a unique explanation for the lack of validation can be convincingly isolated.
This problem of identifiability arises when models attempt to match observed detection patterns on the basis of underlying events in disease progression that are not fully observable. In practice, the same detection patterns may result either from a disease process that generates fewer such events but is detected with higher sensitivity, or from one that generates more events but is detected with lower sensitivity. Non-identifiability can be an issue when the diagnostic performance of the detection modality is not well established; this turns out to be the case for colonoscopy and small adenomas.
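A toy calculation makes the ambiguity concrete. In the sketch below (all numbers hypothetical), each simulated person develops a Poisson number of latent lesions and each lesion is detected independently with some sensitivity; the detected count then follows a Poisson distribution whose mean is the product of the lesion rate and the sensitivity, so two very different rate-and-sensitivity pairs with the same product generate indistinguishable detection data.

```python
import numpy as np

rng = np.random.default_rng(2)
n_people = 500_000

def detected_counts(lesion_rate, sensitivity):
    """Detected lesions per person: Poisson lesions thinned by imperfect detection."""
    lesions = rng.poisson(lesion_rate, size=n_people)
    return rng.binomial(lesions, sensitivity)

few_but_visible = detected_counts(lesion_rate=1.0, sensitivity=0.8)  # fewer lesions, higher sensitivity
many_but_hidden = detected_counts(lesion_rate=4.0, sensitivity=0.2)  # more lesions, lower sensitivity

# Both scenarios have lesion_rate * sensitivity = 0.8, and the observable detection
# patterns match: same mean and, up to simulation noise, the same distribution.
print(few_but_visible.mean(), many_but_hidden.mean())
print(np.bincount(few_but_visible, minlength=5)[:5] / n_people)
print(np.bincount(many_but_hidden, minlength=5)[:5] / n_people)
```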
Even though Rutter and colleagues cite published studies of colonoscopy sensitivity by adenoma size, true sensitivity is notoriously difficult to estimate in prospective screening studies. In the case of colonoscopy, it requires gold-standard assessment of true adenoma status across the length of the colon, which is difficult to imagine given currently available technologies. The CRC-SPIN model's adenoma sensitivity estimates by size are based on tandem colonoscopy studies (8); it is entirely possible that such studies overestimate sensitivity for small adenomas. To arrive at their conclusion that this is likely the case, Rutter and colleagues condition on their previously calibrated progression model as they seek a better-fitting set of colonoscopy sensitivities. While their conclusions may well prompt a reckoning of sorts regarding whether the field has been overly optimistic about the diagnostic performance of colonoscopy, they do not escape the identifiability question. The authors recognize this issue and note that tweaking a model assumption about natural history could conceivably produce more large adenomas in their simulation of the Wheat Bran Fiber Study. Whether this would have yielded a closer correspondence between model projections and validation targets without changing the sensitivity inputs remains unclear.
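The direction and size of that potential bias can be illustrated with a simple calculation. The sketch below is hypothetical: it assumes two back-to-back exams with the same true per-lesion sensitivity and independent misses, and applies the usual tandem-study logic of counting a lesion only if at least one exam finds it. Because lesions missed by both exams never enter the calculation, the estimate under this setup always exceeds the true sensitivity, and the gap is widest exactly where true sensitivity is low, as for diminutive adenomas.

```python
import numpy as np

rng = np.random.default_rng(3)
n_adenomas = 1_000_000

def tandem_estimate(true_sensitivity):
    """Tandem-style sensitivity estimate: lesions seen on the first pass divided by
    lesions seen on either pass. Lesions missed by both exams are never observed,
    which biases the estimate upward (independent misses assumed; correlated misses
    would inflate it further)."""
    first_pass = rng.random(n_adenomas) < true_sensitivity
    second_pass = rng.random(n_adenomas) < true_sensitivity
    seen_at_all = first_pass | second_pass
    return first_pass.sum() / seen_at_all.sum()

for s in (0.85, 0.75, 0.55, 0.20):
    print(f"true sensitivity {s:.2f} -> tandem-style estimate {tandem_estimate(s):.2f}")
```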
In conclusion, cancer modeling can, via the process of validation, become a learning exercise – an opportunity to uncover features of the disease or diagnostic process that are not directly informed by empirical observation. Indeed, the mechanistic understanding that goes into building a model can help modelers delineate the set of plausible explanations when the model fails to validate. By systematically interrogating the various possibilities, modelers can convince themselves – and others – that their conclusions are watertight.
Authors' Disclosures
No disclosures were reported.
Acknowledgments
This work was supported by the Rosalie and Harold Rea Brown Endowed Chair at the Fred Hutchinson Cancer Research Center and NCI Cancer Center Support Grant P30 CA015704.