Abstract
Multi-cancer tests offer screening for multiple cancers with one blood draw, but the potential population impact is poorly understood.
We formulate mathematical expressions for the expected numbers of (i) individuals exposed to unnecessary confirmation tests (|${\rm{EUC}}$|), (ii) cancers detected (|${\rm{CD}}$|), and (iii) lives saved (|${\rm{LS}}$|) given test performance, disease incidence and mortality, and mortality reduction. We add colorectal, liver, lung, ovary, and pancreatic cancer to a test for breast cancer, approximating prevalence at ages 50, 60, or 70 using incidence over the next 5 years and mortality using corresponding probabilities of cancer death over 15 years in the Surveillance, Epidemiology, and End Results registry.
|${\rm{EUC}}$| is overwhelmingly determined by specificity. For a given specificity, |${\rm{EUC}}/{\rm{CD}}$| is most favorable for higher prevalence cancers. Under 99% specificity and sensitivities as published for a 50-cancer test, |${\rm{EUC}}/{\rm{CD}}$| is 1.1 for breast + lung versus 1.3 for breast + liver at age 50. Under a common mortality reduction associated with screening, |${\rm{EUC}}/{\rm{LS}}$| is most favorable when the test includes higher mortality cancers (e.g., 19.9 for breast + lung vs. 30.4 for breast + liver at age 50 assuming a common 10% mortality reduction).
Published multi-cancer test performance suggests a favorable tradeoff of |${\rm{EUC}}$| to |${\rm{CD}}$|, yet the full burden of unnecessary confirmations will depend on the posttest work-up protocol. Harm–benefit tradeoffs will be improved if tests prioritize more prevalent and/or lethal cancers for which curative treatments exist.
The population impact of multi-cancer testing will depend not only on test performance but also on disease characteristics and efficacy of early treatment.
See related commentary by Duffy and Sasieni, p. 3
Introduction
The advent of liquid biopsy technology has ushered in a new era of cancer diagnostics. Assays to detect circulating cell-free DNA supplemented by protein and methylation signatures promise to dramatically alter the landscape of cancer surveillance and early detection. In particular, the possibility of multi-cancer early detection, where a single blood sample is interrogated for multiple cancers, is attracting a great deal of attention (1, 2).
Several multi-cancer tests are in development, each harnessing different features of the circulating tumor DNA. Liu and colleagues present a test using targeted methylation analysis of circulating cell-free DNA that in principle detects and localizes more than 50 cancer types, and quantify its sensitivity for a prespecified set of 12 cancers (3). Cohen and colleagues present diagnostic performance of a test using circulating DNA and protein biomarkers (4); Lennon and colleagues identify 10 cancer types using an updated version of the test (5); and Cristiano and colleagues present a test that uses fragmentation patterns of cell-free DNA across the genome along with mutation-based cell-free DNA, estimating the diagnostic performance of this test across seven cancer types (6).
Multi-cancer testing offers the potential for improved diagnosis of cancers where tests already exist, such as mammograms in women with dense breasts (7, 8). It is less invasive than some existing tests, such as colonoscopies for colorectal cancer. Furthermore, multi-cancer tests include cancers for which tests do not currently exist, either because the search for biomarkers with adequate performance has been unsuccessful or because disease prevalence is extremely low (9). Aggregating diagnosis of these cancers into a single test could increase the combined prevalence to an acceptable level for population screening. There may also be an economic advantage to doing a single test rather than a series of individual tests.
At this point, however, many questions exist about the likely impact of multi-cancer testing on population benefits, such as cancer deaths prevented, and harms, such as unnecessary imaging tests or biopsies (1). These outcomes depend critically on disease prevalence at the time of the test, mortality in the absence of the test, and the ability to treat potentially fatal disease if detected early.
This study investigates how the potential benefits and harms of a single-occasion multi-cancer test depend on test and disease characteristics. We define a multi-cancer test as a test that screens for a specified set of ≥2 target cancers. We first explore the simple setting of hypothetical two-cancer tests to establish a conceptual foundation for thinking about outcomes in the multi-cancer setting. We then examine more realistic tests involving up to six cancers with implications for more general multi-cancer tests. Our analysis creates a framework for quantifying the population impact of multi-cancer tests while pointing to criteria for the number and type of cancers to include.
Materials and Methods
Overview
Suppose that we have a test for a single cancer (cancer |${\rm{A}}$|) and our goal is to evaluate a test that includes two cancers (cancers |${\rm{A}}$| and |${\rm{B}}$|). We assume that the test produces an assessment of whether cancer is present and indicates a tissue of origin (TOO). We further assume that individuals do not have both cancers present concurrently at the time of the test. We present results for hypothetical two-cancer tests and show how they depend on test performance and cancer characteristics. We then present results for breast cancer (cancer |${\rm{A}}$|) and colorectal, liver, lung, ovary, or pancreatic cancer (cancer |${\rm{B}}$|), evaluating the measures of harm and benefit as we sequentially build up to six cancers based on published characteristics of an existing multi-cancer test (3). We select these cancers to include both common (breast, lung) and rare (ovary, liver) cancers as well as cancers with particularly poor prognosis if detected at advanced stage (ovary, pancreatic). For all multi-cancer tests, we produce results for single-occasion testing at ages 50, 60, or 70 to explore age dependence of the outcomes.
Test performance
The sensitivity of a multi-cancer test has two components: (i) the test sensitivity, which is the probability the test returns a cancer signature given that a targeted cancer is present, and (ii) the probability of correct localization for each targeted cancer. For cancer |${\rm{A}}$|, the test sensitivity is the probability that the test returns a cancer signature when cancer |${\rm{A}}$| is present. Both the test sensitivity and the probability of correct localization may differ across cancers. We note that, while our discussion refers to sensitivity in general terms, it is the sensitivity to detect early-stage tumors that counts. Published multi-cancer studies have shown considerably poorer sensitivities for early compared with advanced-stage tumors (4).
We define the marginal sensitivity for a target cancer as the probability that the test returns a cancer signal and correctly localizes the target cancer when it is present. The marginal sensitivity of a multi-cancer test for cancer |${\rm{A}}$| can be written:
|$M{S_A} = {P_A}( {{T^ + }} ) \times {L_A}( {{T^ + }} )$|
where |${P_A}( {{T^ + }} )$| is the sensitivity of the multi-cancer test, that is, the probability the test returns a cancer signal given cancer |${\rm{A}}$| is present, and |${L_A}( {{T^ + }} )$| is the probability of correctly localizing the TOO as |${\rm{A}}$| given that the multi-cancer test returns a cancer signal (|${T^ + }$|) and cancer |${\rm{A}}$| is in fact present. With |$k$| targeted cancers there are |$k$| marginal sensitivities. An early-stage version of the marginal sensitivity can be defined that specifies the test sensitivity as the probability the test returns a cancer signal if an early-stage cancer is present with the correct localization probability defined accordingly.
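As a minimal sketch of this definition, the marginal sensitivity can be computed as the product of test sensitivity and localization probability. The function name and dictionary layout below are illustrative assumptions; the input values are the test sensitivities and localization probabilities listed in Table 2, expressed as proportions.

```python
def marginal_sensitivity(test_sensitivity: float, localization_prob: float) -> float:
    """Probability of a cancer signal AND a correct tissue-of-origin call,
    given that the target cancer is present."""
    return test_sensitivity * localization_prob


# (test sensitivity, localization probability) per tissue of origin, from Table 2.
table2_inputs = {
    "breast": (0.64, 0.96),
    "colorectal": (0.74, 0.97),
    "lung": (0.59, 0.92),
    "ovary": (0.67, 0.96),
    "pancreas": (0.78, 0.79),
    "liver": (0.68, 0.72),
}

marginal = {
    too: marginal_sensitivity(sens, loc) for too, (sens, loc) in table2_inputs.items()
}

for too, ms in marginal.items():
    print(f"{too}: {ms:.0%}")
```

With these inputs the products round to the marginal sensitivities shown in Table 2.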
We define the specificity of a multi-cancer test as the probability the test returns a “no cancer” signature when none of the targeted cancers is present. If a cancer outside of the target set is detectable by the test and produces a cancer signature, this would be considered a false-positive result. This implies that the specificity of a multi-cancer test will vary with the specified set of target cancers.
Outcome metrics
In single-cancer testing, a key metric used to evaluate harm-benefit is the positive predictive value (|${\rm{PPV}}$|). In this setting, |$1/{\rm{PPV}}$| is often cited as the number of biopsies required per cancer detected, and |$1/{\rm{PPV}} - 1$| the number of unnecessary biopsies per cancer detected (assuming that all positive tests are followed by a biopsy). Thus, for example, a |${\rm{PPV}}$| of 1/3 corresponds to 2 unnecessary biopsies per cancer detected, and a |${\rm{PPV}}$| of 1/2 corresponds to 1 unnecessary biopsy per cancer detected. The |${\rm{PPV}}$| has proven to be a useful metric even though it depends on disease prevalence and therefore may not generalize from one population to another.
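The arithmetic linking the positive predictive value to unnecessary biopsies can be sketched as follows. The function name is an illustrative assumption, and the calculation assumes, as in the text, that every positive test is followed by a biopsy.

```python
def unnecessary_biopsies_per_cancer(ppv: float) -> float:
    """Unnecessary biopsies per cancer detected, i.e. 1/PPV - 1."""
    if not 0.0 < ppv <= 1.0:
        raise ValueError("PPV must be in (0, 1]")
    return 1.0 / ppv - 1.0


# The two worked examples from the text.
per_cancer_at_one_third = unnecessary_biopsies_per_cancer(1 / 3)
per_cancer_at_one_half = unnecessary_biopsies_per_cancer(1 / 2)
print(per_cancer_at_one_third, per_cancer_at_one_half)
```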
In single-cancer testing, each individual is subject to one biopsy at most, so the unnecessary biopsies arise solely from false positive tests, that is, from individuals without cancer. In multi-cancer testing, however, unnecessary confirmation tests may occur in individuals who have one of the cancers included in the test, when the multi-cancer test correctly returns a cancer signal but incorrectly identifies the TOO. In this case, a confirmation test performed to verify that cancer is present in the putative TOO would constitute an unnecessary confirmation test. Thus, in multi-cancer testing, unnecessary confirmation tests arise from individuals without any of the targeted cancers as well as from those whose tumor is incorrectly localized.
For a two-cancer test, we can directly formulate the expected number of screened individuals potentially exposed to unnecessary confirmation (|${\rm{EUC}}$|) as:
|${\rm{EUC}} = N \times [ {\rho _A}{P_A}( {{T^ + }} )( {1 - {L_A}( {{T^ + }} )} ) + {\rho _B}{P_B}( {{T^ + }} )( {1 - {L_B}( {{T^ + }} )} ) + ( {1 - {\rho _A} - {\rho _B}} )( {1 - Sp} ) ]$|
where |$N$| is the number of individuals tested, |${\rho _A}$| and |${\rho _B}$| are the prevalence of cancers |$A$| and |$B$|, respectively, and |$Sp$| is the specificity of the test. The first two terms in square brackets reflect the correct return of a cancer signal but incorrect localization, and the third term reflects a false positive. In practice, if the specificity is even modestly below 100% and the prevalence of each cancer is not high, the third term will dominate. We note that |${\rm{EUC}}$| reflects persons exposed to unnecessary confirmation tests and not the actual number of such tests performed.
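To illustrate why the false-positive term dominates, the sketch below evaluates the mislocalization terms and the false-positive term separately for a breast + lung test at age 50. The function name and the cohort size of 1,000 are assumptions made for illustration; prevalence, sensitivity, and localization values are taken from Tables 1 and 2, and specificity is set to 99%.

```python
from typing import Sequence, Tuple


def euc_terms(
    n: float,
    prevalence: Sequence[float],
    sensitivity: Sequence[float],
    localization: Sequence[float],
    specificity: float,
) -> Tuple[float, float]:
    """Return the (mislocalized, false_positive) components of EUC."""
    # Cancer present, signal returned, but tissue of origin called incorrectly.
    mislocalized = n * sum(
        rho * p * (1.0 - loc)
        for rho, p, loc in zip(prevalence, sensitivity, localization)
    )
    # No targeted cancer present, but the test returns a cancer signal.
    false_positive = n * (1.0 - sum(prevalence)) * (1.0 - specificity)
    return mislocalized, false_positive


# Breast (A) + lung (B) at age 50; n = 1000 is an assumed cohort size.
mislocalized, false_positive = euc_terms(
    n=1000,
    prevalence=[0.0124, 0.0025],
    sensitivity=[0.64, 0.59],
    localization=[0.96, 0.92],
    specificity=0.99,
)
euc = mislocalized + false_positive
print(f"mislocalized={mislocalized:.2f} false_positive={false_positive:.2f} EUC={euc:.2f}")
```

Under these inputs the false-positive term is more than 20 times larger than the combined mislocalization terms.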
The expected number of cancers detected depends on the marginal sensitivity of the multi-cancer test to detect each cancer and, in the setting of two cancers, is given by:
|${\rm{CD}} = N \times [ {\rho _A}M{S_A} + {\rho _B}M{S_B} ]$|
The formulation of |${\rm{CD}}$| reflects the expected number of people diagnosed with cancer following confirmation precipitated by the multi-cancer test.
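A short sketch of the cancers-detected calculation, weighting each cancer's prevalence by its marginal sensitivity, is given below. The function name and the cohort size of 1,000 are illustrative assumptions; inputs come from Tables 1 and 2 for age 50.

```python
from typing import Sequence


def cancers_detected(
    n: float, prevalence: Sequence[float], marginal_sensitivity: Sequence[float]
) -> float:
    """Expected cancers detected: n * sum(prevalence * marginal sensitivity)."""
    return n * sum(rho * ms for rho, ms in zip(prevalence, marginal_sensitivity))


# Marginal sensitivity = test sensitivity * localization probability (Table 2).
ms_breast = 0.64 * 0.96
ms_lung = 0.59 * 0.92
ms_liver = 0.68 * 0.72

# Age-50 prevalence from Table 1; n = 1000 is an assumed cohort size.
cd_breast_lung = cancers_detected(1000, [0.0124, 0.0025], [ms_breast, ms_lung])
cd_breast_liver = cancers_detected(1000, [0.0124, 0.0002], [ms_breast, ms_liver])
print(f"breast+lung: {cd_breast_lung:.2f}, breast+liver: {cd_breast_liver:.2f}")
```

Adding the more prevalent second cancer (lung) yields more detections than adding liver, which is what drives the difference in |${\rm{EUC}}/{\rm{CD}}$| between candidate tests.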
We project the expected lives saved by assuming a disease-specific mortality reduction for each targeted cancer that is detected early and applying this value to the cumulative risk of disease-specific death without the test. The mortality reduction due to any screening test is a complex function of the interaction between screening test performance, screening protocol, and disease natural history. Rather than explicitly modeling these interactions for each cancer, we assume a value for the mortality reduction among cancers detected by the test. Specifically, we write the expected number of lives saved as:
|${\rm{LS}} = N \times [ {m_A}M{S_A}{R_A} + {m_B}M{S_B}{R_B} ]$|
where |${m_A}$| and |${m_B}$| are probabilities of cancer-specific death for cancers |${\rm{A}}$| and |${\rm{B}}$|, respectively, in the absence of the screening test, and |${R_A}$| and |${R_B}$| are the corresponding mortality reductions among those detected by the test. Implicit in this formula is the understanding that lives saved will be limited to cases that are prevalent at the time of the test. For a given mortality reduction factor, this formulation automatically increases the expected lives saved in proportion to the marginal sensitivity; higher marginal sensitivity will lead to more lives saved. We note that the mortality reduction factor need not be the same for all cancers, and for a given cancer it may depend on whether marginal sensitivity is specified for all cancers or for early-stage cancers.
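The lives-saved calculation can be sketched as below, assuming that each cancer's baseline probability of cancer death is scaled by its marginal sensitivity and by the mortality reduction among detected cases. The function name and the cohort size of 1,000 are illustrative assumptions; the mortality and mortality-reduction values are the rounded entries from Tables 1 and 2 for breast and lung cancer at age 50.

```python
from typing import Sequence


def lives_saved(
    n: float,
    mortality: Sequence[float],
    marginal_sensitivity: Sequence[float],
    reduction_if_detected: Sequence[float],
) -> float:
    """Expected lives saved: n * sum(m * MS * R) over targeted cancers."""
    return n * sum(
        m * ms * r
        for m, ms, r in zip(mortality, marginal_sensitivity, reduction_if_detected)
    )


mortality = [0.0032, 0.0020]           # 15-year probability of death (Table 1, age 50)
marginal = [0.64 * 0.96, 0.59 * 0.92]  # marginal sensitivities (Table 2)
reduction = [0.16, 0.18]               # mortality reduction among detected cases (Table 2)

ls = lives_saved(1000, mortality, marginal, reduction)

# For a fixed reduction factor, halving marginal sensitivity halves lives saved.
ls_half_ms = lives_saved(1000, mortality, [ms / 2 for ms in marginal], reduction)
print(f"LS={ls:.3f}, LS at half marginal sensitivity={ls_half_ms:.3f}")
```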
General expressions for |${\rm{EUC}}$|, |${\rm{CD}}$|, and |${\rm{LS}}$| for a |$k$|-cancer test are given in the Supplementary Methods and Materials (available online). In practice, all three outcomes depend on the protocol for confirmation testing, including how to proceed if confirmation testing of the putative TOO returns a negative result. In addition to the absolute outcomes, we also examine |${\rm{EUC}}/{\rm{CD}}$| and |${\rm{EUC}}/{\rm{LS}}$| as measures of harm-to-benefit tradeoffs.
Evaluation of hypothetical and more realistic multi-cancer tests
We first consider a two-cancer test that screens for breast cancer (|${\rm{A}}$|) and colorectal, liver, lung, ovary, or pancreatic cancer (|${\rm{B}}$|) among women in the United States. We base sensitivities and correct localization probabilities on estimates from Liu and colleagues for cancers in stages I–III (3) and specificity (99%) on their estimate (99.3%). Similar values for specificity have been reported in other studies (4, 6). Because a multi-cancer test for fewer cancers than the number considered in that study (|$n = 50$|) may have a lower specificity, we also examine a lower specificity (97%) in a sensitivity analysis.
Age-specific prevalence of cancer is based on incidence rates observed in the Surveillance, Epidemiology, and End Results (SEER) registry over the years 2000 to 2002 inclusive (10). We assume that cancers diagnosed within the 5-year age groups 50 to 54, 60 to 64, and 70 to 74 years are prevalent at ages 50, 60, and 70 years, respectively, and thus could potentially be detected by a multi-cancer test offered at these ages. The prevalence at age 60, for example, is based on the number of diagnoses at ages 60 to 64 divided by the population at age 60, which we approximate using one-fifth of the population ages 60 to 64.
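The prevalence approximation described above can be sketched as follows. The function name is an assumption, and the counts in the example are hypothetical placeholders rather than SEER data; the sketch also assumes that the diagnosis count and the population count refer to the same single calendar year.

```python
def approximate_prevalence(
    diagnoses_in_age_group: float,
    population_in_age_group: float,
    group_width_years: int = 5,
) -> float:
    """Approximate prevalence at the first age of a 5-year age group.

    The single-year population is approximated by one-fifth of the
    age-group population, as described in the text.
    """
    single_year_population = population_in_age_group / group_width_years
    return diagnoses_in_age_group / single_year_population


# Hypothetical counts for ages 60 to 64 (illustration only).
prevalence_at_60 = approximate_prevalence(
    diagnoses_in_age_group=378, population_in_age_group=100_000
)
print(f"{prevalence_at_60:.2%}")
```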
At any given age, screening cannot affect mortality among cancers diagnosed prior to that age. Therefore, we use SEER incidence-based mortality, which permits calculation of the probability of diagnosis at a specific age (e.g., age 50–54) and subsequent cancer-specific death over a specified time interval (11). We estimate 15-year incidence-based mortality and apply the assumed mortality reduction to cancers detected early to project expected lives saved. We restrict the time interval for mortality to 10 years in a sensitivity analysis.
For screening benefit, we consider the consensus benefit estimate of 20% mortality reduction across breast cancer screening trials (12) as an upper bound since we are modeling only single-occasion screening and the trials all evaluated serial screening. We denote the mortality reduction among women randomized to screening by |$R_A^*$|. This is not the same as the mortality reduction among women whose breast cancers were detected early by screening (denoted |${R_A}$| above). If we assume that detectability of a latent cancer is independent of the mortality risk of that cancer in the absence of screening, then we can write:
|$R_A^* = M{S_A} \times {R_A}$|
For example, if the marginal sensitivity of a test to detect breast cancer (|$M{S_A}$|) is 60%, then a 20% mortality reduction among women randomized to screening (|$R_A^*$|) corresponds to a 33% mortality reduction among women with cancers detected early by the test (|${R_A}$|). We adopt this formulation as a transparent device to link marginal sensitivity with deaths prevented by screening. In practice, latent tumors that are detected may be more advanced and have poorer baseline survival than those not detected, and thus this assumption may produce optimistic projections of lives saved.
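The conversion from a mortality reduction among those screened to a mortality reduction among detected cases amounts to dividing by the marginal sensitivity. A minimal sketch under the independence assumption stated above follows; the function and variable names are illustrative assumptions.

```python
def reduction_among_detected(r_star: float, marginal_sensitivity: float) -> float:
    """Mortality reduction among detected cases implied by a reduction r_star
    among all those screened, assuming detectability is independent of
    baseline mortality risk."""
    if not 0.0 < marginal_sensitivity <= 1.0:
        raise ValueError("marginal sensitivity must be in (0, 1]")
    return r_star / marginal_sensitivity


MS_A = 0.60  # marginal sensitivity used in the worked example
implied = {
    r_star: reduction_among_detected(r_star, MS_A) for r_star in (0.05, 0.10, 0.20)
}
for r_star, r in implied.items():
    print(f"R* = {r_star:.2f} -> R = {r:.1%}")
```

The function does not cap the result at 100%, so a caller would need to decide how to handle a screening-level reduction that exceeds the marginal sensitivity.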
Because we are modeling only single-occasion testing, we use a 10% mortality reduction associated with screening (|$R_A^* = 0.10$|) for each cancer, but also examine how results change under mortality reductions of 5% and 20%. For |$M{S_A}$| equal to 60%, for example, these values correspond to approximately |${R_A} = 17\%$| (|$R_A^* = 0.10$|), |${R_A} = 8\%$| (|$R_A^* = 0.05$|), and |${R_A} = 33\%$| (|$R_A^* = 0.20$|). In a sensitivity analysis, we use |$R_A^* = 0.10$| for three high-prevalence cancers for which screening tests currently exist and |$R_A^* = 0.50$| for three low-prevalence cancers to explore how increasing the benefit for low-prevalence cancers impacts the relative ordering of harm-benefit measures across the six target cancers.
Online calculator
To provide direct access to the proposed framework, we created an online calculator. Users can configure a multi-cancer test with up to 50 cancers for any combination of sex, race (All races or Black), screening age (50, 60, or 70 years); input test specificity; and input test sensitivity, localization probability, and mortality reduction associated with early detection for each targeted cancer. The interface calculates the three outcomes formulated in this paper, and results can be downloaded for external analysis.
Data availability
Reproducible code used to generate all tables and figures in this article is available at Code Ocean https://codeocean.com/capsule/9054015/tree/v3.
Results
A hypothetical two-cancer test
Figure 1 shows |${\rm{EUC}}$| and |${\rm{CD}}$| for a hypothetical two-cancer test under specified diagnostic characteristics and prevalence settings. The values used for prevalence are low in keeping with the estimates derived from the SEER registry for the cancers examined in the more realistic analyses. Projected values for a wider range of settings are shown in Supplementary Table S1 (available online).
Results show that |${\rm{EUC}}$| is driven primarily by the specificity of the test. Its dependence on sensitivity is minor, and its dependence on prevalence and localization accuracy (Supplementary Fig. S1) is also small. |${\rm{CD}}$| increases with the prevalence of cancer |${\rm{B}}$| and the sensitivity of the test, leading to lower |${\rm{EUC}}/{\rm{CD}}$| for higher prevalence cancers. While the results are intuitive, the message is that harm-benefit tradeoffs of multi-cancer testing depend on factors pertaining to the cancers included in addition to the characteristics of the test.
A more realistic test for up to six cancers
Table 1 gives age-specific 5-year risks of cancer diagnosis, which we use to approximate prevalence for the cancers considered. As expected, breast and lung cancer have the highest prevalence while liver cancer has the lowest. All prevalence estimates are less than 2.5%, and they are less than 0.3% for liver, ovary, and pancreatic cancers. Table 2 summarizes sensitivities and localization probabilities from Liu and colleagues (3) for these cancers. The highest marginal sensitivity is for colorectal cancer; the lowest is for liver cancer. This table also shows mortality reductions associated with early detection that correspond to 10% mortality reductions associated with screening for each target cancer.
Table 1. Age-specific 5-year probability of cancer diagnosis and 15-year probability of cancer death by tissue of origin (TOO).
TOO | Age, y | 5-year probability of diagnosis, % | 15-year probability of death, % |
---|---|---|---|
Breast | 50 | 1.24 | 0.32 |
Breast | 60 | 1.89 | 0.66 |
Breast | 70 | 2.23 | 1.29 |
Colorectal | 50 | 0.23 | 0.10 |
Colorectal | 60 | 0.52 | 0.26 |
Colorectal | 70 | 1.04 | 0.71 |
Liver | 50 | 0.02 | 0.01 |
Liver | 60 | 0.05 | 0.04 |
Liver | 70 | 0.09 | 0.08 |
Lung | 50 | 0.25 | 0.20 |
Lung | 60 | 0.83 | 0.71 |
Lung | 70 | 1.52 | 1.35 |
Ovary | 50 | 0.11 | 0.07 |
Ovary | 60 | 0.20 | 0.15 |
Ovary | 70 | 0.25 | 0.21 |
Pancreas | 50 | 0.04 | 0.03 |
Pancreas | 60 | 0.12 | 0.10 |
Pancreas | 70 | 0.26 | 0.24 |
Table 2. Test sensitivity, localization probability, marginal sensitivity, and mortality reduction by tissue of origin (TOO).
TOO | Test sensitivity, % | Localization probability, % | Marginal sensitivity, % | Mortality reduction, % |
---|---|---|---|---|
Breast | 64 | 96 | 61 | 16 |
Colorectal | 74 | 97 | 72 | 14 |
Lung | 59 | 92 | 54 | 18 |
Ovary | 67 | 96 | 64 | 16 |
Pancreas | 78 | 79 | 62 | 16 |
Liver | 68 | 72 | 49 | 20 |
Note: The implied mortality reduction among cases detected early corresponding to a 10% mortality reduction associated with single-occasion screening is also shown.
The five two-cancer tests (breast cancer plus one of the other cancers) are associated with similar |${\rm{EUC}}$| (Supplementary Table S2), but they show clear patterns of higher |${\rm{EUC}}/{\rm{CD}}$| when prevalence of the candidate second cancer is lower (Fig. 2). As expected, |${\rm{EUC}}/{\rm{LS}}$| is an order of magnitude greater than |${\rm{EUC}}/{\rm{CD}}$|. Both harm-to-benefit measures improve with screening age and are most favorable when the second cancer is lung cancer, which has the highest prevalence and mortality (except at age 50 years when |${\rm{EUC}}/{\rm{CD}}$| is most favorable for breast+colorectal cancer). These features lead to lung cancer having the highest |${\rm{CD}}$| and |${\rm{LS}}$| at ages 60 and 70 despite the marginal sensitivity for lung cancer being relatively low. On the basis of these metrics, the test for breast+lung cancer is optimal among the two-cancer tests considered at these ages. Similar patterns are observed under a more modest screening benefit. Figure 2 and Supplementary Table S2 also show how |${\rm{EUC}}/{\rm{LS}}$| changes when the mortality reduction associated with screening is 5%, 10%, or 20%.
Similar patterns are observed when adding a third cancer to a breast+lung cancer test (Fig. 3; Supplementary Table S3). Once again, |${\rm{EUC}}$| is similar across tests considered, and the optimal candidate has the highest prevalence and mortality, in this case colorectal cancer. Building up to a six-cancer test in this way yields the projected outcomes shown in Table 3. At all ages |${\rm{EUC}}/{\rm{CD}}$| is below 1, indicating a favorable tradeoff of numbers exposed to unnecessary confirmation and cancers detected, likely due to the high values used for sensitivity and the relatively high prevalence of cases combined across cancers.
Table 3. Projected outcomes for the six-cancer test by screening age.
Screening age, y | |${\rm{EUC}}$|, n | |${\rm{CD}}$|, n | |${\rm{LS}}$|, n | |${\rm{EUC}}/{\rm{CD}}$| | |${\rm{EUC}}/{\rm{LS}}$| |
---|---|---|---|---|---|
50 | 10.4 | 11.7 | 0.7 | 0.9 | 14.2 |
60 | 11.0 | 22.1 | 1.9 | 0.5 | 5.7 |
70 | 11.6 | 33.1 | 3.9 | 0.4 | 3.0 |
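The six-cancer projections can be approximated from the inputs in Tables 1 and 2. The sketch below rests on several assumptions not stated in this text: that the two-cancer expressions extend to six cancers by summing over cancers (the general expressions are in the Supplementary Methods), a cohort of 1,000 screened individuals, 99% specificity, and a 10% mortality reduction associated with screening for every cancer. All function and variable names are illustrative.

```python
# Inputs transcribed from Tables 1 and 2, converted from percentages to proportions.
SENSITIVITY = {"breast": 0.64, "colorectal": 0.74, "liver": 0.68,
               "lung": 0.59, "ovary": 0.67, "pancreas": 0.78}
LOCALIZATION = {"breast": 0.96, "colorectal": 0.97, "liver": 0.72,
                "lung": 0.92, "ovary": 0.96, "pancreas": 0.79}
PREVALENCE = {
    50: {"breast": 0.0124, "colorectal": 0.0023, "liver": 0.0002,
         "lung": 0.0025, "ovary": 0.0011, "pancreas": 0.0004},
    60: {"breast": 0.0189, "colorectal": 0.0052, "liver": 0.0005,
         "lung": 0.0083, "ovary": 0.0020, "pancreas": 0.0012},
    70: {"breast": 0.0223, "colorectal": 0.0104, "liver": 0.0009,
         "lung": 0.0152, "ovary": 0.0025, "pancreas": 0.0026},
}
MORTALITY = {
    50: {"breast": 0.0032, "colorectal": 0.0010, "liver": 0.0001,
         "lung": 0.0020, "ovary": 0.0007, "pancreas": 0.0003},
    60: {"breast": 0.0066, "colorectal": 0.0026, "liver": 0.0004,
         "lung": 0.0071, "ovary": 0.0015, "pancreas": 0.0010},
    70: {"breast": 0.0129, "colorectal": 0.0071, "liver": 0.0008,
         "lung": 0.0135, "ovary": 0.0021, "pancreas": 0.0024},
}


def project(age, n=1000, specificity=0.99, r_star=0.10):
    """Return (EUC, CD, LS) for a six-cancer test at the given screening age."""
    prev, mort = PREVALENCE[age], MORTALITY[age]
    mislocalized = sum(prev[c] * SENSITIVITY[c] * (1 - LOCALIZATION[c]) for c in prev)
    false_positive = (1 - sum(prev.values())) * (1 - specificity)
    euc = n * (mislocalized + false_positive)
    cd = n * sum(prev[c] * SENSITIVITY[c] * LOCALIZATION[c] for c in prev)
    ls = 0.0
    for c in mort:
        ms = SENSITIVITY[c] * LOCALIZATION[c]
        r_detected = r_star / ms  # reduction among detected cases
        ls += n * mort[c] * ms * r_detected
    return euc, cd, ls


results = {age: project(age) for age in (50, 60, 70)}
for age, (euc, cd, ls) in results.items():
    print(f"age {age}: EUC={euc:.1f} CD={cd:.1f} LS={ls:.1f} "
          f"EUC/CD={euc / cd:.1f} EUC/LS={euc / ls:.1f}")
```

Because the table inputs are rounded, the ratios computed this way can differ slightly from the published values in the last digit.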
Lowering test specificity from 99% to 97% triples |${\rm{EUC}}/{\rm{LS}}$| for any two-cancer test without altering the relative ordering of this tradeoff across candidate second cancers (Supplementary Fig. S2). In contrast, improving the mortality reduction associated with screening for low-prevalence cancers from 10% to 50% alters the ordering of the two-cancer tests in terms of harm-benefit tradeoffs. Under these settings, the breast+ovary two-cancer test is associated with lower |${\rm{EUC}}/{\rm{LS}}$| than the breast+lung cancer test for 50- and 60-year-old women (Supplementary Fig. S3). Shortening the interval for cancer death from 15 to 10 years (Supplementary Fig. S4) does not alter the original patterns in the harm–benefit measures.
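The near-tripling of |${\rm{EUC}}/{\rm{LS}}$| when specificity drops from 99% to 97% can be illustrated for the breast + lung test at age 50. This is a sketch under assumptions: a cohort of 1,000, inputs from Tables 1 and 2, and a 10% mortality reduction associated with screening, under which lives saved reduce to the cohort size times 0.10 times the summed baseline mortality. Function names are illustrative.

```python
def euc(n, prevalence, sensitivity, localization, specificity):
    """Expected number exposed to unnecessary confirmation."""
    mislocalized = sum(
        rho * p * (1 - loc)
        for rho, p, loc in zip(prevalence, sensitivity, localization)
    )
    false_positive = (1 - sum(prevalence)) * (1 - specificity)
    return n * (mislocalized + false_positive)


def lives_saved(n, mortality, r_star):
    """Lives saved when R = R*/MS for every cancer, so MS cancels out."""
    return n * r_star * sum(mortality)


N = 1000                        # assumed cohort size
prevalence = [0.0124, 0.0025]   # breast, lung at age 50 (Table 1)
sensitivity = [0.64, 0.59]      # Table 2
localization = [0.96, 0.92]     # Table 2
mortality = [0.0032, 0.0020]    # Table 1

ls = lives_saved(N, mortality, r_star=0.10)
ratio_99 = euc(N, prevalence, sensitivity, localization, 0.99) / ls
ratio_97 = euc(N, prevalence, sensitivity, localization, 0.97) / ls
fold_change = ratio_97 / ratio_99
print(f"EUC/LS at 99%: {ratio_99:.1f}, at 97%: {ratio_97:.1f}, fold change: {fold_change:.2f}")
```

Lives saved do not depend on specificity, so the fold change in the ratio is driven entirely by the false-positive term of |${\rm{EUC}}$|.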
An online calculator that permits users to configure a multi-cancer test for a range of patient populations and test characteristics for selected target cancers is available at http://mced-calculator.fredhutch.org.
Discussion
Multi-cancer tests represent a major technological advance in cancer early detection, but their impact on clinically and policy-relevant outcomes remains to be determined. In this study, we present a quantitative framework that formalizes how test performance and disease features combine to produce population harm and benefit. We use this framework to address the question of how to compose a multi-cancer test, building up from one to six cancers on the basis of harm-benefit tradeoffs. In practice, detectable signatures apply to a large number of cancers that will automatically be identifiable by the test; thus, the question of how to specify a multi-cancer test may translate into whether to remove, rather than add, cancers. We note, however, that this is only a conceptual difference, as cancer signatures associated with less favorable harm-benefit tradeoffs for additional work-up can be excluded at the outset, and a bottom-up approach can be taken to configure a sustainable multi-cancer test.
A framework to clarify multi-cancer testing outcomes is necessitated by the novelty of the multi-cancer testing technology and the ways in which it alters the foundational metrics of diagnostic performance. In addition to multiple components of test sensitivity, we must also reconsider the standard definition of specificity in the setting of a multi-cancer test. In single-cancer testing, a truly negative test is taken to imply that the individual being tested does not have that cancer. In multi-cancer testing for a specified set of cancers, a truly negative test implies that the person being tested does not have one of the cancers in the target set. In practice, specificity may depend on the number of cancers in the test, with a test targeting a smaller set of cancers potentially having poorer specificity than a test targeting a larger set of cancers. However, including fewer cancers in the test may reduce the complexity of subsequent work-up and the actual number of confirmation tests performed. The number and type of cancers included must necessarily consider this tradeoff. At this point, the optimal protocol to confirm the result of a multi-cancer test is an open question.
Our analysis considers a limited set of outcome measures and approximates them based on information currently available plus a few transparent assumptions. Precise projection of any of these measures would require information about disease natural history and screening test performance in a prospective setting. This information would also be needed to explore other measures of benefit and harm, such as overdiagnosis, and to extrapolate from single-occasion to serial testing protocols.
The primary measure of harm in this study was unnecessary confirmation testing. In the absence of an established protocol for confirmation testing after a multi-cancer test, we only quantify persons exposed to unnecessary confirmation tests and not the number or cost of these tests, which may vary in terms of their invasiveness and accuracy. These features, as well as their implications for patient quality of life, should be included in a full accounting of the burden of unnecessary confirmation testing.
Our results are subject to limitations that stem mostly from the uncertainty about the natural history of different cancers and the need to use available and/or simplified inputs to fill out our harm-benefit framework. First, diagnostic performance estimates were sourced from a published multi-cancer testing study (3). These performance estimates were not prospective; they were derived from patients with and without a diagnosed cancer. In a prospective setting of a healthy population, we might expect lower sensitivity, particularly for early-stage tumors. Second, we approximated disease prevalence at the time of the test by the incidence within 5-year age groups and baseline mortality by the risk of cancer-specific death over the next 15 years among individuals diagnosed at those ages. We acknowledge that, particularly for some cancers known or suspected to have longer latencies, the underlying prevalence at the time of the test could be higher than that assumed. Conversely, for cancers with shorter latencies and poorer baseline survival, the 15-year interval for baseline mortality might be too long. Using a shorter interval for all cancers did not produce different decisions about which cancers to include, although as expected the associated harm-benefit tradeoff was less favorable. Ideally, we would want to project the prevalence of early-stage cancer at the time of the test, but this would require additional data and modeling.
We used published cancer screening trials to provide a benchmark for how disease-specific mortality might be reduced by multi-cancer early detection. These trials all included serial screening in the intervention group, while we model one-time testing. We use the observed mortality reduction as an upper bound for the mortality reduction associated with screening, which we convert to a mortality reduction among cases detected early by the test. This conversion is based on an assumption that detectability of a latent tumor is independent of its risk of disease-specific mortality. This assumption permits the lives saved to increase with the test sensitivity but will potentially produce an optimistic assessment of deaths prevented.
The extent to which multi-cancer tests might prevent disease-specific deaths is still highly uncertain and depends on how early they can reliably identify potentially fatal tumors as well as the efficacy of early treatment, which varies across cancers. The mortality benefit also will be affected by how the tests are used in practice, whether alongside or instead of existing tests, such as those for breast, colorectal, and lung cancer. By utilizing current estimates of prevalence and mortality for these cancers, we modeled multi-cancer tests used alongside existing tests. This motivated our sensitivity analysis, which assumes a less pronounced mortality benefit among currently-screened-for cancers than among non-screened-for cancers.
While our analysis is designed to address primary questions about the population impact of multi-cancer testing, it also raises many more. Beyond metrics for harm and benefit and how to reliably approximate them, questions remain about which cancers to include and how best to prioritize confirmation testing. A consensus about these matters will be needed before we can compare the different multi-cancer testing products currently under development. There are also important questions about the place of multi-cancer tests alongside established early detection modalities and how frequently the tests should be offered.
In conclusion, while multi-cancer testing may offer the possibility of screening for many cancers, its population impact depends on characteristics of both the cancers and the test as well as the potential benefit of early detection. The primary goal of this study has been to clarify in a quantitative framework the link between these features and clinically relevant population health outcomes. A key lesson from previous population screening tests is that the consequences of any early detection approach go far beyond test performance. Much more work is needed to determine harm–benefit tradeoffs of multi-cancer tests, as well as whether and how to deploy multi-cancer tests to reduce the population cancer burden.
Authors' Disclosures
R. Gulati reports grants from NCI during the conduct of the study. R. Etzioni reports other support from Seno Medical and personal fees from Grail Inc. outside the submitted work. No disclosures were reported by the other authors.
Authors' Contributions
B. Jiao: Conceptualization, software, formal analysis, investigation, methodology, writing–original draft, writing–review and editing. R. Gulati: Data curation, software, formal analysis, investigation, visualization, methodology, project administration, writing–review and editing. H.A. Katki: Investigation, methodology, writing–review and editing. P.E. Castle: Investigation, methodology, writing–review and editing. R. Etzioni: Conceptualization, supervision, investigation, methodology, writing–original draft, writing–review and editing.
Acknowledgments
We acknowledge helpful comments on previous drafts from Drs. Noel S. Weiss and Scott D. Ramsey. This work was supported in part by the NCI at the NIH (grant number R50 CA221836, to R. Gulati) and the Rosalie and Harold Rea Brown Endowed Chair (to R. Etzioni).
Note: Supplementary data for this article are available at Cancer Epidemiology, Biomarkers & Prevention Online (http://cebp.aacrjournals.org/).