Abstract
More than 75% of cancer-related deaths occur from cancers for which we do not screen. New screening liquid biopsies may help fill these clinical gaps, although evidence of benefit still needs to be assessed. Which lessons can we learn from previous efforts to guide those of the future? Screening trials for ovarian, prostate, pancreatic, and esophageal cancers are revisited to assess the evidence, which has been limited by small effect sizes, short duration of early-stage disease relative to screening frequency, study design, and confounding factors. Randomized controlled trials (RCT) to show mortality reduction have required millions of screening-years and two-decade durations, and have been susceptible to external confounding. Future RCTs with late-stage incidence as a surrogate endpoint could substantially reduce these challenges, and clinical studies demonstrating safety and effectiveness of screening in high-risk populations may enable extrapolation to broader average-risk populations. Multicancer early detection tests provide an opportunity to advance these practical study designs. Conditional approvals based on RCTs with surrogate endpoints, contingent upon real-world evidence generation and continuation of trials to definitive endpoints, may lower practical barriers to innovation in cancer screening and enable greater progress.
Introduction
Cancer therapeutics have been revolutionized by insights gained from modern biology, including the development of targeted therapies, immunotherapy, CAR T cells, and antibody-drug conjugates (1–3). However, cancer remains a leading cause of death in Americans, in large part due to persistence of late-stage diagnoses (4). Late-stage cancer is associated with poorer survival, poorer patient care experiences and higher treatment morbidity, and costs patients twice as much as localized disease (5–8). Only a few tests are recommended by the US Preventive Services Task Force (USPSTF) to promote detection of cancer at early stage; over 75% of cancer deaths are due to cancers without population-level screening (9). Current USPSTF recommendations include low-dose CT (LDCT) for lung cancer (in heavy smokers), mammography for breast cancer, colonoscopy/fecal immunochemical testing (FIT)/sDNA-FIT for colorectal cancer, and Papanicolaou screening for cervical cancer (10). The only approved cancer screen recommended for average-risk American men is for colorectal cancer. Unfortunately, even available recommended screening suffers from low utilization. Only 42% of screen-eligible women and 29% of screen-eligible men are up-to-date on recommended screens (10). Under 5% of people who qualify for LDCT screening receive screening (11, 12). In addition, 70% of lung cancers occur in people who do not qualify, although this number will be reduced by recent expansion of LDCT screening to smokers with fewer pack years (11–13).
Why is there less research emphasis placed on cancer screening compared with new cancer therapeutics? Many people do not sense urgency in early detection (EDx)/prevention, because they may not see immediate personal gain (14). Funding for research and economic incentives for EDx are also limited: under 10% of the NCI's budget is allocated to cancer control; the majority funds drug development (15). In the private sector, oncology therapeutics are most profitable, earning $123 billion in 2018 (16), but investment hasn't focused on EDx due to massive costs of large cohort randomized controlled trials (RCT) required to show a cancer-specific mortality reduction. Moreover, physicians who treat cancer (medical, radiation, and surgical oncologists) are compensated far more than primary care physicians tasked with cancer EDx/prevention (17).
The value of EDx for certain cancers has also been the topic of historic controversies (18, 19). Lead-time and length-time bias in observational studies confound conclusions about the impact of screening on survival outcomes (20, 21). These biases are eliminated by RCTs where survival is measured from time at randomization; however, RCTs to establish a mortality reduction require more than half a million patient-screening years and require 1 to 2 decades to complete [e.g., UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS); ref. 22], creating high barriers to entry of cost and time. Finally, effective EDx tools have been difficult to develop for most cancers. These include cancers for which overdiagnosis and overtreatment are major concerns (e.g., prostate; ref. 23, 24) as well as cancers that progress rapidly (e.g., pancreas). We reconsider the available data on EDx efforts in ovarian, prostate, pancreatic, and esophageal cancers. These diseases represent the extremes of indolence (prostate) or aggressiveness (esophageal, ovarian, and pancreatic), have suffered from limited or inconsistent research conclusions (prostate screening trials), have significant differences in treatment morbidity at early versus late stage (esophageal), and have large unmet needs in terms of screening.
Lessons from RCTs for Cancer Screening
Ovarian cancer
Of the 22,530 American women diagnosed with ovarian cancer in 2019, only 15% were diagnosed with localized disease (25, 26). Five-year overall survival (OS) was 29.2% at advanced stage and 92.4% at local stage (26). There is no population-level screen for ovarian cancer. Two large RCTs, the UKCTOCS and the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, were performed to assess ovarian cancer screening in average-risk women, initially arriving at divergent and later convergent conclusions of no effect after two decades of study.
PLCO screened 78,216 women from 1993 to 2001 (27). PLCO randomized patients to no screening or annual screening with CA-125 (>35 U/mL) and transvaginal ultrasound (TVUS). Follow-up of any positive test was referral to a gynecologist; the primary endpoint was ovarian cancer mortality. Although the majority of cases of ovarian cancer were detected by screening, screening did not cause a stage shift and did not reduce ovarian cancer mortality (RR = 1.21; 95% confidence interval (CI), 0.99–1.48). Retrospective reviews of this trial pointed out that: (i) 15% to 27% of women in the screening cohort did not receive annual screening, and (ii) many women identified with ovarian cancer in the screening cohort developed it multiple years after screening ended and should not have been counted as positive endpoints (29). Furthermore, the lack of follow-up protocol for gynecologists and relatively poor test characteristics of simultaneous single threshold CA-125 and TVUS [positive predictive value (PPV) ∼1%; ref. 28] may have led to delays in interventions that would obviate any potential advantage of EDx.
The UKCTOCS randomized 202,638 women to annual screening, exceeding half a million women-years of screening, or standard of care from 2001 to 2011 with follow-up through 2020 (22). Screening involved annual TVUS or annual multimodal screening (MMS), which detected significant rises above each woman's personalized CA-125 baseline (29). UKCTOCS reported a 10% reduction in incidence of late-stage ovarian cancer, which was not sufficient for an ovarian cancer mortality reduction. In comparison with the global CA-125 cut-off value in PLCO, the personalized algorithm for CA-125 interpretation in UKCTOCS generated a higher PPV (over 20%; ref. 30–33). There was also a significant benefit to screening incident versus prevalent cancers in the initial analysis. Despite the enormous amount of effort, time (20 years from randomization to final publication), and multiple preliminary EDx trials required to develop and then test this approach, the results were negative. This experience highlights the necessity of more rapid development and execution of EDx research and RCTs.
In comparison, in two prospective single-arm screening trials for ovarian cancer in high-risk cohorts undergoing testing every 3 to 4 months (increasing the chances of detection in early stage), there was a much larger 40% late-stage reduction (31, 33). No RCTs have been conducted in the high-risk population because randomization to a no-screening arm was ethically questionable, and mortality endpoints require too large a cohort for a high-risk population. On the basis of a review of mammography trials (34), a late-stage reduction greater than 20% is required for a significant mortality reduction. Whether this link can be extrapolated to other cancers remains to be seen, in particular to ovarian cancer for the UKFOCSS and CGN/GOG trials (31, 33).
Prostate cancer
One in 9 American men is diagnosed with prostate cancer in his lifetime and 1 in 41 dies from it (9). In 1994, the FDA approved PSA screening annually in men aged 55 to 69 years (35). From 2003 to 2012, PSA screening became common, and the proportion of men diagnosed at metastatic stage decreased by 80%; prostate cancer mortality decreased by 42% (36). Most (45%–70%) of that mortality benefit has been attributed to PSA screening (35, 36) despite its poor performance characteristics (PSA > 4: sensitivity = 21%, specificity = 91%; ref. 37, 38).
Despite these positive trends, there was growing concern about overdiagnosis—only 25% of prostate biopsies result in cancer diagnosis (39). Furthermore, prostate needle biopsies are routinely performed without imaging and may miss cancers or not capture the most aggressive lesions, leading to undertreatment (39). While uncommon, complications of transrectal biopsies include infections and hematospermia (39); prostate cancer treatments themselves may cause urinary incontinence or sexual impotence, which is of greatest concern in men with indolent tumors (40, 41).
To assess the value of PSA screening, PLCO enrolled 76,693 men aged 55 to 74 years between 1993 and 2001 (42). Men were randomized to usual care or annual PSA screening for 6 years and digital rectal examination (DRE) for 4 years. The primary endpoint was prostate cancer–specific mortality. More men in the screened group were diagnosed with prostate cancer (RR = 1.12; 95% CI, 1.07–1.17) but there was no mortality benefit after 13 years of follow-up (RR = 1.09; 95% CI, 0.87–1.36; ref. 43). Because of screening contamination outside the trial, the average number of PSA tests among the controls was three, compared with five in the screening group (42). Therefore, PLCO found that “five versus three PSA checks in six years” does not impact mortality.
The European Randomized Study of Screening for Prostate Cancer (ERSPC) trial screened 162,387 men in the 1990s in 7 European countries (44). Study eligibility criteria and methods differed between countries. Men were randomized to PSA testing with or without DRE every 4 years in most countries (8). The primary endpoint was prostate cancer mortality. PSA screening reduced prostate cancer mortality by 20% (RR = 0.79; 95% CI, 0.69–0.91). The number needed to screen (NNS) to prevent one cancer-specific death was 781 (45). Notably, there was a large variance across sites: there was no mortality benefit in Finland, whereas there was a 42% cancer-specific relative mortality reduction in Sweden. Overall, the risk of metastases was 30% lower among screened men (45).
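The NNS quoted above can be translated into an absolute risk reduction (ARR) through the standard identity NNS = 1/ARR. The following is a minimal Python sketch of that arithmetic only; the function names are illustrative and are not taken from the ERSPC analysis, and the only input is the NNS of 781 reported in the text.

```python
def absolute_risk_reduction(nns: float) -> float:
    """Return the absolute risk reduction implied by a number needed to screen."""
    if nns <= 0:
        raise ValueError("NNS must be positive")
    return 1.0 / nns


def deaths_averted(nns: float, n_screened: int) -> float:
    """Expected cancer-specific deaths averted when n_screened people are screened."""
    return n_screened * absolute_risk_reduction(nns)


erspc_nns = 781  # NNS reported in the text for ERSPC
arr = absolute_risk_reduction(erspc_nns)
per_10k = deaths_averted(erspc_nns, 10_000)

print(f"Implied absolute risk reduction: {arr:.4%}")
print(f"Deaths averted per 10,000 men screened: {per_10k:.1f}")
```

Read this way, a 20% relative mortality reduction corresponds to roughly 13 prostate cancer deaths averted per 10,000 men screened, which helps explain why trials of this kind need very large cohorts.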
The UK Cluster Randomized Trial of PSA Testing for prostate cancer (CAP) randomized 419,582 men aged 50 to 69 years from 2001 to 2009 (46). CAP studied the effect of a single invitation for a one-time PSA test. Men with PSA >3.0 ng/mL were offered a biopsy. Only 36% of invited men underwent screening, and no mortality benefit was observed in 10 years of follow-up (RR = 0.96; 95% CI, 0.85–1.08; ref. 46).
On the basis of the results of PLCO, the USPSTF gave PSA screening a D grade in 2012. From 2012 to 2018, US incidence of metastatic prostate cancer and prostate cancer mortality increased (35). Furthermore, a longer follow-up of ERSPC confirmed the 20% mortality reduction with greater absolute reduction in metastatic prostate cancer among screened men (47). Evidence supporting active surveillance rather than surgery/radiation for low-risk cancers subsequently mitigated the harms of overdiagnosis (47). In 2018, the USPSTF changed PSA screening to grade C (shared decision-making; ref. 35). It is too soon to assess the impact of this change, but subtle differences in screening usage and clinical outcomes by race may be emerging (48).
PLCO, ERSPC, and CAP have not provided clear guidance about PSA screening. The suggestion of some studies that PSA screening reduces mortality is particularly remarkable given the poor performance characteristics of PSA (49). Tests with higher specificity for aggressive prostate cancers are needed and efforts to filter PSA positives with other tools are underway. For example, multi-parametric MRI (50), new biomarkers (4-Kallikrein, PCA3, Free PSA, and PSA velocity; ref. 51, 52), and genomic markers may predict tumor aggressiveness and aid in biopsy decisions to reduce the frequency of over-biopsy/overtreatment (63).
Overall, contemporary screening for prostate cancer is significantly different from the eras in which the PLCO, ERSPC, and CAP trials enrolled, teaching us that screening RCTs with mortality endpoints may be expensive, prolonged over decades, and outdated by the time of publication. For example, the use of PSA testing in conjunction with prostate MRI to reduce over-biopsy and overtreatment may result in a different cost–benefit ratio than indicated by the original trials based on PSA testing alone. Finally, inclusion of Black men in PLCO was 4% (ref. 42; not reported in ERSPC/CAP), although prostate cancer mortality is two to three times higher in Black than in white men (53). It is possible that prostate cancer screening would be of clear benefit in the Black community or other populations at elevated prostate cancer risk, but no RCT has tested this to date.
Cancers without Screening RCTs
RCTs testing the effect of EDx on cancer mortality do not exist for most types of cancer. Outcomes other than mortality, such as reduction in late-stage incidence, candidacy for curative treatment (surgery, radiation), or treatment morbidity at early versus late stage, may be used to guide thinking on the value of EDx in these types. Esophageal cancer and pancreatic cancer are good examples of cancers that lack screening RCTs in average-risk adults, but for which detection at early stage: (i) improves survival in high-risk individuals, and (ii) is causally linked to candidacy for curative interventions or interventions with lower morbidity.
Esophageal cancer
Annually, about 18,440 Americans are diagnosed with esophageal cancer; approximately 16,170 die (9). The 5-year OS at localized stage is 47% and at advanced stage is 5% (9, 54). There is no screening recommended for average-risk people (10). Esophageal adenocarcinoma is more common in the US (55) due to the prevalence of gastroesophageal reflux disease and obesity (56–58). Squamous cell esophageal cancer has a strong association with tobacco, alcohol, malnutrition, and human papillomavirus, and is more common in Black patients (55). Survival is known to be superior at early stages regardless of tissue subtype (9), although assessment of the survival benefit is confounded by lead-time bias. Which other endpoints might we consider to understand the value of esophageal cancer EDx?
Quality of life may be severely compromised by esophageal cancer treatments, even if an individual is cured of the cancer through radiation/surgery (59). In precancerous lesions/stage 1a esophageal cancer, endoscopic therapies can be used to spare the esophagus (60). Esophageal cancer typically remains asymptomatic until the esophageal lumen has decreased to 13 mm (about half its baseline; ref. 61); this is already beyond the scope of esophagus-sparing ablation. Hence, without systematic screening, it is difficult to detect early lesions that are amenable to esophagus-sparing approaches. The potential for curative and less-morbid therapy at early stages of esophageal cancer is reflected in small clinical studies.
In one prospective clinical study of 24 villages in China with high esophageal cancer incidence (62), residents of 14 northern villages were screened with a one-time endoscopy and residents of 10 southern villages were unscreened. Detected preneoplastic lesions/cancers were resected. Over a 10-year follow up, there was a reduction in squamous cell cancer incidence (4.2% vs. 5.9%) and death (3.4% vs. 5.1%). This study was not randomized, blinded, or balanced for potential cultural differences (62). A population-wide case–control study was also undertaken in a Chinese province where a one-time endoscopy screening program had been in place for 10 years (63). Cases (n = 253) were individuals who died of esophageal cancer; controls (n = 759) were age- and gender-matched residents from the same area who had not died of esophageal cancer. In comparison with never-screened subjects, there was a 47% reduction in esophageal cancer mortality attributed to screening (63).
In the US, 10% of patients with reflux have Barrett's esophagus, which transforms to esophageal cancer at a rate of ∼0.5% annually (64, 65). Endoscopic screening in these patients is recommended by multiple societies but not by USPSTF (65, 66). Patients enrolled in Barrett's screening programs are diagnosed at earlier stages than those diagnosed with esophageal cancer without screening, making it possible for more screened patients to be treated with esophageal-sparing treatment (67). However, about 80% to 90% of patients with esophageal cancer have no known history of Barrett's (68), limiting the impact of this high-risk surveillance program.
Overall, esophageal cancer is an example of a disease where RCTs on screening do not exist, but other relevant causal relationships may be drawn between EDx and “candidacy for esophageal-sparing treatment.” Consideration of non-randomized clinical studies builds support for the benefit of early stage detection.
Pancreatic cancer
An additional obstacle in improving our repertoire of cancer screening is justifying the costs versus benefits of screening interventions for less common cancers, even when we know EDx improves outcomes. A good example is pancreatic cancer. Annually, 57,600 patients are diagnosed with pancreatic adenocarcinoma (PA) in the US and 47,050 die, making it the third leading cause of cancer death (9). The 5-year OS is 83.7% in patients at the earliest stage (1a; ref. 69) but only 3% in patients at advanced stage (9). For PA, medical therapy can prolong survival but cure requires definitive surgery (70). About 80% to 90% of PAs are diagnosed at incurable stages (71); 52% of patients present with metastases (72). Lymph node involvement or vascular encasement are strong negative predictors of survival as they decrease the chance of complete resection (73). Detection before these features develop makes definitive resection possible (9, 70).
USPSTF recommends against screening average-risk adults for PA using current tools (10), but national screening programs for PA in high-risk populations are present in Germany, Sweden, Spain, Canada, and many other countries (74). In the US, annual MRI screening for PA in individuals with lifetime risk >5% is thought to be cost-effective (75) and is recommended by multiple societies (76). Some suggest this risk threshold be lowered (77). Surveillance is targeted at patients with inherited risk (e.g., inherited genetic mutations including BRCA1/2 or a family history of PA), or pancreatic cystic lesions, typically detected incidentally (78, 79). In small studies of patients at elevated genetic risk, annual screening for PA (using endoscopic ultrasound/MRI/CT) detected PAs at curable stages (74, 80), suggesting that PA outcomes in high-risk individuals could be improved with screening. However, over 90% of PAs occur in patients without known increased risk beforehand (81).
The Power of Aggregate Prevalence
PA and esophageal cancer are examples of cancers in which most patients are diagnosed at advanced stage (73). Surveillance programs exist for patients at known high risk and these interventions seem to improve survival (62, 63, 80). However, >90% of PA/esophageal cancer diagnoses occur in patients who had no indicators of elevated risk (81, 83), so the impact of the single-cancer surveillance programs for high-risk adults is limited by the lack of population-level risk stratification.
This conundrum highlights the fact that, although cancer is common, individual types of cancer are relatively rare (84). For example, the NNS using a “perfect” test to find one esophageal cancer among average Americans aged 50 to 80 is 1,000, whereas to find any cancer, the NNS is only 33 (84). Thus, the NNS for low-incidence cancers is high and PPV (which depends on prevalence) is low, making the cost–benefit ratio unfavorable. As a result, promising EDx tools are restricted either to the minority of people with predetermined high risk (to increase “disease prevalence” in the tested population) or to higher-incidence cancers (lung, breast, colon), although the majority of people who would have benefited from screening for the low-incidence cancers belong to the general population.
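For a hypothetical test with perfect sensitivity and specificity, the NNS to find one case is simply the reciprocal of disease prevalence in the screened population. The sketch below illustrates that relationship in Python. The prevalences are back-calculated from the NNS values quoted above (1,000 and 33) rather than drawn from independent data, and the function name is illustrative.

```python
def nns_perfect_test(prevalence: float) -> float:
    """NNS to find one case with a perfectly sensitive and specific test."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return 1.0 / prevalence


# Prevalences implied by the NNS figures in the text (back-calculated assumption)
implied_prevalence = {
    "esophageal cancer": 1 / 1000,
    "any cancer": 1 / 33,
}

for label, p in implied_prevalence.items():
    print(f"{label}: prevalence ~{p:.2%}, NNS ~{nns_perfect_test(p):.0f}")

# How many times more people must be screened to find one esophageal cancer
# than to find one cancer of any type
fold_difference = (
    nns_perfect_test(implied_prevalence["esophageal cancer"])
    / nns_perfect_test(implied_prevalence["any cancer"])
)
print(f"Fold difference in NNS: {fold_difference:.1f}")
```

Under these assumptions, about 30 times as many people must be screened to find one esophageal cancer as to find one cancer of any type.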
A single test that detects multiple cancers would benefit from the aggregate prevalence of all target cancers combined, which would increase PPV, decrease NNS (84), and offer a cost-effective approach to low incidence cancers, which are difficult and costly to screen for one-by-one (84). Furthermore, a multicancer screening test is in line with most individuals’ desires to avoid death due to any cancer, not just a specific cancer.
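The dependence of PPV on prevalence follows from Bayes' rule: PPV = (sensitivity × prevalence) / [sensitivity × prevalence + (1 − specificity) × (1 − prevalence)]. The Python sketch below applies this formula at two prevalence levels. The sensitivity (50%) and specificity (99%) are hypothetical values chosen for illustration and do not describe any specific test; the two prevalences (1 in 1,000 and 1 in 33) are loosely based on the NNS figures cited from ref. 84. Holding sensitivity constant across cancer types is a simplification.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics (illustrative assumptions, not from any trial)
SENSITIVITY = 0.50
SPECIFICITY = 0.99

# Assumed prevalences: one rare cancer vs. all target cancers combined
single_cancer_ppv = ppv(SENSITIVITY, SPECIFICITY, prevalence=1 / 1000)
aggregate_ppv = ppv(SENSITIVITY, SPECIFICITY, prevalence=1 / 33)

print(f"PPV, single rare cancer: {single_cancer_ppv:.1%}")
print(f"PPV, aggregate of all cancers: {aggregate_ppv:.1%}")
```

With identical test characteristics, the PPV rises from roughly 5% to roughly 61% when the target shifts from one rare cancer to all cancers combined, which is the aggregate-prevalence effect described above.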
Multicancer EDx Liquid Biopsy
Academic groups and numerous companies have rapidly developed blood-based liquid biopsies for cancer screening that target circulating cancer materials. Tumors are known to shed DNA and other cellular material into the bloodstream (85–87), enabling detection of early-stage cancers across multiple types, with varying sensitivity and specificity for different types (88). For example, aberrant global and oncogene-specific hypomethylation and focal hypermethylation on tumor suppressors (89) have been leveraged by the biotechnology company GRAIL to design a multicancer screening blood test (90). This test is designed to increase the overall cancer detection rate (across all types) rather than to maximize sensitivity for any individual cancer. In early studies, the Galleri test (GRAIL, Inc.) preferentially detected more aggressive tumors compared with indolent ones, as highly proliferative tumors shed more methylated DNA (91). These data are so far preliminary, but this feature of the test would theoretically address the problem of overdiagnosis of indolent cancers, although the benefit may be offset by aggressive tumors having a shorter duration in early-stage disease. It is possible for aggressive “interval cancers” to be detected at late stage even with annual testing.
Another company, Thrive, uses mutations in circulating tumor DNA and protein biomarkers to enable multicancer EDx through its CancerSEEK test, which does not currently specify a tissue-of-origin signal (92). Following a positive CancerSEEK blood test, physicians can use a PET-CT to localize a cancer signal. Tumors detected using this approach include a spectrum of aggressive (lung) and indolent (thyroid) types (92). Clinical follow-up of a positive multicancer early detection (MCED) test requires significant research on how to reduce the multiplicity of harms from false positives (e.g., full-body PET-CT is unlikely to be a generally acceptable follow-up).
Many other multicancer EDx technologies are in preclinical development stages. These technologies are based on circulating tumor DNA fragment length (Delfi; ref. 93), multi-omics (Freenome, which has clinical studies on colorectal cancer detection; ref. 94), and other circulating tumor materials (AnchorDx, ArcherDx, Foundation Medicine, Guardant Health, Laboratory for Advanced Medicine, and Singlera Genomics).
Lessons Learned and Next Steps
Large RCTs with long follow-up, costing many hundreds of millions of dollars each, may not give clear answers about cancer screening interventions (27, 28, 43, 46, 47, 95). In part, conservative over-reliance on mortality endpoints in RCTs to the exclusion of earlier endpoints and real-world evidence has held back progress in cancer screening. How can we learn from historical experiences in RCTs for cancer screening to improve assessment of MCED tests in the future?
Retrospective analyses of RCTs suggest that certain surrogate outcomes, such as incidence of late-stage cancer, could be used to predict mortality and shorten the length of RCTs (96) for approvals. Lessons from mammography trials suggest mortality reduction occurs when late-stage incidence is reduced by at least 20% (6, 97). Conditional approvals of screening interventions based on surrogate endpoints may be considered, contingent upon follow-up to mortality endpoints and real-world evidence of efficacy post-implementation (5). The use of surrogate endpoints as early indicators of a screening intervention's efficacy may promote innovation in the cancer screening space by reducing “time to reward,” while longer follow-up confirms cancer-specific mortality benefits and limits the duration of any impact from false-positive conclusions.
To determine if a multicancer screening test's benefits exceed harms, some investigators have considered combining outcomes across cancer types in an RCT. Combining benefits, such as number of life-years saved from all-cancer deaths, may be reasonable. Combining harms may be more challenging as a breast biopsy, a lung biopsy, and a laparotomy for false positives have very disparate morbidities. Methods for combining safety results across cancer types are likely to reach consensus only after considerable research and clinical experience in this new field. For example, it is possible that MCED tests reduce all-cancer mortality overall but that the ratio of costs to benefits is only favorable in specific reported tissues of origin due to higher likelihoods or costs of false positive reports in specific cancer types. Clinical trials testing MCED tests may have to be powered to allow for stratified analysis of individual types of cancer, perhaps ultimately resulting in selective censoring of positive reports for certain tissues of origin where there is no improvement in clinical outcomes as a result of screening, or where costs outweigh benefits. Such a practical approach may enable access to the benefits of MCED while restricting its potential harms.
Signaling the rapid growth of next-generation cancer screening technologies, a federal bill was introduced (House of Representatives bill 8845, Senate bill 5051) in December 2020 suggesting Medicare/Medicaid reimbursement of MCEDs contingent upon FDA approval. Traditionally, after FDA approval, USPSTF assessment and recommendation are required prior to coverage by the Centers for Medicare & Medicaid Services; this bill would remove the necessity for USPSTF review prior to coverage, which may expedite patient access but also increase the risk of premature implementation (98).
It is likely that multiple breakthrough MCED technologies will emerge in the coming years. Assessment metrics used in the past for single cancers should be adapted to employ earlier surrogate endpoints—without earlier endpoints, advancement in sequencing technology and algorithms used in MCED liquid biopsy will outpace the one to two decades required to assess mortality reduction, making the trials obsolete by the time they report. Due to the high aggregate prevalence of all cancers, this shift in paradigm from single-cancer screening has the potential to be a practical and affordable supplement to approved USPSTF screens. Next-generation MCEDs may shift the landscape of stage at diagnosis for cancers if follow-up procedures can be adequately implemented and harms due to false positives adequately limited.
Authors' Disclosures
S. Raoof reports personal fees from GRAIL outside the submitted work. R.J. Lee reports grants from Janssen; personal fees from Janssen, Bayer, Exelixis, Dendreon, Blue Earth; and personal fees from GE Healthcare outside the submitted work. S.J. Skates reports grants from NCI during the conduct of the study; other support from SISCAPA Assay Technologies; personal fees from Guardant Health; grants from Mercy BioAnalytics and Freenome; and personal fees from Grail outside the submitted work; in addition, S.J. Skates has a patent for ROC method for early detection of disease licensed to Abcodia. No disclosures were reported by the other authors.